@Author: Han Zhao
@Note: Please cite the following paper if you use the tool developed in this package.
Self-Adaptive Hierarchical Sentence Model
by H. Zhao, Z. Lu and P. Poupart, IJCAI 2015.
@Required libs:
* The package was developed with the specific library versions listed above. You will also need a plain-text file containing all the word vectors, one line per word. *
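As a reference, a word-vector file in the assumed format (one line per word: the token followed by its whitespace-separated vector components; check this against your own embeddings file) could be loaded with a sketch like the following. The function name is hypothetical, not part of the package:

```python
def load_word_vectors(path):
    """Sketch: load a word-vector text file into a dict.

    Assumed format (one line per word): token v1 v2 ... vd
    This format is an assumption; adjust to match your embedding file.
    """
    vectors = {}
    with open(path) as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            word, values = parts[0], parts[1:]
            vectors[word] = [float(v) for v in values]
    return vectors
```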
The package supports the following models, each configured through a configuration file.
Usage:
>> python adasent_subjectivity.py -h
usage: adasent_subjectivity.py [-h] [-s SIZE] [-l RATE] [-e EPOCH] [-n NAME]
[-m MODEL] [-d SEED] [-k FOLDS]
config
positional arguments:
config
optional arguments:
-h, --help show this help message and exit
-s SIZE, --size SIZE The size of each batch used to be trained.
-l RATE, --rate RATE Learning rate of AdaGrad.
-e EPOCH, --epoch EPOCH
Number of the training epoch.
-n NAME, --name NAME Name used to save the model.
-m MODEL, --model MODEL
Model name to use as initializer.
-d SEED, --seed SEED Random seed used for generation.
-k FOLDS, --folds FOLDS
K-fold cross-validation to compute the performance
score.
All parameters are optional except config, a required configuration file that specifies the architecture of the model for each data set. An example configuration file for AdaSent on the SUBJ data set is provided in the adasent-exp/ folder.
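The command-line interface shown above corresponds to an argparse setup roughly like the following sketch. The default values here are hypothetical, not taken from the actual source:

```python
import argparse

def build_parser():
    """Sketch of the documented CLI; defaults are hypothetical."""
    parser = argparse.ArgumentParser(prog="adasent_subjectivity.py")
    parser.add_argument("config",
                        help="Configuration file specifying the model architecture.")
    parser.add_argument("-s", "--size", type=int, default=100,
                        help="The size of each training batch.")
    parser.add_argument("-l", "--rate", type=float, default=0.01,
                        help="Learning rate of AdaGrad.")
    parser.add_argument("-e", "--epoch", type=int, default=10,
                        help="Number of training epochs.")
    parser.add_argument("-n", "--name",
                        help="Name used to save the model.")
    parser.add_argument("-m", "--model",
                        help="Model name to use as initializer.")
    parser.add_argument("-d", "--seed", type=int, default=42,
                        help="Random seed used for generation.")
    parser.add_argument("-k", "--folds", type=int, default=10,
                        help="K-fold cross-validation to compute the performance score.")
    return parser
```

A typical invocation would then look like `python adasent_subjectivity.py -s 50 -e 10 -k 10 adasent-exp/<config-file>`, where `<config-file>` stands in for the example configuration file.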
A typical successful run prints the following information. You may see different warning messages depending on the exact version of Theano you are using, but the output should otherwise be similar.
[MainProcess, 9275] [DEBUG] Positive Instances: 5000
[MainProcess, 9275] [DEBUG] Negative Instances: 5000
[MainProcess, 9275] [DEBUG] Size of the data sets: 10000
[MainProcess, 9275] [DEBUG] Time used to load and shuffle SUBJ dataset: 0.035635 seconds.
[MainProcess, 9275] [DEBUG] Blank index: </s>
[MainProcess, 9275] [DEBUG] Time used to build sparse and dense input word-embedding matrices: 30.463479 seconds.
[MainProcess, 9275] [DEBUG] Default positive percentage in dataset: 0.500000
[MainProcess, 9275] [DEBUG] Default negative percentage in dataset: 0.500000
[MainProcess, 9275] [DEBUG] Partition the whole data set into 10 folds.
[MainProcess, 9275] [DEBUG] Start...
[MainProcess, 9275] [DEBUG] ==================================================
[MainProcess, 9275] [DEBUG] Training on the 0 th fold
[MainProcess, 9275] [DEBUG] Training size on current partition: 9000, test size on current partition: 1000
[MainProcess, 9275] [DEBUG] Building new model on 0 th fold.
[MainProcess, 9275] [DEBUG] No designated model, training from scratch...
[MainProcess, 9275] [DEBUG] Building Gated Recursive Convolutional Neural Network Encoder...
/Library/Python/2.7/site-packages/theano/gof/cmodule.py:284: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
rval = __import__(module_name, {}, {}, [module_name])
[MainProcess, 9275] [DEBUG] Finished constructing the structure of GrCNNEncoder:
[MainProcess, 9275] [DEBUG] Size of the input dimension: 50
[MainProcess, 9275] [DEBUG] Size of the hidden dimension: 50
[MainProcess, 9275] [DEBUG] Activation function: tanh
[MainProcess, 9275] [DEBUG] Creating Gating Network...
[MainProcess, 9275] [DEBUG] AdaSent built finished...
[MainProcess, 9275] [DEBUG] Total number of parameters in AdaSent: 13205
[MainProcess, 9275] [DEBUG] Time used to building the model: 172.439958 seconds.
[MainProcess, 9275] [DEBUG] 0 CV partition, Training epoch: 0, total cost: 10895.759564, Training accuracy = 0.698778
[MainProcess, 9275] [DEBUG] 0 CV partition, Training epoch: 0, total cost: 833.788072, Test accuracy: 0.832000
[MainProcess, 9275] [DEBUG] Parameter: U, L2-norm: 19.829668045
[MainProcess, 9275] [DEBUG] Parameter: W_l, L2-norm: 1.6544675827
[MainProcess, 9275] [DEBUG] Parameter: W_r, L2-norm: 1.79139578342
[MainProcess, 9275] [DEBUG] Parameter: Wb, L2-norm: 0.49146386981
[MainProcess, 9275] [DEBUG] Parameter: G_l, L2-norm: 3.23852229118
[MainProcess, 9275] [DEBUG] Parameter: G_r, L2-norm: 3.49429821968
[MainProcess, 9275] [DEBUG] Parameter: Gb, L2-norm: 0.0631450340152
[MainProcess, 9275] [DEBUG] Parameter: W_hidden, L2-norm: 4.71216487885
[MainProcess, 9275] [DEBUG] Parameter: b_hidden, L2-norm: 0.335952311754
[MainProcess, 9275] [DEBUG] Parameter: W_softmax, L2-norm: 1.41855752468
[MainProcess, 9275] [DEBUG] Parameter: b_softmax, L2-norm: 0.0285037215799
[MainProcess, 9275] [DEBUG] Parameter: Weighting vector, L2-norm: 1.1962364912
[MainProcess, 9275] [DEBUG] Parameter: Word-Embedding, L2-norm: 655.600830078
[MainProcess, 9275] [DEBUG] 0 CV partition, Training epoch: 1, total cost: 6454.488050, Training accuracy = 0.853444
[MainProcess, 9275] [DEBUG] 0 CV partition, Training epoch: 1, total cost: 619.867141, Test accuracy: 0.861000
[MainProcess, 9275] [DEBUG] Parameter: U, L2-norm: 15.7782564163
[MainProcess, 9275] [DEBUG] Parameter: W_l, L2-norm: 2.12384343147
[MainProcess, 9275] [DEBUG] Parameter: W_r, L2-norm: 1.98495721817
[MainProcess, 9275] [DEBUG] Parameter: Wb, L2-norm: 0.507406651974
[MainProcess, 9275] [DEBUG] Parameter: G_l, L2-norm: 3.00104594231
[MainProcess, 9275] [DEBUG] Parameter: G_r, L2-norm: 3.2772564888
[MainProcess, 9275] [DEBUG] Parameter: Gb, L2-norm: 0.0897985771298
[MainProcess, 9275] [DEBUG] Parameter: W_hidden, L2-norm: 3.94150781631
[MainProcess, 9275] [DEBUG] Parameter: b_hidden, L2-norm: 0.370790868998
[MainProcess, 9275] [DEBUG] Parameter: W_softmax, L2-norm: 1.63154172897
[MainProcess, 9275] [DEBUG] Parameter: b_softmax, L2-norm: 0.0260569117963
[MainProcess, 9275] [DEBUG] Parameter: Weighting vector, L2-norm: 1.08205080032
[MainProcess, 9275] [DEBUG] Parameter: Word-Embedding, L2-norm: 655.600830078
......
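The 10-fold partition reported in the log (9000 training and 1000 test examples per fold, out of 10000) can be sketched as below. This illustrates standard k-fold splitting, not the package's actual partitioning code:

```python
def kfold_partitions(n_examples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Each fold holds out n_examples // k examples for testing and
    trains on the rest, so every example is tested exactly once.
    """
    indices = list(range(n_examples))
    fold_size = n_examples // k
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test
```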
Once the model has been trained and saved, you can use it to visualize the decision-making process in AdaSent, e.g., Fig. 6 in our paper.
>> python show_heatmap.py -h
usage: show_heatmap.py [-h] [-m MODEL] name
positional arguments:
name
optional arguments:
-h, --help show this help message and exit
-m MODEL, --model MODEL
Using model trained before
show_heatmap.py
is an interactive program that lets the user choose which input sequence to visualize. An example image is provided as follows:
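The heatmap visualizes how AdaSent weights the different levels of its hierarchy when classifying a sentence. As a rough illustration of that hierarchy, the sketch below builds the pyramid of representations using plain averaging as a stand-in for the learned gated combination of the actual model:

```python
def build_pyramid(word_vectors):
    """Build a pyramid of representations: level 0 holds the word
    vectors, and each higher level combines adjacent pairs from the
    level below, so a sentence of n words yields n levels.
    Plain averaging stands in for AdaSent's learned gated combination.
    """
    levels = [word_vectors]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt = [[(a + b) / 2.0 for a, b in zip(prev[i], prev[i + 1])]
               for i in range(len(prev) - 1)]
        levels.append(nxt)
    return levels
```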

Please feel free to contact han.zhao@uwaterloo.ca if you have any questions. Happy hacking!