Self-Adaptive Hierarchical Sentence Model

@Author: Han Zhao

@Note: Please cite the following paper if you use the tool developed in this package.

Self-Adaptive Hierarchical Sentence Model

by H. Zhao, Z. Lu and P. Poupart, IJCAI 2015.

@Required lib:

The package was developed against specific versions of its dependencies; the sample run below used Python 2.7 with Theano and NumPy, so behavior may vary with other versions. You will also need a text file that contains all the word vectors, one word per line (each line holds a word followed by its vector components).
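For reference, here is a minimal sketch of how such a word-vector file can be read, assuming each line holds a word followed by its whitespace-separated vector components (the loader below is illustrative and not part of this package):

    import numpy as np

    def load_word_vectors(path):
        # Hypothetical loader: one word per line, followed by its
        # whitespace-separated vector components.
        vocab, vecs = [], []
        with open(path) as fin:
            for line in fin:
                parts = line.rstrip().split()
                if len(parts) < 2:
                    continue
                vocab.append(parts[0])
                vecs.append([float(x) for x in parts[1:]])
        return vocab, np.asarray(vecs, dtype=np.float32)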


The package supports the following models, each configured through a configuration file:

  1. Vanilla NBoW model for sequence summarization
  2. Convolutional Neural Network
  3. Multilayer Perceptron
  4. [Denoising|Sparse|Deep] AutoEncoder
  5. [Bidirectional (w/o Weight Ties)] Recurrent Neural Network
  6. Gated Recursive Convolutional Neural Network
  7. Self-Adaptive Hierarchical Sentence Model

Usage:

>> python adasent_subjectivity.py -h
usage: adasent_subjectivity.py [-h] [-s SIZE] [-l RATE] [-e EPOCH] [-n NAME]
                               [-m MODEL] [-d SEED] [-k FOLDS]
                               config

positional arguments:
  config

optional arguments:
  -h, --help            show this help message and exit
  -s SIZE, --size SIZE  The size of each training batch.
  -l RATE, --rate RATE  Learning rate of AdaGrad.
  -e EPOCH, --epoch EPOCH
                        Number of training epochs.
  -n NAME, --name NAME  Name used to save the model.
  -m MODEL, --model MODEL
                        Model name to use as initializer.
  -d SEED, --seed SEED  Random seed used for generation.
  -k FOLDS, --folds FOLDS
                        K-fold cross-validation to compute the performance
                        score.

All other parameters are optional except config, a required configuration file that specifies the architecture of each model for each data set. An example configuration file for AdaSent on the SUBJ data set is provided in the adasent-exp/ folder.
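For example, a 10-fold run on SUBJ might look like the following (the batch size, learning rate, epoch count, and configuration file name are illustrative; substitute your own):

>> python adasent_subjectivity.py -s 32 -l 0.01 -e 10 -k 10 adasent-exp/subj.conf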

A typical successful run prints output like the following. You may see different warning messages depending on the exact version of Theano you are using, but most of the output should be similar.

[MainProcess, 9275] [DEBUG]  Positive Instances: 5000
[MainProcess, 9275] [DEBUG]  Negative Instances: 5000
[MainProcess, 9275] [DEBUG]  Size of the data sets: 10000
[MainProcess, 9275] [DEBUG]  Time used to load and shuffle SUBJ dataset: 0.035635 seconds.
[MainProcess, 9275] [DEBUG]  Blank index: </s>
[MainProcess, 9275] [DEBUG]  Time used to build sparse and dense input word-embedding matrices: 30.463479 seconds.
[MainProcess, 9275] [DEBUG]  Default positive percentage in dataset: 0.500000
[MainProcess, 9275] [DEBUG]  Default negative percentage in dataset: 0.500000
[MainProcess, 9275] [DEBUG]  Partition the whole data set into 10 folds.
[MainProcess, 9275] [DEBUG]  Start...
[MainProcess, 9275] [DEBUG]  ==================================================
[MainProcess, 9275] [DEBUG]  Training on the 0 th fold
[MainProcess, 9275] [DEBUG]  Training size on current partition: 9000, test size on current partition: 1000
[MainProcess, 9275] [DEBUG]  Building new model on 0 th fold.
[MainProcess, 9275] [DEBUG]  No designated model, training from scratch...
[MainProcess, 9275] [DEBUG]  Building Gated Recursive Convolutional Neural Network Encoder...
/Library/Python/2.7/site-packages/theano/gof/cmodule.py:284: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  rval = __import__(module_name, {}, {}, [module_name])
[MainProcess, 9275] [DEBUG]  Finished constructing the structure of GrCNNEncoder:
[MainProcess, 9275] [DEBUG]  Size of the input dimension: 50
[MainProcess, 9275] [DEBUG]  Size of the hidden dimension: 50
[MainProcess, 9275] [DEBUG]  Activation function: tanh
[MainProcess, 9275] [DEBUG]  Creating Gating Network...
[MainProcess, 9275] [DEBUG]  AdaSent built finished...
[MainProcess, 9275] [DEBUG]  Total number of parameters in AdaSent: 13205
[MainProcess, 9275] [DEBUG]  Time used to building the model: 172.439958 seconds.
[MainProcess, 9275] [DEBUG]  0 CV partition, Training epoch: 0, total cost: 10895.759564, Training accuracy = 0.698778
[MainProcess, 9275] [DEBUG]  0 CV partition, Training epoch: 0, total cost: 833.788072, Test accuracy: 0.832000
[MainProcess, 9275] [DEBUG]  Parameter: U, L2-norm: 19.829668045
[MainProcess, 9275] [DEBUG]  Parameter: W_l, L2-norm: 1.6544675827
[MainProcess, 9275] [DEBUG]  Parameter: W_r, L2-norm: 1.79139578342
[MainProcess, 9275] [DEBUG]  Parameter: Wb, L2-norm: 0.49146386981
[MainProcess, 9275] [DEBUG]  Parameter: G_l, L2-norm: 3.23852229118
[MainProcess, 9275] [DEBUG]  Parameter: G_r, L2-norm: 3.49429821968
[MainProcess, 9275] [DEBUG]  Parameter: Gb, L2-norm: 0.0631450340152
[MainProcess, 9275] [DEBUG]  Parameter: W_hidden, L2-norm: 4.71216487885
[MainProcess, 9275] [DEBUG]  Parameter: b_hidden, L2-norm: 0.335952311754
[MainProcess, 9275] [DEBUG]  Parameter: W_softmax, L2-norm: 1.41855752468
[MainProcess, 9275] [DEBUG]  Parameter: b_softmax, L2-norm: 0.0285037215799
[MainProcess, 9275] [DEBUG]  Parameter: Weighting vector, L2-norm: 1.1962364912
[MainProcess, 9275] [DEBUG]  Parameter: Word-Embedding, L2-norm: 655.600830078
[MainProcess, 9275] [DEBUG]  0 CV partition, Training epoch: 1, total cost: 6454.488050, Training accuracy = 0.853444
[MainProcess, 9275] [DEBUG]  0 CV partition, Training epoch: 1, total cost: 619.867141, Test accuracy: 0.861000
[MainProcess, 9275] [DEBUG]  Parameter: U, L2-norm: 15.7782564163
[MainProcess, 9275] [DEBUG]  Parameter: W_l, L2-norm: 2.12384343147
[MainProcess, 9275] [DEBUG]  Parameter: W_r, L2-norm: 1.98495721817
[MainProcess, 9275] [DEBUG]  Parameter: Wb, L2-norm: 0.507406651974
[MainProcess, 9275] [DEBUG]  Parameter: G_l, L2-norm: 3.00104594231
[MainProcess, 9275] [DEBUG]  Parameter: G_r, L2-norm: 3.2772564888
[MainProcess, 9275] [DEBUG]  Parameter: Gb, L2-norm: 0.0897985771298
[MainProcess, 9275] [DEBUG]  Parameter: W_hidden, L2-norm: 3.94150781631
[MainProcess, 9275] [DEBUG]  Parameter: b_hidden, L2-norm: 0.370790868998
[MainProcess, 9275] [DEBUG]  Parameter: W_softmax, L2-norm: 1.63154172897
[MainProcess, 9275] [DEBUG]  Parameter: b_softmax, L2-norm: 0.0260569117963
[MainProcess, 9275] [DEBUG]  Parameter: Weighting vector, L2-norm: 1.08205080032
[MainProcess, 9275] [DEBUG]  Parameter: Word-Embedding, L2-norm: 655.600830078
......
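The parameters reported above reflect the gated recursive composition used by the GrCNN encoder: at each level, every pair of adjacent child vectors is either composed through (W_l, W_r, Wb) or copied, with the mixture decided by a three-way gate built from (G_l, G_r, Gb). Below is a minimal NumPy sketch of one such combination step, assuming the formulation in the paper (function and variable names are illustrative, not the package's actual API):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def grcnn_combine(h_left, h_right, W_l, W_r, Wb, G_l, G_r, Gb):
        # Candidate composition of the two adjacent child vectors.
        # W_l, W_r: (d, d); Wb: (d,)
        h_hat = np.tanh(W_l.dot(h_left) + W_r.dot(h_right) + Wb)
        # Three-way gate: compose, copy left child, or copy right child.
        # G_l, G_r: (3, d); Gb: (3,)
        omega = softmax(G_l.dot(h_left) + G_r.dot(h_right) + Gb)
        return omega[0] * h_hat + omega[1] * h_left + omega[2] * h_right

Each level of the pyramid applies this step to every adjacent pair of vectors from the level below, producing progressively shorter sequences of intermediate representations.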

Once the model has been trained and saved, you can use it to visualize AdaSent's decision-making process, e.g., Fig. 6 in our paper.

>> python show_heatmap.py -h
usage: show_heatmap.py [-h] [-m MODEL] name

positional arguments:
  name

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Use a previously trained model.

show_heatmap.py is an interactive program that lets the user choose which input sequence to visualize. An example image is provided below: ![SUBJ-10.png](snippet/SUBJ-10.png)
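For example, assuming the model was saved under the name given via -n/--name during training (the name below is illustrative):

>> python show_heatmap.py SUBJ-model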

Please feel free to contact han.zhao@uwaterloo.ca if you have any questions. Happy hacking!