Introduction
Distributional Semantics Models (DSMs) have become widely accepted as successful models for lexical semantics. However, their extension to larger structural units such as entire sentences remains challenging. Compositional DSMs (CDSMs) aim to model sentence semantics by taking into account grammatical structure and logical words, which are ignored by simpler models. We explore a recursive matrix-vector space model, where each word or phrase has associated with it a vector capturing its semantics, as well as a matrix capturing how it alters the meanings of other words or phrases in its vicinity. We test this CDSM on the tasks of semantic relatedness score prediction and semantic entailment classification over the SICK dataset of approximately 10,000 sentence pairs.
Related Work
Our work is primarily based on the recursive matrix-vector space model proposed by Socher, Huval, Manning, and Ng [6]. In this model, each word has associated with it a vector and a matrix. The vector captures the semantics of the word itself and is obtained from the underlying Distributional Semantics Model. The matrix captures how the word can alter the semantics of other words in its neighborhood, essentially approximating the effects of ‘operator words’ on semantics.
The authors then outline a two-step procedure for evaluating sentence semantics:
1. Build the parse tree for the given sentence
2. Recursively combine the words according to the syntactic structure of the parse tree, proceeding in a bottom-up manner to obtain the semantic representation of the entire sentence
At each combination step in the recursive procedure, the parent vector and matrix are computed so that their dimensions match those of the children, which makes the bottom-up recursive approach to combination feasible.
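To make the combination step concrete, here is a minimal sketch following the formulation in [6]: for children (a, A) and (b, B), the parent vector is p = g(W[Ba; Ab]) and the parent matrix is P = W_M[A; B]. The dimensionality n, the choice of tanh for the nonlinearity g, and the random initialisation below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

n = 50  # dimensionality of word vectors; word matrices are n x n (illustrative)

rng = np.random.default_rng(0)
W = rng.standard_normal((n, 2 * n)) * 0.01    # maps [Ba; Ab] to the parent vector
W_M = rng.standard_normal((n, 2 * n)) * 0.01  # maps stacked [A; B] to the parent matrix

def compose(a, A, b, B):
    """One combination step: children (a, A) and (b, B) -> parent (p, P).

    p has length n and P is n x n, the same shapes as the inputs,
    so the step can be applied recursively up the parse tree.
    """
    p = np.tanh(W @ np.concatenate([B @ a, A @ b]))  # parent vector, g = tanh
    P = W_M @ np.vstack([A, B])                      # parent matrix
    return p, P
```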
Dataset and Task Description
We have chosen the first task of SemEval-2014 [1], which comprises two subtasks: semantic relatedness prediction and semantic entailment classification. Both are performed over the SICK (Sentences Involving Compositional Knowledge) dataset, designed specifically for this challenge. The dataset consists of a little under 10,000 sentence pairs, hand-labeled with semantic similarity scores (on a scale of 1 to 5) and the nature of the semantic entailment relationship (entailment, contradiction, or other) between them. Our approach consists of two steps:
1. Obtain the vectors representing the semantics of all sentence pairs (training, validation, and test sets) using our chosen CDSM; a sketch of this recursive evaluation follows the list.
2. Apply appropriate regression and classification techniques to predict the semantic similarity score and the semantic entailment relationship, respectively.
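A minimal sketch of step 1, assuming the parse tree is represented as nested tuples of words and that a lexicon maps each word to its (vector, matrix) pair; compose() is the combination step sketched in the previous section.

```python
def sentence_vector(tree, lexicon):
    """Evaluate a binary parse tree bottom-up.

    tree: either a word (str) or a (left, right) pair of subtrees.
    lexicon: dict mapping each word to its (vector, matrix) pair.
    Returns the (vector, matrix) pair of the constituent; the root's
    vector serves as the sentence representation.
    """
    if isinstance(tree, str):          # leaf: look the word up
        return lexicon[tree]
    left, right = tree                 # internal node: combine the children
    a, A = sentence_vector(left, lexicon)
    b, B = sentence_vector(right, lexicon)
    return compose(a, A, b, B)

# e.g. vec, _ = sentence_vector((("a", "dog"), ("is", "running")), lexicon)
```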
Socher provides code for this model to solve the problem of classifying relations between words in a sentence [6]. We have suitably modified this code to obtain the required sentence-semantics vectors. Socher’s implementation uses the Stanford Parser [2] to obtain the required parse trees.
Regression techniques are used to estimate the relatedness scores of the sentence pairs in the test set. The regression model was trained on samples from the training set, whose sentence-semantics vectors are calculated as above and whose similarity scores are known. We have explored two techniques: logistic regression and neural networks.
Classification techniques are used to predict the nature of the semantic entailment relationship between the sentence pairs in the test set. Once again, we train the classification model using labeled samples from the training set. We have explored neural networks for this purpose.
Semantic Similarity
Logistic Regression
For logistic regression, we divided the data into the following two parts:
1. Training set = 9427 samples
2. Test set = 500 samples
The hypothesis function was fit without regularisation. The mean absolute difference between the actual and predicted semantic similarity scores over the test set was 2.96.
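A minimal sketch of this setup with scikit-learn, under two assumptions the report does not spell out: each pair is featurised by concatenating its two sentence vectors, and the 1-to-5 scores are rounded to integer labels so that logistic regression applies. The synthetic arrays below stand in for the real features and gold scores.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.standard_normal((9927, 100))  # stand-in for sentence-pair features
y = rng.uniform(1, 5, 9927)           # stand-in for gold relatedness scores

# split sizes follow the report: 9427 training pairs, 500 test pairs
X_train, y_train = X[:9427], y[:9427]
X_test, y_test = X[9427:], y[9427:]

# penalty=None disables regularisation, matching the report
clf = LogisticRegression(penalty=None, max_iter=1000)
clf.fit(X_train, np.rint(y_train).astype(int))

print(mean_absolute_error(y_test, clf.predict(X_test)))
```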
Neural Networks
For neural networks the data was divided into the following three parts:
1. Training set = 7070 samples
2. Test Set = 500 samples
3. Validation Set = 1885 samples
Here we kept the test set fixed; from the remaining samples, the validation and training sets were chosen at random. We used a neural network with one hidden layer of 200 neurons. The weights of the neural net converged after 15 iterations, as no further reduction in error was observed over any of the three sets.
The mean absolute difference between the actual and predicted semantic similarity scores over the test set was 0.71.
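A corresponding sketch with scikit-learn's MLPRegressor, reusing the arrays and featurisation assumption from the logistic-regression sketch. Early stopping on a held-out validation fraction plays the role of halting once no further error reduction is observed; the fraction below reproduces the 1885-of-8955 non-test split.

```python
from sklearn.neural_network import MLPRegressor

# one hidden layer of 200 neurons, as above; training stops once the
# held-out validation error stops improving
reg = MLPRegressor(
    hidden_layer_sizes=(200,),
    early_stopping=True,
    validation_fraction=1885 / (7070 + 1885),  # 1885 of 8955 non-test pairs
    random_state=0,
)
reg.fit(X_train, y_train)  # X_train, y_train as in the previous sketch
print(mean_absolute_error(y_test, reg.predict(X_test)))
```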
Semantic Entailment
Semantic entailment relationship prediction over the test set is carried out using a trained neural network. The division of the data into training, validation, and test sets is done as discussed above. The network makes use of a single hidden layer of 700 neurons.
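A minimal sketch of the entailment classifier, again assuming concatenated sentence vectors as features and reusing rng and X from the earlier sketch; the label set follows the dataset description, and the synthetic labels stand in for the gold annotations.

```python
from sklearn.neural_network import MLPClassifier

y_ent = rng.integers(0, 3, 9927)    # stand-in labels: 0 = entailment,
                                    # 1 = contradiction, 2 = other

clf_ent = MLPClassifier(
    hidden_layer_sizes=(700,),      # single hidden layer of 700 neurons
    early_stopping=True,
    validation_fraction=1885 / (7070 + 1885),
    random_state=0,
)
clf_ent.fit(X[:9427], y_ent[:9427])
print(clf_ent.score(X[9427:], y_ent[9427:]))  # test-set accuracy
```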
Code and Other Resources
References
1. SemEval-2014 Task 1: http://alt.qcri.org/semeval2014/task1/
2. Stanford Natural Language Processing Parser: http://nlp.stanford.edu/software/lex-parser.shtml
3. Edward Grefenstette and Mehrnoosh Sadrzadeh. Experimental support for a categorical compositional distributional model of meaning. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011.
4. Zellig Harris. Distributional structure. Word, 10(23):146–162, 1954.
5. Jeff Mitchell and Mirella Lapata. Composition in distributional models of semantics. Cognitive Science, 34(8):1388–1429, 2010.
6. Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2012.