CS365: Artificial Intelligence

CDSMs for Semantic Relatedness and Entailment

S Sai Krishna Prasad (11620) & Sidharth Gupta (11714)
Dept. of Computer Science and Engineering, IIT Kanpur
Prof. Amitabha Mukerjee

Introduction

Distributional Semantics Models (DSMs) have become widely accepted as successful models for lexical semantics. However, their extension to larger structural units such as entire sentences remains challenging. Compositional DSMs (CDSMs) aim to model sentence semantics by taking into account grammatical structure and logical words, which are ignored by simpler models. We explore a recursive matrix-vector space model, in which each word or phrase has associated with it a vector capturing its semantics, as well as a matrix capturing how it alters the meanings of other words or phrases in its vicinity. We test this CDSM on the tasks of semantic relatedness score prediction and semantic entailment classification over the SICK data set of approximately 10,000 sentence pairs.

Related Work

Our work is primarily based on the recursive matrix-vector space model proposed by Socher, Huval, Manning and Ng [6]. In this model each word has associated with it a vector and a matrix. The vector captures the semantics of the word itself and is obtained from the underlying Distributional Semantics Model. The matrix captures how the word can alter the semantics of other words in its neighborhood, essentially approximating the effects of 'operator words' on semantics.

The authors then outline a two-step procedure for evaluating sentence semantics:
1. Build the parse tree for the given sentence.
2. Recursively combine the words according to the syntactic structure of the parse tree, proceeding in a bottom-up manner, to obtain the semantic representation of the entire sentence.
At each combination step, the matrix and the vector are evaluated so that the dimensions of both are preserved, which is what makes the bottom-up recursive combination feasible.
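
For concreteness, a minimal sketch of this merge rule follows (Python with NumPy; the parameter names, the random initialisation, and the choice n = 50 are ours, standing in for the trained parameters of [6]). Given a left child (vector a, matrix A) and a right child (vector b, matrix B), the parent keeps both dimensions:

    import numpy as np

    n = 50                                   # word vector dimensionality (assumed)
    W   = 0.01 * np.random.randn(n, 2 * n)   # learned map: stacked transformed vectors -> parent vector
    W_M = 0.01 * np.random.randn(n, 2 * n)   # learned map: stacked child matrices -> parent matrix

    def compose(a, A, b, B):
        # Each child's matrix first transforms the other child's vector,
        # then a nonlinearity combines the results [6]. The outputs keep
        # the original shapes ((n,) and (n, n)), so the merge can recurse.
        p = np.tanh(W @ np.concatenate([B @ a, A @ b]))   # parent vector, shape (n,)
        P = W_M @ np.vstack([A, B])                       # parent matrix, shape (n, n)
        return p, P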

Dataset and Task Description

We have chosen the first task of SemEval-2014, which comprises two subtasks: semantic relatedness prediction and semantic entailment prediction. Both are performed over the SICK (Sentences Involving Compositional Knowledge) dataset, designed specifically for this challenge. The dataset consists of a little under 10,000 sentence pairs, each hand-labeled with a semantic similarity score (on a scale of 1 to 5) and the nature of the semantic entailment (entailment, contradiction, or neutral) between them.
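
As an illustration, a minimal loader for the SICK file is sketched below, assuming the tab-separated layout of the official release (columns pair_ID, sentence_A, sentence_B, relatedness_score, entailment_judgment):

    import csv

    def load_sick(path):
        # Read SICK pairs with their gold similarity scores and entailment labels.
        pairs = []
        with open(path) as f:
            for row in csv.DictReader(f, delimiter='\t'):
                pairs.append((row['sentence_A'],
                              row['sentence_B'],
                              float(row['relatedness_score']),
                              row['entailment_judgment']))
        return pairs

    # e.g. pairs = load_sick('SICK.txt')  -- path to the SICK release file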

Approach

We took a two-step approach to each subtask:

1. Obtain the vectors representing the semantics of all sentence pairs (training, validation, and test sets) using our chosen CDSM.
2. Apply appropriate regression and classification techniques to predict the semantic similarity score and the semantic entailment relationship, respectively.

Obtaining Sentence Semantics Vectors

Socher et al. provide code for this model, written to solve the problem of classifying relations between words in a sentence [6]. We have suitably modified this code to obtain the required sentence semantics vectors. The implementation uses the Stanford Parser [2] to obtain the required parse trees.
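
A sketch of the resulting bottom-up pass is given below; `Tree` and `lexicon` are hypothetical stand-ins for a binarized Stanford Parser tree and the word-to-(vector, matrix) table of the underlying DSM, and `compose` is the merge rule sketched earlier:

    def sentence_semantics(tree, lexicon):
        # Return the (vector, matrix) pair for a parse (sub)tree, bottom up.
        if tree.is_leaf():
            return lexicon[tree.word]               # (vector, matrix) for a single word
        a, A = sentence_semantics(tree.children[0], lexicon)
        b, B = sentence_semantics(tree.children[1], lexicon)
        return compose(a, A, b, B)                  # merge the two children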

Predicting Semantic Similarity Scores

Regression techniques are used to estimate the relatedness score for the sentence pairs in the test set. The regression model was trained on samples from the training set, whose sentence semantics vectors are calculated as above and whose similarity scores are known. We have explored two techniques: logistic regression and neural networks.

Predicting Semantic Entailment Relationship

Classification techniques are used to predict the nature of the semantic entailment relationship between the sentence pairs in the test set. Once again, we train the classification model using labeled samples from the training set. We have explored neural networks for this purpose.

Results

Semantic Similarity

Logistic Regression

For logistic regression we divided the data into the following two parts:
1. Training set: 9427 samples
2. Test set: 500 samples
The hypothesis function was fit without regularisation. The mean absolute difference between the actual and predicted semantic similarity scores over the test set was 2.96.
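
As a rough sketch of this setup (scikit-learn stands in for our implementation; `vecs_a`, `vecs_b`, and `y` are hypothetical names for the pair's two sentence vectors and the gold scores, and rounding the 1-5 scores to integer classes is our assumption, since logistic regression needs discrete targets):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.hstack([vecs_a, vecs_b])            # one row per pair, shape (num_pairs, 2n)
    X_train, y_train = X[:9427], y[:9427]      # 9427 training samples
    X_test,  y_test  = X[9427:], y[9427:]      # 500 test samples

    clf = LogisticRegression(penalty=None, max_iter=1000)   # no regularisation, as in the report
    clf.fit(X_train, np.rint(y_train).astype(int))          # scores rounded to classes (assumption)

    mae = np.mean(np.abs(clf.predict(X_test) - y_test))     # mean absolute difference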

Neural Networks

For neural networks the data was divided into the following three parts:
1. Training set: 7070 samples
2. Test set: 500 samples
3. Validation set: 1885 samples
We kept the test set fixed and chose the validation and training sets at random from the remaining samples. We used a neural network consisting of one hidden layer of 200 neurons. The weights of the network converged after 15 iterations, after which no further reduction in error was observed on any of the three sets.

The mean absolute difference between the actual and predicted semantic similarity scores over the test set was 0.71.
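
A comparable sketch with a one-hidden-layer network (scikit-learn's MLPRegressor as a stand-in for our implementation; the 200-unit hidden layer matches the report, while early stopping on a held-out split is our substitute for the explicit validation set):

    from sklearn.neural_network import MLPRegressor

    # One hidden layer of 200 neurons. early_stopping holds out a random
    # validation split and stops once its error stops improving, loosely
    # mirroring the fixed-test / random-validation design above.
    net = MLPRegressor(hidden_layer_sizes=(200,),
                       early_stopping=True, validation_fraction=0.21)
    net.fit(X_train, y_train)                             # real-valued scores, no rounding
    mae = np.mean(np.abs(net.predict(X_test) - y_test))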

Semantic Entailment

Semantic entailment relationship prediction over the test set is carried out using a trained neural network. The division of the data into training, validation, and test sets is done as discussed above. The neural network uses a single hidden layer of 700 neurons.
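
A matching sketch for the classifier (again scikit-learn as a stand-in; `labels_train` is a hypothetical name for the gold entailment classes of the training pairs):

    from sklearn.neural_network import MLPClassifier

    # Single hidden layer of 700 neurons, as in the report.
    clf = MLPClassifier(hidden_layer_sizes=(700,), early_stopping=True)
    clf.fit(X_train, labels_train)        # 'ENTAILMENT' / 'CONTRADICTION' / 'NEUTRAL'
    predicted = clf.predict(X_test)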

Code and Other Resources

References

1. Semeval’14 task 1: http://alt.qcri.org/semeval2014/task1/
2. Stanford natural language processing parser: http://nlp.stanford.edu/software/lex-parser.shtml.
3. Edward Grefenstette and Mehrnoosh Sadrzadeh. Experimental support for a categorical compositional distributional model of meaning. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011.
4. Zellig Harris. Distributional structure. Word, 10(2-3):146–162, 1954.
5. Jeff Mitchell and Mirella Lapata. Composition in distributional models of semantics. Cognitive Science, 34(8):1388–1429, 2010.
6. Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2012.