Seminar by Mohit Bansal

Web-scale Features for Disambiguation in Full-scale Syntactic Parsing

Mohit Bansal
Univ. of California, Berkeley

    Date:    Wednesday, December 26th, 2012
    Time:    12 noon
    Venue:   CS101

Abstract:

A major current trend in natural language processing (NLP) is to employ as much data as possible. The small amount of training data available for a specific NLP task is usually insufficient, and producing it requires expert manual annotation. NLP tasks would therefore benefit greatly from exploiting a 'free' resource like the Web, which is the largest dataset available to us and represents vast amounts of world knowledge. The challenge in this direction is to extract the right kind of information from this huge pool of noisy data to help make deep linguistic decisions.

In this work, we use Web information as powerful syntactic cues and improve on state-of-the-art natural language parsers. We compute features (properties) for the competing collections of words in the ambiguity decision at hand. These features encode both surface evidence of lexical affinities and paraphrase-based cues to syntactic structure, and they help disambiguate various head-modifier relations, for example the tricky prepositional phrase (PP) attachment and noun compound bracketing ambiguities. Our Web features address the full range of syntactic attachment ambiguities and are integrated into both full-scale dependency and constituent parsers. We show relative error reductions of 7.0% over the second-order dependency parser of McDonald and Pereira (2006), 9.2% over the latent-annotation constituent parser of Petrov et al. (2006), and 3.4% over a non-local constituent reranker.
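
As a rough illustration of the surface affinity cue described above, the following Python sketch compares two candidate heads for a PP attachment using co-occurrence counts. It is not the authors' implementation: the count table, the function names (affinity_bin, prefer_attachment), and the log-count bucketing are all assumptions made for this example, and the counts are invented toy values rather than real Web statistics.

```python
import math

# Hypothetical toy counts standing in for Web-scale n-gram counts.
# These numbers are invented for illustration only.
TOY_COUNTS = {
    ("ate", "with"): 1200,
    ("spaghetti", "with"): 300,
}


def affinity_bin(head, arg, counts, bin_width=1.0):
    """Return a coarse bucket of the natural-log count for (head, arg).

    Returns None when the pair has no count at all.
    """
    c = counts.get((head, arg), 0)
    if c == 0:
        return None
    return int(math.log(c) // bin_width)


def prefer_attachment(candidates, arg, counts):
    """Pick the candidate head whose pair with arg has the highest count."""
    return max(candidates, key=lambda h: counts.get((h, arg), 0))


# "ate spaghetti with a fork": does "with" attach to the verb or the noun?
for head in ("ate", "spaghetti"):
    print(head, affinity_bin(head, "with", TOY_COUNTS))
print(prefer_attachment(["ate", "spaghetti"], "with", TOY_COUNTS))
```

In a full parser, a bucketed value like this would more plausibly serve as one feature among many that the model learns to weight, rather than as a hard decision rule like prefer_attachment.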

This is joint work with Dan Klein and appeared in ACL 2011.
