Seminar by Uma Sawant

Uma Sawant
IIT Bombay, India

    Date:    Thursday, August 21st, 2014
    Time:    5:00 PM
    Venue:   CS101.

Abstract:

Entities such as people, locations, cars, phones and other objects surround us and dominate our search experiences. In this talk I will describe my work in query interpretation and response ranking for an entity-aware search engine, where the response can be in terms of documents, answer entities or query-related entities.

In the first part, I will talk about query interpretation and ranking using an entity-annotated corpus and a structured knowledge base. Much current work focuses on formal interpretation of natural language questions, with the goal of executing the resulting structured queries on knowledge graphs (KGs) such as Freebase. In our work we address two limitations of this approach when applied to open-domain, entity-oriented Web queries. First, Web queries are rarely well-formed questions. They are “telegraphic”, with missing verbs, prepositions, clauses, case and phrase clues. Second, the KG is always incomplete and cannot directly answer many queries. We propose a novel technique to segment a telegraphic query and assign a coarse-grained purpose to each segment: a base entity, a relation type, a target entity type, or contextual words. Query segmentation is integrated with the KG and with an unstructured corpus in which mentions of entities have been linked to the KG. Extensive experiments on the ClueWeb corpus and Freebase, using over a thousand telegraphic queries adapted from TREC, INEX, and Web-Questions, show the efficacy of our approach. For one benchmark, MAP improves from 0.2–0.29 (competitive baselines) to 0.42 (our system), and NDCG@10 improves from 0.29–0.36 to 0.54.
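To make the segmentation idea concrete, here is a minimal sketch of how candidate interpretations of a telegraphic query could be enumerated. It is an illustration only and not the system described in the talk: the function names, the sample query, and the rule that each non-context purpose is used at most once are assumptions, and the scoring of candidates against the KG and corpus is omitted entirely.

```python
from itertools import combinations, product

# The four coarse-grained purposes named in the abstract.
ROLES = ("base_entity", "relation_type", "target_type", "context")

def contiguous_splits(tokens):
    """Yield every way to cut the token list into contiguous segments."""
    n = len(tokens)
    for k in range(n):  # k = number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [" ".join(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]

def candidate_interpretations(query):
    """Yield lists of (segment, role) pairs for a query.

    Assumption (hypothetical): each non-context role labels at most one
    segment; any number of segments may be contextual words.
    """
    tokens = query.split()
    for segments in contiguous_splits(tokens):
        for roles in product(ROLES, repeat=len(segments)):
            if all(roles.count(r) <= 1 for r in ROLES[:3]):
                yield list(zip(segments, roles))

if __name__ == "__main__":
    # Hypothetical telegraphic query, used only for illustration.
    for interpretation in candidate_interpretations("washington first governor"):
        print(interpretation)
```

In a full system each candidate would be scored using evidence from the KG and the entity-annotated corpus; this sketch only shows the space of segmentations such a scorer would rank.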

In the second part, I will talk about improving document search results using an entity-annotated corpus and KGs. Entity linking for documents potentially establishes new paths that bridge queries, through the KG, to documents. If keyword queries could be magically augmented with the entities mentioned in known relevant documents, ranking accuracy would increase dramatically. The challenge is to guess such entities from the query alone, using its text and any entities linked to it. We propose expansion in the KG around entities mentioned in or otherwise related to the query, to find neighborhood entities mentioned in corpus documents.
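The neighborhood expansion described above can be sketched as a bounded breadth-first walk over the KG. This is a simplified illustration under stated assumptions, not the method presented in the talk: the KG is a plain adjacency dictionary, the entity names are made up, and `max_hops` and `expand_entities` are hypothetical names introduced here.

```python
from collections import deque

def expand_entities(kg, seeds, corpus_entities, max_hops=1):
    """Return KG neighbors of the seed entities that are mentioned in the corpus.

    kg: dict mapping an entity to an iterable of neighboring entities (assumed format).
    seeds: entities linked to the query.
    corpus_entities: set of entities that have mentions in corpus documents.
    max_hops: how far from the seeds to walk (hypothetical parameter).
    """
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    found = []
    while frontier:
        entity, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in kg.get(entity, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            frontier.append((neighbor, depth + 1))
            if neighbor in corpus_entities:
                found.append(neighbor)
    return found

if __name__ == "__main__":
    # Toy graph with made-up edges, for illustration only.
    toy_kg = {
        "Barack_Obama": ["Michelle_Obama", "Honolulu"],
        "Honolulu": ["Hawaii"],
    }
    mentioned = {"Honolulu", "Hawaii"}
    print(expand_entities(toy_kg, ["Barack_Obama"], mentioned, max_hops=2))
```

The entities returned this way could then be added to the keyword query as expansion terms; how they are weighted against the original query text is left open here.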

About the speaker:

Uma has been a Ph.D. candidate in the Computer Science and Engineering department of IIT Bombay since Autumn 2011, through a joint co-op program between IITB and Yahoo! Labs. Her main advisor is Prof. Soumen Chakrabarti from IITB, and her co-advisors are Dr. Peter Mika and Dr. Roi Blanco from Yahoo! Labs, Barcelona. She is interested in data mining and machine learning. Prior to joining the Ph.D. program, she worked as a research engineer at Yahoo! Labs, Bangalore.
