Seminar by Dr. Sanjeev Khudanpur
Statistical Language Modeling: An Introduction
Dr. Sanjeev Khudanpur
Department of Electrical and Computer Engineering
and
Department of Computer Science
Johns Hopkins University
Baltimore, Maryland, USA
Date: Thursday, January 15, 2004
Time: 2:15 PM
Venue: CS-101
Abstract
A viewpoint that treats natural language as a stochastic process and uses statistical models for tasks such as automatic speech recognition, machine translation, information retrieval, handwriting recognition, and spelling correction has met with remarkable success over the last two and a half decades. This talk will first outline the statistical frameworks used in these tasks, collectively called human language technologies. A probabilistic generative model of natural language, called a language model, plays a key role in all these technologies. The talk will review the state of the art in language modeling. Some shortcomings of current techniques will be pointed out, namely the excessive abstraction of natural language as a symbol sequence and the consequent failure to model syntactic structure and meaning. Recent efforts to address these shortcomings will be described.
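As a rough illustration of what "a probabilistic generative model of natural language" can mean, the sketch below estimates a bigram language model from a toy corpus and scores a word sequence. It is only an assumed example, not the speaker's method: the function names, the toy corpus, and the choice of add-one smoothing are all hypothetical and chosen for brevity.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences.

    Each sentence is padded with <s> and </s> boundary markers.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded[1:])  # every symbol that can be predicted
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def sentence_probability(tokens, context_counts, bigram_counts, vocab):
    """Approximate P(w_1 ... w_n) as a product of P(w_i | w_{i-1}).

    Add-one smoothing (an assumption of this sketch) keeps unseen
    bigrams from receiving zero probability.
    """
    padded = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + len(vocab)
        prob *= numerator / denominator
    return prob


# Hypothetical toy corpus, for illustration only.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram(corpus)
print(sentence_probability(["the", "cat", "sat"], *model))
print(sentence_probability(["sat", "cat", "the"], *model))
```

The model assigns a higher probability to the word order it has seen than to a scrambled one, yet it treats the sentence purely as a symbol sequence: nothing in it represents syntactic structure or meaning, which is the kind of shortcoming the abstract refers to.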
About the Speaker
Sanjeev Khudanpur is an Assistant Professor of Electrical and Computer Engineering and of Computer Science, and a member of the Center for Language and Speech Processing at the Johns Hopkins University in Baltimore, Maryland (USA). He received his B.Tech. degree from IIT Bombay in 1988 and his Ph.D. from the University of Maryland in 1997. His research interests are in the application of information-theoretic ideas to human language technology, including automatic speech recognition, machine translation, information retrieval, and natural language processing. He is a member of the IEEE, the Association for Computational Linguistics (ACL), and the ACM.