Course Information
Instructor
Prerequisites
ESO 211 Data Structures. Some familiarity with probability would help.
Course Objective
NLP attempts to interact with humans and human texts via language.
Problems in the domain include analyzing texts to discover structures and
to make decisions, translating from one language to another, and interacting
with humans in dialogue systems or cooperative tasks. The course places
particular emphasis on Indian languages.
For languages with relatively few tagged resources, the key issues are how to
strengthen unsupervised and semi-supervised methods for analysis, and how to
discover structures via parallel corpora.
References
- Jurafsky, Daniel; James H. Martin; Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Pearson Education India, 2000.
- Manning, Christopher D.; Hinrich Schuetze; Foundations of Statistical Natural Language Processing. Cambridge, MIT Press, 1999.
- Kiraz, George Anton; Computational Nonlinear Morphology: With Emphasis on Semitic Languages. Cambridge University Press, 2001. 171 pages.
Additional Readings: Oxford Handbook of Computational Linguistics.
Also: several video lectures will be prescribed.
Projects
An important part of this course will be a course project. Each of you is
expected to select a project in which you will investigate some topic of
current research interest, and to be able to communicate the key ideas of
your project to others in the course.
Owing to the high project weightage, project groups will be formed
by lottery within the first two weeks of class. Subsequently,
project topics will be defined based on discussions with each group.
Grading Scheme
- Two written exams (two hours each): 2 x 20%
- Course discussions, Homework and Labs: 15-20%
- Final Project: 40-50%
(Approx: Proposal: 5%, Presentation: 10%, Report: 15%, Demo/Oral: 15%)
Course Topics
Lecs 1-2: Introduction and Overview
- Language Structures and Levels: Sounds / Words / Sentences / Discourse
- Objectives of NLP
- Morphological processing
- Syntactic analysis: parsing
- Regular Expressions, demonstrations of use on corpus
- Manning / Schuetze: Ch 1. Rule-based (rationalist) vs probabilistic (empiricist) history
Lecs 3-4: Morphological processing
- Rule-based: Porter stemmer
- Machine Learning: Unsupervised approaches
- HW1: Unsupervised Morphological Processing / Parallel Corpora
- [Selection of Papers for Review]
- NOTE: All Homeworks will be for a non-English language.
Lecs 5-6: Part of Speech Tagging
- Supervised Learning / SVM
- Hidden Markov Models
- Unsupervised POS tagging
- [Sub-groups to review Papers on subtopics]
Lec 7: Probability and Information Theory; Naive Bayes models
- HW2: Spell checker
Lecs 8-9: Grammars
- CFG grammars: rule-based parsing difficulties
- Alternative: Probabilistic grammars
- Discovering grammars from patterns in text
- HW3: Unsupervised Syntax discovery
- [*Special Session: Project Proposal Presentations] [* = extra session]
Lecs 10-11: Semantic modeling
- Classical ontology-driven approaches
- Latent Semantic Analysis
- Linking Language to Vision / Robotics
Lec 12: Word discovery from real situations
- Aligning unsupervised syntax with sensory structures
Lec 13: Machine Translation
- Acquiring structures from Parallel Corpora
Lec 14: Spatial Language and Semantics
Lecs 15-19: Interim Project Presentations
Lec END: NLP and the rest of AI. Turing test.
- [*Final Poster Presentations]