CS671: Natural Language Processing

Department of Computer Science & Engineering, IIT Kanpur

Jul - Nov 2015


Course Information

(This is from the 2013 NLP course page; this part will be updated soon.)



Prerequisites

ESO 211 (Data Structures). Some familiarity with probability would help.

Course Objective

NLP aims to build systems that interact with humans, and with human-written texts, through language. Problems in the domain include analyzing texts to discover structure and make decisions, translating from one language to another, and interacting with humans in dialogue systems or cooperative tasks. The course places particular emphasis on Indian languages.

For languages with relatively few tagged resources, the key issues are how to strengthen unsupervised and semi-supervised methods of analysis, and how to discover linguistic structure from parallel corpora.


Textbooks and References

  • Jurafsky, Daniel; Martin, James H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Pearson Education India, 2000.
  • Manning, Christopher D.; Schütze, Hinrich. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999.
  • Kiraz, George Anton. Computational Nonlinear Morphology: With Emphasis on Semitic Languages. Cambridge University Press, 2001, 171 pages.
  • Additional readings: The Oxford Handbook of Computational Linguistics.
    Several video lectures will also be prescribed.


Course Project

A course project is an important part of this course. Through your project you will investigate a topic of current research interest, and each of you is expected to be able to communicate the key ideas of the project to others in the course.

Owing to the high project weightage, project groups will be formed by lottery within the first two weeks of class. Subsequently, project topics will be defined based on discussions with each group.

Grading Scheme

  • Two written exams (two hours each): 2 x 20%
  • Course discussions, homework and labs: 15-20%
  • Final project: 40-50%
    (approximately: proposal 5%, presentation 10%, report 15%, demo/oral 15%)
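Because several weights are given as ranges, the scheme does not fix a single total by itself. One reading that sums to exactly 100% takes the project at the sum of its approximate parts (5 + 10 + 15 + 15 = 45%) and course discussions, homework and labs at 15%, the low end of their range. The sketch below encodes that assumed reading to show how a weighted total would be computed; the component names and the 15% homework figure are illustrative assumptions, not the official weights.

```python
# ASSUMED weights in percent: exams and project parts follow the scheme above;
# homework/labs is taken as 15% (low end of 15-20%) so that the total is 100.
WEIGHTS = {
    "exam_1": 20,
    "exam_2": 20,
    "homework_labs": 15,  # assumption, not an official figure
    "project_proposal": 5,
    "project_presentation": 10,
    "project_report": 15,
    "project_demo_oral": 15,
}


def final_percentage(scores: dict[str, float]) -> float:
    """Weighted total, given each component score as a percentage (0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


if __name__ == "__main__":
    project_share = sum(w for name, w in WEIGHTS.items() if name.startswith("project_"))
    print("project share:", project_share)         # 45, inside the 40-50% range
    print("total weight:", sum(WEIGHTS.values()))  # 100
    print("all 80s ->", final_percentage(dict.fromkeys(WEIGHTS, 80.0)))  # 80.0
```

Dividing by the summed weights keeps the function meaningful even if the assumed homework weight is changed to another value in its range.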

Course Topics

    Lecs 1-2
        Introduction and Overview: Language Structures and Levels
        - Sounds / Words / Sentences / Discourse
        Objectives of NLP

        Morphological processing
        Syntactic analysis - parsing. 

        Regular expressions: demonstrations of use on a corpus.

        Manning / Schütze, Ch. 1: history of rule-based (rationalist) vs.
        probabilistic (empiricist) approaches

    Lecs 3-4
        Morphological processing
        Rule-based: Porter stemmer
        Machine learning: unsupervised approaches
        HW1: Unsupervised Morphological Processing / Parallel Corpora
        [Selection of Papers for Review]

        NOTE: All homework assignments will be for a non-English language.

    Lecs 5-6
        Part of Speech Tagging 
        Supervised Learning / SVM
        Hidden Markov Models 
        Unsupervised POS tagging

    [Sub-groups to review Papers on subtopics]

    Lec 7
        Probability and Information Theory; Naive Bayes models
        HW2: Spell checker

    Lecs 8-9
        Grammars: context-free grammars (CFGs); difficulties of rule-based parsing
        Alternative: probabilistic grammars

        Discovering grammars from patterns in text
        HW3: Unsupervised Syntax discovery

    [*Special Session: Project Proposal Presentations]
                    [* = extra session]

    Lecs 10-11
        Semantic modeling
        Classical ontology-driven approaches
        Latent Semantic Analysis
        Linking Language to Vision / Robotics
    Lec 12  Word discovery from real situations 
            Aligning unsupervised syntax with sensory structures

    Lec 13  Machine Translation 
            Acquiring structures from Parallel Corpora

    Lec 14  Spatial Language and Semantics

    Lecs 15-19: Interim Project Presentations. 

    Lec END  NLP and the rest of AI.  Turing test.  

    [*Final Poster Presentations]