CS671: Natural Language Processing
Assignment: Homework 1 [Finding Syllables in Indic Languages]




Personal Details

Name: PRABUDDHA CHAKRABORTY
Roll Number: 15111027
1st year, M.Tech (Computer Science and Engineering)
IIT Kanpur



Note: Algorithm 2 for finding Bengali syllables is partially based on the algorithm presented in the paper "Rule Based Grapheme to Phoneme Mapping for Hindi Speech Synthesis" by Monojit Choudhury.


Code

Note: Please run the code using Python 3.

Bengali


Bengali: Corpus

Bengali: Letter Frequency Plot
Bengali: List of top letters and bigrams
Bengali: Word Frequency Plot
Bengali: List of top words and bigrams

Using Algorithm 1

    Bengali: Syllable Frequency Plot
    Bengali: List of top syllables and bigrams

Using Algorithm 2

    Bengali: Syllable Frequency Plot
    Bengali: List of top syllables and bigrams

Latin Devanagari


Latin Devanagari: Corpus

Latin Devanagari: Syllable Frequency Plot
Latin Devanagari: List of top syllables and bigrams
Latin Devanagari: Letter Frequency Plot
Latin Devanagari: List of top letters and bigrams
Latin Devanagari: Word Frequency Plot
Latin Devanagari: List of top words and bigrams

English


English: Corpus

English: Syllable Frequency Plot
English: List of top syllables and bigrams
English: Letter Frequency Plot
English: List of top letters and bigrams
English: Word Frequency Plot
English: List of top words and bigrams

Bengali corpus created by me

Bengali: Corpus