CS671: Natural Language Processing

Semester I, 2015-16
Ekansh Gupta, 12252
egupta@iitk.ac.in


Finding Syllables in Indic languages

Language 1 - Hindi in Devanagari script
Corpus - Munshi Premchand's novel Karmabhoomi (कर्मभूमि). File size - 1.5 MB
Language 2 - Sanskrit in Latin Extended script
Corpus - Bhagavad Gita. File size - 90 KB
If either corpus does not display correctly, change your browser's text encoding.

Read the instructions before executing syllable.py and syllable_bigram.py.
The full list of the top 1000 syllables is provided separately.



Top syllables in Language 1 (Devanagari Hindi)
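
A ranking like this one could be computed roughly as follows. This is a minimal sketch, not the contents of syllable.py: it assumes that a "syllable" is an orthographic akshara (an optional consonant cluster joined by viramas, followed by an optional vowel sign and modifiers), and the names AKSHARA, syllabify and top_syllables are hypothetical.

```python
import re
from collections import Counter

# Devanagari code point classes (assumed akshara-based definition of a syllable).
CONS = "[\u0915-\u0939\u0958-\u095F]\u093C?"   # consonant, optional nukta
VOWEL = "[\u0904-\u0914\u0960\u0961]"          # independent vowels
MATRA = "[\u093E-\u094C]"                      # dependent vowel signs
VIRAMA = "\u094D"                              # halant
MODS = "[\u0901-\u0903]*"                      # candrabindu, anusvara, visarga

AKSHARA = re.compile(
    # consonant cluster + optional vowel sign (or a final halant) + modifiers
    "(?:" + CONS + VIRAMA + ")*" + CONS + "(?:" + MATRA + "|" + VIRAMA + ")?" + MODS
    # or an independent vowel + modifiers
    + "|" + VOWEL + MODS
)


def syllabify(text):
    """Return the aksharas found in text, in order of appearance."""
    return AKSHARA.findall(text)


def top_syllables(text, n=10):
    """Return the n most frequent aksharas as (syllable, count) pairs."""
    return Counter(syllabify(text)).most_common(n)
```

Under this definition, कर्मभूमि splits into क, र्म, भू and मि. A phonological syllabifier would instead give kar-ma-bhū-mi, so the counts depend on which definition is chosen.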




Top syllable bigrams in Language 1 (Devanagari Hindi)
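
Bigram counts could be obtained from already-syllabified words along these lines. This is a sketch under the assumption that bigrams are counted only within a word and never across word boundaries; syllable_bigram.py may do otherwise. The function name count_bigrams is hypothetical.

```python
from collections import Counter


def count_bigrams(syllabified_words):
    """Count adjacent syllable pairs inside each word.

    syllabified_words is an iterable of lists, one list of syllables per word.
    Pairs are never formed across word boundaries (an assumption of this sketch).
    """
    counts = Counter()
    for syllables in syllabified_words:
        # zip pairs each syllable with its successor in the same word
        counts.update(zip(syllables, syllables[1:]))
    return counts
```

Calling count_bigrams(words).most_common(10) then gives the ten most frequent pairs.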





Top syllables in Language 2 (Latin transliteration of Sanskrit)
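
For romanized Sanskrit, a comparable split could look like the sketch below. It assumes the transliteration follows IAST-style conventions (precomposed letters such as ā, ṛ, ṃ, ḥ), treats "ai" and "au" as single vowels, and attaches word-final consonants to the last unit. The names UNIT and syllabify_iast are hypothetical, and this is not necessarily how the author's script segments the text.

```python
import re
import unicodedata

# a ā i ī u ū ṛ ṝ ḷ ḹ e o, written as escapes so the pattern is encoding-safe
SIMPLE_VOWELS = r"a\u0101i\u012bu\u016b\u1e5b\u1e5d\u1e37\u1e39eo"
VOWEL = r"(?:ai|au|[" + SIMPLE_VOWELS + r"])"
CONS = r"[^\W\d_" + SIMPLE_VOWELS + r"]"   # any letter that is not a vowel
CODA = r"[\u1e43\u1e25]?"                  # optional anusvara (ṃ) or visarga (ḥ)

# onset consonants + vowel + optional ṃ/ḥ + consonants only if they end the word
UNIT = re.compile(CONS + r"*" + VOWEL + CODA + r"(?:" + CONS + r"+(?!\w))?")


def syllabify_iast(text):
    """Split romanized text into akshara-like units (a rough approximation)."""
    # NFC makes letters with diacritics single code points; lower() folds case
    normalized = unicodedata.normalize("NFC", text).lower()
    return UNIT.findall(normalized)
```

With this pattern, dharmakṣetre is split into dha, rma, kṣe and tre. The resulting units can be counted with collections.Counter in the same way as for Language 1.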




Top syllable bigrams in Language 2 (Latin transliteration of Sanskrit)





Log frequency plot of top syllables in Language 1 (Devanagari Hindi)
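
The data behind a plot of this kind could be prepared as sketched below. The choice of base-10 logarithms and of plotting against plain rank (rather than log rank) is an assumption, as is the function name log_frequency_points.

```python
import math
from collections import Counter


def log_frequency_points(counts, top_n=1000):
    """Return (rank, log10(frequency)) pairs for the top_n most frequent items.

    counts is a collections.Counter mapping each syllable (or bigram) to its
    frequency; rank 1 is the most frequent item.
    """
    ranked = counts.most_common(top_n)
    return [(rank, math.log10(freq))
            for rank, (_, freq) in enumerate(ranked, start=1)]
```

The resulting pairs can be passed to any plotting library, with rank on the horizontal axis and log frequency on the vertical axis.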



Log frequency plot of top syllable bigrams in Language 1 (Devanagari Hindi)




Log frequency plot of top syllables in Language 2 (Latin transliteration of Sanskrit)



Log frequency plot of top syllable bigrams in Language 2 (Latin transliteration of Sanskrit)