A.1.1 - The Odia corpus was compiled by hand from paragraphs of randomly chosen articles on or.wikipedia.org.
A.1.2 - List of top syllable unigrams for Odia can be found here.
A.1.3 - List of top syllable bigrams for Odia can be found here.
A.2.1 - The Hindi corpus was provided with the assignment problem and consists of random articles from hi.wikipedia.org.
A.2.2 - List of top syllable unigrams for Hindi can be found here.
A.2.3 - List of top syllable bigrams for Hindi can be found here.
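As a rough illustration of how such top-unigram and top-bigram lists could be produced, the sketch below counts syllable n-grams with `collections.Counter`. It assumes the text has already been split into syllables. The function name `top_ngrams` and the romanized toy input are hypothetical, and this is not necessarily how the code in ./code/ works.

```python
from collections import Counter

def top_ngrams(syllables, n=1, k=10):
    """Return the k most frequent syllable n-grams as (ngram, count) pairs."""
    # Slide a window of length n over the syllable sequence.
    ngrams = zip(*(syllables[i:] for i in range(n)))
    return Counter(ngrams).most_common(k)

# Hypothetical, already-syllabified toy input (romanized for readability).
sample = ["ka", "ma", "la", "ka", "ma", "ra", "ka", "ma"]

print(top_ngrams(sample, n=1, k=2))  # two most frequent unigrams
print(top_ngrams(sample, n=2, k=1))  # most frequent bigram
```

Here bigrams are taken over the whole syllable sequence. Counting them only within word boundaries would be an equally reasonable choice and would need the input grouped by word.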
Number of syllables = 109035
Number of distinct syllables = 2995
B.1.1 Log-Frequency vs Rank plot for Odia dataset
Number of syllables = 26307
Number of distinct syllables = 2083
B.2.1 Log-Frequency vs Rank plot for Hindi dataset
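The data behind a log-frequency vs rank plot, together with the total and distinct syllable counts, could be derived along the following lines. This is a minimal sketch under assumptions: the input is an already-syllabified list, the logarithm is base 10 (the report does not state the base), and `rank_log_frequency` is a hypothetical helper name.

```python
import math
from collections import Counter

def rank_log_frequency(syllables):
    """Return (rank, log10(frequency)) pairs; rank 1 is the most frequent syllable."""
    # Sort the per-syllable counts from most to least frequent.
    counts = sorted(Counter(syllables).values(), reverse=True)
    return [(rank, math.log10(freq)) for rank, freq in enumerate(counts, start=1)]

# Hypothetical, already-syllabified toy input (romanized for readability).
sample = ["ka", "ma", "la", "ka", "ma", "ra", "ka", "ma", "ka", "ka"]

points = rank_log_frequency(sample)
print("Number of syllables =", len(sample))
print("Number of distinct syllables =", len(set(sample)))
for rank, log_freq in points:
    print(rank, round(log_freq, 3))
```

The resulting pairs can be passed to any plotting library, with rank on the x-axis and log-frequency on the y-axis.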
C.1.1 - The hand-compiled Odia dataset can be found here.
C.2.1 - The dataset provided with the assignment problem (hwiki.txt) was used for Hindi in Devanagari script.
The code for the assignment can be found in the directory ./code/.
[1] Choudhury, Monojit. "Rule-based grapheme to phoneme mapping for Hindi speech synthesis." 90th Indian Science Congress of the International Speech Communication Association (ISCA), Bangalore, India. 2003.