Seminar by Prof. Rajat Moona

Technology Development for the Indian Languages at IIT Kanpur

Rajat Moona
Department of Computer Science and Engineering
Indian Institute of Technology Kanpur
Date: July 31, 2001
Time: 03:30 PM
Venue: CC-217
Tea/snacks will be served at 03:15 PM.

Abstract

At IIT Kanpur we have several projects on technology development for the Indian Languages. Some of these center on the development of web sites with Indian Language content, a translation system, and Indian Language technologies for Linux. In this discussion meeting, we shall focus on the IIT Kanpur effort on Linux-based Indian Language technologies. Efforts to internationalize scripts for computers have been going on for quite some time. Many of the world's scripts are linear in nature: documents are written with a small set of shapes, cascaded one after another in a single direction. English is a prime example, and most European languages fall into this category. By contrast, most Indian Languages are phonetic in nature, and their scripts convey the sounds of the language much more precisely. As a result, a large number of display shapes are needed to write these scripts.

With a standard character coding such as ASCII, it is not possible to encode all the display shapes in a linear manner, mainly because the coding space is limited. The Indian Standard Code for Information Interchange (ISCII) was defined and adopted as an Indian Standard (IS) for the representation of documents in Indian scripts and languages. Unfortunately, it breaks the strong link between the fonts and the coding standard of the characters: a character code no longer corresponds to exactly one display shape in a font.

We developed some methodologies for the entry and display of Indian Language text on the Unix-based X Window System. One such software component is Iterm, which was developed in 1996 and is one of the few of its kind that allow standard Unix text-based programs (vi, cat, etc.) to show output in Devanagari.

We are now developing library-based support for the Indian Languages, which will permit other applications, including web browsers, to display text in Indian Languages. Currently, web content is developed using non-standard font-based codings, as that is the only way browsers can display the text.

Some other technologies that we are developing are printing utilities for Indian Script texts, spell-checking programs that can handle "sandhi", which is very common in Indian Languages, and browser add-ons for handling the Indian Languages. We shall discuss our approach and techniques in this forum.

About the Speaker

Rajat Moona is a faculty member in the CSE department at IIT Kanpur.
