Converting Legacy Indic Content to Standards (Unicode)

When content is created and distributed in English, computers everywhere agree on how its characters are represented. This shared convention is the ASCII encoding, and we take it for granted when we browse websites or send emails. For Indian languages, the corresponding standard is either ISCII or Unicode. However, many Indian language websites and applications still use proprietary encodings developed by various Indian language software makers in years past.

This severely limits what technology can do. For example, a document produced with a particular legacy software package cannot be read by someone else unless they also purchase the same package. Given the wide variety of such packages and their mutual incompatibility, the ability to exchange documents and to take advantage of information produced elsewhere is practically lost.

These proprietary encodings were designed with the very specific and narrow aim of displaying Indian language content on screen; for all practical purposes, text stored in them is of no more use than a sequence of images. As a result, it is difficult for a machine to understand and process Indian language text rendered using these systems. Consequently, simple and useful text processing operations such as spell checking, searching, and sorting become impossible. It is no surprise, therefore, that we have not seen any increase in the development or use of such processing software for Indian languages in the last few years. The automated converters we developed can help convert these documents to a standardised representation. The converted documents can then be processed or transmitted to anyone, without any special requirements on the recipient's side.
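
To make the idea of conversion concrete, here is a minimal sketch in Python of a table-driven converter. It is an illustration only, not the converters described above: the mapping table, the legacy codes, and the function name legacy_to_unicode are all hypothetical, and a real legacy font would need its own, much larger table. The sketch also shows one reason a plain character-for-character substitution is not enough. Legacy fonts typically store text in the order it is drawn, so the vowel sign I comes before its consonant, whereas Unicode stores it after the consonant.

```python
# Hypothetical legacy-code -> Unicode table (NOT taken from any real font).
LEGACY_TO_UNICODE = {
    "k": "\u0915",  # DEVANAGARI LETTER KA
    "m": "\u092E",  # DEVANAGARI LETTER MA
    "l": "\u0932",  # DEVANAGARI LETTER LA
    "a": "\u093E",  # DEVANAGARI VOWEL SIGN AA
    "i": "\u093F",  # DEVANAGARI VOWEL SIGN I
}

# Legacy codes that are drawn before the consonant but must follow it
# in Unicode (hypothetical; here only the vowel sign I).
PREBASE_CODES = {"i"}


def legacy_to_unicode(text: str) -> str:
    """Convert hypothetical legacy-encoded text to Unicode.

    Simplification: a held-back pre-base sign is emitted after the very
    next character, whatever it is. A real converter would also have to
    handle conjuncts, half forms, and other multi-glyph sequences.
    """
    out = []
    pending = []  # pre-base signs waiting for the character they follow
    for ch in text:
        if ch in PREBASE_CODES:
            pending.append(LEGACY_TO_UNICODE[ch])
            continue
        # Unknown codes are passed through unchanged.
        out.append(LEGACY_TO_UNICODE.get(ch, ch))
        out.extend(pending)
        pending.clear()
    out.extend(pending)  # flush a trailing sign, if any
    return "".join(out)


converted = legacy_to_unicode("ikl")
# Once in Unicode, ordinary string operations such as search work.
found = "\u0915" in converted
```

In this sketch the legacy sequence "ikl" becomes KA, VOWEL SIGN I, LA in Unicode order, after which standard string searching and sorting can operate on the text like any other Unicode string.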