Interpreting Unknown Words in Machine Translation from Hindi to English

R.M.K. Sinha (India)


Natural language processing, Unknown words, Hindi to English, Machine translation.


Natural text submitted for machine translation often contains several unknown words, that is, words for which the dictionary has no entry. These may be names, acronyms, abbreviations, technical terms, or foreign words; other words may be missing simply because the dictionary is limited in size. A machine translation system must therefore provide a mechanism for handling such unknown lexical units. In this paper we describe the strategy adopted in our system for machine-aided translation from Hindi to English. No attempt is made to expand the vocabulary by deriving the meaning of unknown words. Instead, a set of heuristics is used to identify the nature of each unknown word and to generate an appropriate form for the translation. It is common practice in India to mix English words into Hindi and vice versa. However, the grammatical rules governing the construction of gender, number, and verb nominalization or other forms conform to those of the language being used (Hindi or English), irrespective of the origin of the word.

