Natheer Khasawneh, Maisa M. Al-Khudair, and Mohammad Fraiwan
Arabic text-to-speech, corpus reduction, diphone, classification
Text-to-speech tools are gaining momentum as computer applications become pervasive. These tools are typically implemented using diphones and syllables, with a corpus (i.e., a body of knowledge) composed of pre-recorded sounds. Although pre-recording achieves high intelligibility and a more natural experience, it requires a large amount of memory to store the sounds, which in turn slows the conversion process. In this paper, we tackle the problem of reducing the memory required to store the pre-recordings of an Arabic text-to-speech system. We take a different approach: we explore building a classification model based on predefined types of news documents, and propose a scheme for constructing an Arabic corpus based on this model. Performance evaluation results show that, using our scheme, a 29% reduction of the database size incurs only a 0.57% loss of recognition correctness, while an 89% reduction lowers the correctness by only 1.29%.