Selecting the Best Feature Set for Thai Word Sense Disambiguation using Support Vector Machines

C. Nusai, Y. Suzuki, and H. Yamazaki (Japan)

Keywords

Natural language processing, machine learning, word sense disambiguation, and support vector machines

Abstract

This paper proposes a method for selecting the best feature set for Thai word sense disambiguation using the Support Vector Machine (SVM) algorithm, with a focus on Thai verb sense disambiguation. Many approaches have been employed to resolve word sense ambiguity with a reasonable degree of accuracy. Our research follows the corpus-based approach, which employs a supervised machine learning method for disambiguation. The machine learning method is able to select suitable features. To find the best feature set for resolving Thai word sense ambiguity, our method uses characteristics of the words that co-occur with the ambiguous word in sentences extracted from a Thai corpus to determine the sense of the ambiguous word. The ambiguous words are evaluated with 30 feature sets built from the “word”, “part of speech (POS)”, and “semantic concept (SM)” features. The results show that the “word & SM” feature set is the best sense indicator, with an accuracy rate of approximately 90-96%.
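
As a rough illustration of what a feature set such as “word & SM” might look like in practice, the following minimal Python sketch builds features from the tokens that co-occur with an ambiguous word. The `Token` type, the `extract_features` function, the positional encoding, the window size, and the toy English sentence are all assumptions made for illustration; the abstract does not specify how the 30 feature sets are constructed.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str  # surface form
    pos: str   # part-of-speech (POS) tag
    sm: str    # semantic concept (SM) label


def extract_features(tokens, target_index, feature_types, window=2):
    """Collect features of the tokens surrounding the ambiguous word.

    feature_types selects the feature set, e.g. ("word", "sm") for a
    "word & SM" style set. The window size is a hypothetical parameter.
    """
    features = set()
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for i in range(lo, hi):
        if i == target_index:
            continue  # the ambiguous word itself is not a context feature
        offset = i - target_index
        for ftype in feature_types:
            value = getattr(tokens[i], ftype)
            features.add(f"{ftype}[{offset:+d}]={value}")
    return features


# Toy sentence with invented tags (not data from the paper's corpus).
sentence = [
    Token("child", "NOUN", "person"),
    Token("pick", "VERB", "action"),   # the ambiguous verb
    Token("flower", "NOUN", "plant"),
]

word_sm = extract_features(sentence, 1, ("word", "sm"), window=1)
print(sorted(word_sm))
```

Feature sets of this kind could then be converted to sparse vectors and passed to an SVM classifier, with each candidate feature set compared by its held-out accuracy.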
