Arabic Text Data Mining: A Root Extractor for Dimensionality Reduction

T.M. Eldos (Jordan)

Keywords

Text Data Mining, Information Retrieval, Categorization and Indexing.

Abstract

The world has recently witnessed tremendous growth in the volume of text documents available on the Internet, in digital libraries and news sources, and on company-wide intranets. Text data mining, a multidisciplinary field involving information retrieval, text analysis, information extraction, clustering, categorization and linguistics, is becoming increasingly significant, and research efforts to retrieve this growing body of information efficiently have multiplied. In the past few years, not only have new documents been produced directly in digital form, which makes them suitable for automatic indexing, but many older documents have also been converted from their physical medium to a digital one. The meaning of a document is represented by a vector of features, each weighted according to a measure that best estimates its relevance. Text categorization presents unique challenges due to the large number of attributes in the data set, the large number of training samples, and the dependencies among attributes. This paper focuses on the dimensionality problem in multilingual text data mining.
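
The abstract does not say which relevance measure or which root extraction procedure the paper uses, so the following is only a minimal sketch under stated assumptions. It uses plain TF-IDF (term frequency times log(N / df)) as a stand-in weighting measure, and a hypothetical hand-written ROOTS table (transliterated Arabic words mapped to their triliteral roots) in place of a real root extractor. The names ROOTS, to_roots, tfidf_vectors and cosine are illustrative only. The sketch shows how mapping word forms to shared roots shrinks the feature vector and lets related documents match.

```python
import math
from collections import Counter

# Hypothetical stand-in for a root extractor: a hand-written table that maps
# transliterated Arabic surface forms to their triliteral roots. A real
# extractor would derive roots by analysing affixes and patterns instead.
ROOTS = {
    "kataba": "ktb", "kitab": "ktb", "kaatib": "ktb", "maktaba": "ktb", "kutub": "ktb",
    "darasa": "drs", "dars": "drs", "madrasa": "drs",
}


def to_roots(doc):
    """Replace each token with its root; unknown tokens are kept unchanged."""
    return [ROOTS.get(token, token) for token in doc]


def tfidf_vectors(docs):
    """Build a sorted vocabulary and one TF-IDF vector per tokenized document.

    Uses raw term frequency times log(N / df); a term found in every
    document therefore gets weight 0.
    """
    vocab = sorted({token for doc in docs for token in doc})
    n_docs = len(docs)
    doc_freq = Counter(token for doc in docs for token in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # missing terms count as 0
        vectors.append([tf[t] * math.log(n_docs / doc_freq[t]) for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    ["kataba", "kitab", "kaatib"],   # "he wrote", "book", "writer"
    ["maktaba", "kutub"],            # "library", "books"
    ["darasa", "dars", "madrasa"],   # "he studied", "lesson", "school"
]

surface_vocab, surface_vecs = tfidf_vectors(docs)
root_vocab, root_vecs = tfidf_vectors([to_roots(d) for d in docs])

print(len(surface_vocab), len(root_vocab))            # 8 2
print(round(cosine(surface_vecs[0], surface_vecs[1]), 3),
      round(cosine(root_vecs[0], root_vecs[1]), 3))   # 0.0 1.0
```

In this toy corpus the feature space shrinks from 8 surface forms to 2 roots, and the first two documents, which share no surface word, become identical in direction once their words are reduced to the root ktb. The same conflation can also merge words whose meanings differ, so the quality of the root extractor matters.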
