An Arabic Auto-indexing System for Information Retrieval

R. A. Haraty, N. Mansour, and W. Daher (Lebanon)

Keywords

Information retrieval, stem word extraction, term weight, and spread factor.

Abstract

This paper tackles the problem of auto-indexing Arabic documents. A four-layer model is proposed as a solution. The model's layers are independent of one another, which means that the proposed solution can be applied not only to Arabic but also to other languages, provided that the grammar of the language in question (here, Arabic) is taken into consideration. In addition, this paper introduces a new way to calculate the weight of a term relative to the document that contains it. Traditionally, the weight of a term has relied entirely on its frequency of occurrence (its count) in the document. The innovation is to also take into consideration how widely the term is "spread" within the document.
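
The abstract does not give the weighting formula, so the following is only an illustrative sketch of how a count-based weight could be combined with a spread measure. It assumes, purely for illustration, that spread is the fraction of equal-sized document segments in which a term occurs, and that the weight is the product of relative frequency and spread. The function name `term_weights` and the parameter `num_segments` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def term_weights(tokens, num_segments=4):
    """Illustrative weight = relative frequency * spread.

    Assumption (not from the paper): spread is the fraction of
    equal-sized segments of the document that contain the term.
    """
    if not tokens:
        return {}
    n = len(tokens)
    # Never use more segments than there are tokens.
    num_segments = max(1, min(num_segments, n))

    counts = Counter(tokens)
    segments_seen = {}
    for i, tok in enumerate(tokens):
        # Map token position i to a segment index in [0, num_segments).
        seg = i * num_segments // n
        segments_seen.setdefault(tok, set()).add(seg)

    weights = {}
    for term, count in counts.items():
        frequency = count / n
        spread = len(segments_seen[term]) / num_segments
        weights[term] = frequency * spread
    return weights


# "x" and "y" both occur twice, but "y" is spread over more of the document.
example = ["x", "x", "y", "z", "y", "w", "v", "u"]
weights = term_weights(example, num_segments=4)
```

Under these assumptions, two terms with the same count receive different weights when one is clustered in a single part of the document and the other recurs throughout it, which is the intuition the abstract describes.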
