Linguistic-based Automatic Email Classification

O. Nouali (Algeria) and P. Blache (France)


Information classification, e-mail filtering, linguistic properties, computational linguistics, neural networks, learning.


Current approaches to information filtering are based, directly or indirectly, on traditional information retrieval methods [1, 2]. They rely on the occurrence of a given set of keywords to identify possibly relevant information. In general, word-based filtering is limited because different users often use different formulations for the same concept or idea, so valuable information may be overlooked. Our approach to improving filtering quality is to separate classification from filtering: filtering takes place once the classification process is complete. This paper presents an approach to e-mail classification based on a model of linguistic features. The main characteristics of the model are its representation as a neural network and its learning capability. It is used to parse the message body and to extract linguistic features that classify or characterize an e-mail by type (personal, spam, …). With this approach, the system can filter and classify messages even when they share no words with the user's stated interests. Finally, to measure the performance of the approach, we present and discuss the results of experimental evaluations.
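
As a rough illustration of the idea of classifying on linguistic features instead of keywords, the following minimal sketch trains a single logistic unit (the simplest possible neural network) on a few surface features of the message body. Everything specific here is an assumption made for illustration only: the four features, the pronoun lists, the toy messages, and the names extract_features, train, and classify are hypothetical and are not taken from the paper, whose actual feature set and network architecture are not given in this abstract.

```python
import math
import re

FIRST_PERSON = {"i", "we", "my", "our", "me"}
SECOND_PERSON = {"you", "your"}


def extract_features(body):
    """Map a message body to hypothetical surface-level linguistic features."""
    tokens = re.findall(r"[A-Za-z']+", body)
    n = max(len(tokens), 1)
    lowered = [t.lower() for t in tokens]
    first_person = sum(t in FIRST_PERSON for t in lowered) / n
    second_person = sum(t in SECOND_PERSON for t in lowered) / n
    # Share of multi-letter tokens written entirely in capitals.
    shouting = sum(len(t) > 1 and t.isupper() for t in tokens) / n
    # Exclamation marks per character, rescaled to a comparable range.
    exclamation = 10 * body.count("!") / max(len(body), 1)
    return [first_person, second_person, shouting, exclamation]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=200, lr=1.0):
    """Fit one logistic unit by stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Toy training set (invented for this sketch): label 1 = spam, 0 = personal.
TRAINING = [
    ("WIN CASH NOW!!! Claim your FREE prize today!!!", 1),
    ("I hope we can meet for lunch on Friday, my treat.", 0),
    ("You have been selected! Your account wins BIG money!!!", 1),
    ("We missed you at dinner; I will call our parents tonight.", 0),
]

weights, bias = train(
    [extract_features(text) for text, _ in TRAINING],
    [label for _, label in TRAINING],
)


def classify(body):
    """Return 'spam' or 'personal' for a message body."""
    x = extract_features(body)
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return "spam" if score >= 0.5 else "personal"
```

Because the features describe how a message is written and not which words it contains, classify can assign a type to a message that shares no keywords with the training messages, which is the property the abstract emphasizes. A realistic system would need far richer features and a much larger training set than this toy example.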
