Artificial Life Feature Selection Techniques for Spam E-Mail Filters

R. Wang, A.M. Youssef, and A.K. Elhakeem (Canada)

Keywords

Spam email filter, feature selection, Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO)

Abstract

Spam e-mail has turned from a mere nuisance into a major problem for most enterprise networks. In this paper, we present two novel artificial life feature selection techniques to improve the accuracy of spam e-mail filters. First, each feature in the e-mail is represented by its term frequency and inverse document frequency (TF-IDF). These features are then sorted by document frequency (DF) to choose a set of powerful discriminatory terms that best represent the e-mail. Then, a representative feature subset with the best discriminatory power among the constructed features is selected using two artificial life techniques: Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO). Both PSO and ACO are tailored to fit the binary nature of the feature selection problem. Using a K-Nearest Neighbor (KNN) classifier, the features obtained using both PSO and ACO outperform those obtained using the Genetic Algorithm (GA) and classical Principal Component Analysis (PCA) feature selection. In particular, without any feature selection, the obtained accuracy is about 88.1%. Using PSO-based and ACO-based features, the accuracy is about 91.8% and 95.4%, respectively, compared to 90.9% for GA-based features and 90% for PCA-based features.
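The first two steps of the pipeline (TF-IDF weighting followed by a DF-based pre-selection of terms) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several details here are assumptions: the function name `tfidf_with_df_selection` is hypothetical, TF is taken as the relative term count, IDF as log(N/DF), and the terms with the highest DF are the ones kept. The PSO and ACO subset search that follows this step is not shown.

```python
import math
from collections import Counter


def tfidf_with_df_selection(docs, top_k):
    """Return (selected_terms, vectors) for a list of tokenized e-mails.

    Assumptions (not specified in the abstract): TF is the relative term
    count, IDF is log(N / DF), and the top_k terms by DF are retained.
    """
    n_docs = len(docs)

    # Document frequency: number of e-mails that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    # Sort terms by descending DF (ties broken alphabetically) and keep top_k.
    selected = sorted(df, key=lambda t: (-df[t], t))[:top_k]

    # Build one TF-IDF vector per e-mail over the selected terms.
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty e-mails
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in selected]
        )
    return selected, vectors


# Toy corpus, purely illustrative.
emails = [
    "win cash now".split(),
    "meeting agenda now".split(),
    "win free cash prize".split(),
]
terms, vecs = tfidf_with_df_selection(emails, top_k=3)
```

In this sketch each e-mail becomes a fixed-length vector over the retained terms. A binary mask over those terms is the kind of candidate solution that a binary PSO or ACO search would then evaluate, for example by the accuracy of a KNN classifier on the masked vectors.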
