Comparing Feature Bias and Feature Selection Strategies for Many-Attribute Machine Learning

S. Luo and D. Corne (UK)

Keywords

Feature Selection, Feature Bias, Prediction Tasks, Classification, Proteomics, Machine Learning.

Abstract

We describe the concept of feature bias (FB) strategies and compare such strategies with traditional feature selection (FS) for predictive machine learning on a collection of datasets. FS is a common step in many classification and regression tasks. It is necessary because machine learning tools often cannot cope when the data has thousands of attributes. However, the strategy used by FS techniques is essentially binary: each feature is either retained or discarded. The hope is that most “irrelevant” features are removed before machine learning is applied, and that the subsequent machine learning stage will be both much faster (since there are fewer features to process) and more successful (since features that seem unimportant for the classification task at hand will have been removed). However, FS methods typically rely on standard statistical ideas and cannot guarantee that all and only the relevant features remain. A feature bias strategy, on the other hand, is an alternative approach in which we never entirely remove any feature from consideration. Experimental results reveal that FB can greatly improve upon FS for prediction tasks, particularly on poorly correlated datasets. We propose a tentative guideline for choosing between an FS and an FB strategy, based on a simply calculated measure of the inherent correlation of the dataset.
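
The contrast between the two strategies can be illustrated with a minimal sketch. The abstract does not specify how features are scored or how the bias is applied, so everything below is an assumption made for illustration: features are scored by absolute Pearson correlation with the target, FS keeps the top k features, and FB turns the same scores into sampling probabilities with a small floor so that no feature is ever excluded outright. The function names and the `floor` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def feature_scores(X, y):
    """Absolute Pearson correlation of each column of X with y (assumed scoring)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom = np.where(denom == 0, 1.0, denom)  # constant columns end up with score 0
    return np.abs(Xc.T @ yc) / denom


def select_features(scores, k):
    """FS: a binary decision, keep the k highest-scoring features and drop the rest."""
    return np.sort(np.argsort(scores)[::-1][:k])


def bias_probabilities(scores, floor=1e-3):
    """FB (hypothetical scheme): every feature keeps a non-zero probability."""
    weights = scores + floor
    return weights / weights.sum()


def sample_biased_features(scores, k, rng, floor=1e-3):
    """Draw k distinct features, favouring high scores without excluding any."""
    p = bias_probabilities(scores, floor)
    return np.sort(rng.choice(len(scores), size=k, replace=False, p=p))


# Synthetic data: only the first two of 50 attributes carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

scores = feature_scores(X, y)
print("FS keeps:  ", select_features(scores, k=5))
print("FB sampled:", sample_biased_features(scores, k=5, rng=rng))
```

Under this scheme a learner that draws features repeatedly (for example, once per tree or per candidate rule) would see weakly scored features occasionally rather than never, which is the property the FB idea relies on.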
