A Procedure for Building Reduced Reliable Training Datasets from Real-World Data

Silvia Cateni, Valentina Colla, Marco Vannucci, and Marco Vannocci

Keywords

AI in Data Analytics, Modelling and Simulation, Outlier Detection, Variable Selection

Abstract

Dimensionality reduction and anomalous data detection are important tasks in machine learning and data mining applications. Many real-world datasets are affected by errors and redundant variables, which can cause problems when the data are used to train models whose parameters are tuned by a learning procedure. This paper proposes an automatic procedure that combines the detection of unreliable data with dimensionality reduction, to be applied before the data are used to develop a predictive model. The method has been tested on several datasets from the UCI repository and from industrial applications. The test results are shown and discussed in the paper. The proposed approach achieves good prediction accuracy while yielding a minimal but essential dataset.
