Capacity of a Supervised Learning Algorithm

B. Boštjan (Slovenia/Finland), G. Izidor, W. Tatjana (Slovenia), and J. Hannu (Finland)

Keywords

data mining, knowledge discovery, supervised learning, assessment of learning algorithms

Abstract

Knowledge discovery is the process of extracting knowledge from data. It consists of several steps: identifying data sources, preparing and cleaning the data, the extraction process itself, evaluating the results, and consolidating the new findings with existing knowledge. The extraction step is referred to as data mining, which has received a great deal of attention lately. Data mining approaches data analysis with techniques from many different fields, such as databases, artificial intelligence, statistics, machine learning, visualization, and computer graphics. One of the tasks of data mining is to build a model on given (old) data and to use the model on new, unseen data. Such a model is often used to classify data items into predefined classes. A model makes mistakes, just as human experts sometimes wrongly associate symptoms (observations) with illnesses (target classes). As with human experts, computer-generated models tend to make fewer mistakes with more training. The question that arises is what the "final" rate of mistakes is, that is, the rate that remains after unlimited training. We define this final rate of mistakes as the capacity of a learning algorithm. The contribution of this paper is an approach to estimating the capacity of a supervised learning algorithm. The approach can be used to check whether a given algorithm would achieve a desired accuracy if it were given unlimited training.
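To make the idea of a "final" rate of mistakes concrete, the following minimal sketch extrapolates a learning curve to its asymptote. It is an illustration only and not necessarily the estimation approach of the paper: it assumes that the error rate follows an inverse power law, error(n) = a + b * n^(-c), where n is the training set size and a is the asymptotic error (the capacity in the sense defined above). The function name estimate_capacity, the grid of exponents, the synthetic measurements, and the target accuracy are all hypothetical.

```python
def estimate_capacity(sizes, errors):
    """Fit errors ~ a + b * n**(-c) and return (a, b, c).

    Assumption: the learning curve is an inverse power law. The exponent c
    is chosen from a fixed grid; for each candidate c, a and b are obtained
    by ordinary least squares on the transformed variable x = n**(-c).
    """
    best = None
    mean_y = sum(errors) / len(errors)
    for k in range(1, 41):
        c = k / 20.0  # candidate exponents 0.05, 0.10, ..., 2.00
        xs = [n ** (-c) for n in sizes]
        mean_x = sum(xs) / len(xs)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, errors))
        b = sxy / sxx
        a = mean_y - b * mean_x
        sse = sum((a + b * x - y) ** 2 for x, y in zip(xs, errors))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]


# Synthetic, noise-free measurements generated from a known curve
# (a = 0.1, b = 0.5, c = 0.5); real error rates would be noisy.
sizes = [100, 200, 400, 800, 1600, 3200]
errors = [0.1 + 0.5 * n ** -0.5 for n in sizes]

a, b, c = estimate_capacity(sizes, errors)

# Hypothetical check: could the algorithm ever reach 85% accuracy?
target_accuracy = 0.85
reachable = (1.0 - a) >= target_accuracy
```

Here a plays the role of the estimated capacity, and reachable answers the question of whether the desired accuracy is attainable under the assumed curve shape. With real, noisy error measurements the estimate of a is sensitive to the choice of curve family, so the result should be read as a rough indication rather than a guarantee.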
