A Framework for Generating Data to Simulate Changing Environments

A. Narasimhamurthy and L.I. Kuncheva (UK)

Keywords

Machine Learning, Changing environments, Concept drift, Population drift, Artificial data, Simulated data.

Abstract

A fundamental assumption often made in supervised classification is that the problem is static, i.e. the description of the classes does not change with time. However, many practical classification tasks involve changing environments. Thus designing and testing classifiers for changing environments are of increasing interest and importance. A number of benchmark data sets are available for static classification tasks. For example, the UCI machine learning repository is extensively used by researchers to compare algorithms across various domains. No such benchmark data sets are available for changing environments. Also, while generating data for static environments is relatively straightforward, this is not so for changing environments. The reason is that an infinite number of changes can be simulated, and it is difficult to define which ones will be realistic and hence useful. In this paper we propose a general framework for generating data to simulate changing environments. The paper illustrates how the framework encompasses various types of changes observed in real data, and how the two most popular simulation models (STAGGER and moving hyperplane) are represented within it.
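To make the STAGGER model mentioned above concrete, the following is a minimal sketch of the classic STAGGER concepts as they are commonly described in the concept-drift literature. It is not the framework proposed in this paper. The function names `stagger_label` and `stagger_stream` and the parameter `n_per_concept` are hypothetical, and the assumption that each concept lasts a fixed number of examples is made here only for illustration.

```python
import random

# Attribute domains of the STAGGER problem (as commonly described in the literature).
SIZES = ("small", "medium", "large")
COLORS = ("red", "green", "blue")
SHAPES = ("square", "circular", "triangular")


def stagger_label(concept: int, size: str, color: str, shape: str) -> int:
    """Return 1 if the object belongs to the target class under concept 1, 2 or 3."""
    if concept == 1:
        return int(size == "small" and color == "red")
    if concept == 2:
        return int(color == "green" or shape == "circular")
    if concept == 3:
        return int(size in ("medium", "large"))
    raise ValueError(f"unknown concept: {concept}")


def stagger_stream(n_per_concept: int, seed: int = 0):
    """Yield (size, color, shape, label) tuples.

    Hypothetical schedule: the concept switches abruptly from 1 to 2 to 3
    every n_per_concept examples, while the inputs stay uniformly random.
    """
    rng = random.Random(seed)
    for concept in (1, 2, 3):
        for _ in range(n_per_concept):
            size, color, shape = rng.choice(SIZES), rng.choice(COLORS), rng.choice(SHAPES)
            yield size, color, shape, stagger_label(concept, size, color, shape)


if __name__ == "__main__":
    data = list(stagger_stream(40))
    print(len(data), data[:3])
```

In this sketch the distribution of the inputs never changes. Only the class description changes, so the simulated change is pure concept drift rather than population drift.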
