Classification and Anomaly Detection in Tolerance Space

F.-S. Sun and C.-H. Tzeng (USA)

Keywords

Intelligent Data Systems and Computing, Data Mining, Similarity, Classification.

Abstract

This paper introduces a Boolean classification model from the viewpoint of data clustering on a tolerance space. A tolerance space is an abstract set equipped with a similarity, that is, a binary relation with two properties: reflexivity and symmetry. A representative clustering is a representative system of the space, a set of representatives chosen so that every element is similar to at least one of them. If the similarity is chosen so that similar elements are likely to belong to the same class, the clustering can be used to develop a classification model that estimates the conditional probabilities of the classes, and a classification decision can then be made from those probabilities. Before a tolerance space is defined for a huge database in a classification task, the database is first reduced by feature selection and signature search, and the records classifiable by the signatures are excluded. The classification model covers the remaining records. It is developed by first forming a tolerance space from the feature vectors of a training data set under a suitable similarity and then computing a sub-minimal representative system. Conditional probabilities of the classes for a record are estimated by comparing the record with the representatives in the sub-minimal representative system. Experiments on a network intrusion dataset show that the classification model can predict network attacks accurately when a proper similarity is chosen.
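The pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the similarity (Hamming distance at most a threshold on Boolean feature vectors), the greedy construction of the representative system (which may differ from the authors' sub-minimal algorithm), the pooling of class counts used to estimate conditional probabilities, and the toy data and function names.

```python
from collections import Counter

def similar(x, y, threshold=1):
    """Assumed similarity: Hamming distance <= threshold.
    It is reflexive and symmetric, as a tolerance relation requires."""
    return sum(a != b for a, b in zip(x, y)) <= threshold

def representative_system(vectors, sim):
    """Greedy sketch: keep a vector as a representative only if it is
    not similar to any representative chosen so far. Every vector ends
    up similar to at least one representative."""
    reps = []
    for v in vectors:
        if not any(sim(v, r) for r in reps):
            reps.append(v)
    return reps

def class_counts(reps, vectors, labels, sim):
    """For each representative, count the class labels of the training
    vectors that are similar to it."""
    counts = [Counter() for _ in reps]
    for v, label in zip(vectors, labels):
        for i, r in enumerate(reps):
            if sim(v, r):
                counts[i][label] += 1
    return counts

def conditional_probabilities(record, reps, counts, sim):
    """Assumed estimator: pool the class counts of all representatives
    similar to the record and normalize. Returns {} if none is similar."""
    pooled = Counter()
    for r, c in zip(reps, counts):
        if sim(record, r):
            pooled.update(c)
    total = sum(pooled.values())
    return {label: n / total for label, n in pooled.items()} if total else {}

# Toy training data (hypothetical Boolean feature vectors and labels).
train = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 1, 1)]
labels = ["normal", "normal", "attack", "attack", "normal"]

reps = representative_system(train, similar)
counts = class_counts(reps, train, labels, similar)

# Classify a new record by the class with the largest estimated probability.
probs = conditional_probabilities((1, 1, 0, 1), reps, counts, similar)
decision = max(probs, key=probs.get) if probs else None
print(reps, probs, decision)
```

In this sketch a record that is similar to no representative gets an empty estimate and no decision; how such records are handled in the actual model is not stated in the abstract.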
