Multivariate Similarity-based Conformity Measure (MSCM): An Outlier Detection Measure for Data Mining Applications

S.A. Badawy (Canada), A. Elragal, and M. Gabr (Egypt)

Keywords

Outlier Detection, Data Mining, Similarity, Conformity.

Abstract

Outliers, the odd objects in a dataset, can be viewed from two perspectives: as undesirable objects that should be treated or deleted in the data preparation step of the data mining process, or as interesting objects that are identified for their own sake in the data mining step. In the latter case, outliers should not be removed, which is why outlier detection is one of the main categories of tasks performed by data mining techniques. Applications of such detection include credit card fraud detection and network intrusion detection. Most available outlier detection techniques rely on a distance measure to compare the objects in the dataset, which restricts them to numeric data. In this paper, a new multivariate similarity-based conformity measure (MSCM) is proposed for detecting outliers in datasets that contain attributes of different data types. The MSCM has two further desirable properties: it is a multivariate measure, and it ranks objects instead of giving a binary outlier/non-outlier judgment. The measure has been evaluated on three different datasets and has shown good results in these experiments.
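The abstract does not state the MSCM formula, so the following is not the authors' measure. It is a minimal sketch of the general idea it describes, built on an assumed stand-in: a Gower-style similarity (range-normalised difference for numeric attributes, exact match for categorical ones), with an object's conformity taken as its mean similarity to all other objects. All attributes are used jointly (multivariate), and the result is a ranking rather than a binary label. The function names `gower_similarity`, `conformity_scores` and `rank_outliers`, and the sample records, are hypothetical.

```python
from typing import List, Optional, Sequence, Union

Value = Union[int, float, str]
Record = Sequence[Value]


def gower_similarity(a: Record, b: Record, ranges: List[Optional[float]]) -> float:
    """Average per-attribute similarity in [0, 1] (assumed Gower-style stand-in).

    ranges[i] is max - min of attribute i if it is numeric, or None if categorical.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 1.0 if x == y else 0.0  # categorical: exact match or not
        elif r > 0:
            total += 1.0 - abs(x - y) / r    # numeric: range-normalised closeness
        else:
            total += 1.0                     # constant numeric column
    return total / len(a)


def conformity_scores(records: List[Record]) -> List[float]:
    """Hypothetical conformity: mean similarity of each record to all others."""
    if len(records) < 2:
        raise ValueError("need at least two records")
    ranges: List[Optional[float]] = []
    for i in range(len(records[0])):
        col = [rec[i] for rec in records]
        if all(isinstance(v, (int, float)) for v in col):
            ranges.append(float(max(col) - min(col)))
        else:
            ranges.append(None)
    scores = []
    for i, rec in enumerate(records):
        sims = [gower_similarity(rec, other, ranges)
                for j, other in enumerate(records) if j != i]
        scores.append(sum(sims) / len(sims))
    return scores


def rank_outliers(records: List[Record]) -> List[int]:
    """Indices ordered from least conforming (most outlying) to most conforming."""
    scores = conformity_scores(records)
    return sorted(range(len(records)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Mixed-type toy records: (amount, card type, channel)
    records = [
        (25.0, "visa", "online"),
        (27.0, "visa", "online"),
        (26.0, "visa", "store"),
        (90.0, "amex", "online"),
    ]
    print(rank_outliers(records))  # → [3, 2, 0, 1]
```

Because the output is an ordering rather than a threshold decision, an analyst can inspect the top of the list (here the large "amex" record) without fixing a cut-off in advance.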
