Clustering Quality Measures for Data Samples with Multiple Labels

M. Attik, S. Al Shehabi, and J.-C. Lamirel (France)

Keywords

Clustering analysis, quality measurement, model selection, stopping criterion, multiple labels.

Abstract

This paper focuses on the problem of data classification when the data are associated with multiple labels. It deals especially with the case where each label has no antagonistic label, and where the absence of a label for a data item does not necessarily imply that the item cannot have that label, e.g., the substances in mineral exploration or the keywords of Web pages. We propose new clustering quality measures adapted to data associated with multiple labels. These measures rely on two main kinds of information: the similarity between the data given by the clustering algorithm, and the distribution of the labels in the model after these labels are projected onto the classification model. Their main area of application is the clustering model selection problem. They can also be used to determine the stopping criterion for training the clustering algorithm. An experimental evaluation of the proposed measures in the field of documentary data analysis shows that they significantly outperform the state of the art.
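The abstract does not give the measures themselves, so the following is only a minimal sketch of the label-projection step it describes, under stated assumptions: a hard clustering (one cluster id per item), each item carrying a possibly empty set of labels, and a missing label treated as unknown rather than negative, so only observed labels are counted. The names `project_labels` and `weighted_label_purity` are hypothetical, and the purity-style score is a generic illustration, not the authors' measure; it also ignores the data-similarity component mentioned above.

```python
from collections import Counter, defaultdict


def project_labels(assignments, item_labels):
    """Project item labels onto clusters (hypothetical helper).

    assignments: cluster id for each data item (hard clustering assumed).
    item_labels: set of labels for each data item. A missing label is
    treated as unknown, so only labels actually present are counted.
    Returns {cluster_id: Counter(label -> number of members carrying it)}.
    """
    projection = defaultdict(Counter)
    for cluster, labels in zip(assignments, item_labels):
        # Accessing the key creates an empty Counter even for unlabeled items,
        # so every cluster appears in the result.
        projection[cluster].update(labels)
    return dict(projection)


def weighted_label_purity(assignments, item_labels):
    """Illustrative score only, NOT the paper's measure.

    For each cluster, count the members carrying the cluster's most frequent
    label; sum these counts and divide by the total number of items. Because
    an item may carry several labels, one item can support several labels.
    """
    projection = project_labels(assignments, item_labels)
    total = len(assignments)
    if total == 0:
        return 0.0
    best_counts = sum(
        max(label_counts.values(), default=0)
        for label_counts in projection.values()
    )
    return best_counts / total


if __name__ == "__main__":
    assignments = [0, 0, 0, 1, 1]
    item_labels = [{"gold"}, {"gold", "copper"}, {"copper"}, {"zinc"}, set()]
    print(project_labels(assignments, item_labels))
    print(weighted_label_purity(assignments, item_labels))  # 3 / 5 = 0.6
```

For model selection, such a score could be compared across candidate clusterings, but a plain purity-style score tends to increase as clusters get smaller and more numerous, which is one reason dedicated quality measures are needed for this task.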
