Clustering Analysis for Data Samples with Multiple Labels

A. Attik, S. Al Shehabi, and J.-C. Lamirel (France)

Keywords

Clustering analysis, relevancy analysis, knowledge extraction, labeling strategy, multiple labels.

503-039

Abstract

This paper presents a new clustering analysis approach for data samples with multiple labels. It deals especially with the case where no label has an antagonistic counterpart and where the absence of a label on a data sample does not necessarily imply that the sample cannot carry that label, e.g., the substances found in mineral exploration, the keywords of Web pages, etc. The proposed approach relies on two analyses conducted in parallel: cluster analysis and label analysis. The cluster analysis aims at selecting the most interesting or relevant clusters. The label analysis aims at classifying the labels both into specific categories, namely implicit, explicit, noisy, and novel, and into more general embedding categories, namely relevant and irrelevant. The proposed analysis methods rely on two main sources of information: the similarity between data samples given by the clustering algorithm, and the distribution of the labels in the model after these labels are projected onto the classification model. Moreover, these methods make use of original quality measures for performing both the label and the cluster analyses. An experiment in the domain of documentary data highlights the accuracy of the proposed approach.
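The abstract does not spell out the paper's quality measures. As a minimal sketch, assuming hard cluster assignments (one cluster per sample) and a set of labels per sample, the idea of projecting labels onto a clustering model can be illustrated by two simple ratios: how each label's occurrences spread across clusters, and how often each label appears within each cluster. The function name `project_labels` and both ratios are hypothetical illustrations, not the authors' measures. Note that a sample with an empty label set still counts toward its cluster's size, reflecting the premise that a missing label is not evidence of absence.

```python
from collections import defaultdict


def project_labels(assignments, labels):
    """Project sample labels onto a clustering (hypothetical illustration).

    assignments: list of cluster ids, one per sample (assumed hard clustering).
    labels: list of label sets, one per sample (a set may be empty).

    Returns (label_dist, cluster_freq):
      label_dist[label][cluster]   = share of the label's occurrences in that cluster
      cluster_freq[cluster][label] = share of the cluster's samples carrying the label
    """
    if len(assignments) != len(labels):
        raise ValueError("one label set per sample is required")

    label_counts = defaultdict(lambda: defaultdict(int))  # label -> cluster -> count
    cluster_sizes = defaultdict(int)                       # cluster -> number of samples
    for cluster, sample_labels in zip(assignments, labels):
        cluster_sizes[cluster] += 1
        for label in sample_labels:
            label_counts[label][cluster] += 1

    # Distribution of each label over the clusters (sums to 1 per label).
    label_dist = {}
    for label, per_cluster in label_counts.items():
        total = sum(per_cluster.values())
        label_dist[label] = {c: n / total for c, n in per_cluster.items()}

    # Frequency of each label inside each cluster (relative to cluster size).
    cluster_freq = {c: {} for c in cluster_sizes}
    for label, per_cluster in label_counts.items():
        for c, n in per_cluster.items():
            cluster_freq[c][label] = n / cluster_sizes[c]

    return label_dist, cluster_freq


# Toy usage: five samples in two clusters, one sample without any label.
assignments = [0, 0, 1, 1, 1]
labels = [{"gold"}, {"gold", "copper"}, {"copper"}, set(), {"copper"}]
label_dist, cluster_freq = project_labels(assignments, labels)
print(label_dist["gold"])   # {0: 1.0}
print(cluster_freq[0])      # {'gold': 1.0, 'copper': 0.5}
```

A label concentrated in one cluster (here "gold") would be a natural candidate for describing that cluster, whereas a label spread thinly across many clusters carries less information; how the paper turns such distributions into its implicit, explicit, noisy, and novel categories is not stated in the abstract.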
