An Evaluation of Keyword Selection on Gene Clustering in Biomedical Literature Mining

V. Dasigi, O. Karam, and S. Pydimarri (USA)

Keywords

Text Mining, Automatic Keyword Selection, Feature Fusion, TF-IDF, Z-Score, Gene Clustering

Abstract

We describe two statistical metrics, Z-score and a variant of the familiar TF-IDF, for identifying keywords associated with genes by mining a collection of MEDLINE® abstracts. We then describe experiments in clustering genes based on the keyword features they share. Clustering quality is measured by comparing the clusters produced by a clustering algorithm against expert-defined clusters. We evaluate this quality for keyword features identified by each metric alone, as well as for combinations of the keywords derived from the two metrics. We present these results and our analysis.
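
As a rough illustration of the two kinds of keyword score named above, the following sketch computes a textbook TF-IDF and a per-word Z-score over a toy corpus. The abstract does not give the formulas, so everything here is an assumption: the TF-IDF shown is the standard form rather than the authors' variant, the Z-score standardizes a word's relative frequency for one gene against its frequencies across all genes, and the function names (`tf_idf_scores`, `z_scores`, `top_keywords`, `fuse_keywords`) and the union-based fusion are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(tokens_by_gene):
    """Textbook TF-IDF (assumed form): each gene's pooled abstracts act as one document."""
    n_genes = len(tokens_by_gene)
    counts = {g: Counter(toks) for g, toks in tokens_by_gene.items()}
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())  # number of genes whose abstracts mention each word
    scores = {}
    for g, c in counts.items():
        total = sum(c.values()) or 1  # guard against an empty token list
        scores[g] = {w: (k / total) * math.log(n_genes / doc_freq[w])
                     for w, k in c.items()}
    return scores


def z_scores(tokens_by_gene):
    """Z-score (assumed form) of a word's relative frequency for a gene vs. all genes."""
    genes = list(tokens_by_gene)
    counts = {g: Counter(tokens_by_gene[g]) for g in genes}
    totals = {g: sum(counts[g].values()) or 1 for g in genes}
    vocab = set().union(*counts.values())
    scores = {g: {} for g in genes}
    for w in vocab:
        freqs = [counts[g][w] / totals[g] for g in genes]
        mean = sum(freqs) / len(freqs)
        sd = math.sqrt(sum((f - mean) ** 2 for f in freqs) / len(freqs))
        for g, f in zip(genes, freqs):
            if counts[g][w]:  # only score words that occur for this gene
                scores[g][w] = (f - mean) / sd if sd > 0 else 0.0
    return scores


def top_keywords(word_scores, k):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    ranked = sorted(word_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def fuse_keywords(scores_a, scores_b, k):
    """One possible combination (an assumption): union of each metric's top-k words."""
    return set(top_keywords(scores_a, k)) | set(top_keywords(scores_b, k))


# Toy, made-up token lists standing in for tokenized abstracts.
corpus = {
    "geneA": ["kinase", "kinase", "cell", "binding"],
    "geneB": ["cell", "membrane", "membrane", "binding"],
    "geneC": ["cell", "transcription", "transcription", "binding"],
}
tfidf = tf_idf_scores(corpus)
z = z_scores(corpus)
fused = fuse_keywords(tfidf["geneA"], z["geneA"], k=1)
```

In this toy corpus, words that occur equally often for every gene ("cell", "binding") score zero under both metrics, while a gene-specific word such as "kinase" rises to the top. Genes could then be represented by their selected keywords and passed to any clustering algorithm.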
