B. Chen, P.C. Tai, R. Harrison, and Y. Pan (USA)
Fuzzy-Greedy-Kmeans (FGK) model, sequence motif, HSSP, BLOSUM62.
Discovering protein sequence motifs is one of the most crucial tasks in bioinformatics research. In this paper, we aim to identify recurring protein patterns that are universally conserved across protein family boundaries. Achieving this goal requires an extremely large dataset, and therefore an efficient technique. We explore short recurring protein segments using a granular computing strategy. First, the Fuzzy C-Means (FCM) clustering algorithm separates the whole dataset into several smaller information granules; a K-means clustering algorithm with a novel greedy initialization is then applied to each granule to obtain the final results. We also propose a new evaluation method for sequence motif information, based on the HSSP and the BLOSUM62 matrix. Compared with the existing results published in IEEE Transactions, our method requires only one fifth of the execution time and shows better results in all three quality measures.
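The two-stage idea described above (FCM to split the data into granules, then K-means with a greedy seeding on each granule) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the greedy step here is a simple farthest-point heuristic standing in for the paper's initialization, each sample is assigned to the single granule with its highest membership, distances are Euclidean, and all function names (`fuzzy_c_means`, `greedy_init`, `kmeans`, `fgk_sketch`) are hypothetical.

```python
import math
import random


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def fuzzy_c_means(X, c, m=2.0, n_iter=50, seed=0):
    """Standard FCM updates; returns U where U[i][j] is the membership of sample i in cluster j."""
    rng = random.Random(seed)
    U = []
    for _ in X:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    for _ in range(n_iter):
        # Centers are membership-weighted means (weights raised to the fuzzifier m).
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(len(X))]
            total = sum(w)
            centers.append([sum(wi * x[d] for wi, x in zip(w, X)) / total
                            for d in range(len(X[0]))])
        # Memberships are proportional to distance^(-2 / (m - 1)).
        for i, x in enumerate(X):
            inv = [(math.dist(x, ctr) + 1e-12) ** (-2.0 / (m - 1.0)) for ctr in centers]
            total = sum(inv)
            U[i] = [v / total for v in inv]
    return U


def greedy_init(X, k):
    """Assumed greedy seeding: start near the mean, then repeatedly add the farthest point."""
    center = mean(X)
    chosen = [min(range(len(X)), key=lambda i: math.dist(X[i], center))]
    d = [math.dist(x, X[chosen[0]]) for x in X]
    while len(chosen) < k:
        nxt = max(range(len(X)), key=lambda i: d[i])
        chosen.append(nxt)
        d = [min(di, math.dist(x, X[nxt])) for di, x in zip(d, X)]
    return [list(X[i]) for i in chosen]


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    labels = []
    for _ in range(n_iter):
        labels = [min(range(len(centers)), key=lambda j: math.dist(x, centers[j])) for x in X]
        new = []
        for j, old in enumerate(centers):
            members = [x for x, lab in zip(X, labels) if lab == j]
            new.append(mean(members) if members else old)
        if new == centers:
            break
        centers = new
    return centers, labels


def fgk_sketch(X, n_granules, k_per_granule):
    """Split X into granules with FCM, then cluster each granule with greedily seeded K-means."""
    U = fuzzy_c_means(X, n_granules)
    out = []
    for g in range(n_granules):
        # Simplification: hard-assign each sample to its highest-membership granule.
        idx = [i for i, row in enumerate(U) if row.index(max(row)) == g]
        if not idx:
            continue
        pts = [X[i] for i in idx]
        k = min(k_per_granule, len(pts))
        centers, labels = kmeans(pts, greedy_init(pts, k))
        out.append((idx, centers, labels))
    return out
```

Calling `fgk_sketch(X, n_granules, k_per_granule)` on a list of numeric feature vectors returns, for each non-empty granule, the sample indices it contains, the K-means centers, and the per-sample cluster labels. Because each K-means run only sees one granule, the per-run cost shrinks with the granule size, which is the efficiency argument the abstract makes for the granular approach.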