A Refined Multisite Fungal Protein Localizer

M. Nathan and G. Butler (Canada)

Keywords

machine learning, applications to bioinformatics, data mining, decision tree, subcellular localization.

Abstract

In previous work, we built a classifier that used a decision tree to predict fungal protein localization based on physicochemical properties of proteins. We studied 178 features, selected from the proteins' compositional properties, functional motifs, and signal sequences, for their effect on subcellular localization. That work resulted in a localizer that correctly predicted some of the reported localizations in 64% of cases and all of the reported localizations in 49% of cases. Here, we improve on those results by streamlining the classes of protein features used. Considering the various modes of intracellular protein movement and the requirements for such transport, we establish a list of features that would have a direct impact on the recognition of proteins by the transport machinery of the cell. We detect the occurrence of such features in fungal proteins and use them as potential determinants of subcellular localization. The system, rebuilt using 980 such features, is validated with 5-fold cross-validation and achieves a success rate of 87% for predicting some, and 77% for predicting all, of the reported localization sites in 3 fungal species for which subcellular localization annotations were available.
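
The two success rates quoted above ("some" and "all" of the reported localizations) can be illustrated with a minimal sketch. The abstract does not give formal definitions, so the reading used here is an assumption: a protein counts toward "some" when at least one predicted site is among its reported sites, and toward "all" when every reported site is predicted. The function name `multisite_success_rates` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set, Tuple


def multisite_success_rates(
    reported: Sequence[Set[str]],
    predicted: Sequence[Set[str]],
) -> Tuple[float, float]:
    """Return (some_rate, all_rate) over paired reported/predicted site sets.

    Assumed definitions (not confirmed by the paper):
      some: the predicted set shares at least one site with the reported set
      all:  the predicted set contains every reported site
    """
    if len(reported) != len(predicted):
        raise ValueError("reported and predicted must have the same length")
    if not reported:
        raise ValueError("at least one protein is required")

    some_hits = 0
    all_hits = 0
    for rep, pred in zip(reported, predicted):
        if rep & pred:            # non-empty intersection
            some_hits += 1
        if rep and rep <= pred:   # every reported site was predicted
            all_hits += 1

    n = len(reported)
    return some_hits / n, all_hits / n


# Toy data, made up purely for illustration.
reported = [
    {"nucleus", "cytoplasm"},
    {"mitochondrion"},
    {"nucleus"},
    {"vacuole", "membrane"},
]
predicted = [
    {"nucleus"},
    {"mitochondrion"},
    {"cytoplasm"},
    {"vacuole", "membrane", "nucleus"},
]

some_rate, all_rate = multisite_success_rates(reported, predicted)
print(f"some: {some_rate:.2f}, all: {all_rate:.2f}")
```

Under these assumed definitions the "all" rate can never exceed the "some" rate, which is consistent with the ordering of the reported figures (64% vs. 49%, and 87% vs. 77%). A stricter reading of "all" that requires an exact match between the two sets would replace `rep <= pred` with `rep == pred`.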
