B. Han (USA, PRC), S. Vucetic, and Z. Obradovic (USA)
Information Retrieval, Text Mining, Support Vector Machine, Bioinformatics
We have initiated research aimed at automatically extracting Medline citations of biomedical articles and reranking them according to their relevance to a biomedical property that is difficult to express as a PubMed query. Our approach is to train support vector machines as classifiers that distinguish relevant citations from the rest of the retrieved citations. Citations retrieved from PubMed are represented as vectors of term frequencies, and the classifier predictions are used to rerank them. Major improvements were achieved in reranking citations with respect to protein disorder-function relationships, where the average relative rank of a relevant citation improved from 48% to 16%. On average, only 13% and 28% of the citations relevant to our target topic were recalled in the top 5 and top 10 citations, respectively, retrieved by querying PubMed with disordered protein names. With our reranking method, these figures improved to about 58% and 78%, respectively, suggesting that the proposed method might provide a cost-effective tool for identifying articles on topics that are difficult to express as specific PubMed queries.
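The two evaluation measures mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy relevance labels, and the classifier scores are all hypothetical, and "relative rank" is assumed here to mean a citation's 1-based position divided by the length of the retrieved list.

```python
from typing import List, Sequence


def rerank(labels: Sequence[int], scores: Sequence[float]) -> List[int]:
    """Reorder relevance labels by descending classifier score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def mean_relative_rank(ranked_labels: Sequence[int]) -> float:
    """Average of (position / list length) over the relevant citations.

    Assumed definition: positions are 1-based, so lower values are better.
    """
    n = len(ranked_labels)
    positions = [pos for pos, label in enumerate(ranked_labels, start=1) if label == 1]
    if not positions:
        raise ValueError("no relevant citations in the list")
    return sum(pos / n for pos in positions) / len(positions)


def recall_at_k(ranked_labels: Sequence[int], k: int) -> float:
    """Fraction of all relevant citations that appear in the top k."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("no relevant citations in the list")
    return sum(ranked_labels[:k]) / total_relevant


# Hypothetical toy data: 1 = relevant, 0 = not relevant, in PubMed order.
pubmed_order = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
# Hypothetical classifier scores, one per citation (higher = more relevant).
svm_scores = [0.1, 0.2, 0.9, 0.3, 0.05, 0.4, 0.8, 0.15, 0.25, 0.7]

reranked = rerank(pubmed_order, svm_scores)

print(mean_relative_rank(pubmed_order), recall_at_k(pubmed_order, 5))
print(mean_relative_rank(reranked), recall_at_k(reranked, 5))
```

In this toy case the classifier scores happen to separate the relevant citations perfectly, so the reranked list places all three at the top. Real scores would come from a trained classifier applied to term-frequency vectors and would generally be noisier.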