INFORMATION EXTRACTION FROM RADIOLOGY REPORTS FOR A POPULATION-BASED CANCER REGISTRY

Dung H.M. Nguyen, Jon D. Patrick

Keywords

Clinical Assessment and Patient Diagnosis, Clinical Engineering, Information Extraction, Active Learning, Cancer.

Abstract

This paper introduces a complete Cancer Information Extraction system for a population-based Cancer Registry. The system classifies and annotates radiology imaging reports to identify the components needed for cancer staging and recurrence extraction. In addition to traditional supervised learning methods such as Conditional Random Fields and Support Vector Machines, we investigate active learning approaches to further improve the performance of the information extraction system. A reportability classifier, which separates cancer reports from non-cancer reports, achieved 97.74% sensitivity and 96.00% specificity on the held-out test set. The Report Purpose classifier and the Tumour Stream classifier each reach an accuracy of approximately 80% in 10-fold cross-validation (CV) experiments. The overall F-score of the tagging system is over 93% in 5-fold CV on approximately 487,000 instances from more than 3,000 manually annotated reports.
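
The abstract reports the reportability classifier by sensitivity and specificity, and the tagger by F-score. The sketch below shows how these figures follow from binary confusion-matrix counts. It treats "cancer" as the positive class. The `Confusion` class and the counts in the usage lines are hypothetical illustrations and do not come from the paper. The abstract also does not say how the tagging F-score is averaged across tags, so this covers only the plain binary F1.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts, treating 'cancer' as the positive class."""
    tp: int  # cancer reports classified as cancer
    fn: int  # cancer reports classified as non-cancer (missed)
    tn: int  # non-cancer reports classified as non-cancer
    fp: int  # non-cancer reports classified as cancer (false alarms)

    def sensitivity(self) -> float:
        # Recall on the positive class: share of cancer reports that are caught.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-cancer reports that are correctly left out.
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Share of reports flagged as cancer that really are cancer.
        return self.tp / (self.tp + self.fp)

    def f_score(self, beta: float = 1.0) -> float:
        # F-beta combines precision and recall; beta=1 gives their harmonic mean (F1).
        p, r = self.precision(), self.sensitivity()
        b2 = beta * beta
        return (1 + b2) * p * r / (b2 * p + r)


# Hypothetical counts for illustration only; these are not results from the paper.
example = Confusion(tp=90, fn=10, tn=95, fp=5)
print(f"sensitivity={example.sensitivity():.4f}")  # sensitivity=0.9000
print(f"specificity={example.specificity():.4f}")  # specificity=0.9500
print(f"F1={example.f_score():.4f}")               # F1=0.9231
```

A registry screening step usually values high sensitivity because a missed cancer report is costly. Specificity measures how much manual review effort the classifier saves on non-cancer reports.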
