Information Extraction from Radiology Reports for a Population based Cancer Registry

Dung H.M. Nguyen and Jon D. Patrick


Clinical Assessment and Patient Diagnosis, Clinical Engineering, Information Extraction, Active Learning


A complete cancer information extraction system for a population-based cancer registry is introduced. The analysis involves classifying and annotating radiology imaging reports to identify the components needed for cancer staging and recurrence extraction. In addition to traditional supervised learning methods such as Conditional Random Fields and Support Vector Machines, active learning approaches are investigated to further improve the performance of the information extraction system. A reportability classifier, which separates cancer from non-cancer reports, achieves 97.74% sensitivity and 96.00% specificity on the held-out test set. The accuracies of the Report Purpose classifier and the Tumour Stream classifier are approximately 80% in 10-fold cross-validation (CV) experiments. The overall F-score of the tagging system is over 93% in 5-fold CV, using approximately 487,000 instances from more than 3,000 manually annotated reports.
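
For readers less familiar with the evaluation measures quoted above, the following is a minimal sketch of how sensitivity, specificity, and F-score are conventionally computed from confusion-matrix counts. The function names and the counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from the system or the data described here.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual positives (e.g. cancer reports) that are detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives (e.g. non-cancer reports) correctly rejected."""
    return tn / (tn + fp)


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not results from the paper).
tp, fn, tn, fp = 90, 10, 80, 20
print(f"sensitivity = {sensitivity(tp, fn):.4f}")
print(f"specificity = {specificity(tn, fp):.4f}")
print(f"F-score     = {f_score(tp, fp, fn):.4f}")
```

Sensitivity and specificity are reported separately because a registry typically cares both about missing reportable cancer cases and about passing non-cancer reports on for manual review; the F-score summarises precision and recall for the tagging task in a single figure.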
