A Fast Algorithm of Address Lines Extraction on Complex Chinese Mail Pieces

T. Liu, X. Ding, Q. Fu (PRC), and Z. Ren (Germany)

Keywords

Chinese mail address lines, connected components analysis, similarity, split and merge

Abstract

A fast and efficient method is presented for extracting address lines from both machine-printed and handwritten Chinese mail envelopes. The algorithm follows a bottom-up approach. First, we select text blocks from connected components (CCs) and immediately group them into initial lines. Then, average text block features are computed to validate the initial text lines and to guide an iterative split-and-merge process. Lines are split by merging the text CCs in detail according to criteria for the similarity and consistency of neighboring text blocks. In particular, non-text blocks within a line are recovered if they are similar to the other text blocks. Skew detection and a corresponding deskew step follow. We tested the method on a large mail sample test deck covering different categories of envelopes; compared with our previous system, it showed a clear improvement in both accuracy and computation time.
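The first bottom-up step described above (selecting text blocks from CCs and grouping them into initial lines) can be illustrated with a minimal Python sketch. The abstract does not give the actual selection or grouping criteria, so everything here is an assumption made for illustration: the `Box` type, the height limits `min_h` and `max_h`, the vertical-overlap test, and the restriction to horizontal lines are all hypothetical and are not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Bounding box of one connected component (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int

    @property
    def bottom(self) -> int:
        return self.y + self.h


def vertical_overlap(a: Box, b: Box) -> float:
    """Shared vertical extent of two boxes, relative to the shorter box."""
    overlap = min(a.bottom, b.bottom) - max(a.y, b.y)
    return max(0, overlap) / min(a.h, b.h)


def group_into_lines(boxes: List[Box], min_h: int = 8, max_h: int = 80,
                     min_overlap: float = 0.5) -> List[List[Box]]:
    """Group CC boxes into initial horizontal lines.

    Assumed criteria (not taken from the paper): a box counts as a text
    block if its height lies in [min_h, max_h]; a text block joins a line
    if it vertically overlaps the last block of that line.
    """
    # Step 1: keep only boxes that look like text blocks.
    text_blocks = [b for b in boxes if min_h <= b.h <= max_h]

    # Step 2: scan left to right and attach each block to the first
    # line whose most recent block it overlaps; otherwise start a line.
    lines: List[List[Box]] = []
    for box in sorted(text_blocks, key=lambda b: b.x):
        for line in lines:
            if vertical_overlap(line[-1], box) >= min_overlap:
                line.append(box)
                break
        else:
            lines.append([box])

    # Return lines in top-to-bottom order.
    lines.sort(key=lambda line: min(b.y for b in line))
    return lines
```

In a fuller pipeline, the initial lines returned by `group_into_lines` would then be validated and refined by the split-and-merge stage; that stage is not sketched here because the abstract does not specify its criteria.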
