Sobia Habib, Manoj K. Shukla, and Rajiv Kapoor
Pre-processing, Hough transform, line, character, separate touching line (STL), separate touching character (STC)
Line and character segmentation pose significant challenges for degraded Devanagari script. When two lines touch or overlap, it is difficult to determine the precise split point between them, and the same problem arises during character segmentation. This paper introduces techniques for segmenting touching and overlapping lines and characters in degraded Devanagari script. A new algorithm, the “Separate Touching Line Algorithm,” is proposed to segment touching and overlapping lines, with a specific focus on identifying header lines. The classical vertical projection profile method then separates the words within each line. The character segmentation algorithm extracts several features, namely aspect ratio, intersecting points, horizontal projection profile, and centroids of connected components, and uses them to separate the components. Unlike previous approaches for the Devanagari script, which often treat compound characters as distinct entities, this algorithm identifies and separates fused and compound characters by determining the split point. It also detects and separates multiple points of contact between characters. This approach proves beneficial in reducing the reliance on compound characters and expanding the training dataset. Our line segmentation algorithm achieves an accuracy of 98.89%, and the character segmentation algorithm achieves an accuracy of 97.49%. These results surpass the performance of all previously employed methods for handling degraded printed documents.
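As an illustration of the classical vertical projection profile step mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes a binarised text-line image stored as a NumPy array in which non-zero pixels are ink. The function name `split_words` and the `min_gap` threshold are hypothetical, introduced only for this example.

```python
import numpy as np

def split_words(line_img, min_gap=3):
    """Split a binarised text line into words by vertical projection.

    line_img: 2-D array, non-zero = ink (assumed convention).
    min_gap:  hypothetical threshold; a run of at least this many
              empty columns is treated as a word boundary.
    Returns a list of (start_col, end_col) pairs, end exclusive.
    """
    ink = np.asarray(line_img) != 0
    profile = ink.sum(axis=0)  # number of ink pixels in each column

    words = []
    start = None  # first column of the word currently being scanned
    gap = 0       # length of the current run of empty columns
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The word ended just before the empty run began.
                words.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the last word, trimming any short trailing gap.
        words.append((start, len(profile) - gap))
    return words
```

Empty-column runs shorter than `min_gap` are kept inside a word, so narrow inter-character gaps do not trigger a split. A suitable threshold depends on font size and scan resolution and would need tuning for degraded documents.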