Fast Signature based Recognition of Isolated Arabic Characters

F. Hussain and J. Cowell (UK)


Arabic, fonts, OCR, pattern recognition, confusion matrix, image signatures.


Comparing characters against a set of templates is an effective but slow technique for character recognition. Signaturing has been demonstrated as a valid approach that retains the benefits of template comparison but is typically 100 times faster. Signatures are abstractions of the characters to be recognised; they can be derived quickly and are much smaller than the character images themselves. The technique identifies not only the closest match but also the closeness of match to every other character in the set, which in turn can be expressed as a triangular Confusion Matrix. This paper presents an enhanced signature recognition algorithm, illustrated with examples from the Arabic character set. The likelihood of confusion between groups of Arabic characters is high, since the distinguishing feature may be only one, two, or three dots (positioned above, below, or in the middle of the basic shape). In the enhanced algorithm, a pre-processing step groups the characters by the number and position of dots. The signature is then derived and compared against the set of signature templates for the respective group. This approach gives greater separation between the characters than the non-enhanced form and is still very rapid.
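The pipeline described above (group by dots, derive a signature, compare against that group's templates, tabulate pairwise closeness) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the abstract does not say how a signature is computed, so row and column ink counts are used here as a stand-in, and the closeness measure is a plain sum of absolute differences. The names `signature`, `distance`, `classify`, `confusion_triangle`, the toy 3x3 glyphs, and the group key `(0, "none")` are all hypothetical, and dot detection is assumed to have been done by an earlier step.

```python
from itertools import combinations

def signature(image):
    """Assumed signature: ink count per row, then per column, of a 0/1 image."""
    rows = [sum(row) for row in image]
    cols = [sum(col) for col in zip(*image)]
    return rows + cols

def distance(sig_a, sig_b):
    """Assumed closeness measure: sum of absolute differences (0 = identical)."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))

def classify(image, dot_group, templates):
    """Compare the image's signature only against templates in its dot group.

    dot_group is a (dot_count, dot_position) key assumed to come from
    pre-processing; templates maps each key to {label: signature}.
    Returns the best label and the distance to every candidate in the group.
    """
    sig = signature(image)
    scores = {label: distance(sig, tmpl)
              for label, tmpl in templates[dot_group].items()}
    best = min(scores, key=scores.get)
    return best, scores

def confusion_triangle(group_templates):
    """Pairwise template distances; each unordered pair is stored once."""
    labels = sorted(group_templates)
    return {(a, b): distance(group_templates[a], group_templates[b])
            for a, b in combinations(labels, 2)}

# Toy glyphs (hypothetical shapes, not real Arabic characters).
BAR = [[0, 1, 0],
       [0, 1, 0],
       [0, 1, 0]]
HOOK = [[0, 1, 0],
        [0, 1, 0],
        [1, 1, 0]]
BOWL = [[1, 0, 1],
        [1, 0, 1],
        [1, 1, 1]]
NOISY_HOOK = [[0, 1, 0],
              [0, 1, 0],
              [1, 1, 1]]  # HOOK with one extra ink pixel

TEMPLATES = {
    (0, "none"): {"bar": signature(BAR),
                  "hook": signature(HOOK),
                  "bowl": signature(BOWL)},
}

if __name__ == "__main__":
    best, scores = classify(NOISY_HOOK, (0, "none"), TEMPLATES)
    print(best, scores)
    print(confusion_triangle(TEMPLATES[(0, "none")]))
```

Restricting the comparison to one dot group shrinks the candidate set before any signature is compared, and the triangular table makes it easy to see which template pairs lie closest together and are therefore most likely to be confused.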

