An Empirical Comparison of Techniques for Extracting Concept Abbreviations from Identifiers

H. Feild, D. Binkley, and D. Lawrie (USA)


neural network, software maintenance, code comprehension, software engineering


When a programmer is faced with the task of modifying code written by others, he or she must first gain an understanding of the concepts and entities used by the program. Comments and identifiers are the two main sources of such knowledge. In the case of identifiers, the meaning can be hidden in abbreviations that make comprehension more difficult. A tool that automatically replaces abbreviations with their full-word meanings would help programmers, especially less experienced ones, understand and work with the code. Such a tool first needs to isolate the abbreviations within identifiers. When the parts of an identifier are separated by division markers such as underscores or camel case, this isolation task is trivial. However, many identifiers lack these division markers. Therefore, the first task of automatic expansion is the separation of identifiers into their constituent parts. Presented here is a comparison of three techniques that accomplish this task: a random algorithm (used as a straw man), a greedy algorithm, and a neural-network-based algorithm. The greedy algorithm's performance ranges from 75 to 81 percent correct, while the neural network's performance ranges from 71 to 95 percent correct.
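The abstract does not spell out how the greedy algorithm works. As a rough illustration only, the Python sketch below first splits an identifier on division markers (underscores and camel case), then greedily splits each marker-free piece by taking the longest dictionary word that matches at the current position. The word list `WORDS` and the functions `split_on_markers`, `greedy_split`, and `split_identifier` are hypothetical. This is not necessarily the greedy algorithm evaluated in the paper, which may use different word lists and matching rules.

```python
import re

# Hypothetical word list for illustration; the paper's actual dictionary
# and abbreviation lists are not given in the abstract.
WORDS = {"file", "name", "get", "max", "len", "str", "buf", "count"}


def split_on_markers(identifier: str) -> list[str]:
    """Trivial case: split on underscores and camel-case boundaries."""
    parts = []
    for chunk in re.split(r"_+", identifier):
        # Acronym runs (e.g. "HTTP" in "HTTPServer"), capitalized or
        # lowercase words, and digit runs each become one part.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p.lower() for p in parts]


def greedy_split(token: str, words: set[str]) -> list[str]:
    """Hypothetical greedy splitter: longest dictionary match at each position."""
    parts = []
    unknown = ""  # run of characters that start no dictionary word
    i = 0
    while i < len(token):
        # Try the longest candidate first, shrinking toward one character.
        for j in range(len(token), i, -1):
            if token[i:j] in words:
                if unknown:
                    parts.append(unknown)
                    unknown = ""
                parts.append(token[i:j])
                i = j
                break
        else:
            # No dictionary word starts here; keep the character as unknown.
            unknown += token[i]
            i += 1
    if unknown:
        parts.append(unknown)
    return parts


def split_identifier(identifier: str, words: set[str] = WORDS) -> list[str]:
    """Split on markers first, then greedily split pieces not in the word list."""
    result = []
    for piece in split_on_markers(identifier):
        if piece.isdigit() or piece in words:
            result.append(piece)
        else:
            result.extend(greedy_split(piece, words))
    return result


if __name__ == "__main__":
    print(split_identifier("getFileName"))   # ['get', 'file', 'name']
    print(split_identifier("maxlen_count"))  # ['max', 'len', 'count']
```

Because it commits to the longest match at each position and never backtracks, a greedy splitter of this kind can go wrong when a long dictionary word overlaps the boundary between two intended parts.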
