Using Patterns for Syntactic Parsing

N. Inui and Y. Kotani (Japan)


Syntactic Parsing, Corpus, N-gram, Case-Base


This paper proposes a new corpus-based method for parsing natural language sentences. Our system acquires parsing rules from sentences annotated with parenthesized structure. For this kind of corpus, previous work estimates non-terminal symbols in order to acquire context-free rules. Unlike previous work, we use an N-gram model to generalize rules in a simple way. Our method estimates phrase boundaries using rewriting rules acquired directly from the parenthesized sentences, together with their N-gram approximations. In our experiments, using the product of N-gram occurrence probabilities as the likeliness of phrasing gave better performance. We obtain recall ratios of 78% for Japanese and 86% for English. The experimental results show that our method parses sentences as well as other methods that use linguistic knowledge. We consider our method especially useful for dialogue systems because of its robustness.
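To make the phrasing score concrete, the following is a minimal sketch, not the authors' implementation, of using the product of N-gram occurrence probabilities as the likeliness that a candidate span is a phrase. It assumes that phrases are sequences of tags taken from the parenthesized corpus, that "occurrence probability" means a relative frequency over N-grams extracted from those phrases, and that phrase edges are marked with hypothetical `<s>`/`</s>` symbols. The add-one smoothing and the names `phrase_ngrams` and `PhraseNgramModel` are illustrative assumptions. The sketch covers only scoring. It does not cover rule acquisition or the boundary search.

```python
from collections import Counter
from math import prod
from typing import Iterable, List, Sequence, Tuple

# Hypothetical phrase-boundary markers (an assumption, not from the paper).
BOS, EOS = "<s>", "</s>"


def phrase_ngrams(phrase: Sequence[str], n: int = 2) -> List[Tuple[str, ...]]:
    """N-grams of one phrase, padded so that its boundaries are modelled too."""
    padded = [BOS] + list(phrase) + [EOS]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


class PhraseNgramModel:
    """Relative-frequency N-gram model over phrases from bracketed sentences."""

    def __init__(self, phrases: Iterable[Sequence[str]], n: int = 2) -> None:
        self.n = n
        self.counts: Counter = Counter()
        for phrase in phrases:
            self.counts.update(phrase_ngrams(phrase, n))
        self.total = sum(self.counts.values())

    def prob(self, gram: Tuple[str, ...]) -> float:
        # Add-one smoothing with one extra bucket for unseen N-grams, so a
        # single unseen N-gram does not force the whole product to zero.
        return (self.counts[gram] + 1) / (self.total + len(self.counts) + 1)

    def phrase_score(self, candidate: Sequence[str]) -> float:
        """Likeliness of phrasing: product of the candidate's N-gram probabilities."""
        return prod(self.prob(g) for g in phrase_ngrams(candidate, self.n))


if __name__ == "__main__":
    # Toy phrases, e.g. from "((DT NN) (VBD))" and "(DT JJ NN)".
    model = PhraseNgramModel([["DT", "NN"], ["VBD"], ["DT", "JJ", "NN"]])
    print(model.phrase_score(["DT", "NN"]))   # span seen as a phrase
    print(model.phrase_score(["NN", "VBD"]))  # span crossing a boundary
```

In this toy data the span `DT NN` scores higher than `NN VBD`, so a parser comparing candidate spans would prefer the former as a phrase. A plain product also penalizes longer spans, because they contribute more factors. Any real use would need to account for that bias.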

