Empirical Study of a Parser based on M-level N-gram Model

N. Inui and Y. Kotani (Japan)

Keywords

Parsing, ATN, N-gram, DFA, CFG, corpus

Abstract

This paper presents a novel method for full-sentence parsing of natural language using a simplified ATN, its extension to an M-level N-gram model, and preliminary experimental results. The method has the following characteristics: 1) rules are acquired from bracketed sentences that contain no non-terminal symbols, without language-dependent knowledge; 2) an N-gram model is introduced for robust parsing; and 3) an M-level model is introduced for high accuracy. The simplified ATN is a restricted finite-state automaton, which is often used in shallow parsing. Because the method depends only on corpora, it can be adapted to different languages and syntactic forms by replacing the corpus. Experimental results for Japanese and English suggest that the method can be expected to achieve high recall, comparable to that of other corpus-based full-sentence parsers that use linguistic knowledge or deep knowledge.
