Meta Data Extraction from Unstructured Wiki Texts

Tanja Langguth and Dirk Krechel

Keywords

Data Mining, Knowledge Acquisition, Information Retrieval, Ontology

Abstract

Meta data should be extracted automatically from a wiki article at the moment the article is saved. The meta data types to extract are known in advance, for example as the result of an automatic classification of the text. Because a training phase has to precede the automatic extraction, the system is designed as a classification system that uses the possible meta data types as classes. These classes are assigned to natural language expressions extracted from the article text. As a test data set, we use a number of Wikipedia articles together with their corresponding DBpedia data, which serve as sample meta data. Named entity recognition is used to retrieve candidate expressions. Semantic, syntactic and lexical features are then extracted for these candidates. For the classification step, a decision tree learner, a k-nearest neighbour classifier and a naive Bayes classifier are compared.
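
The pipeline described above (candidate retrieval, feature extraction, classification into meta data types) can be illustrated with a minimal sketch. It covers only one of the three compared classifiers, a k-nearest neighbour classifier, and everything concrete in it is an assumption made for illustration: the toy training rows, the class names such as `birthPlace`, the four features, and the Hamming distance are not taken from the paper. The candidates are given with a ready-made entity type, standing in for the output of a named entity recognizer.

```python
from collections import Counter

# Hypothetical toy training data, invented for illustration only:
# (candidate expression, preceding word, entity type, meta data class).
# In the setting of the abstract, the class labels would come from DBpedia.
TRAIN = [
    ("Berlin", "in", "LOCATION", "birthPlace"),
    ("Munich", "in", "LOCATION", "birthPlace"),
    ("1879", "in", "DATE", "birthDate"),
    ("1955", "died", "DATE", "deathDate"),
    ("Princeton", "at", "LOCATION", "deathPlace"),
    ("Albert Einstein", "", "PERSON", "name"),
]


def extract_features(expression, previous_word, entity_type):
    """Build a small categorical feature tuple (assumed feature set)."""
    return (
        entity_type,              # semantic: entity type from the recognizer
        previous_word.lower(),    # syntactic: word directly before the candidate
        expression.isdigit(),     # lexical: candidate consists of digits only
        len(expression.split()),  # lexical: number of tokens
    )


def hamming(a, b):
    """Count the feature positions in which two tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_classify(features, training, k=3):
    """Return the majority class among the k nearest training examples."""
    neighbours = sorted(training, key=lambda item: hamming(features, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# "Training" for k-NN just means storing the labelled feature tuples.
TRAINING_SET = [(extract_features(e, p, t), label) for e, p, t, label in TRAIN]


def classify(expression, previous_word, entity_type, k=3):
    """Assign a meta data class to one candidate expression."""
    features = extract_features(expression, previous_word, entity_type)
    return knn_classify(features, TRAINING_SET, k)


if __name__ == "__main__":
    print(classify("Ulm", "in", "LOCATION"))
    print(classify("1921", "in", "DATE", k=1))
```

A decision tree learner or a naive Bayes classifier could be trained on the same feature tuples, which is what makes a comparison of the three classifiers on one candidate set possible.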
