Tanja Langguth and Dirk Krechel
Data Mining, Knowledge Acquisition, Information Retrieval, Ontology
Meta data has to be extracted automatically from a wiki article whenever a new article is saved. The meta data types to extract are known in advance, e.g. as the result of an automatic classification of the text. Because the extraction has to be trained beforehand, the system is designed as a classification system in which the possible meta data types serve as classes. These classes are assigned to natural language expressions extracted from the article text. As a test data set, we use a set of Wikipedia articles together with their corresponding DBpedia data, which serve as sample meta data. Named Entity Recognition is used to retrieve candidate expressions. Semantic, syntactic and lexical features are then extracted for each candidate. For the classification, a decision tree learner, a k-nearest neighbour classifier and a naive Bayes classifier are compared.
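The classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature triples, the meta data type names (`birthDate`, `birthPlace`, and so on) and all function names are hypothetical, the toy training data is invented for illustration only, and the decision tree learner is left out for brevity. Each candidate expression is represented by three categorical features (an assumed NER tag, a part-of-speech tag and a preceding cue word) and assigned a meta data type by a k-nearest neighbour classifier and by a naive Bayes classifier.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set: (NER tag, POS tag, cue word) -> meta data type.
TRAIN = [
    (("DATE", "NUM", "born"), "birthDate"),
    (("DATE", "NUM", "died"), "deathDate"),
    (("LOC", "NNP", "in"), "birthPlace"),
    (("LOC", "NNP", "born"), "birthPlace"),
    (("DATE", "NUM", "in"), "birthDate"),
    (("PER", "NNP", "by"), "none"),
]


def knn_predict(train, x, k=3):
    """k-nearest neighbour vote using Hamming distance on categorical features."""
    ranked = sorted(train, key=lambda item: sum(a != b for a, b in zip(item[0], x)))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def nb_train(train):
    """Collect the counts needed by a categorical naive Bayes classifier."""
    class_counts = Counter(label for _, label in train)
    feat_counts = defaultdict(Counter)  # (label, position) -> value counts
    values = defaultdict(set)           # position -> observed feature values
    for feats, label in train:
        for i, v in enumerate(feats):
            feat_counts[(label, i)][v] += 1
            values[i].add(v)
    return class_counts, feat_counts, values


def nb_predict(model, x):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_counts, feat_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, v in enumerate(x):
            count = feat_counts[(label, i)][v]
            score += math.log((count + 1) / (n + len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    candidate = ("LOC", "NNP", "in")  # hypothetical features of one candidate
    print("k-NN:       ", knn_predict(TRAIN, candidate))
    print("naive Bayes:", nb_predict(nb_train(TRAIN), candidate))
```

On this toy data both classifiers assign the candidate to `birthPlace`. A real comparison would evaluate each classifier on held-out candidates derived from the Wikipedia and DBpedia data rather than on a single example.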