I. Moghrabi and F. Yamout (Lebanon)
Relevance Feedback, Query Expansion, Classification
The reformulation algorithm proposed by James French [1], builds a tree-structured classifier, called a query tree. The query tree can easily be transformed into a Boolean query. French's algorithm selects a term that will give the maximum decrease in impurity level from a node to its child, and chooses it as a criterion for splitting the node. If we have more than one term that gives the maximum decrease in impurity level, the algorithm chooses arbitrarily the first one it encounters. Choosing arbitrarily the first term yields an unbalanced tree with more terms. Therefore, the height of the tree turns out to be larger than it should and hence its traversal of the tree requires more I/O accesses to the inverted file. We propose a heuristic rule that establishes a balanced query tree with fewer terms. This is done by ensuring the choice of the term for which the number of non-classified nodes at a given level is maximal. As a result, fewer terms yield faster retrieval evaluated in terms of shorter queries.
Important Links:
Go Back