H.H. Shahri and A.A. Barforush (Iran)
Data cleaning, fuzzy duplicate elimination, adaptation
Fuzzy duplicate elimination is an important part of the data cleaning process, especially in data warehousing and integration, where data is gathered from distributed and inconsistent sources. Learnable string similarity measures are an active area of research on the duplicate elimination problem. Our proposed framework, AFFDEF (Adaptive and Flexible Fuzzy Duplicate Elimination Framework), builds upon our earlier work on duplicate elimination and exploits neuro-fuzzy modeling for the first time to produce an adaptive framework for duplicate elimination. The framework automatically learns and adapts to the specific notion of similarity at a meta-level, and it encompasses many of the previous works on trainable and domain-specific similarity measures. As reported earlier, employing fuzzy inference removes the repetitive task of hard-coding a program based on a schema, which previous approaches always required. In addition, our extensible framework grants flexibility to the user. Hence, it can be used to produce an intelligent and effective tool for increasing the quality and accuracy of data.
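To make the idea of fuzzy inference over field similarities more concrete, the following is a minimal sketch and not the AFFDEF implementation. It assumes records with hypothetical `name` and `address` fields, uses Python's `difflib.SequenceMatcher` as a stand-in for any string similarity measure, and applies a single hand-written rule with fixed membership thresholds (0.5 and 0.9 are illustrative assumptions). In an adaptive framework such as the one described above, the membership parameters would be learned from data rather than fixed by hand.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1].

    SequenceMatcher is only a stand-in; any similarity measure could be used.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def high(x: float, lo: float = 0.5, hi: float = 0.9) -> float:
    """Membership degree of x in the fuzzy set 'high'.

    A linear ramp from lo to hi; both thresholds are illustrative assumptions.
    """
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def duplicate_degree(rec_a: dict, rec_b: dict) -> float:
    """Degree in [0, 1] to which two records are considered duplicates.

    Single rule: IF name similarity is high AND address similarity is high
    THEN the records are duplicates. Fuzzy AND is taken as the minimum.
    """
    name_high = high(similarity(rec_a["name"], rec_b["name"]))
    addr_high = high(similarity(rec_a["address"], rec_b["address"]))
    return min(name_high, addr_high)


if __name__ == "__main__":
    a = {"name": "John Smith", "address": "12 Main St"}
    b = {"name": "Jon Smith", "address": "12 Main Street"}
    print(duplicate_degree(a, b))
```

Because the rule refers to fields by name and to linguistic terms such as "high", changing the schema or the notion of similarity means editing rules or membership parameters instead of rewriting the matching program.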