INCREMENTAL OBJECT MATCHING APPROACH OF SCHEMA-FREE DATA WITH MAPREDUCE

doi:10.2316/Journal.202.2014.2.202-3912

Kun Ma, Fusen Dong, and Bo Yang

References

[1] F. Naumann and M. Herschel, An introduction to duplicate detection (California, USA: Morgan and Claypool Publishers, 2010).
[2] K. Ma, Z. Chen, A. Abraham, and B. Yang, A transparent data middleware in support of multi-tenancy, Proc. 7th Int. Conf. on Next Generation Web Services Practices, Salamanca, Spain, 2011, 11–19.
[3] K. Ma, B. Yang, and A. Abraham, A template-based model transformation approach for deriving multi-tenant SaaS applications, Acta Polytechnica Hungarica, 9(2), 2012, 25–41.
[4] L. Getoor, Entity resolution: theory, practice and open challenges, Proc. VLDB Endowment, 5(12), 2012, 2018–2019.
[5] M. Hernández and S. Stolfo, The merge/purge problem for large databases, ACM SIGMOD Record, 24(2), 1995, 127–138.
[6] R. Cattell, MapReduce: simplified data processing on large clusters, Communications of the ACM, 51(1), 2008, 107–113.
[7] J. Dean and S. Ghemawat, MapReduce: a flexible data processing tool, Communications of the ACM, 53(1), 2010, 72–77.
[8] A. Elmagarmid, P. Ipeirotis, and V. Verykios, Duplicate record detection: a survey, IEEE Transactions on Knowledge and Data Engineering, 19(1), 2007, 1–16.
[9] H. Köpcke and E. Rahm, Frameworks for entity matching: a comparison, Data and Knowledge Engineering, 69(2), 2010, 197–210.
[10] P. Christen, T. Churches, and M. Hegland, Febrl – a parallel open source data linkage system, Lecture Notes in Computer Science, 3056, 2004, 638–647.
[11] H. Sik Kim and D. Lee, Parallel linkage, Proc. 16th ACM Conf. on Information and Knowledge Management, Lisbon, Portugal, 2007, 283–292.
[12] T. Kirsten, L. Kolb, M. Hartung, A. Gross, H. Köpcke, and E. Rahm, Data partitioning for parallel entity matching, 8th Int. Workshop on Quality in Databases, Singapore, 2010.
[13] L. Kolb, A. Thor, and E. Rahm, Multi-pass sorted neighborhood blocking with MapReduce, Computer Science – Research and Development, 27(1), 2012, 45–63.
[14] J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, Proc. 2004 Symp. on Operating System Design and Implementation, Seattle, WA, USA, 2004, 1–13.
[15] M. Bhandarkar, MapReduce programming with Apache Hadoop, Proc. 2010 IEEE Int. Symp. on Parallel and Distributed Processing, Atlanta, GA, USA, 2010, 1.

From Journal (202) International Journal of Computers and Applications - 2014
