INCREMENTAL OBJECT MATCHING APPROACH OF SCHEMA-FREE DATA WITH MAPREDUCE

Kun Ma, Fusen Dong, and Bo Yang
