W. Liang (Australia) and J.X. Yu (PRC)
Materialized view incremental maintenance, data warehousing, partitioning, parallel algorithms, PC cluster
A data warehouse is a repository of integrated information that collects and maintains a large amount of data from multiple distributed, autonomous and possibly heterogeneous data sources. The data is often stored as materialized views to provide fast access to the integrated data. Keeping the warehouse data completely consistent with the remote source data is a challenging problem, and transactions containing multiple updates at one or more sources complicate it further. Because a data warehouse usually contains a very large amount of data and processing it is time-consuming, introducing parallelism into data warehousing becomes inevitable. The popularity and cost-effective parallelism of PC clusters make them a promising platform for this purpose. In this paper, the complete consistency maintenance of select-project-join (SPJ) materialized views is considered. Based on a PC cluster consisting of p personal computers, several parallel maintenance algorithms for the materialized views are presented. The key issue behind the proposed algorithms is how to trade off the workload among the PCs and how to balance the communication cost among the PCs as well as between the PC cluster and the remote sources.
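As a rough illustration of incremental maintenance of an SPJ view and of spreading that work over p machines, the Python sketch below uses the textbook delta rule ΔV = π(σ(ΔR ⋈ S)) with signed tuple counts (+1 for an insertion, -1 for a deletion). It hash-partitions both ΔR and S on the join key so that each of p workers can compute its share of ΔV independently. This is a generic hash-partitioned technique, not the algorithm proposed in the paper. Assumptions: the view joins two relations on a single equality attribute, only R changes, the p PCs are simulated by threads with no real network, and all function names (`spj_delta`, `parallel_spj_delta`, `apply_delta`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Row = Tuple  # a tuple of attribute values

def spj_delta(delta_r: Counter, s: List[Row], r_key: int, s_key: int,
              pred: Callable[[Row, Row], bool],
              proj: Callable[[Row, Row], Row]) -> Counter:
    """Signed change to V = proj(sigma_pred(R equi-join S)) caused by delta_r.

    Assumes S is unchanged; counts in delta_r are +n (insert) or -n (delete).
    """
    index: Dict = {}
    for srow in s:  # build a hash index on S's join attribute
        index.setdefault(srow[s_key], []).append(srow)
    out: Counter = Counter()
    for rrow, mult in delta_r.items():
        for srow in index.get(rrow[r_key], []):
            if pred(rrow, srow):  # selection applied to the joined pair
                out[proj(rrow, srow)] += mult
    return out

def parallel_spj_delta(delta_r: Counter, s: List[Row], r_key: int, s_key: int,
                       pred, proj, p: int) -> Counter:
    """Hash-partition delta_r and S on the join key across p simulated workers."""
    r_parts = [Counter() for _ in range(p)]
    s_parts: List[List[Row]] = [[] for _ in range(p)]
    for rrow, mult in delta_r.items():
        r_parts[hash(rrow[r_key]) % p][rrow] += mult
    for srow in s:
        s_parts[hash(srow[s_key]) % p].append(srow)
    # Matching keys land in the same partition, so no cross-worker join is needed.
    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = list(pool.map(
            lambda i: spj_delta(r_parts[i], s_parts[i], r_key, s_key, pred, proj),
            range(p)))
    total: Counter = Counter()
    for part in parts:
        total.update(part)  # update() keeps negative counts
    return total

def apply_delta(view: Counter, delta: Counter) -> Counter:
    """Return a new view with the signed delta applied; drop zero-count tuples."""
    new_view = Counter(view)
    for row, mult in delta.items():
        new_view[row] += mult
        if new_view[row] < 0:
            raise ValueError(f"deleting non-existent view tuple {row!r}")
        if new_view[row] == 0:
            del new_view[row]
    return new_view
```

Because every ΔR tuple and every S tuple with the same join key is sent to the same worker, each worker's local join is complete. The cost is shipping S's partitions to the workers, which gives a simple picture of the workload-versus-communication tradeoff the abstract refers to.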