Title: A Declarative Approach to Modeling and Solving the View Selection Problem
Abstract: View selection is important in many data-intensive systems e.g., commercial database and data warehousing systems to improve query performance. View selection can be defined as the process of selecting a set of views to be materialized in order to optimize query evaluation. To support this process, different related issues have to be considered. Whenever a data source is changed, the materialized views built on it have to be maintained in order to compute up-to-date query results. Besides the view maintenance issue, each materialized view also requires additional storage space which must be taken into account when deciding which and how many views to materialize.
The problem of choosing which views to materialize that speed up incoming queries constrained by an additional storage overhead and/or maintenance costs, is known as the view selection problem. This is one of the most challenging problems in data warehousing and it is known to be a NP-complete problem. In a distributed environment, the view selection problem becomes more challenging. Indeed, it includes another issue which is to decide on which computer nodes the selected views should be materialized. The view selection problem in a distributed context is now additionally constrained by storage space capacities per computer node, maximum global maintenance costs and the communications cost between the computer nodes of the network.
In this work, we deal with the view selection problem in a centralized context as well as in a distributed setting. Our goal is to provide a novel and efficient approach in these contexts. For this purpose, we designed a solution using constraint programming which is known to be efficient for the resolution of NP-complete problems and a powerful method for modeling and solving combinatorial optimization problems. The originality of our approach is that it provides a clear separation between formulation and resolution of the problem. Indeed, the view selection problem is modeled as a constraint satisfaction problem in an easy and declarative way. Then, its resolution is performed automatically by the constraint solver. Furthermore, our approach is flexible and extensible, in that it can easily model and handle new constraints and new heuristic search strategies for optimization purpose.
The main contributions of this thesis are as follows. First, we define a framework that enables to have a better understanding of the problems we address in this thesis. We also analyze the state of the art in materialized view selection to review the existing methods by identifying respective potentials and limits. We then design a solution using constraint programming to address the view selection problem in a centralized context. Our performance experimentation results show that our approach has the ability to provide the best balance between the computing time to be required for finding the materialized views and the gain to be realized in query processing by materializing these views. Our approach will also guarantee to pick the optimal set of materialized views where no time limit is imposed. Finally, we extend our approach to provide a solution to the view selection problem when the latter is studied under multiple resource constraints in a distributed context. Based on our extensive performance evaluation, we show that our approach outperforms the genetic algorithm that has been designed for a distributed setting.