Zenith scientific seminar: Tristan Allard, “Privacy-Preserving Data Publishing using Secure Devices”, November 16, 2012.


Tristan Allard will present part of his Ph.D. thesis work on Privacy-Preserving Data Publishing on November 16, 2012, at 10:30 am. Location: Galéra, Room 127.

Title: METAP: Revisiting Privacy-Preserving Data Publishing using Secure Devices.

Abstract: The goal of Privacy-Preserving Data Publishing (PPDP) is to generate a sanitized (i.e., harmless) view of sensitive personal data (e.g., a health survey), to be released to some agencies or simply to the public. However, traditional PPDP practices all assume that the process runs on a trusted central server. In this talk, I will argue that this trust assumption is far too strong, and give an overview of METAP, a generic, fully distributed protocol designed to execute various forms of PPDP algorithms on an asymmetric architecture composed of low-power secure devices and a powerful but untrusted infrastructure. This work, currently under submission, is joint with Benjamin Nguyen and Philippe Pucheral.

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-tristan-allardprivacy-preserving-data-publishing-using-secure-devices-november-16-2012/

Zenith scientific seminar: Imene Mami, “A Declarative Approach to Modeling and Solving the View Selection Problem”, November 9, 2012.

Imene will defend her Ph.D. thesis on November 15. She will give a talk about the view selection problem on November 9 at 10:30 am, room G.127.

Title: A Declarative Approach to Modeling and Solving the View Selection Problem

Abstract: View selection is important in many data-intensive systems, e.g., commercial database and data warehousing systems, to improve query performance. View selection can be defined as the process of selecting a set of views to be materialized in order to optimize query evaluation. To support this process, several related issues have to be considered. Whenever a data source changes, the materialized views built on it have to be maintained in order to compute up-to-date query results. Besides view maintenance, each materialized view also requires additional storage space, which must be taken into account when deciding which and how many views to materialize.
The problem of choosing which views to materialize in order to speed up incoming queries, subject to constraints on additional storage overhead and/or maintenance costs, is known as the view selection problem. This is one of the most challenging problems in data warehousing and it is known to be NP-complete. In a distributed environment, the view selection problem becomes more challenging, because one must also decide on which computer nodes the selected views should be materialized. The view selection problem in a distributed context is then additionally constrained by the storage capacity of each computer node, the maximum global maintenance cost and the communication costs between the computer nodes of the network.
In this work, we deal with the view selection problem in a centralized context as well as in a distributed setting. Our goal is to provide a novel and efficient approach in these contexts. For this purpose, we designed a solution using constraint programming, which is known to be efficient for solving NP-complete problems and is a powerful method for modeling and solving combinatorial optimization problems. The originality of our approach is that it provides a clear separation between the formulation and the resolution of the problem. Indeed, the view selection problem is modeled as a constraint satisfaction problem in an easy and declarative way. Then, its resolution is performed automatically by the constraint solver. Furthermore, our approach is flexible and extensible, in that it can easily model and handle new constraints and new heuristic search strategies for optimization purposes.
The main contributions of this thesis are as follows. First, we define a framework that gives a better understanding of the problems we address in this thesis. We also analyze the state of the art in materialized view selection, reviewing the existing methods and identifying their respective potential and limits. We then design a solution using constraint programming to address the view selection problem in a centralized context. Our experimental results show that our approach provides the best balance between the computing time required to find the materialized views and the gain realized in query processing by materializing these views. Our approach also guarantees to pick the optimal set of materialized views when no time limit is imposed. Finally, we extend our approach to provide a solution to the view selection problem under multiple resource constraints in a distributed context. Based on our extensive performance evaluation, we show that our approach outperforms the genetic algorithm that has been designed for a distributed setting.
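
To make the problem statement concrete, here is a minimal sketch of view selection as a constrained optimization, written for this page and not taken from the thesis. It replaces the constraint solver with a plain exhaustive search, and it assumes that the benefit of each view is additive, which real cost models do not guarantee. The view names, the numbers and the function `select_views` are all hypothetical.

```python
from itertools import combinations

# Hypothetical candidate views:
# name -> (query-time benefit, storage cost, maintenance cost).
CANDIDATE_VIEWS = {
    "v_sales_by_day": (40, 30, 5),
    "v_sales_by_region": (35, 20, 10),
    "v_top_customers": (25, 25, 4),
    "v_stock_levels": (15, 10, 8),
}


def select_views(views, storage_budget, maintenance_budget):
    """Return (best_benefit, best_subset) by exhaustive search.

    A subset is feasible when its total storage and total maintenance
    cost both stay within the given budgets. Benefits are assumed to
    be additive (a simplification made for this sketch).
    """
    names = sorted(views)
    best_benefit, best_subset = 0, ()
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            storage = sum(views[v][1] for v in subset)
            maintenance = sum(views[v][2] for v in subset)
            if storage > storage_budget or maintenance > maintenance_budget:
                continue  # violates a resource constraint
            benefit = sum(views[v][0] for v in subset)
            if benefit > best_benefit:
                best_benefit, best_subset = benefit, subset
    return best_benefit, best_subset


if __name__ == "__main__":
    print(select_views(CANDIDATE_VIEWS, storage_budget=60, maintenance_budget=20))
```

The search visits every subset, so it is only usable for a handful of candidate views. The declarative style described in the abstract keeps the same two ingredients (constraints and an objective) but hands the search to a solver.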

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-imene-mamia-declarative-approach-to-modeling-and-solving-the-view-selection-problem-november-9-2012/

Zenith scientific seminar: Florent Masseglia, “Mining Uncertain Data Streams”, October 17, 11am.

Florent Masseglia will present recent work, done with Reza Akbarinia, on uncertain data stream mining on October 17 at 11am, room G.227.

Title: Mining Uncertain Data Streams.

Abstract: Dealing with uncertainty has gained increasing attention these past few years in both static and streaming data management and mining. There are many possible reasons for uncertainty, such as noise occurring when data are collected, noise injected for privacy reasons, semantics of the results of a search engine (often ambiguous), etc. Thus, many sensitive domains now involve massive uncertain data (including scientific applications). The problem is even more difficult for uncertain data streams, where massive frequent updates need to be taken into account while respecting data stream constraints. In this context, discovering Probabilistic Frequent Itemsets (PFI) is very challenging since algorithms designed for deterministic data are not applicable.

In this talk, I will present our recent work with Reza Akbarinia on this topic. We propose FMU (Fast Mining of Uncertain data streams), the first solution for exact PFI mining in data streams with sliding windows. FMU allows updating the frequentness probability of an itemset whenever a transaction is added to or removed from the observation window. Using these update operations, we are able to extract PFI in sliding windows with very low response times.
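
As an illustration of what updating a frequentness probability can look like, the sketch below maintains the full distribution of an itemset's support over a window. It is not the FMU algorithm, only the textbook dynamic program for a sum of independent Bernoulli variables. It assumes that each transaction contains the itemset independently with a known probability `p`; the class `SupportDistribution` and its methods are hypothetical names.

```python
class SupportDistribution:
    """Distribution of the support of one itemset over a sliding window.

    dist[k] is the probability that exactly k transactions of the
    window contain the itemset (transactions assumed independent).
    """

    def __init__(self):
        self.dist = [1.0]  # empty window: support is 0 with probability 1

    def add(self, p):
        """A transaction containing the itemset with probability p enters."""
        old = self.dist
        new = [0.0] * (len(old) + 1)
        for k, mass in enumerate(old):
            new[k] += mass * (1.0 - p)  # itemset absent from the transaction
            new[k + 1] += mass * p      # itemset present in the transaction
        self.dist = new

    def remove(self, p):
        """Undo add(p) for a transaction that leaves the window."""
        new = self.dist
        n = len(new) - 1
        if n == 0:
            raise ValueError("window is empty")
        old = [0.0] * n
        if p <= 0.5:
            # Solve new[k] = old[k] * (1 - p) + old[k - 1] * p, going upwards.
            prev = 0.0
            for k in range(n):
                old[k] = (new[k] - prev * p) / (1.0 - p)
                prev = old[k]
        else:
            # Same equation solved downwards, dividing by p instead of 1 - p.
            nxt = 0.0
            for k in range(n - 1, -1, -1):
                old[k] = (new[k + 1] - nxt * (1.0 - p)) / p
                nxt = old[k]
        self.dist = old

    def frequentness(self, min_support):
        """Probability that the support is at least min_support."""
        return sum(self.dist[min_support:])


if __name__ == "__main__":
    window = SupportDistribution()
    for p in (0.5, 0.5, 1.0):
        window.add(p)
    window.remove(0.5)  # the oldest transaction slides out of the window
    print(window.frequentness(2))
```

Both operations cost time linear in the window size for one itemset. The choice between the upward and downward recurrence in `remove` avoids dividing by a value close to zero.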

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-florent-masseglia-mining-uncertain-data-streams-october-17-11am/

Zenith scientific seminar: Khalid Saleem, “Open Data Analytics – Research Perspectives”, September 19, 11am.

Before leaving our team, Khalid will give a synthetic presentation of his work about open data analytics during his stay, on September 19, at 11am, G.127.

Title : Open Data Analytics – Research Perspectives

Abstract : According to a survey, the internet had grown to 98 petabytes by 2011, comprising web pages and raw data, with more than 2 billion web users. Although web page creation and access have been standardized over the years, the available data lacks such standards. Different terms are used to tag the data: open, big or linked data. The contributed data is categorized as web scale and has a very high degree of format variance, which makes it very difficult to formalize a standard access technique. Based on these atypical data characteristics, data scientists are envisaging a new era of data analytics, requiring better algorithms and applications to deliver timely benefits from this data.

The presentation first explains the scenarios that help in typifying the data available on the web (open, big, linked) in different domains (Government, Science, Enterprise, Society). Secondly, we outline the characteristics of open data and present a model framework that identifies the research domains related to open data analytics. The model can help data scientists and application developers in devising open data-driven real-time analytical tools. In addition, examples involving open financial equity data will be highlighted.

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-khalid-saleem-open-data-analytics-research-perspectives/

Zenith scientific seminar: Imene Mami, “View Selection Under Multiple Resource Constraints in a Distributed Context”, August 27, 11:30am.

In a joint talk with Miguel Liroz (at 11am), Imene will present her recent work on view selection in a distributed context under resource constraints, room G.127 at 11:30.

Title: View Selection Under Multiple Resource Constraints in a Distributed Context

Abstract: The use of materialized views in commercial database systems and data warehousing systems is a common technique to improve query performance. In past research, the view selection issue has essentially been investigated in the centralized context. In this paper, we address the view selection problem in a distributed scenario. We first extend the AND-OR view graph to capture the distributed features. Then, we propose a solution using constraint programming for modeling and solving the view selection problem under multiple resource constraints in a distributed context. Finally, we show experimentally that our approach performs better when the quality of the solutions is evaluated in terms of cost savings.

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-imene-mami-view-selection-under-multiple-resource-constraints-in-a-distributed-context-august-27-1130am/

Zenith scientific seminar: Miguel Liroz, “Dynamic Workload-Based Partitioning for Large-Scale Databases”, August 27, 11am.

Miguel Liroz will present recent work on large-scale database partitioning in the next scientific seminar of Zenith, room G.127 at 11am. This will be a joint talk with Imene Mami (at 11:30).

Title: Dynamic Workload-Based Partitioning for Large-Scale Databases

Abstract: Applications with very large databases, where data items are continuously appended, are becoming more and more common. Thus, the development of efficient workload-based data partitioning is one of the main requirements to offer good performance to most of those applications that have complex access patterns, e.g. scientific applications. However, the existing workload-based approaches, which are executed in a static way, cannot be applied to very large databases. In this paper, we propose DynPart, a dynamic partitioning algorithm for continuously growing databases. DynPart efficiently adapts the data partitioning to the arrival of new data elements by taking into account the affinity of new data with queries and fragments. In contrast to existing static approaches, our approach offers a constant execution time, no matter the size of the database, while obtaining very good partitioning efficiency. We validated our solution through experimentation over real-world data; the results show its effectiveness.
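
The idea of placing each new data item according to its affinity with queries and fragments can be sketched as follows. This is a toy illustration and not the DynPart algorithm: the affinity measure (shared query hits divided by fragment size), the capacity rule, and the names `assign_item`, `query_hits` and `capacity` are all assumptions made for the example.

```python
def assign_item(item_queries, fragments, capacity):
    """Place a new item in the fragment it has the most affinity with.

    item_queries: set of ids of the queries expected to access the item
                  (assumed to be known when the item arrives).
    fragments:    list of dicts {"size": int, "query_hits": {query_id: count}},
                  where query_hits counts the items of the fragment that each
                  query accesses.
    capacity:     maximum number of items per fragment.
    Returns the index of the chosen fragment and updates it in place.
    """
    best_index, best_score = None, -1.0
    for index, fragment in enumerate(fragments):
        if fragment["size"] >= capacity:
            continue  # fragment is full
        hits = sum(fragment["query_hits"].get(q, 0) for q in item_queries)
        score = hits / fragment["size"] if fragment["size"] else 0.0
        if score > best_score:
            best_index, best_score = index, score
    if best_index is None:
        # Every fragment is full (or none exists yet): open a new one.
        fragments.append({"size": 0, "query_hits": {}})
        best_index = len(fragments) - 1
    chosen = fragments[best_index]
    chosen["size"] += 1
    for q in item_queries:
        chosen["query_hits"][q] = chosen["query_hits"].get(q, 0) + 1
    return best_index


if __name__ == "__main__":
    fragments = [
        {"size": 2, "query_hits": {"q1": 2}},
        {"size": 2, "query_hits": {"q2": 2, "q3": 1}},
    ]
    print(assign_item({"q2", "q3"}, fragments, capacity=3))
```

In this sketch the work done for one new item depends on the number of fragments and on the number of queries touching the item, not on how many items are already stored, because each fragment is summarized by its counters.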

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-miguel-liroz-dynamic-workload-based-partitioning-for-large-scale-databases-august-27-11am/

Zenith scientific seminar: Alexis Joly, “Searching and Mining Big multimedia data: recent works, perspectives and application to botanical data management”, June 18, 2012

Alexis Joly will present recent work on searching and mining big multimedia data in this Zenith scientific seminar, in room G.127, at 10:30.

Abstract: NoSQL technologies started bridging the gap between information retrieval and data management technologies. In this context, content-based indexing and mining methods offer new perspectives for managing large collections of unstructured and heterogeneous documents. In this talk, I will first present some of my recent work on high-dimensional data hashing and distributed KNN-search, which was typically developed for content-based image retrieval applications. I will then introduce some ongoing work and more generic perspectives based on these technologies, notably in the context of botanical data management.

 

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-alexis-joly-searching-and-mining-big-multimedia-data-recent-works-perspectives-and-application-to-botanical-data-management-june-18-2012/

Morgan & Claypool 2012 book now available

P2P Techniques for Decentralized Applications

Authors: Esther Pacitti, Reza Akbarinia, Manal El-Dick

See: http://www.morganclaypool.com/doi/abs/10.2200/S00414ED1V01Y201204DTM025

Permanent link to this article: https://team.inria.fr/zenith/morganclaypool-2012-book-now-available/

Zenith scientific seminar: Patrick Valduriez, “Principles of Distributed Data Management in 2020?”, April 4th 2012

Patrick Valduriez will present his talk about “Principles of Distributed Data Management in 2020?” on April 4th in our new building (Galéra) room 127.

Abstract: With the advent of high-speed networks, fast commodity hardware, and the web, distributed data sources have become ubiquitous. The third edition of the Özsu-Valduriez textbook Principles of Distributed Database Systems [1] reflects the evolution of distributed data management and distributed database systems. In this new edition, the fundamental principles of distributed data management could still be presented based on the three dimensions of earlier editions: distribution, heterogeneity and autonomy of the data sources. In retrospect, the focus on fundamental principles and generic techniques has been useful not only to understand and teach the material, but also to enable an infinite number of variations. The primary application of these generic techniques has obviously been distributed and parallel DBMS versions.

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-patrick-valduriez-on-principles-of-distributed-data-management-in-2020/

Zenith scientific seminar: Fady Draidi, “P2P Recommendation for Large-scale Online Communities”, March 2nd, 2012

Fady will defend his thesis on March 9th. He will present his work in the first scientific seminar of Zenith on March 2nd (LIRMM, room E.223).

Abstract:
Recommendation systems (RS) and P2P are complementary in easing large-scale data sharing: RS to filter and personalize users’ demands, and P2P to build decentralized large-scale data sharing systems. However, many challenges need to be overcome when building scalable, reliable and efficient RS atop P2P.
In this work, we focus on large-scale communities, where users rate the contents they explore, and store in their local workspace high-quality content related to their topics of interest. Our goal then is to provide a novel and efficient P2P-RS for this context.

Permanent link to this article: https://team.inria.fr/zenith/first-news/