From Databases to Data Science: impact on information systems
Junior Conference on Data Science and Engineering, Paris Saclay, 15 September 2016
Data has been called the new oil, reflecting the idea that big data can be turned into high-value information and new knowledge. Although data analysis has been around for a while, starting with statistics and evolving more recently into exploratory data analysis, data mining and business intelligence, the new dimensions of big data (volume, variety, velocity, etc.) make it very hard to process and analyze data online and to draw sound conclusions. In particular, relational DBMSs, which are at the heart of any information system, have lately been criticized for their “one size fits all” approach. Although they have been able to integrate support for all kinds of data (e.g., multimedia objects, XML and JSON documents) and new functions, this has come at the cost of performance and flexibility for new data-intensive applications. To address this grand challenge, data science is emerging as a new science that combines data management, statistics and machine learning, visualization and human-computer interaction to collect, clean, integrate, analyze and visualize big data. The ultimate goal is to create new data products and services, and to train legions of data scientists. In this talk, I will introduce data science in relation to databases and discuss its impact on information systems. I will also illustrate the main opportunities and risks, in particular by telling my favorite stories about the good, the bad and the ugly.