Workshop Data Science @ IBC 2016, 15/6/2016 – 14h-17h
Campus Saint Priest, Building 5, 1/124, Institut de Biologie Computationnelle (http://www.ibc-montpellier.fr)
Organized by: Esther.Pacitti@lirmm.fr
13h30 Welcome coffee
14h Introduction, Esther Pacitti
Zenith team, Univ. Montpellier, Inria, LIRMM
Data Science: opportunities and risks
Patrick Valduriez
Zenith team, Inria, Univ. Montpellier, LIRMM
Data has been called the new oil, to reflect that big data can be turned into high-value information and new knowledge. Although data analysis has been around for a while, starting with statistics and evolving lately into exploratory data analysis, data mining and business intelligence, the new dimensions of big data (volume, variety, velocity, etc.) make it very hard to process and analyze data and to derive sound conclusions. To address this grand challenge, data science is emerging as a new science that combines computer science, statistics and machine learning, visualization and human-computer interaction to collect, clean, integrate, analyze and visualize big data. The ultimate goal is to create new data products and services, as well as to train legions of data scientists. In this talk, I will introduce data science, including big data and cloud technologies. I will also illustrate the main opportunities and risks, in particular by telling my favorite stories about the good, the bad and the ugly.
Fast data analytics for time series and other ordered data
Dennis Shasha
New York University and Inria (int. chair in Zenith)
The relational model is based on a single data type, the unordered table, and a few operations: selection, projection, join, and aggregation. Restricting the model to unordered tables is in fact unnecessary for simplicity and needlessly limits expressive power, making it difficult to express queries over ordered data such as time series and other sequence data.
This talk presents a language for expressing ordered queries, together with optimization techniques and performance results. The talk goes on to present experiments comparing the system against other popular data analytics systems, including Sybase IQ, Python’s Pandas library and MonetDB, on a variety of benchmarks, including the ones that those systems use themselves. On the same hardware, our system is faster.
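To make the notion of an order-dependent query concrete, here is a minimal sketch in plain Python. It is not the language or system presented in the talk. The tick data and the function names moving_average and deltas are hypothetical, chosen only for illustration. Both computations depend on the position of each row in the sequence, which is why they are awkward to express over unordered tables.

```python
from collections import deque


def moving_average(values, window):
    """Trailing moving average over an ordered sequence.

    The result depends on the order of `values`, which is what makes
    this an ordered query rather than a plain aggregate.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque()   # the most recent `window` values
    total = 0.0     # running sum of the values in `buf`
    out = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()
        out.append(total / len(buf))
    return out


def deltas(values):
    """Difference between each value and its predecessor in the sequence."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical (timestamp, price) rows, stored in no particular order,
# as they would be in an unordered table.
ticks = [(3, 12.0), (1, 10.0), (2, 11.0), (4, 16.0)]

# The order has to be imposed explicitly before any sequence operation.
prices = [price for _, price in sorted(ticks)]

print(moving_average(prices, 2))
print(deltas(prices))
```

In this sketch the sort is a separate, explicit step. A language with ordered tables as a first-class type could instead treat the ordering as part of the data and optimize around it.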
Discussion