IBC and Zenith Seminar: Daniel de Oliveira “Parameter and Data Recommendation in Scientific Workflows based on Provenance”, 5 June 2018

IBC seminar (WP5): 5/6/2018, room 1.124, 14h

Organized by Zenith

Parameter and Data Recommendation in Scientific Workflows based on Provenance
Daniel de Oliveira

Fluminense Federal University
Rio de Janeiro, Brazil

Abstract: A growing number of data- and compute-intensive experiments have been modeled as scientific workflows in recent years. Such experiments are commonly executed several times with varying parameters and input data files, since the comparative method plays an important role in scientific research. As the complexity of the experiments and the volume of input and intermediate data increase, scientists have to spend much time defining parameter values and data files to be processed in such experiments. This talk discusses the problem of identifying suitable parameter values and data files for an experiment and then recommending them to the scientist. We present a novel method to make such recommendations. This method is based on data captured from previous executions of the workflow and on machine learning algorithms. Our experiments show that the recommended data files and parameters help scientists execute workflows successfully.

Permanent link to this article: https://team.inria.fr/zenith/ibc-and-zenith-seminar-daniel-de-oliveira-parameter-and-data-recommendation-in-scientific-workflows-based-on-provenance-5-june-2018/

Zenith seminar: Patrick Valduriez “Blockchain 2.0: opportunities and risks” 19 oct 2018

Zenith seminar: Friday 19 October 2018, 11h
BAT5-01.124

Blockchain 2.0: opportunities and risks
Patrick Valduriez
Zenith, Inria & LIRMM

Popularized by bitcoin and other digital currencies, the blockchain has the potential to revolutionize our economic and social systems. Blockchain was invented for bitcoin to solve the double spending problem of previous digital currencies without the need for a trusted, central authority. The original blockchain is a public, distributed ledger that can record and share transactions among a number of computers in a secure and permanent way. It is a complex distributed database infrastructure, combining several technologies such as P2P, data replication, consensus protocols and cryptography.
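To make the "secure and permanent" ledger idea concrete, here is a minimal Python sketch of hash chaining only. It is an illustration under simplifying assumptions, not how bitcoin or any real blockchain is implemented: it leaves out P2P networking, replication and consensus entirely, and the names `block_hash`, `append_block` and `is_valid` are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(block: Dict) -> str:
    """Hash the block's content together with the previous block's hash."""
    payload = json.dumps(
        {k: block[k] for k in ("index", "transactions", "prev_hash")},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict], transactions: List[str]) -> None:
    """Append a block that points to the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    block = {
        "index": len(chain),
        "transactions": transactions,
        "prev_hash": prev_hash,
    }
    block["hash"] = block_hash(block)
    chain.append(block)


def is_valid(chain: List[Dict]) -> bool:
    """Check every stored hash and every link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV_HASH
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block):
            return False
    return True
```

Because each block stores the hash of its predecessor, changing an old transaction invalidates the stored hash of that block, which is what makes tampering detectable in this toy model.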

The term Blockchain 2.0 refers to new applications of the blockchain that go beyond transactions and enable the exchange of assets without powerful intermediaries. Examples of applications are smart contracts, persistent digital IDs, intellectual property rights, blogging, voting, reputation, etc. Blockchain 2.0 could dramatically cut down transaction costs by automating operations and removing intermediaries. It could allow people to monetize their own information and creators of intellectual property to be properly compensated. The potential impact on society is also huge, as excluded people could join the global economy, e.g. by having digital bank accounts for free.

In this talk, I will introduce Blockchain 2.0 technologies and applications, and discuss the opportunities and risks. In developing countries, for instance, the lack of existing infrastructure and regulation may be a chance to embrace the blockchain revolution and leapfrog traditional solutions. But there are also risks, related to regulation, security, privacy, or integration with existing practice, which must be well understood and addressed.

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-patrick-valduriez-blockchain-2-0-opportunities-and-risks-19-oct-2018/

Zenith seminar: Mathieu Fontaine “Alpha-stable process for signal processing” 20 sept 2018

Zenith seminar: Thursday 20 September 2018, 11h
BAT5-01.124, Campus Saint Priest

Alpha-stable process for signal processing
Mathieu Fontaine
Zenith, Inria & LIRMM

The scientific topic of sound source separation (SSS) aims at decomposing audio signals into their constituent components, e.g. separating the lead singer's voice from the background music or from the background noise. In the case of very old and degraded historical recordings, SSS strongly extends classical denoising methods by accounting for complex signal or noise patterns and achieving efficient separation where traditional approaches fail.
Alpha-stable processes raise important mathematical challenges while offering efficient filtering applications and computational efficiency. This presentation aims to study these models from a theoretical point of view, with the purpose of extending them in several directions: audio source localization, theoretical research in multichannel scenarios, and restoration of old historical recordings.

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-mathieu-fontaine-alpha-stable-process-for-signal-processing-20-sept-2018/

IBC seminar: Dennis Shasha “Reducing Errors by Refusing to Guess (Occasionally)” 1 june 2018.

IBC seminar, organized by Zenith
Friday 1 June 2018, 14h
Seminar room, Building 4, LIRMM

SafePredict: Reducing Errors by Refusing to Guess (Occasionally)
Dennis Shasha
Courant Institute, New York University

We propose a meta-algorithm to reduce the error rate of state-of-the-art machine learning algorithms by refusing to make predictions in certain cases, even when the underlying algorithms suggest predictions. Intuitively, our SafePredict approach estimates the likelihood that a prediction will be in error and, when that likelihood is high, refuses to go along with that prediction. Unlike other approaches, we can probabilistically guarantee an error rate on the predictions we do make (denoted the "decisive predictions"). Empirically, on seven diverse data sets from genomics, ecology, image recognition, and gaming, our method can probabilistically guarantee to reduce the error rate to 1/4 of what it is in the state-of-the-art machine learning algorithm, at a cost of between 11% and 58% refusals. Competing state-of-the-art methods refuse at roughly twice the rate of ours (sometimes refusing all suggested predictions).
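The trade-off between refusals and errors on decisive predictions can be illustrated with a short Python sketch. This is not the SafePredict meta-algorithm and carries no probabilistic guarantee: it assumes a plain fixed confidence threshold on class probabilities supplied by some underlying classifier, and the function names `predict_or_refuse` and `evaluate` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def predict_or_refuse(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the most probable class index, or None to refuse.

    Assumption: `probs` are class probabilities from an underlying model.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def evaluate(
    all_probs: List[Sequence[float]],
    labels: List[int],
    threshold: float,
) -> Tuple[float, float]:
    """Return (error rate on decisive predictions, refusal rate)."""
    decisions = [predict_or_refuse(p, threshold) for p in all_probs]
    # Keep only the cases where a prediction was actually made.
    decisive = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    refusal_rate = 1 - len(decisive) / len(labels)
    errors = sum(1 for d, y in decisive if d != y)
    error_rate = errors / len(decisive) if decisive else 0.0
    return error_rate, refusal_rate


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    toy_probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.6, 0.4]]
    toy_labels = [0, 1, 1, 1]
    for t in (0.5, 0.7):
        print(t, evaluate(toy_probs, toy_labels, t))
```

On the toy data, raising the threshold from 0.5 to 0.7 refuses the two low-confidence cases, which happen to be the two misclassified ones; real data will not separate so cleanly.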

Short bio

Dennis Shasha is a Julius Silver Professor of computer science at the Courant Institute of New York University and an Associate Director of NYU Wireless. He works on meta-algorithms for machine learning to achieve guaranteed correctness rates; with biologists on pattern discovery for network inference; with computational chemists on algorithms for protein design; with physicists and financial people on algorithms for time series; on clocked computation for DNA computing; and on computational reproducibility. Other areas of interest include database tuning as well as tree and graph matching. Because he likes to type, he has written six books of puzzles about a mathematical detective named Dr. Ecco, a biography about great computer scientists, and a book about the future of computing. He has also written five technical books about database tuning, biological pattern recognition, time series, DNA computing, resampling statistics, and causal inference in molecular networks. He has co-authored over eighty journal papers, seventy conference papers, and twenty-five patents. He has written the puzzle column for various publications including Scientific American, Dr. Dobb’s Journal, and the Communications of the ACM. He is a fellow of the ACM and an INRIA International Chair.

Permanent link to this article: https://team.inria.fr/zenith/ibc-seminar-dennis-shasha-reducing-errors-by-refusing-to-guess-occasionally-1-june-2018/

PhD/postdoc positions in Machine Learning and Big Data

Zenith is offering a PhD position and a postdoc position on machine learning and big data, with Antoine Liutkus and Patrick Valduriez as advisors.

The successful candidates would work with us at Inria offices in Montpellier on: learning parameters models in big data, with applications to audio data analysis and processing.

Main research themes:
. Parallelization, distributed computing
. Probabilistic models, inference, sketching
. Deep learning
. Audio processing

The programme is very selective and a good publication track record is required. Foreign candidates are strongly encouraged to apply, as the funding promotes mobility.

Details:
. PhD position
. Postdoc position

Permanent link to this article: https://team.inria.fr/zenith/phdpost-positions-in-machine-learning-and-big-data/

Journée Droit de l’Internet (Internet Law Day): the blockchain, Montpellier, Friday 2 March 2018

Zenith is taking part in the Journée Droit de l’Internet: la blockchain (Internet Law Day: the blockchain), Faculté de Droit et de Science Politique, Montpellier, Friday 2 March 2018.

Programme

Permanent link to this article: https://team.inria.fr/zenith/journee-droit-de-linternet-la-blockchain-montpellier-vendredi-2-mars-2018/

IBC seminar: Themis Palpanas “Data Series Management: Fulfilling the Need for Big Sequence Analytics” 19 March 2018

IBC seminar, organized by Zenith
Monday 19 March 2018, 11h
Room 1/124, Building 5

Data Series Management: Fulfilling the Need for Big Sequence Analytics
Themis Palpanas
IUF and Université Paris Descartes

Several applications in diverse domains have an increasingly pressing need for techniques able to index and mine very large collections of sequences, or data series. Examples of such applications come from a multitude of social and scientific domains, including biology, where high-throughput sequencing is generating massive sequence collections. It is not unusual for these applications to involve numbers of data series on the order of hundreds of millions to billions, which are often not analyzed in full detail due to their sheer size. However, no existing data management solution (such as relational databases, column stores, array databases, and time series management systems) can offer native support for sequences and the corresponding operators necessary for complex analytics.
In this talk, we argue for the need to study the theory and foundations of big sequence management, and to build corresponding systems that will enable scalable management and analysis of very large sequence collections. We describe recent efforts in designing techniques for indexing and mining truly massive collections of data series that will enable scientists to easily analyze their data. We discuss novel techniques that adaptively create data series indexes, allowing users to correctly answer queries before the indexing task is finished. Finally, we present our vision for the future of big sequence management research, including the promising directions in terms of storage, distributed processing, and query benchmarks.

Short bio

Themis Palpanas is a Senior Member of the Institut Universitaire de France (IUF), a distinction that recognizes excellence across all academic disciplines, and professor of computer science at the Paris Descartes University (France), where he is director of diNo, the data management group. He received the BS degree from the National Technical University of Athens, Greece, and the MSc and PhD degrees from the University of Toronto, Canada. He has previously held positions at the University of Trento and at the IBM T.J. Watson Research Center, and has visited Microsoft Research and the IBM Almaden Research Center.
His interests include problems related to data science (big data analytics and machine learning applications). He is the author of nine US patents, three of which have been implemented in world-leading commercial data management products. He is the recipient of three Best Paper awards and the IBM Shared University Research (SUR) Award.
He is currently serving on the VLDB Endowment Board of Trustees, as Editor in Chief for the BDR Journal, Associate Editor for VLDB 2019, Associate Editor for the TKDE and IDA journals, as well as on the Editorial Advisory Board of the IS journal and the Editorial Board of the TLDKS Journal. He has served as General Chair for VLDB 2013, Associate Editor for VLDB 2017, Workshop Chair for EDBT 2016, ADBIS 2013 and ADBIS 2014, General Chair for the PDA@IOT International Workshop (in conjunction with VLDB 2014), and General Chair for the Event Processing Symposium 2009.

Permanent link to this article: https://team.inria.fr/zenith/ibc-seminar-themis-palpanas-data-series-management-fulfilling-the-need-for-big-sequence-analytics-19-jan-2018/

Study day “Méthode, Intégrité Scientifique & Données” (Method, Scientific Integrity & Data), 16 February 2018, Montpellier.

Zenith is taking part in the study day “Méthode, Intégrité Scientifique & Données” (Method, Scientific Integrity & Data), Friday 16 February 2018, MSH SUD, Site Saint Charles 2, Montpellier.

Permanent link to this article: https://team.inria.fr/zenith/zenith-participe-a-la-journee-detude-methode-integrite-scientifique-donnees-vendredi-16-fevrier-2018-msh-sud-site-saint-charles-2-montpellier/

Zenith Seminar: Vitor Silva “A methodology for capturing and analyzing dataflow paths in computational simulations” 31 jan. 2018

Wednesday 31 January, 11h, Room 2/124

A methodology for capturing and analyzing dataflow paths in computational simulations
Vitor Silva, COPPE/UFRJ, Rio de Janeiro

Large-scale scientific applications are based on the execution of complex computational models in a specific field of science. Moreover, a huge volume of scientific data is commonly generated and stored in data sources, which can be raw data files or in-memory data structures. In this context, domain specialists often need to analyze part of these scientific data to validate their scientific hypotheses. Besides analyzing single data sources, they also need to relate scientific data from different data sources and to perform analyses during the execution of the scientific application, since it may take days or weeks, even in high performance computing environments. Therefore, a solution is needed that enables scientific and provenance data extraction (for dataflow monitoring) and supports online dataflow analysis. For this exploratory scientific data analysis scenario, we propose a methodology for capturing and analyzing dataflow paths from scientific applications based on the modeling of the dataflow, scientific data, and queries.

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-vitor-silva-a-methodology-for-capturing-and-analyzing-dataflow-paths-in-computational-simulations-31-jan-2018/

Zenith Seminar: Christophe Godin “Can we Manipulate Tree-forms like Numbers?” 7 dec. 2017

Can we manipulate tree-forms like numbers?
Christophe Godin, Inria

Thursday 7 December at 14h30

Seminar room, Building 4

Abstract: Tree-forms are ubiquitous in nature, and recent observation technologies make it increasingly easy to capture their details, as well as the dynamics of their development, in 3 dimensions, with unprecedented accuracy. These massive and complex structural data raise new conceptual and computational issues related to their analysis and to the quantification of their variability. Mathematical and computational techniques that usually apply successfully to traditional scalar or vectorial datasets fail to apply to such structural objects: How can we define the average form of a set of tree-forms? How can we compare and classify tree-forms? Can we efficiently solve optimization problems in tree-form spaces? How can we approximate tree-forms? Can their intrinsic exponential computational curse be circumvented? In this talk, I will present recent work with my colleague Romain Azais that approaches these questions from a new perspective, in which tree-forms show properties similar to those of numbers or real functions: they can be decomposed, approximated, averaged, and transformed into dual spaces where specific computations can be carried out more efficiently. I will discuss how these first results can be applied to the analysis and simulation of tree-forms in developmental biology.

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-christophe-godin-can-we-manipulate-tree-forms-like-numbers-7-dec-2017/