PhD Position “Privacy Preserving Query Processing in Decentralized Online Social Networks”, April 28, 2015

PhD Position: Privacy Preserving Query Processing in Decentralized Online Social Networks

Topic

We propose a PhD position (CIFRE) in the context of a collaboration between the INRIA Zenith team (https://team.inria.fr/zenith/) and the MatchUpBox startup (http://matchupbox.com/), which is developing a P2P social network application whose objective is to enable users to share their content while protecting their sensitive data from unauthorized access.

Recently, there has been a growing research interest in extending privacy preserving techniques for social networks. One of the main challenges is to develop practical solutions that protect the users’ sensitive data while offering a good degree of utility for the queries executed over the published data.

The objective of this PhD thesis is to provide a query processing service that allows efficient evaluation of OLAP (Online Analytical Processing) queries without disclosing any information about individual data. An example of such a query is the following: given a topic, what is the percentage of users interested in that topic? To realize such a service, we need a distributed architecture that allows us to control the queries asked over each user's data, and to block the answers if the privacy restrictions are not respected. The queries should be executed in a fully decentralized way, and the communication between nodes should be secure. We also need to take into account the particular characteristics of P2P architectures, especially the dynamic behavior of peers, which can leave the system at any time.
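To make the privacy goal concrete, here is a minimal sketch of how such an aggregate could be computed without any node revealing its individual data, using additive secret sharing: each peer splits its "interested" bit into random shares distributed among the peers, and only the global count, never an individual bit, can be reconstructed. This is a purely illustrative textbook scheme, not the architecture proposed for the thesis, and all names below are invented for the example.

```python
import random

PRIME = 2_147_483_647  # field modulus for additive secret sharing

def make_shares(secret_bit, n_peers):
    """Split a 0/1 value into n_peers additive shares mod PRIME.
    Any subset of fewer than n_peers shares reveals nothing."""
    shares = [random.randrange(PRIME) for _ in range(n_peers - 1)]
    last = (secret_bit - sum(shares)) % PRIME
    return shares + [last]

def secure_interest_count(interests):
    """Simulate the protocol: each peer sends one share to every peer;
    summing the per-peer partial sums reveals only the aggregate count."""
    n = len(interests)
    all_shares = [make_shares(bit, n) for bit in interests]
    # column j = the shares received by peer j
    partial_sums = [sum(col) % PRIME for col in zip(*all_shares)]
    return sum(partial_sums) % PRIME

interests = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical per-user interest bits
count = secure_interest_count(interests)
print(f"{100 * count / len(interests):.1f}% of users are interested")
```

The query engine would additionally need to block answers when the aggregate itself is too revealing (e.g. a count over very few users), which is part of the privacy-control problem the thesis addresses.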

Missions and activities

The tasks to be realized are the following:

  • Investigating the state-of-the-art approaches for privacy preservation in social networks.
  • Proposing a distributed architecture for the service of privacy preserving query processing.
  • Proposing efficient techniques for some important OLAP queries which will be used in the application.
  • Implementing a prototype of the proposed service.
  • Evaluating the prototype in a distributed platform, such as Grid5000.

Skills and profiles

We will study all applications carefully, but an ideal applicant would have the following skills:

  • Experience in privacy preserving methods (or cryptography).
  • Strong knowledge of C++ or Java.

Environment

The Zenith project-team of INRIA (https://team.inria.fr/zenith/), headed by Patrick Valduriez, aims to propose new solutions for distributed data management. The research topics of Zenith include: parallel processing of massive data, recommendation in social networks, privacy-preserving data analysis, probabilistic data management, etc.

MatchUpBox (http://matchupbox.com/index.html) is an innovative startup specialized in data security and privacy. It is developing a social network that helps users regain control of their data and interact safely on the Internet. Users are anonymous whenever they are connected. All content that users post to the MatchUpBox social network is protected and accessible only to the owner and to those whom he/she authorizes.

Both Zenith and MatchUpBox are located in Montpellier, a very active city in the south of France.

How to apply

To apply for this PhD position, please send your CV and a motivation letter to the following email addresses:

  • Reza Akbarinia (reza.akbarinia@inria.fr)
  • Esther Pacitti (esther.pacitti@lirmm.fr)
  • Jorick Lartigau (jorick.lartigau@matchupbox.com)

Permanent link to this article: https://team.inria.fr/zenith/privacy-preserving-query-processing-in-decentralized-online-social-networks/

Morgenstern seminar: Dennis Shasha (NYU) “The Changing Nature of Invention in Computer Science”

Colloquium Jacques Morgenstern, Thursday, April 2, 11am, Amphithéâtre Gilles Kahn, Inria Sophia Antipolis.

The Changing Nature of Invention in Computer Science

Dennis Shasha

Courant Institute of Mathematical Sciences, New York University

Zenith team, Inria, LIRMM & IBC, Montpellier

What drives inventions in computing? Necessity seems to play only a minor role. Anger at the way things are is much more powerful, because it leads to easier ways to work (the invention of new computer languages). A general dissatisfaction with the practical or theoretical structure of the world can open up whole new approaches to problems (complexity theory and cryptography). Finally, a genuine collaboration between people and machines can lead to an entirely new kind of engineering for devices that will travel to far-off planets or to hostile environments. The talk will discuss the work of several inventors in computing and engineering, their inventions, how they came up with them, and how they plan to come up with more in the future. The ensuing discussion will address the fundamental nature of invention in a world partly populated by intelligent machines.

Permanent link to this article: https://team.inria.fr/zenith/morgenstern-seminar-the-changing-nature-of-invention-in-computer-science-by-dennis-shasha-april-2-2015/

Inria Sophia-Antipolis seminar: Dennis Shasha (NYU) “Group Testing to Describe Causality in Gene Networks”

Seminar 1: Wednesday, April 1, 2015, 3pm, Salle Euler bleu, Inria Sophia-Antipolis

“Group Testing to Describe Causality in Gene Networks”

Dennis Shasha Courant Institute of Mathematical Sciences, New York University Zenith team, Inria, LIRMM & IBC, Montpellier

Genomics is essentially the study of network effects among genes. A typical outcome of a genomic study is experimental proof of one or more causal relationships of the form: induction in gene X causes repression in gene Y. A typical way to establish such connections is to knock out genes and see which other genes are affected. Alternatively, single genes can be over-expressed. The advent of CRISPR allows the suppression or over-expression of several genes at once. The question then is how to use such technology to discover causal links more efficiently. We have found a way to use combinatorial group testing for this purpose. The talk explains the algorithm and validates it on the DREAM simulator for genomic networks.
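The combinatorial idea can be illustrated with the classic non-adaptive group testing design (a textbook scheme, shown here for intuition; it is not necessarily the algorithm presented in the talk): with ceil(log2 n) pooled experiments, a single responding gene among n can be identified from the pattern of pool outcomes alone.

```python
import math

def design_pools(n_genes):
    """Non-adaptive group testing: pool j contains every gene whose
    index has bit j set, so ceil(log2 n) pooled experiments suffice
    when exactly one gene drives the response."""
    n_pools = max(1, math.ceil(math.log2(n_genes)))
    return [[g for g in range(n_genes) if g >> j & 1]
            for j in range(n_pools)]

def identify(pool_results):
    """Decode the responding gene from the vector of pool outcomes:
    the outcomes spell out the gene index in binary."""
    return sum(1 << j for j, hit in enumerate(pool_results) if hit)

# hypothetical screen: gene 5 (of 8) represses the reporter
culprit = 5
pools = design_pools(8)                      # 3 pooled experiments
results = [culprit in pool for pool in pools]
print(identify(results))                     # recovers gene 5
```

Real gene networks have multiple interacting causes and noisy readouts, which is precisely why more sophisticated designs, such as the one in the talk, are needed.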

Permanent link to this article: https://team.inria.fr/zenith/seminar-group-testing-to-describe-causality-in-gene-networks-by-dennis-shasha-april-1-2015/

Turing Award: databases in the spotlight, via the Binaire blog of Le Monde, April 20, 2015


Patrick Valduriez's post for the Binaire blog (of the newspaper Le Monde) on this year's Turing Award: Michael Stonebraker, a researcher at the Massachusetts Institute of Technology (USA), has just won the prestigious ACM Turing Award, often considered "the Nobel Prize of computer science". In its announcement of March 25, 2015, the ACM states that Stonebraker "invented many of the concepts that are used in almost all modern database systems…". This recognition at the highest international level is an opportunity to shed light on the singular place of data management in computer science research. Read more on the Binaire blog, in the article on Michael Stonebraker's Turing Award. Photo credit: M. Scott Brauer.

Permanent link to this article: https://team.inria.fr/zenith/prix-turing-2015-les-bases-de-donnees-a-lhonneur-via-le-blog-binaire-du-monde/

PhD position “Predictive Big Data Analytics: continuous and dynamically adaptable data request and processing”, April 9, 2015

PhD position available at Inria.

Location: Sophia Antipolis
Funding: Labex UCN@SOPHIA (accepted)

Title: Predictive Big Data Analytics: continuous and dynamically adaptable data request and processing

Main advisor: Françoise Baude (PR), Scale team, CNRS I3S
Mail: Francoise.baude@unice.fr Web page: http://www-sop.inria.fr/members/francoise.baude/
Co-advisor: Didier Parigot (HDR), Zenith team, INRIA CRISAM
Mail: Didier.Parigot@inria.fr Web page: http://www-sop.inria.fr/members/Didier.Parigot/
 
Keywords: Large scale distribution, Data Stream processing, real-time data analytics, Component Model.
 
Application: The candidate should have a background in large-scale data management and component models, and be proficient in English. Send us a detailed CV, including a complete bibliography and recommendation letters.
 
Contacts: Francoise.baude@unice.fr or Didier.Parigot@inria.fr

Context:

As the popularity of Big Data explodes, more and more use cases are implemented with these technologies. But some use cases are not properly addressed by classic Big Data models and platforms such as Apache Hadoop MapReduce, because of these models' intrinsic batch nature. These are the cases where new data must be processed online, as soon as it enters the system, in order to aggregate the newest information extracted from the incoming data into the current analysis results. Such online and continuous processing corresponds to what is known as continuous queries and triggers in the more focused context of databases [1][2], or as complex event processing in publish-subscribe systems [3]. More generally, processing incoming data on the fly is known as data stream processing and, in the big data area, as real-time data analytics.
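The contrast with batch processing can be seen in a minimal sketch of a continuous query: instead of computing one result over a finished dataset, the query re-emits an updated result every time a new event arrives (illustrative only; real platforms distribute this over many nodes and handle fault tolerance):

```python
from collections import deque

def continuous_count(stream, window=5):
    """Continuous query: each arriving event updates the count of
    'high'-severity events within a sliding window of recent events."""
    win = deque(maxlen=window)
    for event in stream:
        win.append(event)
        yield sum(1 for e in win if e["severity"] == "high")

events = [{"severity": s} for s in
          ["low", "high", "high", "low", "high", "low", "low"]]
print(list(continuous_count(events, window=3)))
# → [0, 1, 2, 2, 2, 1, 1]
```

A batch job would only ever report the final count; the continuous version produces the intermediate results that alerting and trigger mechanisms react to.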

Social networks are nowadays the medium of choice for data delivery among end users, be it in public circles or in more private spheres such as dedicated professional networks. Analysing, understanding, recommending, and rating the vast amount of data of various kinds (text, images, videos, etc.) is a feature increasingly required of the systems underlying these social networks. In particular, the Beepers startup, associated with the Zenith team, is working on an alerting tool to be part of its toolbox for building a social network. The goal of this joint PhD is partly motivated by this perspective, as deploying an alert amounts to a persistent, possibly sophisticated, query and thus to a data streaming program. Thanks to the partnership established through this joint PhD research, we want to offer Beepers, as added value, the capability for a query to evolve dynamically. The motivation for this evolution is to track more closely a trend, a sentiment, an opinion, etc., as it emerges from the continuous data flow analytics.

Indeed, some situations call for what could be named anticipatory analytics: given data gathered from various sources and combined to extract meaningful information, the goal is to adapt the current analytics so that it matches the anticipated coming situation, somewhat ahead of time. Consider short-term weather forecasting for local places: if the wind suddenly strengthens and changes direction while intense rain falls, there is a need (1) to update the short-term weather predictions, but also (2) to deploy appropriate supervision of the now-endangered zone. If the newly targeted zone is inhabited and at risk of flooding, the system must be able to trigger alerts toward the right actors: if a flood can now reach the hospital, evacuation should effectively start, whereas it was not necessary a few minutes earlier, when the wind was still soft and the risky zone, given the wind's direction, was a cornfield. Supervising electricity delivery also becomes necessary, to anticipate any blackout during the evacuation. Another example of anticipatory analytics is multimedia stream processing: a data journalist, extracting data from a preferred social network and running analytics on videos to learn about happenings in a specific city area before writing a press article, suspects from the broadcast content that a recognized person may be involved in a crime scene shown in a concomitantly published video. The journalist accordingly adapts the current analytics to combine the two information sources around some common data, namely the relatives whose accommodation lies in that city area, and continues the search focusing on these relatives' acts while still monitoring the initially recognized suspect.

From these exemplary scenarios, we clearly foresee that one trend in big data technologies is predictive analytics [16][17][18], which by nature requires a strong capability for dynamic adaptation given partial, already gained results. In this trend, our claim is that the analytics must adapt (better yet, self-adapt) to what is happening, given, of course, some previously user-defined rules dictating which adaptations are relevant to trigger.

Work:

Several big data platforms geared toward real-time analytics have emerged recently: Spark Streaming [4], Twitter's Storm, and S4 [5]. These platforms allow one to define a program that eventually takes, after a compilation process, the form of a DAG (directed acyclic graph), but to our knowledge none allows adapting the program at runtime with respect to its functional/business nature. This is because the associated languages, such as StreamSQL, CQL, StreamIt, and IBM's SPL [7], are generally declarative. As a result, developers focus on expressing data processing logic, not on orchestrating runtime adaptations (i.e., in which situations functional adaptation should happen, how to monitor the need for adaptation, how to effectively modify the program without redeploying a new one from scratch). Some data workflow engines are starting to appear in the large-scale stream processing community, but without an adequate behavioural/functional adaptation capability, as so far the only focus has been on adapting to non-functional requirements dictated by scalability, fault tolerance, and performance, through node replication or elasticity [6]. From our literature survey so far, only [8] sketches a solution for functional adaptivity in the stream processing languages of big data platforms, even if some earlier work on stream platforms and active databases is a good starting point [9].

The Grid Component Model (GCM) [10], the result of several years of research in the Scale team, is a component model for applications running on distributed infrastructures that extends the Fractal component model. Zenith's research has also produced strong competences in component-oriented platforms featuring high dynamicity, such as SON [11], which the co-supervised PhD student could also take advantage of. Fractal defines a component model where components can be hierarchically organized, reconfigured, and controlled, offering functional server interfaces and requiring client interfaces. GCM extends that model by allowing components to be remotely located, distributed, parallel, and deployed in a large-scale computing environment (cluster, grid, cloud, etc.), and by adding collective communications (multicast and gathercast interfaces). Autonomic capabilities can be expressed in the membranes of GCM components, which can drive their reconfiguration at the functional level in a fully autonomous manner [10]. If the DAG of a streaming application translates into a component-oriented program, then it can naturally benefit from the model's intrinsic reconfiguration properties. This is one of the expected research questions to be addressed in the scope of this PhD: how to exploit the autonomic capabilities, high expressivity, and clear functional versus non-functional separation of concerns of the GCM component-oriented approach in order to support dynamic adaptation of the analytics that the streaming application implements.
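The idea of functionally reconfiguring a component-based dataflow at runtime can be illustrated with a deliberately tiny sketch. The `Component` class below is a stand-in invented for this example, not the actual GCM/Fractal API: it only shows how rebinding client interfaces lets an analytics stage be swapped while the pipeline keeps running, without redeploying the whole program.

```python
class Component:
    """Toy processing stage with rebindable downstream connections
    (loosely inspired by Fractal's client/server interfaces)."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
        self.outputs = []            # bound client interfaces
    def bind(self, downstream):
        self.outputs.append(downstream)
    def unbind(self, downstream):
        self.outputs.remove(downstream)
    def push(self, item):
        result = self.fn(item)
        if result is not None:       # None = event filtered out
            for out in self.outputs:
                out.push(result)

collected = []
source = Component("source", lambda x: x)
trend = Component("trend", lambda x: x if "storm" in x else None)
sink = Component("sink", lambda x: collected.append(x) or x)

source.bind(trend); trend.bind(sink)
for msg in ["sunny day", "storm warning"]:
    source.push(msg)

# runtime reconfiguration: swap the analytics stage, keep the rest
flood = Component("flood", lambda x: x if "flood" in x else None)
source.unbind(trend); source.bind(flood); flood.bind(sink)
for msg in ["flood alert", "storm again"]:
    source.push(msg)

print(collected)  # events that reached the sink under each configuration
```

In GCM the rebinding would be driven autonomously from the component membranes rather than by external code, which is precisely the capability the thesis proposes to bring to streaming analytics.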

Work plan:

Toward the global goal of designing a working solution for anticipatory big data analytics, the PhD could be organized along the following guidelines:

  • Investigate stream processing languages (mainly domain-specific languages) that can be used atop the GCM composition framework and extended to express adaptability features. An approach developed in [15], relying on data flow analysis of the DSL code, could be reused to infer the GCM-based graph. Extend the GCM model and runtime to be stream-processing aware.
  • Implement the framework for writing adaptable stream processing workflows, relying ultimately on GCM and on SON (to handle the needed dynamic code deployment features).
  • Study existing and emerging stream processing platforms (mainly open source and supported by Apache) and select or extend the most appropriate one to plug in the GCM-based dynamic reconfiguration of analytics and the new DSL. Alternatively, and more ambitiously, if existing platforms are not easily extensible, develop a complete GCM-based big data real-time analytics solution and make sure it can interact with existing big data providers (famous public social networks such as Twitter and Facebook, or dedicated social networks as built with the Beepers technology).
  • Benchmark and test on relevant big data analytics use cases, extracting relevant information from social network content (with semantic annotation) such as images, videos, and text; applying to this content, for instance, sentiment analysis to predict future situations [12], and accordingly adjusting the recommendations provided to the users of the social network [13].

Complementarity and perspectives

This research proposal is at the crossroads of middleware for large-scale platforms working on large data volumes, languages (including DSLs), data mining, and social networking. Françoise Baude, from the Scale team, is an expert in runtimes and middleware for distributed languages, and a recent EU project, PLAY, allowed her to gain experience in situational awareness through complex event processing and supporting publish/subscribe platforms for semantically described web events [14]. Didier Parigot, from Zenith, addresses DSLs, data flow models, and also middleware and databases [15] to support social networking, through the iLab collaboration with Beepers. Overall, the two teams share a common background while having complementary assets applicable to emerging big data technologies, and this is their first opportunity to collaborate. They also have a strong willingness, proven by past and current efforts, to transfer research results to industry, which has a rising interest in solutions supporting data analytics. This joint PhD research stands as a catalyst and an opportunity to demonstrate the usability and relevance of previously developed platforms (ProActive/GCM and SON, respectively, which will be used for complementary aspects) in the exciting emerging area of adaptable big data analytics.

Innovation potential:

Due to the applied nature of the research, there are obvious prospects of valorization as a "product" handled by existing or future SMEs of the Sophia-Antipolis ecosystem. Consequently, the candidate may apply for additional funding from an EIT ICT Labs Doctoral Training Center. Indeed, the Labex UCN@SOPHIA has been designated as the foundation for the Doctoral Training Center newly funded in Sophia-Antipolis from 2015. In this context, six additional months of PhD funding, an Innovation & Entrepreneurship curriculum, and opportunities to evaluate the project in other ICT Labs ecosystems are allocated. At least two collaborating ICT Labs ecosystems are strongly engaged in Big Data analytics efforts and consequently deserve our attention:

  • Technische Universität Berlin (TUB), which coordinates the Berlin Big Data Center (BBDC), a national center on Big Data recently established by the German Federal Ministry of Education and Research (BMBF). TUB established the Data Analytics Laboratory (DAL) in 2011 to serve as a focal point for innovative research. TUB is the birthplace of one of the leading open-source big data analytics platforms, Stratosphere, now Apache Flink, which has an active worldwide user community. This is one of the systems that could be extended by the thesis work to provide adaptive analytics.
  • The University of Trento (UNITN), an associated partner of the EIT ICT Labs Trento CLC, as is the Politecnico di Milano. Both offer ecosystems active in big data analytics solutions. Polimi and its ecosystem benefit from the expertise of the researchers working in the Collaborative Innovation Center on Big Data (CIC), jointly developed with IBM.

References :

[1] Shivnath Babu and Jennifer Widom. 2001. Continuous queries over data streams. SIGMOD Rec. 30, 3 (September 2001), 109-120

[2] Scheuermann, Peter, and Goce Trajcevski. “Active Database Systems.” Wiley Encyclopedia of Computer Science and Engineering (2008).

[3] Eugene Wu, Yanlei Diao, and Shariq Rizvi. 2006. High-performance complex event processing over streams. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data (SIGMOD ’06)

[4] Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized streams: fault-tolerant streaming computation at scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13)

[5] Neumeyer, Leonardo, et al. “S4: Distributed stream computing platform.” Data Mining Workshops (ICDMW), 2010 IEEE International Conference on. IEEE, 2010.

[6] Raphaël Barazzutti, Thomas Heinze, Andre Martin, Emanuel Onica, Pascal Felber, Christof Fetzer, Zbigniew Jerzak, Marcelo Pasin, Etienne Riviere: Elastic Scaling of a High-Throughput Content-Based Publish/Subscribe Engine. ICDCS 2014: 567-576

[7] Martin Hirzel, Henrique Andrade, Bugra Gedik, Vibhore Kumar, Giuliano Losa, Mark Mendell, Howard Nasgaard, Robert Soulé, Kun-Lung Wu. SPL Stream Processing Language Specification. IBM, 2009.

[8] Gabriela Jacques-Silva, Bugra Gedik, Rohit Wagle, Kun-Lung Wu, Vibhore Kumar – Building User-defined Runtime Adaptation Routines for Stream Processing Applications – The Very Large Data Base Endowment Journal (VLDB), 2012

[9] Trajcevski, Goce, et al. “Evolving triggers for dynamic environments.” Advances in Database Technology-EDBT 2006. Springer Berlin Heidelberg, 2006. 1039-1048.

[10] F. Baude, L. Henrio, C. Ruz. Programming Distributed and Adaptable Autonomous Components – the GCM/ProActive Framework. Software: Practice and Experience, Wiley, in press, 2015.

[11] Ayoub Ait Lahcen, Didier Parigot. "A Lightweight Middleware for Developing P2P Applications with Component and Service-Based Principles." 2012 IEEE 15th International Conference on Computational Science and Engineering (CSE 2012).

[12] Asur, Sitaram, and Bernardo A. Huberman. “Predicting the future with social media.” Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on. Vol. 1. IEEE, 2010.

[13] Fady Draidi, Esther Pacitti, Didier Parigot, and Guillaume Verger. 2011. P2Prec: a social-based P2P recommendation system. In Proceedings of the 20th ACM international conference on Information and knowledge management (CIKM ’11)

[14] N. Stojanovic, R. Stühmer, F. Baude, P. Gibert. Tutorial: Where Event Processing Grand Challenge meets Real-time Web: PLAY Event Marketplace. DEBS'12, the 6th ACM International Conference on Distributed Event-Based Systems, ACM, July 2012, p. 341-352.

[15] Ayoub Ait Lahcen. Developing Component-Based Applications with a Data-Centric Approach and within a Service-Oriented P2P Architecture: Specification, Analysis and Middleware. PhD thesis co-supervised by D. Parigot, Dec 2012.

[16] Boulos, Maged N. Kamel, et al. “Social Web mining and exploitation for serious applications: Technosocial Predictive Analytics and related technologies for public health, environmental and national security surveillance.” Computer Methods and Programs in Biomedicine 100.1 (2010): 16-23.

[17] Lozada, Brian A. "The Emerging Technology of Predictive Analytics: Implications for Homeland Security." Information Security Journal: A Global Perspective 23.3 (2014): 118-122.

[18] Doyle, Andy, et al. "The EMBERS architecture for streaming predictive analytics." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014.

Permanent link to this article: https://team.inria.fr/zenith/predictive-big-data-analytics-continuous-and-dynamically-adaptable-data-request-and-processing/

IBC seminar: Marta Mattoso (UFRJ) “Exploratory Analysis of Raw Data Files through Dataflows”

IBC seminar, pôle données connaissances of LIRMM, Zenith. Tuesday, March 17, 11am, Salle 1/124, Campus Saint-Priest – Bâtiment 5.

Exploratory Analysis of Raw Data Files through Dataflows*

Marta Mattoso, COPPE/UFRJ, Rio de Janeiro

Scientific applications generate raw data files at very large scale. Most of these files follow a standard format established by the application's domain, such as HDF5, NetCDF, or FITS. These formats are supported by a variety of programming languages, libraries, and programs. Since the files are large, analyzing them requires writing a specific program. Scientific data analysis systems such as database management systems (DBMS) are not well suited because of time-consuming data loading, data transformation at large scale, and legacy code incompatible with a DBMS. Recently there have been several proposals for indexing and querying raw data files without the overhead of a DBMS. Systems like NoDB, SDS, and RAW offer query support on a raw data file after a scientific program has generated it. However, these solutions focus on the analysis of one single large file. When a large number of files are related and all required to evaluate one scientific hypothesis, the relationships must be managed manually or by writing specific programs. In this talk we will discuss current approaches for raw data analysis and present our approach, which combines DBMS and raw data analysis. It takes advantage of the provenance database support of scientific workflow management systems (SWfMS). When scientific applications are managed by an SWfMS, the data being generated is registered in the provenance database, so this provenance data can act as a description of the files. When the SWfMS is dataflow aware, it also registers selected data and pointers to domain data, all in the same database. The resulting database becomes an important access path to the large number of files generated by the scientific workflow execution, and a complementary approach to single-file raw data analysis support.

*Joint work with Vitor Silva, Daniel Oliveira and Patrick Valduriez

Permanent link to this article: https://team.inria.fr/zenith/seminar-exploratory-analysis-of-raw-data-files-through-dataflows-by-marta-mattoso-march-17/

Zenith seminar: Aleksandra Levchenko “Integrating Big Data and Relational Data with a Functional SQL-like Query Language”

Zenith seminar: Thursday, March 12, 11am, Salle 1/124, Bât 5, Campus Saint-Priest

Integrating Big Data and Relational Data with a Functional SQL-like Query Language (*)

Aleksandra Levchenko, Zenith, Inria and LIRMM, University of Montpellier, France; Odessa National Polytechnic University, Ukraine

Abstract: Multistore systems have recently been proposed to provide integrated access to multiple, heterogeneous data stores through a single query engine. In particular, much attention is being paid to the integration of unstructured big data, typically stored in HDFS, with relational data. One main solution is to use a relational query engine that allows SQL-like queries to retrieve data from HDFS, which requires the system to provide a relational view of the unstructured data and hence is not always feasible. In this talk we introduce a functional SQL-like query language that can integrate data retrieved from different data stores and take full advantage of the functionality of the underlying data processing frameworks, by allowing the ad hoc use of user-defined map/filter/reduce operators in combination with traditional SQL statements. Furthermore, the query language allows for optimization by enabling subquery rewriting, so that filter conditions can be pushed inside and executed at the data store as early as possible. Our approach is validated with two data stores and representative queries that demonstrate the usability of the query language and evaluate the benefits of query optimization.

(*) Joint work with Carlyna Bondiombouy, Boyan Kolev and Patrick Valduriez
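The benefit of the subquery rewriting mentioned in the abstract can be sketched in a few lines (a toy model invented for illustration, not the system's actual implementation): pushing a filter from the query engine down into the data store scan reduces the number of rows shipped out of the store, while leaving the query result unchanged.

```python
# a query plan = a scan of a data store followed by map/filter operators
def execute(table, pipeline, store_pred=None):
    """Run a plan; store_pred simulates a predicate pushed to the store.
    Returns (result, number of rows shipped out of the store)."""
    rows = [r for r in table if store_pred is None or store_pred(r)]
    transferred = len(rows)
    for kind, fn in pipeline:
        rows = ([fn(r) for r in rows] if kind == "map"
                else [r for r in rows if fn(r)])
    return rows, transferred

def push_filter(pipeline):
    """Rewrite rule: if the first operator is a filter, execute it
    at the data store instead of in the query engine."""
    if pipeline and pipeline[0][0] == "filter":
        return pipeline[0][1], pipeline[1:]
    return None, pipeline

table = [{"country": "FR" if i % 2 else "DE", "v": i} for i in range(10)]
plan = [("filter", lambda r: r["country"] == "FR"),
        ("map", lambda r: r["v"] * 2)]

naive_result, naive_sent = execute(table, plan)        # ships 10 rows
pred, rest = push_filter(plan)
pushed_result, pushed_sent = execute(table, rest, store_pred=pred)  # ships 5
print(naive_result == pushed_result, naive_sent, pushed_sent)
```

A real optimizer must additionally check that the predicate only references columns available at the store and that the store can evaluate it, which is where the "as early as possible" qualification in the abstract comes in.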

Permanent link to this article: https://team.inria.fr/zenith/zenith-working-seminar-aleksandra-levchenko-thrusday-12-march-11h/

IBC seminar: Dennis Shasha (NYU) “Statistics is Easy”

IBC seminar, pôle données connaissances of LIRMM, Zenith. Friday, March 6, 2pm, Salle 2/22, Campus Saint-Priest – Bâtiment 5, 860 rue de St Priest, 34392 Montpellier Cedex 5.

Statistics is Easy

Dennis Shasha, Courant Institute of Mathematical Sciences, New York University; Inria, Zenith team, Montpellier

Few people remember statistics with much love. To some, probability was fun because it felt combinatorial and logical (with potentially profitable applications to gambling), but statistics was a bunch of complicated formulas with counter-intuitive assumptions. As a result, if a practicing natural or social scientist must conduct an experiment, he or she can't derive anything from first principles but instead pulls out some dusty statistics book and applies some formula or uses some software, hoping that the distribution assumptions allowing the use of that formula apply. To mimic a familiar phrase: "There are hacks, damn hacks, and there are statistics." Surprisingly, a strong minority current of modern statistical theory offers the possibility of avoiding both the magic and the assumptions of classical statistical theory through randomization techniques known collectively as resampling. These techniques take a given sample and either create new samples by randomly selecting values from the given sample with replacement, or randomly shuffle labels on the data. The questions answered are the familiar ones: how accurate is my measurement likely to be (confidence interval), and could it have happened by mistake (significance)? This talk explains the basics of resampling statistics through a number of simple-to-understand examples, such as tossing coins, evaluating the effectiveness of drugs, and determining the sane reaction to a medical test result. The talk will be in French but the slides will be in English.

It would be good if the participants could get the book before the lecture (it should be freely downloadable if your library has an account at Morgan & Claypool): Statistics is Easy! Dennis Shasha and Manda Wilson, Synthesis Lectures on Mathematics and Statistics, Morgan & Claypool, http://www.morganclaypool.com/doi/abs/10.2200/S00142ED1V01Y200807MAS001

Bio: Dennis Shasha is a professor of computer science at the Courant Institute of Mathematical Sciences, a division of New York University. His current areas of research include work with biologists on pattern discovery for microarrays, combinatorial design, network inference, and protein docking; work with physicists, musicians, and professionals in finance on algorithms for time series; and work on database applications in untrusted environments. Other areas of interest include database tuning as well as tree and graph matching. After graduating from Yale in 1977, he worked for IBM designing circuits and microcode for the IBM 3090. While at IBM, he earned his M.Sc. from Syracuse University in 1980. He completed his Ph.D. in applied mathematics at Harvard in 1984. Professor Shasha has written six books of puzzles, five of which center on the work of a mathematical detective named Jacob Ecco, a biography about great computer scientists, and several technical books relating to his various areas of research (biological computing, databases, statistics, etc.). He has written monthly puzzle columns for Scientific American and Dr. Dobb's Journal. In 2013 he became a fellow of the Association for Computing Machinery. Since 2015, he has held an Inria International Chair in the Zenith team.
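Both resampling techniques described above fit in a few lines of Python (a generic illustration of the ideas, not code from the book): bootstrap resampling with replacement yields a confidence interval, and label shuffling yields a significance level. The drug/placebo numbers below are made up for the example.

```python
import random

def bootstrap_ci(sample, stat=lambda s: sum(s) / len(s),
                 n_resamples=10_000, alpha=0.10, seed=42):
    """Bootstrap: resample with replacement, collect the statistic,
    read the confidence interval off the sorted resampled values."""
    rng = random.Random(seed)
    stats = sorted(
        stat([rng.choice(sample) for _ in sample])
        for _ in range(n_resamples))
    return (stats[int(alpha / 2 * n_resamples)],
            stats[int((1 - alpha / 2) * n_resamples)])

def permutation_pvalue(a, b, n_shuffles=10_000, seed=42):
    """Significance by shuffling labels: how often does a random
    relabeling produce a mean difference at least as extreme?"""
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = a + b
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = (sum(pooled[:len(a)]) / len(a)
                - sum(pooled[len(a):]) / len(b))
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / n_shuffles

drug    = [54, 51, 58, 44, 55, 52, 42, 47, 58, 46]   # made-up data
placebo = [54, 73, 53, 70, 73, 68, 52, 65, 65]
lo, hi = bootstrap_ci(drug)
p = permutation_pvalue(drug, placebo)
print(f"90% CI for drug mean: [{lo:.1f}, {hi:.1f}], p = {p:.4f}")
```

No normality assumption appears anywhere: the only ingredients are the sample itself and randomization, which is exactly the point of the talk.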

Permanent link to this article: https://team.inria.fr/zenith/seminar-statistics-is-easy-dennis-shasha-march-6-at-2pm/

Zenith seminar: Sihem Amer Yahia (LIG) “Task Assignment Optimization in Crowdsourcing”

Monday, Dec 15 at 11 am, Bât 5, salle 2/124

Task Assignment Optimization in Crowdsourcing

By Dr. Sihem Amer-Yahia (LIG, Univ. Grenoble). A crowdsourcing process can be viewed as a combination of three components: worker skill estimation, worker-to-task assignment, and task accuracy evaluation. The reason crowdsourcing is so popular today is that tasks are small, independent, homogeneous, and do not require a long engagement from workers. The crowd is typically volatile, its arrivals and departures asynchronous, and its levels of attention and accuracy variable. As a result, popular crowdsourcing platforms are not well adapted to emerging team-based tasks such as collaborative editing, multi-player games, or fan-subbing, which require forming a team of experts to accomplish a task together. In particular, I will argue that the optimization of worker-to-task assignment is central to the effectiveness of team-based crowdsourcing. I will present a framework that allows worker-to-task assignment to be formulated as optimization problems with different goals, and summarize some of our results in this area.
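To make the optimization framing concrete, here is a toy worker-to-task assignment (invented for illustration; the skill estimates and the brute-force search are stand-ins for the estimation and optimization methods discussed in the talk): given estimated per-task accuracies, pick the one-to-one assignment maximizing total expected accuracy.

```python
from itertools import permutations

def best_assignment(skill):
    """Exhaustive worker-to-task assignment maximizing total skill.
    Fine for tiny instances; real platforms need scalable methods
    (e.g. bipartite matching) and must cope with worker volatility."""
    n = len(skill)
    score = lambda perm: sum(skill[w][t] for w, t in enumerate(perm))
    best = max(permutations(range(n)), key=score)
    return list(best), score(best)

skill = [[0.9, 0.1, 0.4],   # hypothetical worker-task accuracy estimates
         [0.2, 0.8, 0.5],   # skill[w][t] = expected accuracy of worker w
         [0.6, 0.3, 0.7]]   # on task t
assignment, total = best_assignment(skill)
print(assignment, total)    # worker w gets task assignment[w]
```

Team-based tasks generalize this to assigning groups of workers per task, with goals such as skill coverage or affinity, which is where the different optimization formulations of the talk come in.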

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-monday-dec-15-at-11-am-sihem-amer-yahia/

Zenith seminar “Multiplayer Games: a complex application in need for scalable replica management”, Bettina Kemme (McGill Univ.), Dec 9, 2014

Dec 9, 2014 at 10:30 am, salle 5.1.056

Multiplayer Games: a complex application in need of scalable replica management

Prof. Bettina Kemme, McGill University

Multiplayer Online Games (MOGs) are an extremely popular online technology, one that produces billions of dollars in revenue. The underlying architecture of game engines is distributed by nature and has to maintain large amounts of quickly changing state. In particular, each client has its own partial view of a continuously evolving virtual world, and all these client copies have to be kept up to date. In this talk, I will present an overview of current game architectures, from client-server to peer-to-peer, and outline possible solutions to several challenges one faces when trying to meet the scalability, response time, and low-cost requirements of multiplayer game engines: distributed state maintenance, scalable update dissemination, and the avoidance or detection of malicious cheating behavior.

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-monday-dec-09-at-10h30-prof-bettina-kemme-mcgill-university/