A hybrid P2P/cloud for Large Scale Data Sharing

Post-Doc Offer

With the advent of the Internet and the World Wide Web, there is a growing need to develop user applications that access data and resources stored in the network. To facilitate the development of such network-centric applications, new computational paradigms are needed that are scalable, elastic, available, and fault-tolerant. Over the past decades, two paradigms, Peer-to-Peer (P2P) computing and cloud computing, have become prevalent for building distributed applications. Peer-to-peer computing is a highly decentralized paradigm that harnesses computing resources on user machines to support decentralized applications such as wide-scale media file sharing, telecommunication services (e.g., Skype), and others. Cloud computing, on the other hand, relies on large data centers consisting of thousands of server-class machines, and all application processing and data are centralized in the network core, i.e., in the data centers [1,2]. In many ways the two paradigms are complementary and offer different trade-offs. For instance, computing and storage are almost free in P2P, but P2P systems suffer from churn and the low reliability of user machines. Cloud computing, by contrast, significantly simplifies system administration in the data center but requires a very large investment in building large-scale data centers.

This postdoc topic requires research into new distributed architectures and algorithms that leverage both of these paradigms. At present, cloud computing has emerged as the dominant paradigm in the commercial realm. However, we contend that cloud computing is best suited to client-server interactions. As we move towards applications that are more collaborative and require continuous interactivity (i.e., latency-sensitive applications), the cloud computing paradigm may not be able to sustain them. Examples of such applications arise in distributed gaming, group video chat, online interactive classrooms, and synchronous group interactions in online social networks. What all these applications have in common is that they require many-to-many communication as well as continuous streaming of media among all participants.

The goal is to develop a hybrid platform that combines the two paradigms and leverages computing, storage, and network resources both in data centers (i.e., the cloud) and at the edges of the network (i.e., peer or user machines). In addition, we will explore the suitability of this hybrid model for large-scale distributed data sharing through recommendation in different contexts, such as data streaming [3] and scientific online communities [4, 5]. The common setting is that users keep their own datasets (documents, videos, etc.) stored and controlled locally, and are willing to share them in a personalized and controlled way.

[1] Big Data and Cloud Computing: Current State and Future Opportunities, Divyakant Agrawal, Sudipto Das, Amr El Abbadi, EDBT 2011: 530-533.
[2] Database Scalability, Elasticity, and Autonomy in the Cloud, Divyakant Agrawal, Amr El Abbadi, Sudipto Das and Aaron J. Elmore, DASFAA (1) 2011: 2-15.
[3] Flower-CDN: a hybrid P2P overlay for efficient query processing in CDN, Manal El Dick, Esther Pacitti and Bettina Kemme, EDBT 2009: 427-438.
[4] P2Prec: A P2P Recommendation System for Large-Scale Data Sharing, Fady Draidi, Esther Pacitti and Bettina Kemme, Trans. Large-Scale Data- and Knowledge-Centered Systems 3: 87-116 (2011).
[5] Zenith: Scientific Data Management on a Large Scale, Esther Pacitti and Patrick Valduriez, ERCIM News 2012(89) (2012).

Contact: Esther.Pacitti@lirmm.fr

Permanent link to this article: https://team.inria.fr/zenith/a-hybrid-p2pcloud-for-large-scale-data-sharing/