Data Management in the Cloud
Colloquium on Cloud Computing
Tuesday, October 18, 2011, 10:30-12:00, Salle des Séminaires (Seminar Room), LIRMM
Organized by the Zenith team (Esther Pacitti, Patrick Valduriez)
Amr El Abbadi
University of California, Santa Barbara
Abstract: Over the past two decades, database and systems researchers have made significant advances in the development of algorithms and techniques to provide data management solutions that carefully balance the three major requirements when dealing with critical data: high availability, reliability, and data consistency. However, over the past few years, the data requirements of Internet-scale enterprises that provide services and cater to millions of users have been unprecedented in terms of data availability and system scalability. Cloud computing has emerged as an extremely successful paradigm for deploying Internet and Web-based applications. Scalability, elasticity, pay-per-use pricing, and autonomic control of large-scale operations are the major reasons for the widespread adoption of cloud infrastructures. Currently proposed solutions to scalable data management, driven primarily by prevalent application requirements, significantly downplay data consistency requirements and instead focus on high scalability and resource elasticity to support data-rich applications for millions to tens of millions of users. However, the growing popularity of “cloud computing”, the resulting shift of a large number of Internet applications to the cloud, and the quest to provide data management services in the cloud have opened up the challenge of designing data management systems that provide consistency guarantees at a granularity that goes beyond single rows and keys. In this talk, we analyze the design choices that allowed modern scalable data management systems to achieve orders of magnitude higher levels of scalability compared to traditional databases. With this understanding, we highlight some design principles for data management systems that can be used to augment existing databases with new cloud features such as scalability, elasticity, and autonomy. We then present two systems that leverage these principles.
The first system, G-Store, provides transactional guarantees on data granules formed on demand while being efficient and scalable. The second system, ElasTraS, provides elastically scalable transaction processing using logically contained database partitions. Finally, we will present two techniques for on-demand live database migration, a primitive operation critical to providing lightweight elasticity as a first-class notion in the next generation of database systems. The first technique, Albatross, supports live migration in a multitenant database serving OLTP-style workloads where the persistent database image is stored in network-attached storage. The second technique, Zephyr, efficiently migrates live databases in a shared-nothing transactional database architecture.
Bio: Amr El Abbadi is currently Professor and Chair of the Computer Science Department at the University of California, Santa Barbara. He received his B. Eng. in Computer Science from Alexandria University, Egypt, and his Ph.D. in Computer Science from Cornell University in August 1987. Prof. El Abbadi is an ACM Fellow. He has served as a journal editor for several database journals, including, currently, The VLDB Journal. He has been Program Chair for multiple database and distributed systems conferences, most recently SIGSPATIAL GIS 2010 and the ACM Symposium on Cloud Computing (SoCC) 2011. He also served as a board member of the VLDB Endowment from 2002 to 2008. In 2007, Prof. El Abbadi received the UCSB Senate Outstanding Mentorship Award for his excellence in mentoring graduate students. He has published over 250 articles in databases and distributed systems.