Zenith seminar: “Improving the Efficiency of Multi-site Web Search Engines”, Xiao Bai, Jan 31, 2014

Zenith Seminar
30/01 at 10:30 am, room 227, Galera
Improving the Efficiency of Multi-site Web Search Engines
Xiao Bai – Yahoo Labs Barcelona
Abstract: 
A multi-site web search engine is composed of a number of search sites geographically distributed around the world. Each search site is typically responsible for crawling and indexing the web pages that are in its geographical neighborhood. A query is selectively processed on a subset of search sites that are predicted to return the best-matching results. The scalability and efficiency of multi-site web search engines have attracted a lot of research attention in recent years. In particular, research has focused on replicating important web pages across sites, forwarding queries to relevant sites, and caching results of previous queries. Yet these problems have only been studied in isolation, and no prior work has properly investigated the interplay between them.
In this talk, I will present what we believe is the first comprehensive analysis of a full stack of techniques for efficient multi-site web search. Specifically, we propose a document replication technique that improves the query locality of the state-of-the-art approaches with various replication budget distribution strategies. We devise a machine learning approach to decide the query forwarding patterns, achieving a significantly lower false positive ratio than a state-of-the-art thresholding approach with little negative impact on search result quality. We propose three result caching strategies that reduce the number of forwarded queries and analyze the trade-off they introduce in terms of storage and network overheads. Finally, we show that the combination of the best-in-class techniques yields very promising search efficiency, rendering multi-site, geographically distributed web search engines an attractive alternative to centralized web search engines.
Short Bio: Xiao Bai is a research scientist at Yahoo Labs Barcelona. Before joining Yahoo, she received her Ph.D. from INRIA Rennes (France) in 2010. She obtained her Bachelor’s Degree and Master’s Degree from Xi’an Jiaotong University (China) in 2004 and 2007, respectively. Between 2002 and 2004, she studied at Ecole Centrale de Lyon (France) within a Franco-Chinese exchange program and obtained her Engineer Degree (Diplôme d’Ingénieur). Her research interests include distributed data management, Web search, and social networks. She has worked on a range of problems, such as personalized query processing in P2P systems, Web search (including web crawling, distributed architecture, and efficiency optimization), content recommendation, and caching mechanisms for social applications.

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-xiao-bai-improving-the-efficiency-of-multi-site-web-search-engines-jan-31-10-30-am/