Distributed Web Search
An appealing way to scale Web search with the growth of the Internet is to use a distributed architecture. Distributed search engines rely on multiple sites deployed in distant regions across the world, where each site is specialized to serve the queries issued by users in its region.
Distributed search raises several challenges. To preserve the quality of the results, all documents should be taken into account when a query is evaluated. However, for scalability reasons, each search site can index only a subset of the documents. When a query matches documents that are not indexed locally, the search site has to contact the other sites in order to compute an exact result. Because the sites are located in different regions, this adds latency and reduces user satisfaction.
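The trade-off described above can be illustrated with a small sketch. It is not the architecture presented in the talk, only a toy model under stated assumptions: the names SearchSite and exact_search, the single-term queries, the latency figures, and the idea that remote sites are contacted in parallel are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class SearchSite:
    """Hypothetical regional site that indexes only a subset of the documents."""
    name: str
    # Toy inverted index: term -> ids of the documents indexed at this site.
    index: Dict[str, Set[str]] = field(default_factory=dict)

    def add_document(self, doc_id: str, terms: List[str]) -> None:
        for term in terms:
            self.index.setdefault(term, set()).add(doc_id)

    def local_search(self, term: str) -> Set[str]:
        # Return a copy so callers cannot mutate the index.
        return set(self.index.get(term, set()))


def exact_search(
    term: str,
    local: SearchSite,
    remotes: List[SearchSite],
    local_ms: float,
    remote_ms: Dict[str, float],
) -> Tuple[Set[str], float]:
    """Return (matching document ids, estimated latency in ms).

    Assumption: remote sites are queried in parallel, so the extra
    latency is the round trip to the slowest remote site.
    """
    results = local.local_search(term)
    latency = local_ms
    if remotes:
        for site in remotes:
            results |= site.local_search(term)
        latency += max(remote_ms[site.name] for site in remotes)
    return results, latency


# Usage: two sites, each indexing one document (made-up data).
europe = SearchSite("europe")
europe.add_document("d1", ["web", "search"])
asia = SearchSite("asia")
asia.add_document("d2", ["search"])

local_only = europe.local_search("search")
exact, latency = exact_search("search", europe, [asia], 20.0, {"asia": 150.0})
print(sorted(local_only), sorted(exact), latency)
```

In this toy setting, answering from the local index alone is fast but misses the document held by the other site, while the exact answer pays for the inter-site round trip. Where each document is indexed therefore determines how often that cost is paid.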
In this presentation, I will describe the general architecture of a distributed search engine. Then, I will focus on the problem of assigning new documents to a search site.