The Core Mechanism of Google's Next-Generation Real-Time Search System

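Percolator is a system Google designed to handle real-time updates to very large data sets. Built on top of BigTable, it provides an efficient mechanism for processing and updating data. Because it processes data incrementally, Percolator can fold newly crawled web pages into the search index quickly. Compared with the earlier batch-processing approach, this greatly reduces the delay before an update appears in the index.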
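
The following minimal Python sketch makes the contrast between batch and incremental processing concrete. It assumes a toy in-memory inverted index (term → set of document IDs) and a naive whitespace tokenizer. The names `tokenize`, `build_index_batch`, and `add_document_incremental` are hypothetical and are not Percolator's API. The batch function rebuilds the whole index from every document. The incremental function touches only the posting lists of the one new document.

```python
from collections import defaultdict


def tokenize(text: str) -> set:
    """Hypothetical tokenizer: lowercase, split on whitespace, deduplicate."""
    return set(text.lower().split())


def build_index_batch(docs: dict) -> dict:
    """Batch style: rebuild the entire inverted index from every document."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def add_document_incremental(index: dict, doc_id: str, text: str) -> int:
    """Incremental style: fold one new document into an existing index.

    Only the posting lists for the new document's terms are updated.
    Returns the number of posting lists touched.
    """
    terms = tokenize(text)
    for term in terms:
        index.setdefault(term, set()).add(doc_id)
    return len(terms)


docs = {"d1": "percolator uses bigtable", "d2": "mapreduce runs batches"}
index = build_index_batch(docs)

# A newly crawled document arrives: update in place instead of rebuilding.
touched = add_document_incremental(index, "d3", "percolator processes updates")
docs["d3"] = "percolator processes updates"

assert index == build_index_batch(docs)  # same result as a full rebuild
print(touched)  # 3
```

Both paths produce the same index. The incremental path does work proportional to the new document rather than to the whole repository. The sketch only handles brand-new documents: a re-crawled page would also need its stale postings removed. Percolator additionally wraps such updates in cross-row transactions, which are not modeled here.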

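Google recently published a paper describing the core mechanism of its next-generation real-time search system, "Large-scale Incremental Processing Using Distributed Transactions and Notifications". The paper introduces Percolator, a BigTable-based system whose notification mechanism works much like triggers in a traditional database, but whose design for scalability is distinctive. The abstract, download link, and a related article follow.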
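
The trigger-like behaviour can be illustrated with a similarly small sketch. Everything here is a hypothetical single-process model: `ToyTable`, `observe`, and `run_observers` are illustrative names, not Percolator's API. Writing to a column that has a registered observer queues a notification. Draining the queue runs the observers, and their own writes can trigger further observers, so a chain of processing stages runs without a batch job.

```python
from collections import deque
from typing import Callable, Optional

# An observer receives the table and the row whose observed column changed.
Observer = Callable[["ToyTable", str], None]


class ToyTable:
    """Hypothetical in-memory stand-in for a BigTable-like table.

    Writes to an observed column queue a notification, loosely mirroring
    how Percolator observers behave like database triggers. Transactions,
    timestamps, and distribution are NOT modeled.
    """

    def __init__(self) -> None:
        self.cells: dict = {}          # (row, column) -> value
        self.observers: dict = {}      # column -> list of observers
        self.pending: deque = deque()  # queued (row, column) notifications

    def observe(self, column: str, fn: Observer) -> None:
        self.observers.setdefault(column, []).append(fn)

    def set(self, row: str, column: str, value: str) -> None:
        self.cells[(row, column)] = value
        if column in self.observers:
            # Mark the cell dirty rather than running the observer inline.
            self.pending.append((row, column))

    def get(self, row: str, column: str) -> Optional[str]:
        return self.cells.get((row, column))

    def run_observers(self) -> int:
        """Drain notifications; observer writes may queue further ones."""
        runs = 0
        while self.pending:
            row, column = self.pending.popleft()
            for fn in self.observers[column]:
                fn(self, row)
                runs += 1
        return runs


def parse(t: ToyTable, row: str) -> None:
    # Hypothetical first stage: normalize the raw document text.
    t.set(row, "clean", (t.get(row, "raw") or "").lower())


def count_words(t: ToyTable, row: str) -> None:
    # Hypothetical second stage, triggered by writes to "clean".
    t.set(row, "word_count", str(len((t.get(row, "clean") or "").split())))


table = ToyTable()
table.observe("raw", parse)
table.observe("clean", count_words)
table.set("example.com/a", "raw", "Percolator Uses Bigtable")
print(table.run_observers())                     # 2
print(table.get("example.com/a", "word_count"))  # 3
```

The paper's real mechanism differs in several ways. Each observer runs inside a distributed transaction. Changed cells are flagged in a special notify column, which worker processes scan. An acknowledgment column ensures that at most one observer transaction commits per change. The toy queue only conveys the "write triggers downstream processing" idea.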

Abstract

Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google's indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency.

We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.

Download

Related Articles

Google's Colossus Makes Search Real-Time By Dumping MapReduce
