Larbin Source Code Analysis 1: Introduction

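Larbin is an open source web crawler written in C++, mainly used to fetch large numbers of web pages to fill a search engine's database. Given sufficient network speed, it can fetch more than 100 million pages on a standard PC. Note that Larbin only fetches pages; it does not parse page content or build indexes, so users need to write their own code to extend it.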

What Larbin can do

Larbin is an open source web crawler written in C++. It is intended to fetch a large number of web pages to fill the database of a search engine. With a fast enough network, Larbin should be able to fetch more than 100 million pages on a standard PC.

What Larbin cannot do

Larbin is just a web crawler, NOT an indexer. It gives us the raw material to cook our dinner, NOT the dinner itself! We have to write some code to extend it, giving it the ability to interpret the information, generate indexes for our database, and support other custom applications.
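To make the "raw material vs. dinner" split concrete, here is a minimal sketch of the kind of indexing code a user would have to write on top of a crawler. Everything in it is an assumption for illustration: `FetchedPage`, `tokenize`, `indexPage` and `InvertedIndex` are hypothetical names, not part of Larbin, and the page text is assumed to have already been extracted from the fetched HTML.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical record for one fetched page (NOT a Larbin type).
struct FetchedPage {
    std::string url;   // where the page came from
    std::string text;  // page text, assumed already stripped of markup
};

// Split text into lowercase alphanumeric tokens.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Inverted index: each term maps to the set of URLs that contain it.
using InvertedIndex = std::map<std::string, std::set<std::string>>;

// Add every term of one page to the index.
void indexPage(InvertedIndex& index, const FetchedPage& page) {
    for (const std::string& term : tokenize(page.text)) {
        index[term].insert(page.url);
    }
}
```

A real indexer would also need HTML parsing, persistent storage and ranking; the point here is only that this layer sits outside the crawler and has to be supplied by us.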

OK, that ends the prologue. If the introduction above does not fully satisfy your curiosity, maybe this website can provide some further introductory information. Next, we will go deeper.

What this article will NOT cover

(but provides links that may answer your questions)

Compile and Run Larbin

Maybe you still have trouble getting Larbin to work even after reading the website above.

How to customize Larbin

OK, let’s walk into the Larbin World!
