- Nutch launch entry point: bin/crawl calls bin/nutch.
- Nutch study-notes series, with a detailed walkthrough of each Nutch tool: http://blog.youkuaiyun.com/gobitan/article/category/284793
- Official wiki: http://wiki.apache.org/nutch/FrontPage#Nutch_1.X_tutorial.28s.29
- Official tutorials: http://wiki.apache.org/nutch/#Tutorials
Nutch 1.X tutorial(s)
- NutchTutorial - How to configure Nutch to crawl in local mode and post to Apache Solr for search/index.
- QuickStartparseChecker - Quick start tutorial on how to use the ParseChecker tool to quickly scrape a website.
- https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI - An overview of the entire Nutch 1.X REST API.
- Nutch and Hadoop Tutorial (found on google): https://wiki.apache.org/nutch/NutchHadoopTutorial
Option 2: Set up Nutch from a source distribution
Advanced users may also use the source distribution:
- Download a source package (apache-nutch-1.X-src.zip)
- Unzip
- cd apache-nutch-1.X/
- Run ant in this folder (cf. RunNutchInEclipse)
- Now there is a directory runtime/local which contains a ready to use Nutch installation.
When the source distribution is used ${NUTCH_RUNTIME_HOME} refers to apache-nutch-1.X/runtime/local/. Note that:
- config files should be modified in apache-nutch-1.X/runtime/local/conf/
- ant clean will remove this directory (keep copies of modified config files)
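The source-distribution steps above can be sketched as a few shell helpers. This is only a rough sketch under assumptions: the function names (nutch_runtime_home, build_nutch_from_source, backup_nutch_conf) are made up for illustration, "1.X" is the placeholder from the wiki text rather than a real version, and the zip is assumed to sit in the current directory and to unpack into apache-nutch-1.X/.

```shell
# Hypothetical helpers (not part of Nutch) sketching the source-build steps.

# Print the runtime/local path for a given source directory.
# A single trailing slash on the argument is dropped.
nutch_runtime_home() {
  printf '%s/runtime/local\n' "${1%/}"
}

# Unzip the source package, run ant in the unpacked folder, and print the
# resulting NUTCH_RUNTIME_HOME. Assumes unzip and ant are on the PATH and
# that the archive unpacks into a directory named after the zip without
# its "-src.zip" suffix (e.g. apache-nutch-1.X).
build_nutch_from_source() {
  src_zip="$1"
  src_dir="${src_zip%-src.zip}"
  unzip -q "$src_zip" || return 1
  ( cd "$src_dir" && ant ) || return 1
  nutch_runtime_home "$src_dir"
}

# Copy the edited config files out of runtime/local/conf, because
# `ant clean` removes the whole runtime/local directory.
backup_nutch_conf() {
  runtime_home="$1"
  backup_dir="$2"
  mkdir -p "$backup_dir" && cp -R "$runtime_home/conf/." "$backup_dir/"
}
```

A possible usage, again assuming the placeholder file name: `NUTCH_RUNTIME_HOME="$(build_nutch_from_source apache-nutch-1.X-src.zip)"`, then `backup_nutch_conf "$NUTCH_RUNTIME_HOME" ./nutch-conf-backup` before any `ant clean`.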
- Running on Hadoop: http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial
- A guide that looks fairly reliable, "Setting up a Nutch + Hadoop cluster": http://www.open-open.com/lib/view/open1328670771405.html
Deploying Nutch on Hadoop, and other Nutch-related notes