Problem: nutch Content of size 94218 was truncated to 65536
Fix: add the file.content.limit and http.content.limit properties to nutch-site.xml and change their value from the default 65536 to -1 (no truncation).
Then add the following to MySQL's my.ini:
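The nutch-site.xml change above can be scripted. The sketch below uses Python's standard xml.etree.ElementTree to add or update the two properties under the root <configuration> element. The function names set_property and disable_content_limits are hypothetical helpers, not part of Nutch. The only facts taken from Nutch are the property names and that a negative value disables truncation.

```python
import xml.etree.ElementTree as ET


def set_property(root: ET.Element, name: str, value: str) -> None:
    """Add or update a Hadoop-style <property> entry under <configuration>."""
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already present: overwrite (or create) its <value>.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    # Property missing: append a new <property><name/><value/></property>.
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


def disable_content_limits(xml_text: str) -> str:
    """Set http.content.limit and file.content.limit to -1 (no truncation)."""
    root = ET.fromstring(xml_text)
    for name in ("http.content.limit", "file.content.limit"):
        set_property(root, name, "-1")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical minimal nutch-site.xml body, used only for illustration.
    sample = (
        "<configuration>"
        "<property><name>http.content.limit</name><value>65536</value></property>"
        "</configuration>"
    )
    print(disable_content_limits(sample))
```

In practice you would read and rewrite your own conf/nutch-site.xml (that path is an assumption about your install layout). Updating existing entries instead of blindly appending avoids leaving two conflicting definitions of the same property.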
[mysqldump]
quick
max_allowed_packet=20M
Problem:
Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: null, jobid=job_local177967844_0002
	at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
Investigation showed that a null value was being passed to utf8, and the log reported a NullPointerException.
Fix: set the following property (in nutch-site.xml):
<property>
  <name>generate.batch.id</name>
  <value>*</value>
</property>
Problem: missing batchId
Fix: add a batchId column to the webpage table: batchId varchar(767) default null
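The column can be added with a single ALTER TABLE statement. The sketch below runs that statement over any Python DB-API 2.0 connection. The helper name add_batch_id_column is hypothetical. The table name webpage and the column definition come from the note. SQLite is used only as an in-memory stand-in to show that the statement runs. Against MySQL you would pass a connection from a MySQL driver instead.

```python
import sqlite3

# Column definition from the note: nullable batchId on Nutch's webpage table.
ADD_BATCH_ID_DDL = "ALTER TABLE webpage ADD COLUMN batchId VARCHAR(767) DEFAULT NULL"


def add_batch_id_column(conn) -> None:
    """Execute the ALTER TABLE on a DB-API 2.0 connection and commit it."""
    cur = conn.cursor()
    try:
        cur.execute(ADD_BATCH_ID_DDL)
        conn.commit()
    finally:
        cur.close()


if __name__ == "__main__":
    # SQLite stand-in for the MySQL webpage table (hypothetical minimal schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE webpage (id VARCHAR(767) PRIMARY KEY)")
    add_batch_id_column(conn)
    print([row[1] for row in conn.execute("PRAGMA table_info(webpage)")])
```

The statement is not idempotent. Running it a second time fails because the column already exists, so apply it once per database.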