taskdb
One table per project, e.g.:
sqlite> .schema taskdb_test_meituan
CREATE TABLE `taskdb_test_meituan` (
taskid PRIMARY KEY,
project,
url, status,
schedule, fetch, process, track,
lastcrawltime, updatetime
);
CREATE INDEX `status_taskdb_test_meituan_index` ON `taskdb_test_meituan` (status);
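A minimal Python sketch of the one-table-per-project layout, assuming the table name is simply the prefix `taskdb_` followed by the project name (inferred from the single example above, not confirmed). The helper `project_tables` and the in-memory database are made up for illustration; only the schema is taken from the output above.

```python
import sqlite3

# Schema copied from the .schema output above.
SCHEMA = """
CREATE TABLE `taskdb_test_meituan` (
    taskid PRIMARY KEY,
    project,
    url, status,
    schedule, fetch, process, track,
    lastcrawltime, updatetime
);
CREATE INDEX `status_taskdb_test_meituan_index` ON `taskdb_test_meituan` (status);
"""


def project_tables(conn, prefix="taskdb_"):
    """List tables whose name starts with the assumed per-project prefix."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return sorted(name for (name,) in rows if name.startswith(prefix))


# In-memory database so the sketch does not touch a real task.db file.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = project_tables(conn)
print(tables)
```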
Field meanings
sqlite> select url, schedule, fetch from taskdb_test_meituan order by updatetime desc limit 2 ;
http://gj.meituan.com/category/jiafang/all/page1|{"priority": 2}|{"headers": {"Accept-Language": "zh-CN,zh;q=0.8", "Accept-Encoding": "gzip,deflate,sdch", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36", "Connection": "keep-alive", "Cache-Control": "max-age=0"}}
http://www.meituan.com/shop/2010830|{"priority": 2, "age": 172800}|{"headers": {"Accept-Language": "zh-CN,zh;q=0.8", "Accept-Encoding": "gzip,deflate,sdch", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36", "Connection": "keep-alive", "Cache-Control": "max-age=0"}}
schedule: presumably the contents of the @config decorator on the functions of the Handler class (e.g. priority, age)
fetch: the crawl_config dict (e.g. the request headers)
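The query above can also be run from Python, decoding the two columns into dicts. This is only a sketch: it assumes `schedule` and `fetch` are stored as JSON text (which is how they look in the output above), and the helper `latest_tasks`, the task id `demo-task`, the `updatetime` value, and the shortened header set are placeholders made up for illustration.

```python
import json
import sqlite3


def latest_tasks(conn, table, limit=2):
    """Return (url, schedule dict, fetch dict) for the most recently updated tasks."""
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not table.replace("_", "").isalnum():
        raise ValueError("unexpected table name: %r" % table)
    sql = (
        "SELECT url, schedule, fetch FROM `%s` "
        "ORDER BY updatetime DESC LIMIT ?" % table
    )
    return [
        (url, json.loads(schedule or "{}"), json.loads(fetch or "{}"))
        for url, schedule, fetch in conn.execute(sql, (limit,))
    ]


# Demo data in an in-memory database; values mirror the sample row above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taskdb_test_meituan ("
    "taskid PRIMARY KEY, project, url, status, "
    "schedule, fetch, process, track, lastcrawltime, updatetime)"
)
conn.execute(
    "INSERT INTO taskdb_test_meituan (taskid, url, schedule, fetch, updatetime) "
    "VALUES (?, ?, ?, ?, ?)",
    (
        "demo-task",  # placeholder task id
        "http://www.meituan.com/shop/2010830",
        '{"priority": 2, "age": 172800}',
        '{"headers": {"Connection": "keep-alive"}}',  # shortened header set
        1.0,  # placeholder updatetime
    ),
)

tasks = latest_tasks(conn, "taskdb_test_meituan")
for url, schedule, fetch in tasks:
    print(url, schedule.get("priority"), schedule.get("age"), sorted(fetch.get("headers", {})))
```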