14/10/11 10:00:04 WARN crawl.Crawl: solrUrl is not set, indexing will be skipped...
14/10/11 10:00:06 INFO crawl.Crawl: crawl started in: crawl
14/10/11 10:00:06 INFO crawl.Crawl: rootUrlDir = data/urls
14/10/11 10:00:06 INFO crawl.Crawl: threads = 100
14/10/11 10:00:06 INFO crawl.Crawl: depth = 3
14/10/11 10:00:06 INFO crawl.Crawl: solrUrl=null
14/10/11 10:00:06 INFO crawl.Crawl: topN = 100
14/10/11 10:00:07 INFO crawl.Injector: Injector: starting at 2014-10-11 10:00:07
14/10/11 10:00:07 INFO crawl.Injector: Injector: crawlDb: crawl/crawldb
14/10/11 10:00:07 INFO crawl.Injector: Injector: urlDir: data/urls
14/10/11 10:00:07 INFO Configuration.deprecation: mapred.temp.dir is deprecated. Instead, use mapreduce.cluster.temp.dir
14/10/11 10:00:07 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
14/10/11 10:00:09 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:00:10 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:00:20 INFO mapred.FileInputFormat: Total input paths to process : 1
14/10/11 10:00:20 INFO mapreduce.JobSubmitter: number of splits:2
14/10/11 10:00:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1412989543453_0001
14/10/11 10:00:21 INFO impl.YarnClientImpl: Submitted application application_1412989543453_0001
14/10/11 10:00:21 INFO mapreduce.Job: The url to track the job: http://idc66:8088/proxy/application_1412989543453_0001/
14/10/11 10:00:21 INFO mapreduce.Job: Running job: job_1412989543453_0001
14/10/11 10:00:39 INFO mapreduce.Job: Job job_1412989543453_0001 running in uber mode : false
14/10/11 10:00:39 INFO mapreduce.Job: map 0% reduce 0%
14/10/11 10:01:26 INFO mapreduce.Job: map 100% reduce 0%
14/10/11 10:01:51 INFO mapreduce.Job: map 100% reduce 100%
14/10/11 10:01:52 INFO mapreduce.Job: Job job_1412989543453_0001 completed successfully
14/10/11 10:01:52 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=6
		FILE: Number of bytes written=351842
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=245
		HDFS: Number of bytes written=86
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=179870
		Total time spent by all reduces in occupied slots (ms)=53334
		Total time spent by all map tasks (ms)=89935
		Total time spent by all reduce tasks (ms)=17778
		Total vcore-seconds taken by all map tasks=89935
		Total vcore-seconds taken by all reduce tasks=17778
		Total megabyte-seconds taken by all map tasks=138140160
		Total megabyte-seconds taken by all reduce tasks=54614016
	Map-Reduce Framework
		Map input records=2
		Map output records=0
		Map output bytes=0
		Map output materialized bytes=12
		Input split bytes=200
		Combine input records=0
		Combine output records=0
		Reduce input groups=0
		Reduce shuffle bytes=12
		Reduce input records=0
		Reduce output records=0
		Spilled Records=0
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=8397
		CPU time spent (ms)=8130
		Physical memory (bytes) snapshot=1627553792
		Virtual memory (bytes) snapshot=7534977024
		Total committed heap usage (bytes)=1441071104
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	injector
		urls_filtered=2
	File Input Format Counters
		Bytes Read=45
	File Output Format Counters
		Bytes Written=86
14/10/11 10:01:52 INFO crawl.Injector: Injector: total number of urls rejected by filters: 2
14/10/11 10:01:52 INFO crawl.Injector: Injector: total number of urls injected after normalization and filtering: 0
14/10/11 10:01:52 INFO crawl.Injector: Injector: Merging injected urls into crawl db.
14/10/11 10:01:52 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:01:52 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:01:58 INFO mapred.FileInputFormat: Total input paths to process : 1
14/10/11 10:01:58 INFO mapreduce.JobSubmitter: number of splits:1
14/10/11 10:01:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1412989543453_0002
14/10/11 10:01:58 INFO impl.YarnClientImpl: Submitted application application_1412989543453_0002
14/10/11 10:01:58 INFO mapreduce.Job: The url to track the job: http://idc66:8088/proxy/application_1412989543453_0002/
14/10/11 10:01:58 INFO mapreduce.Job: Running job: job_1412989543453_0002
14/10/11 10:02:33 INFO mapreduce.Job: Job job_1412989543453_0002 running in uber mode : false
14/10/11 10:02:33 INFO mapreduce.Job: map 0% reduce 0%
14/10/11 10:02:40 INFO mapreduce.Job: map 100% reduce 0%
14/10/11 10:02:49 INFO mapreduce.Job: map 100% reduce 100%
14/10/11 10:02:49 INFO mapreduce.Job: Job job_1412989543453_0002 completed successfully
14/10/11 10:02:49 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=6
		FILE: Number of bytes written=234971
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=230
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=10712
		Total time spent by all reduces in occupied slots (ms)=17775
		Total time spent by all map tasks (ms)=5356
		Total time spent by all reduce tasks (ms)=5925
		Total vcore-seconds taken by all map tasks=5356
		Total vcore-seconds taken by all reduce tasks=5925
		Total megabyte-seconds taken by all map tasks=8226816
		Total megabyte-seconds taken by all reduce tasks=18201600
	Map-Reduce Framework
		Map input records=0
		Map output records=0
		Map output bytes=0
		Map output materialized bytes=6
		Input split bytes=144
		Combine input records=0
		Combine output records=0
		Reduce input groups=0
		Reduce shuffle bytes=6
		Reduce input records=0
		Reduce output records=0
		Spilled Records=0
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=44
		CPU time spent (ms)=2470
		Physical memory (bytes) snapshot=448868352
		Virtual memory (bytes) snapshot=5610287104
		Total committed heap usage (bytes)=724303872
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=86
	File Output Format Counters
		Bytes Written=215
14/10/11 10:02:49 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:02:49 INFO crawl.Injector: Injector: finished at 2014-10-11 10:02:49, elapsed: 00:02:41
14/10/11 10:02:49 INFO crawl.Generator: Generator: starting at 2014-10-11 10:02:49
14/10/11 10:02:49 INFO crawl.Generator: Generator: Selecting best-scoring urls due for fetch.
14/10/11 10:02:49 INFO crawl.Generator: Generator: filtering: true
14/10/11 10:02:49 INFO crawl.Generator: Generator: normalizing: true
14/10/11 10:02:49 INFO crawl.Generator: Generator: topN: 100
14/10/11 10:02:49 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
14/10/11 10:02:49 INFO crawl.Generator: Generator: jobtracker is 'local', generating exactly one partition.
14/10/11 10:02:49 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:02:49 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:02:55 INFO mapred.FileInputFormat: Total input paths to process : 1
14/10/11 10:02:55 INFO mapreduce.JobSubmitter: number of splits:1
14/10/11 10:02:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1412989543453_0003
14/10/11 10:02:55 INFO impl.YarnClientImpl: Submitted application application_1412989543453_0003
14/10/11 10:02:55 INFO mapreduce.Job: The url to track the job: http://idc66:8088/proxy/application_1412989543453_0003/
14/10/11 10:02:55 INFO mapreduce.Job: Running job: job_1412989543453_0003
14/10/11 10:03:08 INFO mapreduce.Job: Job job_1412989543453_0003 running in uber mode : false
14/10/11 10:03:08 INFO mapreduce.Job: map 0% reduce 0%
14/10/11 10:03:17 INFO mapreduce.Job: map 100% reduce 0%
14/10/11 10:04:15 INFO mapreduce.Job: map 100% reduce 100%
14/10/11 10:04:15 INFO mapreduce.Job: Job job_1412989543453_0003 completed successfully
14/10/11 10:04:15 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=6
		FILE: Number of bytes written=237427
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=205
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=5
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=12016
		Total time spent by all reduces in occupied slots (ms)=164820
		Total time spent by all map tasks (ms)=6008
		Total time spent by all reduce tasks (ms)=54940
		Total vcore-seconds taken by all map tasks=6008
		Total vcore-seconds taken by all reduce tasks=54940
		Total megabyte-seconds taken by all map tasks=9228288
		Total megabyte-seconds taken by all reduce tasks=168775680
	Map-Reduce Framework
		Map input records=0
		Map output records=0
		Map output bytes=0
		Map output materialized bytes=6
		Input split bytes=119
		Combine input records=0
		Combine output records=0
		Reduce input groups=0
		Reduce shuffle bytes=6
		Reduce input records=0
		Reduce output records=0
		Spilled Records=0
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=289
		CPU time spent (ms)=3510
		Physical memory (bytes) snapshot=929054720
		Virtual memory (bytes) snapshot=5619216384
		Total committed heap usage (bytes)=763297792
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=86
	File Output Format Counters
		Bytes Written=0
14/10/11 10:04:15 WARN crawl.Generator: Generator: 0 records selected for fetching, exiting ...
14/10/11 10:04:15 INFO crawl.Crawl: Stopping at depth=0 - no more URLs to fetch.
14/10/11 10:04:15 WARN crawl.Crawl: No URLs to fetch - check your seed list and URL filters.
14/10/11 10:04:15 INFO crawl.Crawl: crawl finished: crawl