WordCount example in Python for Hadoop Streaming, based on this tutorial:
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
Run the job with the following Hadoop command:
hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-dev-streaming.jar \
    -file mapper.py -mapper mapper.py \
    -file reducer.py -reducer reducer.py \
    -input /user/mapr/helloworld -output /user/mapr/helloworld-out2
Result:
mapr@mapr-desktop:~/mapreduce/wordcount$ hadoop dfs -cat /user/mapr/helloworld-out2/part-00000
hadoop! 1
hello 2
world! 1
OK!
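Because WordCount is just a word-frequency count, the job's output can be cross-checked locally with Python's collections.Counter. This is a minimal sketch, assuming (hypothetically) that /user/mapr/helloworld held the two lines "hello world!" and "hello hadoop!"; the real file contents are not shown above, and word_counts / format_like_hadoop are illustrative names, not part of the original scripts.

```python
from collections import Counter


def word_counts(lines):
    """Count whitespace-separated words (what mapper.py + reducer.py compute together)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def format_like_hadoop(counts):
    """Render 'word<TAB>count' lines sorted by word, mirroring part-00000."""
    return ['%s\t%s' % (word, counts[word]) for word in sorted(counts)]


# Hypothetical input: the actual contents of /user/mapr/helloworld are unknown.
sample = ['hello world!', 'hello hadoop!']
for out_line in format_like_hadoop(word_counts(sample)):
    print(out_line)
```

With this assumed input the sketch prints the same three lines as part-00000 above. Hadoop sorts keys before they reach the reducer, which is why the cluster output is also in word order.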
mapper.py
#!/usr/bin/env python
# Emit "word<TAB>1" for every whitespace-separated word read from stdin.
import sys
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print('%s\t%s' % (word, 1))
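The mapper's loop can be pulled into a generator so it can be tested on a plain list of strings instead of piping data through stdin. A sketch only: map_words is a hypothetical name, and the logic is the same as mapper.py above.

```python
def map_words(lines):
    """Yield a (word, 1) pair for every whitespace-separated word.

    Same logic as mapper.py, but over any iterable of lines.
    """
    for line in lines:
        # str.split() with no argument already ignores leading and trailing
        # whitespace, so a separate strip() is not required here.
        for word in line.split():
            yield word, 1


for word, one in map_words(['hello world!\n', 'hello hadoop!\n']):
    print('%s\t%s' % (word, one))
```

Inside mapper.py the body would then reduce to iterating over map_words(sys.stdin) and printing each pair in the same "word<TAB>1" format.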
reducer.py
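#!/usr/bin/env python
# Sum the counts for each word. This only works because Hadoop sorts the
# mapper output by key, so all lines for one word arrive consecutively.
import sys
current_word = None
current_count = 0
for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # Skip malformed lines (no tab, or a count that is not an integer).
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word is not None:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word
# Emit the last word (skipped when the input was empty).
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))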
The scripts are taken from the original article.
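The reducer's manual bookkeeping with current_word can also be expressed with itertools.groupby, which groups consecutive items that share a key. This is a sketch of that alternative, not the article's reducer; parse and reduce_counts are hypothetical names, and the sorted input lines below are assumed sample data.

```python
from itertools import groupby
from operator import itemgetter


def parse(lines):
    """Turn 'word<TAB>count' lines into (word, int) pairs, skipping malformed lines."""
    for line in lines:
        word, sep, count = line.strip().partition('\t')
        if not sep:
            continue  # no tab: not a mapper output line
        try:
            n = int(count)
        except ValueError:
            continue  # count is not an integer
        yield word, n


def reduce_counts(pairs):
    """Sum counts per word; assumes pairs arrive sorted by word, as Hadoop guarantees."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


sorted_input = ['hadoop!\t1\n', 'hello\t1\n', 'hello\t1\n', 'world!\t1\n']
for word, total in reduce_counts(parse(sorted_input)):
    print('%s\t%d' % (word, total))
```

groupby only merges adjacent equal keys, so it depends on the same sorted-input guarantee as the if/else version: on unsorted input a word would be emitted more than once.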