Stage Assignment 1: Complete Chinese and English Word-Frequency Count

strBig ='''Big Big World

Emilia

I'm a big big girl

In a big big world




It's not a big big thing if you leave me

But I do do feel

that I too too will miss you much

Miss you much.

I can see the first leaf falling

It's all yellow and nice

It's so very cold outside

Like the way I'm feeling inside

I'm a big big girl

In a big big world

It's not a big big thing if you leave me

But I do do feel

that I too too will miss you much

Miss you much.

Outside it's now raining

And tears are falling from my eyes

Why did it have to happen

Why did it all have to end

I'm a big big girl

In a big big world

It's not a big big thing if you leave me

But I do do feel

that I too too will miss you much

Miss you much.

I have your arms around me warm like fire

But when I open my eyes

You're gone.

I'm a big big girl

In a big big world

It's not a big big thing if you leave me

But I do do feel

that I too too will miss you much

Miss you much.

I'm a big big girl

In a big big world

It's not a big big thing if you leave me

But I do feel that will miss you much

Miss you much.
'''

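# Read the lyrics from a file and convert them to lowercase.
# Note: the counting below works on strBig, not on big, so it is case-sensitive.
with open('bigbigworld.txt', 'r', encoding='utf-8') as fo:
    big = fo.read().lower()
print(big)

# String preprocessing: replace punctuation and separators with spaces
sep = ''' .,:;?!-_'''
for ch in sep:
    strBig = strBig.replace(ch, ' ')

# Split the text into a list of words
strList = strBig.split()
print(len(strList), strList)

# Deduplicate to get the distinct words
strSet = set(strList)
print(len(strSet), strSet)

# Count the occurrences of each distinct word
strDict = {}
for word in strSet:
    strDict[word] = strList.count(word)

print(len(strDict), strDict)

# Convert to a list of (word, count) pairs and sort by count, descending
wcList = list(strDict.items())
print(wcList)
wcList.sort(key=lambda x: x[1], reverse=True)
print(wcList)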
Running the script prints the following (PyCharm console):

C:\Users\Administrator\AppData\Local\Programs\Python\Python36\python.exe C:/Users/Administrator/PycharmProjects/untitled/bcd.py
big big world

emilia

i'm a big big girl

in a big big world




it's not a big big thing if you leave me

but i do do feel

that i too too will miss you much

miss you much.

i can see the first leaf falling

it's all yellow and nice

it's so very cold outside

like the way i'm feeling inside

i'm a big big girl

in a big big world

it's not a big big thing if you leave me

but i do do feel

that i too too will miss you much

miss you much.

outside it's now raining

and tears are falling from my eyes

why did it have to happen

why did it all have to end

i'm a big big girl

in a big big world

it's not a big big thing if you leave me

but i do do feel

that i too too will miss you much

miss you much.

i have your arms around me warm like fire

but when i open my eyes

you're gone.

i'm a big big girl

in a big big world

it's not a big big thing if you leave me

but i do do feel

that i too too will miss you much

miss you much.

i'm a big big girl

in a big big world

it's not a big big thing if you leave me

but i do feel that will miss you much

miss you much.

244 ['Big', 'Big', 'World', 'Emilia', "I'm", 'a', 'big', 'big', 'girl', 'In', 'a', 'big', 'big', 'world', "It's", 'not', 'a', 'big', 'big', 'thing', 'if', 'you', 'leave', 'me', 'But', 'I', 'do', 'do', 'feel', 'that', 'I', 'too', 'too', 'will', 'miss', 'you', 'much', 'Miss', 'you', 'much', 'I', 'can', 'see', 'the', 'first', 'leaf', 'falling', "It's", 'all', 'yellow', 'and', 'nice', "It's", 'so', 'very', 'cold', 'outside', 'Like', 'the', 'way', "I'm", 'feeling', 'inside', "I'm", 'a', 'big', 'big', 'girl', 'In', 'a', 'big', 'big', 'world', "It's", 'not', 'a', 'big', 'big', 'thing', 'if', 'you', 'leave', 'me', 'But', 'I', 'do', 'do', 'feel', 'that', 'I', 'too', 'too', 'will', 'miss', 'you', 'much', 'Miss', 'you', 'much', 'Outside', "it's", 'now', 'raining', 'And', 'tears', 'are', 'falling', 'from', 'my', 'eyes', 'Why', 'did', 'it', 'have', 'to', 'happen', 'Why', 'did', 'it', 'all', 'have', 'to', 'end', "I'm", 'a', 'big', 'big', 'girl', 'In', 'a', 'big', 'big', 'world', "It's", 'not', 'a', 'big', 'big', 'thing', 'if', 'you', 'leave', 'me', 'But', 'I', 'do', 'do', 'feel', 'that', 'I', 'too', 'too', 'will', 'miss', 'you', 'much', 'Miss', 'you', 'much', 'I', 'have', 'your', 'arms', 'around', 'me', 'warm', 'like', 'fire', 'But', 'when', 'I', 'open', 'my', 'eyes', "You're", 'gone', "I'm", 'a', 'big', 'big', 'girl', 'In', 'a', 'big', 'big', 'world', "It's", 'not', 'a', 'big', 'big', 'thing', 'if', 'you', 'leave', 'me', 'But', 'I', 'do', 'do', 'feel', 'that', 'I', 'too', 'too', 'will', 'miss', 'you', 'much', 'Miss', 'you', 'much', "I'm", 'a', 'big', 'big', 'girl', 'In', 'a', 'big', 'big', 'world', "It's", 'not', 'a', 'big', 'big', 'thing', 'if', 'you', 'leave', 'me', 'But', 'I', 'do', 'feel', 'that', 'will', 'miss', 'you', 'much', 'Miss', 'you', 'much']
71 {'feeling', 'tears', 'falling', 'And', 'have', "I'm", 'it', 'see', 'Why', 'fire', 'leave', 'not', 'a', 'But', 'end', 'Emilia', 'feel', 'will', 'the', 'nice', 'arms', 'when', 'too', 'now', "You're", 'that', 'world', 'can', 'so', 'from', 'miss', 'raining', 'girl', 'In', 'leaf', 'did', 'cold', 'first', 'open', 'Big', 'Outside', "it's", 'my', 'you', 'are', 'much', 'way', 'your', 'yellow', 'warm', 'if', 'big', 'Like', "It's", 'happen', 'around', 'inside', 'World', 'and', 'outside', 'like', 'Miss', 'do', 'thing', 'all', 'I', 'very', 'to', 'eyes', 'gone', 'me'}
71 {'feeling': 1, 'tears': 1, 'falling': 2, 'And': 1, 'have': 3, "I'm": 6, 'it': 2, 'see': 1, 'Why': 2, 'fire': 1, 'leave': 5, 'not': 5, 'a': 15, 'But': 6, 'end': 1, 'Emilia': 1, 'feel': 5, 'will': 5, 'the': 2, 'nice': 1, 'arms': 1, 'when': 1, 'too': 8, 'now': 1, "You're": 1, 'that': 5, 'world': 5, 'can': 1, 'so': 1, 'from': 1, 'miss': 5, 'raining': 1, 'girl': 5, 'In': 5, 'leaf': 1, 'did': 2, 'cold': 1, 'first': 1, 'open': 1, 'Big': 2, 'Outside': 1, "it's": 1, 'my': 2, 'you': 15, 'are': 1, 'much': 10, 'way': 1, 'your': 1, 'yellow': 1, 'warm': 1, 'if': 5, 'big': 30, 'Like': 1, "It's": 7, 'happen': 1, 'around': 1, 'inside': 1, 'World': 1, 'and': 1, 'outside': 1, 'like': 1, 'Miss': 5, 'do': 9, 'thing': 5, 'all': 2, 'I': 12, 'very': 1, 'to': 2, 'eyes': 2, 'gone': 1, 'me': 6}
[('feeling', 1), ('tears', 1), ('falling', 2), ('And', 1), ('have', 3), ("I'm", 6), ('it', 2), ('see', 1), ('Why', 2), ('fire', 1), ('leave', 5), ('not', 5), ('a', 15), ('But', 6), ('end', 1), ('Emilia', 1), ('feel', 5), ('will', 5), ('the', 2), ('nice', 1), ('arms', 1), ('when', 1), ('too', 8), ('now', 1), ("You're", 1), ('that', 5), ('world', 5), ('can', 1), ('so', 1), ('from', 1), ('miss', 5), ('raining', 1), ('girl', 5), ('In', 5), ('leaf', 1), ('did', 2), ('cold', 1), ('first', 1), ('open', 1), ('Big', 2), ('Outside', 1), ("it's", 1), ('my', 2), ('you', 15), ('are', 1), ('much', 10), ('way', 1), ('your', 1), ('yellow', 1), ('warm', 1), ('if', 5), ('big', 30), ('Like', 1), ("It's", 7), ('happen', 1), ('around', 1), ('inside', 1), ('World', 1), ('and', 1), ('outside', 1), ('like', 1), ('Miss', 5), ('do', 9), ('thing', 5), ('all', 2), ('I', 12), ('very', 1), ('to', 2), ('eyes', 2), ('gone', 1), ('me', 6)]
[('big', 30), ('a', 15), ('you', 15), ('I', 12), ('much', 10), ('do', 9), ('too', 8), ("It's", 7), ("I'm", 6), ('But', 6), ('me', 6), ('leave', 5), ('not', 5), ('feel', 5), ('will', 5), ('that', 5), ('world', 5), ('miss', 5), ('girl', 5), ('In', 5), ('if', 5), ('Miss', 5), ('thing', 5), ('have', 3), ('falling', 2), ('it', 2), ('Why', 2), ('the', 2), ('did', 2), ('Big', 2), ('my', 2), ('all', 2), ('to', 2), ('eyes', 2), ('feeling', 1), ('tears', 1), ('And', 1), ('see', 1), ('fire', 1), ('end', 1), ('Emilia', 1), ('nice', 1), ('arms', 1), ('when', 1), ('now', 1), ("You're", 1), ('can', 1), ('so', 1), ('from', 1), ('raining', 1), ('leaf', 1), ('cold', 1), ('first', 1), ('open', 1), ('Outside', 1), ("it's", 1), ('are', 1), ('way', 1), ('your', 1), ('yellow', 1), ('warm', 1), ('Like', 1), ('happen', 1), ('around', 1), ('inside', 1), ('World', 1), ('and', 1), ('outside', 1), ('like', 1), ('very', 1), ('gone', 1)]

Process finished with exit code 0
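The script reads the file into `big` in lowercase but then counts the words in `strBig`. As a result, case variants such as `Big`/`big` and `It's`/`it's` are tallied separately, as the output shows. It also scans the whole list with `list.count` once per distinct word.

Below is a minimal sketch of a case-insensitive variant. It is not the assignment's required solution; `word_freq` is a hypothetical name. It keeps the original separator set, so contractions such as `i'm` stay single tokens. It swaps the set-plus-count loop for `collections.Counter`, which tallies every word in one pass.

```python
from collections import Counter

# Same separator characters as the original script's `sep`
SEP = ' .,:;?!-_'


def word_freq(text):
    """Return (word, count) pairs sorted by descending count, ignoring case."""
    text = text.lower()  # fold case so "Big" and "big" count as one word
    for ch in SEP:
        text = text.replace(ch, ' ')  # turn separators into spaces
    # Counter tallies all words in one pass; most_common() sorts by count
    return Counter(text.split()).most_common()
```

For example, `word_freq(big)[:10]` gives the ten most frequent words of the file contents already read into `big`. Words with equal counts keep the order in which they first appear.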

 

Reposted from: https://www.cnblogs.com/aaaadaztz/p/9712219.html

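### Counting Word Frequencies in a TXT File with Hadoop MapReduce

Getting the expected result takes a sequence of steps that make sure the file is processed correctly.

#### Prepare the environment

Configure Hadoop on Linux and start the cluster services. Confirm the Hadoop installation path and set the required environment variables so that HDFS commands run correctly[^2].

#### Create a test text file

Create a text file containing at least ten thousand English words, named `wordtests.txt` (or something else, such as `zjh.txt`). You will upload it to HDFS and use it for the word-count experiment[^3].

#### Upload the text file to HDFS

Use the command-line tools to upload the prepared file to `/user/hadoop/input`. The relative path `input` resolves to `/user/<current user>/input`, so it matches `/user/hadoop/input` when you run as the `hadoop` user. Create the directory first; otherwise `-put` creates a file named `input` instead of copying into a directory:

```bash
cd /usr/local/hadoop
./bin/hdfs dfs -mkdir -p input
./bin/hdfs dfs -put ./wordtests.txt input
```

Verify that the file was uploaded:

```bash
./bin/hdfs dfs -ls input
```

This step places the source data in the distributed file system, where the following steps can read it.

#### Write the Python mapper script

The mapper, `mapper.py`, reads each line from standard input and splits it into words. For each word it writes the word, a tab, and the count `1` to standard output:

```python
#!/usr/bin/env python3
import sys

# Emit "<word>\t1" for every whitespace-separated word on stdin
for line in sys.stdin:
    words = line.strip().split()
    for word in words:
        print(f"{word}\t1")
```

Make the script executable:

```bash
chmod +x mapper.py
```

This is the map-side task: it turns the raw document into key-value pairs, pairing each word with the count "1". These pairs are the input for aggregating identical words later[^4].

#### Write the reducer script

The reducer is also written in Python. It receives the pairs emitted by the mappers and sums the counts for each word to produce the final result. A simplified `reducer.py` (give it execute permission with `chmod +x` as well):

```python
#!/usr/bin/env python3
import sys

current_word = None
current_count = 0

# Hadoop Streaming sorts mapper output by key, so all lines for a
# given word arrive one after another
for line in sys.stdin:
    try:
        word, count = line.rstrip('\n').split('\t', 1)
        count = int(count)
    except ValueError:
        # Skip malformed lines
        continue
    if current_word is not None and word != current_word:
        # A new word started: emit the total for the previous one
        print(f'{current_word}\t{current_count}')
        current_count = 0
    current_word = word
    current_count += count

# Emit the total for the last word
if current_word is not None:
    print(f'{current_word}\t{current_count}')
```

The program reads every line from standard input. Whenever the key changes, it prints the total accumulated for the previous word. After the loop it also prints the total for the last word.

#### Submit the job to the YARN cluster scheduler

Use the `hadoop-streaming` JAR to submit both stages, together with their script files, to Apache YARN. YARN allocates cluster resources and runs the whole MapReduce flow:

```bash
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming*.jar \
    -file ~/path/to/mapper.py -mapper mapper.py \
    -file ~/path/to/reducer.py -reducer reducer.py \
    -input /user/hadoop/input/* -output /user/hadoop/output/
```

Replace `~/path/to/` with the actual location of your mapper and reducer scripts. The output directory must not already exist, or the job fails at submission.

#### Retrieve and save the results

Once the job finishes, merge the result files stored on HDFS into a single local file for further analysis or display:

```bash
mkdir -p output
hdfs dfs -getmerge /user/hadoop/output ./output/result.txt
cat ./output/result.txt
```

This shows the complete word-frequency list[^5].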
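Hadoop Streaming connects the mapper and reducer only through standard input and output. That means you can check the map, sort, and reduce flow locally before submitting the job.

Below is a minimal sketch, not part of the original tutorial. It re-implements the logic of `mapper.py` and `reducer.py` above as hypothetical functions `map_lines` and `reduce_sorted`, with Python's `sorted()` standing in for Hadoop's shuffle-and-sort phase. `itertools.groupby` replaces the manual `current_word` bookkeeping; both rely on equal keys arriving next to each other. A hypothetical `top_n` helper re-sorts the result by count, the way the first script sorts `wcList`, because Hadoop's output is ordered by word.

```python
from itertools import groupby


def map_lines(lines):
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_sorted(pairs):
    """Reduce step: sum the counts of each run of identical words.

    Like reducer.py, this relies on equal keys being adjacent, which
    Hadoop's sort phase guarantees for streaming reducers.
    """
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def local_wordcount(lines):
    """Run map, then sort (standing in for shuffle/sort), then reduce."""
    return list(reduce_sorted(sorted(map_lines(lines))))


def top_n(pairs, n):
    """Re-sort (word, count) pairs by count, highest first, and keep n."""
    return sorted(pairs, key=lambda kv: kv[1], reverse=True)[:n]
```

For example, `top_n(local_wordcount(open('wordtests.txt', encoding='utf-8')), 10)` lists the ten most frequent words. The same pipeline can be tested from the shell with `cat wordtests.txt | ./mapper.py | sort | ./reducer.py`.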