Python|python实现将题目转化为字典！

Python实现：将Word题目转化为JSON格式字典

最新推荐文章于 2024-12-01 20:45:08 发布

原创最新推荐文章于 2024-12-01 20:45:08 发布 · 356 阅读

2 ·

CC 4.0 BY-SA版权

本文介绍了如何使用Python的docx库读取Word文档中的题目信息，并通过正则表达式处理，将其转化为JSON格式的字典结构。具体步骤包括读取文档、提取信息、匹配选项及答案、构建字典并存储输出。

问题描述

在这里首先要提到 JSON 文件， JSON 文件是用来存储简单的数据结构和对象的文件，可以在web 应用程序中进行数据交换。而它的格式就有点类似于常用的字典结构，形如： {‘title’ :’ 关于《花间集》说法错误的是 ’ ,’content’ :{ ‘A’ :’ 作者是赵崇佐 ’, ’B’ : ‘ 收录当时流行歌曲歌词 ’ }, ‘true_choice’:”C” , ’type’:’ 单选题 ’ } 。今天要做的就是读取 word 里的信息并把它们按照如上的格式进行转化。

解决方案

首先要用 python 来解决并处理 word 的文档，就需要引进 docx 的库来读取 word 里的信息，读取出信息后，可以用正则表达式对信息进行进一步的提取和处理，最后以字典的格式存储并输出。

第一步引用 docx 库，读取每一个题目的信息并按不同的题目存放在列表中方便下一步处理。

file = docx.Document(s)
all_paragraphs = file.paragraphs
paragraphs_text = []
for paragraph in all_paragraphs:
paragraphs_text.append(paragraph.text)
l = []
a = 0
for i in range(len(paragraphs_text)):
if paragraphs_text[i] == '':
l.append(paragraphs_text[a:i])
a = i

第二步用正则表达式对信息进行进一步的提取和处理，最后字典的格式存储并输出。

list = []
for questions in l:
val = {}
cotent = {}
for strs in questions:
if re.match('\d', strs):
val['title'] = strs
if re.match('A', strs):
cotent['A'] = strs[2:]
if re.match('B', strs):
cotent['B'] = strs[2:]
if re.match('C', strs):
cotent['C'] = strs[2:]
if re.match('D', strs):
cotent['D'] = strs[2:]
if re.match(' 答案： ', strs):
val['true_choice'] = strs[3:]
if re.match(' 题型： ', strs):
val['type'] = strs[3:]
if len(cotent) > 1:
val['count'] = cotent
list.append(val)
return list

完整代码如下：

import docx
import re
def f(s):
file = docx.Document(s)
all_paragraphs = file.paragraphs
paragraphs_text = []
for paragraph in all_paragraphs:
paragraphs_text.append(paragraph.text)
l = []
a = 0
for i in range(len(paragraphs_text)):
if paragraphs_text[i] == '':
l.append(paragraphs_text[a:i])
a = i
list = []
for questions in l:
val = {}
cotent = {}
for strs in questions:
if re.match('\d', strs):
val['title'] = strs
if re.match('A', strs):
cotent['A'] = strs[2:]
if re.match('B', strs):
cotent['B'] = strs[2:]
if re.match('C', strs):
cotent['C'] = strs[2:]
if re.match('D', strs):
cotent['D'] = strs[2:]
if re.match(' 答案： ', strs):
val['true_choice'] = strs[3:]
if re.match(' 题型： ', strs):
val['type'] = strs[3:]
if len(cotent) > 1:
val['count'] = cotent
list.append(val)
return list

print(f("D://print2.docx"))

效果展示:

输出:

END