Merging Hard-Wrapped Lines of Text

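This post presents a simple Python script for cleaning up text copied from PDF documents, which arrives with formatting artifacts such as hard line breaks and end-of-line hyphens. Rejoining the lines and the words that were split across them makes the text easier to use in applications such as translation.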

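When you copy a passage out of a PDF, the page layout leaves it as a block of hard-wrapped lines, as shown below.

That makes it awkward to paste into other tools (Baidu Translate, for example).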


 The dataset is recorded using a time-of-flight
Intel Creative Interactive Gesture Camera and has
J = 16 annotated joints. Although the authors pro-
vide different artificially rotated training samples, we
only use the genuine 22k. The depth images have
a high quality with hardly any missing depth val-
ues, and sharp outlines with little noise. However,
the pose variability is limited compared to the NYU
dataset. Also, a relatively large number of samples
both from the training and test sets are incorrectly
annotated: We evaluated the accuracy and about 36%
of the poses from the test set have an annotation error
of at least 10 mm.

I wrote a short Python script that merges the lines back together:

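def main():
    # 'r+' opens the file for reading and writing. After readlines() the file
    # position is at the end, so write() appends the merged text to the file.
    with open('a.md', 'r+', encoding='utf-8') as obj:
        lines = obj.readlines()
        merged = ''
        for line in lines:
            line = line.rstrip()
            if len(line) == 0:
                # Skip blank lines.
                continue
            elif line[-1] == '-':
                # The word was split across lines: drop the hyphen and
                # join directly to the next line.
                merged += line[:-1]
            else:
                merged += line + ' '
        # Start the merged text on its own line below the original content.
        obj.write('\n' + merged)

if __name__ == '__main__':
    main()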

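Here 'a.md' is just an arbitrary filename I picked on Ubuntu. (On Windows you can change it to 'a.txt'.)

Because the script opens the file by a relative path, keep the .py file and the .md file in the same folder and run the script from there.

Result after processing: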
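If keeping both files in one folder is inconvenient, the same merging logic can be wrapped in a function that takes explicit paths and writes to a separate output file, leaving the input untouched. The following is only a sketch: the names `merge_lines` and `merge_file` and the output filename are hypothetical and not part of the original script.

```python
from pathlib import Path


def merge_lines(text):
    """Join hard-wrapped lines into one line, undoing end-of-line hyphenation."""
    merged = ''
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            # Skip blank lines, as the original script does.
            continue
        if line.endswith('-'):
            # Drop the hyphen and join directly to the next line.
            merged += line[:-1]
        else:
            merged += line + ' '
    return merged.rstrip()


def merge_file(src, dst):
    """Read src, merge its lines, and write the result to dst (hypothetical helper)."""
    text = Path(src).read_text(encoding='utf-8')
    Path(dst).write_text(merge_lines(text) + '\n', encoding='utf-8')
```

Calling `merge_file('a.md', 'a_merged.md')` would then write the merged paragraph to a new file instead of appending it to the original.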

 The dataset is recorded using a time-of-flight Intel Creative Interactive Gesture Camera and has J = 16 annotated joints. Although the authors provide different artificially rotated training samples, we only use the genuine 22k. The depth images have a high quality with hardly any missing depth values, and sharp outlines with little noise. However, the pose variability is limited compared to the NYU dataset. Also, a relatively large number of samples both from the training and test sets are incorrectly annotated: We evaluated the accuracy and about 36% of the poses from the test set have an annotation error of at least 10 mm. 
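The script collapses everything into a single paragraph, which suits a one-paragraph excerpt like this one. For longer excerpts, a variant that treats blank lines as paragraph breaks may be more useful. The sketch below assumes paragraphs are separated by at least one blank line; `merge_paragraphs` is a hypothetical name. Like the original, it removes every end-of-line hyphen, so a genuine compound such as "time-of-flight" would lose a hyphen if the line happened to break at it.

```python
def merge_paragraphs(text):
    """Merge wrapped lines within each paragraph; blank lines separate paragraphs."""
    paragraphs = []
    current = ''
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current paragraph, if there is one.
            if current:
                paragraphs.append(current.rstrip())
                current = ''
        elif line.endswith('-'):
            # Word split across lines: drop the hyphen, no space.
            current += line[:-1]
        else:
            current += line + ' '
    if current:
        paragraphs.append(current.rstrip())
    return '\n\n'.join(paragraphs)
```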
