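When you copy a passage of text out of a PDF, the PDF's layout leaves it hard-wrapped into a "tofu block" of short lines, as shown below.
That makes it awkward to paste into other tools (for example, Baidu Translate).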
The dataset is recorded using a time-of-flight
Intel Creative Interactive Gesture Camera and has
J = 16 annotated joints. Although the authors pro-
vide different artificially rotated training samples, we
only use the genuine 22k. The depth images have
a high quality with hardly any missing depth val-
ues, and sharp outlines with little noise. However,
the pose variability is limited compared to the NYU
dataset. Also, a relatively large number of samples
both from the training and test sets are incorrectly
annotated: We evaluated the accuracy and about 36%
of the poses from the test set have an annotation error
of at least 10 mm.
I wrote a short Python script that merges the wrapped lines back into a single line:
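def main():
    # 'r+' opens the file for both reading and writing.
    with open('a.md', 'r+') as obj:
        lines = obj.readlines()
        strr = ''
        for line in lines:
            line = line.rstrip()
            if len(line) == 0:
                # Skip blank lines.
                continue
            elif line[-1] == '-':
                # Trailing hyphen: the word continues on the next line,
                # so drop the hyphen and do not add a space.
                strr += line[:-1]
            else:
                strr += line + ' '
        # After readlines() the file position is at the end, so the
        # merged text is appended below the original content.
        obj.write('\n' + strr)
    # The with block closes the file; no explicit close() is needed.


if __name__ == '__main__':
    main()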
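Here 'a.md' is just an arbitrary file name I picked on Ubuntu (on Windows you can change it to 'a.txt').
Keep the .py file and the .md file in the same folder and run the script from there, since the file name is a relative path.
Result after processing: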
The dataset is recorded using a time-of-flight Intel Creative Interactive Gesture Camera and has J = 16 annotated joints. Although the authors provide different artificially rotated training samples, we only use the genuine 22k. The depth images have a high quality with hardly any missing depth values, and sharp outlines with little noise. However, the pose variability is limited compared to the NYU dataset. Also, a relatively large number of samples both from the training and test sets are incorrectly annotated: We evaluated the accuracy and about 36% of the poses from the test set have an annotation error of at least 10 mm.
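
The same merging rule can also be sketched as a function that works on a string instead of a file, which makes it easier to reuse and to check. This is only a minimal sketch: the name `unwrap_text` is hypothetical, and treating blank lines as paragraph breaks is an added assumption (the script above simply skips them).

```python
def unwrap_text(text):
    """Join hard-wrapped lines; blank lines are assumed to separate paragraphs."""
    paragraphs = []
    current = ''
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Blank line: close the current paragraph, if there is one.
            if current:
                paragraphs.append(current.rstrip())
                current = ''
        elif line.endswith('-'):
            # Assume a trailing hyphen marks a word split across two lines.
            current += line[:-1]
        else:
            current += line + ' '
    if current:
        paragraphs.append(current.rstrip())
    return '\n\n'.join(paragraphs)


if __name__ == '__main__':
    sample = (
        "Although the authors pro-\n"
        "vide different artificially rotated training samples, we\n"
        "only use the genuine 22k.\n"
    )
    print(unwrap_text(sample))
```

One limitation shared by both versions: a genuine hyphen that happens to fall at the end of a line (for example a line ending in "time-of-") is removed as well, so compound words broken at a hyphen may need a manual fix afterwards.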