In Python, memory_profiler can be used to measure memory usage, while cProfile combined with pstats can be used to analyze where the running time goes. Together, these two tools let us evaluate the efficiency of Python code.
Talk is cheap, show me the code.
Here we use two simple tasks: one that only counts the fields, and one that both collects the fields and counts them. From these two angles we compare how lists and generators perform in different scenarios. The 779.csv file used here is 8.8 MB on disk; in the experiments, materializing all of its fields produces a list of roughly 122 MB.
import csv
import cProfile
import pstats

from memory_profiler import profile  # line-by-line memory profiling decorator


@profile
def exam_1():
    # Count the fields while streaming the file through csv.reader.
    counter = 0
    with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
        info = csv.reader(file)
        for i in info:
            counter += len(i)
    print(counter)


@profile
def exam_2():
    # Same count, but split each line by hand instead of using csv.reader.
    counter = 0
    with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
        for info in file:
            counter += len(info.split(','))
    print(counter)


@profile
def exam_3():
    # Materialize every field into one big list, then count its length.
    with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
        info = csv.reader(file)
        dd = [j for i in info for j in i]
        counter = len(dd)
    print(counter)


@profile
def exam_4():
    # Count and collect the fields by appending them one at a time.
    counter = 0
    info_dict = []
    with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
        info = csv.reader(file)
        for i in info:
            counter += len(i)
            for j in i:
                info_dict.append(j)
    print(counter)


@profile
def exam_5():
    # Same as exam_4, but split the lines by hand instead of using csv.reader.
    counter = 0
    info_dict = []
    with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
        for info in file:
            dd = info.split(',')
            counter += len(dd)
            for j in dd:
                info_dict.append(j)
    print(counter)


@profile
def exam_6():
    # Same as exam_3: collect everything with a list comprehension, then count.
    with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
        info = csv.reader(file)
        dd = [j for i in info for j in i]
        counter = len(dd)
    print(counter)


def cprofiler_exam_1():
    # Time exam_1 .. exam_3 with cProfile, saving the stats and printing the
    # ten entries with the largest internal time for each run.
    cProfile.run("exam_1()", "timeit_1")
    p = pstats.Stats('timeit_1')
    p.sort_stats('time')
    p.print_stats(10)

    cProfile.run("exam_2()", "timeit_2")
    p = pstats.Stats('timeit_2')
    p.sort_stats('time')
    p.print_stats(10)

    cProfile.run("exam_3()", "timeit_3")
    p = pstats.Stats('timeit_3')
    p.sort_stats('time')
    p.print_stats(10)


def cprofiler_exam_2():
    # Time exam_4 .. exam_6 the same way.
    cProfile.run("exam_4()", "timeit_4")
    p = pstats.Stats('timeit_4')
    p.sort_stats('time')
    p.print_stats(10)

    cProfile.run("exam_5()", "timeit_5")
    p = pstats.Stats('timeit_5')
    p.sort_stats('time')
    p.print_stats(10)

    cProfile.run("exam_6()", "timeit_6")
    p = pstats.Stats('timeit_6')
    p.sort_stats('time')
    p.print_stats(10)
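The functions above are only defined, never called; a minimal entry point might look like the sketch below (assuming memory_profiler has been installed, e.g. with pip install memory_profiler, so the @profile import works):

if __name__ == "__main__":
    # Timing runs: dump cProfile stats into timeit_1 .. timeit_6 and print
    # the ten entries with the largest internal time for each function.
    cprofiler_exam_1()
    cprofiler_exam_2()

    # The @profile decorator prints its line-by-line memory table whenever a
    # decorated function finishes; the file can also be run as
    #   python -m memory_profiler test_generator.py
    # Either way, @profile slows the timed code down considerably, which is
    # exactly the effect discussed below.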
Experiment results
The results of the first experiment are:
1258003
Filename: /home/admin/PycharmProjects/new_query_correction/test/test_generator.py
Line # Mem usage Increment Line Contents
================================================
29 14.6 MiB 14.6 MiB @profile
30 def exam_1():
31 14.6 MiB 0.0 MiB counter = 0
32 14.6 MiB 0.0 MiB with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
33 14.6 MiB 0.0 MiB info = csv.reader(file)
34 14.6 MiB 0.0 MiB for i in info:
35 14.6 MiB 0.0 MiB counter += len(i)
36 14.6 MiB 0.0 MiB print(counter)
Thu Jan 3 16:38:57 2019 timeit_1
348745 function calls (347872 primitive calls) in 22.572 seconds
Ordered by: internal time
List reduced from 160 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 22.493 22.493 22.561 22.561 /home/admin/PycharmProjects/new_query_correction/test/test_generator.py:29(exam_1)
317030/316937 0.047 0.000 0.047 0.000 {built-in method builtins.len}
1123 0.014 0.000 0.014 0.000 {built-in method _codecs.utf_8_decode}
1123 0.006 0.000 0.020 0.000 /usr/lib/python3.6/codecs.py:318(decode)
276/2 0.001 0.000 0.003 0.001 /usr/lib/python3.6/sre_compile.py:64(_compile)
186/5 0.001 0.000 0.002 0.000 /usr/lib/python3.6/sre_parse.py:470(_parse)
515 0.001 0.000 0.001 0.000 {built-in method posix.lstat}
515 0.000 0.000 0.001 0.000 /usr/lib/python3.6/posixpath.py:75(join)
668 0.000 0.000 0.001 0.000 /usr/lib/python3.6/enum.py:803(__and__)
217 0.000 0.000 0.001 0.000 /usr/lib/python3.6/posixpath.py:331(normpath)
1258003
Filename: /home/admin/PycharmProjects/new_query_correction/test/test_generator.py
Line # Mem usage Increment Line Contents
================================================
39 14.9 MiB 14.9 MiB @profile
40 def exam_2():
41 14.9 MiB 0.0 MiB counter = 0
42 14.9 MiB 0.0 MiB with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
43 14.9 MiB 0.0 MiB for info in file:
44 14.9 MiB 0.0 MiB counter += len(info.split(','))
45 14.9 MiB 0.0 MiB print(counter)
Thu Jan 3 16:39:20 2019 timeit_2
633149 function calls in 22.116 seconds
Ordered by: internal time
List reduced from 98 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 21.897 21.897 22.116 22.116 /home/admin/PycharmProjects/new_query_correction/test/test_generator.py:39(exam_2)
315100 0.172 0.000 0.172 0.000 {method 'split' of 'str' objects}
315116 0.026 0.000 0.026 0.000 {built-in method builtins.len}
1123 0.014 0.000 0.014 0.000 {built-in method _codecs.utf_8_decode}
1123 0.006 0.000 0.020 0.000 /usr/lib/python3.6/codecs.py:318(decode)
54 0.000 0.000 0.000 0.000 /usr/lib/python3.6/tokenize.py:492(_tokenize)
49 0.000 0.000 0.000 0.000 {method 'match' of '_sre.SRE_Pattern' objects}
1 0.000 0.000 0.000 0.000 /usr/lib/python3.6/inspect.py:935(getblock)
2 0.000 0.000 0.000 0.000 {built-in method io.open}
1 0.000 0.000 22.116 22.116 {built-in method builtins.exec}
1258003
Filename: /home/admin/PycharmProjects/new_query_correction/test/test_generator.py
Line # Mem usage Increment Line Contents
================================================
48 14.9 MiB 14.9 MiB @profile
49 def exam_3():
50 14.9 MiB 0.0 MiB with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
51 14.9 MiB 0.0 MiB info = csv.reader(file)
52 123.1 MiB 0.3 MiB dd = [j for i in info for j in i]
53
54 123.1 MiB 0.0 MiB counter = len(dd)
55 123.1 MiB 0.0 MiB print(counter)
Thu Jan 3 16:40:13 2019 timeit_3
3019 function calls (3018 primitive calls) in 53.355 seconds
Ordered by: internal time
List reduced from 84 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 53.316 53.316 53.337 53.337 /home/admin/PycharmProjects/new_query_correction/test/test_generator.py:52(<listcomp>)
1 0.017 0.017 53.354 53.354 /home/admin/PycharmProjects/new_query_correction/venv/lib/python3.6/site-packages/memory_profiler.py:657(f)
1121 0.014 0.000 0.014 0.000 {built-in method _codecs.utf_8_decode}
1121 0.007 0.000 0.021 0.000 /usr/lib/python3.6/codecs.py:318(decode)
1 0.000 0.000 53.337 53.337 /home/admin/PycharmProjects/new_query_correction/test/test_generator.py:48(exam_3)
63 0.000 0.000 0.000 0.000 /usr/lib/python3.6/tokenize.py:492(_tokenize)
54 0.000 0.000 0.000 0.000 {method 'match' of '_sre.SRE_Pattern' objects}
1 0.000 0.000 0.000 0.000 /usr/lib/python3.6/inspect.py:935(getblock)
1 0.000 0.000 0.000 0.000 /home/admin/PycharmProjects/new_query_correction/venv/lib/python3.6/site-packages/memory_profiler.py:754(show_results)
At first glance the last task appears to take the longest. However, most of that time is actually spent in the profiler's own bookkeeping. With the @profile decorator commented out, the timings look like this:
/home/bai/PycharmProjects/new_query_correction/venv/bin/python /home/bai/PycharmProjects/new_query_correction/test/test_generator.py
1258003
Thu Jan 3 17:01:30 2019 timeit_1
317353 function calls in 0.142 seconds
Ordered by: internal time
List reduced from 14 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.121 0.121 0.142 0.142 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:30(exam_1)
315100 0.010 0.000 0.010 0.000 {built-in method builtins.len}
1121 0.010 0.000 0.010 0.000 {built-in method _codecs.utf_8_decode}
1121 0.001 0.000 0.011 0.000 /usr/lib/python3.6/codecs.py:318(decode)
1 0.000 0.000 0.000 0.000 {built-in method io.open}
1 0.000 0.000 0.142 0.142 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1 0.000 0.000 0.142 0.142 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 /usr/lib/python3.6/_bootlocale.py:23(getpreferredencoding)
1 0.000 0.000 0.000 0.000 {built-in method _csv.reader}
1258003
Thu Jan 3 17:01:30 2019 timeit_2
632452 function calls in 0.169 seconds
Ordered by: internal time
List reduced from 14 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.092 0.092 0.169 0.169 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:40(exam_2)
315100 0.054 0.000 0.054 0.000 {method 'split' of 'str' objects}
315100 0.011 0.000 0.011 0.000 {built-in method builtins.len}
1121 0.010 0.000 0.010 0.000 {built-in method _codecs.utf_8_decode}
1121 0.001 0.000 0.011 0.000 /usr/lib/python3.6/codecs.py:318(decode)
1 0.000 0.000 0.169 0.169 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method io.open}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1 0.000 0.000 0.169 0.169 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {built-in method _locale.nl_langinfo}
1258003
Thu Jan 3 17:01:30 2019 timeit_3
2255 function calls in 0.168 seconds
Ordered by: internal time
List reduced from 15 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.140 0.140 0.151 0.151 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:52(<listcomp>)
1 0.016 0.016 0.168 0.168 <string>:1(<module>)
1121 0.010 0.000 0.010 0.000 {built-in method _codecs.utf_8_decode}
1121 0.001 0.000 0.011 0.000 /usr/lib/python3.6/codecs.py:318(decode)
1 0.000 0.000 0.168 0.168 {built-in method builtins.exec}
1 0.000 0.000 0.151 0.151 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:49(exam_3)
1 0.000 0.000 0.000 0.000 {built-in method io.open}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1 0.000 0.000 0.000 0.000 {built-in method _locale.nl_langinfo}
1 0.000 0.000 0.000 0.000 {built-in method _csv.reader}
Seen this way, the running times are not that different, but converting everything directly into a list wastes quite a lot of memory. So for this workload the generator approach outperforms the list.
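To make the contrast concrete, here is a small side-by-side sketch (not part of the original measurements; it assumes the same CSV file is available at the path used above):

import csv

path = "../data/target/cat_data/brandgood_detail_779.csv"

# Streaming: csv.reader is an iterator, so only one row is alive at a time
# and memory stays flat while the fields are counted.
with open(path) as file:
    total = sum(len(row) for row in csv.reader(file))
print(total)

# Materializing: every field string is kept in one list, which is where the
# roughly 120 MiB jump in the memory_profiler listing above comes from.
with open(path) as file:
    fields = [field for row in csv.reader(file) for field in row]
print(len(fields))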
Experiment 2
Experiment 2 probes the difference between building a list with a list comprehension and building it with repeated append calls.
With the profiler commented out, the results are as follows:
1258003
Thu Jan 3 16:38:51 2019 timeit_4
1575356 function calls in 0.342 seconds
Ordered by: internal time
List reduced from 15 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.257 0.257 0.326 0.326 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:59(exam_4)
1258003 0.044 0.000 0.044 0.000 {method 'append' of 'list' objects}
1 0.017 0.017 0.342 0.342 <string>:1(<module>)
315100 0.013 0.000 0.013 0.000 {built-in method builtins.len}
1121 0.011 0.000 0.011 0.000 {built-in method _codecs.utf_8_decode}
1121 0.001 0.000 0.012 0.000 /usr/lib/python3.6/codecs.py:318(decode)
1 0.000 0.000 0.000 0.000 {built-in method io.open}
1 0.000 0.000 0.342 0.342 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1 0.000 0.000 0.000 0.000 {built-in method _locale.nl_langinfo}
1258003
Thu Jan 3 16:38:52 2019 timeit_5
1890455 function calls in 0.388 seconds
Ordered by: internal time
List reduced from 15 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.215 0.215 0.372 0.372 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:72(exam_5)
315100 0.085 0.000 0.085 0.000 {method 'split' of 'str' objects}
1258003 0.046 0.000 0.046 0.000 {method 'append' of 'list' objects}
1 0.017 0.017 0.388 0.388 <string>:1(<module>)
315100 0.014 0.000 0.014 0.000 {built-in method builtins.len}
1121 0.011 0.000 0.011 0.000 {built-in method _codecs.utf_8_decode}
1121 0.001 0.000 0.012 0.000 /usr/lib/python3.6/codecs.py:318(decode)
1 0.000 0.000 0.388 0.388 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method io.open}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1258003
Thu Jan 3 16:38:52 2019 timeit_6
2255 function calls in 0.167 seconds
Ordered by: internal time
List reduced from 15 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.138 0.138 0.149 0.149 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:88(<listcomp>)
1 0.018 0.018 0.167 0.167 <string>:1(<module>)
1121 0.010 0.000 0.010 0.000 {built-in method _codecs.utf_8_decode}
1121 0.001 0.000 0.011 0.000 /usr/lib/python3.6/codecs.py:318(decode)
1 0.000 0.000 0.167 0.167 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method io.open}
1 0.000 0.000 0.149 0.149 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:85(exam_6)
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1 0.000 0.000 0.000 0.000 {built-in method _locale.nl_langinfo}
1 0.000 0.000 0.000 0.000 {built-in method _csv.reader}
As you can see, the list comprehension is significantly more efficient than building the list with append. As for memory:
1258003
Filename: /home/bai/PycharmProjects/new_query_correction/test/test_generator.py
Line # Mem usage Increment Line Contents
================================================
58 14.7 MiB 14.7 MiB @profile
59 def exam_4():
60 14.7 MiB 0.0 MiB counter = 0
61 14.7 MiB 0.0 MiB info_dict = []
62 14.7 MiB 0.0 MiB with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
63 14.7 MiB 0.0 MiB info = csv.reader(file)
64 122.8 MiB 0.3 MiB for i in info:
65 122.8 MiB 0.0 MiB counter += len(i)
66 122.8 MiB 0.0 MiB for j in i:
67 122.8 MiB 0.3 MiB info_dict.append(j)
68 122.8 MiB 0.0 MiB print(counter)
Thu Jan 3 17:07:30 2019 timeit_4
1606958 function calls (1606085 primitive calls) in 123.793 seconds
Ordered by: internal time
List reduced from 160 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 123.558 123.558 123.765 123.765 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:58(exam_4)
1261904 0.141 0.000 0.141 0.000 {method 'append' of 'list' objects}
317033/316940 0.044 0.000 0.044 0.000 {built-in method builtins.len}
1 0.017 0.017 123.782 123.782 /home/bai/PycharmProjects/new_query_correction/venv/lib/python3.6/site-packages/memory_profiler.py:657(f)
1123 0.015 0.000 0.015 0.000 {built-in method _codecs.utf_8_decode}
1123 0.007 0.000 0.022 0.000 /usr/lib/python3.6/codecs.py:318(decode)
276/2 0.001 0.000 0.003 0.001 /usr/lib/python3.6/sre_compile.py:64(_compile)
186/5 0.001 0.000 0.002 0.000 /usr/lib/python3.6/sre_parse.py:470(_parse)
515 0.001 0.000 0.001 0.000 {built-in method posix.lstat}
515 0.000 0.000 0.001 0.000 /usr/lib/python3.6/posixpath.py:75(join)
1258003
Filename: /home/bai/PycharmProjects/new_query_correction/test/test_generator.py
Line # Mem usage Increment Line Contents
================================================
71 15.1 MiB 15.1 MiB @profile
72 def exam_5():
73 15.1 MiB 0.0 MiB counter = 0
74 15.1 MiB 0.0 MiB info_dict = []
75 15.1 MiB 0.0 MiB with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
76 123.8 MiB 0.3 MiB for info in file:
77 123.8 MiB 0.3 MiB dd = info.split(',')
78 123.8 MiB 0.0 MiB counter += len(dd)
79 123.8 MiB 0.0 MiB for j in dd:
80 123.8 MiB 0.3 MiB info_dict.append(j)
81 123.8 MiB 0.0 MiB print(counter)
Thu Jan 3 17:09:42 2019 timeit_5
1891378 function calls in 132.016 seconds
Ordered by: internal time
List reduced from 82 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 131.596 131.596 131.998 131.998 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:71(exam_5)
315100 0.196 0.000 0.196 0.000 {method 'split' of 'str' objects}
1258008 0.140 0.000 0.140 0.000 {method 'append' of 'list' objects}
315121 0.044 0.000 0.044 0.000 {built-in method builtins.len}
1 0.018 0.018 132.015 132.015 /home/bai/PycharmProjects/new_query_correction/venv/lib/python3.6/site-packages/memory_profiler.py:657(f)
1121 0.015 0.000 0.015 0.000 {built-in method _codecs.utf_8_decode}
1121 0.006 0.000 0.021 0.000 /usr/lib/python3.6/codecs.py:318(decode)
78 0.000 0.000 0.000 0.000 /usr/lib/python3.6/tokenize.py:492(_tokenize)
68 0.000 0.000 0.000 0.000 {method 'match' of '_sre.SRE_Pattern' objects}
1 0.000 0.000 0.000 0.000 /usr/lib/python3.6/inspect.py:935(getblock)
1258003
Filename: /home/bai/PycharmProjects/new_query_correction/test/test_generator.py
Line # Mem usage Increment Line Contents
================================================
84 24.9 MiB 24.9 MiB @profile
85 def exam_6():
86 24.9 MiB 0.0 MiB with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
87 24.9 MiB 0.0 MiB info = csv.reader(file)
88 123.1 MiB 0.3 MiB dd = [j for i in info for j in i]
89
90 123.1 MiB 0.0 MiB counter = len(dd)
91 123.1 MiB 0.0 MiB print(counter)
Thu Jan 3 17:10:36 2019 timeit_6
3006 function calls (3005 primitive calls) in 54.263 seconds
Ordered by: internal time
List reduced from 83 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 54.224 54.224 54.245 54.245 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:88(<listcomp>)
1 0.017 0.017 54.263 54.263 /home/bai/PycharmProjects/new_query_correction/venv/lib/python3.6/site-packages/memory_profiler.py:657(f)
1121 0.014 0.000 0.014 0.000 {built-in method _codecs.utf_8_decode}
1121 0.007 0.000 0.021 0.000 /usr/lib/python3.6/codecs.py:318(decode)
1 0.000 0.000 54.246 54.246 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:84(exam_6)
61 0.000 0.000 0.000 0.000 /usr/lib/python3.6/tokenize.py:492(_tokenize)
54 0.000 0.000 0.000 0.000 {method 'match' of '_sre.SRE_Pattern' objects}
1 0.000 0.000 0.000 0.000 /usr/lib/python3.6/inspect.py:935(getblock)
1 0.000 0.000 0.000 0.000 /home/bai/PycharmProjects/new_query_correction/venv/lib/python3.6/site-packages/memory_profiler.py:754(show_results)
12 0.000 0.000 0.000 0.000 {method 'write' of '_io.TextIOWrapper' objects}
In fact, the list comprehension does not change how much data ends up in memory; what it changes is that the whole list is built in a single expression, which avoids the per-element append call and much of the incremental allocation overhead. That is also why the difference becomes so obvious once the profiler is attached: the explicit loops execute far more profiled lines than the single comprehension, so the profiler's per-line overhead magnifies the gap.
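The same effect can be reproduced without the CSV file at all. The micro-benchmark below is only a sketch (the data shape and repetition counts are made up for illustration), comparing an explicit append loop against a list comprehension that builds the same list:

import timeit

setup = "data = [['a', 'b', 'c', 'd']] * 300000"

append_loop = """
out = []
for row in data:
    for item in row:
        out.append(item)
"""

comprehension = "out = [item for row in data for item in row]"

# Both variants build the same 1.2-million-element list; the comprehension
# skips the per-element attribute lookup and method call that append needs.
print("append loop  :", min(timeit.repeat(append_loop, setup, number=10)))
print("comprehension:", min(timeit.repeat(comprehension, setup, number=10)))

On most machines the comprehension comes out clearly ahead, mirroring the timeit_4/timeit_5 versus timeit_6 numbers above.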
In summary
- A generator is no slower than a list for this kind of processing (when scanning large amounts of data it can even be much faster), and it saves a large amount of memory. So unless you really need the materialized list, prefer a generator.
- When a list is needed, use a list comprehension: it saves a lot of time, is more efficient about memory allocation, and puts less load on the CPU than appending in a loop.