Using memory_profiler and cProfile to analyze list and generator in Python

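This article examines the performance difference between generators and lists in Python. The experiments show that, on large inputs, a generator is no slower than a list and saves a great deal of memory. In addition, a list comprehension beats the list append method in both running time and memory allocation.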


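In Python, memory_profiler can be used to measure memory usage, and cProfile combined with pstats can be used to analyze where the running time goes. Between them, these two tools let us evaluate how efficient a piece of Python code is.
Talk is cheap, show me the code.
Two simple operations are used here: one only counts fields, the other collects the fields into a list and then counts them. Together they show how list and generator behave in different scenarios. The file 779.csv (brandgood_detail_779.csv) is 8.8 MB. In the experiments, collecting all of its fields into a list brings the process to roughly 123 MiB, about 108 MiB above the baseline.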

import cProfile
import csv
import pstats

from memory_profiler import profile


@profile
def exam_1():
    counter = 0
    with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
        info = csv.reader(file)
        for i in info:
            counter += len(i)
    print(counter)


@profile
def exam_2():
    counter = 0
    with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
        for info in file:
            counter += len(info.split(','))
    print(counter)


@profile
def exam_3():
    with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
        info = csv.reader(file)
        dd = [j for i in info for j in i]

    counter = len(dd)
    print(counter)


@profile
def exam_4():
    counter = 0
    info_dict = []
    with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
        info = csv.reader(file)
        for i in info:
            counter += len(i)
            for j in i:
                info_dict.append(j)
    print(counter)


@profile
def exam_5():
    counter = 0
    info_dict = []
    with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
        for info in file:
            dd = info.split(',')
            counter += len(dd)
            for j in dd:
                info_dict.append(j)
    print(counter)


@profile
def exam_6():
    with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
        info = csv.reader(file)
        dd = [j for i in info for j in i]

    counter = len(dd)
    print(counter)
    
def cprofiler_exam_1():
    cProfile.run("exam_1()", "timeit_1")
    p = pstats.Stats('timeit_1')
    p.sort_stats('time')
    p.print_stats(10)
    cProfile.run("exam_2()", "timeit_2")
    p = pstats.Stats('timeit_2')
    p.sort_stats('time')
    p.print_stats(10)
    cProfile.run("exam_3()", "timeit_3")
    p = pstats.Stats('timeit_3')
    p.sort_stats('time')
    p.print_stats(10)


def cprofiler_exam_2():
    cProfile.run("exam_4()", "timeit_4")
    p = pstats.Stats('timeit_4')
    p.sort_stats('time')
    p.print_stats(10)
    cProfile.run("exam_5()", "timeit_5")
    p = pstats.Stats('timeit_5')
    p.sort_stats('time')
    p.print_stats(10)
    cProfile.run("exam_6()", "timeit_6")
    p = pstats.Stats('timeit_6')
    p.sort_stats('time')
    p.print_stats(10)
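
The two driver functions above repeat the same four lines for every test. A minimal sketch of a reusable helper follows, assuming the standard-library cProfile.Profile and pstats.Stats classes; the names profile_call and sample_workload are hypothetical and not part of the original script. It keeps the statistics in memory instead of writing timeit_N files.

```python
import cProfile
import io
import pstats


def profile_call(func, top=10):
    """Run func() under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    # Same ordering as the original script: sort by internal time.
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("time").print_stats(top)
    return buffer.getvalue()


def sample_workload():
    # Hypothetical stand-in for exam_1() ... exam_6().
    return sum(len(str(n)) for n in range(10_000))


if __name__ == "__main__":
    print(profile_call(sample_workload))
```

With such a helper, each experiment would reduce to a loop such as `for f in (exam_1, exam_2, exam_3): print(profile_call(f))`.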

Results


The results of the first experiment are:

1258003
Filename: /home/admin/PycharmProjects/new_query_correction/test/test_generator.py

Line #    Mem usage    Increment   Line Contents
================================================
    29     14.6 MiB     14.6 MiB   @profile
    30                             def exam_1():
    31     14.6 MiB      0.0 MiB       counter = 0
    32     14.6 MiB      0.0 MiB       with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
    33     14.6 MiB      0.0 MiB           info = csv.reader(file)
    34     14.6 MiB      0.0 MiB           for i in info:
    35     14.6 MiB      0.0 MiB               counter += len(i)
    36     14.6 MiB      0.0 MiB       print(counter)


Thu Jan  3 16:38:57 2019    timeit_1

         348745 function calls (347872 primitive calls) in 22.572 seconds

   Ordered by: internal time
   List reduced from 160 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   22.493   22.493   22.561   22.561 /home/admin/PycharmProjects/new_query_correction/test/test_generator.py:29(exam_1)
317030/316937    0.047    0.000    0.047    0.000 {built-in method builtins.len}
     1123    0.014    0.000    0.014    0.000 {built-in method _codecs.utf_8_decode}
     1123    0.006    0.000    0.020    0.000 /usr/lib/python3.6/codecs.py:318(decode)
    276/2    0.001    0.000    0.003    0.001 /usr/lib/python3.6/sre_compile.py:64(_compile)
    186/5    0.001    0.000    0.002    0.000 /usr/lib/python3.6/sre_parse.py:470(_parse)
      515    0.001    0.000    0.001    0.000 {built-in method posix.lstat}
      515    0.000    0.000    0.001    0.000 /usr/lib/python3.6/posixpath.py:75(join)
      668    0.000    0.000    0.001    0.000 /usr/lib/python3.6/enum.py:803(__and__)
      217    0.000    0.000    0.001    0.000 /usr/lib/python3.6/posixpath.py:331(normpath)


1258003
Filename: /home/admin/PycharmProjects/new_query_correction/test/test_generator.py

Line #    Mem usage    Increment   Line Contents
================================================
    39     14.9 MiB     14.9 MiB   @profile
    40                             def exam_2():
    41     14.9 MiB      0.0 MiB       counter = 0
    42     14.9 MiB      0.0 MiB       with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
    43     14.9 MiB      0.0 MiB           for info in file:
    44     14.9 MiB      0.0 MiB               counter += len(info.split(','))
    45     14.9 MiB      0.0 MiB       print(counter)


Thu Jan  3 16:39:20 2019    timeit_2

         633149 function calls in 22.116 seconds

   Ordered by: internal time
   List reduced from 98 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   21.897   21.897   22.116   22.116 /home/admin/PycharmProjects/new_query_correction/test/test_generator.py:39(exam_2)
   315100    0.172    0.000    0.172    0.000 {method 'split' of 'str' objects}
   315116    0.026    0.000    0.026    0.000 {built-in method builtins.len}
     1123    0.014    0.000    0.014    0.000 {built-in method _codecs.utf_8_decode}
     1123    0.006    0.000    0.020    0.000 /usr/lib/python3.6/codecs.py:318(decode)
       54    0.000    0.000    0.000    0.000 /usr/lib/python3.6/tokenize.py:492(_tokenize)
       49    0.000    0.000    0.000    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.6/inspect.py:935(getblock)
        2    0.000    0.000    0.000    0.000 {built-in method io.open}
        1    0.000    0.000   22.116   22.116 {built-in method builtins.exec}


1258003
Filename: /home/admin/PycharmProjects/new_query_correction/test/test_generator.py

Line #    Mem usage    Increment   Line Contents
================================================
    48     14.9 MiB     14.9 MiB   @profile
    49                             def exam_3():
    50     14.9 MiB      0.0 MiB       with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
    51     14.9 MiB      0.0 MiB           info = csv.reader(file)
    52    123.1 MiB      0.3 MiB           dd = [j for i in info for j in i]
    53                             
    54    123.1 MiB      0.0 MiB       counter = len(dd)
    55    123.1 MiB      0.0 MiB       print(counter)


Thu Jan  3 16:40:13 2019    timeit_3

         3019 function calls (3018 primitive calls) in 53.355 seconds

   Ordered by: internal time
   List reduced from 84 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   53.316   53.316   53.337   53.337 /home/admin/PycharmProjects/new_query_correction/test/test_generator.py:52(<listcomp>)
        1    0.017    0.017   53.354   53.354 /home/admin/PycharmProjects/new_query_correction/venv/lib/python3.6/site-packages/memory_profiler.py:657(f)
     1121    0.014    0.000    0.014    0.000 {built-in method _codecs.utf_8_decode}
     1121    0.007    0.000    0.021    0.000 /usr/lib/python3.6/codecs.py:318(decode)
        1    0.000    0.000   53.337   53.337 /home/admin/PycharmProjects/new_query_correction/test/test_generator.py:48(exam_3)
       63    0.000    0.000    0.000    0.000 /usr/lib/python3.6/tokenize.py:492(_tokenize)
       54    0.000    0.000    0.000    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.6/inspect.py:935(getblock)
        1    0.000    0.000    0.000    0.000 /home/admin/PycharmProjects/new_query_correction/venv/lib/python3.6/site-packages/memory_profiler.py:754(show_results)

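At first glance the last task seems to take the longest. However, most of that time is spent in the memory profiler's own bookkeeping. With the @profile decorator commented out, the timings are: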

	/home/bai/PycharmProjects/new_query_correction/venv/bin/python /home/bai/PycharmProjects/new_query_correction/test/test_generator.py
1258003
Thu Jan  3 17:01:30 2019    timeit_1

         317353 function calls in 0.142 seconds

   Ordered by: internal time
   List reduced from 14 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.121    0.121    0.142    0.142 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:30(exam_1)
   315100    0.010    0.000    0.010    0.000 {built-in method builtins.len}
     1121    0.010    0.000    0.010    0.000 {built-in method _codecs.utf_8_decode}
     1121    0.001    0.000    0.011    0.000 /usr/lib/python3.6/codecs.py:318(decode)
        1    0.000    0.000    0.000    0.000 {built-in method io.open}
        1    0.000    0.000    0.142    0.142 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.142    0.142 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.6/_bootlocale.py:23(getpreferredencoding)
        1    0.000    0.000    0.000    0.000 {built-in method _csv.reader}


1258003
Thu Jan  3 17:01:30 2019    timeit_2

         632452 function calls in 0.169 seconds

   Ordered by: internal time
   List reduced from 14 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.092    0.092    0.169    0.169 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:40(exam_2)
   315100    0.054    0.000    0.054    0.000 {method 'split' of 'str' objects}
   315100    0.011    0.000    0.011    0.000 {built-in method builtins.len}
     1121    0.010    0.000    0.010    0.000 {built-in method _codecs.utf_8_decode}
     1121    0.001    0.000    0.011    0.000 /usr/lib/python3.6/codecs.py:318(decode)
        1    0.000    0.000    0.169    0.169 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method io.open}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.169    0.169 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {built-in method _locale.nl_langinfo}


1258003
Thu Jan  3 17:01:30 2019    timeit_3

         2255 function calls in 0.168 seconds

   Ordered by: internal time
   List reduced from 15 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.140    0.140    0.151    0.151 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:52(<listcomp>)
        1    0.016    0.016    0.168    0.168 <string>:1(<module>)
     1121    0.010    0.000    0.010    0.000 {built-in method _codecs.utf_8_decode}
     1121    0.001    0.000    0.011    0.000 /usr/lib/python3.6/codecs.py:318(decode)
        1    0.000    0.000    0.168    0.168 {built-in method builtins.exec}
        1    0.000    0.000    0.151    0.151 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:49(exam_3)
        1    0.000    0.000    0.000    0.000 {built-in method io.open}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {built-in method _locale.nl_langinfo}
        1    0.000    0.000    0.000    0.000 {built-in method _csv.reader}

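Seen this way, the three versions differ little in time (0.142 s, 0.169 s and 0.168 s), but converting everything to a list wastes a lot of memory: exam_3 climbs from 14.9 MiB to 123.1 MiB, while exam_1 and exam_2 stay flat. So for this counting task, generator-style lazy iteration performs better than building a list.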
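
The same contrast can be reproduced without memory_profiler. The sketch below is an illustration under stated assumptions, not the author's setup: it uses the standard-library tracemalloc module, synthetic CSV text in place of brandgood_detail_779.csv, and hypothetical helper names (make_csv_text, count_streaming, count_materialized, peak_bytes).

```python
import csv
import io
import tracemalloc


def make_csv_text(rows=20_000, cols=4):
    """Build synthetic CSV text (a stand-in for the author's 8.8 MB file)."""
    return "\n".join(
        ",".join(f"field{r}_{c}" for c in range(cols)) for r in range(rows)
    )


def count_streaming(text):
    # Lazy: only one parsed row is alive at a time.
    return sum(len(row) for row in csv.reader(io.StringIO(text)))


def count_materialized(text):
    # Eager: every field is kept in one big list before counting.
    fields = [field for row in csv.reader(io.StringIO(text)) for field in row]
    return len(fields)


def peak_bytes(func, *args):
    """Return (result, peak traced allocation in bytes) for func(*args)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    text = make_csv_text()
    for func in (count_streaming, count_materialized):
        count, peak = peak_bytes(func, text)
        print(f"{func.__name__}: {count} fields, peak {peak / 2**20:.1f} MiB")
```

Both functions return the same count. Only the materialized version has to keep every field alive at once, so its traced peak should be noticeably higher.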

Experiment 2


Experiment 2 examines the difference between a list comprehension and the append method.
With the profiler commented out, the results are:

1258003
Thu Jan  3 16:38:51 2019    timeit_4

         1575356 function calls in 0.342 seconds

   Ordered by: internal time
   List reduced from 15 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.257    0.257    0.326    0.326 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:59(exam_4)
  1258003    0.044    0.000    0.044    0.000 {method 'append' of 'list' objects}
        1    0.017    0.017    0.342    0.342 <string>:1(<module>)
   315100    0.013    0.000    0.013    0.000 {built-in method builtins.len}
     1121    0.011    0.000    0.011    0.000 {built-in method _codecs.utf_8_decode}
     1121    0.001    0.000    0.012    0.000 /usr/lib/python3.6/codecs.py:318(decode)
        1    0.000    0.000    0.000    0.000 {built-in method io.open}
        1    0.000    0.000    0.342    0.342 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {built-in method _locale.nl_langinfo}


1258003
Thu Jan  3 16:38:52 2019    timeit_5

         1890455 function calls in 0.388 seconds

   Ordered by: internal time
   List reduced from 15 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.215    0.215    0.372    0.372 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:72(exam_5)
   315100    0.085    0.000    0.085    0.000 {method 'split' of 'str' objects}
  1258003    0.046    0.000    0.046    0.000 {method 'append' of 'list' objects}
        1    0.017    0.017    0.388    0.388 <string>:1(<module>)
   315100    0.014    0.000    0.014    0.000 {built-in method builtins.len}
     1121    0.011    0.000    0.011    0.000 {built-in method _codecs.utf_8_decode}
     1121    0.001    0.000    0.012    0.000 /usr/lib/python3.6/codecs.py:318(decode)
        1    0.000    0.000    0.388    0.388 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method io.open}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}


1258003
Thu Jan  3 16:38:52 2019    timeit_6

         2255 function calls in 0.167 seconds

   Ordered by: internal time
   List reduced from 15 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.138    0.138    0.149    0.149 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:88(<listcomp>)
        1    0.018    0.018    0.167    0.167 <string>:1(<module>)
     1121    0.010    0.000    0.010    0.000 {built-in method _codecs.utf_8_decode}
     1121    0.001    0.000    0.011    0.000 /usr/lib/python3.6/codecs.py:318(decode)
        1    0.000    0.000    0.167    0.167 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method io.open}
        1    0.000    0.000    0.149    0.149 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:85(exam_6)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {built-in method _locale.nl_langinfo}
        1    0.000    0.000    0.000    0.000 {built-in method _csv.reader}

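The list comprehension is clearly more efficient than the list append method: 0.167 s for exam_6 versus 0.342 s for exam_4 and 0.388 s for exam_5. What about memory?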

1258003
Filename: /home/bai/PycharmProjects/new_query_correction/test/test_generator.py

Line #    Mem usage    Increment   Line Contents
================================================
    58     14.7 MiB     14.7 MiB   @profile
    59                             def exam_4():
    60     14.7 MiB      0.0 MiB       counter = 0
    61     14.7 MiB      0.0 MiB       info_dict = []
    62     14.7 MiB      0.0 MiB       with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
    63     14.7 MiB      0.0 MiB           info = csv.reader(file)
    64    122.8 MiB      0.3 MiB           for i in info:
    65    122.8 MiB      0.0 MiB               counter += len(i)
    66    122.8 MiB      0.0 MiB               for j in i:
    67    122.8 MiB      0.3 MiB                   info_dict.append(j)
    68    122.8 MiB      0.0 MiB       print(counter)


Thu Jan  3 17:07:30 2019    timeit_4

         1606958 function calls (1606085 primitive calls) in 123.793 seconds

   Ordered by: internal time
   List reduced from 160 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1  123.558  123.558  123.765  123.765 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:58(exam_4)
  1261904    0.141    0.000    0.141    0.000 {method 'append' of 'list' objects}
317033/316940    0.044    0.000    0.044    0.000 {built-in method builtins.len}
        1    0.017    0.017  123.782  123.782 /home/bai/PycharmProjects/new_query_correction/venv/lib/python3.6/site-packages/memory_profiler.py:657(f)
     1123    0.015    0.000    0.015    0.000 {built-in method _codecs.utf_8_decode}
     1123    0.007    0.000    0.022    0.000 /usr/lib/python3.6/codecs.py:318(decode)
    276/2    0.001    0.000    0.003    0.001 /usr/lib/python3.6/sre_compile.py:64(_compile)
    186/5    0.001    0.000    0.002    0.000 /usr/lib/python3.6/sre_parse.py:470(_parse)
      515    0.001    0.000    0.001    0.000 {built-in method posix.lstat}
      515    0.000    0.000    0.001    0.000 /usr/lib/python3.6/posixpath.py:75(join)


1258003
Filename: /home/bai/PycharmProjects/new_query_correction/test/test_generator.py

Line #    Mem usage    Increment   Line Contents
================================================
    71     15.1 MiB     15.1 MiB   @profile
    72                             def exam_5():
    73     15.1 MiB      0.0 MiB       counter = 0
    74     15.1 MiB      0.0 MiB       info_dict = []
    75     15.1 MiB      0.0 MiB       with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
    76    123.8 MiB      0.3 MiB           for info in file:
    77    123.8 MiB      0.3 MiB               dd = info.split(',')
    78    123.8 MiB      0.0 MiB               counter += len(dd)
    79    123.8 MiB      0.0 MiB               for j in dd:
    80    123.8 MiB      0.3 MiB                   info_dict.append(j)
    81    123.8 MiB      0.0 MiB       print(counter)


Thu Jan  3 17:09:42 2019    timeit_5

         1891378 function calls in 132.016 seconds

   Ordered by: internal time
   List reduced from 82 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1  131.596  131.596  131.998  131.998 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:71(exam_5)
   315100    0.196    0.000    0.196    0.000 {method 'split' of 'str' objects}
  1258008    0.140    0.000    0.140    0.000 {method 'append' of 'list' objects}
   315121    0.044    0.000    0.044    0.000 {built-in method builtins.len}
        1    0.018    0.018  132.015  132.015 /home/bai/PycharmProjects/new_query_correction/venv/lib/python3.6/site-packages/memory_profiler.py:657(f)
     1121    0.015    0.000    0.015    0.000 {built-in method _codecs.utf_8_decode}
     1121    0.006    0.000    0.021    0.000 /usr/lib/python3.6/codecs.py:318(decode)
       78    0.000    0.000    0.000    0.000 /usr/lib/python3.6/tokenize.py:492(_tokenize)
       68    0.000    0.000    0.000    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.6/inspect.py:935(getblock)


1258003
Filename: /home/bai/PycharmProjects/new_query_correction/test/test_generator.py

Line #    Mem usage    Increment   Line Contents
================================================
    84     24.9 MiB     24.9 MiB   @profile
    85                             def exam_6():
    86     24.9 MiB      0.0 MiB       with open("../data/target/cat_data/brandgood_detail_779.csv") as file:
    87     24.9 MiB      0.0 MiB           info = csv.reader(file)
    88    123.1 MiB      0.3 MiB           dd = [j for i in info for j in i]
    89                             
    90    123.1 MiB      0.0 MiB       counter = len(dd)
    91    123.1 MiB      0.0 MiB       print(counter)


Thu Jan  3 17:10:36 2019    timeit_6

         3006 function calls (3005 primitive calls) in 54.263 seconds

   Ordered by: internal time
   List reduced from 83 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   54.224   54.224   54.245   54.245 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:88(<listcomp>)
        1    0.017    0.017   54.263   54.263 /home/bai/PycharmProjects/new_query_correction/venv/lib/python3.6/site-packages/memory_profiler.py:657(f)
     1121    0.014    0.000    0.014    0.000 {built-in method _codecs.utf_8_decode}
     1121    0.007    0.000    0.021    0.000 /usr/lib/python3.6/codecs.py:318(decode)
        1    0.000    0.000   54.246   54.246 /home/bai/PycharmProjects/new_query_correction/test/test_generator.py:84(exam_6)
       61    0.000    0.000    0.000    0.000 /usr/lib/python3.6/tokenize.py:492(_tokenize)
       54    0.000    0.000    0.000    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.6/inspect.py:935(getblock)
        1    0.000    0.000    0.000    0.000 /home/bai/PycharmProjects/new_query_correction/venv/lib/python3.6/site-packages/memory_profiler.py:754(show_results)
       12    0.000    0.000    0.000    0.000 {method 'write' of '_io.TextIOWrapper' objects}

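In fact, a list comprehension cannot change the size of the data: all three versions peak at about 123 MiB. What it does is build the list in a single expression, which saves a lot of time on memory allocation. That is why the gap becomes so pronounced once the profiler is added (about 54 s for exam_6 versus about 124 s and 132 s for exam_4 and exam_5).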
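
To check the append-versus-comprehension gap without any profiler attached, a small timing harness can help. This is a minimal sketch assuming the standard-library timeit module and synthetic in-memory rows rather than the author's CSV file; the function names are hypothetical, and the absolute timings will vary by machine and Python version.

```python
import timeit

# Synthetic rows, not the author's data.
ROWS = [["a", "b", "c", "d"] for _ in range(50_000)]


def flatten_with_append(rows):
    # Mirrors exam_4: one append call per field.
    out = []
    for row in rows:
        for field in row:
            out.append(field)
    return out


def flatten_with_comprehension(rows):
    # Mirrors exam_6: a nested list comprehension.
    return [field for row in rows for field in row]


if __name__ == "__main__":
    for func in (flatten_with_append, flatten_with_comprehension):
        seconds = min(timeit.repeat(lambda: func(ROWS), number=5, repeat=3))
        print(f"{func.__name__}: {seconds:.3f} s for 5 runs")
```

Both functions produce the same list, so any difference in the printed times comes from how the list is built, not from what it contains.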

Summary

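  1. A generator is no slower than a list (and can be much faster when searching a large data set), yet it saves a great deal of memory. Unless a list is really needed, prefer a generator.
  2. A list comprehension roughly halves the running time compared with repeated append calls in these tests, allocates memory more efficiently, and reduces the load on the CPU.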
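
The remark in point 1 about searching is not measured in the experiments above. One plausible reading is early exit: a generator stops producing values as soon as a match is found, whereas a list must be built in full first. The sketch below illustrates that reading with hypothetical helpers (tracked, first_match_lazy, first_match_eager) and synthetic integers.

```python
def tracked(values, seen):
    """Yield values one by one, recording each value that was produced."""
    for value in values:
        seen.append(value)
        yield value


def first_match_lazy(values, predicate):
    # next() on a generator expression stops at the first hit.
    return next((v for v in values if predicate(v)), None)


def first_match_eager(values, predicate):
    # The list of all matches is built before the first one is returned.
    matches = [v for v in values if predicate(v)]
    return matches[0] if matches else None


if __name__ == "__main__":
    seen_lazy, seen_eager = [], []
    first_match_lazy(tracked(range(100_000), seen_lazy), lambda v: v == 10)
    first_match_eager(tracked(range(100_000), seen_eager), lambda v: v == 10)
    print(len(seen_lazy), len(seen_eager))
```

Here the lazy search consumes only the values 0 through 10, while the eager one consumes all 100,000 before returning the same answer.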