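Introduction
Real projects often need to batch-process the files or subfolders under some folder, which means traversing every file beneath it.
To read only certain files, add a condition inside the traversal, for example one that restricts it to files with a given suffix.
In Python, os.walk() from the os module makes traversing a folder straightforward.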
os.walk()
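Basic usage:
- Pass the argument top, the root folder to traverse.
- The return value is a generator that can be iterated with a for loop; with the default topdown=True it walks the directory tree from the top down.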
Signature: os.walk(top, topdown=True, onerror=None, followlinks=False)
Docstring:
Directory tree generator.
For each directory in the directory tree rooted at top (including top
itself, but excluding '.' and '..'), yields a 3-tuple
dirpath, dirnames, filenames
dirpath is a string, the path to the directory. dirnames is a list of
the names of the subdirectories in dirpath (excluding '.' and '..').
filenames is a list of the names of the non-directory files in dirpath.
Note that the names in the lists are just names, with no path components.
To get a full path (which begins with top) to a file or directory in
dirpath, do os.path.join(dirpath, name).
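To make the 3-tuple from the docstring concrete, here is a minimal sketch that builds a small throwaway tree in a temporary directory and walks it. The helper names describe_tree and build_demo_tree and the file layout are made up for illustration; only os.walk, os.path.join, os.makedirs and tempfile come from the standard library.

```python
import os
import tempfile


def build_demo_tree(base):
    """Create base/a.txt and base/sub/b.txt (a hypothetical layout for this demo)."""
    os.makedirs(os.path.join(base, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(base, rel), "w") as fh:
            fh.write("demo")


def describe_tree(top):
    """Return the (dirpath, dirnames, filenames) tuples yielded by os.walk(top)."""
    result = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Sorting dirnames in place also fixes the visiting order,
        # because the walk is top-down by default.
        dirnames.sort()
        result.append((dirpath, list(dirnames), sorted(filenames)))
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_tree(tmp)
        for dirpath, dirnames, filenames in describe_tree(tmp):
            for name in filenames:
                # The lists hold bare names; join with dirpath for a full path.
                print(os.path.join(dirpath, name))
```

The first tuple always describes top itself, and every later dirpath starts with top, which is why os.path.join(dirpath, name) gives a usable path.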
Example 1
Purpose: collect the names of all files under a folder
import os

def get_all_file(dir_name):
    fullname_list, filename_list = [], []
    for root, dirs, files in os.walk(dir_name):
        for filename in files:
            # list of file names including the full path
            fullname_list.append(os.path.join(root, filename))
            # list of bare file names only
            filename_list.append(filename)
    return fullname_list, filename_list
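The introduction mentions restricting the traversal to files with a particular suffix. One possible way to do that, sketched here as a variation on the function above, is to test each name with str.endswith before collecting it. The function name find_files_with_suffix and the case-insensitive comparison are choices made for this sketch, not part of the original example.

```python
import os


def find_files_with_suffix(dir_name, suffixes):
    """Return full paths of files under dir_name whose names end with any of suffixes."""
    # str.endswith accepts a tuple, so several suffixes can be checked at once.
    # Lower-casing both sides makes the match case-insensitive (an assumption
    # of this sketch; drop it if case matters).
    wanted = tuple(s.lower() for s in suffixes)
    matches = []
    for root, dirs, files in os.walk(dir_name):
        for filename in files:
            if filename.lower().endswith(wanted):
                matches.append(os.path.join(root, filename))
    return matches
```

For instance, find_files_with_suffix('.', ['.py', '.txt']) would gather the Python and text files below the current directory.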
Example 2
Purpose: print information about all files and subfolders under a folder
import os
from os.path import join, getsize

# raw string, so that \U in the Windows path is not parsed as a Unicode escape
dir_name = r'C:\Users\Public'
for root, dirs, files in os.walk(dir_name):
    # print(root, dirs, files)
    info = f'''
    {root} contains {len(files)} non-directory files, they are {files}
    contains {len(dirs)} sub-directories, they are {dirs}
    '''
    print(info)
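Example 2 imports join and getsize without using them. One plausible use, sketched below as a guess rather than as what the author intended, is to report how many bytes the files directly inside each directory occupy. The function name dir_sizes is made up for this sketch.

```python
import os
from os.path import join, getsize


def dir_sizes(dir_name):
    """Map each directory under dir_name to the total size in bytes of its own files."""
    sizes = {}
    for root, dirs, files in os.walk(dir_name):
        total = 0
        for name in files:
            try:
                total += getsize(join(root, name))
            except OSError:
                # For example a broken symlink, or a file removed mid-walk: skip it.
                continue
        # Files in subdirectories are counted under their own dirpath, not here.
        sizes[root] = total
    return sizes
```

Calling dir_sizes(dir_name) returns a dict keyed by dirpath, so the totals can be printed in the same loop style as above or summed for an overall figure.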