Python Libraries - 优快云 Blog

Article link: https://blog.youkuaiyun.com/sysleo/article/details/88573140

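This article covers the usage of commonly used Python libraries, including numpy, opencv, os, and pandas, touching on image processing, file operations, and data processing.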

Notes on the usage of Python libraries I run into day to day.

pixel_array

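pixel_array provides more useful pixel data for uncompressed images. The NumPy numerical package must be installed on your system to use this property, because pixel_array returns a NumPy array. (The session below uses the older import dicom and read_file names; in pydicom 1.0 and later the package is imported as pydicom and files are read with pydicom.dcmread.)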

>>> import dicom
>>> ds=dicom.read_file("MR_small.dcm")
>>> ds.pixel_array
array([[ 905, 1019, 1227, ...,  302,  304,  328],
       [ 628,  770,  907, ...,  298,  331,  355],
       [ 498,  566,  706, ...,  280,  285,  320],
       ...,
       [ 334,  400,  431, ..., 1094, 1068, 1083],
       [ 339,  377,  413, ..., 1318, 1346, 1336],
       [ 378,  374,  422, ..., 1369, 1129,  862]], dtype=int16)
>>> ds.pixel_array.shape
(64, 64)

numpy.random.randn()

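randn(d0, d1, …, dn) generates an array of shape (d0, d1, …, dn) filled with random floats sampled from a univariate “normal” (Gaussian) distribution with mean 0 and variance 1. Called with no arguments, it returns a single float. To sample from N(mu, sigma^2), use sigma * np.random.randn(...) + mu; the second example below draws from N(3, 6.25).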

>>> np.random.randn()
2.1923875335537315 #random

>>> 2.5 * np.random.randn(2, 4) + 3
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],  #random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]]) #random
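
As a rough check of the scaling rule above, the sketch below draws many samples and compares their mean and standard deviation with the requested values. The seed and the sample count are arbitrary choices made for this illustration only.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)

mu, sigma = 3.0, 2.5
# randn samples N(0, 1); scaling by sigma and shifting by mu gives N(mu, sigma^2)
samples = sigma * np.random.randn(100_000) + mu

print(samples.shape)
print(round(samples.mean(), 2), round(samples.std(), 2))  # close to 3.0 and 2.5
```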

numpy.stack

numpy.stack(arrays, axis=0, out=None)
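Joins a sequence of arrays of the same shape along a new axis. The axis parameter is the index of the new axis in the result: with axis=0 it is the first dimension, and with axis=-1 it is the last dimension.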


>>> arrays = [np.random.randn(3, 4) for _ in range(10)]
>>> np.stack(arrays, axis=0).shape
(10, 3, 4)
>>> np.stack(arrays, axis=1).shape
(3, 10, 4)
>>> np.stack(arrays, axis=2).shape
(3, 4, 10)

>>> a = np.array([1, 2, 3])
>>> b = np.array([2, 3, 4])
>>> np.stack((a, b))
array([[1, 2, 3],
       [2, 3, 4]])
>>> np.stack((a, b), axis=-1)
array([[1, 2],
       [2, 3],
       [3, 4]])
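
To make the "new axis" point concrete, here is a small sketch contrasting np.stack with np.concatenate, which joins along an existing axis instead. The variable names are only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

stacked = np.stack((a, b), axis=0)        # new leading axis: shape (2, 3)
joined = np.concatenate((a, b), axis=0)   # existing axis extended: shape (6,)

# stack behaves like giving each input a new axis and then concatenating
manual = np.concatenate((a[np.newaxis, :], b[np.newaxis, :]), axis=0)

print(stacked.shape, joined.shape)
print(np.array_equal(stacked, manual))
```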

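numpy.squeeze

Removes single-dimensional entries from the shape of an array, i.e. drops every dimension whose size is 1.

Usage: numpy.squeeze(a, axis=None)

1) a is the input array;
2) axis selects which dimensions to remove; every selected dimension must have size 1, otherwise an error is raised;
3) axis may be None, an int, or a tuple of ints (optional). If axis is None, all size-1 dimensions are removed;
4) return value: an array;
5) the original array is not modified.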

>>> e = np.arange(10)
>>> a = e.reshape(1,1,10)
>>> a
array([[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]])
>>> np.squeeze(a)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
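
The axis rules listed above can be tried out with a short sketch like the following. The array x and its shape are made up for the illustration.

```python
import numpy as np

x = np.arange(10).reshape(1, 10, 1)

all_removed = np.squeeze(x)                # every size-1 dimension dropped
first_only = np.squeeze(x, axis=0)         # only dimension 0 dropped
both_given = np.squeeze(x, axis=(0, 2))    # a tuple of axes is accepted

print(all_removed.shape, first_only.shape, both_given.shape)
print(x.shape)  # the original array keeps its shape

# Asking to squeeze a dimension whose size is not 1 raises ValueError
error_message = None
try:
    np.squeeze(x, axis=1)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```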

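cv2.imwrite

cv2.imwrite(r"D:\cat2.jpg", img)

The first argument is the output path and file name (the extension determines the image format); the second is the image array. A raw string (r"...") is used so that backslashes in the Windows path are not treated as escape characters.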

os.getcwd()

os.getcwd() returns the current working directory.
img_dir = os.path.join(os.getcwd(), "images")  # .jpg images

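Python open() function

open(name[, mode[, buffering]])

Parameters (this is the Python 2 signature; in Python 3 the first parameter is called file, and further keyword parameters such as encoding are available):
name : a string containing the name of the file to open.
mode : determines how the file is opened: read-only, write, append, and so on. This parameter is optional; the default mode is read-only ('r').
buffering : if 0, no buffering takes place (in Python 3 this is only allowed in binary mode). If 1, the file is line-buffered. If an integer greater than 1, it is the buffer size in bytes. If negative, the system default buffer size is used.

f = open(label_fp, "a") opens a file for appending. If the file already exists, the file pointer is placed at the end of the file, so new content is written after the existing content. If the file does not exist, a new file is created for writing.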
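
A minimal sketch of append mode, assuming a throwaway file in a temporary directory (the file name labels.txt is hypothetical). The with statement closes the file automatically.

```python
import os
import tempfile

label_fp = os.path.join(tempfile.mkdtemp(), "labels.txt")

# The file does not exist yet, so "a" creates it
with open(label_fp, "a") as f:
    f.write("first line\n")

# Opening again with "a" keeps the old content and writes after it
with open(label_fp, "a") as f:
    f.write("second line\n")

with open(label_fp) as f:  # default mode is 'r'
    lines = f.read().splitlines()
print(lines)  # ['first line', 'second line']
```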

tqdm

https://pypi.python.org/pypi/tqdm

from time import sleep
from tqdm import tqdm
for i in tqdm(range(10000)):
    sleep(0.01)
Wrapping an iterable (here a range) in tqdm() displays a progress bar as the loop runs.

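Pandas

Pandas uses a two-dimensional data structure, the DataFrame, to represent tabular data. Unlike a NumPy array, a DataFrame can hold columns of different data types, and it uses NaN to represent missing data.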

annots = pd.read_csv(os.path.join(DATA_DIR, "stage_1_train_labels.csv"))
annots.head()
head(m) returns the first m rows; if m is omitted, the first five rows are returned.

Use tolist() to convert to a list, for example to get the column names:

food_info.columns.tolist()

As with NumPy, the shape attribute gives the dimensions of the data:

dimensions = food_info.shape
print(dimensions)

Output: (8618, 36) means the table has 8618 rows and 36 columns; dimensions[0] is 8618 and dimensions[1] is 36.
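
The calls above can be tried without the original CSV files. The sketch below builds a tiny stand-in DataFrame; its column names and values are invented purely for illustration and are not the real food_info data.

```python
import pandas as pd

# Toy stand-in for food_info: mixed column types and one missing value
food_info = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f", "g"],
    "calories": [10, 20, 30, 40, 50, 60, 70],
    "protein": [1.5, None, 2.5, 3.0, 0.5, 4.0, 2.0],  # None is stored as NaN
})

first_five = food_info.head()      # default: first 5 rows
first_two = food_info.head(2)      # first 2 rows
column_names = food_info.columns.tolist()
dimensions = food_info.shape       # (rows, columns)

print(first_two)
print(column_names)
print(dimensions)
```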

pandas.DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)

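drop_duplicates removes duplicate rows from a DataFrame, optionally comparing only specific columns, and returns a DataFrame.

subset : column label or sequence of labels, optional
The columns to consider when identifying duplicates; by default all columns are used.
keep : {'first', 'last', False}, default 'first'
Which duplicates to keep: 'first' keeps the first occurrence, 'last' keeps the last occurrence, and False drops all duplicated rows.
inplace : boolean, default False
Whether to modify the original DataFrame in place (True, in which case None is returned) or return a new DataFrame and leave the original unchanged (False).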
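
Here is a small sketch of how subset and keep interact. The DataFrame and its column names (patient, label) are hypothetical and exist only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "patient": ["p1", "p1", "p2", "p3", "p3"],
    "label":   [0,    1,    0,    1,    1],
})

# Compare only the "patient" column
first = df.drop_duplicates(subset="patient")               # keeps rows 0, 2, 3
last = df.drop_duplicates(subset="patient", keep="last")   # keeps rows 1, 2, 4
none = df.drop_duplicates(subset="patient", keep=False)    # keeps row 2 only

# Compare all columns: only row 4 is an exact repeat (of row 3)
all_cols = df.drop_duplicates()

print(first.index.tolist(), last.index.tolist(), none.index.tolist())
print(len(df))  # inplace defaults to False, so df itself is unchanged
```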

Introduction to the Series structure

The Series is the primary building block of pandas and represents a one-dimensional labeled array based on the NumPy ndarray. (Copied from a book.)
In other words, a Series is built on NumPy's ndarray and is a one-dimensional labeled array (it feels a bit like a Python dictionary).

import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'f', 'e'])
print(s)
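
The dictionary comparison can be illustrated with a short sketch: values are looked up by label much like dict keys, while positional access is still available through iloc.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "f", "e"])

by_label = s["f"]          # label lookup, like a dict key
by_position = s.iloc[3]    # positional lookup, like an array index
subset = s[["a", "e"]]     # several labels at once give a smaller Series
has_label = "a" in s       # membership is tested against the labels
as_dict = s.to_dict()

print(by_label, by_position, has_label)
print(subset.tolist())
print(as_dict)
```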

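Getting file paths with glob.glob

The argument to glob.glob is not a regular expression but a shell-style wildcard pattern: * matches any number of characters, ? matches a single character, and [...] matches one character from a set. Forward slashes separate directory levels.

import glob

# Get all .png images one directory level below the given directory
print(glob.glob(r"/home/qiaoyunhao/*/*.png"), "\n")  # the r prefix keeps backslashes from being treated as escapes

# Get all .py files in the parent directory
print(glob.glob(r'../*.py'))  # relative path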
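
To see the three wildcards in action without depending on a particular home directory, the sketch below creates a few empty files in a temporary directory. The file names and the names() helper are made up for the example.

```python
import glob
import os
import tempfile

root = tempfile.mkdtemp()
for name in ["a.png", "b.png", "c1.txt", "c2.txt", "notes.md"]:
    open(os.path.join(root, name), "w").close()  # create an empty file

def names(pattern):
    """Return the sorted base names in root that match the pattern."""
    return sorted(os.path.basename(p) for p in glob.glob(os.path.join(root, pattern)))

png_files = names("*.png")     # * matches any run of characters
c_files = names("c?.txt")      # ? matches exactly one character
ab_files = names("[ab].*")     # [ab] matches one character from the set

print(png_files, c_files, ab_files)
```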

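Python set() function

set() creates an unordered collection of unique elements. It supports membership tests, removes duplicates, and can compute intersections, differences, unions, and so on.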
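
A quick sketch of the operations mentioned above, using two small sets chosen for illustration:

```python
a = set([1, 2, 2, 3, 3, 3])   # duplicates are dropped: {1, 2, 3}
b = {3, 4, 5}

is_member = 2 in a            # membership test
intersection = a & b          # elements in both
union = a | b                 # elements in either
difference = a - b            # elements in a but not in b
symmetric = a ^ b             # elements in exactly one of the two

print(a, is_member)
print(intersection, union, difference, symmetric)
```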

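Useful functions from Python's built-in os and shutil modules

os.sep    the OS-specific path separator; on Windows it is '\\'
os.name    a string identifying the platform in use: 'nt' on Windows, 'posix' on Linux/Unix
os.getcwd()    returns the current working directory, i.e. the directory the Python script is running in
os.getenv()    gets an environment variable; returns None if it does not exist
os.putenv(key, value)    sets an environment variable (this does not update os.environ, so assigning to os.environ[key] is usually preferable)
os.listdir(path)    returns the names of all files and directories in the given directory
os.remove(path)    deletes a file
os.system(command)    runs a shell command
os.linesep    the line terminator used on the current platform: '\r\n' on Windows, '\n' on Linux and macOS ('\r' was used by classic Mac OS)
os.path.split(path)    returns the directory name and file name of a path
os.path.isfile() and os.path.isdir()    test whether the given path is a file or a directory, respectively
os.path.exists()    tests whether the given path actually exists
os.curdir    the string used to refer to the current directory ('.')
os.mkdir(path)    creates a directory
os.makedirs(path)    creates directories recursively
os.chdir(dirname)    changes the working directory to dirname
os.path.getsize(name)    returns the file size in bytes; for a directory the value is platform-dependent and is not the total size of its contents
os.path.abspath(name)    returns the absolute path
os.path.normpath(path)    normalizes a path string
os.path.splitext()    splits a file name into base name and extension
os.path.join(path, name)    joins a directory with a file or directory name
os.path.basename(path)    returns the file name
os.path.dirname(path)    returns the directory part of the path
os.walk(top, topdown=True, onerror=None, followlinks=False)    iterates over a directory tree
os.rename(src, dst)    renames the file or directory src to dst. If dst is an existing directory, OSError is raised in most cases. On Unix, if dst exists and is a file, it is silently replaced when the user has permission. The operation may fail on some Unix flavors if src and dst are on different filesystems. If it succeeds, the rename is atomic (a POSIX requirement). On Windows, if dst already exists, OSError is raised even if it is a file. Available on Unix and Windows.
os.renames(old, new)    renames a file or directory recursively. Works like rename(), except that any intermediate directories needed for the new path are created first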
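
Several of the functions above are often used together. The following is a minimal sketch that works inside a temporary directory; the directory and file names (data, images, cat.jpg) are hypothetical, and os.path.relpath is used in addition to the functions listed.

```python
import os
import tempfile

base = tempfile.mkdtemp()
nested = os.path.join(base, "data", "images")
os.makedirs(nested)                       # creates data/ and data/images/

img_path = os.path.join(nested, "cat.jpg")
with open(img_path, "w") as f:
    f.write("x" * 10)                     # 10 bytes of dummy content

directory, filename = os.path.split(img_path)
stem, ext = os.path.splitext(filename)
size = os.path.getsize(img_path)

new_path = os.path.join(nested, "cat2.jpg")
os.rename(img_path, new_path)

# os.walk yields (dirpath, dirnames, filenames) for each directory in the tree
found = []
for dirpath, dirnames, filenames in os.walk(base):
    for name in filenames:
        found.append(os.path.relpath(os.path.join(dirpath, name), base))

print(stem, ext, size)
print(os.listdir(nested), found)
```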

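The shutil module

shutil.copyfile(src, dst)    copies the contents of src to dst. The destination must be writable, otherwise IOError is raised (OSError in Python 3). If dst already exists, it is overwritten
shutil.move(src, dst)    moves or renames a file. If the destination is on the same filesystem this is equivalent to a rename; otherwise src is copied to dst and then removed
shutil.copymode(src, dst)    copies only the permission bits; nothing else is copied
shutil.copystat(src, dst)    copies permissions, last access time, and last modification time
shutil.copy(src, dst)    copies a file to a file or into a directory
shutil.copy2(src, dst)    like copy, but also copies the last access and modification times, similar to cp -p
shutil.copytree(olddir, newdir, True/False)    copies olddir to newdir. If the third argument is True, symbolic links in the source tree are kept as symbolic links in the copy; if it is False, the linked files themselves are copied in place of the links
shutil.rmtree(src)    recursively deletes a directory and everything in it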
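
A short sketch of copying, moving, and deleting with shutil, again inside a temporary directory. All paths are made up for the example, and the return values shown assume Python 3, where copy2, copytree, and move return the destination path.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src_dir = os.path.join(base, "src")
os.mkdir(src_dir)
src_file = os.path.join(src_dir, "a.txt")
with open(src_file, "w") as f:
    f.write("hello")

# copy2: file contents plus access/modification times
copy_path = shutil.copy2(src_file, os.path.join(base, "a_copy.txt"))
# copytree: the whole directory; the destination must not exist yet
tree_path = shutil.copytree(src_dir, os.path.join(base, "src_backup"))
# move: here simply a rename within the same directory
moved_path = shutil.move(copy_path, os.path.join(base, "a_moved.txt"))

shutil.rmtree(src_dir)                    # delete the original directory tree
remaining = sorted(os.listdir(base))
print(remaining)
```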

class_id, rcx, rcy, rw, rh = list(map(float, line.strip().split()))

list(map (int, input().strip().split()))

input() waits for the user to type something and returns it as a string.
input().strip()
strip() removes leading and trailing whitespace from the user input, if there is any.
Why is it required? Because some users will type stray spaces, and that might break your program.
Remember that input() returns the input as a string. So now we have a string of integers like "1 2 4 42".
input().strip().split()
split() creates a Python list from a string. If no delimiter is given, it splits on runs of whitespace. So now we have: ["1", "2", "4", "42"]
map(int, input().strip().split())
map() takes two arguments: the function to apply and the data to apply it to. Here it simply converts every element of the list to an integer.
list(map(int, input().strip().split()))
list() converts its argument to a list.
This works around a difference between Python 2 and 3: in 2, map returns a list (so converting it to a list is redundant but harmless). In 3, map returns an iterator: an object you can repeatedly ask for its next value.
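
The same chain works on any string, not only on input(). The sketch below wraps it in two small helper functions (both hypothetical names) so it can be run without typing anything. The label line values are invented, and converting class_id to int is an assumption of this example.

```python
def parse_ints(text):
    """Turn a whitespace-separated string of integers into a list of ints."""
    return list(map(int, text.strip().split()))

def parse_label(line):
    """Parse one label line of five numbers into (class_id, rcx, rcy, rw, rh)."""
    class_id, rcx, rcy, rw, rh = map(float, line.strip().split())
    return int(class_id), rcx, rcy, rw, rh

numbers = parse_ints("  1 2 4 42 \n")
label = parse_label("0 0.5 0.5 0.25 0.1\n")

# In Python 3, map returns an iterator that can be consumed only once
lazy = map(int, ["1", "2"])
first_pass = list(lazy)
second_pass = list(lazy)   # already exhausted, so this is empty

print(numbers, label)
print(first_pass, second_pass)
```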