1 Expressing conditional logic as array operations (using where)
Suppose we have two arrays of real values, xarr and yarr, and a boolean array cond:
In [140]: xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
In [141]: yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
In [142]: cond = np.array([True, False, True, True, False])
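We want to take a value from xarr wherever the corresponding value in cond is True, and from yarr otherwise. A list comprehension can do this:
In [143]: result = [(x if c else y)
   .....:           for x, y, c in zip(xarr, yarr, cond)]
In [144]: result
Out[144]: [1.1000000000000001, 2.2000000000000002, 1.3, 1.3999999999999999, 2.5]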
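This approach has two problems: (1) it is slow for large arrays, because the loop runs in interpreted Python, and (2) it does not work with multidimensional arrays.
With numpy's where function the same selection is a single vectorized call: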
In [145]: result = np.where(cond, xarr, yarr)
In [146]: result
Out[146]: array([ 1.1, 2.2, 1.3, 1.4, 2.5])
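As a small sketch of the second problem, the same np.where call handles two-dimensional inputs without any change, whereas the list comprehension would need nested loops. The arrays x2d, y2d, and cond2d are made up for this illustration.

```python
import numpy as np

# Hypothetical 2 x 3 inputs, invented for this illustration.
x2d = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])
y2d = -x2d
cond2d = np.array([[True, False, True],
                   [False, True, False]])

# Element-wise selection: take x2d where cond2d is True, otherwise y2d.
picked = np.where(cond2d, x2d, y2d)
print(picked)
```

The result has the same shape as the inputs, so no reshaping is needed afterwards.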
The second and third arguments to where do not have to be arrays; either or both can be scalars (single values):
In [147]: arr = np.random.randn(4, 4)
In [148]: arr
Out[148]:
array([[ 0.6372,  2.2043,  1.7904,  0.0752],
       [-1.5926, -1.1536,  0.4413,  0.3483],
       [-0.1798,  0.3299,  0.7827, -0.7585],
       [ 0.5857,  0.1619,  1.3583, -1.3865]])
In [149]: np.where(arr > 0, 2, -2)
Out[149]:
array([[ 2,  2,  2,  2],
       [-2, -2,  2,  2],
       [-2,  2,  2, -2],
       [ 2,  2,  2, -2]])
# set only positive values to 2
In [150]: np.where(arr > 0, 2, arr)
Out[150]:
array([[ 2.    ,  2.    ,  2.    ,  2.    ],
       [-1.5926, -1.1536,  2.    ,  2.    ],
       [-0.1798,  2.    ,  2.    , -0.7585],
       [ 2.    ,  2.    ,  2.    , -1.3865]])
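The two outputs above differ in type: integers when both choices are integer scalars, floats when a scalar is mixed with a float array. A minimal sketch of that behaviour, using a small made-up array in place of the random one:

```python
import numpy as np

# Made-up values for illustration.
arr = np.array([[0.5, -1.0],
                [-2.0, 3.0]])

signs = np.where(arr > 0, 2, -2)     # both choices are integer scalars
capped = np.where(arr > 0, 2, arr)   # integer scalar mixed with a float array

print(signs.dtype.kind)   # 'i', an integer dtype
print(capped.dtype)       # float64
```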
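where can be nested to express more complicated logic (in a boolean context, 1 is treated as True and 0 as False). For example, given two boolean arrays cond1 and cond2 of length n, this loop assigns a different value to each of the four possible combinations: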
result = []
for i in range(n):
    if cond1[i] and cond2[i]:
        result.append(0)
    elif cond1[i]:
        result.append(1)
    elif cond2[i]:
        result.append(2)
    else:
        result.append(3)
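The loop above can be written as a single nested where expression, which yields an ndarray instead of a list: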
np.where(cond1 & cond2, 0,
         np.where(cond1, 1,
                  np.where(cond2, 2, 3)))
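To check that the loop and the nested where really agree, here is a sketch with made-up cond1 and cond2 arrays that cover all four combinations. It also shows np.select, which is not used in the text but is one alternative for if/elif chains: it picks the choice for the first condition that matches.

```python
import numpy as np

# Hypothetical boolean inputs covering all four combinations.
cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

# Loop version, as in the text.
looped = []
for i in range(len(cond1)):
    if cond1[i] and cond2[i]:
        looped.append(0)
    elif cond1[i]:
        looped.append(1)
    elif cond2[i]:
        looped.append(2)
    else:
        looped.append(3)

# Nested where version, as in the text.
nested = np.where(cond1 & cond2, 0,
                  np.where(cond1, 1,
                           np.where(cond2, 2, 3)))

# np.select takes the first matching condition, like an if/elif chain.
selected = np.select([cond1 & cond2, cond1, cond2], [0, 1, 2], default=3)

print(looped, nested, selected)
```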
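2 Mathematical and statistical methods
For example, mean computes the arithmetic mean and sum computes the total. These aggregations can be called either as array methods or as top-level numpy functions: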
In [151]: arr = np.random.randn(5, 4)
In [152]: arr.mean()
Out[152]: 0.062814911084854597
In [153]: np.mean(arr)
Out[153]: 0.062814911084854597
In [154]: arr.sum()
Out[154]: 1.2562982216970919
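Both mean and sum take an optional axis argument, which computes the statistic along the given axis and returns an array with one fewer dimension: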
In [155]: arr.mean(axis=1)
Out[155]: array([-1.2833, 0.2844, 0.6574, 0.6743, -0.0187])
In [156]: arr.sum(0)
Out[156]: array([-3.1003, -1.6189, 1.4044, 4.5712])
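Since the random values above are hard to check by hand, here is a minimal sketch of the axis argument on a small deterministic array (the array a is made up for illustration). axis=0 collapses the rows and leaves one value per column; axis=1 collapses the columns and leaves one value per row.

```python
import numpy as np

# Small deterministic array so the results are easy to verify.
a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

col_sums = a.sum(axis=0)     # one value per column: [3, 5, 7]
row_means = a.mean(axis=1)   # one value per row: [1.0, 4.0]
total = a.sum()              # no axis: reduce over every element, 15

print(col_sums, row_means, total)
```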
Basic array statistical methods:
Method          Description
sum             Sum of all the elements in the array or along an axis. Zero-length arrays have sum 0.
mean            Arithmetic mean. Zero-length arrays have NaN mean.
std, var        Standard deviation and variance, respectively, with optional degrees of freedom adjustment (default denominator n).
min, max        Minimum and maximum.
argmin, argmax  Indices of minimum and maximum elements, respectively.
cumsum          Cumulative sum of elements starting from 0.
cumprod         Cumulative product of elements starting from 1.
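A short sketch of a few methods from the table, on a made-up four-element array. The ddof argument is the degrees of freedom adjustment mentioned for std and var: the denominator is n - ddof, so ddof=1 gives the sample variance.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])   # made-up data

running_sum = a.cumsum()     # [1, 3, 6, 10]
running_prod = a.cumprod()   # [1, 2, 6, 24]

pop_var = a.var()            # denominator n: 5 / 4
sample_var = a.var(ddof=1)   # denominator n - 1: 5 / 3

largest_at = a.argmax()      # index of the maximum element: 3

print(running_sum, running_prod, pop_var, sample_var, largest_at)
```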
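Methods for boolean arrays
Boolean values are coerced to 1 (True) and 0 (False) in the methods above, so calling sum on a boolean array counts its True values.
Two other methods, any and all, are especially useful for boolean arrays: any tests whether at least one value is True, and all tests whether every value is True: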
In [162]: bools = np.array([False, False, True, False])
In [163]: bools.any()
Out[163]: True
In [164]: bools.all()
Out[164]: False
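The counting behaviour of sum is most useful together with a comparison, which produces the boolean array on the fly. A minimal sketch with made-up values:

```python
import numpy as np

values = np.array([-1.5, 0.3, 2.0, -0.2, 0.7])   # made-up data

n_positive = (values > 0).sum()     # counts the True entries: 3
any_above_one = (values > 1).any()  # at least one value exceeds 1: True
all_positive = (values > 0).all()   # not every value is positive: False

print(n_positive, any_above_one, all_positive)
```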
Sorting
For a one-dimensional array, the sort method sorts the values in place:
In [165]: arr = np.random.randn(8)
In [166]: arr
Out[166]: array([ 0.6903, 0.4678, 0.0968, -0.1349, 0.9879, 0.0185, -1.3147, -0.5425])
In [167]: arr.sort()
In [168]: arr
Out[168]: array([-1.3147, -0.5425, -0.1349, 0.0185, 0.0968, 0.4678, 0.6903, 0.9879])
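For a two-dimensional array, pass an axis number to sort each one-dimensional section along that axis in place: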
In [169]: arr = np.random.randn(5, 3)
In [170]: arr
Out[170]:
array([[-0.7139, -1.6331, -0.4959],
       [ 0.8236, -1.3132, -0.1935],
       [-1.6748,  3.0336, -0.863 ],
       [-0.3161,  0.5362, -2.468 ],
       [ 0.9058,  1.1184, -1.0516]])
In [171]: arr.sort(1)
In [172]: arr
Out[172]:
array([[-1.6331, -0.7139, -0.4959],
       [-1.3132, -0.1935,  0.8236],
       [-1.6748, -0.863 ,  3.0336],
       [-2.468 , -0.3161,  0.5362],
       [-1.0516,  0.9058,  1.1184]])
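Because the sort method modifies the array in place, the original order is lost. When the original is still needed, the top-level np.sort function returns a sorted copy instead. A small sketch of the difference (the array contents are made up):

```python
import numpy as np

original = np.array([3, 1, 2])

sorted_copy = np.sort(original)   # returns a new array; original is untouched
snapshot = original.copy()        # taken before the in-place sort below

original.sort()                   # sorts in place and returns None

print(snapshot, sorted_copy, original)
```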
3 Unique values (comparable to SQL's DISTINCT) and other set logic
In [176]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
In [177]: np.unique(names)
Out[177]:
array(['Bob', 'Joe', 'Will'], dtype='|S4')
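np.unique returns the same values as the following pure Python expression, but as a sorted ndarray rather than a list: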
In [180]: sorted(set(names))
Out[180]: ['Bob', 'Joe', 'Will']
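numpy's in1d function tests, element by element, whether each value of the first array is contained in the second, and returns a boolean array of the same length as the first: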
In [181]: values = np.array([6, 0, 0, 3, 2, 5, 6])
In [182]: np.in1d(values, [2, 3, 6])
Out[182]: array([ True, False, False, True, True, False, True], dtype=bool)
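The boolean result is typically used as a mask to filter the original array. The sketch below uses np.isin, which performs the same membership test; to my knowledge newer NumPy releases recommend it over np.in1d, which is deprecated as of NumPy 2.0, so treat the choice of function as an assumption about your installed version.

```python
import numpy as np

values = np.array([6, 0, 0, 3, 2, 5, 6])

# Membership test: True where the element of values appears in [2, 3, 6].
mask = np.isin(values, [2, 3, 6])

# Boolean indexing keeps only the elements that passed the test.
kept = values[mask]

print(mask)
print(kept)
```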
Array set operations:
Method             Description
unique(x)          Compute the sorted, unique elements in x
intersect1d(x, y)  Compute the sorted, common elements in x and y
union1d(x, y)      Compute the sorted union of elements
in1d(x, y)         Compute a boolean array indicating whether each element of x is contained in y
setdiff1d(x, y)    Set difference, elements in x that are not in y
setxor1d(x, y)     Set symmetric differences; elements that are in either of the arrays, but not both
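The functions in the table can be tried on two small made-up arrays; each result comes back sorted and without duplicates.

```python
import numpy as np

x = np.array([1, 2, 3, 4])   # made-up inputs
y = np.array([3, 4, 5])

common = np.intersect1d(x, y)   # in both: [3, 4]
either = np.union1d(x, y)       # in at least one: [1, 2, 3, 4, 5]
only_x = np.setdiff1d(x, y)     # in x but not in y: [1, 2]
not_both = np.setxor1d(x, y)    # in exactly one of the two: [1, 2, 5]

print(common, either, only_x, not_both)
```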
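4 Linear algebra
Examples include matrix multiplication, matrix decompositions, and determinants.
numpy's dot function, which is also available as an array method, computes the matrix product of two arrays: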
In [194]: x = np.array([[1., 2., 3.], [4., 5., 6.]])
In [195]: y = np.array([[6., 23.], [-1, 7], [8, 9]])
In [198]: x.dot(y) # equivalently np.dot(x, y)
Out[198]:
array([[  28.,   64.],
       [  67.,  181.]])
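On Python 3.5 and later the @ operator is another way to write the same matrix product. The sketch below reuses x and y from the example and checks that the two spellings agree. Note that the shapes must be compatible: x is 2 x 3 and y is 3 x 2, so the result is 2 x 2.

```python
import numpy as np

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])

product = x @ y                               # same result as x.dot(y)
agrees = np.array_equal(product, np.dot(x, y))

print(product)
print(product.shape, agrees)
```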
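numpy.linalg provides a standard set of matrix decompositions along with operations such as inverse and determinant. Under the hood these are implemented with the same industry-standard Fortran libraries, such as BLAS and LAPACK, that are used by other languages like MATLAB and R.
Example usage: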
In [201]: from numpy.linalg import inv, qr
In [202]: X = np.random.randn(5, 5)
In [203]: mat = X.T.dot(X)
In [204]: inv(mat)
Out[204]:
array([[ 3.0361, -0.1808, -0.6878, -2.8285, -1.1911],
       [-0.1808,  0.5035,  0.1215,  0.6702,  0.0956],
       [-0.6878,  0.1215,  0.2904,  0.8081,  0.3049],
       [-2.8285,  0.6702,  0.8081,  3.4152,  1.1557],
       [-1.1911,  0.0956,  0.3049,  1.1557,  0.6051]])
In [206]: q, r = qr(mat)
In [207]: r
Out[207]:
array([[ -6.9271,   7.389 ,   6.1227,  -7.1163,  -4.9215],
       [  0.    ,  -3.9735,  -0.8671,   2.9747,  -5.7402],
       [  0.    ,   0.    , -10.2681,   1.8909,   1.6079],
       [  0.    ,   0.    ,   0.    ,  -1.2996,   3.3577],
       [  0.    ,   0.    ,   0.    ,   0.    ,   0.5571]])
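The results of inv and qr can be sanity-checked numerically. The sketch below swaps the random matrix for a small fixed one (made up, so the checks are reproducible) and verifies three properties: a matrix times its inverse is the identity, q times r rebuilds the matrix, and r is upper triangular. np.allclose is used because floating-point results match only up to rounding.

```python
import numpy as np
from numpy.linalg import inv, qr

# Small fixed matrix, invented for this illustration.
X = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
mat = X.T @ X

# A matrix times its inverse should be the identity, up to rounding.
inverse_ok = np.allclose(mat @ inv(mat), np.eye(3))

# QR: q has orthonormal columns, r is upper triangular, q @ r rebuilds mat.
q, r = qr(mat)
rebuild_ok = np.allclose(q @ r, mat)
orthonormal_ok = np.allclose(q.T @ q, np.eye(3))
upper_ok = np.allclose(r, np.triu(r))

print(inverse_ok, rebuild_ok, orthonormal_ok, upper_ok)
```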
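Commonly used linear algebra functions (diag, dot, and trace live in the top-level numpy namespace; the rest are in numpy.linalg):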
Function  Description
diag      Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal
dot       Matrix multiplication
trace     Compute the sum of the diagonal elements
det       Compute the matrix determinant
eig       Compute the eigenvalues and eigenvectors of a square matrix
inv       Compute the inverse of a square matrix
pinv      Compute the Moore-Penrose pseudo-inverse of a matrix
qr        Compute the QR decomposition
svd       Compute the singular value decomposition (SVD)
solve     Solve the linear system Ax = b for x, where A is a square matrix
lstsq     Compute the least-squares solution to y = Xb
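As a minimal sketch of the last two entries in the table, the code below solves a small square system with solve and fits a straight line with lstsq. All numbers are made up; the three points lie exactly on y = 1 + 2t, so the fitted coefficients should come out as 1 and 2. Passing rcond=None to lstsq selects the current default cutoff and avoids a warning on some NumPy versions.

```python
import numpy as np

# Square system A x = b with a known solution x = [2, 3].
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# Least squares: fit y = c0 + c1 * t to three points on y = 1 + 2t.
t = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])
design = np.column_stack([np.ones_like(t), t])   # columns: intercept, slope
coef, residuals, rank, singular_values = np.linalg.lstsq(design, y, rcond=None)

print(x)
print(coef, rank)
```

For a square, well-conditioned system, solve is generally preferable to computing inv(A) and multiplying, since it avoids forming the inverse explicitly.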