NumPy Basics: Arrays and Vectorized Computation
NumPy, short for Numerical Python, is the fundamental package required for high
performance scientific computing and data analysis.
- ndarray, a multidimensional array object
- Mathematical functions for fast operations on entire arrays of data without having to write loops
- Tools for reading data from disk
- Linear algebra, random number generation, Fourier transforms
- Tools for integrating code written in C, C++, Fortran
Basic setup
%matplotlib inline
from __future__ import division
from numpy.random import randn
import numpy as np
np.set_printoptions(precision=4, suppress=True)
NumPy ndarray: A Multidimensional Array Object
Basic usage
data = randn(2, 3)
data * 10
data + data
data.shape
data.dtype
Creating ndarray
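array
array accepts any sequence and creates a NumPy array containing the input data.
zeros and ones
zeros and ones create arrays of the given shape, filled entirely with 0s and 1s respectively.
empty
empty creates an array without initializing its values to any particular value.
arange
arange is the array-valued counterpart of the built-in range.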
#array
data1= [6,7.5,8,0,1]
arr1 = np.array(data1)
# nested sequences give a two-dimensional array
data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)
#zeros, ones
a1 = np.zeros(10)
a2 = np.ones((2,3))
#empty
np.empty(10)
#arange
np.arange(15)
Function | Description |
---|---|
array | Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype. Copies the input data by default. |
asarray | Convert input to ndarray, but do not copy if the input is already an ndarray |
arange | Like the built-in range but returns an ndarray instead of a list. |
ones, ones_like | Produce an array of all 1’s with the given shape and dtype. ones_like takes another array and produces a ones array of the same shape and dtype. |
zeros, zeros_like | Like ones and ones_like but producing arrays of 0’s instead |
empty, empty_like | Create new arrays by allocating new memory, but do not populate with any values like ones and zeros |
eye, identity | Create a square N x N identity matrix (1’s on the diagonal and 0’s elsewhere) |
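The difference between array and asarray in the table is easiest to see on an input that is already an ndarray. A minimal sketch (the variable names are illustrative only):

```python
import numpy as np

original = np.array([1, 2, 3])

copied = np.array(original)      # np.array copies its input by default
aliased = np.asarray(original)   # np.asarray reuses an existing ndarray

original[0] = 99

print(copied[0])            # unaffected by the change to `original`
print(aliased[0])           # sees the change, because no copy was made
print(aliased is original)
```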
Data Types for ndarrays
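The dtype mainly determines how much memory each element occupies. The trailing number is the size in bits: a double-precision float takes 8 bytes, hence float64.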
arr1 = np.array([1,2,3],dtype = np.float64)
arr2 = np.array([1,2,3],dtype = np.int32)
arr1.dtype
arr2.dtype
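The link between the dtype name and memory use can be checked through the itemsize and nbytes attributes. A small sketch, reusing the same values as above under illustrative names:

```python
import numpy as np

as_float64 = np.array([1, 2, 3], dtype=np.float64)
as_int32 = np.array([1, 2, 3], dtype=np.int32)

# itemsize is the number of bytes per element; nbytes is the total for the array
print(as_float64.itemsize, as_float64.nbytes)  # 8 bytes per element, 24 in total
print(as_int32.itemsize, as_int32.nbytes)      # 4 bytes per element, 12 in total
```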
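Casting dtypes between different arrays
Ways to set a dtype:
1. Inferred by default at creation
2. Given explicitly at creation
3. arr.astype(dtype), where dtype is given directly or taken from another array (arr2.dtype)
astype always creates a new array, whether or not the dtype actually changes.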
# 1. dtype inferred by default at creation
arr = np.arange(1,6)
# 2. dtype given at creation (np.bytes_ is the current name of np.string_)
numeric_strings = np.array(['1.25','-9.6','42'],dtype = np.bytes_)
# 3. changing the dtype
float_arr = arr.astype(np.float64) # cast int64 to float64
numeric_strings.astype(float)
# if a cast fails (e.g. a string that cannot be parsed as a float), a ValueError is raised
# NumPy is smart enough to alias Python types to equivalent dtypes
# arr2.dtype
arr1 = np.arange(10)
arr2 = randn(2,3)
arr1.astype(arr2.dtype),arr1.dtype
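That astype returns a new array even when the dtype is unchanged can be verified directly. A minimal sketch (names are illustrative):

```python
import numpy as np

ints = np.arange(1, 6)
same_dtype = ints.astype(ints.dtype)   # dtype unchanged, but still a new array

same_dtype[0] = 100
print(ints[0])                             # the original is untouched
print(np.shares_memory(ints, same_dtype))  # the two arrays use separate storage
```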
Operations between Arrays and Scalars
As in R and MATLAB,
the operators *, +, -, / all act element-wise.
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr
# operations between two arrays
arr + arr
arr - arr
arr * arr
arr / arr
# operations between a scalar and an array
1 / arr
arr ** 0.5
Basic Indexing and Slicing
One dimension
Array slices are views on the original array,
and any modifications to the view will be reflected in the source array.
arr = np.arange(10)
arr
arr[5]
arr[5:8]
arr[5:8] = 12
arr
arr_slice = arr[5:8]
arr_slice[1] = 12345
arr
arr_slice[:] = 64
arr
To get an independent copy of a slice rather than a view, call .copy() explicitly
arr[5:8].copy()
arr_slice_copy = arr[5:8].copy()
arr_slice_copy[1] = 1
arr_slice_copy
arr
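Whether two arrays share storage can be checked with np.shares_memory, which makes the view/copy distinction above explicit. A short sketch (the names `view` and `copy` are illustrative):

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]           # basic slicing returns a view
copy = arr[5:8].copy()    # .copy() allocates independent storage

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False

view[:] = -1              # writes through to arr
copy[:] = 7               # leaves arr alone
print(arr)
```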
Higher Dimension
In higher dimensions, the element at each index is no longer a scalar but a lower-dimensional array
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2d[2]
arr2d[0][2],arr2d[0,2]
arr3d = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
arr3d
arr3d.shape
arr3d[0]
arr3d[0] = 42
arr3d[1, 0]
Indexing with slices
Slices of higher-dimensional arrays are also views of the original array
arr[1:6]
arr2d
# a single slice selects rows
arr2d[:2]
# two slices select rows and columns respectively
arr2d[:2, 1:]
arr2d[1, :2]
Boolean Indexing
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = randn(7, 4)
names
data
names == 'Bob'
data[names == 'Bob']
data[names == 'Bob', 2:]
data[names == 'Bob', 3]
mask = (names == 'Bob') | (names == 'Will')
# the Python keywords and / or do not work with boolean arrays; use & and | instead
mask
data[mask]
data[data<0] = 0
data
data[names!='Joe'] = 7
data
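The element-wise operators for combining masks, and the fact that boolean indexing returns a copy rather than a view, can be sketched as follows. The integer `data` array is a deterministic stand-in for randn(7, 4), chosen only so the result is easy to read:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape(7, 4)   # deterministic stand-in for randn(7, 4)

bob_or_will = (names == 'Bob') | (names == 'Will')   # element-wise OR
not_joe = ~(names == 'Joe')                          # element-wise NOT

# Same mask here, since the array contains only three distinct names
print(np.array_equal(bob_or_will, not_joe))

selected = data[bob_or_will]   # boolean indexing returns a copy, not a view
selected[:] = 0
print(data[0])                 # still [0 1 2 3]
```

Assigning through the mask on the left-hand side, as in data[mask] = 0, does modify the original; only the extracted result is a copy.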
Fancy Indexing
Indexing using integer arrays
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr
arr[[4, 3, 0, 6]]
arr[[-3,-5,-7]]
arr = np.arange(32).reshape((8, 4))
arr
arr[[1, 5, 7, 2], [0, 3, 1, 2]]
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]
arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
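The three indexing forms above do not all select the same thing: passing two integer lists picks individual (row, column) pairs, while np.ix_ and the chained form pick a rectangular block. A sketch that makes the difference visible (the names `rows`, `cols`, `pairs`, `block`, `chained` are illustrative):

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

pairs = arr[rows, cols]             # elements (1,0), (5,3), (7,1), (2,2)
block = arr[np.ix_(rows, cols)]     # every selected row crossed with every selected column
chained = arr[rows][:, cols]        # same block, built in two indexing steps

print(pairs)         # [ 4 23 29 10]
print(block.shape)   # (4, 4)
print(np.array_equal(block, chained))
```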
Transposing arrays and swapping axes
arr = np.arange(15).reshape((3, 5))
arr
arr.T
arr = np.random.randn(6, 3)
np.dot(arr.T, arr)
transpose() and swapaxes() are not needed for now
Universal Functions: Element-wise Array Functions
Universal functions (ufuncs) are fast functions that operate element-wise on arrays
arr = np.arange(10)
np.sqrt(arr)
np.exp(arr)
Ufuncs that take more than one array
x = randn(8)
y = randn(8)
x
y
np.maximum(x, y) # element-wise maximum
Ufuncs that return more than one array
arr = randn(7) * 5
np.modf(arr)
Unary functions
Function | Description |
---|---|
abs, fabs | Compute the absolute value element-wise for integer, floating point, or complex values. Use fabs as a faster alternative for non-complex-valued data |
sqrt | Compute the square root of each element. Equivalent to arr ** 0.5 |
square | Compute the square of each element. Equivalent to arr ** 2 |
exp | Compute the exponential e^x of each element |
log, log10, log2, log1p | Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively |
sign | Compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative) |
ceil | Compute the ceiling of each element, i.e. the smallest integer greater than or equal to each element |
floor | Compute the floor of each element, i.e. the largest integer less than or equal to each element |
rint | Round elements to the nearest integer, preserving the dtype |
modf | Return fractional and integral parts of array as separate arrays |
isnan | Return boolean array indicating whether each value is NaN (Not a Number) |
isfinite, isinf | Return boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite, respectively |
cos, cosh, sin, sinh, tan, tanh | Regular and hyperbolic trigonometric functions |
arccos, arccosh, arcsin, arcsinh, arctan, arctanh | Inverse trigonometric functions |
logical_not | Compute truth value of not x element-wise. Equivalent to ~arr for boolean arrays. |
Binary functions
Function | Description |
---|---|
add | Add corresponding elements in arrays |
subtract | Subtract elements in second array from first array |
multiply | Multiply array elements |
divide, floor_divide | Divide or floor divide (truncating the remainder) |
power | Raise elements in first array to powers indicated in second array |
maximum, fmax | Element-wise maximum. fmax ignores NaN |
minimum, fmin | Element-wise minimum. fmin ignores NaN |
mod | Element-wise modulus (remainder of division) |
copysign | Copy sign of values in second argument to values in first argument |
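The difference between maximum and fmax (and likewise minimum and fmin) only shows up when NaN is present. A small sketch with hand-picked values:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 5.0, np.nan])

print(np.maximum(x, y))   # NaN propagates: [ 2. nan nan]
print(np.fmax(x, y))      # NaN is ignored where the other value is valid: [2. 5. 3.]
```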
Data processing using arrays
Vectorization replaces explicit loops with array expressions, which is generally much faster
Expressing conditional logic as array operations
- pure python
result = [xv if cv else yv for xv, yv, cv in zip(x, y, c)]
- numpy
result = np.where(c, x, y)
arr = randn(4, 4)
arr
np.where(arr > 0, 2, -2)
np.where(arr > 0, 2, arr) # set only positive values to 2
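The notes above leave x, y and c undefined. A self-contained sketch with made-up values (xarr, yarr and cond are illustrative names) shows that the list comprehension and np.where make the same selection:

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Pure Python: one element at a time
looped = [x if c else y for x, y, c in zip(xarr, yarr, cond)]

# NumPy: the same selection as a single array expression
vectorized = np.where(cond, xarr, yarr)

print(vectorized)                          # [1.1 2.2 1.3 1.4 2.5]
print(np.array_equal(looped, vectorized))
```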
Mathematical and statistical methods
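mean
arr = np.random.randn(5, 4) # normally-distributed data
arr.mean()
np.mean(arr)
arr.sum()
Along an axis: axis=0 aggregates down the rows (one result per column), axis=1 aggregates across the columns (one result per row)
arr.mean(axis=1)
arr.sum(0)
cumsum, cumprod
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr.cumsum(0)
arr.cumprod(1)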
Method | Description |
---|---|
sum | Sum of all the elements in the array or along an axis. Zero-length arrays have sum 0. |
mean | Arithmetic mean. Zero-length arrays have NaN mean. |
std, var | Standard deviation and variance, respectively, with optional degrees of freedom adjustment (default denominator n). |
min, max | Minimum and maximum. |
argmin, argmax | Indices of minimum and maximum elements, respectively. |
cumsum | Cumulative sum of elements starting from 0 |
cumprod | Cumulative product of elements starting from 1 |
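Two points in the table are easy to mix up: which direction an axis argument aggregates over, and the denominator used by var and std. A short sketch on small hand-checkable inputs (`values` is an illustrative name):

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

print(arr.sum(axis=0))     # one value per column: [ 9 12 15]
print(arr.sum(axis=1))     # one value per row:    [ 3 12 21]
print(arr.cumsum(axis=0))  # running totals down each column

values = np.array([1.0, 2.0, 3.0, 4.0])
print(values.var())        # denominator n:     1.25
print(values.var(ddof=1))  # denominator n - 1: about 1.667
```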
Methods for boolean arrays
Counting positive values
arr = randn(100)
(arr > 0).sum() # Number of positive values
- any tests whether at least one value is True; all tests whether every value is True
bools = np.array([False, False, True, False])
bools.any()
bools.all()
Sorting
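arr.sort() sorts the array in place
arr = randn(8)
arr
arr.sort()
arr
arr.sort(1) sorts a 2D array along axis 1, i.e. within each row
arr = randn(5, 3)
arr.sort(1)
np.sort() returns a sorted copy and leaves the input unchanged
np.sort(arr)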
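The in-place method and the top-level function differ in what they return, which is a common source of bugs. A minimal sketch (names are illustrative):

```python
import numpy as np

arr = np.array([3, 1, 2])

sorted_copy = np.sort(arr)   # returns a new sorted array
print(arr)                   # input unchanged: [3 1 2]

result = arr.sort()          # sorts in place and returns None
print(arr, result)
```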
Unique and other set logic
np.unique(arr)
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)
np.in1d(arr1,arr2)
values = np.array([6, 0, 0, 3, 2, 5, 6])
np.in1d(values, [2, 3, 6])
Method | Description |
---|---|
unique(x) | Compute the sorted, unique elements in x |
intersect1d(x, y) | Compute the sorted, common elements in x and y |
union1d(x, y) | Compute the sorted union of elements |
in1d(x, y) | Compute a boolean array indicating whether each element of x is contained in y |
setdiff1d(x, y) | Set difference, elements in x that are not in y |
setxor1d(x, y) | Set symmetric differences; elements that are in either of the arrays, but not both |
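The remaining set functions in the table can be tried on two small arrays. This is only a sketch with made-up inputs:

```python
import numpy as np

x = np.array([6, 0, 0, 3, 2, 5, 6])
y = np.array([2, 3, 6, 7])

print(np.unique(x))          # [0 2 3 5 6]
print(np.intersect1d(x, y))  # [2 3 6]
print(np.union1d(x, y))      # [0 2 3 5 6 7]
print(np.setdiff1d(x, y))    # [0 5]
print(np.setxor1d(x, y))     # [0 5 7]
```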
File input and output with arrays
Storing arrays on disk in binary format
arr = np.arange(10)
np.save('some_array', arr)
np.load('some_array.npy')
np.savez('array_archive.npz', a=arr, b=arr)
arch = np.load('array_archive.npz')
arch['b'] #dict-like
Saving and loading text files
In practice, read_csv and read_table from pandas are used more often
arr = np.loadtxt('array_ex.txt', delimiter=',')
arr
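The file array_ex.txt is not included with these notes, so the call above only works if it exists locally. A round trip can be sketched with np.savetxt, which is not used in the notes themselves; the temporary path below is purely illustrative:

```python
import os
import tempfile

import numpy as np

table = np.array([[1.5, 2.0, 3.25], [4.0, 5.5, 6.0]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'array_ex.txt')   # temporary file, for illustration only
    np.savetxt(path, table, delimiter=',')         # write as comma-separated text
    loaded = np.loadtxt(path, delimiter=',')       # read it back as float64

print(np.allclose(table, loaded))
```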
Linear algebra
from numpy.linalg import inv, qr
1. Matrix multiplication (A %*% B in R)
```python
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x
y
x.dot(y) # equivalently np.dot(x, y)
```
2. Matrix inverse and QR decomposition
```python
from numpy.linalg import inv, qr
X = randn(5, 5)
mat = X.T.dot(X)
inv(mat)
mat.dot(inv(mat))
q, r = qr(mat)
r
```
Function | Description |
---|---|
diag | Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal |
dot | Matrix multiplication |
trace | Compute the sum of the diagonal elements |
det | Compute the matrix determinant |
eig | Compute the eigenvalues and eigenvectors of a square matrix |
inv | Compute the inverse of a square matrix |
pinv | Compute the Moore-Penrose pseudo-inverse of a matrix |
qr | Compute the QR decomposition |
svd | Compute the singular value decomposition (SVD) |
solve | Solve the linear system Ax = b for x, where A is a square matrix |
lstsq | Compute the least-squares solution to y = Xb |
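solve and lstsq from the table are not exercised in the notes above. A minimal sketch on small systems whose answers can be checked by hand (all values are made up for illustration):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)          # solves A x = b without forming inv(A)
print(x)                           # approximately [2. 3.]
print(np.allclose(A.dot(x), b))

# Least squares: fit y = X beta for an overdetermined system
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])      # exactly y = 1 + 2 * t
beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(beta)                        # approximately [1. 2.]
```

For a square, well-conditioned system, solve is usually preferable to computing inv(A) and multiplying.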
Random number generation
samples = np.random.normal(size=(4, 4))
samples
from random import normalvariate
N = 1000000
%timeit samples = [normalvariate(0, 1) for _ in range(N)]
%timeit np.random.normal(size=N)
Function | Description |
---|---|
seed | Seed the random number generator |
permutation | Return a random permutation of a sequence, or return a permuted range |
shuffle | Randomly permute a sequence in place |
rand | Draw samples from a uniform distribution |
randint | Draw random integers from a given low-to-high range |
randn | Draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface) |
binomial | Draw samples from a binomial distribution |
normal | Draw samples from a normal (Gaussian) distribution |
beta | Draw samples from a beta distribution |
chisquare | Draw samples from a chi-square distribution |
gamma | Draw samples from a gamma distribution |
uniform | Draw samples from a uniform [0, 1) distribution |
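The effect of seed can be shown by drawing the same numbers twice. A short sketch using the legacy np.random functions listed in the table (the seed value is arbitrary):

```python
import numpy as np

np.random.seed(12345)
first = np.random.randn(3)

np.random.seed(12345)              # resetting the seed restarts the same sequence
second = np.random.randn(3)

print(np.array_equal(first, second))

perm = np.random.permutation(5)    # a shuffled copy of 0..4
print(sorted(perm.tolist()))       # always [0, 1, 2, 3, 4], whatever the order
```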
Example: Random Walks
pure python
import random
position = 0
walk = [position]
steps = 1000
for i in range(steps):
    step = 1 if random.randint(0, 1) else -1
    position += step
    walk.append(position)
numpy
np.random.seed(12345)
nsteps = 1000
draws = np.random.randint(0, 2, size=nsteps)
steps = np.where(draws > 0, 1, -1)
walk = steps.cumsum()
A first look at the random walk
walk.min()
walk.max()
Find the first step at which the walk reaches 10 or -10
(np.abs(walk)>=10).argmax()
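argmax on a boolean array returns the index of the first True, but it also returns 0 when there is no True at all, so the result is only meaningful if the threshold is actually reached. A sketch on a short hand-written walk:

```python
import numpy as np

walk = np.array([1, 2, 1, 0, -1, -2, -3, -2])

reached = np.abs(walk) >= 3
print(reached.argmax())      # 6: index of the first True

never = np.abs(walk) >= 10
print(never.argmax())        # 0, even though the threshold is never reached
print(never.any())           # False: check this before trusting argmax
```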
Simulating many random walks at once
nwalks = 5000
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps)) # 0 or 1
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1) # cumulative sum along each row
walks
A first look at the random walks
walks.max()
walks.min()
hits30 = (np.abs(walks) >= 30).any(1)
hits30
hits30.sum() # Number that hit 30 or -30
crossing_times = (np.abs(walks[hits30]) >= 30).argmax(1)
crossing_times.mean()
Random walk with normally distributed steps
steps = np.random.normal(loc=0, scale=0.25,
                         size=(nwalks, nsteps))