[Python for Data Analysis] CH04 NumPy Basics -- Arrays and Vectorized Computation

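This post gives a broad introduction to the NumPy library, from the basics to more advanced uses: array operations, mathematical functions, linear algebra, random number generation, Fourier transforms, and more. It explains how to work with NumPy arrays, gives examples of commonly used functions, and uses code samples to show how to process and analyze data efficiently.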

NumPy Basics: Arrays and Vectorized Computation

NumPy, short for Numerical Python, is the fundamental package for high-performance
scientific computing and data analysis. It provides:

  • ndarray, a fast and space-efficient multidimensional array
  • Mathematical functions for fast operations on entire arrays of data without having to write loops
  • Tools for reading data from disk
  • Linear algebra, random number generation, and Fourier transform capabilities
  • Tools for integrating code written in C, C++, and Fortran

Basic setup

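# __future__ imports must come first; this one only matters in Python 2 (true division)
from __future__ import division
# IPython/Jupyter magic; leave this line out in a plain Python script
%matplotlib inline
from numpy.random import randn
import numpy as np
np.set_printoptions(precision=4, suppress=True)  # print floats with 4 decimals in fixed-point notation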

NumPy ndarray: A Multidimensional Array Object

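Basic usage

data = randn(2, 3)  # 2 x 3 array of standard normal draws
data * 10           # every element multiplied by 10
data + data         # element-wise addition
data.shape          # (2, 3)
data.dtype          # dtype('float64')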

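Creating ndarrays

  1. array
    Accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data.

  2. zeros and ones
    zeros and ones create arrays of the given shape filled with 0s and 1s, respectively.

  3. empty
    empty creates an array without initializing its values to any particular value.

  4. arange
    arange is an array-valued version of Python's built-in range.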

#array
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
# nested sequences become a multidimensional (here 2-D) array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)

#zeros, ones
a1 = np.zeros(10)
a2 = np.ones((2,3))

#empty
np.empty(10)  # contents are whatever was already in memory; do not assume zeros

#arange
np.arange(15)

| Function | Description |
| --- | --- |
| array | Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype. Copies the input data by default. |
| asarray | Convert input to ndarray, but do not copy if the input is already an ndarray |
| arange | Like the built-in range but returns an ndarray instead of a list |
| ones, ones_like | Produce an array of all 1s with the given shape and dtype. ones_like takes another array and produces a ones array of the same shape and dtype. |
| zeros, zeros_like | Like ones and ones_like but producing arrays of 0s instead |
| empty, empty_like | Create new arrays by allocating new memory, but do not populate with any values like ones and zeros |
| eye, identity | Create a square N x N identity matrix (1s on the diagonal and 0s elsewhere) |
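
The copy semantics in the table (array copies, asarray does not) can be checked directly. This is a minimal sketch, and the variable names are only illustrative:

```python
import numpy as np

a = np.array([1, 2, 3])   # new ndarray, integer dtype inferred from the list
b = np.array(a)           # np.array copies its input by default
c = np.asarray(a)         # asarray hands back the same object if it is already an ndarray

b[0] = 100                # changing the copy leaves `a` untouched
print(a, b, c is a)

z = np.zeros_like(a, dtype=np.float64)  # same shape as `a`, float zeros
I = np.eye(3)                           # 3 x 3 identity matrix
print(z, I.diagonal())
```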

Data Types for ndarrays

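The dtype mainly determines how much memory each element takes up. The number in the dtype name is the number of bits per element. A double-precision float (Python's float) takes 8 bytes, hence float64.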

arr1 = np.array([1,2,3],dtype = np.float64)
arr2 = np.array([1,2,3],dtype = np.int32)
arr1.dtype
arr2.dtype
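
To make the memory point concrete, itemsize reports the bytes per element and nbytes reports the total for an array. This small sketch uses arbitrary example sizes:

```python
import numpy as np

# The number in the dtype name is the size of one element in bits.
f64_bytes = np.dtype(np.float64).itemsize   # 64 bits = 8 bytes
i32_bytes = np.dtype(np.int32).itemsize     # 32 bits = 4 bytes

arr = np.ones(1000, dtype=np.float64)
total = arr.nbytes                          # 1000 elements * 8 bytes
half = arr.astype(np.float32).nbytes        # float32 needs half the memory
print(f64_bytes, i32_bytes, total, half)
```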
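Casting between dtypes

An array gets its dtype in one of three ways:
1. inferred from the data when the array is created;
2. specified explicitly at creation, e.g. dtype=np.float64;
3. converted afterwards with arr.astype(dtype), where the dtype can also come from another array, e.g. arr.astype(arr2.dtype).

astype always creates a new array (a copy of the data), even if the new dtype is the same as the old one.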


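# 1. dtype inferred at creation (an integer dtype here)
arr = np.arange(1, 6)
# 2. dtype given explicitly at creation
# (np.bytes_ is the fixed-width byte-string type; the older alias np.string_ was removed in NumPy 2.0)
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
# 3. convert with astype
float_arr = arr.astype(np.float64)  # cast integers to float64
numeric_strings.astype(float)
# if a cast fails (e.g. a string that cannot be parsed as a number), a ValueError is raised
# NumPy aliases Python types such as float to equivalent dtypes (float -> float64)

# use another array's dtype
arr1 = np.arange(10)
arr2 = randn(2, 3)
arr1.astype(arr2.dtype), arr1.dtype  # a new float64 array; arr1 keeps its integer dtype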
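
This short sketch illustrates the points above. astype copies even when the dtype does not change, Python's float maps to float64, and an unparseable string makes the cast fail with a ValueError. The sample strings are made up for illustration:

```python
import numpy as np

arr = np.arange(5)
same = arr.astype(arr.dtype)    # same dtype, but still a brand-new array
same[0] = 99                    # so the original is not affected
print(arr[0], same is arr)

strings = np.array(['1.25', '-9.6', '42'])
floats = strings.astype(float)  # Python float is aliased to np.float64
print(floats.dtype)

error_message = None
try:
    np.array(['1.5', 'abc']).astype(float)
except ValueError as exc:       # 'abc' cannot be parsed as a number
    error_message = str(exc)
print('cast failed:', error_message)
```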

Operations between Arrays and Scalars

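As in R and MATLAB, the arithmetic operators *, +, -, and / are applied element-wise between arrays of the same shape.
Arithmetic with a scalar applies the scalar to every element.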

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr
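# array-array arithmetic: applied element by element
arr + arr
arr - arr
arr * arr
arr / arr
# arithmetic with a scalar: the scalar is applied to every element
1 / arr
arr ** 0.5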

Basic Indexing and Slicing

One dimension

Array slices are views on the original array,
and any modifications to the view will be reflected in the source array.

arr = np.arange(10)
arr
arr[5]
arr[5:8]
arr[5:8] = 12
arr
arr_slice = arr[5:8]
arr_slice[1] = 12345
arr
arr_slice[:] = 64
arr
To get a copy of a slice instead of a view, call .copy() explicitly:
arr[5:8].copy()
arr_slice_copy = arr[5:8].copy()
arr_slice_copy[1] = 1
arr_slice_copy
arr
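
The view-versus-copy behavior can be verified with np.shares_memory, which reports whether two arrays overlap in memory. A minimal sketch:

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]          # basic slice: a view on arr's data
copy = arr[5:8].copy()   # explicit copy: independent data

view[:] = -1             # writes through to arr
copy[:] = 100            # does not touch arr

print(arr)
print(np.shares_memory(arr, view), np.shares_memory(arr, copy))
```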
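Higher dimensions

In a two-dimensional array, the element at each index is no longer a scalar but a one-dimensional array. Individual elements can be accessed either recursively, arr2d[0][2], or with a comma-separated tuple of indices, arr2d[0, 2].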

arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2d[2]
arr2d[0][2],arr2d[0,2]

arr3d = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
arr3d
arr3d.shape
arr3d[0]
arr3d[0] = 42  # a scalar assigned to arr3d[0] fills the whole 2 x 3 sub-array
arr3d[1, 0]
Indexing with slices

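Slicing a multidimensional array also returns a view of the original array.

arr[1:6]
arr2d
# a single slice selects along axis 0 (rows)
arr2d[:2]
# two slices select rows and columns, respectively
arr2d[:2, 1:]
arr2d[1, :2]  # mixing an integer index and a slice gives a lower-dimensional slice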

Boolean Indexing

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = randn(7, 4)
names
data
names == 'Bob'
data[names == 'Bob'] 
data[names == 'Bob', 2:]
data[names == 'Bob', 3]

mask = (names == 'Bob') | (names == 'Will')
# the Python keywords `and` and `or` do not work with boolean arrays; use & and | instead
mask
data[mask]

data[data<0] = 0
data
data[names!='Joe'] = 7
data
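
This sketch shows two details worth knowing, using small illustrative arrays. First, selecting with a mask returns a copy, while assigning through a mask writes into the original array. Second, the Python keywords and/or raise an error on arrays:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob'])
data = np.arange(8).reshape(4, 2)

mask = (names == 'Bob') | (names == 'Will')  # combine conditions with | and &
selected = data[mask]      # boolean indexing returns a copy
selected[:] = -1           # so this leaves `data` unchanged

data[names == 'Joe'] = 0   # assignment through a mask does modify `data`
print(selected, data, sep='\n')

keyword_failed = False
try:
    (names == 'Bob') or (names == 'Will')    # truth value of an array is ambiguous
except ValueError:
    keyword_failed = True
```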

Fancy Indexing

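Fancy indexing means indexing with integer arrays. Unlike slicing, it always copies the data into a new array.

arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr
arr[[4, 3, 0, 6]]   # select rows in a particular order
arr[[-3, -5, -7]]   # negative indices count from the end
arr = np.arange(32).reshape((8, 4))
arr
arr[[1, 5, 7, 2], [0, 3, 1, 2]]          # one element per (row, column) pair: (1, 0), (5, 3), (7, 1), (2, 2)
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]       # rectangular region: the selected rows, with columns reordered
arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]  # same region, built with np.ix_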
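
The difference between passing two index lists and using np.ix_ is easiest to see on a small array. This minimal sketch reuses the 8 x 4 array from above:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows, cols = [1, 5, 7, 2], [0, 3, 1, 2]

pairs = arr[rows, cols]           # one element per (row, col) pair -> 1-D result
block = arr[np.ix_(rows, cols)]   # 4 x 4 grid: every selected row x every selected column

print(pairs)
print(block)
```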

Transposing arrays and swapping axes

arr = np.arange(15).reshape((3, 5))
arr
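arr.T               # transpose: a view with the axes reversed, no data is copied
arr = np.random.randn(6, 3)
np.dot(arr.T, arr)  # inner matrix product X^T X, a 3 x 3 matrix

The more general transpose() and swapaxes() methods are not needed for now.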

Universal Functions: Element-wise Array Functions

A universal function, or ufunc, performs fast element-wise operations on the data in ndarrays.

arr = np.arange(10)
np.sqrt(arr)
np.exp(arr)

Binary ufuncs take two arrays:

x = randn(8)
y = randn(8)
x
y
np.maximum(x, y) # element-wise maximum

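A few ufuncs return multiple arrays. np.modf, for example, returns the fractional and integral parts as two separate arrays: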

arr = randn(7) * 5
np.modf(arr)
Unary functions

| Function | Description |
| --- | --- |
| abs, fabs | Compute the absolute value element-wise for integer, floating-point, or complex values. Use fabs as a faster alternative for non-complex-valued data |
| sqrt | Compute the square root of each element. Equivalent to arr ** 0.5 |
| square | Compute the square of each element. Equivalent to arr ** 2 |
| exp | Compute the exponent e^x of each element |
| log, log10, log2, log1p | Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively |
| sign | Compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative) |
| ceil | Compute the ceiling of each element, i.e. the smallest integer greater than or equal to each element |
| floor | Compute the floor of each element, i.e. the largest integer less than or equal to each element |
| rint | Round elements to the nearest integer, preserving the dtype |
| modf | Return the fractional and integral parts of an array as separate arrays |
| isnan | Return a boolean array indicating whether each value is NaN (Not a Number) |
| isfinite, isinf | Return a boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite, respectively |
| cos, cosh, sin, sinh, tan, tanh | Regular and hyperbolic trigonometric functions |
| arccos, arccosh, arcsin, arcsinh, arctan, arctanh | Inverse trigonometric functions |
| logical_not | Compute the truth value of not x element-wise. Equivalent to ~arr for boolean arrays |


Binary functions

| Function | Description |
| --- | --- |
| add | Add corresponding elements in arrays |
| subtract | Subtract elements in the second array from the first array |
| multiply | Multiply array elements |
| divide, floor_divide | Divide, or floor divide (round the quotient down, discarding the remainder) |
| power | Raise elements in the first array to the powers indicated in the second array |
| maximum, fmax | Element-wise maximum. fmax ignores NaN |
| minimum, fmin | Element-wise minimum. fmin ignores NaN |
| mod | Element-wise modulus (remainder of division) |
| copysign | Copy the sign of values in the second argument to values in the first argument |
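
This small sketch, with made-up values, illustrates the NaN note in the table and the boolean negation operator:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 0.0, np.nan])

with_nan = np.maximum(x, y)   # NaN propagates: any comparison with NaN yields NaN
ignoring = np.fmax(x, y)      # NaN is ignored when the other value is a number
print(with_nan, ignoring)

mask = np.array([True, False, True])
print(np.logical_not(mask), ~mask)   # same result for boolean arrays
```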

Data processing using arrays

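Vectorization means replacing explicit loops with array expressions. Vectorized array operations are usually much faster than their pure Python equivalents.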

Expressing conditional logic as array operations

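  • pure Python (a list comprehension that loops in the interpreter)
    result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
  • NumPy: np.where(cond, xarr, yarr) takes values from xarr where cond is True and from yarr elsewhere. The second and third arguments can also be scalars

    result = np.where(cond, xarr, yarr)
    arr = randn(4, 4)
    arr
    np.where(arr > 0, 2, -2)   # replace positive values with 2 and all others with -2
    np.where(arr > 0, 2, arr)  # set only positive values to 2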
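
Here is a side-by-side sketch of the two approaches. It assumes small illustrative arrays named xarr, yarr, and cond:

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Pure Python: loops element by element in the interpreter
result_py = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]

# NumPy: one vectorized call, xarr where cond is True, yarr elsewhere
result_np = np.where(cond, xarr, yarr)
print(result_np)
```

For large arrays the list comprehension is slow, and it only works on one-dimensional input, whereas np.where works on arrays of any shape.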
    

Mathematical and statistical methods

  • mean

    arr = np.random.randn(5, 4) # normally-distributed data
    arr.mean()
    np.mean(arr)
    arr.sum()
    
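  • Along an axis: axis=0 aggregates down the rows (one result per column), and axis=1 aggregates across the columns (one result per row)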

    arr.mean(axis=1)
    arr.sum(0)
    
  • cumsum, cumprod

    arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    arr.cumsum(0)
    arr.cumprod(1)
    
| Method | Description |
| --- | --- |
| sum | Sum of all the elements in the array or along an axis. Zero-length arrays have sum 0. |
| mean | Arithmetic mean. Zero-length arrays have NaN mean. |
| std, var | Standard deviation and variance, respectively, with optional degrees of freedom adjustment (default denominator n) |
| min, max | Minimum and maximum |
| argmin, argmax | Indices of minimum and maximum elements, respectively |
| cumsum | Cumulative sum of elements starting from 0 |
| cumprod | Cumulative product of elements starting from 1 |
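
The axis convention is easy to mix up, so this sketch applies it to the same 3 x 3 array used earlier:

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

col_sums = arr.sum(axis=0)      # collapse axis 0: one result per column
row_means = arr.mean(axis=1)    # collapse axis 1: one result per row
running = arr.cumsum(axis=0)    # running totals down each column
products = arr.cumprod(axis=1)  # running products along each row
largest = arr.argmax()          # index into the flattened array

print(col_sums, row_means, running, products, largest, sep='\n')
```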

Methods for boolean arrays

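  • Counting positive values: True is treated as 1 and False as 0, so sum() counts the True entries

    arr = randn(100)
    (arr > 0).sum() # Number of positive values
    
  • any() tests whether at least one value is True, and all() tests whether every value is True

    bools = np.array([False, False, True, False])
    bools.any()
    bools.all()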
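
The mean of a boolean array is the fraction of True values, so the same trick gives proportions as well as counts. A small sketch with fixed values:

```python
import numpy as np

arr = np.array([-1.5, 0.0, 2.0, 3.5, -0.5])
positive = arr > 0

count = positive.sum()      # True counts as 1, False as 0
fraction = positive.mean()  # share of positive values
print(count, fraction, positive.any(), positive.all())
```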

Sorting

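  • arr.sort() sorts the array in place

    arr = randn(8)
    arr
    arr.sort()
    arr
    
  • arr.sort(1) sorts each row of a multidimensional array in place (the array needs at least two dimensions)

    arr = randn(5, 3)
    arr.sort(1)
    
  • np.sort() returns a sorted copy and leaves the input unchanged

    np.sort(arr)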
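
This minimal sketch contrasts the in-place method with the copying function on a small 2-D array:

```python
import numpy as np

arr = np.array([[3, 1, 2], [9, 7, 8]])

sorted_copy = np.sort(arr, axis=1)  # top-level np.sort returns a new sorted array
unchanged = arr.tolist()            # arr itself is still unsorted at this point

ret = arr.sort(axis=1)              # the method sorts each row in place and returns None
print(sorted_copy, arr, ret, sep='\n')
```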
    

Unique and other set logic

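  • np.unique(arr) returns the sorted unique values

    names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
    np.unique(names)
    ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
    np.unique(ints)
    
  • np.isin(arr1, arr2) tests whether each element of arr1 is in arr2 (older code uses np.in1d, which newer NumPy versions deprecate)

    values = np.array([6, 0, 0, 3, 2, 5, 6])
    np.isin(values, [2, 3, 6])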
    
| Method | Description |
| --- | --- |
| unique(x) | Compute the sorted, unique elements in x |
| intersect1d(x, y) | Compute the sorted, common elements in x and y |
| union1d(x, y) | Compute the sorted union of elements |
| in1d(x, y), isin(x, y) | Compute a boolean array indicating whether each element of x is contained in y (isin keeps the shape of x and is preferred in newer NumPy) |
| setdiff1d(x, y) | Set difference, elements in x that are not in y |
| setxor1d(x, y) | Set symmetric differences; elements that are in either of the arrays, but not both |
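
This sketch applies the set functions in the table to small illustrative arrays, using np.isin for the membership test:

```python
import numpy as np

x = np.array([6, 0, 0, 3, 2, 5, 6])
y = np.array([2, 3, 6, 7])

uniq = np.unique(x)          # sorted unique values of x
inter = np.intersect1d(x, y) # values in both
union = np.union1d(x, y)     # values in either
diff = np.setdiff1d(x, y)    # in x but not in y
xor = np.setxor1d(x, y)      # in exactly one of the two
member = np.isin(x, y)       # element-wise membership, same shape as x

print(uniq, inter, union, diff, xor, member, sep='\n')
```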

File input and output with arrays

Storing arrays on disk in binary format

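arr = np.arange(10)
np.save('some_array', arr)  # the .npy extension is appended automatically
np.load('some_array.npy')
np.savez('array_archive.npz', a=arr, b=arr)  # several arrays in one uncompressed archive
arch = np.load('array_archive.npz')
arch['b']  # dict-like: arrays are looked up by the keyword names passed to savez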
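
Below is a self-contained round trip of np.save/np.load and np.savez. It writes to a temporary directory so that no files are left in the working directory. The file names are illustrative:

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)
tmpdir = tempfile.mkdtemp()

path = os.path.join(tmpdir, 'some_array')
np.save(path, arr)                  # writes some_array.npy ('.npy' is appended)
loaded = np.load(path + '.npy')

archive_path = os.path.join(tmpdir, 'array_archive.npz')
np.savez(archive_path, a=arr, b=arr * 2)   # arrays stored under their keyword names
with np.load(archive_path) as arch:        # NpzFile is dict-like and closes on exit
    names = sorted(arch.files)
    b = arch['b']
```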

Saving and loading text files

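In practice, pandas' read_csv and read_table are more commonly used to load text data. np.loadtxt handles simple delimited numeric files.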

arr = np.loadtxt('array_ex.txt', delimiter=',')
arr
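
np.loadtxt has a counterpart for writing, np.savetxt. This minimal round-trip sketch uses an in-memory io.StringIO buffer instead of a real file such as array_ex.txt:

```python
import io

import numpy as np

arr = np.array([[0.58, 0.186, 0.316], [-0.9, 0.1, 2.5]])

buf = io.StringIO()
np.savetxt(buf, arr, delimiter=',')  # one row per line, comma-separated values
buf.seek(0)                          # rewind before reading back

loaded = np.loadtxt(buf, delimiter=',')
print(loaded)
```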

Linear algebra

1. Matrix multiplication (the equivalent of A %*% B in R)
```python
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x
y
x.dot(y) # equivalently np.dot(x, y)

```

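2. Matrix inverse and QR decomposition
```python
from numpy.linalg import inv, qr
X = randn(5, 5)
mat = X.T.dot(X)   # X^T X: a symmetric 5 x 5 matrix
inv(mat)
mat.dot(inv(mat))  # approximately the identity matrix (up to rounding error)
q, r = qr(mat)     # mat = q.dot(r), q orthogonal, r upper triangular
r
```
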
| Function | Description |
| --- | --- |
| diag | Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal |
| dot | Matrix multiplication |
| trace | Compute the sum of the diagonal elements |
| det | Compute the matrix determinant |
| eig | Compute the eigenvalues and eigenvectors of a square matrix |
| inv | Compute the inverse of a square matrix |
| pinv | Compute the Moore-Penrose pseudo-inverse of a matrix |
| qr | Compute the QR decomposition |
| svd | Compute the singular value decomposition (SVD) |
| solve | Solve the linear system Ax = b for x, where A is a square matrix |
| lstsq | Compute the least-squares solution to y = Xb |
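
This small sketch exercises solve, inv, qr, det, and trace on a hand-picked 2 x 2 system (3x + y = 9, x + 2y = 8), whose solution is x = 2, y = 3:

```python
import numpy as np
from numpy.linalg import det, inv, qr, solve

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = solve(A, b)                 # solves A x = b directly, without forming inv(A)
residual_ok = np.allclose(A @ x, b)
inverse_ok = np.allclose(A.dot(inv(A)), np.eye(2))

q, r = qr(A)                    # A = q r, q orthogonal, r upper triangular
qr_ok = np.allclose(q @ r, A) and np.allclose(np.triu(r), r)

print(x, residual_ok, inverse_ok, qr_ok, det(A), np.trace(A))
```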

Random number generation

samples = np.random.normal(size=(4, 4))
samples
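from random import normalvariate
N = 1000000
# pure Python: one draw at a time
%timeit samples = [normalvariate(0, 1) for _ in range(N)]
# NumPy: all N draws in a single vectorized call, typically much faster
%timeit np.random.normal(size=N)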

| Function | Description |
| --- | --- |
| seed | Seed the random number generator |
| permutation | Return a random permutation of a sequence, or return a permuted range |
| shuffle | Randomly permute a sequence in place |
| rand | Draw samples from a uniform distribution over [0, 1) |
| randint | Draw random integers from a given low-to-high range |
| randn | Draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface) |
| binomial | Draw samples from a binomial distribution |
| normal | Draw samples from a normal (Gaussian) distribution |
| beta | Draw samples from a beta distribution |
| chisquare | Draw samples from a chi-square distribution |
| gamma | Draw samples from a gamma distribution |
| uniform | Draw samples from a uniform [0, 1) distribution |
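
Seeding makes the draws reproducible. The sketch below also shows np.random.default_rng, the Generator-based interface added in NumPy 1.17. The NumPy documentation recommends it for new code over the global functions in the table:

```python
import numpy as np

np.random.seed(1234)
first = np.random.randn(3)
np.random.seed(1234)
second = np.random.randn(3)            # same seed -> identical draws

rng = np.random.default_rng(1234)      # independent Generator object
perm = rng.permutation(10)             # a shuffled copy of range(10)
ints = rng.integers(0, 2, size=1000)   # 0 or 1; the upper bound is exclusive
```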

Example: Random Walks

Pure Python

import random
position = 0
walk = [position]
steps = 1000
for i in range(steps):
    step = 1 if random.randint(0, 1) else -1  # random.randint includes both endpoints: 0 or 1
    position += step
    walk.append(position)

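NumPy: draw all 1,000 coin flips at once, map them to +1/-1, and take the cumulative sum

np.random.seed(12345)
nsteps = 1000
draws = np.random.randint(0, 2, size=nsteps)  # 0 or 1 (the upper bound is exclusive)
steps = np.where(draws > 0, 1, -1)
walk = steps.cumsum()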

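Exploring the walk:
walk.min()
walk.max()
Find the first crossing time, i.e. the first step at which the walk reaches 10 or -10. On a boolean array, argmax returns the index of the first True:

(np.abs(walk) >= 10).argmax()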
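
One caveat: argmax on an all-False array also returns 0, so a walk that never reaches 10 would wrongly report step 0. This sketch guards against that case (first_crossing is an illustrative name):

```python
import numpy as np

np.random.seed(12345)
nsteps = 1000
draws = np.random.randint(0, 2, size=nsteps)  # 0 or 1
walk = np.where(draws > 0, 1, -1).cumsum()

hit = np.abs(walk) >= 10
# argmax gives the index of the first True, but returns 0 when nothing is True,
# so only trust it if at least one step actually reached the threshold.
first_crossing = int(hit.argmax()) if hit.any() else None
print(first_crossing)

never = np.array([1, -1, 2]) >= 10   # all False
print(never.argmax())                # 0, even though no element reached 10
```

The many-walks example uses the same idea: it filters with hits30 before calling argmax(1).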

Simulating many random walks at once

nwalks = 5000
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps)) # 0 or 1
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1)  # cumulative sum along each row: one walk per row
walks

Exploring the walks:

walks.max()
walks.min()

hits30 = (np.abs(walks) >= 30).any(1)
hits30
hits30.sum() # Number that hit 30 or -30

crossing_times = (np.abs(walks[hits30]) >= 30).argmax(1)  # only walks that reached 30, so argmax is meaningful
crossing_times.mean()

Random walks with normally distributed steps

steps = np.random.normal(loc=0, scale=0.25,
                         size=(nwalks, nsteps))