NumPy for Python Machine Learning

Latest recommended article published on 2019-11-04 15:19:08

lisa丶

License: CC 4.0 BY-SA

Category: python. Tags: Numpy, python, machine learning

Article link: https://blog.youkuaiyun.com/weixin_42341986/article/details/96272933

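This article introduces the basics of the NumPy library, including array creation, indexing, slicing, and mathematical operations, and shows how to use NumPy for efficient scientific computing.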

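NumPy is an extension library for the Python language. It supports large multidimensional arrays and matrix operations, and it also provides a large library of mathematical functions for operating on arrays. Much of NumPy's internal work runs in compiled code that can release Python's GIL (global interpreter lock), so it is very efficient, and it serves as the foundation library for many machine learning frameworks.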

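A foundation package for high-performance scientific computing and data analysis
ndarray: a multidimensional array (matrix) with vectorized operations that is fast and memory-efficient
Matrix operations without explicit loops, similar to vectorized operations in Matlab
Linear algebra and random number generation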
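To make the "no explicit loops" point concrete, here is a minimal sketch that squares five numbers once with a Python loop and once with a single ndarray expression. The variable names are made up for illustration.

```python
import numpy as np

values = list(range(1, 6))          # plain Python list: [1, 2, 3, 4, 5]

# Loop version: square each element one at a time
squares_loop = [v * v for v in values]

# Vectorized version: one expression over the whole ndarray
arr = np.array(values)
squares_vec = arr * arr

print(squares_loop)   # [1, 4, 9, 16, 25]
print(squares_vec)    # [ 1  4  9 16 25]
```

Both versions give the same numbers; the vectorized form pushes the loop into NumPy's compiled code.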

The NumPy scientific computing library

import numpy
# genfromtxt reads data from a text file, splits each line on the delimiter, and converts the fields to the given dtype
world_alcohol = numpy.genfromtxt("world_alcohol.txt", delimiter=",", dtype=str)
print(type(world_alcohol))
print(world_alcohol)

Output:
<class 'numpy.ndarray'>
[['Year' 'WHO region' 'Country' 'Beverage Types' 'Display Value']
 ['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ...,
 ['1987' 'Africa' 'Malawi' 'Other' '0.75']
 ['1989' 'Americas' 'Bahamas' 'Wine' '1.5']
 ['1985' 'Africa' 'Malawi' 'Spirits' '0.31']]

Creating simple arrays with NumPy

import numpy
# Create a one-dimensional array
vector = numpy.array([5, 10, 15, 20])
# Create a two-dimensional array (a matrix)
matrix = numpy.array([[5,10,15],[20,25,30],[35,40,45]])
print(vector)
print(matrix)

Output:
[ 5 10 15 20]
[[ 5 10 15]
 [20 25 30]
 [35 40 45]]

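.shape: the structure of an array

print(vector.shape) # one-dimensional array with 4 elements
print(matrix.shape) # 3 rows, 3 columns

Output:
(4,)
(3, 3)

All elements of a NumPy array must have the same data type.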

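.dtype: checking the NumPy data type

number1 = numpy.array([5, 10, 15, 20])
# After changing 5 to 5.0, every element becomes float64
number2 = numpy.array([5.0, 10, 15, 20])
print(number1.dtype)
print(number2, number2.dtype)

Output (the default integer type depends on the platform; on many systems it is int64):
int32
[ 5. 10. 15. 20.] float64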
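The single-dtype rule also covers mixed input. The sketch below passes an explicit dtype for the integer case so the result does not depend on the platform, then shows what NumPy infers for mixed values. The variable names are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int64)   # explicit dtype avoids platform differences
mixed = np.array([1, 2.5, 3])                # int + float -> every element becomes float64
texty = np.array([1, "2", 3])                # numbers + str -> every element becomes a string

print(ints.dtype)         # int64
print(mixed.dtype)        # float64
print(texty.dtype.kind)   # U (a Unicode string dtype)
```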

Accessing a single element with NumPy

Given the printed data below, the two elements '0.5' and "Cte d'Ivoire" can be retrieved as follows:

[['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ['1985' 'Africa' "Cte d'Ivoire" 'Wine' '1.62']
 ...,
 ['1987' 'Africa' 'Malawi' 'Other' '0.75']
 ['1989' 'Americas' 'Bahamas' 'Wine' '1.5']
 ['1985' 'Africa' 'Malawi' 'Spirits' '0.31']]

# skip_header=1 skips the first line of the file (the header row)
world_alcohol = numpy.genfromtxt("world_alcohol.txt", delimiter=",", dtype=str, skip_header=1)
# Take the element at row index 1, column index 4 (indices start at 0)
uruguay_other_1986 = world_alcohol[1, 4]
# Take the element at row index 2, column index 2
third_country = world_alcohol[2, 2]
print(uruguay_other_1986)
print(third_country)

Output:
0.5
Cte d'Ivoire
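The same steps can be tried without the data file. This is a minimal sketch in which an in-memory `io.StringIO` object holds three lines of sample text standing in for world_alcohol.txt; the variable names `sample` and `rows` are made up for illustration.

```python
import io
import numpy as np

# Three-line sample standing in for world_alcohol.txt (assumed layout: header + data rows)
sample = io.StringIO(
    "Year,WHO region,Country,Beverage Types,Display Value\n"
    "1986,Western Pacific,Viet Nam,Wine,0\n"
    "1986,Americas,Uruguay,Other,0.5\n"
)

# skip_header=1 drops the header line, so row index 0 is the first data row
rows = np.genfromtxt(sample, delimiter=",", dtype=str, skip_header=1)

print(rows.shape)    # (2, 5)
print(rows[1, 4])    # 0.5
print(rows[0, 2])    # Viet Nam
```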

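NumPy slicing

# Slicing a one-dimensional array
vector = numpy.array([5,10,15,20])
print(vector[0:3]) # take indices 0 through 2; index 3 is excluded

Output:
[ 5 10 15]

# Slicing a two-dimensional array
matrix= numpy.array([
    [5,10,12],
    [20,25,30],
    [35,21,52]
])
# Take all elements of one column
print(matrix[:,1]) # the column at index 1 (the second column)

Output:
[10 25 21]

# Take two columns
print(matrix[:,0:2]) # the columns at indices 0 and 1

Output:
[[ 5 10]
 [20 25]
 [35 21]]

print(matrix[1:3,0:2]) # columns 0 and 1 of rows 1 and 2

Output:
[[20 25]
 [35 21]]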
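Slices also accept negative indices and a step. The following sketch uses a small 3x4 array chosen only for illustration.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

last_col = m[:, -1]          # last column
every_other_row = m[::2]     # rows 0 and 2
reversed_first = m[0, ::-1]  # first row with the column order reversed

print(last_col)          # [ 3  7 11]
print(every_other_row)
print(reversed_first)    # [3 2 1 0]
```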

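NumPy computation

Comparisons in NumPy

vector = numpy.array([5,10,15,20])
result = (vector == 10) # compare every element with 10; returns an array of booleans
print(result, result.dtype)

Output:
[False True False False] bool

print(vector[result]) # use the boolean array as an index to select the matching elements

Output:
[10]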

# A matrix can be compared in the same way
matrix= numpy.array([
    [5,10,12],
    [20,25,30],
    [35,21,52]
])
result1 = (matrix == 25)
print(result1, result1.dtype)

Output:
[[False False False]
 [False True False]
 [False False False]] bool

result2 = (matrix[:,1] == 25) # check which elements of the column at index 1 equal 25
print(result2)

Output:
[False True False]

result2 = (matrix[:,1] == 25) # use the comparison on the column at index 1 as a row mask
print(matrix[result2, :])

Output:
[[20 25 30]]

# Logical operations: & is element-wise AND, | is element-wise OR
result = (vector == 10) & (vector == 5)
print(result)
result1 = (vector == 10) | (vector == 5)
print(result1)

Output:
[False False False False]
[ True True False False]
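A common use of these masks is selecting a range of values. The sketch below combines two conditions; the name `in_range` is made up for illustration.

```python
import numpy as np

v = np.array([5, 10, 15, 20])

# Parentheses are required because & binds tighter than the comparison operators
in_range = (v > 5) & (v < 20)

print(v[in_range])      # [10 15]
print(in_range.sum())   # 2, because each True counts as 1
print(v[~in_range])     # [ 5 20], ~ inverts the mask
```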

.astype: type conversion

vector = numpy.array(['5','10','15','20'])
print(vector, vector.dtype)
vector = vector.astype(float) # astype converts the str elements to float
print(vector, vector.dtype)

Output:
['5' '10' '15' '20'] <U2
[ 5. 10. 15. 20.] float64
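Two details of `astype` are easy to miss: it returns a new array rather than changing the original, and converting float to int drops the fractional part. A minimal sketch with made-up values:

```python
import numpy as np

f = np.array([1.7, -1.7, 2.0])
i = f.astype(int)   # fractional part is discarded (truncation toward zero)

print(i)   # [ 1 -1  2]
print(f)   # the original array is unchanged
```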

Minimum and maximum

vector = numpy.array([5,10,15,20])
print(vector.min())
print(vector.max())

Output:
5
20

Sums

matrix= numpy.array([
    [5,10,12],
    [20,25,30],
    [35,21,52]
])
print(matrix.sum(axis=1)) # sum each row (axis 1 is collapsed)
print(matrix.sum(axis=0)) # sum each column (axis 0 is collapsed)

Output:
[ 27 75 108]
[60 56 94]
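As a small extension of the axis argument, this sketch sums the same matrix with no axis at all and with `keepdims=True`. Keeping the summed axis lets the row sums divide the matrix directly. The names `row_sums` and `shares` are illustrative.

```python
import numpy as np

matrix = np.array([
    [5, 10, 12],
    [20, 25, 30],
    [35, 21, 52],
])

total = matrix.sum()                          # no axis: sum of every element
row_sums = matrix.sum(axis=1, keepdims=True)  # shape (3, 1) instead of (3,)
shares = matrix / row_sums                    # each row now sums to 1

print(total)            # 210
print(row_sums.shape)   # (3, 1)
print(shares.sum(axis=1))
```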

NumPy matrices

arange: creating vectors

import numpy as np
# arange builds an array holding the integers 0 through 14
print(np.arange(15))

Output:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]

# reshape changes the shape of the array
a = np.arange(15).reshape(3, 5)
# ndim is the number of dimensions; size is the total number of elements
print(a, a.ndim, a.dtype.name, a.size)

Output:
[[ 0 1 2 3 4]
 [ 5 6 7 8 9]
 [10 11 12 13 14]] 2 int32 15

# arange(10,50,6) starts at 10, stops before 50 (50 is excluded), and adds 6 at each step
b = np.arange(10,50,6)
print(b)

Output:
[10 16 22 28 34 40 46]

# arange(0,2,0.2) starts at 0, stops before 2 (2 is excluded), and adds 0.2 at each step
c = np.arange(0,2,0.2)
print(c)

Output:
[ 0. 0.2 0.4 0.6 0.8 1. 1.2 1.4 1.6 1.8]

Matrix initialization

# zeros initializes every element to 0
b = np.zeros((3,4))
print(b,b.ndim,b.size)

Output:
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]] 2 12

# ones initializes every element to 1
c = np.ones((2,3,4))
print(c, c.ndim,c.size)

Output:
[[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]

[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]] 3 24

# Call the random function in the random module; it returns random values in [0, 1)
d = np.random.random((2,3))
print(d)

Output:
[[ 0.06161847 0.01691882 0.07025908]
[ 0.44663879 0.83298503 0.39707203]]

# linspace generates the requested values directly
from numpy import pi
# Get 30 evenly spaced numbers that start at 0 and end at 2*pi (the endpoint is included)
a = np.linspace(0,2*pi,30)
print(a)

Output:
[ 0. 0.21666156 0.43332312 0.64998469 0.86664625 1.08330781
1.29996937 1.51663094 1.7332925 1.94995406 2.16661562 2.38327719
2.59993875 2.81660031 3.03326187 3.24992343 3.466585 3.68324656
3.89990812 4.11656968 4.33323125 4.54989281 4.76655437 4.98321593
5.1998775 5.41653906 5.63320062 5.84986218 6.06652374 6.28318531]
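Unlike `arange`, `linspace` takes the number of points rather than the step, and it includes the end value by default. This sketch shows the `retstep` and `endpoint` parameters on a short range picked for readability.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring points
pts, step = np.linspace(0, 1, 5, retstep=True)
print(pts)    # [0.   0.25 0.5  0.75 1.  ]
print(step)   # 0.25

# endpoint=False leaves the stop value out, like arange does
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open)   # [0.   0.25 0.5  0.75]
```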

NumPy math operations

Arithmetic

# Addition and subtraction
a = np.array([20,30,40,50])
b = np.arange(4)
print(a, b, b-1)
print((a-b),(a+b))

Output:
[20 30 40 50] [0 1 2 3] [-1 0 1 2]
[20 29 38 47] [20 31 42 53]

# Squaring
print(b**2)

Output:
[0 1 4 9]

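# Build two matrices to multiply
a = np.arange(6).reshape(2,3)
print(a)
b = np.arange(5,21,3).reshape(3,2)
print(b)

Output:
a:
[[0 1 2]
 [3 4 5]]
b:
[[ 5 8]
 [11 14]
 [17 20]]

# np.dot computes the matrix product
print('Matrix product of a and b:',np.dot(a,b))

Output:
Matrix product of a and b:
[[ 45 54]
 [144 180]]

print('a:',a)
print('Transpose of b:',b.T)
# * multiplies element by element
print('Element-wise product of a and b.T:',a*b.T)

Output:
a:
[[0 1 2]
 [3 4 5]]
Transpose of b:
[[ 5 11 17]
 [ 8 14 20]]
Element-wise product of a and b.T:
[[ 0 11 34]
 [ 24 56 100]]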
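For two-dimensional arrays the `@` operator is another way to write the matrix product. The sketch below rebuilds the same two matrices and compares `@`, `np.dot`, and `*`.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(5, 21, 3).reshape(3, 2)

matrix_product = a @ b    # (2, 3) @ (3, 2) -> (2, 2)
elementwise = a * b.T     # shapes must match: (2, 3) * (2, 3)

print(matrix_product)
print(np.array_equal(matrix_product, np.dot(a, b)))   # True
print(elementwise)
```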

Transformations

a = np.arange(3)
# exp(a) computes e raised to each element of a; sqrt() takes the square root
print(a, np.exp(a), np.sqrt(a))

Output:
[0 1 2] [ 1. 2.71828183 7.3890561 ] [ 0. 1. 1.41421356]

# floor rounds down; random returns random values in [0, 1)
b = np.floor(10*np.random.random((3,4)))
print(b)

Output:
[[ 1. 3. 5. 3.]
 [ 0. 9. 9. 4.]
 [ 2. 5. 3. 0.]]

# Flatten the matrix into a vector
print(b.ravel())

Output:
[ 1. 3. 5. 3. 0. 9. 9. 4. 2. 5. 3. 0.]

# Change the shape in place; -1 tells NumPy to infer that dimension
b.shape = (2, -1)
print(b)

Output:
[[ 1. 3. 5. 3. 0. 9.]
 [ 9. 4. 2. 5. 3. 0.]]
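The sketch below uses a fixed array in place of random values so the results are reproducible. It shows `reshape` with -1, and one difference between `ravel` and `flatten`: for a contiguous array `ravel` returns a view, while `flatten` always returns a copy.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

r = m.reshape(2, -1)      # -1 is inferred as 6
print(r.shape)            # (2, 6)

flat_view = m.ravel()     # a view here, because m is contiguous
flat_copy = m.flatten()   # always an independent copy

flat_view[0] = 99         # writes through to m
flat_copy[1] = -1         # does not touch m

print(m[0, 0])   # 99
print(m[0, 1])   # 1
```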

Concatenating matrices

a = np.floor(10*np.random.random((2,2)))
b = np.floor(10*np.random.random((2,2)))
print(a)
print(b)

Output:
[[ 9. 9.]
 [ 8. 2.]]
[[ 2. 4.]
 [ 7. 9.]]

# Stack horizontally (side by side)
print(np.hstack((a,b)))

Output:
[[ 9. 9. 2. 4.]
 [ 8. 2. 7. 9.]]

# Stack vertically (one on top of the other)
print(np.vstack((a,b)))

Output:
[[ 9. 9.]
 [ 8. 2.]
 [ 2. 4.]
 [ 7. 9.]]
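For two-dimensional arrays, the same results can be written with `np.concatenate` and an explicit axis. A short sketch with fixed values chosen for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

side_by_side = np.concatenate((a, b), axis=1)   # same as np.hstack((a, b)) for 2-D input
stacked = np.concatenate((a, b), axis=0)        # same as np.vstack((a, b)) for 2-D input

print(side_by_side)
print(stacked)
```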

Splitting matrices

# The matrix to split
a = np.floor(10*np.random.random((4,12)))
print(a)

Output:
[[ 1. 3. 9. 9. 4. 8. 1. 7. 3. 0. 0. 5.]
[ 0. 9. 5. 6. 6. 9. 1. 6. 6. 4. 1. 6.]
[ 3. 6. 6. 2. 8. 7. 2. 4. 1. 3. 4. 6.]
[ 7. 8. 3. 5. 5. 4. 1. 8. 0. 6. 3. 8.]]

# Split along the horizontal direction (by columns) into 3 equal parts
print(np.hsplit(a,3))

Output:
[array([[ 1., 3., 9., 9.],
[ 0., 9., 5., 6.],
[ 3., 6., 6., 2.],
[ 7., 8., 3., 5.]]), array([[ 4., 8., 1., 7.],
[ 6., 9., 1., 6.],
[ 8., 7., 2., 4.],
[ 5., 4., 1., 8.]]), array([[ 3., 0., 0., 5.],
[ 6., 4., 1., 6.],
[ 1., 3., 4., 6.],
[ 0., 6., 3., 8.]])]

# Split at the given positions: before column index 3 and before column index 5
print(np.hsplit(a,(3,5)))

Output:
[array([[ 1., 3., 9.],
[ 0., 9., 5.],
[ 3., 6., 6.],
[ 7., 8., 3.]]), array([[ 9., 4.],
[ 6., 6.],
[ 2., 8.],
[ 5., 5.]]), array([[ 8., 1., 7., 3., 0., 0., 5.],
[ 9., 1., 6., 6., 4., 1., 6.],
[ 7., 2., 4., 1., 3., 4., 6.],
[ 4., 1., 8., 0., 6., 3., 8.]])]

# Split along the vertical direction (by rows) into 2 equal parts
print(np.vsplit(a,2))

Output:
[array([[ 1., 3., 9., 9., 4., 8., 1., 7., 3., 0., 0., 5.],
[ 0., 9., 5., 6., 6., 9., 1., 6., 6., 4., 1., 6.]]), array([[ 3., 6., 6., 2., 8., 7., 2., 4., 1., 3., 4., 6.],
[ 7., 8., 3., 5., 5., 4., 1., 8., 0., 6., 3., 8.]])]

# Split vertically at given positions: split the transpose of a before row indices 3 and 6
print(np.vsplit(a.T, (3,6)))

Output:
[array([[ 1., 0., 3., 7.],
[ 3., 9., 6., 8.],
[ 9., 5., 6., 3.]]), array([[ 9., 6., 2., 5.],
[ 4., 6., 8., 5.],
[ 8., 9., 7., 4.]]), array([[ 1., 1., 2., 1.],
[ 7., 6., 4., 8.],
[ 3., 6., 1., 0.],
[ 0., 4., 3., 6.],
[ 0., 1., 4., 3.],
[ 5., 6., 6., 8.]])]
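`hsplit` and `vsplit` require an even division when given a count. The sketch below, on a small fixed array, shows the equivalent `np.split` call with an axis, plus `np.array_split`, which accepts an uneven division.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# np.split along axis=1 matches np.hsplit for 2-D input
left, middle, right = np.split(a, 3, axis=1)
print(left.shape)   # (4, 2)

# 6 columns cannot be divided evenly into 4 parts; array_split allows it
parts = np.array_split(a, 4, axis=1)
print([p.shape[1] for p in parts])   # [2, 2, 1, 1]
```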

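Copying in NumPy

# No copy at all
# '=' only binds a second name: a and b refer to the same object at the same address, so changing b also changes a
a = np.arange(12)
b = a
print(b is a)
b.shape=(3,4)
print(b.shape,id(b))
print(a.shape,id(a))

Output:
True
(3, 4) 694267485520
(3, 4) 694267485520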

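# Shallow copy with 'view()': c is a different object from a (different address) but shares a's data, so changing an element of c also changes a
c = a.view()
print(c is a)
c[0][3]=1234
print(c,c.shape,id(c))
print(a,a.shape,id(a))

Output: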
False
[[ 0 1 2 1234]
[ 4 5 6 7]
[ 8 9 10 11]] (3, 4) 694267691776
[[ 0 1 2 1234]
[ 4 5 6 7]
[ 8 9 10 11]] (3, 4) 694267485520

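# Deep copy with 'copy()': d has its own data, so changing d does not change a; a and d have different addresses
d = a.copy()
print(d is a)
d[0][0]=9999
print(d,d.shape,id(d))
print(a,a.shape,id(a))

Output (the id printed for d is a new value that differs from the ids of a and c):
False
[[9999 1 2 1234]
 [ 4 5 6 7]
 [ 8 9 10 11]] (3, 4) <id of d>
[[ 0 1 2 1234]
 [ 4 5 6 7]
 [ 8 9 10 11]] (3, 4) 694267485520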
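Rather than comparing `id` values by eye, the three cases can be told apart programmatically. The sketch below uses `np.shares_memory` and the `.base` attribute on a fresh array.

```python
import numpy as np

a = np.arange(12)
b = a            # same object, no copy
c = a.view()     # new object, shared data
d = a.copy()     # new object, own data

print(b is a)                   # True
print(c is a, c.base is a)      # False True
print(np.shares_memory(a, c))   # True
print(np.shares_memory(a, d))   # False

c[0] = 1234      # visible through a
d[1] = 9999      # not visible through a
print(a[0], a[1])   # 1234 1
```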

Indexing in NumPy

import numpy as np
data = np.sin(np.arange(20)).reshape(5,4)
print(data)

Output:
[[ 0. 0.84147098 0.90929743 0.14112001]
[-0.7568025 -0.95892427 -0.2794155 0.6569866 ]
[ 0.98935825 0.41211849 -0.54402111 -0.99999021]
[-0.53657292 0.42016704 0.99060736 0.65028784]
[-0.28790332 -0.96139749 -0.75098725 0.14987721]]

# Find the row index of the maximum value in each column
ind = data.argmax(axis=0)
print(ind)
# Use those indices to look up the corresponding values
data_max = data[ind, range(data.shape[1])]
print(data_max)

Output:
[2 0 3 1]
[ 0.98935825 0.84147098 0.99060736 0.6569866 ]
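As a quick sanity check on the argmax lookup, the picked values should match what `max(axis=0)` returns directly. The sketch below verifies this on the same data.

```python
import numpy as np

data = np.sin(np.arange(20)).reshape(5, 4)

ind = data.argmax(axis=0)                      # row index of the maximum in each column
picked = data[ind, np.arange(data.shape[1])]   # pair each row index with its column index

print(ind)                                          # [2 0 3 1]
print(np.array_equal(picked, data.max(axis=0)))     # True
```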

# 'tile()' repeats an array to extend it
a = np.arange(0,40,10)
print(a)
# Repeat a 2 times along the rows and 2 times along the columns
b = np.tile(a,(2,2))
print(b)

Output:
[ 0 10 20 30]
[[ 0 10 20 30 0 10 20 30]
[ 0 10 20 30 0 10 20 30]]
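`tile` repeats the array as a whole, which is easy to confuse with `np.repeat`, which repeats each element. A short comparison on a made-up three-element array:

```python
import numpy as np

a = np.array([0, 10, 20])

print(np.tile(a, 2))              # [ 0 10 20  0 10 20]
print(np.repeat(a, 2))            # [ 0  0 10 10 20 20]
print(np.tile(a, (2, 1)).shape)   # (2, 3): two stacked copies, columns unchanged
```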

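Sorting in NumPy

import numpy as np
a = np.array([
    [4,3,6],
    [2,4,2]
])
print(a)

Output:
[[4 3 6]
 [2 4 2]]

# Sort each row in ascending order
print(np.sort(a,axis=1))

Output:
[[3 4 6]
 [2 2 4]]

# Flatten a into a vector
b = np.ravel(a)
print(b)

Output:
[4 3 6 2 4 2]

# Compute the indices that would sort b in ascending order
j = np.argsort(b)
print(j)

Output:
[3 5 1 0 4 2]

# Use the indices to look up the values in sorted order
print(b[j])

Output:
[2 2 3 4 4 6]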
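One practical use of `argsort` is reordering the rows of a table by one column. The sketch below uses a small table of (id, score) pairs that is invented for illustration.

```python
import numpy as np

# Hypothetical table: column 0 is an id, column 1 is a score
scores = np.array([
    [3, 90],
    [1, 75],
    [2, 82],
])

order = np.argsort(scores[:, 1])   # row order that sorts the score column ascending
ascending = scores[order]
descending = scores[order[::-1]]   # reverse the order for a descending sort

print(order)        # [1 2 0]
print(ascending)
print(descending)
```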

That concludes this summary of commonly used NumPy features.