The NumPy Module in Python (Study Notes on Python for Data Analysis)

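These are study notes on Python for Data Analysis, focused on array operations in NumPy. They cover the core concepts: fast array arithmetic, generating random 2-D arrays, type conversion with astype, element-wise operations, slicing and copying, boolean indexing, matrix multiplication, sorting, and computing the mean.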

numpy - arrays

  • Operations on numpy arrays are much faster than the same operations on Python lists
import numpy as np
my_arr = np.arange(1000000)
my_list = list(range(1000000))

# %time is an IPython magic command, not regular Python syntax
%time for _ in range(10): my_arr2 = my_arr * 2
%time for _ in range(10): my_list2 = [x*2 for x in my_list]

CPU times: user 13 ms, sys: 8.12 ms, total: 21.1 ms
Wall time: 22 ms
CPU times: user 520 ms, sys: 147 ms, total: 667 ms
Wall time: 669 ms
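Outside IPython, %time is not available. A rough equivalent using the standard-library timeit module might look like the sketch below; the variable names t_arr and t_list are made up for illustration, and the absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Run each statement 10 times, mirroring the %time loops in the notes.
t_arr = timeit.timeit(lambda: my_arr * 2, number=10)
t_list = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"ndarray: {t_arr:.4f} s, list: {t_list:.4f} s")
```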
  • Generating a random 2-D array (samples from the standard normal distribution)
data = np.random.randn(2,3)
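np.random.randn still works, but newer NumPy code usually goes through a Generator object. A minimal sketch of the same idea, assuming an arbitrary seed of 0 so that the result is reproducible:

```python
import numpy as np

# The seed value is an arbitrary choice for reproducibility.
rng = np.random.default_rng(seed=0)

# Note that the shape is passed as a single tuple here.
data = rng.standard_normal((2, 3))

print(data.shape)  # (2, 3)
print(data.dtype)  # float64
```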
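  • Calling astype creates a new array (a copy of the data)
# np.string_ was removed in NumPy 2.0; np.bytes_ is the same fixed-width byte-string type
numeric_strings = np.array(['1.25','-9.2','42'],dtype=np.bytes_)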
numeric_strings_2 = numeric_strings.astype(np.float64)

print(numeric_strings.dtype)
print(numeric_strings_2.dtype)

|S4
float64
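To check that astype really returns a copy, one can modify the result and confirm that the source array is untouched. The sketch below uses a small integer array (ints, floats and same are illustrative names, not from the book):

```python
import numpy as np

ints = np.array([1, 2, 3])

floats = ints.astype(np.float64)  # new array with a different dtype
same = ints.astype(ints.dtype)    # same dtype, but still a copy by default

# Modifying the converted arrays leaves the original alone.
floats[0] = 99.0
same[1] = -1

print(ints.tolist())                  # [1, 2, 3]
print(np.shares_memory(ints, same))   # False
```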
  • Any arithmetic operation between equal-size arrays is applied element-wise
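A small sketch of element-wise arithmetic on two arrays of the same shape (the values are chosen only for illustration):

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

# Each operation pairs up elements at the same position.
product = arr * arr
difference = arr - arr

print(product.tolist())     # [[1.0, 4.0, 9.0], [16.0, 25.0, 36.0]]
print(difference.tolist())  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```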
  • A slice is a view on the original array, not a copy; call copy() explicitly when you need one
arr = np.arange(10)
arr[:] = 12
arr
Out[17]: array([12, 12, 12, 12, 12, 12, 12, 12, 12, 12])

a = arr[5:8].copy()
a[:] = 10  # a is a copy, so this does not affect arr
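The difference between a view and a copy is easier to see side by side. In this sketch, view and copied are illustrative names:

```python
import numpy as np

arr = np.arange(10)

# A plain slice is a view: writing through it changes arr.
view = arr[5:8]
view[:] = 12
print(arr.tolist())  # [0, 1, 2, 3, 4, 12, 12, 12, 8, 9]

# An explicit copy owns its data: writing to it leaves arr alone.
copied = arr[5:8].copy()
copied[:] = 10
print(arr.tolist())  # [0, 1, 2, 3, 4, 12, 12, 12, 8, 9]
```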
  • Selecting the first two rows of a 2-D array
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2d[:2]
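Slices can also be given per axis. A short sketch on the same arr2d, showing a few combinations:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_two_rows = arr2d[:2]       # rows 0 and 1, all columns
upper_right = arr2d[:2, 1:]      # rows 0 and 1, columns 1 and 2
second_row_head = arr2d[1, :2]   # row 1, columns 0 and 1 (1-D result)

print(first_two_rows.tolist())   # [[1, 2, 3], [4, 5, 6]]
print(upper_right.tolist())      # [[2, 3], [5, 6]]
print(second_row_head.tolist())  # [4, 5]
```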
  • boolean indexing
names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])
data = np.random.randn(len(names), 4)
data[names == 'Bob']
#To select everything but 'Bob', you can either use != or negate the condition using ~:
data[~(names == 'Bob')]

#The Python keywords and and or do not work with boolean arrays. Use & (and) and | (or) instead.
#Selecting data from an array by boolean indexing always creates a copy of the data
mask = (names == 'Bob') | (names == 'Will')
data[mask]
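Both points in the comments above can be checked directly. The sketch below swaps the random data for np.arange values so the output is predictable; selected is an illustrative name:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape(7, 4)  # deterministic stand-in for random data

mask = (names == 'Bob') | (names == 'Will')
selected = data[mask]

# Boolean indexing returns a copy, so this does not touch data.
selected[:] = -1
print(data[0].tolist())  # [0, 1, 2, 3]

# The keyword `and` asks for a single truth value, which an array cannot give.
try:
    (names == 'Bob') and (names == 'Will')
except ValueError as exc:
    print("ValueError:", exc)
```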

  • Setting every element of a 2-D array that is less than 0 to 0

data[data<0]=0
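The assignment above modifies data in place. If the original values are still needed, np.where (not used in the notes above, shown here as one possible alternative) builds a new array instead. A sketch with made-up values:

```python
import numpy as np

data = np.array([[-1.5, 2.0],
                 [3.0, -0.5]])

# Non-destructive variant: data keeps its negative values.
clipped = np.where(data < 0, 0, data)
print(clipped.tolist())  # [[0.0, 2.0], [3.0, 0.0]]
print(data.tolist())     # [[-1.5, 2.0], [3.0, -0.5]]

# In-place variant, as in the notes.
data[data < 0] = 0
print(data.tolist())     # [[0.0, 2.0], [3.0, 0.0]]
```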

numpy - matrices

  • For np.matrix objects, * is matrix multiplication, while np.multiply multiplies corresponding elements
# matrices
ss = np.mat([1,2,3])
mm = np.mat([1,2,3])
mm*ss.T  # matrix multiplication: (1, 3) times (3, 1) gives (1, 1)

Out[26]: matrix([[14]])

np.shape(mm)
Out[27]: (1, 3)
np.multiply(mm,ss)
Out[28]: matrix([[1, 4, 9]])
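The NumPy documentation no longer recommends the np.matrix class. With plain 2-D ndarrays the roles are reversed: * is element-wise and @ is the matrix product. A sketch of the same computation, using illustrative names a and b:

```python
import numpy as np

a = np.array([[1, 2, 3]])  # shape (1, 3)
b = np.array([[1, 2, 3]])  # shape (1, 3)

matrix_product = a @ b.T   # (1, 3) @ (3, 1) -> (1, 1)
elementwise = a * b        # same result as np.multiply(a, b)

print(matrix_product.tolist())  # [[14]]
print(elementwise.tolist())     # [[1, 4, 9]]
```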
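  • Sorting
# dd is assumed to be an existing matrix or array (it is not defined in these notes)
dd.argsort()  # returns the indices that would sort the elements, not the sorted values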
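Since dd is not defined in the notes, the sketch below uses a hypothetical three-element array to show what argsort returns and how it relates to the sorted values and to the rank of each element:

```python
import numpy as np

dd = np.array([30, 10, 20])  # hypothetical values, for illustration only

order = dd.argsort()         # positions of the elements in ascending order
sorted_values = dd[order]    # indexing with order yields the sorted array
ranks = order.argsort()      # rank of each element within dd

print(order.tolist())          # [1, 2, 0]
print(sorted_values.tolist())  # [10, 20, 30]
print(ranks.tolist())          # [2, 0, 1]
```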
  • Computing the mean
dd.mean()  # mean over all elements
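mean also accepts an axis argument. A sketch with a hypothetical 2-D array standing in for dd:

```python
import numpy as np

dd = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]])  # hypothetical values

overall = dd.mean()           # mean of all six elements
per_column = dd.mean(axis=0)  # collapse the rows: one mean per column
per_row = dd.mean(axis=1)     # collapse the columns: one mean per row

print(overall)              # 3.5
print(per_column.tolist())  # [2.5, 3.5, 4.5]
print(per_row.tolist())     # [2.0, 5.0]
```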
This book mainly uses pandas to tie SciPy and NumPy together; data processing with pandas was a very popular topic at PyCon 2012. Another powerful tool is Sage, which integrates many open-source packages behind a unified Python interface.

Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.

Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical case studies. It's ideal for analysts new to Python and for Python programmers new to scientific computing.

  • Use the IPython interactive shell as your primary development environment
  • Learn basic and advanced NumPy (Numerical Python) features
  • Get started with data analysis tools in the pandas library
  • Use high-performance tools to load, clean, transform, merge, and reshape data
  • Create scatter plots and static or interactive visualizations with matplotlib
  • Apply the pandas groupby facility to slice, dice, and summarize datasets
  • Measure data by points in time, whether it's specific instances, fixed periods, or intervals
  • Learn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples