pandas - Shifting and Frequency Adjustment of Time Series Data

This article walks through the asfreq API in the pandas library for shifting and frequency-adjusting time series data; the method converts a DataFrame into a time series with a specified frequency. Link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.asfreq.html


The asfreq API (documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.asfreq.html)
[Figure: screenshots of the asfreq API documentation page]

Lesson 4: Time Series Data Analysis with Pandas
Section 4: Shifting and Frequency Adjustment of Time Series Data
```python
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

# parse_dates=True parses the 'date' index column into datetime values;
# dayfirst=True reads day-first date strings (e.g. 01/10/2013 = 1 October 2013)
data_df = pd.read_csv('./datasets/day_stats.csv', index_col='date', parse_dates=True, dayfirst=True)
data_df.head()
```
| date | city | PM_China | PM_US Post | Polluted State CH | Polluted State US |
|---|---|---|---|---|---|
| 2013-10-01 | beijing | 67.416667 | 71.458333 | light | light |
| 2013-10-10 | beijing | 74.041667 | 81.583333 | light | medium |
| 2013-10-11 | beijing | 59.694444 | 59.291667 | light | light |
| 2013-10-12 | beijing | 69.236111 | 70.458333 | light | light |
| 2013-10-13 | beijing | 54.513889 | 57.333333 | light | light |
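
As a quick sanity check (not part of the original notebook; it assumes the same `data_df` loaded above), you can confirm that `parse_dates` actually produced a `DatetimeIndex`, which is what makes the shifting and frequency operations below possible:

```python
# The index dtype should be datetime64[ns]; min/max show the covered date range.
print(data_df.index.dtype)
print(data_df.index.min(), data_df.index.max())
```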
```python
beijing_data = data_df[data_df['city'] == 'beijing'].copy()
beijing_data['PM_China'].plot(figsize=(10, 5))
```

[Figure: line plot of the daily PM_China series for Beijing]

Shifting with shift
```python
# shift(1) moves values forward by one row: each date shows the previous record's PM_China
beijing_data['+1'] = beijing_data['PM_China'].shift(1)
# shift(-1) moves values backward by one row: each date shows the next record's PM_China
beijing_data['-1'] = beijing_data['PM_China'].shift(-1)
beijing_data.head()
```
| date | city | PM_China | PM_US Post | Polluted State CH | Polluted State US | +1 | -1 |
|---|---|---|---|---|---|---|---|
| 2013-10-01 | beijing | 67.416667 | 71.458333 | light | light | NaN | 74.041667 |
| 2013-10-10 | beijing | 74.041667 | 81.583333 | light | medium | 67.416667 | 59.694444 |
| 2013-10-11 | beijing | 59.694444 | 59.291667 | light | light | 74.041667 | 69.236111 |
| 2013-10-12 | beijing | 69.236111 | 70.458333 | light | light | 59.694444 | 54.513889 |
| 2013-10-13 | beijing | 54.513889 | 57.333333 | light | light | 69.236111 | 37.739130 |
```python
beijing_data.loc['2015-01'][['PM_China', '+1', '-1']].plot(figsize=(10, 5))
```

[Figure: PM_China with its shifted '+1' and '-1' series for January 2015]
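
A common use of `shift` is computing period-over-period changes. The sketch below is not from the original notebook; it assumes the same `beijing_data` frame and shows how the shifted column relates to pandas' built-in `diff` and `pct_change` shortcuts, plus the `freq` argument that shifts timestamps instead of values:

```python
# Day-over-day change: current value minus the previous record's value
beijing_data['pm_change'] = beijing_data['PM_China'] - beijing_data['PM_China'].shift(1)

# Built-in shortcuts for the same idea
beijing_data['pm_diff'] = beijing_data['PM_China'].diff()       # x - x.shift(1)
beijing_data['pm_pct'] = beijing_data['PM_China'].pct_change()  # x / x.shift(1) - 1

# shift can also move the index instead of the values
shifted = beijing_data['PM_China'].shift(1, freq='D')  # every timestamp moves forward one day
```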

Frequency adjustment with asfreq
```python
# High frequency to low frequency: downsample to month-end ('M') dates
beijing_data.asfreq(freq='M').head()
```
| date | city | PM_China | PM_US Post | Polluted State CH | Polluted State US | +1 | -1 |
|---|---|---|---|---|---|---|---|
| 2013-10-31 | beijing | 142.666667 | 170.750000 | medium | heavy | 92.583333 | 160.055556 |
| 2013-11-30 | beijing | 34.166667 | 34.708333 | good | good | 41.763889 | 44.565217 |
| 2013-12-31 | beijing | 49.722222 | 49.333333 | light | light | 44.571429 | 107.402778 |
| 2014-01-31 | beijing | 142.680556 | 166.208333 | medium | heavy | 65.930556 | 140.222222 |
| 2014-02-28 | beijing | 84.194444 | 102.666667 | medium | medium | 16.651515 | 6.304348 |
```python
# Look up 2013-12-31 in the original daily data to confirm it matches the month-end row above
beijing_data.loc['2013-12-31']
```
| date | city | PM_China | PM_US Post | Polluted State CH | Polluted State US | +1 | -1 |
|---|---|---|---|---|---|---|---|
| 2013-12-31 | beijing | 49.722222 | 49.333333 | light | light | 44.571429 | 107.402778 |
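
Note that `asfreq` only *selects* the value recorded exactly at each target timestamp (here, the last calendar day of each month); it does not aggregate. If a monthly summary is wanted instead, `resample` is the usual tool. A minimal comparison sketch (not from the original notebook):

```python
# asfreq: pick the observation at each month-end date (NaN where that date is missing)
month_end_values = beijing_data['PM_China'].asfreq('M')

# resample: aggregate all observations within each month, e.g. the monthly mean
monthly_mean = beijing_data['PM_China'].resample('M').mean()

print(month_end_values.head())
print(monthly_mean.head())
```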

```python
# Low frequency to high frequency: upsample to 12-hour ('12H') intervals
beijing_data.asfreq(freq='12H').head(20)
```
| date | city | PM_China | PM_US Post | Polluted State CH | Polluted State US | +1 | -1 |
|---|---|---|---|---|---|---|---|
| 2013-10-01 00:00:00 | beijing | 67.416667 | 71.458333 | light | light | NaN | 74.041667 |
| 2013-10-01 12:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-10-02 00:00:00 | beijing | 16.680556 | 19.541667 | good | good | 60.944444 | 24.971014 |
| 2013-10-02 12:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-10-03 00:00:00 | beijing | 62.041667 | 61.041667 | light | light | 31.500000 | 92.583333 |
| 2013-10-03 12:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-10-04 00:00:00 | beijing | 160.055556 | 168.166667 | heavy | heavy | 142.666667 | 286.782609 |
| 2013-10-04 12:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-10-05 00:00:00 | beijing | 286.782609 | 305.913044 | heavy | heavy | 160.055556 | 219.152778 |
| 2013-10-05 12:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-10-06 00:00:00 | beijing | 219.152778 | 213.416667 | heavy | heavy | 286.782609 | 105.472222 |
| 2013-10-06 12:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-10-07 00:00:00 | beijing | 105.472222 | 91.291667 | medium | medium | 219.152778 | 78.129630 |
| 2013-10-07 12:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-10-08 00:00:00 | beijing | 78.129630 | 63.611111 | medium | light | 105.472222 | 145.541667 |
| 2013-10-08 12:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-10-09 00:00:00 | beijing | 145.541667 | 146.500000 | medium | medium | 78.129630 | 204.305556 |
| 2013-10-09 12:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-10-10 00:00:00 | beijing | 74.041667 | 81.583333 | light | medium | 67.416667 | 59.694444 |
| 2013-10-10 12:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
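
Upsampling from daily to 12-hour frequency inserts a brand-new row for every 12:00 timestamp, and `asfreq` leaves those rows as `NaN` by default. A small check of how many gaps this creates (an illustrative sketch, assuming the same `beijing_data` frame):

```python
upsampled = beijing_data.asfreq(freq='12H')
# Every newly inserted 12:00 row has no original observation, so PM_China is NaN there.
print(upsampled['PM_China'].isna().sum(), 'missing values out of', len(upsampled), 'rows')
```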

```python
# Upsample to 12-hour intervals and fill the newly created rows with fill_value=0
beijing_data.asfreq(freq='12H', fill_value=0).head(20)
```
| date | city | PM_China | PM_US Post | Polluted State CH | Polluted State US | +1 | -1 |
|---|---|---|---|---|---|---|---|
| 2013-10-01 00:00:00 | beijing | 67.416667 | 71.458333 | light | light | NaN | 74.041667 |
| 2013-10-01 12:00:00 | 0 | 0.000000 | 0.000000 | 0 | 0 | 0.000000 | 0.000000 |
| 2013-10-02 00:00:00 | beijing | 16.680556 | 19.541667 | good | good | 60.944444 | 24.971014 |
| 2013-10-02 12:00:00 | 0 | 0.000000 | 0.000000 | 0 | 0 | 0.000000 | 0.000000 |
| 2013-10-03 00:00:00 | beijing | 62.041667 | 61.041667 | light | light | 31.500000 | 92.583333 |
| 2013-10-03 12:00:00 | 0 | 0.000000 | 0.000000 | 0 | 0 | 0.000000 | 0.000000 |
| 2013-10-04 00:00:00 | beijing | 160.055556 | 168.166667 | heavy | heavy | 142.666667 | 286.782609 |
| 2013-10-04 12:00:00 | 0 | 0.000000 | 0.000000 | 0 | 0 | 0.000000 | 0.000000 |
| 2013-10-05 00:00:00 | beijing | 286.782609 | 305.913044 | heavy | heavy | 160.055556 | 219.152778 |
| 2013-10-05 12:00:00 | 0 | 0.000000 | 0.000000 | 0 | 0 | 0.000000 | 0.000000 |
| 2013-10-06 00:00:00 | beijing | 219.152778 | 213.416667 | heavy | heavy | 286.782609 | 105.472222 |
| 2013-10-06 12:00:00 | 0 | 0.000000 | 0.000000 | 0 | 0 | 0.000000 | 0.000000 |
| 2013-10-07 00:00:00 | beijing | 105.472222 | 91.291667 | medium | medium | 219.152778 | 78.129630 |
| 2013-10-07 12:00:00 | 0 | 0.000000 | 0.000000 | 0 | 0 | 0.000000 | 0.000000 |
| 2013-10-08 00:00:00 | beijing | 78.129630 | 63.611111 | medium | light | 105.472222 | 145.541667 |
| 2013-10-08 12:00:00 | 0 | 0.000000 | 0.000000 | 0 | 0 | 0.000000 | 0.000000 |
| 2013-10-09 00:00:00 | beijing | 145.541667 | 146.500000 | medium | medium | 78.129630 | 204.305556 |
| 2013-10-09 12:00:00 | 0 | 0.000000 | 0.000000 | 0 | 0 | 0.000000 | 0.000000 |
| 2013-10-10 00:00:00 | beijing | 74.041667 | 81.583333 | light | medium | 67.416667 | 59.694444 |
| 2013-10-10 12:00:00 | 0 | 0.000000 | 0.000000 | 0 | 0 | 0.000000 | 0.000000 |
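
Instead of a constant `fill_value`, `asfreq` also accepts a `method` argument that propagates existing observations into the new timestamps. A minimal sketch (not from the original notebook):

```python
# Forward-fill: each new 12:00 row copies the preceding 00:00 observation
ffilled = beijing_data.asfreq(freq='12H', method='ffill')

# Backward-fill: each new row copies the next existing observation instead
bfilled = beijing_data.asfreq(freq='12H', method='bfill')

ffilled.head(4)
```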
 