Temporian Open-Source Project Tutorial

Article link: https://blog.youkuaiyun.com/gitblog_00436/article/details/142806434


Temporian is an open-source Python library for preprocessing ⚡ and feature engineering 🛠 temporal data 📈 for machine learning applications 🤖. Project repository: https://gitcode.com/gh_mirrors/te/temporian

1. Project Introduction

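Temporian is an open-source Python library for preprocessing and feature engineering on temporal data in machine learning applications. It supports multivariate time series, multivariate time sequences, event logs, and cross-source event streams. Its core computation is implemented in C++ and optimized for temporal data; according to the project, temporal operations can run more than 1,000 times faster than in general-purpose data processing libraries.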

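Temporian's main features include:

Supports most types of temporal data.
Optimized for temporal data.
Easy to integrate into an existing machine learning ecosystem.
Prevents unwanted leakage of future data.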
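The last point can be made concrete with a small plain-Python sketch. It does not use Temporian and is not how the library is implemented; it only illustrates the idea of a past-only window, where the value computed at time t depends solely on events at or before t. The function name `trailing_sum` and the half-open window `(t - window, t]` are assumptions of this sketch.

```python
from bisect import bisect_right


def trailing_sum(timestamps, values, window, sample_times):
    """For each t in sample_times, sum the values whose timestamp lies in (t - window, t].

    `timestamps` must be sorted in ascending order. Events after t never
    contribute, so no future information leaks into the result.
    """
    # Prefix sums let each window be evaluated with two binary searches.
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)

    out = []
    for t in sample_times:
        lo = bisect_right(timestamps, t - window)  # first event strictly after t - window
        hi = bisect_right(timestamps, t)           # first event strictly after t
        out.append(prefix[hi] - prefix[lo])
    return out


timestamps = [1.0, 2.0, 3.0, 10.0]
values = [5.0, 7.0, 1.0, 100.0]

# The event at t=10 is in the future of both sampling times, so it is ignored.
print(trailing_sum(timestamps, values, window=2.0, sample_times=[3.0, 9.0]))
# prints [8.0, 0.0]
```

A centered or forward-looking window would have pulled the value 100.0 into the result at t=9, which is exactly the kind of leakage a training pipeline has to avoid.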

2. Quick Start

Installation

Install Temporian from PyPI with pip:

pip install temporian -U
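One way to confirm that the installation worked is to ask Python's standard `importlib.metadata` module for the installed version. The helper name `installed_version` is made up for this sketch; it only relies on the standard library, so it runs whether or not Temporian is present.

```python
from importlib import metadata


def installed_version(package="temporian"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("temporian is not installed")
else:
    print(f"temporian {version}")
```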

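Minimal example

Consider a CSV file of sales records containing a timestamp, a store, and a revenue value for each sale. The goal is to compute, for each store, the total revenue at 11 pm on every weekday.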
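To try the example without real data, a small file can be generated with the standard `csv` module. Everything in this sketch is hypothetical: the rows, the helper name `write_sample_sales`, and the timestamp format. The column names `timestamp`, `store`, and `revenue` are chosen to match the description above; adjust them if your loader expects something else.

```python
import csv

# Hypothetical sample data: two stores, sales from Thursday to the following Monday.
SAMPLE_ROWS = [
    ("2023-01-05 10:00:00", "A", 100.0),  # Thursday
    ("2023-01-05 21:30:00", "A", 50.0),   # Thursday
    ("2023-01-06 12:00:00", "A", 30.0),   # Friday
    ("2023-01-07 15:00:00", "A", 80.0),   # Saturday
    ("2023-01-09 09:00:00", "A", 20.0),   # Monday
    ("2023-01-05 11:00:00", "B", 10.0),   # Thursday
    ("2023-01-06 23:30:00", "B", 40.0),   # Friday
]


def write_sample_sales(path="sales.csv"):
    """Write the sample rows to `path` with a header line; return the number of data rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "store", "revenue"])
        writer.writerows(SAMPLE_ROWS)
    return len(SAMPLE_ROWS)


write_sample_sales()
```

Running this once creates `sales.csv` in the current directory, which is the file name the code below loads.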

import temporian as tp

# Load the sales transactions
sales = tp.from_csv("sales.csv")

# Index the sales by store
sales_per_store = sales.add_index("store")

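# Create a tick at 23:00 (11 pm) every day, then keep only the weekdays
days = sales_per_store.tick_calendar(hour=23)
# calendar_day_of_week() counts from Monday=0, so weekdays are 0..4
work_days = (days.calendar_day_of_week() <= 4).filter()
work_days.plot(max_num_plots=1)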

# Aggregate revenue per store over the previous day, sampled at each weekday tick
daily_revenue = (
    sales_per_store["revenue"]
    .moving_sum(tp.duration.days(1), sampling=work_days)
    .rename("daily_revenue")
)

# Plot the result
daily_revenue.plot(max_num_plots=3)

# Export the result as a Pandas DataFrame
tp.to_pandas(daily_revenue)
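To check the output of the pipeline above by hand, the same calculation can be written in plain Python. This is a reference sketch, not Temporian code, and it rests on a few assumptions: ticks are generated once per day at the given hour between each store's first and last sale, weekdays are Monday to Friday, and each tick sums the sales in the preceding 24 hours, including a sale that falls exactly on the tick. The function name `weekday_daily_revenue` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weekday_daily_revenue(sales, hour=23):
    """Sum each store's revenue over the previous 24 hours at `hour`:00 on weekdays.

    `sales` is an iterable of (datetime, store, revenue) tuples.
    Returns {store: [(tick_datetime, total_revenue), ...]}.
    """
    per_store = defaultdict(list)
    for ts, store, revenue in sales:
        per_store[store].append((ts, revenue))

    one_day = timedelta(days=1)
    result = {}
    for store, events in per_store.items():
        events.sort()
        first, last = events[0][0], events[-1][0]

        # First daily tick at or after the store's first sale.
        tick = first.replace(hour=hour, minute=0, second=0, microsecond=0)
        if tick < first:
            tick += one_day

        rows = []
        while tick <= last:
            if tick.weekday() <= 4:  # datetime.weekday(): Monday=0 ... Sunday=6
                total = sum(
                    (rev for ts, rev in events if tick - one_day < ts <= tick), 0.0
                )
                rows.append((tick, total))
            tick += one_day
        result[store] = rows
    return result


sales = [
    (datetime(2023, 1, 5, 10, 0), "A", 100.0),  # Thursday
    (datetime(2023, 1, 5, 21, 30), "A", 50.0),  # Thursday
    (datetime(2023, 1, 6, 12, 0), "A", 30.0),   # Friday
    (datetime(2023, 1, 7, 15, 0), "A", 80.0),   # Saturday
    (datetime(2023, 1, 9, 9, 0), "A", 20.0),    # Monday
    (datetime(2023, 1, 5, 11, 0), "B", 10.0),   # Thursday
    (datetime(2023, 1, 6, 23, 30), "B", 40.0),  # Friday, after the 23:00 tick
]

for store, rows in sorted(weekday_daily_revenue(sales).items()):
    for tick, total in rows:
        print(store, tick.isoformat(), total)
```

For store A this yields 150.0 at the Thursday tick and 30.0 at the Friday tick; the Saturday sale never appears because weekend ticks are dropped and Monday's 23:00 tick lies after the last recorded sale.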