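1. Software Introduction
The program and source code are available for download at the end of this article.
pytorch-frame is an open-source tabular deep learning library for PyTorch: a modular deep learning framework for building neural network models on heterogeneous tabular data.
PyTorch Frame is a deep learning extension for PyTorch, designed for heterogeneous tabular data with different column types, including numerical, categorical, time, text, and image columns. It offers a modular framework for implementing existing and future methods. The library features methods from state-of-the-art models, user-friendly mini-batch loaders, benchmark datasets, and interfaces for custom data integration.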
2. Library Highlights
PyTorch Frame builds directly upon PyTorch, ensuring a smooth transition for existing PyTorch users. Key features include:
- Diverse column types: PyTorch Frame supports learning across various column types: numerical, categorical, multicategorical, text_embedded, text_tokenized, timestamp, image_embedded, and embedding. The project documentation has a detailed tutorial.
- Modular model design: Enables modular deep learning model implementations, promoting reusability, clear coding, and experimentation flexibility. Further details are in the architecture overview.
- Models: Implements many state-of-the-art deep tabular models as well as strong GBDTs (XGBoost, CatBoost, and LightGBM) with hyper-parameter tuning.
- Datasets: Comes with a collection of readily-usable tabular datasets. Also supports custom datasets to solve your own problem. The project benchmarks deep tabular models against GBDTs.
- PyTorch integration: Integrates effortlessly with other PyTorch libraries, facilitating end-to-end training of PyTorch Frame with downstream PyTorch models. For example, by integrating with PyG, a PyTorch library for GNNs, we can perform deep learning over relational databases. Learn more in RelBench and the example code.
3. Architecture Overview
Models in PyTorch Frame follow a modular design of FeatureEncoder, TableConv, and Decoder, as shown in the figure below:

In essence, this modular setup empowers users to effortlessly experiment with myriad architectures:
- Materialization handles converting the raw pandas DataFrame into a TensorFrame that is amenable to PyTorch-based training and modeling.
- FeatureEncoder encodes TensorFrame into hidden column embeddings of size [batch_size, num_cols, channels].
- TableConv models column-wise interactions over the hidden embeddings.
- Decoder generates embedding/prediction per row.
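To make the materialization step more concrete, here is a minimal plain-Python sketch of the idea: raw rows are regrouped into one dense block per column type, and categorical values are mapped to integer indices. This is not PyTorch Frame code. `SType` and `materialize` are hypothetical stand-ins, the names `feat_dict` and `col_names_dict` are chosen only to suggest a per-type grouping, and indexing categories by first appearance is an assumption made for this illustration.

```python
from enum import Enum


class SType(Enum):
    """Hypothetical stand-in for a semantic column type."""
    NUMERICAL = "numerical"
    CATEGORICAL = "categorical"


def materialize(rows, col_to_stype):
    """Group raw row dicts into one dense 2-D block per semantic type."""
    # Collect the column names belonging to each type, in declaration order.
    col_names_dict = {}
    for col, st in col_to_stype.items():
        col_names_dict.setdefault(st, []).append(col)

    # Map each categorical value to an integer index (first appearance first).
    category_index = {}
    for col in col_names_dict.get(SType.CATEGORICAL, []):
        mapping = {}
        for row in rows:
            mapping.setdefault(row[col], len(mapping))
        category_index[col] = mapping

    # Build one [num_rows, num_cols_of_type] block per type.
    feat_dict = {}
    for st, cols in col_names_dict.items():
        block = []
        for row in rows:
            if st is SType.CATEGORICAL:
                block.append([category_index[c][row[c]] for c in cols])
            else:
                block.append([float(row[c]) for c in cols])
        feat_dict[st] = block
    return feat_dict, col_names_dict, category_index


# Toy table with two numerical columns and one categorical column.
rows = [
    {"age": 31, "income": 52000, "city": "Oslo"},
    {"age": 45, "income": 61000, "city": "Lima"},
    {"age": 28, "income": 48000, "city": "Oslo"},
]
col_to_stype = {
    "age": SType.NUMERICAL,
    "income": SType.NUMERICAL,
    "city": SType.CATEGORICAL,
}
feat_dict, col_names_dict, category_index = materialize(rows, col_to_stype)
```

Grouping by type means each block can be handed to an encoder suited to that type, which is the role the FeatureEncoder stage plays in the list above.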
4. Quick Tour
In this quick tour, we showcase the ease of creating and training a deep tabular model with only a few lines of code.
Build and train your own deep tabular model
As an example, we implement a simple ExampleTransformer following the modular architecture of PyTorch Frame. In the example below:
- self.encoder maps an input TensorFrame to an embedding of size [batch_size, num_cols, channels].
- self.convs iteratively transforms the embedding of size [batch_size, num_cols, channels] into an embedding of the same size.
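The shape flow described here can be traced with a small plain-Python toy. It is a sketch under stated assumptions, not the ExampleTransformer itself and not PyTorch Frame's API: `ToyTabularModel`, `rand_matrix`, `linear`, and `mean_over_columns` are hypothetical names, the column mixing is a simple mean-based stand-in for a real table convolution, and mean-pooling over columns before the decoder is an assumption.

```python
import random


def rand_matrix(rng, n_in, n_out):
    """Return an n_in x n_out matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]


def linear(vec, weight):
    """Multiply a length-n_in vector by an n_in x n_out matrix."""
    return [sum(v * w for v, w in zip(vec, col)) for col in zip(*weight)]


def mean_over_columns(row):
    """Average a [num_cols, channels] block into one [channels] vector."""
    return [sum(vals) / len(vals) for vals in zip(*row)]


class ToyTabularModel:
    def __init__(self, num_cols, channels, out_channels, num_layers, seed=0):
        rng = random.Random(seed)
        # encoder: one 1 -> channels projection per input column
        self.encoder = [rand_matrix(rng, 1, channels) for _ in range(num_cols)]
        # convs: each layer maps channels -> channels, so the shape is preserved
        self.convs = [rand_matrix(rng, channels, channels) for _ in range(num_layers)]
        # decoder: channels -> out_channels, applied once per row
        self.decoder = rand_matrix(rng, channels, out_channels)

    def embed(self, batch):
        # [batch_size, num_cols] -> [batch_size, num_cols, channels]
        x = [[linear([value], w) for value, w in zip(row, self.encoder)] for row in batch]
        for weight in self.convs:
            mixed = []
            for row in x:
                # Toy column-wise interaction: add the mean over columns
                # to every column embedding, then project.
                context = mean_over_columns(row)
                mixed.append([
                    linear([a + c for a, c in zip(col, context)], weight)
                    for col in row
                ])
            x = mixed  # still [batch_size, num_cols, channels]
        return x

    def forward(self, batch):
        # Pool over columns, then decode: [batch_size, out_channels]
        x = self.embed(batch)
        return [linear(mean_over_columns(row), self.decoder) for row in x]


batch = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0, 0.5], [0.0, 0.0, 0.0, 0.0]]
model = ToyTabularModel(num_cols=4, channels=8, out_channels=2, num_layers=3)
hidden = model.embed(batch)
out = model.forward(batch)
```

With a batch of 3 rows and 4 columns, `hidden` has shape [3, 4, 8] after every layer, and `out` has one vector of length 2 per row.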




