TBCNN Project Tutorial

tbcnn: Efficient tree-based convolutional neural networks in TensorFlow. Project repository: https://gitcode.com/gh_mirrors/tb/tbcnn

1. Project Overview

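TBCNN (Tree-based Convolutional Neural Networks) is a TensorFlow implementation of a tree-based convolutional neural network. The project targets programming language processing tasks such as code classification and code generation. It is based on the paper "Convolutional Neural Networks over Tree Structures for Programming Language Processing" but departs from it in a few respects: for example, it does not implement the paper's "coding layer", and it trains with the Adam optimizer instead of gradient descent.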
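
To give a feel for what "tree-based convolution" means, the sketch below computes the mixing coefficients that the paper's "continuous binary tree" scheme assigns to the nodes of a small window made of a parent and its direct children. Each node's weight matrix is then a blend of three shared matrices, η_t·W_t + η_l·W_l + η_r·W_r. This is a simplified illustration under one reading of the paper, not code taken from the tbcnn repository; the function name and the equal left/right split for a single child are assumptions.

```python
def child_coefficients(num_children):
    """Return (eta_top, eta_left, eta_right) for each child in a depth-2 window.

    Children sit at the bottom of the window, so eta_top is 0 for all of them.
    The right coefficient grows linearly with the child's position and the
    left coefficient is the remainder.
    """
    if num_children == 1:
        # Assumption: a lone child is weighted equally between left and right.
        return [(0.0, 0.5, 0.5)]
    coefficients = []
    for position in range(num_children):
        eta_right = position / float(num_children - 1)
        eta_left = 1.0 - eta_right
        coefficients.append((0.0, eta_left, eta_right))
    return coefficients


# The parent (top of the window) uses only the "top" weight matrix.
PARENT_COEFFICIENTS = (1.0, 0.0, 0.0)
```

With three children, the leftmost child uses only the "left" matrix, the rightmost only the "right" matrix, and the middle child an equal mix of both, which is how a fixed set of weights can cover nodes with any number of children.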

2. Quick Start

2.1 Environment Setup

First, make sure Python 2.x is installed, then create and activate a virtual environment:

pip install virtualenv
virtualenv -p /usr/bin/python2 venv
source venv/bin/activate
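
Once the environment is active, it can be worth confirming that the interpreter inside it really is Python 2.x. The short script below is one way to do that; it is a standalone convenience sketch, not part of the project, and the function name `is_python2` is made up for this example.

```python
import sys


def is_python2(version_info):
    """Return True when the given version tuple belongs to the Python 2 series."""
    return version_info[0] == 2


if __name__ == "__main__":
    major_minor = sys.version_info[:2]
    if is_python2(sys.version_info):
        print("OK: running Python %d.%d" % major_minor)
    else:
        print("Warning: running Python %d.%d, but this tutorial expects 2.x" % major_minor)
```

Save it under any file name and run it with `python` from inside the activated environment.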

2.2 Installing Dependencies

Inside the virtual environment, install the packages the project depends on:

pip install -r requirements.txt
python setup.py develop

2.3 Crawling Data

Create a GitHub access token and configure the crawler:

cp crawler/config.sample.json crawler/config.json
${EDITOR:-nano} crawler/config.json  # or open it in any editor, e.g. vim or emacs

Add your username and access token to config.json. Then download the algorithm data and parse the syntax trees:

mkdir crawler/data
crawl algorithms --out crawler/data/algorithms.pkl
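
If the crawl fails to authenticate, one thing worth ruling out is a field in config.json that was left empty after copying the sample file. The helper below is a minimal sketch for that check. It makes no assumption about the actual key names in the file, only that the config is a JSON object at the top level; both function names are hypothetical.

```python
import json


def empty_fields(config):
    """Return the sorted names of top-level fields whose value is empty.

    "Empty" here means any falsy value: "", None, 0, an empty list, and so on.
    """
    return sorted(key for key, value in config.items() if not value)


def check_config(path):
    """Load a JSON config file and list the fields that still need a value."""
    with open(path) as handle:
        return empty_fields(json.load(handle))
```

Calling `check_config("crawler/config.json")` returns an empty list when every field has been filled in.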

2.4 Vectorizing the Data

Sample AST nodes from the GitHub data and convert them into vector embeddings:

mkdir sampler/data
sample nodes --in crawler/data/algorithms.pkl \
             --out sampler/data/algorithm_nodes.pkl

mkdir vectorizer/data
vectorize ast2vec --in sampler/data/algorithm_nodes.pkl \
                 --out vectorizer/data/vectors.pkl \
                 --checkpoint vectorizer/logs/algorithms
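
To make "sampling AST nodes" more concrete, the sketch below walks the syntax tree of a small Python program with the standard `ast` module and records, for every node, its own type, its parent's type, and its children's types. This only illustrates the kind of context an embedding model can learn from; the record layout and the name `sample_nodes` are assumptions, and the project's own sampler may store its data differently.

```python
import ast


def sample_nodes(source):
    """Return one record per AST node, visited breadth-first from the root."""
    tree = ast.parse(source)
    samples = []
    queue = [(tree, None)]  # pairs of (node, parent)
    while queue:
        node, parent = queue.pop(0)
        children = list(ast.iter_child_nodes(node))
        samples.append({
            "node": type(node).__name__,
            "parent": type(parent).__name__ if parent is not None else None,
            "children": [type(child).__name__ for child in children],
        })
        queue.extend((child, node) for child in children)
    return samples


if __name__ == "__main__":
    for record in sample_nodes("x = 1"):
        print(record)
```

For the one-line program `x = 1`, the first record describes the `Module` root (no parent, one `Assign` child) and the second describes the `Assign` node with `Module` as its parent.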

2.5 Training and Testing the Model

Sample small trees, then train and test the classifier:

sample trees --in crawler/data/algorithms.pkl \
             --out sampler/data/algorithm_trees.pkl \
             --maxsize 2000 \
             --test 30

classify train tbcnn --in sampler/data/algorithm_trees.pkl \
                     --logdir classifier/logs/1 \
                     --embed vectorizer/data/vectors.pkl

classify test tbcnn --in sampler/data/algorithm_trees.pkl \
                    --logdir classifier/logs/1 \
                    --embed vectorizer/data/vectors.pkl
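
The tree-sampling step above can be pictured as a size filter followed by a train/test split. The sketch below assumes that `--maxsize` caps the number of nodes per tree and that `--test` is the percentage of trees held out for testing; neither meaning is confirmed by this tutorial. The function names and the fixed random seed are likewise illustrative, not the project's implementation.

```python
import ast
import random


def tree_size(source):
    """Count the nodes in the AST of a piece of Python source."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


def sample_trees(sources, max_size, test_percent, seed=0):
    """Drop oversized trees, then split the rest into (train, test) lists."""
    small = [source for source in sources if tree_size(source) <= max_size]
    random.Random(seed).shuffle(small)  # shuffle a copy, reproducibly
    n_test = len(small) * test_percent // 100
    return small[n_test:], small[:n_test]
```

For example, `sample_trees(sources, max_size=2000, test_percent=30)` mirrors the values used in the command above: with ten sufficiently small inputs it returns seven training items and three test items.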