PDVC Project Tutorial

PDVC: End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021). Project repository: https://gitcode.com/gh_mirrors/pd/PDVC

1. Project Overview

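PDVC (End-to-End Dense Video Captioning with Parallel Decoding) is an open-source project for end-to-end dense video captioning. It uses parallel decoding to recast dense video captioning as a set prediction task. PDVC supports two main tasks, dense video captioning and video paragraph captioning, and two datasets: ActivityNet Captions and YouCook2.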

Key features of PDVC:

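Supports multiple video features (C3D, TSN, TSP).
Provides pretrained models that can be used directly to generate video captions.
Supports caption generation in Chinese and other non-English languages.
Provides a visualization tool that embeds the generated captions directly in the video.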

2. Quick Start

Environment Setup

Make sure your environment meets the following requirements:

Linux
GCC >= 5.4
CUDA >= 9.2
Python >= 3.7
PyTorch >= 1.5.1
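
The requirements above can be checked before installing anything. The following is a minimal sketch, not part of the PDVC repository: the helper names version_ge and check_min are hypothetical, and the comparison assumes GNU sort with the -V (version sort) option, which typical Linux systems provide. PyTorch is not checked here because it is installed in a later step.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not shipped with PDVC.

# Succeed when version $1 is greater than or equal to version $2.
# Assumes GNU `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a tool is present and new enough.
# $1: label, $2: detected version (empty if the tool is missing), $3: minimum version
check_min() {
    if [ -z "$2" ]; then
        echo "$1: not found (need >= $3)"
    elif version_ge "$2" "$3"; then
        echo "$1: $2 (ok, need >= $3)"
    else
        echo "$1: $2 (too old, need >= $3)"
    fi
}

# Detect versions; each variable stays empty when the tool is not installed.
gcc_ver=$(gcc -dumpversion 2>/dev/null)
nvcc_ver=$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9.]*\).*/\1/p')
py_ver=$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null)

check_min "GCC" "$gcc_ver" 5.4
check_min "CUDA (nvcc)" "$nvcc_ver" 9.2
check_min "Python" "$py_ver" 3.7
```

Each line of output names one tool, the version that was found, and whether it meets the minimum listed above.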

Clone the Project

First, clone the PDVC project to your local machine:

git clone --recursive https://github.com/ttengwang/PDVC.git
cd PDVC
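
The --recursive flag also fetches any git submodules. If the clone was made without it, a check along these lines may help. It is only a sketch: count_uninitialized_submodules is a hypothetical helper name, and it relies on git submodule status marking submodules that are not checked out with a leading "-".

```shell
# Hypothetical helper: count submodules that are registered but not checked out.
# `git submodule status` prefixes such entries with "-".
# $1: path to the repository
count_uninitialized_submodules() {
    git -C "$1" submodule status --recursive | grep -c '^-' || true
}

missing=$(count_uninitialized_submodules .)
if [ "$missing" -gt 0 ]; then
    echo "$missing submodule(s) not initialized; run: git submodule update --init --recursive"
fi
```

Run from the PDVC directory, this prints nothing when every submodule is already checked out.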

Create a Virtual Environment

Create and activate a virtual environment with Conda, then install the dependencies:

conda create -n PDVC python=3.7
source activate PDVC
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
pip install -r requirement.txt
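
After the installation, it can be worth confirming that PyTorch imports and sees the GPU. A minimal sketch follows, assuming the PDVC environment is active so that python resolves to its interpreter; check_torch is a hypothetical helper name, not something the project provides.

```shell
# Hypothetical helper: print the installed PyTorch and torchvision versions
# and whether CUDA is usable. Prints a hint instead of failing when the
# import does not work in the current environment.
check_torch() {
    python - <<'EOF' || echo "PyTorch could not be imported; is the PDVC environment active?"
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
EOF
}

check_torch
```

If the last line reports that CUDA is not available, the installed cudatoolkit build may not match the GPU driver.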

Compile the Deformable Attention Layer

From the repository root, compile the Deformable Attention layer by running its build script:

cd pdvc/ops
sh make.sh
cd ../..

Run PDVC to Generate Video Captions

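Download a pretrained model and place it under the save/ directory, then run the following commands from the repository root to generate video captions: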

video_folder=visualization/videos
output_folder=visualization/output
pdvc_model_path=save/anet_tsp_pdvc/model-best.pth
output_language=en

bash test_and_visualize.sh $video_folder $output_folder $pdvc_model_path $output_language

The generated captions are embedded in the videos, which are saved under the output_folder directory.
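
Before launching the script, a quick check that the inputs exist can save a failed run. The sketch below is illustrative only: preflight is a hypothetical helper, and the two paths are the ones used in the commands above.

```shell
# Hypothetical helper: check the inputs passed to test_and_visualize.sh.
# $1: video folder, $2: model checkpoint path
preflight() {
    rc=0
    if [ ! -d "$1" ]; then
        echo "missing video folder: $1"
        rc=1
    elif [ -z "$(ls -A "$1")" ]; then
        echo "video folder is empty: $1"
        rc=1
    fi
    if [ ! -f "$2" ]; then
        echo "missing model checkpoint: $2"
        rc=1
    fi
    return "$rc"
}

video_folder=visualization/videos
pdvc_model_path=save/anet_tsp_pdvc/model-best.pth

if preflight "$video_folder" "$pdvc_model_path"; then
    echo "inputs look fine"
fi
```

The function returns a non-zero status when anything is missing, so it can also gate the call to test_and_visualize.sh in a larger script.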