Beyond Caption To Narrative: Video Captioning With Multiple Sentences

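This paper proposes a new video captioning method that temporally segments the video, localizes actions, generates multiple sentences from multiple frames, and connects those sentences with natural language processing techniques to form a story-like description. The approach produces video descriptions that are richer in content.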

Andrew Shin, Katsunori Ohnishi, Tatsuya Harada

(Submitted on 18 May 2016)

Recent advances in image captioning have led to increasing interest in video captioning. However, most work on video captioning focuses on generating a single caption from a single input of aggregated features, which hardly deviates from the image captioning process and does not take full advantage of the dynamic content present in videos. We attempt to generate video captions that convey richer content by temporally segmenting the video with action localization, generating multiple captions from multiple frames, and connecting them with natural language processing techniques to produce a story-like caption. We show that our proposed method can generate captions that are richer in content and can compete with state-of-the-art methods without explicitly using video-level features as input.

Comments:	accepted to ICIP 2016
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1605.05440 [cs.CV]
	(or arXiv:1605.05440v1 [cs.CV] for this version)
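
The pipeline in the abstract (segment the video in time, caption one frame per segment, connect the captions) can be illustrated with a toy sketch. This is not the authors' implementation. The function names, the choice of the middle frame, and the "Then" connective are all assumptions made for illustration. The segment boundaries are assumed to come from some action localizer, and `caption_frame` is a hypothetical stand-in for any image captioning model.

```python
from typing import Callable, List, Sequence, Tuple


def segment_by_boundaries(num_frames: int, boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Split frame indices [0, num_frames) into half-open segments.

    `boundaries` are assumed to be frame indices where an action changes,
    e.g. as produced by an action localizer (not implemented here).
    """
    cuts = [0] + sorted(b for b in set(boundaries) if 0 < b < num_frames) + [num_frames]
    return list(zip(cuts, cuts[1:]))


def middle_frame(segment: Tuple[int, int]) -> int:
    """Pick one representative frame per segment (an assumption: the middle one)."""
    start, end = segment
    return start + (end - start) // 2


def narrate(num_frames: int, boundaries: Sequence[int],
            caption_frame: Callable[[int], str]) -> str:
    """Caption one frame per segment and join the captions into a short narrative.

    `caption_frame` is a hypothetical image captioner mapping a frame index to text.
    The joining rule below is a crude placeholder for the NLP techniques the
    abstract mentions but does not specify.
    """
    captions: List[str] = []
    for segment in segment_by_boundaries(num_frames, boundaries):
        caption = caption_frame(middle_frame(segment)).strip().rstrip(".")
        # Skip empty captions and consecutive repeats of the same caption.
        if caption and (not captions or captions[-1] != caption):
            captions.append(caption)
    if not captions:
        return ""
    first = captions[0][0].upper() + captions[0][1:]
    rest = ["Then " + c[0].lower() + c[1:] for c in captions[1:]]
    return " ".join(s + "." for s in [first] + rest)


if __name__ == "__main__":
    # Made-up captions for frames 2, 7, and 12 of a 15-frame clip.
    fake_captions = {2: "a man opens a door", 7: "a man opens a door", 12: "he walks outside."}
    print(narrate(15, [5, 10], lambda i: fake_captions[i]))
```

With the made-up captions in the example, the two identical captions collapse into one sentence and the third is attached with a connective, which is the kind of story-like output the abstract describes, in heavily simplified form.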