Beyond Caption To Narrative: Video Captioning With Multiple Sentences
(Submitted on 18 May 2016)
Recent advances in image captioning have led to increasing interest in video captioning. However, most work on video captioning focuses on generating a caption from a single input of aggregated features, which hardly deviates from the image captioning process and does not take full advantage of the dynamic content present in videos. We attempt to generate video captions that convey richer content by temporally segmenting the video with action localization, generating multiple captions from multiple frames, and connecting them with natural language processing techniques to produce a story-like caption. We show that our proposed method can generate captions that are richer in content and can compete with state-of-the-art methods without explicitly using video-level features as input.

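This paper proposes a new method for generating video descriptions: it temporally segments the video, localizes actions, generates multiple sentences from multiple frames, and connects those sentences with natural language processing techniques to form a story-like description. The method can generate video descriptions with richer content.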
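The three stages named in the abstract (temporal segmentation, per-segment captioning, sentence connection) can be illustrated with a toy sketch. This is not the authors' implementation: the per-frame action labels stand in for the output of an action localization model, `caption_frame` is a hypothetical placeholder for an image captioning model, and the "Then"/"Finally" connectives are an assumed stand-in for the unspecified natural language processing step.

```python
from itertools import groupby
from typing import Callable, List, Tuple

# (start_frame, end_frame_exclusive, action_label); a hypothetical representation
Segment = Tuple[int, int, str]


def segment_by_action(frame_labels: List[str]) -> List[Segment]:
    """Group consecutive frames that share the same action label."""
    segments: List[Segment] = []
    start = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((start, start + length, label))
        start += length
    return segments


def caption_segments(
    segments: List[Segment],
    caption_frame: Callable[[int, str], str],
) -> List[str]:
    """Caption the middle frame of each segment with the supplied captioner."""
    captions = []
    for start, end, label in segments:
        middle = (start + end) // 2
        captions.append(caption_frame(middle, label))
    return captions


def connect(captions: List[str]) -> str:
    """Drop consecutive duplicate captions, then join them with connectives."""
    unique = [caption for caption, _ in groupby(captions)]
    if not unique:
        return ""
    first = unique[0]
    parts = [first[0].upper() + first[1:] + "."]
    for i, caption in enumerate(unique[1:], start=1):
        is_last = i == len(unique) - 1 and len(unique) > 2
        connective = "Finally" if is_last else "Then"
        parts.append(f"{connective}, {caption}.")
    return " ".join(parts)


# Toy input: one assumed action label per frame.
frame_labels = ["run"] * 3 + ["jump"] * 2 + ["sit"] * 4

# Stub captioner keyed on the action label; a real system would caption the frame image.
stub_captions = {
    "run": "a man runs across a field",
    "jump": "he jumps over a fence",
    "sit": "he sits on a bench",
}

segments = segment_by_action(frame_labels)
story = connect(caption_segments(segments, lambda idx, label: stub_captions[label]))
print(story)
```

Passing the captioner in as a function keeps the sketch independent of any particular captioning model, so only the segmentation and connection logic is actually exercised here.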



