Cross-Media: fuxin607's Blog (CSDN Blog)

Cross-Media

Average article quality score: 59

Articles: 23 · Article views: 51504 · Article bookmarks: 126

Author: fuxin607

https://xinfu607.github.io/

Articles in this column

Cross-Modal Pre-training Transfer

1. ViLD, Zero-Shot Detection via Vision and Language Knowledge Distillation. (code) 2. OVR-CNN, Open-Vocabulary Object Detection Using Captions [CVPR2021]. (code) 3. LSeg, Language-driven Semantic Segmentation [ICLR2022]. (code) 4. OpenSeg, Open-Vocabulary Image Segmentatio.

Original · 2022-02-17 19:50:36 · 1932 views · 0 comments
Cross-Modal Pre-training

1. LXMERT, LXMERT: Learning Cross-Modality Encoder Representations from Transformers [EMNLP2019]. Code: https://github.com/airsplay/lxmert

Original · 2021-12-11 17:19:01 · 3823 views · 1 comment
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

This is a CVPR2018 Oral paper on Image Captioning and Visual Question Answering. Paper: https://arxiv.org/abs/1707.07998; author's homepage: http://www.panderson.me/; the code has been released at https://github.com/peteanderson80/bott...

Original · 2018-04-28 16:11:31 · 1044 views · 0 comments
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

This is a CVPR2018 paper on text-to-image synthesis. Paper: https://arxiv.org/abs/1711.10485; the code has been released at https://github.com/taoxugit/AttnGAN; author's homepage: https://sites.google.com/view/taoxu. ...

Original · 2018-05-17 21:08:53 · 3391 views · 0 comments
Research Progress in Image Captioning

This post covers several recent image captioning papers and their related applications. 1. Google NIC, Show and Tell: A Neural Image Caption Generator [CVPR2015]. 2. Hard(soft)-Attention, Show, Attend and Tell: Neural Image Caption Generation with Visual A...

Original · 2019-11-08 12:10:20 · 801 views · 0 comments
Interesting Papers from ECCV2018

Double JPEG Detection in Mixed JPEG Quality Factors using Deep Convolutional Neural Network; Fighting Fake News: Image Splice Detection via Learned Self-Consistency; Face De-Spoofing: Anti-Spoofing vi...

Original · 2018-09-25 14:28:41 · 2446 views · 0 comments
How to Write a Paper in Computer Vision

Title, Abstract, Introduction, Related Work, Proposed Methods, Experiments, Conclusion and Future Work, Acknowledgements

Original · 2018-10-23 18:45:13 · 9218 views · 0 comments
New Tasks in Cross-Media Analysis

1. Measuring Depression Symptom Severity from Spoken Language and 3D Facial Expressions, https://arxiv.org/pdf/1811.08592.pdf. 2. Composing Text and Image for Image Retrieval - An Empirical Odyssey, ht...

Original · 2018-12-20 17:10:26 · 1114 views · 2 comments
Baby Talk and Neural Baby Talk

Baby Talk: Understanding and Generating Image Descriptions; Neural Baby Talk

Original · 2019-05-16 21:22:54 · 565 views · 0 comments
Finding “It”: Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos

This is a CVPR2018 Oral paper on Weakly-Supervised Video Grounding. Paper: http://ai.stanford.edu/~dahuang/papers/cvpr18-ramil.pdf; author's homepage: http://ai.stanford.edu/~dahuang/; the code has not been released yet. What the paper does: Input: ...

Original · 2018-05-05 10:51:51 · 929 views · 0 comments
TALL: Temporal Activity Localization via Language Query

This is an ICCV2017 Spotlight paper on temporal activity localization via language query in an untrimmed video. Paper: https://arxiv.org/abs/1705.02101; author's homepage: https://jiyanggao.github.io/; the code has been released at h...

Original · 2018-04-27 08:49:59 · 1675 views · 0 comments
Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training

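This paper, dated April 24, 2018, uses GAN and reinforcement learning (RL) for poetry generation from images. Paper: https://arxiv.org/abs/1804.08473; I have not yet found the authors' homepage or related code. What the paper does: Input: image. Output: poetry. Examples shown in the paper. The paper and...

Original · 2018-04-26 20:03:54 · 845 views · 0 comments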
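Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

This is a NAACL2018 paper on video captioning (combining CV and NLP). Paper: https://arxiv.org/abs/1804.05448; the first author is a PhD student at the University of California, Santa Barbara (UCSB); author's homepage: http://www.cs.ucsb.edu/~xwang/; the code has not been released (the author does not usually release code). Personal ramblings: looking at...

Original · 2018-04-18 09:12:42 · 739 views · 1 comment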
An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep RL

This paper uses reinforcement learning (RL) for Natural Language Object Retrieval. Paper: https://arxiv.org/abs/1703.07579; I could not find the authors' homepage, but the code has been released at https://github.com/jxwufan/NLOR_A3C. What the paper does: Input: te...

Original · 2018-04-22 10:29:02 · 547 views · 0 comments
Text2Colors: Guiding Image Colorization through Text-Driven Palette Generation

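A cross-media (combining NLP and CV) paper from Korea University, updated on arXiv on April 13, 2018. The first author is a graduate student; team homepage: http://davian.korea.ac.kr; paper: https://arxiv.org/pdf/1804.04128.pdf. Judging from the paper's format, it appears to be under submission to ECCV2018. The authors have released the PyTorch code and dataset on GitHub at https://git...

Original · 2018-04-13 15:13:39 · 939 views · 0 comments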
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

A CVPR2018 paper on cross-media retrieval. Paper: https://arxiv.org/abs/1711.06420; the first author is a PhD student at Nanyang Technological University; author's homepage: http://jxgu.cc/; the code has been released at https://github.com/ujiuxiang/NLP_Practice.PyTorch/tree/master/cross_modal_retr...

Original · 2018-04-19 08:18:36 · 2742 views · 12 comments
Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings

This is an ACM SIGIR2018 paper on cross-modal retrieval. Paper: https://arxiv.org/pdf/1804.11146.pdf; the author is a PhD student at Paris 6 University; author's homepage: http://webia.lip6.fr/~carvalho/static/home/; the code has not been released yet. What the paper does (recipe ret...

Original · 2018-05-02 10:12:36 · 511 views · 0 comments
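Progress in Text-Image Cross-Media Retrieval

This post covers nine papers on the bidirectional text-image retrieval task.

Original · 2018-05-15 11:07:54 · 12912 views · 1 comment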
Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning

This is a CVPR2018 Oral paper on Visual Dialog Generation. Paper: https://arxiv.org/abs/1711.07613; author's homepage: http://qi-wu.me/home.html; the first author is an Assistant Professor in Chunhua Shen's group at the University of Adelaide; the code has not yet been rel...

Original · 2018-04-25 09:30:09 · 1006 views · 0 comments
Show, Reward and Tell

This is an AAAI2018 paper that uses GAN and reinforcement learning (RL) for Photo Stream Story Telling. Paper: https://pdfs.semanticscholar.org/977b/eecdf0b5c3487d03738cff501c79770f0858.pdf; I have not yet found the authors' homepage or related code. The paper title, Show, Reward and ...

Original · 2018-04-21 10:03:34 · 536 views · 0 comments
Learning Cross-modal Embeddings for Cooking Recipes and Food Images

This is a CVPR2017 paper on cross-modal retrieval. Paper, data, and code: http://im2recipe.csail.mit.edu/; author's homepage: https://imatge.upc.edu/web/people/amaia-salvador. What the paper does (recipe retrieval): Input: image (sentence) + datas...

Original · 2018-05-03 16:48:37 · 739 views · 0 comments
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

A CVPR2018 paper on Visual Question Answering tricks; the author is a member of the team that won the 2017 VQA Challenge. Paper: https://arxiv.org/abs/1708.02711; author's homepage: https://www.damienteney.info/adventures. What the paper does: visual question an...

Original · 2018-05-04 09:37:39 · 1114 views · 0 comments
Actor and Action Video Segmentation from a Sentence

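A CVPR2018 Oral paper on cross-media (combining video and NLP). Paper: https://arxiv.org/abs/1803.07485; the first author is a PhD student at the University of Amsterdam in the Netherlands; author's homepage: https://kgavrilyuk.github.io/; the code and datasets have not been released yet. Personal ramblings: this is the first published paper I have seen that uses NLP for video se...

Original · 2018-04-16 08:40:36 · 1939 views · 0 comments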
