Video retrieval (2024): AAAI, CVPR

Article link: https://blog.youkuaiyun.com/qq_41825704/article/details/145598819

AAAI 2024
Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language.
Commonsense for Zero-Shot Natural Language Video Localization.
Transferable Video Moment Localization by Moment-Guided Query Prompting.
VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression.
TD²-Net: Toward Denoising and Debiasing for Video Scene Graph Generation.
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval.
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
CoVR: Learning Composed Video Retrieval from Web Video Captions.
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval. (PRVR)
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.
 
CVPR 2024
VTimeLLM: Empower LLM to Grasp Video Moments (VMR)
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval.
Holistic Features are Almost Sufficient for Text-to-Video Retrieval
MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval.
 
ICCV 2023 workshop?
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
LLaViLo: Boosting Video Moment Retrieval via Adapter-Based Multimodal Modeling
An empirical study of the effect of video encoders on Temporal Video Grounding
 
ICML 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
 
IJCAI 2024
Contrastive Transformer Cross-Modal Hashing for Video-Text Retrieval.
Expressiveness is Effectiveness: Self-supervised Fashion-aware CLIP for Video-to-Shop Retrieval
 
NeurIPS 2024
Diffusion-Inspired Truncated Sampler for Text-Video Retrieval
GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding
SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM
Ad Auctions for LLMs via Retrieval Augmented Generation.
RAGraph: A General Retrieval-Augmented Graph Learning Framework
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Model
 
ACL 2024
Exploiting Intrinsic Multilateral Logical Rules for Weakly Supervised Natural Language Video Localization
 
ACM MM 2024
Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval using Language
MPT: Multi-grained Prompt Tuning for Text-Video Retrieval
Maskable Retentive Network for Video Moment Retrieval
Multi-Modal Inductive Framework for Text-Video Retrieval
Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval
T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval
Adaptively Building a Video-language Model for Video Captioning and Retrieval without Massive Video Pretraining
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval
Learnable Negative Proposals Using Dual-Signed Cross-Entropy Loss for Weakly Supervised Video Moment Localization
Similarity Preserving Transformer Cross-Modal Hashing for Video-Text Retrieval
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval
Explicit Granularity and Implicit Scale Correspondence Learning for Point-Supervised Video Moment Localization
Hierarchical Debiasing and Noisy Correction for Cross-domain Video Tube Retrieval
TVPR: Text-to-Video Person Retrieval and a New Benchmark
 
TIP 2024 (IEEE Transactions on Image Processing)
Enhancing Text-Video Retrieval Performance With Low-Salient but Discriminative Objects.
 
SIGIR 2024
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval.
Predicting Micro-video Popularity via Multi-modal Retrieval Augmentation.