AAAI 2024 |
---|
Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language. |
Commonsense for Zero-Shot Natural Language Video Localization. |
Transferable Video Moment Localization by Moment-Guided Query Prompting. |
VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression. |
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval. |
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning |
CoVR: Learning Composed Video Retrieval from Web Video Captions. |
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval (PRVR). |
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. |
CVPR 2024 |
---|
VTimeLLM: Empower LLM to Grasp Video Moments (VMR) |
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval. |
Holistic Features are Almost Sufficient for Text-to-Video Retrieval |
MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval. |
ICCV 2023 workshop?
---|
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models |
LLaViLo: Boosting Video Moment Retrieval via Adapter-Based Multimodal Modeling |
An empirical study of the effect of video encoders on Temporal Video Grounding |
ICML 2024
---|
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent) |
IJCAI 2024 |
---|
Contrastive Transformer Cross-Modal Hashing for Video-Text Retrieval. |
Expressiveness is Effectiveness: Self-supervised Fashion-aware CLIP for Video-to-Shop Retrieval |
NeurIPS 2024
---|
Diffusion-Inspired Truncated Sampler for Text-Video Retrieval |
GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching |
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding |
SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM |
Ad Auctions for LLMs via Retrieval Augmented Generation. |
RAGraph: A General Retrieval-Augmented Graph Learning Framework |
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval |
WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models
ACL 2024 |
---|
Exploiting Intrinsic Multilateral Logical Rules for Weakly Supervised Natural Language Video Localization |
MM 2024
---|
Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval using Language |
MPT: Multi-grained Prompt Tuning for Text-Video Retrieval |
Maskable Retentive Network for Video Moment Retrieval |
Multi-Modal Inductive Framework for Text-Video Retrieval |
Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval |
T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval |
Adaptively Building a Video-language Model for Video Captioning and Retrieval without Massive Video Pretraining |
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval |
Learnable Negative Proposals Using Dual-Signed Cross-Entropy Loss for Weakly Supervised Video Moment Localization |
Similarity Preserving Transformer Cross-Modal Hashing for Video-Text Retrieval |
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval |
Explicit Granularity and Implicit Scale Correspondence Learning for Point-Supervised Video Moment Localization |
Hierarchical Debiasing and Noisy Correction for Cross-domain Video Tube Retrieval |
TVPR: Text-to-Video Person Retrieval and a New Benchmark |
TIP 2024 (IEEE Transactions on Image Processing) |
---|
Enhancing Text-Video Retrieval Performance With Low-Salient but Discriminative Objects. |
SIGIR 2024 |
---|
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval. |
Predicting Micro-video Popularity via Multi-modal Retrieval Augmentation. |