Author | qian    Editor | 自动驾驶之心
Original link: https://zhuanlan.zhihu.com/p/1922228114404143784
Vision Large Language Models
Survey Collections
LLMs for intelligent transportation and autonomous driving: https://github.com/ge25nab/Awesome-VLM-AD-ITS
AIGC and LLMs: https://github.com/coderonion/awesome-llm-and-aigc
A survey of vision-language models: https://github.com/jingyi0000/VLM_survey
Prompt/adapter learning methods for CLIP-style vision-language models (see the minimal usage sketch after this list): https://github.com/zhengli97/Awesome-Prompt-Adapter-Learning-for-VLMs
A list of LLM/VLM inference papers, with code: https://github.com/DefTruth/Awesome-LLM-Inference
A reading list on the safety, security, and privacy of large models (including awesome LLM security, safety, etc.): https://github.com/ThuCCSLab/Awesome-LM-SSP
A knowledge base on single/multi-agent systems, robotics, LLM/VLM/MLA, scientific discovery, and more: https://github.com/weleen/awesome-agent
A curated paper list on Embodied AI and related research/industry resources: https://github.com/haoranD/Awesome-Embodied-AI
A curated list of inference strategies and algorithms that improve vision-language model (VLM) performance: https://github.com/Patchwork53/awesome-vlm-inference-strategies
Well-known vision-language models and their architectures: https://github.com/gokayfem/awesome-vlm-architectures
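Most of the resources above revolve around CLIP-style vision-language models. For orientation, here is a minimal zero-shot classification sketch using the Hugging Face transformers CLIP API. The checkpoint name is the public openai/clip-vit-base-patch32; the image path and label set are illustrative assumptions.

```python
# Minimal zero-shot classification with a CLIP checkpoint (transformers API).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("street_scene.jpg")  # hypothetical local image
labels = ["a photo of a car", "a photo of a pedestrian", "a photo of a traffic light"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores, one row per image
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```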
Fundamentals
Pre-training
[arxiv 2024] RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
[CVPR 2024] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
[CVPR 2024] Do Vision and Language Encoders Represent the World Similarly?
[CVPR 2024] Efficient Vision-Language Pre-training by Cluster Masking
[CVPR 2024] Non-autoregressive Sequence-to-Sequence Vision-Language Models
[CVPR 2024] ViTamin: Designing Scalable Vision Models in the Vision-Language Era
[CVPR 2024] Iterated Learning Improves Compositionality in Large Vision-Language Models
[CVPR 2024] FairCLIP: Harnessing Fairness in Vision-Language Learning
[CVPR 2024] InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
[CVPR 2024] VILA: On Pre-training for Visual Language Models
[CVPR 2024] Generative Region-Language Pretraining for Open-Ended Object Detection
[CVPR 2024] Enhancing Vision-Language Pre-training with Rich Supervisions
[ICLR 2024] Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
[ICLR 2024] MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
[ICLR 2024] Retrieval-Enhanced Contrastive Vision-Text Models
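Many of the pre-training entries above (FairCLIP, InternVL, Retrieval-Enhanced Contrastive Vision-Text Models) build on CLIP-style contrastive pre-training. As a reference point, here is a minimal sketch of the symmetric InfoNCE objective; the random feature tensors are toy stand-ins for real image/text tower outputs.

```python
# Symmetric InfoNCE loss used in CLIP-style vision-language pre-training.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    # L2-normalize so dot products are cosine similarities
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))               # matched pairs on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)          # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)      # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# Toy usage: a batch of 8 paired embeddings of dimension 512
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(clip_contrastive_loss(img, txt).item())
```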
Transfer Learning Methods
[NeurIPS 2024] Historical Test-time Prompt Tuning for Vision Foundation Models
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
[IJCV 2024] Progressive Visual Prompt Learning with Contrastive Feature Re-formation
[ECCV 2024] CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
[ECCV 2024] FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
[ECCV 2024] GalLoP: Learning Global and Local Prompts for Vision-Language Models
[ECCV 2024] Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
[CVPR 2024] Towards Better Vision-Inspired Vision-Language Models
[CVPR 2024] One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
[CVPR 2024] Any-Shift Prompting for Generalization over Distributions
[CVPR 2024] A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
[CVPR 2024] Anchor-based Robust Finetuning of Vision-Language Models
[CVPR 2024] Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners
[CVPR 2024] Visual In-Context Prompting
[CVPR 2024] TCP: Textual-based Class-aware Prompt Tuning for Visual-Language Model
[CVPR 2024] Efficient Test-Time Adaptation of Vision-Language Models
[CVPR 2024] Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
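Several of the entries above (e.g. TCP, Visual In-Context Prompting, and the test-time adaptation papers) adapt a frozen VLM by training only a small set of prompt parameters. A toy CoOp-style sketch of that mechanic follows; the linear "text encoder" and random features are stand-ins for the real frozen CLIP towers, not any paper's method.

```python
# Toy CoOp-style prompt tuning: only the context vectors receive gradients.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptLearner(nn.Module):
    def __init__(self, n_ctx=4, ctx_dim=512, n_classes=10):
        super().__init__()
        # Learnable context tokens shared across classes (the only trained params)
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
        # Frozen stand-in for the tokenized class-name embeddings
        self.register_buffer("cls_tokens", torch.randn(n_classes, 1, ctx_dim))

    def forward(self):
        n_classes = self.cls_tokens.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        return torch.cat([ctx, self.cls_tokens], dim=1)  # (n_classes, n_ctx+1, dim)

# Frozen "text encoder" stand-in: pools prompt tokens into a class embedding
text_encoder = nn.Linear(512, 512).requires_grad_(False)

prompt_learner = PromptLearner()
optimizer = torch.optim.SGD(prompt_learner.parameters(), lr=2e-3)

image_feats = F.normalize(torch.randn(32, 512), dim=-1)  # from a frozen image tower
labels = torch.randint(0, 10, (32,))

prompts = prompt_learner()                                    # (10, 5, 512)
class_embs = F.normalize(text_encoder(prompts.mean(dim=1)), dim=-1)
logits = image_feats @ class_embs.t() / 0.07
loss = F.cross_entropy(logits, labels)
loss.backward()   # gradients flow only into prompt_learner.ctx
optimizer.step()
```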
Knowledge Distillation (Detection & Segmentation & Multi-task)
[NeurIPS 2024] Open-Vocabulary Object Detection via Language Hierarchy
[CVPR 2024] RegionGPT: Towards Region Understanding Vision Language Model
[ICLR 2024] LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors
[ICLR 2024] Ins-DetCLIP: Aligning Detection Model to Follow Human-Language Instruction
[ICLR 2024] CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
[ICLR 2024] FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition
[ICLR 2024] AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection
[CVPR 2023] EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata
World Models
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
A unified driving world model: HERMES seamlessly integrates 3D scene understanding and future scene evolution (generation).
A Survey of World Models for Autonomous Driving
The latest (2025) comprehensive survey of world models in autonomous driving.
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT
Diffusion World Model
A diffusion world model proposed by Princeton University.
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
DrivingGPT: unifies driving world modeling and planning.
Physical Informed Driving World Model
The latest SOTA in driving-video generation quality: DrivePhysica, a driving world model designed to respect physical principles.
Understanding World or Predicting Future? A Comprehensive Survey of World Models
Understanding the world or predicting the future? A comprehensive survey of world models.
Navigation World Models
Meta's latest research: the Navigation World Model (NWM), a controllable video generation model that predicts future visual observations from past observations and navigation actions (see the minimal interface sketch after this list).
InfinityDrive: Breaking Time Limits in Driving World Models
InfinityDrive: the first driving world model with outstanding generalization ability.
Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey
A survey exploring the interplay between video generation and world models in autonomous driving.
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
The first method to use video generation models to improve 4D reconstruction of driving scenes; DriveDreamer4D enhances 4D driving-scene representations with world-model priors.
Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving
Vision-centric 4D occupancy forecasting and planning via world models for autonomous driving.
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
Vista: a generalizable driving world model with high fidelity and versatile controllability.
Probing Multimodal LLMs as World Models for Driving
Probing multimodal LLMs as world models for driving.
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
Across-the-board gains on autonomous-driving tasks: DriveWorld performs 4D pre-trained scene understanding via world models.
Prospective Role of Foundation Models in Advancing Autonomous Vehicles
Applications and trends of large-scale foundation models in autonomous driving.
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
DriveDreamer-2: the first world model able to generate customized driving videos.
World Models for Autonomous Driving: An Initial Survey
An initial survey of world models for autonomous driving.
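Despite their differences, the world models listed above share an action-conditioned prediction loop: given past observations (or their latents) and a driving/navigation action, predict the next observation. A minimal toy sketch of that interface follows; the shapes and the tiny MLP are illustrative, not any paper's architecture.

```python
# Toy action-conditioned world model: (observation, action) -> next observation.
import torch
import torch.nn as nn

class ToyWorldModel(nn.Module):
    def __init__(self, obs_dim=64, act_dim=3, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, obs, action):
        # obs: (B, obs_dim) latent of the current frame; action: (B, act_dim)
        return self.net(torch.cat([obs, action], dim=-1))  # predicted next latent

model = ToyWorldModel()
obs = torch.randn(1, 64)
# Roll out an imagined trajectory by feeding predictions back in
for t in range(5):
    action = torch.randn(1, 3)  # e.g. steering/throttle/brake as a toy action
    obs = model(obs, action)
print(obs.shape)  # torch.Size([1, 64])
```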
Diffusion Models
Survey Collections
A collection of resources and papers on diffusion models: https://github.com/diff-usion/Awesome-Diffusion-Models
A list of the latest diffusion models for video generation, editing, restoration, understanding, and more: https://github.com/showlab/Awesome-Video-Diffusion
A survey of diffusion-based image processing, covering restoration, enhancement, coding, and quality assessment: https://github.com/lixinustc/Awesome-diffusion-model-for-image-processing
A collection of graph diffusion generation works, including papers, code, and datasets: https://github.com/yuntaoshou/Graph-Diffusion-Models-A-Comprehensive-Survey-of-Methods-and-Applications
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices [Paper]
Diffusion Models in 3D Vision: A Survey [Paper]
Conditional Image Synthesis with Diffusion Models: A Survey [Paper]
Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey [Paper]
A Survey on Diffusion Models for Recommender Systems [Paper]
Diffusion-Based Visual Art Creation: A Survey and New Perspectives [Paper]
Replication in Visual Diffusion Models: A Survey and Outlook [Paper]
Diffusion Model-Based Video Editing: A Survey [Paper]
Diffusion Models and Representation Learning: A Survey [Paper]
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [Paper]
Diffusion Models in Low-Level Vision: A Survey [Paper]
Video Diffusion Models: A Survey [Paper]
A Survey on Diffusion Models for Time Series and Spatio-Temporal Data [Paper]
Controllable Generation with Text-to-Image Diffusion Models: A Survey [Paper]
Diffusion Model-Based Image Editing: A Survey [Paper]
Diffusion Models, Image Super-Resolution And Everything: A Survey [Paper]
A Survey on Video Diffusion Models [Paper]
A Survey of Diffusion Models in Natural Language Processing [Paper]
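Nearly all of the surveys above build on the same DDPM training step: sample a timestep, corrupt the clean sample with the closed-form forward process x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε, and regress the injected noise. A minimal sketch under those standard equations, with a toy MLP denoiser and illustrative data shapes:

```python
# Minimal DDPM training step: noise a clean sample, regress the noise.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # cumulative product, abar_t

denoiser = nn.Sequential(nn.Linear(64 + 1, 128), nn.ReLU(), nn.Linear(128, 64))

x0 = torch.randn(32, 64)                         # a batch of "clean" samples
t = torch.randint(0, T, (32,))                   # random timesteps
eps = torch.randn_like(x0)                       # injected Gaussian noise

# Closed-form forward process q(x_t | x_0)
abar = alpha_bars[t].unsqueeze(-1)
x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * eps

# Predict the noise from (x_t, t) and regress it (the standard DDPM loss)
t_emb = (t.float() / T).unsqueeze(-1)
eps_pred = denoiser(torch.cat([x_t, t_emb], dim=-1))
loss = F.mse_loss(eps_pred, eps)
loss.backward()
print(loss.item())
```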
End-to-End Autonomous Driving
Collections of end-to-end autonomous driving research papers, continuously tracking the latest E2E driving work:
Link 1: https://github.com/opendilab/awesome-end-to-end-autonomous-driving#Overview-of-End-to-End-Driving-Method
Link 2: https://github.com/Pranav-chib/Recent-Advancements-in-End-to-End-Autonomous-Driving-using-Deep-Learning
[CVPR 2024] Foundation Models for Autonomous Systems
[CVPR 2023] Workshop on End-to-end Autonomous Driving
[CVPR 2023] End-to-End Autonomous Driving: Perception, Prediction, Planning and Simulation
[ICRA 2023] Scalable Autonomous Driving
[NeurIPS 2022] Machine Learning for Autonomous Driving
[IROS 2022] Behavior-driven Autonomous Driving in Unstructured Environments
[ICRA 2022] Fresh Perspectives on the Future of Autonomous Driving Workshop
[NeurIPS 2021] Machine Learning for Autonomous Driving
[NeurIPS 2020] Machine Learning for Autonomous Driving
[CVPR 2020] Workshop on Scalability in Autonomous Driving