CVPR 2025 Image/Video/3D Generation Paper Roundup (with Papers and Code)

Author | Kobay    Editor | 自动驾驶之心

Original article: https://zhuanlan.zhihu.com/p/27979298565

This article is shared for academic purposes only. If there is any infringement, please contact us for removal.

Awesome-CVPR2025-AIGC

A Collection of Papers and Code for CVPR 2025 AIGC

Below is a curated roundup of AIGC-related papers and code from CVPR 2025.

The latest revisions land on GitHub first; stars, forks, and PRs are welcome!

Anyone interested in AIGC-related tasks is also welcome to help keep the list updated!

github.com/Kobaayyy/Awesome-CVPR2025-CVPR2024-ECCV2024-AIGC/blob/main/CVPR2025.md
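
Since the list is maintained as a plain Markdown file in the repo above, it is easy to consume programmatically. Below is a minimal illustrative Python sketch (not part of the repo); it assumes the raw file is reachable at the standard raw.githubusercontent.com path derived from the link above, and that entries keep the "Paper: <arXiv url>" pattern used throughout this post.

```python
# Minimal sketch: fetch the raw Markdown list and collect the arXiv links.
# Assumptions (not guaranteed by the repo): the raw file lives at the standard
# raw.githubusercontent.com path, and entries follow the "Paper: <url>" pattern.
import re
import urllib.request

RAW_URL = (
    "https://raw.githubusercontent.com/"
    "Kobaayyy/Awesome-CVPR2025-CVPR2024-ECCV2024-AIGC/main/CVPR2025.md"
)

with urllib.request.urlopen(RAW_URL) as resp:
    text = resp.read().decode("utf-8")

# Every arXiv link that follows a "Paper:" label, in document order.
papers = re.findall(r"Paper:\s*(https://arxiv\.org/abs/\S+)", text)
print(f"Found {len(papers)} papers; first three: {papers[:3]}")
```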

Acceptance results announced: February 27, 2025

Contents

  1. Image Generation / Image Synthesis

  2. Image Editing

  3. Video Generation / Video Synthesis

  4. Video Editing

  5. 3D Generation / 3D Synthesis

  6. 3D Editing

  7. Multi-Modal Large Language Models

  8. Others

1. Image Generation / Image Synthesis

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

  • Paper: https://arxiv.org/abs/2411.17787

  • Code: https://github.com/czg1225/CoDe

Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification

  • Paper: https://arxiv.org/abs/2408.16266

  • Code: https://github.com/scuwyh2000/Diff-II

Parallelized Autoregressive Visual Generation

  • Paper: https://arxiv.org/abs/2412.15119

  • Code: https://github.com/Epiphqny/PAR

PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation

  • Paper: https://arxiv.org/abs/2412.03177

  • Code: https://github.com/hqhQAQ/PatchDPO

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

  • Paper: https://arxiv.org/abs/2501.01423

  • Code: https://github.com/hustvl/LightningDiT

Rectified Diffusion Guidance for Conditional Generation

  • Paper: https://arxiv.org/abs/2410.18737

  • Code: https://github.com/thuxmf/recfg

SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models

  • Paper: https://arxiv.org/abs/2403.09055

  • Code: https://github.com/ironjr/semantic-draw

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models

  • Paper: https://arxiv.org/abs/2412.04852

  • Code: https://github.com/taco-group/SleeperMark

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

  • Paper: https://arxiv.org/abs/2412.03069

  • Code: https://github.com/ByteFlow-AI/TokenFlow

2. Image Editing

Attention Distillation: A Unified Approach to Visual Characteristics Transfer

  • Paper: https://arxiv.org/abs/2502.20235

  • Code: https://github.com/xugao97/AttentionDistillation

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

  • Paper: https://arxiv.org/abs/2411.16832

  • Code: https://github.com/taco-group/FaceLock

EmoEdit: Evoking Emotions through Image Manipulation

  • Paper: https://arxiv.org/abs/2405.12661

  • Code: https://github.com/JingyuanYY/EmoEdit

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

  • Paper: https://arxiv.org/abs/2502.18461

  • Code: https://github.com/HVision-NKU/K-LoRA

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

  • Paper: https://arxiv.org/abs/2412.08503

  • Code: https://github.com/Westlake-AGI-Lab/StyleStudio

3. Video Generation / Video Synthesis

ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way

  • Paper: https://arxiv.org/abs/2410.06241

  • Code: https://github.com/Bujiazi/ByTheWay

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

  • Paper: https://arxiv.org/abs/2411.17440

  • Code: https://github.com/PKU-YuanGroup/ConsisID

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

  • Paper: https://arxiv.org/abs/2412.09283

  • Code: https://github.com/NJU-PCALab/InstanceCap

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

  • Paper: https://arxiv.org/abs/2411.17459

  • Code: https://github.com/PKU-YuanGroup/WF-VAE

4. Video Editing

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

  • Paper: https://arxiv.org/abs/2407.15642

  • Code: https://github.com/maxin-cn/Cinemo

Generative Inbetweening through Frame-wise Conditions-Driven Video Generation

  • Paper: https://arxiv.org/abs/2412.11755

  • Code: https://github.com/Tian-one/FCVG

X-Dyna: Expressive Dynamic Human Image Animation

  • Paper: https://arxiv.org/abs/2501.10021

  • Code: https://github.com/bytedance/X-Dyna

5. 3D Generation / 3D Synthesis

Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation

  • Paper: https://arxiv.org/abs/2411.16185

  • Code: https://github.com/YuQiao0303/Fancy123

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

  • Paper: https://arxiv.org/abs/2501.13928

  • Code: https://github.com/facebookresearch/fast3r

GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation

  • Paper: https://arxiv.org/abs/2406.06526

  • Code: https://github.com/hzxie/GaussianCity

LT3SD: Latent Trees for 3D Scene Diffusion

  • Paper: https://arxiv.org/abs/2409.08215

  • Code: https://github.com/quan-meng/lt3sd

Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture

  • Paper: https://arxiv.org/abs/2503.00495

  • Code: https://github.com/XuanchenLi/TexTalk

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

  • Paper: https://arxiv.org/abs/2412.06699

  • Code: https://github.com/baaivision/See3D

6. 3D Editing

DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters

  • Paper: https://arxiv.org/abs/2411.17423

  • Code: https://github.com/yisuanwang/DRiVE

FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video

  • Paper: https://arxiv.org/abs/2411.15604

  • Code: https://github.com/zjwfufu/FateAvatar

Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters

  • Paper: https://arxiv.org/abs/2411.18197

  • Code: https://github.com/jasongzy/Make-It-Animatable

7. Multi-Modal Large Language Models

Automated Generation of Challenging Multiple Choice Questions for Vision Language Model Evaluation

  • Paper: https://arxiv.org/abs/2501.03225

  • Code: https://github.com/yuhui-zh15/AutoConverter

RAP-MLLM: Retrieval-Augmented Personalization for Multimodal Large Language Model

  • Paper: https://arxiv.org/abs/2410.13360

  • Code: https://github.com/Hoar012/RAP-MLLM

SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model

  • Paper: https://arxiv.org/abs/2412.01550

  • Code: https://github.com/hq-King/SeqAfford

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

  • Paper: https://arxiv.org/abs/2411.17465

  • Code: https://github.com/showlab/ShowUI

8. Others

Continuous and Locomotive Crowd Behavior Generation

  • Paper:

  • Code: https://github.com/InhwanBae/Crowd-Behavior-Generation

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

  • Paper: https://arxiv.org/abs/2412.15322

  • Code: https://github.com/hkchengrex/MMAudio

To be continuously updated.

