CVPR 2025 Image/Video/3D Generation Paper Roundup (with Papers and Code)

Author | Kobay    Editor | 自动驾驶之心

Original article: https://zhuanlan.zhihu.com/p/27979298565

This article is shared for academic purposes only. If there is any infringement, please contact us for removal.

Awesome-CVPR2025-AIGC

A Collection of Papers and Code for CVPR 2025 AIGC

Below is a curated roundup of AIGC-related papers and code from CVPR 2025.

The latest revisions land on GitHub first; stars, forks, and PRs are welcome!

Anyone interested in AIGC-related tasks is also welcome to help keep the list updated!

github.com/Kobaayyy/Awesome-CVPR2025-CVPR2024-ECCV2024-AIGC/blob/main/CVPR2025.md
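
Since the list is maintained as a plain Markdown file in the repo above, it is easy to consume programmatically. Below is a minimal illustrative Python sketch (not part of the repo); it assumes the raw file is reachable at the standard raw.githubusercontent.com path derived from the link above, and that entries keep the "Paper: <arXiv url>" pattern used throughout this post.

```python
# Minimal sketch: fetch the raw Markdown list and collect the arXiv links.
# Assumptions (not guaranteed by the repo): the raw file lives at the standard
# raw.githubusercontent.com path, and entries follow the "Paper: <url>" pattern.
import re
import urllib.request

RAW_URL = (
    "https://raw.githubusercontent.com/"
    "Kobaayyy/Awesome-CVPR2025-CVPR2024-ECCV2024-AIGC/main/CVPR2025.md"
)

with urllib.request.urlopen(RAW_URL) as resp:
    text = resp.read().decode("utf-8")

# Every arXiv link that follows a "Paper:" label, in document order.
papers = re.findall(r"Paper:\s*(https://arxiv\.org/abs/\S+)", text)
print(f"Found {len(papers)} papers; first three: {papers[:3]}")
```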

Acceptance results announced: February 27, 2025

Contents

  1. Image Generation / Image Synthesis

  2. Image Editing

  3. Video Generation / Video Synthesis

  4. Video Editing

  5. 3D Generation / 3D Synthesis

  6. 3D Editing

  7. Multi-Modal Large Language Models

  8. Others

1. Image Generation / Image Synthesis

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

  • Paper: https://arxiv.org/abs/2411.17787

  • Code: https://github.com/czg1225/CoDe

Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification

  • Paper: https://arxiv.org/abs/2408.16266

  • Code: https://github.com/scuwyh2000/Diff-II

Parallelized Autoregressive Visual Generation

  • Paper: https://arxiv.org/abs/2412.15119

  • Code: https://github.com/Epiphqny/PAR

PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation

  • Paper: https://arxiv.org/abs/2412.03177

  • Code: https://github.com/hqhQAQ/PatchDPO

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

  • Paper: https://arxiv.org/abs/2501.01423

  • Code: https://github.com/hustvl/LightningDiT

Rectified Diffusion Guidance for Conditional Generation

  • Paper: https://arxiv.org/abs/2410.18737

  • Code: https://github.com/thuxmf/recfg

SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models

  • Paper: https://arxiv.org/abs/2403.09055

  • Code: https://github.com/ironjr/semantic-draw

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models

  • Paper: https://arxiv.org/abs/2412.04852

  • Code: https://github.com/taco-group/SleeperMark

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

  • Paper: https://arxiv.org/abs/2412.03069

  • Code: https://github.com/ByteFlow-AI/TokenFlow

2. Image Editing

Attention Distillation: A Unified Approach to Visual Characteristics Transfer

  • Paper: https://arxiv.org/abs/2502.20235

  • Code: https://github.com/xugao97/AttentionDistillation

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

  • Paper: https://arxiv.org/abs/2411.16832

  • Code: https://github.com/taco-group/FaceLock

EmoEdit: Evoking Emotions through Image Manipulation

  • Paper: https://arxiv.org/abs/2405.12661

  • Code: https://github.com/JingyuanYY/EmoEdit

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

  • Paper: https://arxiv.org/abs/2502.18461

  • Code: https://github.com/HVision-NKU/K-LoRA

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

  • Paper: https://arxiv.org/abs/2412.08503

  • Code: https://github.com/Westlake-AGI-Lab/StyleStudio

3. Video Generation / Video Synthesis

ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way

  • Paper: https://arxiv.org/abs/2410.06241

  • Code: https://github.com/Bujiazi/ByTheWay

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

  • Paper: https://arxiv.org/abs/2411.17440

  • Code: https://github.com/PKU-YuanGroup/ConsisID

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

  • Paper: https://arxiv.org/abs/2412.09283

  • Code: https://github.com/NJU-PCALab/InstanceCap

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

  • Paper: https://arxiv.org/abs/2411.17459

  • Code: https://github.com/PKU-YuanGroup/WF-VAE

4. Video Editing

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

  • Paper: https://arxiv.org/abs/2407.15642

  • Code: https://github.com/maxin-cn/Cinemo

Generative Inbetweening through Frame-wise Conditions-Driven Video Generation

  • Paper: https://arxiv.org/abs/2412.11755

  • Code: https://github.com/Tian-one/FCVG

X-Dyna: Expressive Dynamic Human Image Animation

  • Paper: https://arxiv.org/abs/2501.10021

  • Code: https://github.com/bytedance/X-Dyna

5. 3D Generation / 3D Synthesis

Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation

  • Paper: https://arxiv.org/abs/2411.16185

  • Code: https://github.com/YuQiao0303/Fancy123

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

  • Paper: https://arxiv.org/abs/2501.13928

  • Code: https://github.com/facebookresearch/fast3r

GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation

  • Paper: https://arxiv.org/abs/2406.06526

  • Code: https://github.com/hzxie/GaussianCity

LT3SD: Latent Trees for 3D Scene Diffusion

  • Paper: https://arxiv.org/abs/2409.08215

  • Code: https://github.com/quan-meng/lt3sd

Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture

  • Paper: https://arxiv.org/abs/2503.00495

  • Code: https://github.com/XuanchenLi/TexTalk

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

  • Paper: https://arxiv.org/abs/2412.06699

  • Code: https://github.com/baaivision/See3D

6. 3D Editing

DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters

  • Paper: https://arxiv.org/abs/2411.17423

  • Code: https://github.com/yisuanwang/DRiVE

FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video

  • Paper: https://arxiv.org/abs/2411.15604

  • Code: https://github.com/zjwfufu/FateAvatar

Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters

  • Paper: https://arxiv.org/abs/2411.18197

  • Code: https://github.com/jasongzy/Make-It-Animatable

7. Multi-Modal Large Language Models

Automated Generation of Challenging Multiple Choice Questions for Vision Language Model Evaluation

  • Paper: https://arxiv.org/abs/2501.03225

  • Code: https://github.com/yuhui-zh15/AutoConverter

RAP-MLLM: Retrieval-Augmented Personalization for Multimodal Large Language Model

  • Paper: https://arxiv.org/abs/2410.13360

  • Code: https://github.com/Hoar012/RAP-MLLM

SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model

  • Paper: https://arxiv.org/abs/2412.01550

  • Code: https://github.com/hq-King/SeqAfford

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

  • Paper: https://arxiv.org/abs/2411.17465

  • Code: https://github.com/showlab/ShowUI

8. Others

Continuous and Locomotive Crowd Behavior Generation

  • Paper:

  • Code: https://github.com/InhwanBae/Crowd-Behavior-Generation

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

  • Paper: https://arxiv.org/abs/2412.15322

  • Code: https://github.com/hkchengrex/MMAudio

To be continuously updated.

