Results Are Out! NeurIPS 2025 Paper Roundup (Autonomous Driving / Large Models / Embodied AI / RL, and More)

The NeurIPS 2025 results are out! 自动驾驶之心 has begun compiling the accepted work, currently covering autonomous driving, visual perception and reasoning, large-model training, embodied intelligence, reinforcement learning, video understanding, code generation, and more. Follow-up paper updates will be posted to the 『自动驾驶之心知识星球』 as soon as they are available.

Autonomous Driving

FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving

  • paper: https://arxiv.org/abs/2505.17685

  • code: https://miv-xjtu.github.io/FSDrive.github.io/

  • Affiliations: Alibaba, Xi'an Jiaotong University

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

  • paper: https://arxiv.org/abs/2506.13757

  • code: https://github.com/ucla-mobility/AutoVLA

  • Affiliations: UCLA

Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting

  • paper: https://arxiv.org/abs/2506.05280

  • code: https://github.com/BigCiLeng/bilateral-driving

  • Affiliations: Tsinghua AIR, Beihang University, et al.

SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models

  • paper: https://arxiv.org/abs/2411.13112

  • code: https://github.com/XiandaGuo/Drive-MLLM

  • Affiliations: Chinese Academy of Sciences, et al.

YOLOv12: Attention-Centric Real-Time Object Detectors

  • paper: https://arxiv.org/abs/2502.12524

  • code: https://github.com/sunsmarterjie/yolov12

  • Affiliations: University at Buffalo, Chinese Academy of Sciences, et al.

Visual Perception & Reasoning

OmniSegmentor: A Flexible Multi-Modal Learning Framework for Semantic Segmentation

  • paper: https://arxiv.org/pdf/2509.15096

  • code: https://github.com/VCIP-RGBD/DFormer

  • Affiliations: Nankai University (Ming-Ming Cheng's group)

PixFoundation 2.0: Do Video Multi-Modal LLMs Use Motion in Visual Grounding?

  • paper: https://arxiv.org/pdf/2509.02807

  • code: https://github.com/MSiam/PixFoundation-2.0.git

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

  • paper: https://arxiv.org/pdf/2507.16815

  • code: https://jasper0314-huang.github.io/thinkact-vla/

  • Affiliations: NVIDIA, National Taiwan University

DeepTraverse: A Depth-First Search Inspired Network for Algorithmic Visual Understanding

  • paper: https://arxiv.org/pdf/2506.10084

Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering

  • paper: https://arxiv.org/pdf/2505.18915

  • code: https://github.com/Schuture/DeepTumorVQA

Video Understanding

PixFoundation 2.0: Do Video Multi-Modal LLMs Use Motion in Visual Grounding?

  • paper: https://arxiv.org/pdf/2509.02807

  • code: https://github.com/MSiam/PixFoundation-2.0.git

Image/Video Generation & Editing

Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning

  • paper: https://arxiv.org/pdf/2509.15188

  • code: https://github.com/ybseo-ac/Conv

AutoEdit: Automatic Hyperparameter Tuning for Image Editing

  • paper: https://arxiv.org/pdf/2509.15031

OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers

  • paper: https://arxiv.org/pdf/2505.21448

  • code: https://ziqiaopeng.github.io/OmniSync/

Datasets & Evaluation

Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering

  • paper: https://arxiv.org/pdf/2505.18915

  • code: https://github.com/Schuture/DeepTumorVQA

3D Vision

Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering

  • paper: https://arxiv.org/pdf/2505.18915

  • code: https://github.com/Schuture/DeepTumorVQA

Large Model Training

Scaling Offline RL via Efficient and Expressive Shortcut Models

  • paper: https://arxiv.org/pdf/2505.22866

  • Affiliations: Cornell University

Reinforcement Learning & Preference Optimization

LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning

  • paper: https://arxiv.org/pdf/2507.15521

Scaling Offline RL via Efficient and Expressive Shortcut Models

  • paper: https://arxiv.org/pdf/2505.22866

Large Model Fine-Tuning

Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning

  • paper: https://arxiv.org/pdf/2509.15188

Adaptive LoRA Experts Allocation and Selection for Federated Fine-Tuning

  • paper: https://arxiv.org/pdf/2509.15087

Differentially Private Federated Low Rank Adaptation Beyond Fixed-Matrix

  • paper: https://arxiv.org/pdf/2507.09990

Embodied Intelligence

Self-Improving Embodied Foundation Models

  • paper: https://arxiv.org/pdf/2509.15155

  • code: https://self-improving-efms.github.io

  • Affiliations: DeepMind

ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation

  • paper: https://arxiv.org/pdf/2505.22159

  • code: https://sites.google.com/view/forcevla2025

  • Affiliations: Fudan University, Shanghai Jiao Tong University, et al.

Continual Learning & Model Hallucination

Improving Multimodal Large Language Models Using Continual Learning

  • paper: https://arxiv.org/pdf/2410.19925

  • code: https://shikhar-srivastava.github.io/cl-for-improving-mllms

Human-Centric

Real-Time Intuitive AI Drawing System for Collaboration: Enhancing Human Creativity through Formal and Contextual Intent Integration

  • paper: https://arxiv.org/pdf/2508.19254

Large Model Safety

Safely Learning Controlled Stochastic Dynamics

  • paper: https://arxiv.org/pdf/2506.02754

Interpretability

Concept-Level Explainability for Auditing & Steering LLM Responses

  • paper: https://arxiv.org/pdf/2505.07610

Document Understanding

STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing

  • paper: https://arxiv.org/pdf/2411.00387

  • code: https://github.com/jiaruzouu/STEM-PoM

Medical

Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering

  • paper: https://arxiv.org/pdf/2505.18915

  • code: https://github.com/Schuture/DeepTumorVQA

Agent

AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents

  • paper: https://arxiv.org/pdf/2506.04018

Mixture-of-Experts Models

Adaptive LoRA Experts Allocation and Selection for Federated Fine-Tuning

  • paper: https://arxiv.org/pdf/2509.15087

ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation

  • paper: https://arxiv.org/pdf/2505.22159

  • code: https://sites.google.com/view/forcevla2025

Code Generation

Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning

  • paper: https://arxiv.org/pdf/2509.15188

SBSC: Step-By-Step Coding for Improving Mathematical Olympiad Performance

  • paper: https://arxiv.org/pdf/2502.16666

Large Model Reasoning Optimization

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

  • paper: https://arxiv.org/pdf/2507.16815

  • code: https://jasper0314-huang.github.io/thinkact-vla/

LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning

  • paper: https://arxiv.org/pdf/2507.15521

STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing

  • paper: https://arxiv.org/pdf/2411.00387

  • code: https://github.com/jiaruzouu/STEM-PoM

Diffusion Models

Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning

  • paper: https://arxiv.org/pdf/2509.15188

OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers

  • paper: https://arxiv.org/pdf/2505.21448

Other Areas

Out-of-distribution generalisation is hard: evidence from ARC-like tasks

  • paper: https://arxiv.org/pdf/2505.09716

Fair Summarization: Bridging Quality and Diversity in Extractive Summaries

  • paper: https://arxiv.org/pdf/2411.07521
