Reading notes: "Cameras as Rays: Pose Estimation via Ray Diffusion"

Paper: https://arxiv.org/abs/2402.14817

Code: https://github.com/jasonyzhang/RayDiffusion

---------------------------------------------------------------------------------------------------------------------------------

Task:

    Camera pose estimation for 3D reconstruction.

Challenge:

    Estimating camera poses from sparsely sampled views (fewer than 10).

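Proposed solution:

    In contrast to existing top-down methods that predict a global parameterization of the camera extrinsics, the authors propose a distributed representation of camera pose that treats a camera as a bundle of rays. This representation couples tightly with spatial image features, which improves pose accuracy.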

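Rough pipeline:

Regression-based:

        First, split the image into patches. Then represent the camera as a bundle of rays: each ray starts at the camera center and passes through the center of its corresponding image patch. The bundle as a whole encodes the camera, and the point where the rays converge is the camera center.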
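
To make the ray representation concrete, here is a minimal NumPy sketch, not the authors' code. It assumes a pinhole camera with intrinsics K and the convention x_cam = R @ x_world + t, and it stores each ray in Plücker form (unit direction d plus moment m = c × d). The Plücker choice and all function names (`patch_centers`, `camera_to_rays`, `rays_to_center`) are my own assumptions for illustration. The last function shows why the bundle pins down the camera center: it is the least-squares point closest to all rays.

```python
import numpy as np

def patch_centers(height, width, num_patches):
    """Pixel coordinates (u, v) of the centers of a num_patches x num_patches grid."""
    us = (np.arange(num_patches) + 0.5) * width / num_patches
    vs = (np.arange(num_patches) + 0.5) * height / num_patches
    uu, vv = np.meshgrid(us, vs)
    return np.stack([uu.ravel(), vv.ravel()], axis=1)  # shape (P, 2)

def camera_to_rays(K, R, t, pixels):
    """Return unit directions and Plücker moments of the rays through `pixels`.

    Assumed convention (not taken from the notes): x_cam = R @ x_world + t.
    """
    center = -R.T @ t                                   # camera center in world frame
    homog = np.concatenate([pixels, np.ones((len(pixels), 1))], axis=1)
    dirs = (R.T @ np.linalg.inv(K) @ homog.T).T         # unproject, rotate to world
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    moments = np.cross(center, dirs)                    # m = c x d
    return dirs, moments

def rays_to_center(dirs, moments):
    """Least-squares point closest to all rays (the camera center)."""
    # For unit d and m = c x d, the point d x m lies on the ray.
    points = np.cross(dirs, moments)
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for d, p in zip(dirs, points):
        proj = np.eye(3) - np.outer(d, d)               # projector orthogonal to the ray
        A += proj
        b += proj @ p
    return np.linalg.solve(A, b)

# Toy camera: rotated 0.3 rad about the y axis, 4 x 4 patches on a 64 x 64 image.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([0.1, -0.2, 2.0])

dirs, moments = camera_to_rays(K, R, t, patch_centers(64, 64, 4))
print("recovered center:", rays_to_center(dirs, moments))
print("true center:     ", -R.T @ t)
```

With exact rays the two printed centers agree up to floating-point error. With predicted (noisy) rays the same least-squares solve gives a best-fit center, which is one reason a per-patch representation can be turned back into a conventional camera.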

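Denoising-based:

        A classic diffusion model setup. Noise is added to the ground-truth (GT) ray of each image patch, and the model then denoises the rays conditioned on the image, refining the ray estimates.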
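
The noising step can be sketched with the standard DDPM forward process, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε. This is a generic illustration under my own assumptions, not the paper's training code: the linear beta schedule, the 6-D ray layout, the `denoiser(noisy_rays, t, image_features)` signature, and the choice to regress the clean rays (rather than the noise) are all hypothetical.

```python
import numpy as np

def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(clean_rays, t, alpha_bar, rng):
    """Forward diffusion: mix the GT rays with Gaussian noise at timestep t."""
    eps = rng.standard_normal(clean_rays.shape)
    noisy = np.sqrt(alpha_bar[t]) * clean_rays + np.sqrt(1.0 - alpha_bar[t]) * eps
    return noisy, eps

def denoising_loss(denoiser, clean_rays, image_features, t, alpha_bar, rng):
    """Mean squared error between the denoiser output and the GT rays.

    `denoiser` is a placeholder callable; here it is assumed to predict the
    clean rays from the noisy rays, the timestep, and the image features.
    """
    noisy, _ = add_noise(clean_rays, t, alpha_bar, rng)
    predicted = denoiser(noisy, t, image_features)
    return float(np.mean((predicted - clean_rays) ** 2))

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
gt_rays = rng.standard_normal((16, 6))        # stand-in for 16 patches x 6-D rays
features = rng.standard_normal((16, 32))      # stand-in for per-patch image features

# Dummy denoiser that ignores its inputs and predicts all zeros.
dummy = lambda noisy, t, feats: np.zeros_like(noisy)
print("loss at t=50:", denoising_loss(dummy, gt_rays, features, 50, alpha_bar, rng))
```

In a real model the dummy would be replaced by a network that attends over the patch features, and sampling would run the learned denoiser from pure noise back to a clean ray bundle.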

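Contributions:

         1. Reformulates pose prediction as inferring a ray equation for each image patch, rather than inferring a global camera parameterization.

         2. Proposes a simple regression-based method for inferring this representation from sparsely sampled views, and shows that even this simple method outperforms the state of the art.

         3. Extends the approach by learning a denoising diffusion model over the ray-based camera parameterization to capture the distribution over cameras, which further improves performance.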
