Toward Practical Monocular Indoor Depth Estimation

This paper proposes a monocular depth estimation framework for indoor applications that improves on existing methods through structure distillation. The framework uses Dense Prediction Transformers (DPT) to learn relative depth and adds a Hamming-distance loss to improve accuracy. To handle the complexity and the many textureless regions of indoor scenes, the method is validated on the simulated dataset SimSIN and the real dataset UniSIN, improving generalization and accuracy. The paper also introduces the collection of a new indoor stereo-image dataset to strengthen model training.


(Personal opinion) The work described in this paper amounts to DPT plus a Hamming-distance loss. DPT was trained on depth-sensor data, and the two Hamming-distance losses are not particularly novel; I suspect the metric gains mostly come from DPT having been trained on ground-truth depth. The structure distillation described in the abstract mainly targets DPT, and the paper does not elaborate on it. Overall it reads as an assembly of existing pieces, yet it made a top conference!

0 Abstract

Most existing monocular depth estimation methods are deployed in outdoor driving scenes. Because indoor objects are packed densely and irregularly at close range to the camera, such methods generalize poorly indoors. (Poor generalization across scene types is a common weakness of data-driven methods and hard to fault, but indoor environments really are more complex than outdoor ones overall.) To improve robustness, the authors propose a structure distillation scheme that learns the knack from an off-the-shelf relative depth estimator, producing depth that is both structured and metric while keeping inference real-time. The method is evaluated and tested on the simulated dataset SimSIN and the real dataset UniSIN.
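To make the distillation idea concrete, here is a minimal sketch (my own illustration under assumed losses, not the paper's exact scheme): a frozen relative-depth teacher such as DPT supervises a metric-depth student, with a per-image least-squares scale/shift alignment followed by gradient matching so the student inherits the teacher's scene structure rather than its absolute values. All function names here are hypothetical.

```python
import torch

def align_scale_shift(pred, target):
    # Per-image least-squares fit of s, b in ||s * pred + b - target||^2,
    # so a relative-depth teacher can supervise a metric-depth student.
    p, t = pred.flatten(1), target.flatten(1)
    var_p = p.var(dim=1, unbiased=False) + 1e-8
    s = ((p - p.mean(1, keepdim=True)) * (t - t.mean(1, keepdim=True))).mean(1) / var_p
    b = t.mean(1) - s * p.mean(1)
    return s.view(-1, 1, 1, 1) * pred + b.view(-1, 1, 1, 1)

def structure_distill_loss(student_depth, teacher_depth):
    # Match horizontal/vertical depth gradients after alignment: this
    # transfers structure (edges, planar regions), not raw depth values.
    aligned = align_scale_shift(student_depth, teacher_depth)
    dx = (aligned[..., :, 1:] - aligned[..., :, :-1]) \
        - (teacher_depth[..., :, 1:] - teacher_depth[..., :, :-1])
    dy = (aligned[..., 1:, :] - aligned[..., :-1, :]) \
        - (teacher_depth[..., 1:, :] - teacher_depth[..., :-1, :])
    return dx.abs().mean() + dy.abs().mean()
```

In training, the teacher would run under torch.no_grad() (it stays frozen), and this term would be added to the self-supervised photometric loss.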

1 Introduction

This work proposes a practical indoor monocular depth estimation framework with the following traits: it learns from an existing estimator and from left-right image pairs without depth annotations, collects training data efficiently, and improves cross-dataset generalization, accuracy, and depth sensing. The work suits consumer-grade AR/VR, e.g., 3D indoor scene reconstruction, virtual object insertion, and interaction with the environment. Self-supervised depth estimation has recently drawn wide attention, and several methods train on outdoor datasets such as KITTI and Cityscapes; for the reasons below, self-supervised depth estimation indoors is more challenging (a sketch of the label-free stereo supervision follows the list).

  1. Structure priors: outdoor scenes carry strong structural priors before training, typically composed of sky and buildings; indoor scenes lack such a regular layout.
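To make the "no depth annotations" point above concrete, below is a minimal sketch of standard stereo photometric supervision (in the spirit of Monodepth-style training, not necessarily the paper's exact pipeline): the network predicts a disparity map for the left image, the right image is backward-warped into the left view, and the reconstruction error is the training signal. The disparity is assumed normalized by image width; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right_img, disparity):
    # Backward-warp the right image into the left view.
    # disparity: (B, 1, H, W), assumed normalized by image width;
    # grid_sample expects sampling coordinates in [-1, 1].
    b, _, h, w = right_img.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=right_img.device),
        torch.linspace(-1, 1, w, device=right_img.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1).clone()
    # For a left-view disparity d, the matching right pixel sits at x - d;
    # the factor 2 converts a width-normalized shift to [-1, 1] coordinates.
    grid[..., 0] = grid[..., 0] - 2.0 * disparity.squeeze(1)
    return F.grid_sample(right_img, grid, align_corners=True, padding_mode="border")

def photometric_loss(left_img, right_img, disparity):
    # L1 reconstruction error: no depth labels are involved anywhere.
    recon = warp_right_to_left(right_img, disparity)
    return (left_img - recon).abs().mean()
```

In practice an SSIM term and an edge-aware disparity smoothness term are usually added, but the L1 term alone already shows why stereo pairs make explicit depth annotations unnecessary.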