Prompted by Adversarially Learned Inference (ALI)

First, ALI and BiGAN are essentially the same model, with one difference: ALI's encoder outputs mu and sigma and then samples the code from that distribution, whereas BiGAN's encoder directly outputs a deterministic code, with no sampling step.
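The contrast can be sketched in a few lines of PyTorch. The modules and layer sizes below are toy assumptions, purely to show the two encoder styles side by side:

```python
import torch
import torch.nn as nn

x_dim, z_dim = 8, 2  # illustrative toy dimensions

# ALI-style stochastic encoder: predicts mu and log-variance, then samples z.
class StochasticEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(x_dim, 2 * z_dim)  # outputs [mu, log sigma^2]

    def forward(self, x):
        mu, logvar = self.net(x).chunk(2, dim=-1)
        eps = torch.randn_like(mu)               # reparameterization trick
        return mu + eps * (0.5 * logvar).exp()   # z ~ N(mu, sigma^2)

# BiGAN-style deterministic encoder: one fixed code per input, no sampling.
deterministic_encoder = nn.Linear(x_dim, z_dim)

x = torch.randn(4, x_dim)
z_ali = StochasticEncoder()(x)      # different on every call
z_bigan = deterministic_encoder(x)  # identical for the same x
```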

Reading that paper sent me back to review VAEGAN and BiGAN, so this post is mainly a comparison of the three.

VAEGAN:

Viewed from the VAE side, VAEGAN adds a discriminator so that the generated images look as realistic as possible.

It keeps all of the VAE losses and adds a discriminator loss term (judging whether an image is real or fake).
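As a rough sketch of that objective (toy linear modules, all names and dimensions are illustrative assumptions, not the paper's architecture): the VAE terms stay, and an adversarial term is added on top.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x_dim, z_dim = 8, 2                 # illustrative toy dimensions
enc = nn.Linear(x_dim, 2 * z_dim)   # encoder -> [mu, logvar]
dec = nn.Linear(z_dim, x_dim)       # decoder
disc = nn.Linear(x_dim, 1)          # discriminator: real/fake logit

x = torch.randn(4, x_dim)
mu, logvar = enc(x).chunk(2, dim=-1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample
x_rec = dec(z)

# The usual VAE terms: reconstruction + KL to the prior.
recon = F.mse_loss(x_rec, x)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

# The added GAN term: push the discriminator to call the
# reconstruction "real" (label 1).
adv = F.binary_cross_entropy_with_logits(disc(x_rec), torch.ones(4, 1))

total = recon + kl + adv
```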

The paper then proposes a VAEGAN variant where the discriminator is set up differently: its output is three classes (generated image, reconstructed image, real image). The point of doing this is to expose the subtle difference between the z produced by the encoder and the z sampled from the prior, so that the two distributions fit each other better.
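A minimal sketch of that three-class discriminator loss, with random stand-in tensors for the three image sources (everything here is an illustrative assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x_dim = 8
# Toy discriminator: logits for {generated, reconstructed, real}.
disc3 = nn.Linear(x_dim, 3)

x_gen = torch.randn(4, x_dim)   # stand-in for decoder output from prior samples
x_rec = torch.randn(4, x_dim)   # stand-in for reconstructions of real images
x_real = torch.randn(4, x_dim)  # stand-in for real images

batch = torch.cat([x_gen, x_rec, x_real])
labels = torch.cat([torch.zeros(4), torch.ones(4), 2 * torch.ones(4)]).long()

# Separating "reconstructed" from "generated" forces the discriminator to
# notice any gap between encoder codes and prior samples.
d_loss = F.cross_entropy(disc3(batch), labels)
```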

BiGAN

BiGAN is structurally similar to VAEGAN, with a few differences.

It splits VAEGAN's encoder and decoder apart, so the model can be seen as two GANs: (encoder, D) and (decoder, D).

Viewing it as two GANs also drops the KL constraint from the VAE; that is, no explicit KL loss term is added.

Then p(x) is passed through the encoder to get q(z'), p(z) is passed through the decoder to get q(x'), and the pair (x, z) is fed to the discriminator as one joint input.

When the discriminator can no longer tell them apart, then from the joint-probability view, the pair (real data, inferred latent) and the pair (sampled latent, generated data) match: p(x, z') = q(x', z). With the joint distributions equal, the generated data ends up matching the real data.

So the differences from VAEGAN are clear: one is dropping the KL loss; the other is that the discriminator's input is the pair (x, z) rather than x alone.
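A minimal sketch of that joint discriminator, again with toy linear modules (all sizes are illustrative assumptions): D scores the concatenated pair (x, z), not x by itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x_dim, z_dim = 8, 2                  # illustrative toy dimensions
enc = nn.Linear(x_dim, z_dim)        # E: x -> z'
dec = nn.Linear(z_dim, x_dim)        # G: z -> x'
disc = nn.Linear(x_dim + z_dim, 1)   # D sees the joint pair (x, z)

x_real = torch.randn(4, x_dim)
z_prior = torch.randn(4, z_dim)

pair_real = torch.cat([x_real, enc(x_real)], dim=-1)    # (x, E(x))
pair_fake = torch.cat([dec(z_prior), z_prior], dim=-1)  # (G(z), z)

# D tries to label encoder pairs as 1 and decoder pairs as 0; at the
# optimum it cannot, i.e. the two joint distributions coincide.
d_loss = (F.binary_cross_entropy_with_logits(disc(pair_real), torch.ones(4, 1))
          + F.binary_cross_entropy_with_logits(disc(pair_fake), torch.zeros(4, 1)))
```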


### Learned Stereo Technology in Machine Learning and Computer Vision

Learned stereo techniques represent a significant advance in computer vision, particularly for depth estimation from stereo images. Traditional stereo algorithms rely heavily on hand-crafted features to match corresponding points between two or more camera views; learned approaches instead use deep models that discover feature representations directly from raw image data.

Deep neural networks have been applied successfully to object detection, segmentation, and now also disparity estimation, which is crucial for reconstructing 3D scenes from pairs of rectified stereo images. These methods typically train convolutional neural networks (CNNs) on large datasets with labeled ground-truth disparities[^1]. One notable work by Wang et al. introduced an unsupervised framework in which visual representations are learned without explicit labels, from video sequences captured over time[^2]. This lets systems capture not only static spatial relationships but also the temporal dynamics of real-world environments.

In practical applications such as video surveillance, edge computing plays a vital role when deploying advanced vision-based solutions like learned stereo matching: data is processed locally at high speed, minimizing the latency issues associated with cloud processing[^3]. Edge devices equipped with powerful GPUs/TPUs enable efficient execution of the computationally intensive inference these models require.

```python
import torch
from torchvision import transforms
from PIL import Image

# Example: load pre-trained model weights from the PyTorch hub.
model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet101',
                       pretrained=True)
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
])

input_image = Image.open("left_stereo_image.jpg")
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)  # the model expects a batched 4D tensor

with torch.no_grad():
    output = model(input_batch)['out'][0]
print(output.shape)
```