3DV
Two focuses: predicting 3D shapes from images, and processing 3D input data
Representations of 3D shape
Depth Map
Gives the distance from the camera to the object in the world at each pixel
RGB image + Depth image = RGB-D Image (2.5D)
We can use a fully convolutional network to predict the depth at each pixel
Problem: scale / depth ambiguity (a small, nearby object can produce the same image as a large, faraway one, so absolute depth is not recoverable from a single image)
-> Use a scale-invariant loss
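A minimal sketch (PyTorch, my choice of framework) of a scale-invariant loss on log depths, in the spirit of the standard depth-prediction loss; the assumption that the network outputs log depth, the tensor shapes, and the `lam` weight are illustrative.

```python
import torch

def scale_invariant_loss(pred_log_depth, gt_log_depth, lam=1.0):
    # pred_log_depth, gt_log_depth: (B, H, W) log-depth maps (assumed shapes)
    d = pred_log_depth - gt_log_depth          # per-pixel log-depth error
    n = d[0].numel()                           # number of pixels per image
    mse = (d ** 2).sum(dim=(1, 2)) / n         # mean squared log error
    bias = d.sum(dim=(1, 2)) ** 2 / n ** 2     # subtracting this term makes the loss
    return (mse - lam * bias).mean()           # invariant to a global scale when lam=1

# Usage: a fully convolutional net would output one log-depth value per pixel.
pred = torch.randn(4, 64, 64)
gt = torch.randn(4, 64, 64)
print(scale_invariant_loss(pred, gt))
```

With `lam=1`, a prediction that is off by a constant offset in log space (i.e. a global scale factor in depth) incurs no extra penalty, which is exactly the ambiguity the loss is meant to absorb.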
Surface Normals
Gives the normal vector of the object's surface in the world at each pixel
We can use a fully convolutional network to predict surface normals
loss: $\frac{x \cdot y}{|x|\,|y|}$ (per-pixel cosine similarity between predicted normal $x$ and ground-truth normal $y$)
Like depth maps, surface normals also can't represent occluded objects
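Following the formula above, a short sketch of a per-pixel cosine loss for normals; the (B, 3, H, W) layout and the `1 - cos` form (turning a similarity into something to minimize) are my assumptions.

```python
import torch
import torch.nn.functional as F

def normal_loss(pred, gt, eps=1e-8):
    # pred, gt: (B, 3, H, W) per-pixel normal vectors (assumed layout)
    cos = F.cosine_similarity(pred, gt, dim=1, eps=eps)  # (x . y) / (|x| |y|) per pixel
    return (1.0 - cos).mean()                            # 0 when perfectly aligned

pred = torch.randn(2, 3, 32, 32)
gt = F.normalize(torch.randn(2, 3, 32, 32), dim=1)
print(normal_loss(pred, gt))
```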
Voxel Grid
Represent a shape with a $V \times V \times V$ grid of occupancies (just like Minecraft 😃)
Problems: need high spatial resolution to capture fine details, but memory and compute grow cubically with $V$
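A minimal sketch of consuming a voxel occupancy grid with 3D convolutions, the usual way this representation is processed; the architecture, channel counts, and grid size here are illustrative, not from the lecture.

```python
import torch
import torch.nn as nn

class VoxelClassifier(nn.Module):
    """Toy 3D CNN over a V x V x V occupancy grid (hypothetical sizes)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                              # V -> V/2
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                              # V/2 -> V/4
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, vox):                               # vox: (B, 1, V, V, V)
        x = self.features(vox)
        x = x.mean(dim=(2, 3, 4))                         # global average pool
        return self.head(x)

model = VoxelClassifier()
occupancy = (torch.rand(2, 1, 32, 32, 32) > 0.5).float()  # binary voxel grid
print(model(occupancy).shape)                             # (2, 10)
```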