Unsupervised Learning of Depth and Ego-Motion from Video: Reproducing the Monocular Depth Estimation Paper
0、Paper walkthrough
This step performs the conversion from 3D coordinates to 2D coordinates, i.e., perspective projection: objects are projected onto the image plane by central projection, yielding a single-plane projection close to what the eye actually sees, which is why nearby objects appear large and distant ones small. We again use the pinhole camera to illustrate (apart from the lower image brightness, the imaging is the same as with a lens, but the optical path is simpler). As shown in Figure 2, the pinhole plane (camera coordinate system) lies between the image plane (image coordinate system) and the object plane (the checkerboard plane), and the resulting image is an inverted real image.
To make the mathematics easier to describe, however, we swap the positions of the camera coordinate system and the image coordinate system, giving the arrangement shown in Figure 3 (this has no physical meaning; it simply makes the computation more convenient):
Now let M be a point in the camera coordinate system. Its image point P in the ideal (distortion-free) image coordinate system has the following coordinates, obtained from similar triangles:
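In the usual notation, writing M = (X_c, Y_c, Z_c) in camera coordinates, P = (x_p, y_p), and f for the focal length, the similar-triangles relation (referred to below as Equation 2) is:

$$x_p = f\,\frac{X_c}{Z_c}, \qquad y_p = f\,\frac{Y_c}{Z_c}$$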
Because the origin of the pixel coordinate system does not coincide with that of the image coordinate system, let the image-coordinate origin sit at (u0, v0) in pixel coordinates, let each pixel have physical size dx and dy along the image-coordinate x and y axes, and let the image point have coordinates (xc, yc) in the actual image coordinate system. The image point's coordinates in the pixel coordinate system are then:
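In homogeneous form (referred to below as Equation 5):

$$\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} 1/dx & 0 & u_0 \\ 0 & 1/dy & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_c \\ y_c \\ 1 \end{pmatrix}, \quad \text{i.e.} \quad u = \frac{x_c}{dx} + u_0, \; v = \frac{y_c}{dy} + v_0$$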
Here (xp, yp) in Equation 2 and (xc, yc) in Equation 5 are the same quantities, both expressed in the image coordinate system. Ignoring lens distortion for now, multiplying the transformation matrices of Equations 2 and 5 yields the intrinsic matrix M:
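In the standard form this reads:

$$Z_c \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \underbrace{\begin{pmatrix} f/dx & 0 & u_0 \\ 0 & f/dy & v_0 \\ 0 & 0 & 1 \end{pmatrix}}_{M} \begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix} = \begin{pmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix}$$

with $f_x = f/dx$ and $f_y = f/dy$ the focal length expressed in pixels.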
It is called the intrinsic matrix because its entries depend only on the camera's internal parameters and do not change with the position of the object.
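A practical aside for the reproduction below: when the data loaders resize KITTI frames to 416×128, the intrinsics must be rescaled with the same factors. A minimal sketch (scale_intrinsics is a hypothetical helper; the calibration numbers are illustrative KITTI-like values, not the exact ones used here):

import numpy as np

def scale_intrinsics(K, sx, sy):
    # Rescale a 3x3 intrinsic matrix when the image is resized
    # by factor sx in width and sy in height.
    K = K.copy()
    K[0, 0] *= sx  # fx
    K[0, 2] *= sx  # cx
    K[1, 1] *= sy  # fy
    K[1, 2] *= sy  # cy
    return K

# Example: a ~1242x375 KITTI frame resized to 416x128
K = np.array([[718.856, 0., 607.193],
              [0., 718.856, 185.216],
              [0., 0., 1.]])
K_scaled = scale_intrinsics(K, 416 / 1242, 128 / 375)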
The paper's view-synthesis formulation implicitly rests on three assumptions:
1、the scene is static, without moving objects;
2、there is no occlusion/disocclusion between the target view and the source views;
3、the surfaces are Lambertian. A Lambertian surface appears equally bright from every viewing direction under a fixed illumination distribution, and an ideal Lambertian surface absorbs none of the incident light. Lambertian reflection is also called diffuse reflection: whatever the illumination distribution, the surface receives and re-emits the incident light over all directions, so the same amount of energy is observed from every direction.
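If any of these assumptions is violated, the photometric supervision signal is corrupted, which is why the paper also learns an explainability mask. Under the assumptions, the core objective is the view-synthesis loss, an L1 photometric error between the target view and each source view warped into the target frame:

$$\mathcal{L}_{vs} = \sum_{s}\sum_{p}\left|I_t(p) - \hat{I}_s(p)\right|$$

where $p$ indexes pixels, $I_t$ is the target view, and $\hat{I}_s$ is the source view $I_s$ warped to the target frame using the predicted depth map and relative camera pose.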
I read the paper quite a while ago; I will expand this part when I have time.
Discussion
Why is the target frame at the center of the sequence? See issue48.
How can the pose vector be obtained without the scale-factor uncertainty? See issue39.
1、Download the KITTI dataset
For KITTI, first download the dataset using the raw_data_downloader.sh script provided on the official website:
sudo bash ./raw_data_downloader.sh
2、Running the single-view depth demo
We provide the demo code for running our single-view depth prediction model. First, download the pre-trained model by running the following:
bash ./models/download_depth_model.sh
Then run test.py:
test.py
from __future__ import division
import numpy as np
import PIL.Image as pil
import tensorflow as tf
from SfMLearner import SfMLearner
from utils import normalize_depth_for_display
import matplotlib.pyplot as plt

img_height = 128
img_width = 416
ckpt_file = 'models/model-190532'

# Load the sample image and resize it to the network input resolution
fh = open('misc/sample.png', 'rb')
I = pil.open(fh)
I = I.resize((img_width, img_height), pil.ANTIALIAS)
I = np.array(I)

# Build the inference graph in single-view depth mode
sfm = SfMLearner()
sfm.setup_inference(img_height, img_width, mode='depth')

saver = tf.train.Saver([var for var in tf.model_variables()])
with tf.Session() as sess:
    saver.restore(sess, ckpt_file)
    # Add a batch dimension: (1, H, W, 3)
    pred = sfm.inference(I[None, :, :, :], sess, mode='depth')

# Show the input image and the predicted depth map
plt.imshow(I)
plt.show()
plt.imshow(normalize_depth_for_display(pred['depth'][0, :, :, 0]))
plt.show()
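One pitfall when reproducing this in a current environment: pil.ANTIALIAS was removed in Pillow 10; on recent Pillow versions use pil.LANCZOS, which is the same filter under its current name.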
3、Preparing training data
In order to train the model using the provided code, the data needs to be formatted in a specific way, sketched below.
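Concretely, as the script below shows, each training example becomes a horizontally concatenated seq_length-frame JPEG plus a *_cam.txt file holding the flattened 3×3 intrinsics, and train.txt / val.txt list <folder> <frame_id> pairs. The resulting layout looks roughly like this (exact folder names depend on the loader):

formatted/data/
    2011_09_26_drive_0001_sync_02/
        0000000001.jpg        # seq_length frames side by side (width = 3 x 416)
        0000000001_cam.txt    # fx,0.,cx,0.,fy,cy,0.,0.,1.
    train.txt                 # one "<folder> <frame_id>" per line
    val.txt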
Then run prepare_train_data.py:
prepare_train_data.py
from __future__ import division
import argparse
import numpy as np
from glob import glob
from joblib import Parallel, delayed
import os
import cv2

parser = argparse.ArgumentParser()
parser.add_argument("--dataset_dir", type=str, default="/home/ross/PycharmProjects/SfMLearner-master/raw/kitti/dataset/", help="where the dataset is stored")
parser.add_argument("--dataset_name", type=str, default="kitti_raw_eigen", choices=["kitti_raw_eigen", "kitti_raw_stereo", "kitti_odom", "cityscapes"])
parser.add_argument("--dump_root", type=str, default="/home/ross/PycharmProjects/SfMLearner-master/raw/kitti/resulting/formatted/data/", help="where to dump the data")
parser.add_argument("--seq_length", type=int, default=3, help="length of each training sequence")
parser.add_argument("--img_height", type=int, default=128, help="image height")
parser.add_argument("--img_width", type=int, default=416, help="image width")
parser.add_argument("--num_threads", type=int, default=4, help="number of threads to use")
args = parser.parse_args()

def concat_image_seq(seq):
    # Stack the frames of one training sequence side by side into a single wide image
    for i, im in enumerate(seq):
        if i == 0:
            res = im
        else:
            res = np.hstack((res, im))
    return res

def dump_example(n, args):
    if n % 2000 == 0:
        print('Progress %d/%d....' % (n, data_loader.num_train))
    example = data_loader.get_train_example_with_idx(n)
    if example is False:
        return
    image_seq = concat_image_seq(example['image_seq'])
    intrinsics = example['intrinsics']
    fx = intrinsics[0, 0]
    fy = intrinsics[1, 1]
    cx = intrinsics[0, 2]
    cy = intrinsics[1, 2]
    dump_dir = os.path.join(args.dump_root, example['folder_name'])
    try:
        os.makedirs(dump_dir)
    except OSError:
        if not os.path.isdir(dump_dir):
            raise
    dump_img_file = dump_dir + '/%s.jpg' % example['file_name']
    # scipy.misc.imsave is deprecated; cv2.imwrite is used instead
    cv2.imwrite(dump_img_file, image_seq.astype(np.uint8))
    dump_cam_file = dump_dir + '/%s_cam.txt' % example['file_name']
    with open(dump_cam_file, 'w') as f:
        # Row-major flattened 3x3 intrinsic matrix: fx,0,cx,0,fy,cy,0,0,1
        f.write('%f,0.,%f,0.,%f,%f,0.,0.,1.' % (fx, cx, fy, cy))

def main():
    if not os.path.exists(args.dump_root):
        os.makedirs(args.dump_root)
    global data_loader
    if args.dataset_name == 'kitti_odom':
        from kitti.kitti_odom_loader import kitti_odom_loader
        data_loader = kitti_odom_loader(args.dataset_dir,
                                        img_height=args.img_height,
                                        img_width=args.img_width,
                                        seq_length=args.seq_length)
    if args.dataset_name == 'kitti_raw_eigen':
        from kitti.kitti_raw_loader import kitti_raw_loader
        data_loader = kitti_raw_loader(args.dataset_dir,
                                       split='eigen',
                                       img_height=args.img_height,
                                       img_width=args.img_width,
                                       seq_length=args.seq_length)
    if args.dataset_name == 'kitti_raw_stereo':
        from kitti.kitti_raw_loader import kitti_raw_loader
        data_loader = kitti_raw_loader(args.dataset_dir,
                                       split='stereo',
                                       img_height=args.img_height,
                                       img_width=args.img_width,
                                       seq_length=args.seq_length)
    if args.dataset_name == 'cityscapes':
        from cityscapes.cityscapes_loader import cityscapes_loader
        data_loader = cityscapes_loader(args.dataset_dir,
                                        img_height=args.img_height,
                                        img_width=args.img_width,
                                        seq_length=args.seq_length)
    Parallel(n_jobs=args.num_threads)(delayed(dump_example)(n, args) for n in range(data_loader.num_train))

    # Split into train/val (roughly 10% of frames go to validation)
    np.random.seed(8964)
    subfolders = os.listdir(args.dump_root)
    with open(args.dump_root + 'train.txt', 'w') as tf:
        with open(args.dump_root + 'val.txt', 'w') as vf:
            for s in subfolders:
                if not os.path.isdir(args.dump_root + '/%s' % s):
                    continue
                imfiles = glob(os.path.join(args.dump_root, s, '*.jpg'))
                frame_ids = [os.path.basename(fi).split('.')[0] for fi in imfiles]
                for frame in frame_ids:
                    if np.random.random() < 0.1:
                        vf.write('%s %s\n' % (s, frame))
                    else:
                        tf.write('%s %s\n' % (s, frame))

main()
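The defaults above point at my local paths; to run it with explicit arguments, an invocation in the spirit of the original repo's README looks like:

python prepare_train_data.py --dataset_dir=/path/to/raw/kitti/dataset/ --dataset_name='kitti_raw_eigen' --dump_root=/path/to/resulting/formatted/data/ --seq_length=3 --img_width=416 --img_height=128 --num_threads=4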
4、Run test_kitti_depth.py
test_kitti_depth.py
from __future__ import division
import tensorflow as tf
import numpy as np
import os
import PIL.Image as pil
from SfMLearner import SfMLearner

flags = tf.app.flags
flags.DEFINE_integer("batch_size", 4, "The size of a sample batch")
flags.DEFINE_integer("img_height", 128, "Image height")
flags.DEFINE_integer("img_width", 416, "Image width")
flags.DEFINE_string("dataset_dir", "/media/ross/DF091746BF0C6DD4/Dataset/kitti/", "Dataset directory")
flags.DEFINE_string("output_dir", "/media/ross/DF091746BF0C6DD4/Dataset/kitti/formatted/", "Output directory")
flags.DEFINE_string("ckpt_file", "/home/ross/PycharmProjects/SfMLearner-master/models/model-190532", "checkpoint file")
FLAGS = flags.FLAGS

def main(_):
    # Read the list of test images (Eigen split), given relative to the dataset root
    with open('data/kitti/test_files_eigen_copy.txt', 'r') as f:
        test_files = f.readlines()
        test_files = [FLAGS.dataset_dir + t[:-1] for t in test_files]
    if not os.path.exists(FLAGS.output_dir):
        os.makedirs(FLAGS.output_dir)
    basename = os.path.basename(FLAGS.ckpt_file)
    output_file = FLAGS.output_dir + '/' + basename
    sfm = SfMLearner()
    sfm.setup_inference(img_height=FLAGS.img_height,
                        img_width=FLAGS.img_width,
                        batch_size=FLAGS.batch_size,
                        mode='depth')
    saver = tf.train.Saver([var for var in tf.model_variables()])
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    with tf.Session(config=config) as sess:
        saver.restore(sess, FLAGS.ckpt_file)
        pred_all = []
        for t in range(0, len(test_files), FLAGS.batch_size):
            if t % 100 == 0:
                print('processing %s: %d/%d' % (basename, t, len(test_files)))
            inputs = np.zeros(
                (FLAGS.batch_size, FLAGS.img_height, FLAGS.img_width, 3),
                dtype=np.uint8)
            # Fill the batch, resizing each image to the network input size
            for b in range(FLAGS.batch_size):
                idx = t + b
                if idx >= len(test_files):
                    break
                fh = open(test_files[idx], 'rb')
                raw_im = pil.open(fh)
                scaled_im = raw_im.resize((FLAGS.img_width, FLAGS.img_height), pil.ANTIALIAS)
                inputs[b] = np.array(scaled_im)
            pred = sfm.inference(inputs, sess, mode='depth')
            for b in range(FLAGS.batch_size):
                idx = t + b
                if idx >= len(test_files):
                    break
                pred_all.append(pred['depth'][b, :, :, 0])
        # Save all predicted depth maps as a single .npy file named after the checkpoint
        np.save(output_file, pred_all)

if __name__ == '__main__':
    tf.app.run()
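This writes all predicted depth maps to a single model-190532.npy in the output directory. For the standard depth metrics (Abs Rel, Sq Rel, RMSE, etc.), the SfMLearner repo ships an evaluation script; assuming your checkout includes kitti_eval/eval_depth.py, it can be run along these lines:

python kitti_eval/eval_depth.py --kitti_dir=/path/to/raw/kitti/dataset/ --pred_file=/path/to/output/model-190532.npy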
5、Training
train.py
from __future__ import division
import tensorflow as tf
import pprint
import random
import numpy as np
from SfMLearner import SfMLearner
import os

flags = tf.app.flags
flags.DEFINE_string("dataset_dir", "./data/kitti/processed/formatted", "Dataset directory")
flags.DEFINE_string("checkpoint_dir", "./checkpoints/", "Directory name to save the checkpoints")
flags.DEFINE_string("init_checkpoint_file", None, "Specific checkpoint file to initialize from")
flags.DEFINE_float("learning_rate", 0.0002, "Learning rate for Adam")
flags.DEFINE_float("beta1", 0.9, "Momentum term of Adam")
flags.DEFINE_float("smooth_weight", 0.5, "Weight for smoothness")
flags.DEFINE_float("explain_reg_weight", 0.0, "Weight for explainability regularization")
flags.DEFINE_integer("batch_size", 4, "The size of a sample batch")
flags.DEFINE_integer("img_height", 128, "Image height")
flags.DEFINE_integer("img_width", 416, "Image width")
flags.DEFINE_integer("seq_length", 3, "Sequence length for each example")
flags.DEFINE_integer("max_steps", 200000, "Maximum number of training iterations")
flags.DEFINE_integer("summary_freq", 100, "Logging every summary_freq iterations")
flags.DEFINE_integer("save_latest_freq", 5000,
                     "Save the latest model every save_latest_freq iterations (overwrites the previous latest model)")
flags.DEFINE_boolean("continue_train", False, "Continue training from previous checkpoint")
flags.DEFINE_integer("num_source", 2, "Number of source views (seq_length - 1)")
flags.DEFINE_integer("num_scales", 4, "Number of scales in the multi-scale loss")
FLAGS = flags.FLAGS

def main(_):
    # Fix the random seeds for reproducibility
    seed = 8964
    tf.set_random_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    pp = pprint.PrettyPrinter()
    pp.pprint(flags.FLAGS.__flags)
    if not os.path.exists(FLAGS.checkpoint_dir):
        os.makedirs(FLAGS.checkpoint_dir)
    sfm = SfMLearner()
    sfm.train(FLAGS)

if __name__ == '__main__':
    tf.app.run()
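A typical invocation, following the repo's README (the paths are placeholders):

python train.py --dataset_dir=/path/to/the/formatted/data/ --checkpoint_dir=/where/to/store/checkpoints/ --img_width=416 --img_height=128 --batch_size=4

Training progress (losses, sample predictions) can then be monitored with TensorBoard pointed at the checkpoint directory:

tensorboard --logdir=/where/to/store/checkpoints/ --port=8888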