TensorFlow Made Simple 8 - Pedestrian Segmentation

I. Environment Setup

       This article shows how to implement pedestrian detection and segmentation with Mask R-CNN; it assumes you already have some familiarity with detection frameworks such as SSD, YOLO, and Faster R-CNN.

1. Prepare the TensorFlow environment

     TensorFlow (>= 1.0.0)

     NumPy

2. GitHub code

     Code download: GitHub

3. Download the COCO dataset

     Download link: http://mscoco.org/dataset/#download

      The official web download is rather slow; alternatively, you can fetch the data from my Baidu Netdisk share.

4. Download ResNet50

    wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz

      Extract the archive to obtain resnet_v1_50.ckpt.
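      If you prefer to do the extraction from Python, the following is a minimal sketch (not from the original post) that unpacks the archive with the standard library and checks that the checkpoint file is present:

    # Minimal sketch: extract the ResNet-50 archive and confirm the checkpoint exists.
    import os
    import tarfile

    with tarfile.open("resnet_v1_50_2016_08_28.tar.gz", "r:gz") as tar:
        tar.extractall(".")

    assert os.path.exists("resnet_v1_50.ckpt"), "checkpoint not found after extraction"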


II. Building and Running the Code

       You can follow the GitHub README to build the code; the workflow is summarized below.

1. Build the COCO tools with make

    cd ./libs/datasets/pycocotools
    make

2. Place the downloaded COCO data under ./data and convert it into the format TensorFlow expects.

    Following the README:

        a) Create a coco folder under data and copy the five files specified there into it;

        b) Unzip the archives;

        c) Create an output/mask_rcnn folder in the repository root for storing logs;

        d) Run the conversion script (it takes roughly an hour); a quick sanity check is sketched after the command below.

    python download_and_convert_data.py
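    After the script finishes, it can be reassuring to confirm that TFRecord files were actually produced. The snippet below is a minimal sanity check, not part of the original repo; the search directory and the .tfrecord file pattern are assumptions, so adjust them to match the files the conversion script actually writes.

    # Hypothetical sanity check: count examples in the generated TFRecord files.
    # The glob pattern below is an assumption; adjust it to the real output path.
    import glob
    import tensorflow as tf

    for path in glob.glob("data/coco/*.tfrecord"):
        n = sum(1 for _ in tf.python_io.tf_record_iterator(path))
        print("{}: {} records".format(path, n))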

    The script may complain that some Python libraries are missing; that's fine, just install them. You can switch pip to a domestic mirror to speed things up, e.g. pip install -i https://pypi.tuna.tsinghua.edu.cn/simple Pillow

    sudo pip install Pillow   # PIL itself is not on PyPI; Pillow is the maintained fork
    sudo pip install scikit-image
    sudo apt-get install python3-tk
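    Once installed, a quick import check (a trivial sketch, not from the original post) confirms the packages are usable:

    # Verify the newly installed packages can be imported.
    import PIL
    import skimage

    print("Pillow OK:", PIL.__file__)
    print("scikit-image OK:", skimage.__version__)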

3. Pretrained ResNet model

    Create a pretrained_models directory under data and place resnet_v1_50.ckpt inside it.
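    To make sure the checkpoint sits in the right place and is readable, a minimal sketch (an assumption, not part of the repo) using TensorFlow's checkpoint reader looks like this:

    # Verify that the pretrained checkpoint can be read from its expected location.
    import tensorflow as tf

    reader = tf.train.NewCheckpointReader("data/pretrained_models/resnet_v1_50.ckpt")
    shapes = reader.get_variable_to_shape_map()
    print("{} variables, for example:".format(len(shapes)))
    for name in sorted(shapes)[:5]:
        print("  {} {}".format(name, shapes[name]))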

4. Build with make

    cd ./libs
    make

5. Train the model

    python ../train/train.py

        
Once you see the loss being printed, training has started successfully. Be patient and let the loss decrease gradually; the original author reported that training took 32 hours on 8 GPUs.


III. Training Results

       The log files generated during training are stored under the output directory.

       Point TensorBoard's logdir at that directory (for example, tensorboard --logdir=./output/mask_rcnn) to inspect the training progress.

       You can see how the loss evolves over the whole run, which is quite satisfying. Note that the loss may oscillate during training; that is normal, as long as it gradually trends downward.
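       If you want to pull the loss curve out of the event files programmatically instead of through the TensorBoard UI, a minimal sketch (the event-file path pattern and the tag filter are assumptions; check the tags TensorBoard actually shows for your run) could look like this:

    # Hypothetical sketch: read scalar summaries directly from the event files.
    import glob
    import tensorflow as tf

    for events_file in glob.glob("output/mask_rcnn/events.out.tfevents.*"):
        for event in tf.train.summary_iterator(events_file):
            for value in event.summary.value:
                if "loss" in value.tag:
                    print(event.step, value.tag, value.simple_value)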

       You can also take a look at the generated graph under the Graphs tab.

IV. Running a Demo

       The GitHub repository does not explain how to run a demo, so we need to find or write a script ourselves.

       You can refer to the demo.py from the previous article in this series and write your own; I won't give concrete code here, so feel free to improvise. A rough sketch of what such a script might look like follows.
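       The sketch below is only a starting point and is NOT the repository's own demo code: build_network() stands in for whatever function in ./libs constructs the Mask R-CNN graph, and the returned box/class/mask tensors are placeholders; both must be replaced with the repository's actual API. Only the checkpoint-restoring boilerplate uses standard TensorFlow 1.x calls.

    # Hypothetical demo sketch -- build_network() and its outputs are placeholders.
    import numpy as np
    import tensorflow as tf
    from skimage import io

    def main(image_path):
        image = io.imread(image_path).astype(np.float32)

        image_ph = tf.placeholder(tf.float32, shape=[None, None, 3])
        # Hypothetical builder returning box, class and mask tensors; replace with
        # the graph-construction function the repository actually provides.
        boxes, classes, masks = build_network(image_ph, is_training=False)

        saver = tf.train.Saver()
        with tf.Session() as sess:
            ckpt = tf.train.latest_checkpoint("output/mask_rcnn")
            saver.restore(sess, ckpt)
            b, c, m = sess.run([boxes, classes, masks], feed_dict={image_ph: image})
            print("detections:", b.shape, c.shape, m.shape)

    if __name__ == "__main__":
        main("test.jpg")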