Training information:
Training mode: distributed training, Tesla P100 × 4
2 classes
Config template: mask_rcnn_resnet101_atrous_coco_2018_01_28/pipeline.config
Uses a pre-trained model
Training steps: 20000
Error details:
Traceback (most recent call last):
File "object_detection/model_main.py", line 111, in <module>
tf.app.run()
File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/pla
_sys.exit(main(argv))
File "object_detection/model_main.py", line 107, in main
tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
return executor.run()
File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
return self.run_local()
File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
saving_listeners=saving_listeners)
File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
model_fn_results = self._model_fn(features=features, **kwargs)
File "/root/research/object_detection/model_lib.py", line 343, in model_fn
train_config.optimizer)
File "/root/research/object_detection/builders/optimizer_builder.py", line 50, in
learning_rate = _create_learning_rate(config.learning_rate)
File "/root/research/object_detection/builders/optimizer_builder.py", line 112, i
learning_rate_sequence, config.warmup)
File "/root/research/object_detection/utils/learning_schedules.py", line 160, in
raise ValueError('First step cannot be zero.')
ValueError: First step cannot be zero.
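Solution:
As the error message indicates, edit the value in the config file at the location shown in the figure below.
Change the step value from 0 to 1, then save.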
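The edit described above can also be scripted. The following is a minimal sketch, not part of the Object Detection API: the helper name fix_zero_first_step is hypothetical, and the sample config text (including its learning-rate values and the second step boundary) is an illustrative assumption rather than the contents of the actual pipeline.config. It only rewrites the first "step: 0" entry to "step: 1" and leaves every other line untouched.

```python
import re


def fix_zero_first_step(config_text):
    """Return config_text with the first `step: 0` entry changed to `step: 1`.

    Hypothetical helper for illustration; it works on the raw text of the
    config and does not parse it as a protobuf message.
    """
    # Match "step: 0" only when the value is exactly 0, so that entries such
    # as "step: 900000" or fields such as "num_steps: 0" are left alone.
    pattern = re.compile(r"(\bstep\s*:\s*)0(?![\d.])")
    return pattern.sub(r"\g<1>1", config_text, count=1)


# Illustrative fragment only; the numeric values are assumptions.
sample = """
learning_rate {
  manual_step_learning_rate {
    initial_learning_rate: 0.0003
    schedule {
      step: 0
      learning_rate: 0.0003
    }
    schedule {
      step: 900000
      learning_rate: 0.00003
    }
  }
}
"""

fixed = fix_zero_first_step(sample)
```

To apply this to a real file, read pipeline.config into a string, pass it through the helper, and write the result back. Keeping a backup of the original file is a sensible precaution, since the helper does a plain text substitution.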