Installing PyTorch
Create a new environment in Anaconda:
conda create -n pytorch python=3.7
Checking from cmd shows that the CUDA version is 10.2.
Note: CUDA 10.2 supports PyTorch 1.5.
PyTorch version | CUDA version
0.4.1, 1.2.0, 1.4.0, 1.5.0(1), 1.6.0, 1.7.0(1) | 9.2
1.2.0, 1.1.0, 1.0.0(1) | 10.0
1.4.0, 1.5.0(1), 1.6.0, 1.7.0(1) | 10.1
1.5.0(1), 1.6.0, 1.7.0(1), 1.8.0(1), 1.9.0, 1.10.0 | 10.2
1.7.0(1) | 11.0
1.8.0(1), 1.9.0, 1.10.0 | 11.1
1.8.0(1), 1.9.0, 1.10.0 | 11.3
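To avoid rereading the table every time, it can be turned into a small lookup. This is only a sketch: the dictionary is transcribed from the table above (the "(1)" suffix is kept exactly as written there), and the function names are my own, not part of PyTorch.

```python
# Compatibility table transcribed from the notes; version strings are kept verbatim.
CUDA_TO_PYTORCH = {
    "9.2": ["0.4.1", "1.2.0", "1.4.0", "1.5.0(1)", "1.6.0", "1.7.0(1)"],
    "10.0": ["1.2.0", "1.1.0", "1.0.0(1)"],
    "10.1": ["1.4.0", "1.5.0(1)", "1.6.0", "1.7.0(1)"],
    "10.2": ["1.5.0(1)", "1.6.0", "1.7.0(1)", "1.8.0(1)", "1.9.0", "1.10.0"],
    "11.0": ["1.7.0(1)"],
    "11.1": ["1.8.0(1)", "1.9.0", "1.10.0"],
    "11.3": ["1.8.0(1)", "1.9.0", "1.10.0"],
}


def pytorch_versions_for(cuda_version):
    """Return the PyTorch versions listed for a CUDA version, or [] if none."""
    return CUDA_TO_PYTORCH.get(cuda_version, [])


def cuda_versions_for(pytorch_version):
    """Return every CUDA version whose row lists the given PyTorch version."""
    return [cuda for cuda, versions in CUDA_TO_PYTORCH.items()
            if pytorch_version in versions]


print(pytorch_versions_for("11.3"))
print(cuda_versions_for("1.10.0"))
```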
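I tried many versions and finally got a working environment by following a linked guide on the correct download method for Windows. But then it turned out that the GPU could not be used: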
UserWarning:
NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
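That was truly frustrating.
It turned out that the server's GPU is an RTX 3060 (sm_86), which only works with CUDA 11.x builds, so I reinstalled the environment.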
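The warning above lists the architectures the installed build was compiled for, and sm_86 is not among them. The sketch below reproduces that comparison as a simplified exact-match check (PyTorch's own check is more nuanced). The helper names are mine; torch.cuda.get_device_capability and torch.cuda.get_arch_list are real calls, but the latter only exists in reasonably recent PyTorch releases, so treat the guarded part as an assumption about your install.

```python
def capability_to_arch(capability):
    """Turn a (major, minor) compute capability such as (8, 6) into 'sm_86'."""
    major, minor = capability
    return "sm_{}{}".format(major, minor)


def is_gpu_supported(capability, arch_list):
    """Simplified check: is the GPU's exact architecture in the build's list?"""
    return capability_to_arch(capability) in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print(torch.cuda.get_device_name(0), capability_to_arch(cap),
                  "supported:", is_gpu_supported(cap, archs))
        else:
            print("CUDA is not available (CPU-only build or driver problem)")
```

If torch.version.cuda prints None, the installed package is a CPU-only build.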
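Careful: if conda is switched to the Tsinghua mirror, it installs the CPU-only build by default. Infuriating!
Plain conda also installed the CPU-only build by default!
I tried conda uninstall cpuonly, but it did not help.
In the end I looked up the PyTorch and torchvision versions that match CUDA 11.3, downloaded the corresponding whl files from the whl site, and installed them with pip, which worked.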
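Recommended installation method on Windows
1. Check which CUDA version the machine supports, then look up on the official PyTorch website which files need to be downloaded for that version.
2. On the download page, use Ctrl+F to find those files and download the ones that match your Python version and Windows.
3. In the virtual environment, locate the downloaded whl file and install it: pip install E:/torch-1.8.0+cu101-cp37-cp37m-win_amd64.whl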
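Picking the wrong whl file is easy, so it can help to decode the file name before installing. The sketch below splits a name of the form name-version-python-abi-platform, as in the example above; the helper names are hypothetical and wheels with an extra build tag are not handled. Note that the example file carries cu101 (CUDA 10.1), whereas the RTX 3060 in these notes ended up needing a CUDA 11.3 build.

```python
import sys


def parse_wheel_name(filename):
    """Split 'name-version-python-abi-platform.whl' into its parts."""
    stem = filename.replace("\\", "/").rsplit("/", 1)[-1]
    if not stem.endswith(".whl"):
        raise ValueError("not a .whl file: " + filename)
    parts = stem[:-len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("expected name-version-python-abi-platform")
    name, version, python_tag, abi_tag, platform_tag = parts
    # The part after '+' is the local build label, e.g. 'cu101' or 'cpu'.
    base_version, _, build = version.partition("+")
    return {"name": name, "version": base_version, "build": build,
            "python": python_tag, "abi": abi_tag, "platform": platform_tag}


def matches_interpreter(info, version_info=None):
    """Check the wheel's CPython tag against a (major, minor) version."""
    if version_info is None:
        version_info = sys.version_info
    return info["python"] == "cp{}{}".format(version_info[0], version_info[1])


info = parse_wheel_name("E:/torch-1.8.0+cu101-cp37-cp37m-win_amd64.whl")
print(info)
print("matches Python 3.7:", matches_interpreter(info, (3, 7)))
```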
Then just import the conda environment into PyCharm. To build and install the project's setup.py, use the following commands:
cd lib
python setup.py build develop
Windows 10 may run into problems at this step; see the linked reference.
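Code
Studying Faster R-CNN
The input image is processed into a shared feature map. Region proposals are then generated and used to crop regions out of the shared feature map, the crops are resized to a fixed size by the RoI pooling layer, and finally classification and box regression are performed.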
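To make the RoI pooling step concrete, here is a deliberately simplified pure-Python sketch: one channel, integer box coordinates already expressed on the feature map, and max pooling over a fixed output grid. It is not the layer used by the repository, only an illustration of how proposals of different sizes end up with the same output shape.

```python
import math


def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool region roi=(x1, y1, x2, y2), inclusive, of a 2-D feature map
    (a list of rows) into an out_h x out_w grid."""
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Floor the start and ceil the end so the bins cover the whole region.
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


feature = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
print(roi_max_pool(feature, (0, 0, 3, 3), 2, 2))  # 4x4 proposal -> 2x2
print(roi_max_pool(feature, (1, 1, 3, 3), 2, 2))  # 3x3 proposal -> 2x2
```

Both proposals come out as 2x2 grids, which is what lets the later classification and regression layers accept a fixed-size input.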
Run the original program first
Use the command:
python trainval_net.py --dataset pascal_voc --net res101 --bs 4 --nw 0 --lr 0.001 --lr_decay_step 5 --cuda
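- dataset: the dataset
- net: the pretrained backbone model
- bs: batch size
- epoch: number of training epochs
- nw: number of data-loading workers (worker number); chosen according to the machine's capability, here an RTX 3060 machine
- lr: learning rate
- cuda: use the GPU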
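The flags above can be mirrored with argparse to see how the command line is interpreted. This is a hypothetical stand-in, not the parser from trainval_net.py: the flag names come from the command above, while the defaults and help texts are my assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser that mirrors the flags used in the command above."""
    parser = argparse.ArgumentParser(description="Sketch of the training flags")
    parser.add_argument("--dataset", default="pascal_voc", help="dataset name")
    parser.add_argument("--net", default="res101", help="pretrained backbone")
    parser.add_argument("--bs", type=int, default=1, help="batch size")
    parser.add_argument("--nw", type=int, default=0,
                        help="number of data-loading workers")
    parser.add_argument("--lr", type=float, default=0.001, help="learning rate")
    parser.add_argument("--lr_decay_step", type=int, default=5,
                        help="epochs between learning-rate decays")
    parser.add_argument("--cuda", action="store_true", help="use the GPU")
    return parser


command = ("--dataset pascal_voc --net res101 --bs 4 --nw 0 "
           "--lr 0.001 --lr_decay_step 5 --cuda")
args = build_parser().parse_args(command.split())
print(args)
```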
Training produced the following output:
before filtering, there are 372 images...
after filtering, there are 372 images...
372 roidb entries
Loading pretrained weights from data/pretrained_model/resnet101_caffe.pth
[session 1][epoch 1][iter 0/ 93] loss: 2.6911, lr: 1.00e-03
fg/bg=(4/508), time cost: 4.584736
rpn_cls: 0.6750, rpn_box: 0.3954, rcnn_cls: 1.6207, rcnn_box 0.0001
save model: models/res101/pascal_voc\faster_rcnn_1_1_92.pth
[session 1][epoch 2][iter 0/ 93] loss: 0.2855, lr: 1.00e-03
fg/bg=(25/487), time cost: 0.477722
rpn_cls: 0.0302, rpn_box: 0.0304, rcnn_cls: 0.1243, rcnn_box 0.1005
save model: models/res101/pascal_voc\faster_rcnn_1_2_92.pth
[session 1][epoch 3][iter 0/ 93] loss: 0.1132, lr: 1.00e-03
fg/bg=(24/488), time cost: 0.416885
rpn_cls: 0.0205, rpn_box: 0.0204, rcnn_cls: 0.0122, rcnn_box 0.0601
save model: models/res101/pascal_voc\faster_rcnn_1_3_92.pth
[session 1][epoch 4][iter 0/ 93] loss: 0.0965, lr: 1.00e-03
fg/bg=(23/489), time cost: 0.437829
rpn_cls: 0.0324, rpn_box: 0.0091, rcnn_cls: 0.0194, rcnn_box 0.0356
save model: models/res101/pascal_voc\faster_rcnn_1_4_92.pth
[session 1][epoch 5][iter 0/ 93] loss: 0.1008, lr: 1.00e-03
fg/bg=(27/485), time cost: 0.449797
rpn_cls: 0.0060, rpn_box: 0.0174, rcnn_cls: 0.0103, rcnn_box 0.0671
save model: models/res101/pascal_voc\faster_rcnn_1_5_92.pth
[session 1][epoch 6][iter 0/ 93] loss: 0.0615, lr: 1.00e-04
fg/bg=(19/493), time cost: 0.454784
rpn_cls: 0.0099, rpn_box: 0.0105, rcnn_cls: 0.0074, rcnn_box 0.0338
save model: models/res101/pascal_voc\faster_rcnn_1_6_92.pth
[session 1][epoch 7][iter 0/ 93] loss: 0.0540, lr: 1.00e-04
fg/bg=(27/485), time cost: 0.462762
rpn_cls: 0.0059, rpn_box: 0.0094, rcnn_cls: 0.0085, rcnn_box 0.0302
save model: models/res101/pascal_voc\faster_rcnn_1_7_92.pth
[session 1][epoch 8][iter 0/ 93] loss: 0.0794, lr: 1.00e-04
fg/bg=(33/479), time cost: 0.432842
rpn_cls: 0.0219, rpn_box: 0.0145, rcnn_cls: 0.0143, rcnn_box 0.0287
save model: models/res101/pascal_voc\faster_rcnn_1_8_92.pth
[session 1][epoch 9][iter 0/ 93] loss: 0.0679, lr: 1.00e-04
fg/bg=(27/485), time cost: 0.445808
rpn_cls: 0.0190, rpn_box: 0.0068, rcnn_cls: 0.0230, rcnn_box 0.0192
save model: models/res101/pascal_voc\faster_rcnn_1_9_92.pth
[session 1][epoch 10][iter 0/ 93] loss: 0.0542, lr: 1.00e-04
fg/bg=(24/488), time cost: 0.441818
rpn_cls: 0.0106, rpn_box: 0.0099, rcnn_cls: 0.0077, rcnn_box 0.0261
save model: models/res101/pascal_voc\faster_rcnn_1_10_92.pth
[session 1][epoch 11][iter 0/ 93] loss: 0.0560, lr: 1.00e-04
fg/bg=(28/484), time cost: 0.462763
rpn_cls: 0.0093, rpn_box: 0.0054, rcnn_cls: 0.0216, rcnn_box 0.0197
save model: models/res101/pascal_voc\faster_rcnn_1_11_92.pth
[session 1][epoch 12][iter 0/ 93] loss: 0.0697, lr: 1.00e-05
fg/bg=(26/486), time cost: 0.440820
rpn_cls: 0.0082, rpn_box: 0.0097, rcnn_cls: 0.0138, rcnn_box 0.0380
save model: models/res101/pascal_voc\faster_rcnn_1_12_92.pth
[session 1][epoch 13][iter 0/ 93] loss: 0.1341, lr: 1.00e-05
fg/bg=(31/481), time cost: 0.453786
rpn_cls: 0.0145, rpn_box: 0.0071, rcnn_cls: 0.0487, rcnn_box 0.0638
save model: models/res101/pascal_voc\faster_rcnn_1_13_92.pth
[session 1][epoch 14][iter 0/ 93] loss: 0.0573, lr: 1.00e-05
fg/bg=(28/484), time cost: 0.447803
rpn_cls: 0.0107, rpn_box: 0.0090, rcnn_cls: 0.0094, rcnn_box 0.0283
save model: models/res101/pascal_voc\faster_rcnn_1_14_92.pth
[session 1][epoch 15][iter 0/ 93] loss: 0.0896, lr: 1.00e-05
fg/bg=(38/474), time cost: 0.462762
rpn_cls: 0.0207, rpn_box: 0.0111, rcnn_cls: 0.0113, rcnn_box 0.0465
save model: models/res101/pascal_voc\faster_rcnn_1_15_92.pth
[session 1][epoch 16][iter 0/ 93] loss: 0.0543, lr: 1.00e-05
fg/bg=(29/483), time cost: 0.493679
rpn_cls: 0.0124, rpn_box: 0.0088, rcnn_cls: 0.0054, rcnn_box 0.0276
save model: models/res101/pascal_voc\faster_rcnn_1_16_92.pth
[session 1][epoch 17][iter 0/ 93] loss: 0.0622, lr: 1.00e-05
fg/bg=(36/476), time cost: 0.444810
rpn_cls: 0.0098, rpn_box: 0.0061, rcnn_cls: 0.0086, rcnn_box 0.0377
save model: models/res101/pascal_voc\faster_rcnn_1_17_92.pth
[session 1][epoch 18][iter 0/ 93] loss: 0.0713, lr: 1.00e-06
fg/bg=(28/484), time cost: 0.446805
rpn_cls: 0.0123, rpn_box: 0.0098, rcnn_cls: 0.0149, rcnn_box 0.0343
save model: models/res101/pascal_voc\faster_rcnn_1_18_92.pth
[session 1][epoch 19][iter 0/ 93] loss: 0.0528, lr: 1.00e-06
fg/bg=(30/482), time cost: 0.447802
rpn_cls: 0.0094, rpn_box: 0.0084, rcnn_cls: 0.0031, rcnn_box 0.0319
save model: models/res101/pascal_voc\faster_rcnn_1_19_92.pth
[session 1][epoch 20][iter 0/ 93] loss: 0.1103, lr: 1.00e-06
fg/bg=(30/482), time cost: 0.461778
rpn_cls: 0.0194, rpn_box: 0.0064, rcnn_cls: 0.0257, rcnn_box 0.0588
save model: models/res101/pascal_voc\faster_rcnn_1_20_92.pth
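The log above is easier to read once the loss and learning rate are pulled out per epoch; in it, the learning rate drops tenfold at epochs 6, 12 and 18. A minimal sketch with a regular expression written for exactly this line format (the function names are mine, and the sample lines are copied from the log):

```python
import re

LINE_RE = re.compile(
    r"\[session\s+(\d+)\]\[epoch\s+(\d+)\]\[iter\s+(\d+)/\s*(\d+)\]"
    r" loss: ([\d.]+), lr: ([\d.eE+-]+)"
)


def parse_log(text):
    """Extract epoch, loss and learning rate from each '[session ...]' line."""
    return [{"epoch": int(m.group(2)),
             "loss": float(m.group(5)),
             "lr": float(m.group(6))}
            for m in LINE_RE.finditer(text)]


def lr_drop_epochs(records):
    """Epochs at which the learning rate is lower than in the previous record."""
    return [cur["epoch"] for prev, cur in zip(records, records[1:])
            if cur["lr"] < prev["lr"]]


sample = """[session 1][epoch 1][iter 0/ 93] loss: 2.6911, lr: 1.00e-03
[session 1][epoch 5][iter 0/ 93] loss: 0.1008, lr: 1.00e-03
[session 1][epoch 6][iter 0/ 93] loss: 0.0615, lr: 1.00e-04
[session 1][epoch 12][iter 0/ 93] loss: 0.0697, lr: 1.00e-05
"""
records = parse_log(sample)
for rec in records:
    print(rec)
print("lr drops at epochs:", lr_drop_epochs(records))
```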
Visualization
python -m visdom.server
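Data processing
Data format
- JPEGImages: image files
- Annotations: label files, mainly the <object> elements
- ImageSets: separates the training, validation and test sets
  Main
    test.txt: file names of the test set, used for the mAP calculation
    train.txt: file names used for training
    val.txt: file names used for validation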
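As a quick way to inspect a label file, the <object> elements can be read with the standard library. This is a sketch under assumptions: the XML below is a made-up sample in the usual VOC layout (name plus bndbox with xmin, ymin, xmax, ymax), not one of my real annotations, and read_objects is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the usual VOC layout, for illustration only.
SAMPLE_XML = """<annotation>
  <filename>000001.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8.0</xmin><ymin>12.0</ymin><xmax>352.0</xmax><ymax>498.0</ymax></bndbox>
  </object>
</annotation>"""


def read_objects(xml_text):
    """Return a list of (class name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        # int(float(...)) tolerates coordinates written as '8.0'.
        coords = tuple(int(float(box.findtext(tag)))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.findtext("name"), coords))
    return objects


for name, coords in read_objects(SAMPLE_XML):
    print(name, coords)
```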
Using my own dataset
Problems during training
After 4 epochs:
Traceback (most recent call last):
  File "d:\Users\hid\anaconda3\envs\fasterrcnn\lib\site-packages\torch\serialization.py", line 379, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "d:\Users\hid\anaconda3\envs\fasterrcnn\lib\site-packages\torch\serialization.py", line 499, in _save
    zip_file.write_record(name, storage.data_ptr(), num_bytes)
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "trainval_net.py", line 381, in <module>
    }, save_name)
  File "C:\Users\hid\PycharmProjects\fasterfcnn\lib\model\utils\net_utils.py", line 69, in save_checkpoint
    torch.save(state, filename)
  File "d:\Users\hid\anaconda3\envs\fasterrcnn\lib\site-packages\torch\serialization.py", line 380, in save
    return
  File "d:\Users\hid\anaconda3\envs\fasterrcnn\lib\site-packages\torch\serialization.py", line 259, in __exit__
    self.file_like.write_end_of_file()
RuntimeError: [enforce fail at …\caffe2\serialize\inline_container.cc:300] . unexpected pos 576476992 vs 576476880
Run nvidia-smi to check memory usage and look for leftover processes:
| 773MiB / 12288MiB |
kill -9 PID
Or the C: drive really is full and the model can no longer be saved; freeing up space on C: fixes it.
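Since a checkpoint is written after every epoch, the disk fills up quickly; the failed write above stopped at roughly 576 MB. A small guard before saving can catch this early. The sketch uses shutil.disk_usage from the standard library; the helper names, the 600 MiB threshold and the example path are assumptions of mine, not values from the repository.

```python
import os
import shutil


def free_bytes(path):
    """Free space on the drive that would hold `path` (which need not exist yet)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Walk up until an existing directory is found.
    while not os.path.isdir(directory):
        parent = os.path.dirname(directory)
        if parent == directory:
            break
        directory = parent
    return shutil.disk_usage(directory).free


def has_enough_space(path, required_bytes):
    """True if at least required_bytes are free where `path` will be written."""
    return free_bytes(path) >= required_bytes


REQUIRED = 600 * 1024 ** 2  # assumed checkpoint size, about 600 MiB
save_name = os.path.join("models", "res101", "pascal_voc", "faster_rcnn_1_5_92.pth")
if has_enough_space(save_name, REQUIRED):
    print("enough space to save", save_name)
else:
    print("not enough space; clean up old checkpoints first")
```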
Problems with test_net.py
- On Windows 10, training or testing the Faster R-CNN model fails with IOError: [Errno 2] No such file or directory: '000001.xml' (the linked fix was written for Caffe).
In pascal_voc.py, change annopath to the following:
annopath = os.path.join(self._devkit_path, 'VOC' + self._year, 'Annotations', '{}.xml')
- ValueError: invalid literal for int() with base 10
Convert the value to a float first, then to an integer: int(float(a))
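The ValueError appears because int() refuses strings that contain a decimal point, which is how some annotation tools write coordinates. A minimal illustration (to_int is a hypothetical helper name); note that int(float(...)) truncates toward zero rather than rounding:

```python
def to_int(text):
    """Parse '195', '48.0' or ' 45.7 ' into an int, truncating toward zero."""
    return int(float(text))


try:
    int("48.0")
except ValueError as err:
    print("int('48.0') fails:", err)

print(to_int("48.0"))
print(to_int("195"))
print(to_int("45.7"))
```

If rounding is preferred over truncation, int(round(float(text))) is one alternative.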
Common commands
# Train
python trainval_net.py --dataset pascal_voc --net vgg16 --bs 4 --nw 0 --lr 0.001 --lr_decay_step 5 --cuda
# Test
python test_net.py --dataset pascal_voc --net vgg16 --checksession 1 --checkepoch 12 --checkpoint 92 --cuda
# Demo
python demo.py --net vgg16 --checksession 1 --checkepoch 11 --checkpoint 92 --cuda --load_dir models
# or
python demo.py --net vgg16 --checksession 1 --checkepoch 20 --checkpoint 233 --cuda --load_dir '/Users/hid/PycharmProjects/fasterfcnn/models/'
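The three check flags line up with the checkpoint file names seen in the training log (for example faster_rcnn_1_20_92.pth for session 1, epoch 20, checkpoint 92). The sketch below builds such a path; the directory layout load_dir/net/dataset is inferred from the "save model" log lines, and whether test_net.py and demo.py assemble the path in exactly this way is an assumption.

```python
import os


def checkpoint_path(load_dir, net, dataset, session, epoch, checkpoint):
    """Build the checkpoint path implied by the 'save model' log lines."""
    name = "faster_rcnn_{}_{}_{}.pth".format(session, epoch, checkpoint)
    return os.path.join(load_dir, net, dataset, name)


path = checkpoint_path("models", "res101", "pascal_voc", 1, 20, 92)
print(path)
print("exists:", os.path.isfile(path))
```

Checking os.path.isfile before launching the test or demo makes a wrong --checkepoch or --checkpoint value obvious right away.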