玉玉大王 - CSDN Blog

[Original] Network won't train: training loss does not decrease and accuracy stays at chance level

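Possible causes and fixes:
1. The learning rate is set badly. Adjust its value, and check whether a learning-rate schedule is being applied.
2. Gradients explode or vanish. Keep exploding gradients under control and guard against vanishing gradients.
3. The learning-rate scheduler call is in the wrong place. It belongs inside the epoch for loop, not inside the training (batch) for loop; otherwise the learning rate decays every few batches, quickly approaches 0, and the network stops learning.
4. In my case, reproducing someone else's network, the cause was missing batchnorm. Early networks did not use batchnorm, and some networks from fields outside computer vision (communications, for example) are fairly simple and also lack it.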

2023-05-16 22:04:25 323 1
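Checks 2 and 3 above can be illustrated without a GPU. The stdlib-only sketch below is an assumption, not the author's code: it simulates a step-decay schedule (the closed form that PyTorch's torch.optim.lr_scheduler.StepLR follows, lr0 * gamma ** (steps // step_size)) with hypothetical values lr0 = 0.1, gamma = 0.5, step_size = 2 and 100 batches per epoch, and compares calling scheduler.step() once per epoch with calling it once per batch. It also shows clipping by global norm, the idea behind torch.nn.utils.clip_grad_norm_, as one common way to keep exploding gradients in check.

```python
import math


def step_lr(lr0: float, gamma: float, step_size: int, num_steps: int) -> float:
    """LR after num_steps scheduler.step() calls under step decay
    (same closed form as StepLR): lr0 * gamma ** (num_steps // step_size)."""
    return lr0 * gamma ** (num_steps // step_size)


def simulate(epochs: int, batches_per_epoch: int, step_per_batch: bool,
             lr0: float = 0.1, gamma: float = 0.5, step_size: int = 2) -> list:
    """Return the learning rate at the end of each epoch for one scheduler placement.
    All hyperparameter values here are hypothetical."""
    steps = 0
    lrs = []
    for _epoch in range(epochs):
        for _batch in range(batches_per_epoch):
            # forward pass, loss.backward(), optimizer.step() would go here
            if step_per_batch:
                steps += 1  # wrong: scheduler.step() inside the batch loop
        if not step_per_batch:
            steps += 1      # right: scheduler.step() once per epoch
        lrs.append(step_lr(lr0, gamma, step_size, steps))
    return lrs


def clip_by_global_norm(grads: list, max_norm: float) -> list:
    """Scale all gradients by one common factor so their joint L2 norm is at
    most max_norm (the idea behind torch.nn.utils.clip_grad_norm_)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


if __name__ == "__main__":
    print("per epoch:", simulate(5, 100, step_per_batch=False))
    print("per batch:", simulate(5, 100, step_per_batch=True))
    print("clipped:", clip_by_global_norm([3.0, 4.0], max_norm=1.0))
```

With per-epoch stepping the learning rate halves every two epochs (0.1, 0.05, 0.05, 0.025, 0.025); with per-batch stepping it has already been halved 50 times by the end of the first epoch, which matches the "quickly approaches 0" symptom in item 3.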

[Original] Error: /usr/bin/ld: cannot find crt1.o: No such file or directory /usr/bin/ld: cannot find -lgcc_s

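The build then failed with the linker errors above. This bug nearly did me in, although the fix turned out to be simple once I followed a tutorial. First, check whether the library is installed; if it is not, install it with sudo apt-get install libxxx-dev. Next, check whether gcc can find the library file. To make the setting permanent, add it to your .bashrc file. Changing ln -s to ln -sf still did not help. When I finally inspected libgcc_s.so under /usr/lib/gcc/x86_64-linux-gnu/3.4.6/, I found that it was broken.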

2023-04-24 22:39:20 5396 2
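The checks described above (is the file present, can gcc find it, is it intact) can be scripted. The sketch below is a hypothetical helper, not the author's method: classify_path and gcc_lookup are illustrative names. It relies on gcc's real -print-file-name option, which prints an absolute path when gcc finds the file in its search directories and echoes the bare name when it does not, and it flags a path that is a dangling symlink, one way a "broken" libgcc_s.so can show up. On Debian/Ubuntu, crt1.o normally comes from the libc6-dev package.

```python
import os
import shutil
import subprocess


def classify_path(path: str) -> str:
    """Return 'broken symlink', 'ok', or 'missing' for a library/object file path."""
    if os.path.islink(path) and not os.path.exists(path):
        return "broken symlink"  # the link exists but its target does not
    if os.path.exists(path):
        return "ok"
    return "missing"


def gcc_lookup(name: str, compiler: str = "gcc"):
    """Ask gcc where it would find name (e.g. 'crt1.o', 'libgcc_s.so').

    Returns gcc's answer as a string (an absolute path if found, otherwise the
    bare name echoed back), or None if the compiler is not on PATH.
    """
    if shutil.which(compiler) is None:
        return None
    result = subprocess.run([compiler, f"-print-file-name={name}"],
                            capture_output=True, text=True, check=False)
    return result.stdout.strip()


if __name__ == "__main__":
    for name in ("crt1.o", "libgcc_s.so"):
        found = gcc_lookup(name)
        if found is None:
            print("gcc not found on PATH")
            break
        if not os.path.isabs(found):
            print(f"{name}: not in gcc's search path")
        else:
            print(f"{name}: {found} ({classify_path(found)})")
```

If a file shows up as a broken symlink, running readlink -f on it shows which target is missing, which is more informative than re-running ln -sf blindly.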

[Original] Error when using TensorBoard in VS Code: We failed to start a TensorBoard session due to the following error: Command fa

We failed to start a TensorBoard session due to the following error: Command failed: conda activate pytorch && echo 'e8b39361-0157-4923-80e1-22d70d46dee6' && python /home/zhangyulan/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/printEnvV

2022-10-11 09:52:30 4701 4

[Original] module must have its parameters and buffers on device cuda:1 (device_ids[0]) but found one of them o

Error during single-machine, multi-GPU training: RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1

2022-09-29 09:25:17 1321
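The error comes from a check in torch.nn.DataParallel: when the wrapped module runs forward, every parameter and buffer must live on cuda:<device_ids[0]>. A common fix is to move the model to that device before wrapping it. The stdlib-only sketch below mirrors that rule so it runs without GPUs; the helper name misplaced_tensors and the example device mapping are hypothetical, and the real PyTorch fix is shown only as comments.

```python
def misplaced_tensors(tensor_devices: dict, device_ids: list) -> list:
    """Names of parameters/buffers that are not on cuda:<device_ids[0]>.

    tensor_devices maps a (hypothetical) parameter or buffer name to its device
    string. DataParallel requires all of them on device_ids[0]; any name returned
    here would trigger "module must have its parameters and buffers on device ...".
    """
    expected = f"cuda:{device_ids[0]}"
    return sorted(name for name, dev in tensor_devices.items() if dev != expected)


# In real PyTorch code, move the module to device_ids[0] first, then wrap it:
#   device_ids = [1, 0]
#   model = model.to(f"cuda:{device_ids[0]}")
#   model = torch.nn.DataParallel(model, device_ids=device_ids)

if __name__ == "__main__":
    devices = {"fc.weight": "cuda:0", "fc.bias": "cuda:0", "bn.running_mean": "cuda:1"}
    print(misplaced_tensors(devices, device_ids=[1, 0]))
```

The point of the sketch is that the reference device is device_ids[0], not cuda:0: with device_ids=[1, 0], a model left on cuda:0 (for example after a bare model.cuda()) fails the check.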

[Original] Prefix, infix, and postfix expressions

Prefix, infix, and postfix expressions.

2022-09-29 01:06:24 223
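The listing gives only the topic, so the following is an illustrative sketch rather than the post's content. It converts an infix expression to postfix with the shunting-yard algorithm, converts it to prefix by scanning right to left, and evaluates the postfix form with a stack. It assumes binary, left-associative + - * / with parentheses, on integer literals or identifiers (evaluation needs numbers); unary minus and exponentiation are not handled. All function names are hypothetical.

```python
import operator
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def tokenize(expr: str) -> list:
    """Split into integers, identifiers, operators and parentheses (spaces ignored)."""
    return re.findall(r"\d+|[A-Za-z]\w*|[-+*/()]", expr)


def infix_to_postfix(tokens: list) -> list:
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # left-associative: pop operators of greater or equal precedence first
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # operand goes straight to the output
    while stack:
        output.append(stack.pop())
    return output


def infix_to_prefix(tokens: list) -> list:
    # Scan right to left with parentheses swapped; pop only on strictly higher
    # precedence so left-associative operators keep their order, then reverse.
    output, stack = [], []
    for tok in reversed(tokens):
        if tok in PRECEDENCE:
            while stack and stack[-1] != ")" and PRECEDENCE[stack[-1]] > PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == ")":
            stack.append(tok)
        elif tok == "(":
            while stack[-1] != ")":
                output.append(stack.pop())
            stack.pop()
        else:
            output.append(tok)
    while stack:
        output.append(stack.pop())
    return output[::-1]


def eval_postfix(tokens: list) -> float:
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()  # the right operand is on top of the stack
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))
    return stack.pop()


if __name__ == "__main__":
    toks = tokenize("(a+b)*c")
    print(" ".join(infix_to_postfix(toks)))  # a b + c *
    print(" ".join(infix_to_prefix(toks)))   # * + a b c
    print(eval_postfix(infix_to_postfix(tokenize("(3+4)*2-10/5"))))  # 12.0
```

The precedence comparison is the only asymmetry between the two conversions: ">=" in the left-to-right pass and ">" in the right-to-left pass, which is what makes a-b-c come out as a b - c - in postfix and - - a b c in prefix.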
