My first blog post: a little record of a code-reading newbie's struggles. I don't want to slack off too much either, so a bit of progress every day; hopefully I can get most of it done this month. Leaving a trail here to console myself, and to build a good habit while I'm at it.
(Stream-of-consciousness writing, I'll just put down whatever comes to mind... as long as I can understand it myself.)
Another day of staring blankly at the OLTR code. It still doesn't run; right now I'm modifying the code from GitHub, starting by swapping in the CIFAR10 dataset. The code architecture is ridiculously convoluted.
Let me walk through how the code runs.
import argparse
from utils import source_import  # repo utility that imports a module from a file path

parser = argparse.ArgumentParser()
parser.add_argument('--config', default='./temp.py', type=str)
parser.add_argument('--test', default=False, action='store_true')
args = parser.parse_args()
test_mode = args.test
config = source_import("./temp.py").config  # note: path hardcoded here, args.config goes unused
training_opt = config['training_opt']
Reading through the parser: it really just reads in the config dict and test_mode; I deleted everything else since it isn't needed.
(Note: test_mode probably has to change; I'll just split things into two scripts.)
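source_import keeps showing up, so a note to self: it imports a module straight from a file path, which is how the config dict gets pulled out of temp.py. A minimal sketch of the idea with importlib (the repo's utils has its own version; this one is my assumption, not a copy):

import importlib.util

def source_import(file_path):
    # Load a .py file as a module so its top-level names (e.g. `config`) become attributes
    spec = importlib.util.spec_from_file_location('imported_module', file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # actually executes the file
    return module

# usage: config = source_import('./temp.py').config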
First, a look inside the training mode...
sampler_defs = training_opt['sampler']
if sampler_defs:
    sampler_dic = {'sampler': source_import(sampler_defs['def_file']).get_sampler(),
                   'num_samples_cls': sampler_defs['num_samples_cls']}
else:
    sampler_dic = None

relatin_opt = config['memory']  # this assignment was missing from my snippet; 'relatin' is the repo's own spelling
data = {x: load_data(phase=x,
                     batch_size=training_opt['batch_size'],
                     sampler_dic=sampler_dic,
                     num_workers=training_opt['num_workers'])
        for x in (['train', 'val', 'train_plain'] if relatin_opt['init_centroids'] else ['train', 'val'])}
training_model = model(config, data, test=False)
training_model.train()
There's a sampler here. A DataLoader always draws indices through some sampler: sequential by default, random when shuffle=True, and shuffle has to stay False once a custom sampler is passed in. This repo seems to define its own custom sampler (haven't figured it out yet).
Note that whether a sampler is used at all depends on training_opt, a field of the config dict, and that stage 1 doesn't need one. The sampler goes in as an argument when building data.
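To pin down how sampler_dic probably reaches the DataLoader, a sketch of what load_data presumably does inside (my guess at its shape, not the repo's exact code; the key constraint is that a custom sampler and shuffle=True are mutually exclusive in PyTorch):

from torch.utils.data import DataLoader

def load_data_sketch(dataset, batch_size, sampler_dic=None, num_workers=4):
    if sampler_dic is not None:
        # get_sampler() handed back a sampler factory; instantiate it on this dataset
        sampler = sampler_dic['sampler'](dataset, sampler_dic['num_samples_cls'])
        return DataLoader(dataset, batch_size=batch_size, shuffle=False,
                          sampler=sampler, num_workers=num_workers)
    # no custom sampler: fall back to plain random shuffling
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      num_workers=num_workers)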
What matters most is this data dict; that's where most of my changes will probably land.
Building data involves relatin_opt, which equals config['memory']; it holds two flags, centroids and init_centroids. In stage 1 both are False; for the meta-embedding stage both are True. So in stage 1, data only has two entries, train and val.
So for stage 1, data is just {'train': load_data(phase='train', batch_size=64, sampler_dic=None, num_workers=4), 'val': load_data(phase='val', batch_size=64, sampler_dic=None, num_workers=4)}. The concrete changes go into dataloader.py (mainly the return values; the GitHub code also writes a .txt log there that can be deleted).
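One thing I noticed for the CIFAR10 swap: the train loop later unpacks three values per batch, (inputs, labels, _), so whatever Dataset I substitute in has to return an index as the third element. A minimal wrapper over torchvision's CIFAR10 (my own sketch; the class name is made up):

from torchvision import datasets

class CIFAR10WithIndex(datasets.CIFAR10):
    """CIFAR10 whose samples come back as (image, label, index) triples."""
    def __getitem__(self, index):
        img, label = super().__getitem__(index)
        return img, label, index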
OK. Then training_model is model() called with three arguments: config, data, and test=False.
The main event is runnet.py, and I genuinely can't read it...
self.device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
self.config = config
self.training_opt = self.config['training_opt']
self.memory = self.config['memory']
self.data = data
self.test_mode = test

# Initialize model
self.init_models()

# Under training mode, initialize training steps, optimizers,
# schedulers, criterions, and centroids
if not self.test_mode:
    # If using steps for training, we need to calculate training steps
    # for each epoch based on actual number of training data instead of
    # oversampled data number
    print('Using steps for training.')
    self.training_data_num = len(self.data['train'].dataset)
    self.epoch_steps = int(self.training_data_num / self.training_opt['batch_size'])

    # Initialize model optimizer and scheduler
    print('Initializing model optimizer.')
    self.scheduler_params = self.training_opt['scheduler_params']
    self.model_optimizer, \
    self.model_optimizer_scheduler = self.init_optimizers(self.model_optim_params_list)
    self.init_criterions()
    if self.memory['init_centroids']:  # only when the centroids need initializing (stage 2)
        self.criterions['FeatureLoss'].centroids.data = \
            self.centroids_cal(self.data['train_plain'])
A look at this init code: it stores config, training_opt, data, and test (which equals False here).
Train mode has to compute the steps for training: the dataset size and the per-epoch step count (dataset size divided by batch size), plus the overall learning-rate/scheduler params; it also initializes the optimizer params and the loss functions. Stage 1 doesn't need to initialize centroids; stage 2 does.
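I haven't read centroids_cal itself yet, but conceptually initializing the centroids should just mean averaging each class's features over the train_plain loader. A sketch of that idea (my reconstruction, not the repo's code; assumes the feature model returns one feature vector per sample):

import torch

def centroids_cal_sketch(feat_model, loader, num_classes, feat_dim, device):
    centroids = torch.zeros(num_classes, feat_dim, device=device)
    counts = torch.zeros(num_classes, device=device)
    feat_model.eval()
    with torch.no_grad():
        for inputs, labels, _ in loader:
            feats = feat_model(inputs.to(device))  # per-sample feature vectors
            for feat, label in zip(feats, labels):
                centroids[label] += feat
                counts[label] += 1
    return centroids / counts.unsqueeze(1)  # per-class mean feature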
The init_models part initializes the model. The networks dict in the config holds the feature network itself plus the classifier, and each entry consists of basic params and optimizer params. The stage1_weights param needs a closer look; stage 1 doesn't use it. Here the network model and the classifier are create()-d straight from those config params (which seemed weird at first, I couldn't see why that works?? see the sketch below).
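If I'm reading it right, the create() "magic" is just the dynamic-import trick again: every def_file listed under the networks config is a module that exposes a create() factory, so init_models can build each part purely from config. Roughly (my reconstruction; the exact key names are assumptions):

networks = {}
for key, val in config['networks'].items():
    def_file = val['def_file']   # path to a module, e.g. the feat_model or classifier definition
    model_args = val['params']   # kwargs handed to the factory
    # the dynamically imported module must define create()
    networks[key] = source_import(def_file).create(**model_args)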
The fix part I don't really get yet. I'll come back and fill it in later.
def train(self):
    # When training the network
    print_str = ['Phase: train']
    time.sleep(0.25)

    # Initialize best model
    best_model_weights = {}
    best_model_weights['feat_model'] = copy.deepcopy(self.networks['feat_model'].state_dict())
    best_model_weights['classifier'] = copy.deepcopy(self.networks['classifier'].state_dict())
    best_acc = 0.0
    best_epoch = 0
    best_centroids = self.centroids  # was missing: referenced at save time even if no epoch improves
    end_epoch = self.training_opt['num_epochs']

    # Loop over epochs
    for epoch in range(1, end_epoch + 1):
        for model in self.networks.values():
            model.train()
        torch.cuda.empty_cache()

        # Iterate over dataset
        for step, (inputs, labels, _) in enumerate(self.data['train']):
            # Break when step equal to epoch step
            if step == self.epoch_steps:
                break
            inputs, labels = inputs.to(self.device), labels.to(self.device)

            # If on training phase, enable gradients
            with torch.set_grad_enabled(True):
                # If training, forward with loss, and no top 5 accuracy calculation
                self.batch_forward(inputs, labels,
                                   centroids=self.memory['centroids'],
                                   phase='train')
                self.batch_loss(labels)
                self.batch_backward()

                # Output minibatch training results
                if step % self.training_opt['display_step'] == 0:
                    minibatch_loss_feat = self.loss_feat.item() \
                        if 'FeatureLoss' in self.criterions.keys() else None
                    minibatch_loss_perf = self.loss_perf.item()
                    _, preds = torch.max(self.logits, 1)
                    minibatch_acc = mic_acc_cal(preds, labels)
                    print_str = ['Epoch: [%d/%d]'
                                 % (epoch, self.training_opt['num_epochs']),
                                 'Step: %5d'
                                 % (step),
                                 'Minibatch_loss_feature: %.3f'
                                 % (minibatch_loss_feat) if minibatch_loss_feat else '',
                                 'Minibatch_loss_performance: %.3f'
                                 % (minibatch_loss_perf),
                                 'Minibatch_accuracy_micro: %.3f'
                                 % (minibatch_acc)]

        # Set model modes and set scheduler
        # In training, step optimizer scheduler once per epoch
        self.model_optimizer_scheduler.step()
        if self.criterion_optimizer:
            self.criterion_optimizer_scheduler.step()

        # After every epoch, validation
        self.eval(phase='val')

        # Under validation, the best model needs to be updated
        if self.eval_acc_mic_top1 > best_acc:
            best_epoch = copy.deepcopy(epoch)
            best_acc = copy.deepcopy(self.eval_acc_mic_top1)
            best_centroids = copy.deepcopy(self.centroids)
            best_model_weights['feat_model'] = copy.deepcopy(self.networks['feat_model'].state_dict())
            best_model_weights['classifier'] = copy.deepcopy(self.networks['classifier'].state_dict())

    print()
    print('Training Complete.')
    print_str = ['Best validation accuracy is %.3f at epoch %d' % (best_acc, best_epoch)]
    print(print_str, "*********")

    # Save the best model and best centroids if calculated
    self.save_model(epoch, best_epoch, best_model_weights, best_acc, centroids=best_centroids)
    print('Done')
The train method here is also pretty important.
In train mode, batch_forward is called with (inputs, labels, centroids=self.memory['centroids'], phase='train'); in stage 1 that centroids flag is False.
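My rough reconstruction of what batch_forward boils down to in stage 1, pieced together from the call above; don't take the signature as gospel:

def batch_forward(self, inputs, labels=None, centroids=False, phase='train'):
    # Backbone: extract features for the batch
    self.features, self.feature_maps = self.networks['feat_model'](inputs)
    # Stage 1: centroids is False, so the classifier gets no centroid/memory input
    self.centroids = self.criterions['FeatureLoss'].centroids.data if centroids else None
    # Classifier: turn features (plus centroids in stage 2) into logits
    self.logits, self.direct_memory_feature = self.networks['classifier'](self.features, self.centroids)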
---------------------------------------------------------------------------------------------------------------------------------
Dividing line: 11.05