The Road of Slacking Off (Part 1)

First blog post, a quick record of a rookie's pain while reading code. I don't want to slack off too much, so: a little progress every day. Hopefully I can finish most of it this month. Leaving a footprint here to console myself, and to build a good habit along the way.

(Stream-of-consciousness writing, I'll just put down whatever comes to mind... as long as I can understand it myself.)

Another day of being baffled by the OLTR code. It still doesn't run; right now I'm modifying the code from GitHub, trying to swap in the CIFAR10 dataset first. The code architecture is absurdly complicated.

Let me walk through how the code runs.

import argparse

from utils import source_import  # OLTR helper: imports a Python module from a file path

parser = argparse.ArgumentParser()
parser.add_argument('--config', default='./temp.py', type=str)
parser.add_argument('--test', default=False, action='store_true')
args = parser.parse_args()

test_mode = args.test
config = source_import(args.config).config  # load the config dict from the file given by --config
training_opt = config['training_opt']

Reading the parser really just gives you the config dict and test_mode; I deleted the other arguments since they aren't needed.

(Note: test_mode probably has to change; I'll just split this into two separate scripts.)

First, a look inside the training mode...

relatin_opt = config['memory']  # sic: this misspelled name comes straight from the repo's main.py

sampler_defs = training_opt['sampler']
if sampler_defs:
    sampler_dic = {'sampler': source_import(sampler_defs['def_file']).get_sampler(),
                   'num_samples_cls': sampler_defs['num_samples_cls']}
else:
    sampler_dic = None

# 'train_plain' is an extra loader over the training set,
# only needed when centroids get initialized
data = {x: load_data(phase=x,
                     batch_size=training_opt['batch_size'],
                     sampler_dic=sampler_dic,
                     num_workers=training_opt['num_workers'])
        for x in (['train', 'val', 'train_plain']
                  if relatin_opt['init_centroids'] else ['train', 'val'])}

training_model = model(config, data, test=False)
training_model.train()

The DataLoader takes a sampler here. By default PyTorch samples sequentially (and switches to random sampling when shuffle=True; passing your own sampler means shuffle must stay False), but this repo seems to define a custom sampler. (Haven't figured it out yet.)
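To remind myself how the plumbing works: a minimal sketch of plugging a custom sampler into a DataLoader. The ClassBalancedSampler here and its toy data are my own illustration, not the repo's sampler.

import random

import torch
from torch.utils.data import DataLoader, Sampler, TensorDataset

class ClassBalancedSampler(Sampler):
    """Toy sampler: each yielded index is drawn by first picking a class
    uniformly at random, then picking a sample of that class uniformly."""

    def __init__(self, labels, num_samples):
        self.num_samples = num_samples
        self.cls_to_idx = {}
        for idx, y in enumerate(labels):
            self.cls_to_idx.setdefault(int(y), []).append(idx)

    def __iter__(self):
        for _ in range(self.num_samples):
            cls = random.choice(list(self.cls_to_idx))
            yield random.choice(self.cls_to_idx[cls])

    def __len__(self):
        return self.num_samples

# Toy long-tailed data: 90 samples of class 0, 10 of class 1
labels = [0] * 90 + [1] * 10
dataset = TensorDataset(torch.randn(100, 3), torch.tensor(labels))

# sampler and shuffle are mutually exclusive; shuffle defaults to False here
loader = DataLoader(dataset, batch_size=16,
                    sampler=ClassBalancedSampler(labels, num_samples=100))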

Note that whether a sampler is needed depends on training_opt, a field in the config dict. Stage 1 doesn't need one. The sampler is passed along as one of the arguments used to build data.

The main thing is this data dict; most of my changes will probably land here.

data depends on relatin_opt, which equals config['memory']. It holds two flags, centroids and init_centroids; in stage_1 both are False, and for meta-embedding both are True. So in stage 1, data has only two entries, train and val.
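So the relevant piece of the config, as I currently understand it, looks like this (field names as in the repo, values per stage as noted above):

# stage_1: no memory mechanism at all
config['memory'] = {'centroids': False, 'init_centroids': False}

# meta-embedding (stage_2): memory on, centroids computed during init
config['memory'] = {'centroids': True, 'init_centroids': True}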

So for stage 1, data is {'train': load_data(phase='train', batch_size=64, sampler_dic=None, num_workers=4), 'val': load_data(phase='val', batch_size=64, sampler_dic=None, num_workers=4)}. The concrete changes go into dataloader.py (mainly the return values; the GitHub code also writes a .txt log, which can be dropped).
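For the CIFAR10 swap, a minimal sketch of what my replacement load_data in dataloader.py could look like. The torchvision-based dataset, the transforms, and the CIFAR10WithIndex wrapper are my own assumptions, not the repo's code; the wrapper returns (image, label, index) so the training loop's (inputs, labels, _) unpacking still works, which is the "return values" change mentioned above.

from torch.utils.data import DataLoader, Dataset
from torchvision import datasets, transforms

class CIFAR10WithIndex(Dataset):
    """CIFAR10 that yields (image, label, index), matching the
    (inputs, labels, _) unpacking in the training loop."""

    def __init__(self, root, train, transform):
        self.base = datasets.CIFAR10(root=root, train=train,
                                     download=True, transform=transform)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        img, label = self.base[index]
        return img, label, index

def load_data(phase, batch_size, sampler_dic=None, num_workers=4, root='./data'):
    # Augment only the 'train' phase; 'val' and 'train_plain' stay plain
    if phase == 'train':
        transform = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
        ])
    else:
        transform = transforms.ToTensor()

    dataset = CIFAR10WithIndex(root, train=(phase != 'val'), transform=transform)

    if sampler_dic and phase == 'train':
        # repo-style custom sampler; mutually exclusive with shuffle
        return DataLoader(dataset, batch_size=batch_size,
                          sampler=sampler_dic['sampler'](dataset, sampler_dic['num_samples_cls']),
                          num_workers=num_workers)
    return DataLoader(dataset, batch_size=batch_size,
                      shuffle=(phase == 'train'), num_workers=num_workers)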

OK. Then training_model equals model called with three arguments: config, data, and test=False.

The real meat is in runnet.py, and I honestly can't follow it...

def __init__(self, config, data, test=False):
    self.device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    self.config = config
    self.training_opt = self.config['training_opt']
    self.memory = self.config['memory']
    self.data = data
    self.test_mode = test

    # Initialize model (also fills in self.model_optim_params_list)
    self.init_models()

    # Under training mode, initialize training steps, optimizers,
    # schedulers, criterions, and centroids
    if not self.test_mode:
        # If using steps for training, we need to calculate training steps
        # for each epoch based on actual number of training data instead of
        # oversampled data number
        print('Using steps for training.')
        self.training_data_num = len(self.data['train'].dataset)
        self.epoch_steps = int(self.training_data_num / self.training_opt['batch_size'])

        # Initialize model optimizer and scheduler
        print('Initializing model optimizer.')
        self.scheduler_params = self.training_opt['scheduler_params']
        self.model_optimizer, \
        self.model_optimizer_scheduler = self.init_optimizers(self.model_optim_params_list)
        self.init_criterions()

        # If centroids must be initialized (stage 2), compute them on the
        # plain training loader and plant them into the feature loss
        if self.memory['init_centroids']:
            self.criterions['FeatureLoss'].centroids.data = \
                self.centroids_cal(self.data['train_plain'])

A look at this __init__ code: it stores config, training_opt, data, and test (False here).

In train mode it computes the steps for training: the dataset size and the steps per epoch. It also initializes the optimizer parameters (with the scheduler params) and the loss functions. Stage_1 does not need centroid initialization; stage_2 does.
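To check my understanding of what init_optimizers must return, a minimal sketch under my own assumptions: SGD over the per-network param groups plus a StepLR schedule. The scheduler_params keys here are placeholders; the real ones come from the config.

import torch.optim as optim

def init_optimizers(self, optim_params):
    # optim_params is a list of per-network param groups, e.g.
    # [{'params': net.parameters(), 'lr': 0.1, 'momentum': 0.9, 'weight_decay': 5e-4}, ...]
    # each group carries its own 'lr', so SGD needs no default lr
    optimizer = optim.SGD(optim_params)
    scheduler = optim.lr_scheduler.StepLR(
        optimizer,
        step_size=self.scheduler_params['step_size'],  # placeholder keys
        gamma=self.scheduler_params['gamma'])
    return optimizer, scheduler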

   

The init_models method initializes the models. The networks dict in the config holds the feature network itself plus the classifier, and each entry consists of basic params and optimizer params. The stage1_weights parameter needs attention; stage_1 doesn't use it. Here the network model and the classifier are created out of those params (strangely, I don't see why that even works??).
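The "why does that even work" part is presumably the dynamic-import trick: every entry of config['networks'] names a def_file, that file is imported at runtime, and its create_model is called with the params. A minimal sketch of the pattern (my reconstruction, not the repo's exact code):

import importlib.util

def source_import(file_path):
    """Import a Python file as a module at runtime."""
    spec = importlib.util.spec_from_file_location('runtime_module', file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# Roughly what init_models does per entry of config['networks']:
# for key, val in networks_defs.items():
#     model_args = list(val['params'].values())
#     networks[key] = source_import(val['def_file']).create_model(*model_args)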

I don't really get the fix flag here yet. Will come back and fill this in.

    

def train(self):
    # When training the network
    print_str = ['Phase: train']
    time.sleep(0.25)

    # Initialize best model
    best_model_weights = {}
    best_model_weights['feat_model'] = copy.deepcopy(self.networks['feat_model'].state_dict())
    best_model_weights['classifier'] = copy.deepcopy(self.networks['classifier'].state_dict())
    best_acc = 0.0
    best_epoch = 0
    # Fall-back so the save at the end never hits an undefined name,
    # even if validation accuracy never improves
    best_centroids = self.centroids
    end_epoch = self.training_opt['num_epochs']

    # Loop over epochs
    for epoch in range(1, end_epoch + 1):

        for model in self.networks.values():
            model.train()

        torch.cuda.empty_cache()

        # Iterate over dataset
        for step, (inputs, labels, _) in enumerate(self.data['train']):

            # Break when step equals epoch steps
            if step == self.epoch_steps:
                break

            inputs, labels = inputs.to(self.device), labels.to(self.device)

            # If on training phase, enable gradients
            with torch.set_grad_enabled(True):

                # If training, forward with loss, and no top-5 accuracy calculation
                self.batch_forward(inputs, labels,
                                   centroids=self.memory['centroids'],
                                   phase='train')
                self.batch_loss(labels)
                self.batch_backward()

                # Output minibatch training results
                if step % self.training_opt['display_step'] == 0:
                    minibatch_loss_feat = self.loss_feat.item() \
                        if 'FeatureLoss' in self.criterions.keys() else None
                    minibatch_loss_perf = self.loss_perf.item()
                    _, preds = torch.max(self.logits, 1)
                    minibatch_acc = mic_acc_cal(preds, labels)

                    print_str = ['Epoch: [%d/%d]'
                                 % (epoch, self.training_opt['num_epochs']),
                                 'Step: %5d'
                                 % (step),
                                 'Minibatch_loss_feature: %.3f'
                                 % (minibatch_loss_feat) if minibatch_loss_feat else '',
                                 'Minibatch_loss_performance: %.3f'
                                 % (minibatch_loss_perf),
                                 'Minibatch_accuracy_micro: %.3f'
                                 % (minibatch_acc)]
                    # the repo wrote this to a .txt log; a plain print is enough here
                    print(' '.join(print_str))

        # After each epoch, step the optimizer scheduler
        # (and the criterion scheduler, if the loss has learnable params)
        self.model_optimizer_scheduler.step()
        if self.criterion_optimizer:
            self.criterion_optimizer_scheduler.step()

        # After every epoch, validation
        self.eval(phase='val')

        # Under validation, the best model needs to be updated
        if self.eval_acc_mic_top1 > best_acc:
            best_epoch = copy.deepcopy(epoch)
            best_acc = copy.deepcopy(self.eval_acc_mic_top1)
            best_centroids = copy.deepcopy(self.centroids)
            best_model_weights['feat_model'] = copy.deepcopy(self.networks['feat_model'].state_dict())
            best_model_weights['classifier'] = copy.deepcopy(self.networks['classifier'].state_dict())

    print()
    print('Training Complete.')

    print_str = ['Best validation accuracy is %.3f at epoch %d' % (best_acc, best_epoch)]
    print(print_str, "*********")

    # Save the best model and best centroids if calculated
    self.save_model(epoch, best_epoch, best_model_weights, best_acc, centroids=best_centroids)
    print('Done')

The train method here also matters quite a bit.

In train mode, batch_forward gets called with (inputs, labels, centroids=False, phase='train'); centroids is False because self.memory['centroids'] is False in stage 1.
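Side note on the accuracy line in train(): my guess is that mic_acc_cal is plain micro-averaged top-1 accuracy, i.e. the fraction of correct predictions in the minibatch. A sketch under that assumption:

import torch

def mic_acc_cal(preds, labels):
    # Micro accuracy over one minibatch: correct predictions / batch size
    return (preds == labels).sum().item() / len(labels)

# e.g. with preds = torch.tensor([1, 0, 2]) and labels = torch.tensor([1, 0, 0])
# this returns 2 / 3, about 0.667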

---------------------------------------------------------------------------------------------------------------------------------

Divider: 11.05
