Combination Sum III & IV (Dynamic Programming)

Combination Sum Problem Analysis
This article examines the LeetCode problems Combination Sum III and Combination Sum IV, covering the problem definitions, solution approaches, and Java implementations. Combination Sum III — finding all combinations of a fixed count of numbers — is solved with depth-first search (DFS), while Combination Sum IV — counting the number of possible combinations — is solved with dynamic programming (DP).

Combination Sum III

Problem

Find all possible combinations of k numbers that add up to a number n, given that only numbers from 1 to 9 can be used and each combination should be a unique set of numbers.

Ensure that numbers within the set are sorted in ascending order.

Example 1:
Input: k = 3, n = 7
Output:
[[1,2,4]]
Example 2:
Input: k = 3, n = 9
Output:
[[1,2,6], [1,3,5], [2,3,4]]

Note

The approach is the same as for Combination Sum II: solve recursively with DFS.
Add a parameter count, initialized to k. Each time a new number i is added to the working set cur, count decreases by one; at the same time target, i.e. the given n, decreases by i.
When count reaches 0, the set holds k numbers. If target has also dropped to exactly 0, the current set pre is a valid answer and is added to the result list res.

Two situations can never produce a valid answer:
When count is 0 but target is not 0, no valid answer is reachable; return.
When count is not 0 but target is already less than or equal to 0: more numbers would still have to be added to bring count down to 0, yet every candidate is a positive integer from 1 to 9, so target can never get back to 0; return.

Solution

import java.util.*;

public class Solution {
    List<List<Integer>> res = new ArrayList<>();

    public List<List<Integer>> combinationSum3(int k, int n) {
        helper(1, k, n, new ArrayList<Integer>());
        return res;
    }

    // start: smallest candidate to try (keeps each combination ascending and unique)
    // count: how many numbers still need to be picked
    // target: remaining sum to reach
    // pre: the combination built so far
    public void helper(int start, int count, int target, List<Integer> pre) {
        if (count == 0) {
            if (target == 0) res.add(pre); // exactly k numbers summing to n
            return;
        }
        if (target <= 0) return; // all candidates are positive: target cannot recover
        for (int i = start; i <= 9; i++) {
            List<Integer> cur = new ArrayList<>(pre); // copy so sibling branches stay independent
            cur.add(i);
            helper(i + 1, count - 1, target - i, cur);
        }
    }
}
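
As a quick sanity check, a minimal driver (the Demo class and main method here are illustrative, not part of the original solution):

public class Demo {
    public static void main(String[] args) {
        // Example 2 expects [[1, 2, 6], [1, 3, 5], [2, 3, 4]]
        System.out.println(new Solution().combinationSum3(3, 9));
    }
}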

Combination Sum IV

Problem:

Given an integer array with all positive numbers and no duplicates, find the number of possible combinations that add up to a positive integer target.

Example:
nums = [1, 2, 3]
target = 4

The possible combination ways are:
(1, 1, 1, 1)
(1, 1, 2)
(1, 2, 1)
(1, 3)
(2, 1, 1)
(2, 2)
(3, 1)

Note that different sequences are counted as different combinations.

Therefore the output is 7.

Follow up:

What if negative numbers are allowed in the given array?
How does it change the problem?
What limitation do we need to add to the question to allow negative numbers?

Solution

DP method

Let dp[i] be the number of ordered sequences drawn from nums that sum to i. Every num <= i extends a sequence summing to i - num, so dp[i] accumulates dp[i - num]; a num equal to i contributes one single-element sequence, handled by the num == i branch (dp[0] is left at 0 in this version). Sorting nums lets the inner loop break as soon as num exceeds i.

import java.util.*;

public class Solution {
    public int combinationSum4(int[] nums, int target) {
        Arrays.sort(nums); // ascending order makes the early break below valid
        int[] dp = new int[target + 1]; // dp[i]: number of ordered sequences summing to i
        for (int i = 1; i <= target; i++) {
            for (int num : nums) {
                if (num == i) dp[i]++;                  // num alone reaches i
                else if (num < i) dp[i] += dp[i - num]; // append num to sequences summing to i - num
                else break;                             // all remaining nums are larger than i
            }
        }
        return dp[target];
    }
}
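
For the example above (nums = [1, 2, 3], target = 4), the table fills in as follows; the Demo class is illustrative:

public class Demo {
    public static void main(String[] args) {
        // dp[1] = 1 -> (1)
        // dp[2] = 2 -> (1,1), (2)
        // dp[3] = 4 -> (1,1,1), (1,2), (2,1), (3)
        // dp[4] = dp[3] + dp[2] + dp[1] = 4 + 2 + 1 = 7
        System.out.println(new Solution().combinationSum4(new int[]{1, 2, 3}, 4)); // 7
    }
}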

Optimized DP

Seeding dp[0] = 1 removes the num == i special case: a num that exactly matches i now contributes dp[i - num] = dp[0] = 1. With no early break in the inner loop, the Arrays.sort call is no longer required, though it is harmless. (The method name backPackVI presumably refers to the equivalent LintCode problem, Backpack VI.)

import java.util.*;

public class Solution {
    public int backPackVI(int[] nums, int target) {
        int[] dp = new int[target + 1];
        Arrays.sort(nums); // no longer needed (no early break); kept from the previous version
        dp[0] = 1; // one way to reach sum 0: the empty sequence
        for (int i = 1; i <= target; i++) {
            for (int num : nums) {
                if (num <= i) dp[i] += dp[i - num];
            }
        }
        return dp[target];
    }
}
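
The outer loop here runs over the target and the inner loop over nums, which is exactly why different orderings of the same numbers are counted separately, as the problem requires. For contrast, a minimal sketch of the swapped loop order, which would count each unordered combination once (the method name combinationSumUnordered is hypothetical; it would live in the same class):

// Hypothetical variant: introducing nums in the outer loop fixes the order
// in which numbers appear, so (1, 3) and (3, 1) collapse into one combination.
public int combinationSumUnordered(int[] nums, int target) {
    int[] dp = new int[target + 1];
    dp[0] = 1;
    for (int num : nums) {                    // decide how many of num before moving on
        for (int i = num; i <= target; i++) { // extend every reachable sum by num
            dp[i] += dp[i - num];
        }
    }
    return dp[target]; // nums = [1, 2, 3], target = 4 gives 4, not 7
}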

Another DP

A top-down, memoized version of the same recurrence: dp is filled with -1 to mark subtargets that have not been computed yet, and each subtarget is solved by recursion exactly once and then cached.

import java.util.*;

public class Solution {
    public int backPackVI(int[] nums, int target) {
        int[] dp = new int[target + 1];
        Arrays.fill(dp, -1); // -1 marks "not yet computed"
        Arrays.sort(nums);   // ascending order makes the break below valid
        return helper(nums, dp, target);
    }

    int helper(int[] nums, int[] dp, int target) {
        if (dp[target] >= 0) return dp[target]; // return the cached result
        dp[target] = 0;
        for (int i = 0; i < nums.length; i++) {
            if (target > nums[i]) dp[target] += helper(nums, dp, target - nums[i]);
            else if (target == nums[i]) {
                dp[target]++; // nums[i] alone reaches target
                break;        // all remaining nums are larger
            }
        }
        return dp[target];
    }
}

DFS: exceeds the time limit

A brute-force DFS that enumerates every ordered sequence explicitly. It is correct but exponential, so it exceeds the time limit; the memoized DP above is the fix.

public class Solution {
    int count = 0;

    public int backPackVI(int[] nums, int target) {
        dfs(nums, target, 0);
        return count;
    }

    void dfs(int[] nums, int target, int sum) {
        if (sum > target) return; // overshot: all nums are positive, so prune
        if (sum == target) {
            count++;
            return;
        }
        for (int i = 0; i < nums.length; i++) {
            dfs(nums, target, sum + nums[i]); // try every num at every position: exponential
        }
    }
}
"""FreqFormerV7_full.py Combined model (FreqFormerV7) and training script tuned for 2xRTX4090 (DDP, AMP, improved FFT, stronger transformer, better fusion, balanced loss, scheduler restarts). Format follows your earlier scripts for easy swap-in. Usage (single-node multi-gpu): torchrun --nproc_per_node=2 FreqFormerV7_full.py --data_dir <...> --save_dir ./checkpoints_v7 --batch_size 8 --num_epochs 200 If you want single-GPU debug: python FreqFormerV7_full.py --local_rank 0 --nproc_per_node 1 --debug """ import os import time import argparse import math from typing import Optional import numpy as np import torch import torch.nn as nn import torch.nn.functional as F from torch.utils.data import Dataset, DataLoader from torch.utils.data.distributed import DistributedSampler # ----------------------------- # Utilities: distributed helpers # ----------------------------- def is_main_process(): return not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0 def setup_ddp(local_rank): if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ: rank = int(os.environ['RANK']) world_size = int(os.environ['WORLD_SIZE']) else: rank = 0 world_size = 1 torch.cuda.set_device(local_rank) torch.distributed.init_process_group(backend='nccl', init_method='env://') return rank, world_size # ----------------------------- # Dice + Lovasz (helpers) # ----------------------------- def one_hot(labels, num_classes): y = torch.eye(num_classes, device=labels.device)[labels] return y def multiclass_dice_loss(probs, labels, eps=1e-6): C = probs.shape[1] mask = (labels >= 0) if mask.sum() == 0: return probs.new_tensor(0.) probs = probs[mask] labels = labels[mask] gt = one_hot(labels, C) intersection = (probs * gt).sum(dim=0) cardinality = probs.sum(dim=0) + gt.sum(dim=0) dice = (2. * intersection + eps) / (cardinality + eps) loss = 1.0 - dice return loss.mean() def lovasz_grad(gt_sorted): gts = gt_sorted.sum() if gts == 0: return torch.zeros_like(gt_sorted) intersection = gts - gt_sorted.cumsum(0) union = gts + (1 - gt_sorted).cumsum(0) jaccard = 1. - intersection / union if gt_sorted.numel() > 1: jaccard[1:] = jaccard[1:] - jaccard[:-1] return jaccard def flatten_probas(probas, labels, ignore_index=-1): mask = (labels != ignore_index) if not mask.any(): return probas.new(0), labels.new(0) probas = probas[mask] labels = labels[mask] return probas, labels def lovasz_softmax(probas, labels, classes='present', ignore_index=-1): C = probas.size(1) losses = [] probas, labels = flatten_probas(probas, labels, ignore_index) if probas.numel() == 0: return probas.new_tensor(0.) for c in range(C): fg = (labels == c).float() if classes == 'present' and fg.sum() == 0: continue class_pred = probas[:, c] errors = (fg - class_pred).abs() perm = torch.argsort(errors, descending=True) fg_sorted = fg[perm] grad = lovasz_grad(fg_sorted) loss_c = torch.dot(F.relu(errors[perm]), grad) losses.append(loss_c) if len(losses) == 0: return probas.new_tensor(0.) 
return sum(losses) / len(losses) # ----------------------------- # Dataset (S3DIS-like npy layout) # ----------------------------- class S3DISDatasetAug(Dataset): def __init__(self, data_dir, split='train', val_area='Area_5', num_points=2048, augment=True): self.num_points = num_points self.augment = augment and (split == 'train') self.files = [] for f in sorted(os.listdir(data_dir)): if not f.endswith('.npy'): continue if split == 'train' and val_area in f: continue if split == 'val' and val_area not in f: continue self.files.append(os.path.join(data_dir, f)) if len(self.files) == 0: raise RuntimeError(f"No files found in {data_dir} (split={split})") def __len__(self): return len(self.files) def __getitem__(self, idx): data = np.load(self.files[idx]) coords = data[:, :3].astype(np.float32) # extra could be RGB or normal; ensure shape [N,3] extra = data[:, 3:6].astype(np.float32) labels = data[:, 6].astype(np.int64) N = coords.shape[0] if N >= self.num_points: choice = np.random.choice(N, self.num_points, replace=False) else: choice = np.random.choice(N, self.num_points, replace=True) coords = coords[choice] extra = extra[choice] labels = labels[choice] if self.augment: theta = np.random.uniform(0, 2 * np.pi) c, s = np.cos(theta), np.sin(theta) R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=np.float32) coords = coords.dot(R.T) scale = np.random.uniform(0.9, 1.1) coords = coords * scale coords = coords + np.random.normal(0, 0.02, coords.shape).astype(np.float32) # random flip x/y if np.random.rand() > 0.5: coords[:, 0] = -coords[:, 0] local_feat = np.concatenate([coords, extra], axis=1) return { 'local_feat': torch.from_numpy(local_feat).float(), 'coords': torch.from_numpy(coords).float(), 'extra': torch.from_numpy(extra).float(), 'label': torch.from_numpy(labels).long() } # ----------------------------- # compute class weights # ----------------------------- def compute_class_weights(file_list, num_classes, method='inv_sqrt'): counts = np.zeros(num_classes, dtype=np.float64) for p in file_list: data = np.load(p, mmap_mode='r') labels = data[:, 6].astype(np.int64) for c in range(num_classes): counts[c] += (labels == c).sum() counts = np.maximum(counts, 1.0) if method == 'inv_freq': weights = 1.0 / counts elif method == 'inv_sqrt': weights = 1.0 / np.sqrt(counts) else: weights = np.ones_like(counts) weights = weights / weights.sum() * num_classes return torch.from_numpy(weights.astype(np.float32)) # ----------------------------- # Model: FreqFormerV7 # ----------------------------- class FreqConvBlock(nn.Module): def __init__(self, in_ch, out_ch): super().__init__() self.net = nn.Sequential( nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1), nn.BatchNorm1d(out_ch), nn.GELU(), nn.Conv1d(out_ch, out_ch, kernel_size=3, padding=1), nn.BatchNorm1d(out_ch), nn.GELU(), ) def forward(self, x): return self.net(x.transpose(1, 2)).transpose(1, 2) class FreqChannelAttention(nn.Module): def __init__(self, dim, reduction=8): super().__init__() self.fc1 = nn.Linear(dim, dim // reduction) self.fc2 = nn.Linear(dim // reduction, dim) self.act = nn.GELU() self.sigmoid = nn.Sigmoid() def forward(self, x): attn = torch.mean(x, dim=1) attn = self.fc2(self.act(self.fc1(attn))) attn = self.sigmoid(attn).unsqueeze(1) return x * attn class TransformerBlock(nn.Module): def __init__(self, dim, num_heads, mlp_ratio=4.0, drop=0.1): super().__init__() self.norm1 = nn.LayerNorm(dim) self.attn = nn.MultiheadAttention(dim, num_heads, dropout=drop, batch_first=True) self.norm2 = nn.LayerNorm(dim) self.mlp = 
nn.Sequential( nn.Linear(dim, int(dim * mlp_ratio)), nn.GELU(), nn.Dropout(drop), nn.Linear(int(dim * mlp_ratio), dim), nn.Dropout(drop) ) def forward(self, x): h = x x = self.norm1(x) x, _ = self.attn(x, x, x) x = x + h h = x x = self.norm2(x) x = x + self.mlp(x) return x class CrossAttentionBlock(nn.Module): """Cross-attention between spatial and freq branches""" def __init__(self, dim_q, dim_kv, num_heads, drop=0.1): super().__init__() self.norm_q = nn.LayerNorm(dim_q) self.norm_kv = nn.LayerNorm(dim_kv) self.attn = nn.MultiheadAttention(dim_q, num_heads, dropout=drop, batch_first=True) # project kv to same dim as q if dim_q != dim_kv: self.kv_proj = nn.Linear(dim_kv, dim_q) else: self.kv_proj = nn.Identity() self.ff = nn.Sequential(nn.LayerNorm(dim_q), nn.Linear(dim_q, dim_q * 4), nn.GELU(), nn.Linear(dim_q * 4, dim_q)) def forward(self, q, kv): qn = self.norm_q(q) kvn = self.kv_proj(self.norm_kv(kv)) attn_out, _ = self.attn(qn, kvn, kvn) q = q + attn_out q = q + self.ff(q) return q class FreqFormerV7(nn.Module): def __init__(self, num_classes=13, embed_dim=384, freq_embed=192, depth=8, num_heads=8, drop=0.1, use_cross=True): super().__init__() self.embed_dim = embed_dim # spatial: xyz + rgb/normals (6) self.spatial_embed = nn.Linear(6, embed_dim) # freq branch: uses xyz+feats (6) for FFT self.freq_proj = nn.Linear(12, freq_embed) # after real+imag concat, length doubles (we'll handle dims dynamically) self.freq_conv = nn.Sequential( FreqConvBlock(freq_embed, freq_embed), FreqConvBlock(freq_embed, freq_embed) ) self.fca = FreqChannelAttention(freq_embed) # projection of freq -> embed_dim self.freq_to_spatial = nn.Linear(freq_embed, embed_dim) # fusion self.fuse_proj = nn.Linear(embed_dim + embed_dim, embed_dim) # transformer backbone self.blocks = nn.ModuleList([ TransformerBlock(embed_dim, num_heads=num_heads, mlp_ratio=4.0, drop=drop) for _ in range(depth) ]) # optional cross attention between spatial and freq (early) self.use_cross = use_cross if use_cross: self.cross = CrossAttentionBlock(embed_dim, freq_embed, num_heads, drop=drop) # classification head self.cls_head = nn.Sequential( nn.LayerNorm(embed_dim), nn.Dropout(0.3), nn.Linear(embed_dim, num_classes) ) self._init_weights() def _init_weights(self): for m in self.modules(): if isinstance(m, nn.Linear): nn.init.xavier_uniform_(m.weight) if m.bias is not None: nn.init.zeros_(m.bias) def forward(self, coords, feats=None): # coords: [B,N,3], feats: [B,N,3] if feats is None: x = coords feats = torch.zeros_like(coords) else: x = torch.cat([coords, feats], dim=-1) B, N, _ = x.shape # spatial branch spatial_feat = self.spatial_embed(x) # [B,N,embed_dim] # FFT branch: build input containing coords+feats then FFT along sequence dimension fft_input = torch.cat([coords, feats], dim=-1) # [B,N,6] # perform FFT along point dimension -> complex tensor [B,N,6] fft_c = torch.fft.fft(fft_input, dim=1) fft_real = fft_c.real fft_imag = fft_c.imag fft_cat = torch.cat([fft_real, fft_imag], dim=-1) # [B,N,12] freq_feat = self.freq_proj(fft_cat) # [B,N,freq_embed] freq_feat = self.freq_conv(freq_feat) freq_feat = self.fca(freq_feat) # optionally cross-attend: let spatial query freq if self.use_cross: # project freq to same dim if needed inside cross spatial_feat = self.cross(spatial_feat, freq_feat) # project freq to embed and fuse freq_to_spatial = self.freq_to_spatial(freq_feat) fused = torch.cat([spatial_feat, freq_to_spatial], dim=-1) fused = self.fuse_proj(fused) # transformer backbone for blk in self.blocks: fused = blk(fused) out = 
self.cls_head(fused) return out # ----------------------------- # Helpers: confusion, iou # ----------------------------- def compute_confusion_matrix(preds, gts, num_classes): mask = (gts >= 0) &amp; (gts < num_classes) gt = gts[mask].astype(np.int64) pred = preds[mask].astype(np.int64) conf = np.bincount(gt * num_classes + pred, minlength=num_classes ** 2) return conf.reshape((num_classes, num_classes)) def compute_iou_from_conf(conf): inter = np.diag(conf) gt_sum = conf.sum(axis=1) pred_sum = conf.sum(axis=0) union = gt_sum + pred_sum - inter iou = inter / (union + 1e-10) return iou # ----------------------------- # Training loop (DDP-ready, AMP) # ----------------------------- def save_checkpoint(state, path): torch.save(state, path) def train_main(): parser = argparse.ArgumentParser() parser.add_argument('--data_dir', default='/root/autodl-tmp/pointcloud_seg/data/S3DIS_new/processed_npy') parser.add_argument('--save_dir', default='./checkpoints_v7') parser.add_argument('--batch_size', type=int, default=8) parser.add_argument('--num_epochs', type=int, default=200) parser.add_argument('--num_points', type=int, default=2048) parser.add_argument('--num_classes', type=int, default=13) parser.add_argument('--lr', type=float, default=1e-3) parser.add_argument('--local_rank', type=int, default=int(os.environ.get('LOCAL_RANK', 0))) parser.add_argument('--use_class_weights', action='store_true') parser.add_argument('--use_lovasz', action='store_true') parser.add_argument('--warmup_epochs', type=int, default=5) parser.add_argument('--num_workers', type=int, default=8) parser.add_argument('--grad_clip', type=float, default=1.0) parser.add_argument('--debug', action='store_true') parser.add_argument('--drop_last', action='store_true') args = parser.parse_args() # DDP setup world_size = int(os.environ.get('WORLD_SIZE', 1)) use_ddp = world_size > 1 if use_ddp: rank, ws = setup_ddp(args.local_rank) else: rank = 0 ws = 1 device = torch.device('cuda', args.local_rank if torch.cuda.is_available() else 'cpu') os.makedirs(args.save_dir, exist_ok=True) # datasets &amp; samplers train_ds = S3DISDatasetAug(args.data_dir, split='train', num_points=args.num_points, augment=True) val_ds = S3DISDatasetAug(args.data_dir, split='val', num_points=args.num_points, augment=False) if use_ddp: train_sampler = DistributedSampler(train_ds) val_sampler = DistributedSampler(val_ds, shuffle=False) else: train_sampler = None val_sampler = None train_loader = DataLoader(train_ds, batch_size=args.batch_size, sampler=train_sampler, shuffle=(train_sampler is None), num_workers=args.num_workers, drop_last=(not args.debug and args.drop_last)) val_loader = DataLoader(val_ds, batch_size=1, sampler=val_sampler, shuffle=False, num_workers=max(1, args.num_workers // 2)) # class weights class_weights = None if args.use_class_weights: if is_main_process(): print("Computing class weights...") cw = compute_class_weights(train_ds.files, args.num_classes, method='inv_sqrt') class_weights = cw.to(device) if is_main_process(): print("class weights:", class_weights.cpu().numpy()) # model model = FreqFormerV7(num_classes=args.num_classes, embed_dim=384, freq_embed=192, depth=8, num_heads=8, drop=0.1, use_cross=True) model.to(device) if use_ddp: model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model) model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=False) # optimizer, scheduler optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr, 
weight_decay=1e-4) scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=30, T_mult=2) scaler = torch.cuda.amp.GradScaler() best_miou = 0.0 start_epoch = 0 if is_main_process(): print("Training config:", vars(args)) print("Model params (M):", sum(p.numel() for p in model.parameters()) / 1e6) # training loop def get_lr_factor(epoch): if epoch < args.warmup_epochs: return float(epoch + 1) / max(1.0, args.warmup_epochs) return 1.0 for epoch in range(start_epoch, args.num_epochs): if use_ddp: train_sampler.set_epoch(epoch) model.train() t0 = time.time() running_loss = 0.0 iters = 0 for batch in train_loader: local_feat = batch['local_feat'].to(device) coords = batch['coords'].to(device) extra = batch['extra'].to(device) labels = batch['label'].to(device) optimizer.zero_grad() with torch.cuda.amp.autocast(): logits = model(coords, extra) B, N, C = logits.shape logits_flat = logits.view(-1, C) labels_flat = labels.view(-1) if class_weights is not None: ce = F.cross_entropy(logits_flat, labels_flat, weight=class_weights, ignore_index=-1) else: ce = F.cross_entropy(logits_flat, labels_flat, ignore_index=-1) probs = F.softmax(logits_flat, dim=-1) dice = multiclass_dice_loss(probs, labels_flat) if args.use_lovasz: lov = lovasz_softmax(probs, labels_flat, ignore_index=-1) else: lov = logits_flat.new_tensor(0.0) # stronger balanced combination loss = 0.5 * ce + 0.8 * dice + 0.5 * lov lr_mult = get_lr_factor(epoch) for g in optimizer.param_groups: g['lr'] = args.lr * lr_mult scaler.scale(loss).backward() torch.nn.utils.clip_grad_norm_(model.parameters(), args.grad_clip) scaler.step(optimizer) scaler.update() running_loss += loss.item() iters += 1 try: scheduler.step() except Exception: pass avg_loss = running_loss / max(1, iters) t1 = time.time() if is_main_process(): print(f"Epoch {epoch + 1}/{args.num_epochs} TrainLoss: {avg_loss:.4f} Time: {(t1 - t0):.1f}s LR: {optimizer.param_groups[0]['lr']:.6f}") # validation every 5 epochs if (epoch + 1) % 5 == 0 or (epoch + 1) == args.num_epochs: model.eval() conf = np.zeros((args.num_classes, args.num_classes), dtype=np.int64) tot_loss = 0.0 cnt = 0 with torch.no_grad(): for batch in val_loader: local_feat = batch['local_feat'].to(device) coords = batch['coords'].to(device) extra = batch['extra'].to(device) labels = batch['label'].to(device) logits = model(coords, extra) B, N, C = logits.shape logits_flat = logits.view(-1, C) labels_flat = labels.view(-1) if class_weights is not None: loss_ce = F.cross_entropy(logits_flat, labels_flat, weight=class_weights, ignore_index=-1) else: loss_ce = F.cross_entropy(logits_flat, labels_flat, ignore_index=-1) probs = F.softmax(logits_flat, dim=-1) dice = multiclass_dice_loss(probs, labels_flat) if args.use_lovasz: lov = lovasz_softmax(probs, labels_flat, ignore_index=-1) else: lov = logits_flat.new_tensor(0.0) loss = 0.5 * loss_ce + 0.8 * dice + 0.5 * lov tot_loss += loss.item() preds = logits.argmax(dim=-1).cpu().numpy().reshape(-1) gts = labels.cpu().numpy().reshape(-1) conf += compute_confusion_matrix(preds, gts, args.num_classes) cnt += 1 mean_loss = tot_loss / max(1, cnt) iou = compute_iou_from_conf(conf) miou = np.nanmean(iou) oa = np.diag(conf).sum() / (conf.sum() + 1e-12) if is_main_process(): print(f"-- Validation Loss: {mean_loss:.4f} mIoU: {miou:.4f} OA: {oa:.4f}") print("Per-class IoU:") for cid, v in enumerate(iou): print(f" class {cid:02d}: {v:.4f}") # gather miou across ranks (optional) - here we assume main does saving if is_main_process() and miou > best_miou: best_miou = miou 
path = os.path.join(args.save_dir, f'best_epoch_{epoch + 1:03d}_miou_{miou:.4f}.pth') state = {'epoch': epoch + 1, 'best_miou': best_miou} # unwrap DDP if isinstance(model, torch.nn.parallel.DistributedDataParallel): state['model_state_dict'] = model.module.state_dict() else: state['model_state_dict'] = model.state_dict() save_checkpoint(state, path) print("Saved best:", path) # final save if is_main_process(): final_path = os.path.join(args.save_dir, f'final_epoch_{args.num_epochs:03d}_miou_{best_miou:.4f}.pth') state = {'epoch': args.num_epochs, 'best_miou': best_miou} if isinstance(model, torch.nn.parallel.DistributedDataParallel): state['model_state_dict'] = model.module.state_dict() else: state['model_state_dict'] = model.state_dict() save_checkpoint(state, final_path) print("Training finished. Final saved to:", final_path) if __name__ == "__main__": train_main()这个代码运行有问题(base) root@autodl-container-cac742a9c6-f35b76d7:~# source /root/miniconda3/bin/activate pointcloud (pointcloud) root@autodl-container-cac742a9c6-f35b76d7:~# /root/miniconda3/envs/pointcloud/bin/python /root/autodl-tmp/pointcloud_seg/freqformer_v8.py /root/autodl-tmp/pointcloud_seg/freqformer_v8.py:460: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. scaler = torch.cuda.amp.GradScaler() Training config: {'data_dir': '/root/autodl-tmp/pointcloud_seg/data/S3DIS_new/processed_npy', 'save_dir': './checkpoints_v7', 'batch_size': 8, 'num_epochs': 200, 'num_points': 2048, 'num_classes': 13, 'lr': 0.001, 'local_rank': 0, 'use_class_weights': False, 'use_lovasz': False, 'warmup_epochs': 5, 'num_workers': 8, 'grad_clip': 1.0, 'debug': False, 'drop_last': False} Model params (M): 16.879141 /root/autodl-tmp/pointcloud_seg/freqformer_v8.py:489: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. with torch.cuda.amp.autocast(): Epoch 1/200 TrainLoss: nan Time: 7.9s LR: 0.000997 Epoch 2/200 TrainLoss: nan Time: 7.5s LR: 0.000989 Epoch 3/200 TrainLoss: nan Time: 7.2s LR: 0.000976 Epoch 4/200 TrainLoss: nan Time: 6.8s LR: 0.000957 Epoch 5/200 TrainLoss: nan Time: 7.0s LR: 0.000933 -- Validation Loss: nan mIoU: 0.0145 OA: 0.1889 Per-class IoU: class 00: 0.1889 class 01: 0.0000 class 02: 0.0000 class 03: 0.0000 class 04: 0.0000 class 05: 0.0000 class 06: 0.0000 class 07: 0.0000 class 08: 0.0000 class 09: 0.0000 class 10: 0.0000 class 11: 0.0000 class 12: 0.0000 Saved best: ./checkpoints_v7/best_epoch_005_miou_0.0145.pth
最新发布
10-30
<think>我们正在解决FreqFormerV7模型训练中出现的TrainLoss和Validation Loss为NaN的问题。根据提供的引用内容,NaN的出现可能由多种原因引起,我们将逐一排查并给出解决方案。 引用内容中提到的一些常见原因: 1. 数据未归一化(引用[3]) 2. 损失函数中包含除法或对数运算,且输入值可能为负数或非常小的数(引用[3]) 3. 学习率设置过大(引用[4]) 4. 浮点数精度问题(如使用fp16时,CUDA运算可能产生NaN)(引用[2]) 5. 网络结构或数据问题(引用[4]) 由于FreqFormerV7是一个特定的模型,我们可能需要结合这些通用原因进行具体分析。以下是解决步骤: ### 1. 检查数据归一化 确保输入数据已经进行了适当的归一化。例如,图像数据通常归一化到[0,1]或[-1,1]区间。如果数据包含异常值或未归一化,可能导致计算过程中出现非常大的值,进而导致NaN。 ### 2. 检查损失函数 检查模型中使用的损失函数。如果损失函数包含除法、对数运算(如交叉熵损失)或平方根运算,需要确保输入值在合理的范围内。例如,在分类任务中,使用交叉熵损失时,模型的输出(经过softmax)可能接近0,这时取对数会得到负无穷大,导致NaN。 解决办法: - 在计算对数时,可以添加一个小的 epsilon(例如1e-8)来避免对0取对数。 - 检查模型输出是否经过激活函数(如softmax)后出现了0或1,可以通过调整激活函数的计算稳定性来解决。 ### 3. 调整学习率 过大的学习率可能导致优化过程中权重更新幅度过大,从而出现数值不稳定。尝试降低学习率(例如缩小到原来的1/10)并重新训练。 ### 4. 检查浮点数精度 如果使用了混合精度训练(fp16),则可能因为某些运算在fp16下不稳定而导致NaN(引用[2])。可以尝试以下方法: - 禁用混合精度训练,使用fp32进行训练。 - 如果必须使用fp16,可以检查是否在存在问题的操作(如卷积)中避免使用fp16,或者使用更高精度的fp32进行计算(混合精度训练中,某些操作可以保持fp32)。 ### 5. 检查网络结构和数据 - 如果网络结构较浅而数据量很大,可能会导致拟合能力不足(引用[4]中提到的样本多,网络小)。考虑增加网络深度或宽度。 - 检查数据中是否存在NaN或无穷大的值。可以使用如下代码检查数据: ```python import torch # 检查训练数据 for inputs, labels in train_loader: if torch.isnan(inputs).any() or torch.isinf(inputs).any(): print("训练数据包含NaN或Inf") if torch.isnan(labels).any() or torch.isinf(labels).any(): print("标签包含NaN或Inf") ``` ### 6. 梯度裁剪 在训练过程中,梯度爆炸也可能导致NaN。可以在优化器步骤之前进行梯度裁剪: ```python torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0) ``` ### 7. 添加断言检查 在训练循环中添加断言检查,以便在出现NaN时立即停止并打印相关信息(引用[3]): ```python loss = ... # 计算损失 assert not torch.isnan(loss).any(), "Loss is NaN" assert not torch.isinf(loss).any(), "Loss is Inf" ``` ### 8. 其他可能原因 - 自定义层的实现:如果模型中包含自定义层,检查该层的实现是否存在数值不稳定的计算。 - 权重初始化:不合适的权重初始化可能导致输出过大,从而引发NaN。尝试使用不同的初始化方法。 ## 实施步骤建议 1. **验证数据**:首先确保数据没有NaN或Inf,并且已经归一化。 2. **简化问题**:尝试在一个小数据集上运行训练,看是否仍然出现NaN。如果小数据集上不出现,则可能是大数据集的问题。 3. **逐步排查**:按照上述可能的因素,逐个尝试调整(如降低学习率、禁用混合精度、添加梯度裁剪等)。 通过以上步骤,通常可以定位并解决NaN问题。如果问题仍然存在,可能需要更深入地检查模型架构的具体实现。 §§相关问题§§ 1. 如何检测训练数据中是否存在NaN或Inf? 2. 在PyTorch中如何正确使用混合精度训练以避免NaN? 3. 损失函数设计时如何避免数值不稳定? 4. 梯度裁剪的作用是什么?如何在PyTorch中实现? </think>### FreqFormerV7模型训练中Loss出现NaN的解决方案 根据引用内容和深度学习常见问题,以下是解决TrainLoss和Validation Loss出现NaN的系统性方案: #### 1. **检查数据预处理(首要步骤)** - 确保输入数据已正确归一化(如`Normalize(mean, std)`),消除异常值[^3] - 验证数据读取管道: ```python # 检查数据是否包含NaN/Inf print(torch.isnan(dataset[0]).any(), torch.isinf(dataset[0]).any()) ``` #### 2. **调整训练超参数** - **降低学习率**(最常见解决方案): ```python optimizer = Adam(model.parameters(), lr=1e-5) # 从1e-3逐步下调 ``` - **添加梯度裁剪**防止梯度爆炸: ```python torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0) ``` #### 3. **检查浮点精度问题** - 禁用FP16混合精度训练(引用[2]问题): ```python # 删除amp.autocast()相关代码 trainer = Trainer(precision='32-true') # PyTorch Lightning示例 ``` - 若需FP16,添加梯度缩放: ```python scaler = GradScaler() # 防止下溢 ``` #### 4. **损失函数稳定性处理** - 对数运算添加epsilon防零除: ```python loss = -torch.log(pred + 1e-8) # 避免log(0) ``` - 自定义损失函数添加断言(引用[3]方法): ```python assert not torch.isnan(loss), f"NaN at {current_step}" ``` #### 5. **模型结构调整** - 减少层数或隐藏单元,降低复杂度(引用[4]经验) - 添加正则化组件: ```python self.dropout = nn.Dropout(0.5) # 抑制过拟合 ``` #### 6. **诊断工具** ```python # 实时监控梯度 for name, param in model.named_parameters(): if param.grad is not None: grad_mean = param.grad.mean() if torch.isnan(grad_mean): print(f"NaN gradient in {name}") ``` #### 7. **环境验证** - 更新CUDA/cuDNN版本(引用[2]的NVIDIA问题) - 最小化测试:用10个样本验证能否过拟合 > **关键排查顺序**:数据检查 → 学习率调整 → 精度设置 → 损失函数加固 → 梯度监控 → 架构简化。建议优先尝试学习率下调(70%案例有效)和FP32训练[^2][^4]。