Common CV Backbones (3): CLIP/SAM Principles and Code Usage

Earlier posts covered simple vision encoders. This post focuses on two backbones that are widely used in multimodal work: 1. CLIP; 2. SAM. For each one we briefly cover the underlying principles, with the main emphasis on how to actually use the backbone.
1. Common CV Backbones (2): ConvNeXt in Detail
2. Common CV Backbones (ResNet/UNet/ViT series/multimodal series, etc.) with Code

SAM

SAM has been released in two versions, SAM v1 and SAM v2. Both are explained below, with particular attention to how its dataset was constructed (many papers mention using SAM directly as a dataset-generation tool).

SAM v1[1]

https://arxiv.org/pdf/2304.02643
Official blog: Introducing Segment Anything: Working toward the first foundation model for image segmentation[2]

The architecture is fairly simple. Image encoder: an MAE-pretrained ViT. Prompt encoder: as the architecture diagram shows, there are three kinds of prompts: 1) text, which is encoded with CLIP; 2) points and bounding boxes, which are encoded with the method from [3] (a Fourier feature mapping that improves a network's ability to learn high-frequency functions); 3) masks, which are encoded with convolutions and then added element-wise to the image embedding.

Encoding points and bounding boxes is straightforward: their coordinates are mapped directly to Fourier features. For example, pseudocode for points:

import numpy as np
# Assume 2D input points, [x, y]
points = np.array([[0.5, 0.3], [0.2, 0.7]])  # shape: (N, 2)
m = 256  # mapping dimension
sigma = 10.0  # frequency-scale parameter

# Sample the random projection matrix B
B = np.random.normal(0, sigma, size=(m, 2))  # shape: (m, 2)
# Compute the Fourier features
Bx = np.dot(points, B.T)  # dot product, shape: (N, m)
fourier_features = np.concatenate([np.cos(2 * np.pi * Bx), np.sin(2 * np.pi * Bx)], axis=1)  # shape: (N, 2m)

Mask decoder: the mask decoder efficiently maps the image embedding, the prompt embeddings, and an output token to a mask. It is based on a modified Transformer decoder block followed by a dynamic mask-prediction head. Each block runs self-attention on the prompt tokens and cross-attention in both directions, prompt-to-image embedding and vice versa. After these blocks, the image embedding is upsampled and an MLP maps the output token to a dynamic linear classifier, which predicts the mask's foreground probability at every image location.
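To make the two-way attention concrete, the sketch below shows one decoder block in PyTorch. It is a simplified illustration under assumed dimensions and norm placement, not the official SAM implementation (tokens = prompt tokens plus output tokens, image = the flattened image embedding):

import torch
import torch.nn as nn

class TwoWayAttentionBlock(nn.Module):
    # Simplified SAM-style decoder block operating on
    #   tokens: (B, T, C) prompt + output tokens
    #   image:  (B, HW, C) flattened image embedding
    def __init__(self, dim=256, heads=8, mlp_dim=2048):
        super().__init__()
        self.token_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.token_to_image = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_to_token = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_dim), nn.ReLU(), nn.Linear(mlp_dim, dim))
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])

    def forward(self, tokens, image):
        tokens = self.norms[0](tokens + self.token_self_attn(tokens, tokens, tokens)[0])  # self-attention on prompt tokens
        tokens = self.norms[1](tokens + self.token_to_image(tokens, image, image)[0])     # prompt-to-image cross-attention
        tokens = self.norms[2](tokens + self.mlp(tokens))                                 # MLP on tokens
        image = self.norms[3](image + self.image_to_token(image, tokens, tokens)[0])      # image-to-prompt cross-attention
        return tokens, image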

Resolving ambiguity: for an ambiguous prompt, several masks can be valid. With a small modification, SAM predicts multiple masks (typically 3: whole, part, and subpart) from a single prompt. During training, only the mask with the lowest loss is backpropagated. To rank the masks, the model predicts a confidence score (an estimated IoU) for each one. As an example of whole/part/subpart: a single click on a person's shirt could plausibly mean the whole person (whole), the shirt (part), or a pocket on the shirt (subpart).
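In practice, this ambiguity-aware behavior is exposed through the multimask_output flag of the official segment-anything package. A minimal usage sketch follows; the checkpoint path, image file, and click coordinates are placeholders:

import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint (assumed to be downloaded locally) and wrap it in a predictor
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)  # placeholder image
predictor.set_image(image)

# A single (possibly ambiguous) foreground point; multimask_output=True returns 3 candidate masks
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),  # 1 = foreground, 0 = background
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # pick the mask with the highest predicted IoU score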

SAM v2[4]

https://arxiv.org/pdf/2408.00714

SAM v2 is essentially SAM v1 generalized to the video domain. Its overall structure is described below.

The component most worth attention is Memory Attention, which conditions the current frame's features on features and predictions from past frames as well as on any new prompts. It stacks L transformer blocks, the first of which takes the current frame's image encoding as input. Each block performs self-attention, then cross-attention to the memories of (prompted and unprompted) frames and objects stored in a memory bank, followed by an MLP. Vanilla attention is used for both self- and cross-attention, so the module benefits from recent advances in efficient attention kernels (a minimal sketch of such a layer is given after the memory-bank description below).
The memory encoder generates a memory by downsampling the output mask with a convolutional module, adding it element-wise to the unconditioned frame embedding from the image encoder, and fusing the result with lightweight convolutional layers.
The memory bank retains past predictions for the target object by keeping a FIFO queue of memories from up to N recent frames, and stores prompt information in a FIFO queue of up to M prompted frames. For example, in the VOS task, where the initial mask is the only prompt, the memory bank always keeps the memory of the first frame together with memories of up to N recent (unprompted) frames. Both sets of memories are stored as spatial feature maps.
In addition to the spatial memories, a list of object pointers is stored as lightweight vectors derived from each frame's mask-decoder output tokens; these capture high-level semantic information about the object to be segmented.
Temporal position information is embedded into the memories of the N recent frames, which lets the model represent short-term object motion. It is not added to the memories of prompted frames, because the training signal there is sparser and harder to generalize to inference, where prompted frames may come from a very different temporal range than those seen during training.
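As a rough illustration of the memory-attention idea described above, here is a minimal PyTorch sketch of one such layer. It is an assumption-level simplification (dimensions, norm placement, and positional/temporal embeddings are omitted or guessed), not the official SAM 2 code:

import torch
import torch.nn as nn

class MemoryAttentionLayer(nn.Module):
    # Simplified SAM 2-style memory-attention layer:
    #   frame_tokens:  (B, HW, C) features of the current frame
    #   memory_tokens: (B, M, C)  memories pulled from the FIFO memory bank
    def __init__(self, dim=256, heads=8, mlp_dim=1024):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_dim), nn.GELU(), nn.Linear(mlp_dim, dim))
        self.n1, self.n2, self.n3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, frame_tokens, memory_tokens):
        x = frame_tokens
        q = self.n1(x)
        x = x + self.self_attn(q, q, q)[0]                                     # self-attention over the current frame
        x = x + self.cross_attn(self.n2(x), memory_tokens, memory_tokens)[0]   # cross-attention to the memory bank
        x = x + self.mlp(self.n3(x))                                           # position-wise MLP
        return x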

CLIP[5]

The CLIP architecture (as described in the paper) is also fairly simple. Its core mechanism is to map images and text into a shared semantic space via contrastive learning, aligning the two embedding spaces.

Pretraining: the text and the image are each encoded, and the similarity between the two encodings (e.g., cosine similarity) is computed; a contrastive objective pulls matched image-text pairs together and pushes mismatched pairs apart, so the model learns to align text and image features.
Usage: a given image is passed through CLIP's image encoder; on the text side, labels are drawn from a label set and combined with a prompt template to produce n candidate texts, which are encoded by the text encoder; the final prediction is the label whose text embedding is most similar to the image embedding. Minimal sketches of both the pretraining objective and this zero-shot usage follow below.
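The pretraining objective can be sketched as a symmetric InfoNCE loss over a batch of matched image-text pairs. This is a simplified sketch of the idea above (the real CLIP additionally uses learned projection heads and a learnable temperature):

import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: (N, D) embeddings of N matched image-text pairs
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature             # (N, N) pairwise cosine similarities
    targets = torch.arange(len(logits), device=logits.device)   # the i-th image matches the i-th text
    loss_i = F.cross_entropy(logits, targets)                   # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)               # text -> image direction
    return (loss_i + loss_t) / 2

For the zero-shot usage step, the Hugging Face transformers implementation of CLIP can be called directly; the image path and label set below are placeholders:

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")                     # placeholder image
labels = ["cat", "dog", "car"]
texts = [f"a photo of a {c}" for c in labels]     # prompt template from the CLIP paper

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # (1, num_labels) probabilities over labels
print(dict(zip(labels, probs[0].tolist())))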

Code Usage

All of the code can be found in: sam-clip.ipynb

References


1. https://arxiv.org/pdf/2304.02643
2. https://ai.meta.com/blog/segment-anything-foundation-model-image-segmentation/
3. https://arxiv.org/abs/2006.10739
4. https://arxiv.org/pdf/2408.00714
5. https://arxiv.org/pdf/2103.00020

Original author: Big-Yellow. Reposted from: https://www.cnblogs.com/Big-Yellow/p/18895944