TypeError: Can not convert a float32 into a Tensor or Operation.

本文介绍了一种常见的TensorFlow编程错误:尝试将float32类型的变量作为Tensor使用时出现的TypeError,并提供了详细的错误分析及解决方法。

错误

TypeError: Can not convert a float32 into a Tensor or Operation.
# 类型错误:不能将一个浮动32转换为一个张量或操作。

TypeError: Fetch argument 2.3025854 has invalid type <class 'numpy.float32'>, must be a string or Tensor. (Can not convert a float32 into a Tensor or Operation.)

如其意,类型错误:不能将一个浮动32转换为一个张量或操作。也就是在计算图实际运算的时候发生的错误,错误为数据类型错误,将应该是传入的张量或者操作的的地方传入了一个 float32 类型的数据.

分析

源码

      for step in range(1000):
            batch_xs, batch_ys = mnist.train.next_batch(100)
            _, Train_Loss = sess.run([Train_Step,Train_Loss], feed_dict={X: batch_xs, Y_true: batch_ys})
            print('Train step :',step,'Train_Losss :',Train_Loss)

错误产生的时候我查了查代码,没明白什么意思,后来debug才发现,是变量在第二遍运行的时候数据类型发生了改变,
变量的数据类型发生了改变

其改变是在第一遍的时候,仔细看源代码,输出的Train_Loss与后面sess.run([Train_Step,Train_Loss], feed_dict={X: batch_xs, Y_true: batch_ys})中的’Train_Loss’同名,显然第一遍运行之后,原本应该传入session的Train_Loss就从张量变成了 float32

解决方案

避免重名,使用不同的名字
如原来的

_, Train_Loss = sess.run([Train_Step,Train_Loss], feed_dict={X: batch_xs, Y_true: batch_ys})

改为

 _, train_loss = sess.run([Train_Step,Train_Loss], feed_dict={X: batch_xs, Y_true: batch_ys})

BUG就消失了
这是一个很低级的BUG…..但是也很有趣.

转载于:https://www.cnblogs.com/fonttian/p/9162792.html

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license import contextlib import pickle import re import types from copy import deepcopy from pathlib import Path from .AddModules import * import thop import torch import torch.nn as nn from ultralytics.nn.modules import ( AIFI, C1, C2, C2PSA, C3, C3TR, ELAN1, OBB, PSA, SPP, SPPELAN, SPPF, AConv, ADown, Bottleneck, BottleneckCSP, C2f, C2fAttn, C2fCIB, C2fPSA, C3Ghost, C3k2, C3x, CBFuse, CBLinear, Classify, Concat, Conv, Conv2, ConvTranspose, Detect, DWConv, DWConvTranspose2d, Focus, GhostBottleneck, GhostConv, HGBlock, HGStem, ImagePoolingAttn, Index, Pose, RepC3, RepConv, RepNCSPELAN4, RepVGGDW, ResNetLayer, RTDETRDecoder, SCDown, Segment, TorchVision, WorldDetect, v10Detect, A2C2f, ) from ultralytics.utils import DEFAULT_CFG_DICT, DEFAULT_CFG_KEYS, LOGGER, colorstr, emojis, yaml_load from ultralytics.utils.checks import check_requirements, check_suffix, check_yaml from ultralytics.utils.loss import ( E2EDetectLoss, v8ClassificationLoss, v8DetectionLoss, v8OBBLoss, v8PoseLoss, v8SegmentationLoss, ) from ultralytics.utils.ops import make_divisible from ultralytics.utils.plotting import feature_visualization from ultralytics.utils.torch_utils import ( fuse_conv_and_bn, fuse_deconv_and_bn, initialize_weights, intersect_dicts, model_info, scale_img, time_sync, ) from .AddModules import * class BaseModel(nn.Module): """The BaseModel class serves as a base class for all the models in the Ultralytics YOLO family.""" def forward(self, x, *args, **kwargs): """ Perform forward pass of the model for either training or inference. If x is a dict, calculates and returns the loss for training. Otherwise, returns predictions for inference. Args: x (torch.Tensor | dict): Input tensor for inference, or dict with image tensor and labels for training. *args (Any): Variable length argument list. **kwargs (Any): Arbitrary keyword arguments. Returns: (torch.Tensor): Loss if x is a dict (training), or network predictions (inference). """ if isinstance(x, dict): # for cases of training and validating while training. return self.loss(x, *args, **kwargs) return self.predict(x, *args, **kwargs) def predict(self, x, profile=False, visualize=False, augment=False, embed=None): """ Perform a forward pass through the network. Args: x (torch.Tensor): The input tensor to the model. profile (bool): Print the computation time of each layer if True, defaults to False. visualize (bool): Save the feature maps of the model if True, defaults to False. augment (bool): Augment image during prediction, defaults to False. embed (list, optional): A list of feature vectors/embeddings to return. Returns: (torch.Tensor): The last output of the model. """ if augment: return self._predict_augment(x) return self._predict_once(x, profile, visualize, embed) def _predict_once(self, x, profile=False, visualize=False, embed=None): y, dt, embeddings = [], [], [] # outputs for m in self.model: if m.f != -1: # if not from previous layer x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers if profile: self._profile_one_layer(m, x, dt) if hasattr(m, 'backbone'): x = m(x) if len(x) != 5: # 0 - 5 x.insert(0, None) for index, i in enumerate(x): if index in self.save: y.append(i) else: y.append(None) x = x[-1] # 最后一个输出传给下一层 else: x = m(x) # run y.append(x if m.i in self.save else None) # save output if visualize: feature_visualization(x, m.type, m.i, save_dir=visualize) if embed and m.i in embed: embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1)) # flatten if m.i == max(embed): return torch.unbind(torch.cat(embeddings, 1), dim=0) return x def _predict_augment(self, x): """Perform augmentations on input image x and return augmented inference.""" LOGGER.warning( f"WARNING ⚠️ {self.__class__.__name__} does not support 'augment=True' prediction. " f"Reverting to single-scale prediction." ) return self._predict_once(x) def _profile_one_layer(self, m, x, dt): """ Profile the computation time and FLOPs of a single layer of the model on a given input. Appends the results to the provided list. Args: m (nn.Module): The layer to be profiled. x (torch.Tensor): The input data to the layer. dt (list): A list to store the computation time of the layer. Returns: None """ c = m == self.model[-1] and isinstance(x, list) # is final layer list, copy input as inplace fix flops = thop.profile(m, inputs=[x.copy() if c else x], verbose=False)[0] / 1e9 * 2 if thop else 0 # GFLOPs t = time_sync() for _ in range(10): m(x.copy() if c else x) dt.append((time_sync() - t) * 100) if m == self.model[0]: LOGGER.info(f"{'time (ms)':>10s} {'GFLOPs':>10s} {'params':>10s} module") LOGGER.info(f"{dt[-1]:10.2f} {flops:10.2f} {m.np:10.0f} {m.type}") if c: LOGGER.info(f"{sum(dt):10.2f} {'-':>10s} {'-':>10s} Total") def fuse(self, verbose=True): """ Fuse the `Conv2d()` and `BatchNorm2d()` layers of the model into a single layer, in order to improve the computation efficiency. Returns: (nn.Module): The fused model is returned. """ if not self.is_fused(): for m in self.model.modules(): if isinstance(m, (Conv, Conv2, DWConv)) and hasattr(m, "bn"): if isinstance(m, Conv2): m.fuse_convs() m.conv = fuse_conv_and_bn(m.conv, m.bn) # update conv delattr(m, "bn") # remove batchnorm m.forward = m.forward_fuse # update forward if isinstance(m, ConvTranspose) and hasattr(m, "bn"): m.conv_transpose = fuse_deconv_and_bn(m.conv_transpose, m.bn) delattr(m, "bn") # remove batchnorm m.forward = m.forward_fuse # update forward if isinstance(m, RepConv): m.fuse_convs() m.forward = m.forward_fuse # update forward if isinstance(m, RepVGGDW): m.fuse() m.forward = m.forward_fuse self.info(verbose=verbose) return self def is_fused(self, thresh=10): """ Check if the model has less than a certain threshold of BatchNorm layers. Args: thresh (int, optional): The threshold number of BatchNorm layers. Default is 10. Returns: (bool): True if the number of BatchNorm layers in the model is less than the threshold, False otherwise. """ bn = tuple(v for k, v in nn.__dict__.items() if "Norm" in k) # normalization layers, i.e. BatchNorm2d() return sum(isinstance(v, bn) for v in self.modules()) < thresh # True if < 'thresh' BatchNorm layers in model def info(self, detailed=False, verbose=True, imgsz=640): """ Prints model information. Args: detailed (bool): if True, prints out detailed information about the model. Defaults to False verbose (bool): if True, prints out the model information. Defaults to False imgsz (int): the size of the image that the model will be trained on. Defaults to 640 """ return model_info(self, detailed=detailed, verbose=verbose, imgsz=imgsz) def _apply(self, fn): """ Applies a function to all the tensors in the model that are not parameters or registered buffers. Args: fn (function): the function to apply to the model Returns: (BaseModel): An updated BaseModel object. """ self = super()._apply(fn) m = self.model[-1] # Detect() if isinstance(m, Detect): # includes all Detect subclasses like Segment, Pose, OBB, WorldDetect m.stride = fn(m.stride) m.anchors = fn(m.anchors) m.strides = fn(m.strides) return self def load(self, weights, verbose=True): """ Load the weights into the model. Args: weights (dict | torch.nn.Module): The pre-trained weights to be loaded. verbose (bool, optional): Whether to log the transfer progress. Defaults to True. """ model = weights["model"] if isinstance(weights, dict) else weights # torchvision models are not dicts csd = model.float().state_dict() # checkpoint state_dict as FP32 csd = intersect_dicts(csd, self.state_dict()) # intersect self.load_state_dict(csd, strict=False) # load if verbose: LOGGER.info(f"Transferred {len(csd)}/{len(self.model.state_dict())} items from pretrained weights") def loss(self, batch, preds=None): """ Compute loss. Args: batch (dict): Batch to compute loss on preds (torch.Tensor | List[torch.Tensor]): Predictions. """ if getattr(self, "criterion", None) is None: self.criterion = self.init_criterion() preds = self.forward(batch["img"]) if preds is None else preds return self.criterion(preds, batch) def init_criterion(self): """Initialize the loss criterion for the BaseModel.""" raise NotImplementedError("compute_loss() needs to be implemented by task heads") class DetectionModel(BaseModel): """YOLOv8 detection model.""" def __init__(self, cfg="yolov8n.yaml", ch=3, nc=None, verbose=True): # model, input channels, number of classes """Initialize the YOLOv8 detection model with the given config and parameters.""" super().__init__() self.yaml = cfg if isinstance(cfg, dict) else yaml_model_load(cfg) # cfg dict if self.yaml["backbone"][0][2] == "Silence": LOGGER.warning( "WARNING ⚠️ YOLOv9 `Silence` module is deprecated in favor of nn.Identity. " "Please delete local *.pt file and re-download the latest model checkpoint." ) self.yaml["backbone"][0][2] = "nn.Identity" # Define model ch = self.yaml["ch"] = self.yaml.get("ch", ch) # input channels if nc and nc != self.yaml["nc"]: LOGGER.info(f"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}") self.yaml["nc"] = nc # override YAML value self.model, self.save = parse_model(deepcopy(self.yaml), ch=ch, verbose=verbose) # model, savelist self.names = {i: f"{i}" for i in range(self.yaml["nc"])} # default names dict self.inplace = self.yaml.get("inplace", True) self.end2end = getattr(self.model[-1], "end2end", False) # Build strides m = self.model[-1] # Detect() if isinstance(m, Detect): # includes all Detect subclasses like Segment, Pose, OBB, WorldDetect s = 256 # 2x min stride m.inplace = self.inplace def _forward(x): """Performs a forward pass through the model, handling different Detect subclass types accordingly.""" if self.end2end: return self.forward(x)["one2many"] return self.forward(x)[0] if isinstance(m, (Segment, Pose, OBB)) else self.forward(x) m.stride = torch.tensor([s / x.shape[-2] for x in _forward(torch.zeros(1, ch, s, s))]) # forward self.stride = m.stride m.bias_init() # only run once else: self.stride = torch.Tensor([32]) # default stride for i.e. RTDETR # Init weights, biases initialize_weights(self) if verbose: self.info() LOGGER.info("") def _predict_augment(self, x): """Perform augmentations on input image x and return augmented inference and train outputs.""" if getattr(self, "end2end", False) or self.__class__.__name__ != "DetectionModel": LOGGER.warning("WARNING ⚠️ Model does not support 'augment=True', reverting to single-scale prediction.") return self._predict_once(x) img_size = x.shape[-2:] # height, width s = [1, 0.83, 0.67] # scales f = [None, 3, None] # flips (2-ud, 3-lr) y = [] # outputs for si, fi in zip(s, f): xi = scale_img(x.flip(fi) if fi else x, si, gs=int(self.stride.max())) yi = super().predict(xi)[0] # forward yi = self._descale_pred(yi, fi, si, img_size) y.append(yi) y = self._clip_augmented(y) # clip augmented tails return torch.cat(y, -1), None # augmented inference, train @staticmethod def _descale_pred(p, flips, scale, img_size, dim=1): """De-scale predictions following augmented inference (inverse operation).""" p[:, :4] /= scale # de-scale x, y, wh, cls = p.split((1, 1, 2, p.shape[dim] - 4), dim) if flips == 2: y = img_size[0] - y # de-flip ud elif flips == 3: x = img_size[1] - x # de-flip lr return torch.cat((x, y, wh, cls), dim) def _clip_augmented(self, y): """Clip YOLO augmented inference tails.""" nl = self.model[-1].nl # number of detection layers (P3-P5) g = sum(4**x for x in range(nl)) # grid points e = 1 # exclude layer count i = (y[0].shape[-1] // g) * sum(4**x for x in range(e)) # indices y[0] = y[0][..., :-i] # large i = (y[-1].shape[-1] // g) * sum(4 ** (nl - 1 - x) for x in range(e)) # indices y[-1] = y[-1][..., i:] # small return y def init_criterion(self): """Initialize the loss criterion for the DetectionModel.""" return E2EDetectLoss(self) if getattr(self, "end2end", False) else v8DetectionLoss(self) class OBBModel(DetectionModel): """YOLOv8 Oriented Bounding Box (OBB) model.""" def __init__(self, cfg="yolov8n-obb.yaml", ch=3, nc=None, verbose=True): """Initialize YOLOv8 OBB model with given config and parameters.""" super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def init_criterion(self): """Initialize the loss criterion for the model.""" return v8OBBLoss(self) class SegmentationModel(DetectionModel): """YOLOv8 segmentation model.""" def __init__(self, cfg="yolov8n-seg.yaml", ch=3, nc=None, verbose=True): """Initialize YOLOv8 segmentation model with given config and parameters.""" super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def init_criterion(self): """Initialize the loss criterion for the SegmentationModel.""" return v8SegmentationLoss(self) class PoseModel(DetectionModel): """YOLOv8 pose model.""" def __init__(self, cfg="yolov8n-pose.yaml", ch=3, nc=None, data_kpt_shape=(None, None), verbose=True): """Initialize YOLOv8 Pose model.""" if not isinstance(cfg, dict): cfg = yaml_model_load(cfg) # load model YAML if any(data_kpt_shape) and list(data_kpt_shape) != list(cfg["kpt_shape"]): LOGGER.info(f"Overriding model.yaml kpt_shape={cfg['kpt_shape']} with kpt_shape={data_kpt_shape}") cfg["kpt_shape"] = data_kpt_shape super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def init_criterion(self): """Initialize the loss criterion for the PoseModel.""" return v8PoseLoss(self) class ClassificationModel(BaseModel): """YOLOv8 classification model.""" def __init__(self, cfg="yolov8n-cls.yaml", ch=3, nc=None, verbose=True): """Init ClassificationModel with YAML, channels, number of classes, verbose flag.""" super().__init__() self._from_yaml(cfg, ch, nc, verbose) def _from_yaml(self, cfg, ch, nc, verbose): """Set YOLOv8 model configurations and define the model architecture.""" self.yaml = cfg if isinstance(cfg, dict) else yaml_model_load(cfg) # cfg dict # Define model ch = self.yaml["ch"] = self.yaml.get("ch", ch) # input channels if nc and nc != self.yaml["nc"]: LOGGER.info(f"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}") self.yaml["nc"] = nc # override YAML value elif not nc and not self.yaml.get("nc", None): raise ValueError("nc not specified. Must specify nc in model.yaml or function arguments.") self.model, self.save = parse_model(deepcopy(self.yaml), ch=ch, verbose=verbose) # model, savelist self.stride = torch.Tensor([1]) # no stride constraints self.names = {i: f"{i}" for i in range(self.yaml["nc"])} # default names dict self.info() @staticmethod def reshape_outputs(model, nc): """Update a TorchVision classification model to class count 'n' if required.""" name, m = list((model.model if hasattr(model, "model") else model).named_children())[-1] # last module if isinstance(m, Classify): # YOLO Classify() head if m.linear.out_features != nc: m.linear = nn.Linear(m.linear.in_features, nc) elif isinstance(m, nn.Linear): # ResNet, EfficientNet if m.out_features != nc: setattr(model, name, nn.Linear(m.in_features, nc)) elif isinstance(m, nn.Sequential): types = [type(x) for x in m] if nn.Linear in types: i = len(types) - 1 - types[::-1].index(nn.Linear) # last nn.Linear index if m[i].out_features != nc: m[i] = nn.Linear(m[i].in_features, nc) elif nn.Conv2d in types: i = len(types) - 1 - types[::-1].index(nn.Conv2d) # last nn.Conv2d index if m[i].out_channels != nc: m[i] = nn.Conv2d(m[i].in_channels, nc, m[i].kernel_size, m[i].stride, bias=m[i].bias is not None) def init_criterion(self): """Initialize the loss criterion for the ClassificationModel.""" return v8ClassificationLoss() class RTDETRDetectionModel(DetectionModel): """ RTDETR (Real-time DEtection and Tracking using Transformers) Detection Model class. This class is responsible for constructing the RTDETR architecture, defining loss functions, and facilitating both the training and inference processes. RTDETR is an object detection and tracking model that extends from the DetectionModel base class. Attributes: cfg (str): The configuration file path or preset string. Default is 'rtdetr-l.yaml'. ch (int): Number of input channels. Default is 3 (RGB). nc (int, optional): Number of classes for object detection. Default is None. verbose (bool): Specifies if summary statistics are shown during initialization. Default is True. Methods: init_criterion: Initializes the criterion used for loss calculation. loss: Computes and returns the loss during training. predict: Performs a forward pass through the network and returns the output. """ def __init__(self, cfg="rtdetr-l.yaml", ch=3, nc=None, verbose=True): """ Initialize the RTDETRDetectionModel. Args: cfg (str): Configuration file name or path. ch (int): Number of input channels. nc (int, optional): Number of classes. Defaults to None. verbose (bool, optional): Print additional information during initialization. Defaults to True. """ super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def init_criterion(self): """Initialize the loss criterion for the RTDETRDetectionModel.""" from ultralytics.models.utils.loss import RTDETRDetectionLoss return RTDETRDetectionLoss(nc=self.nc, use_vfl=True) def loss(self, batch, preds=None): """ Compute the loss for the given batch of data. Args: batch (dict): Dictionary containing image and label data. preds (torch.Tensor, optional): Precomputed model predictions. Defaults to None. Returns: (tuple): A tuple containing the total loss and main three losses in a tensor. """ if not hasattr(self, "criterion"): self.criterion = self.init_criterion() img = batch["img"] # NOTE: preprocess gt_bbox and gt_labels to list. bs = len(img) batch_idx = batch["batch_idx"] gt_groups = [(batch_idx == i).sum().item() for i in range(bs)] targets = { "cls": batch["cls"].to(img.device, dtype=torch.long).view(-1), "bboxes": batch["bboxes"].to(device=img.device), "batch_idx": batch_idx.to(img.device, dtype=torch.long).view(-1), "gt_groups": gt_groups, } preds = self.predict(img, batch=targets) if preds is None else preds dec_bboxes, dec_scores, enc_bboxes, enc_scores, dn_meta = preds if self.training else preds[1] if dn_meta is None: dn_bboxes, dn_scores = None, None else: dn_bboxes, dec_bboxes = torch.split(dec_bboxes, dn_meta["dn_num_split"], dim=2) dn_scores, dec_scores = torch.split(dec_scores, dn_meta["dn_num_split"], dim=2) dec_bboxes = torch.cat([enc_bboxes.unsqueeze(0), dec_bboxes]) # (7, bs, 300, 4) dec_scores = torch.cat([enc_scores.unsqueeze(0), dec_scores]) loss = self.criterion( (dec_bboxes, dec_scores), targets, dn_bboxes=dn_bboxes, dn_scores=dn_scores, dn_meta=dn_meta ) # NOTE: There are like 12 losses in RTDETR, backward with all losses but only show the main three losses. return sum(loss.values()), torch.as_tensor( [loss[k].detach() for k in ["loss_giou", "loss_class", "loss_bbox"]], device=img.device ) def predict(self, x, profile=False, visualize=False, batch=None, augment=False, embed=None): """ Perform a forward pass through the model. Args: x (torch.Tensor): The input tensor. profile (bool, optional): If True, profile the computation time for each layer. Defaults to False. visualize (bool, optional): If True, save feature maps for visualization. Defaults to False. batch (dict, optional): Ground truth data for evaluation. Defaults to None. augment (bool, optional): If True, perform data augmentation during inference. Defaults to False. embed (list, optional): A list of feature vectors/embeddings to return. Returns: (torch.Tensor): Model's output tensor. """ y, dt, embeddings = [], [], [] # outputs for m in self.model[:-1]: # except the head part if m.f != -1: # if not from previous layer x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers if profile: self._profile_one_layer(m, x, dt) x = m(x) # run y.append(x if m.i in self.save else None) # save output if visualize: feature_visualization(x, m.type, m.i, save_dir=visualize) if embed and m.i in embed: embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1)) # flatten if m.i == max(embed): return torch.unbind(torch.cat(embeddings, 1), dim=0) head = self.model[-1] x = head([y[j] for j in head.f], batch) # head inference return x class WorldModel(DetectionModel): """YOLOv8 World Model.""" def __init__(self, cfg="yolov8s-world.yaml", ch=3, nc=None, verbose=True): """Initialize YOLOv8 world model with given config and parameters.""" self.txt_feats = torch.randn(1, nc or 80, 512) # features placeholder self.clip_model = None # CLIP model placeholder super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def set_classes(self, text, batch=80, cache_clip_model=True): """Set classes in advance so that model could do offline-inference without clip model.""" try: import clip except ImportError: check_requirements("git+https://github.com/ultralytics/CLIP.git") import clip if ( not getattr(self, "clip_model", None) and cache_clip_model ): # for backwards compatibility of models lacking clip_model attribute self.clip_model = clip.load("ViT-B/32")[0] model = self.clip_model if cache_clip_model else clip.load("ViT-B/32")[0] device = next(model.parameters()).device text_token = clip.tokenize(text).to(device) txt_feats = [model.encode_text(token).detach() for token in text_token.split(batch)] txt_feats = txt_feats[0] if len(txt_feats) == 1 else torch.cat(txt_feats, dim=0) txt_feats = txt_feats / txt_feats.norm(p=2, dim=-1, keepdim=True) self.txt_feats = txt_feats.reshape(-1, len(text), txt_feats.shape[-1]) self.model[-1].nc = len(text) def predict(self, x, profile=False, visualize=False, txt_feats=None, augment=False, embed=None): """ Perform a forward pass through the model. Args: x (torch.Tensor): The input tensor. profile (bool, optional): If True, profile the computation time for each layer. Defaults to False. visualize (bool, optional): If True, save feature maps for visualization. Defaults to False. txt_feats (torch.Tensor): The text features, use it if it's given. Defaults to None. augment (bool, optional): If True, perform data augmentation during inference. Defaults to False. embed (list, optional): A list of feature vectors/embeddings to return. Returns: (torch.Tensor): Model's output tensor. """ txt_feats = (self.txt_feats if txt_feats is None else txt_feats).to(device=x.device, dtype=x.dtype) if len(txt_feats) != len(x): txt_feats = txt_feats.repeat(len(x), 1, 1) ori_txt_feats = txt_feats.clone() y, dt, embeddings = [], [], [] # outputs for m in self.model: # except the head part if m.f != -1: # if not from previous layer x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers if profile: self._profile_one_layer(m, x, dt) if isinstance(m, C2fAttn): x = m(x, txt_feats) elif isinstance(m, WorldDetect): x = m(x, ori_txt_feats) elif isinstance(m, ImagePoolingAttn): txt_feats = m(x, txt_feats) else: x = m(x) # run y.append(x if m.i in self.save else None) # save output if visualize: feature_visualization(x, m.type, m.i, save_dir=visualize) if embed and m.i in embed: embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1)) # flatten if m.i == max(embed): return torch.unbind(torch.cat(embeddings, 1), dim=0) return x def loss(self, batch, preds=None): """ Compute loss. Args: batch (dict): Batch to compute loss on. preds (torch.Tensor | List[torch.Tensor]): Predictions. """ if not hasattr(self, "criterion"): self.criterion = self.init_criterion() if preds is None: preds = self.forward(batch["img"], txt_feats=batch["txt_feats"]) return self.criterion(preds, batch) class Ensemble(nn.ModuleList): """Ensemble of models.""" def __init__(self): """Initialize an ensemble of models.""" super().__init__() def forward(self, x, augment=False, profile=False, visualize=False): """Function generates the YOLO network's final layer.""" y = [module(x, augment, profile, visualize)[0] for module in self] # y = torch.stack(y).max(0)[0] # max ensemble # y = torch.stack(y).mean(0) # mean ensemble y = torch.cat(y, 2) # nms ensemble, y shape(B, HW, C) return y, None # inference, train output # Functions ------------------------------------------------------------------------------------------------------------ @contextlib.contextmanager def temporary_modules(modules=None, attributes=None): """ Context manager for temporarily adding or modifying modules in Python's module cache (`sys.modules`). This function can be used to change the module paths during runtime. It's useful when refactoring code, where you've moved a module from one location to another, but you still want to support the old import paths for backwards compatibility. Args: modules (dict, optional): A dictionary mapping old module paths to new module paths. attributes (dict, optional): A dictionary mapping old module attributes to new module attributes. Example: ```python with temporary_modules({"old.module": "new.module"}, {"old.module.attribute": "new.module.attribute"}): import old.module # this will now import new.module from old.module import attribute # this will now import new.module.attribute ``` Note: The changes are only in effect inside the context manager and are undone once the context manager exits. Be aware that directly manipulating `sys.modules` can lead to unpredictable results, especially in larger applications or libraries. Use this function with caution. """ if modules is None: modules = {} if attributes is None: attributes = {} import sys from importlib import import_module try: # Set attributes in sys.modules under their old name for old, new in attributes.items(): old_module, old_attr = old.rsplit(".", 1) new_module, new_attr = new.rsplit(".", 1) setattr(import_module(old_module), old_attr, getattr(import_module(new_module), new_attr)) # Set modules in sys.modules under their old name for old, new in modules.items(): sys.modules[old] = import_module(new) yield finally: # Remove the temporary module paths for old in modules: if old in sys.modules: del sys.modules[old] class SafeClass: """A placeholder class to replace unknown classes during unpickling.""" def __init__(self, *args, **kwargs): """Initialize SafeClass instance, ignoring all arguments.""" pass def __call__(self, *args, **kwargs): """Run SafeClass instance, ignoring all arguments.""" pass class SafeUnpickler(pickle.Unpickler): """Custom Unpickler that replaces unknown classes with SafeClass.""" def find_class(self, module, name): """Attempt to find a class, returning SafeClass if not among safe modules.""" safe_modules = ( "torch", "collections", "collections.abc", "builtins", "math", "numpy", # Add other modules considered safe ) if module in safe_modules: return super().find_class(module, name) else: return SafeClass def torch_safe_load(weight, safe_only=False): """ Attempts to load a PyTorch model with the torch.load() function. If a ModuleNotFoundError is raised, it catches the error, logs a warning message, and attempts to install the missing module via the check_requirements() function. After installation, the function again attempts to load the model using torch.load(). Args: weight (str): The file path of the PyTorch model. safe_only (bool): If True, replace unknown classes with SafeClass during loading. Example: ```python from ultralytics.nn.tasks import torch_safe_load ckpt, file = torch_safe_load("path/to/best.pt", safe_only=True) ``` Returns: ckpt (dict): The loaded model checkpoint. file (str): The loaded filename """ from ultralytics.utils.downloads import attempt_download_asset check_suffix(file=weight, suffix=".pt") file = attempt_download_asset(weight) # search online if missing locally try: with temporary_modules( modules={ "ultralytics.yolo.utils": "ultralytics.utils", "ultralytics.yolo.v8": "ultralytics.models.yolo", "ultralytics.yolo.data": "ultralytics.data", }, attributes={ "ultralytics.nn.modules.block.Silence": "torch.nn.Identity", # YOLOv9e "ultralytics.nn.tasks.YOLOv10DetectionModel": "ultralytics.nn.tasks.DetectionModel", # YOLOv10 "ultralytics.utils.loss.v10DetectLoss": "ultralytics.utils.loss.E2EDetectLoss", # YOLOv10 }, ): if safe_only: # Load via custom pickle module safe_pickle = types.ModuleType("safe_pickle") safe_pickle.Unpickler = SafeUnpickler safe_pickle.load = lambda file_obj: SafeUnpickler(file_obj).load() with open(file, "rb") as f: ckpt = torch.load(f, pickle_module=safe_pickle) else: ckpt = torch.load(file, map_location="cpu") except ModuleNotFoundError as e: # e.name is missing module name if e.name == "models": raise TypeError( emojis( f"ERROR ❌️ {weight} appears to be an Ultralytics YOLOv5 model originally trained " f"with https://github.com/ultralytics/yolov5.\nThis model is NOT forwards compatible with " f"YOLOv8 at https://github.com/ultralytics/ultralytics." f"\nRecommend fixes are to train a new model using the latest 'ultralytics' package or to " f"run a command with an official Ultralytics model, i.e. 'yolo predict model=yolov8n.pt'" ) ) from e LOGGER.warning( f"WARNING ⚠️ {weight} appears to require '{e.name}', which is not in Ultralytics requirements." f"\nAutoInstall will run now for '{e.name}' but this feature will be removed in the future." f"\nRecommend fixes are to train a new model using the latest 'ultralytics' package or to " f"run a command with an official Ultralytics model, i.e. 'yolo predict model=yolov8n.pt'" ) check_requirements(e.name) # install missing module ckpt = torch.load(file, map_location="cpu") if not isinstance(ckpt, dict): # File is likely a YOLO instance saved with i.e. torch.save(model, "saved_model.pt") LOGGER.warning( f"WARNING ⚠️ The file '{weight}' appears to be improperly saved or formatted. " f"For optimal results, use model.save('filename.pt') to correctly save YOLO models." ) ckpt = {"model": ckpt.model} return ckpt, file def attempt_load_weights(weights, device=None, inplace=True, fuse=False): """Loads an ensemble of models weights=[a,b,c] or a single model weights=[a] or weights=a.""" ensemble = Ensemble() for w in weights if isinstance(weights, list) else [weights]: ckpt, w = torch_safe_load(w) # load ckpt args = {**DEFAULT_CFG_DICT, **ckpt["train_args"]} if "train_args" in ckpt else None # combined args model = (ckpt.get("ema") or ckpt["model"]).to(device).float() # FP32 model # Model compatibility updates model.args = args # attach args to model model.pt_path = w # attach *.pt file path to model model.task = guess_model_task(model) if not hasattr(model, "stride"): model.stride = torch.tensor([32.0]) # Append ensemble.append(model.fuse().eval() if fuse and hasattr(model, "fuse") else model.eval()) # model in eval mode # Module updates for m in ensemble.modules(): if hasattr(m, "inplace"): m.inplace = inplace elif isinstance(m, nn.Upsample) and not hasattr(m, "recompute_scale_factor"): m.recompute_scale_factor = None # torch 1.11.0 compatibility # Return model if len(ensemble) == 1: return ensemble[-1] # Return ensemble LOGGER.info(f"Ensemble created with {weights}\n") for k in "names", "nc", "yaml": setattr(ensemble, k, getattr(ensemble[0], k)) ensemble.stride = ensemble[int(torch.argmax(torch.tensor([m.stride.max() for m in ensemble])))].stride assert all(ensemble[0].nc == m.nc for m in ensemble), f"Models differ in class counts {[m.nc for m in ensemble]}" return ensemble def attempt_load_one_weight(weight, device=None, inplace=True, fuse=False): """Loads a single model weights.""" ckpt, weight = torch_safe_load(weight) # load ckpt args = {**DEFAULT_CFG_DICT, **(ckpt.get("train_args", {}))} # combine model and default args, preferring model args model = (ckpt.get("ema") or ckpt["model"]).to(device).float() # FP32 model # Model compatibility updates model.args = {k: v for k, v in args.items() if k in DEFAULT_CFG_KEYS} # attach args to model model.pt_path = weight # attach *.pt file path to model model.task = guess_model_task(model) if not hasattr(model, "stride"): model.stride = torch.tensor([32.0]) model = model.fuse().eval() if fuse and hasattr(model, "fuse") else model.eval() # model in eval mode # Module updates for m in model.modules(): if hasattr(m, "inplace"): m.inplace = inplace elif isinstance(m, nn.Upsample) and not hasattr(m, "recompute_scale_factor"): m.recompute_scale_factor = None # torch 1.11.0 compatibility # Return model and ckpt return model, ckpt def parse_model(d, ch, verbose=True): # model_dict, input_channels(3) """Parse a YOLO model.yaml dictionary into a PyTorch model.""" import ast # Args legacy = True # backward compatibility for v3/v5/v8/v9 models max_channels = float("inf") nc, act, scales = (d.get(x) for x in ("nc", "activation", "scales")) depth, width, kpt_shape = (d.get(x, 1.0) for x in ("depth_multiple", "width_multiple", "kpt_shape")) if scales: scale = d.get("scale") if not scale: scale = tuple(scales.keys())[0] LOGGER.warning(f"WARNING ⚠️ no model scale passed. Assuming scale='{scale}'.") depth, width, max_channels = scales[scale] if act: Conv.default_act = eval(act) # redefine default activation, i.e. Conv.default_act = nn.SiLU() if verbose: LOGGER.info(f"{colorstr('activation:')} {act}") # print if verbose: LOGGER.info(f"\n{'':>3}{'from':>20}{'n':>3}{'params':>10} {'module':<45}{'arguments':<30}") ch = [ch] layers, save, c2 = [], [], ch[-1] # layers, savelist, ch out backbone = False for i, (f, n, m, args) in enumerate(d["backbone"] + d["head"]): # from, number, module, args t=m m = getattr(torch.nn, m[3:]) if "nn." in m else globals()[m] # get module for j, a in enumerate(args): if isinstance(a, str): with contextlib.suppress(ValueError): args[j] = locals()[a] if a in locals() else ast.literal_eval(a) n = n_ = max(round(n * depth), 1) if n > 1 else n # depth gain if m in { Classify, Conv, ConvTranspose, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, C2fPSA, C2PSA, DWConv, Focus, BottleneckCSP, C1, C2, C2f, C3k2, RepNCSPELAN4, ELAN1, ADown, AConv, SPPELAN, C2fAttn, C3, C3TR, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x, RepC3, PSA, SCDown, C2fCIB, A2C2f, }: c1, c2 = ch[f], args[0] if c2 != nc: # if c2 not equal to number of classes (i.e. for Classify() output) c2 = make_divisible(min(c2, max_channels) * width, 8) if m is C2fAttn: args[1] = make_divisible(min(args[1], max_channels // 2) * width, 8) # embed channels args[2] = int( max(round(min(args[2], max_channels // 2 // 32)) * width, 1) if args[2] > 1 else args[2] ) # num heads args = [c1, c2, *args[1:]] if m in { BottleneckCSP, C1, C2, C2f, C3k2, C2fAttn, C3, C3TR, C3Ghost, C3x, RepC3, C2fPSA, C2fCIB, C2PSA, A2C2f, }: args.insert(2, n) # number of repeats n = 1 if m is C3k2: # for M/L/X sizes legacy = False if scale in "mlx": args[3] = True if m is A2C2f: legacy = False if scale in "lx": # for L/X sizes args.append(True) args.append(1.2) elif m is AIFI: args = [ch[f], *args] elif m in {HGStem, HGBlock}: c1, cm, c2 = ch[f], args[0], args[1] args = [c1, cm, c2, *args[2:]] if m is HGBlock: args.insert(4, n) # number of repeats n = 1 elif m is ResNetLayer: c2 = args[1] if args[3] else args[1] * 4 elif m is nn.BatchNorm2d: args = [ch[f]] #LSKNet elif m in {LSKNET_T,LSKNET_S}: m = m(*args) c2 = m.width_list backbone =True #BiFPN elif m is BiFPN: length = len([ch[x] for x in f]) args = [length] elif m is Concat: c2 = sum(ch[x] for x in f) elif m in {Detect, WorldDetect, Segment, Pose, OBB, ImagePoolingAttn, v10Detect}: args.append([ch[x] for x in f]) if m is Segment: args[2] = make_divisible(min(args[2], max_channels) * width, 8) if m in {Detect, Segment, Pose, OBB}: m.legacy = legacy elif m is RTDETRDecoder: # special case, channels arg must be passed in index 1 args.insert(1, [ch[x] for x in f]) elif m in {CBLinear, TorchVision, Index}: c2 = args[0] c1 = ch[f] args = [c1, c2, *args[1:]] elif m is CBFuse: c2 = ch[f[-1]] else: c2 = ch[f] # m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args) # module # t = str(m)[8:-2].replace("__main__.", "") # module type # m_.np = sum(x.numel() for x in m_.parameters()) # number params # m_.i, m_.f, m_.type = i, f, t # attach index, 'from' index, type # if verbose: # LOGGER.info(f"{i:>3}{str(f):>20}{n_:>3}{m_.np:10.0f} {t:<45}{str(args):<30}") # print # save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1) # append to savelist # layers.append(m_) # if i == 0: # ch = [] # ch.append(c2) #替换上面的 if isinstance(c2, list): backbone = True m_ = m m_.backbone = True else: m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args) # module t = str(m)[8:-2].replace('__main__.', '') # module type m.np = sum(x.numel() for x in m_.parameters()) # number params m_.i, m_.f, m_.type = i + 4 if backbone else i, f, t # attach index, 'from' index, type if verbose: LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f} {t:<45}{str(args):<30}') # print save.extend(x % (i + 4 if backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1) # append to savelist layers.append(m_) if i == 0: ch = [] if isinstance(c2, list): ch.extend(c2) for _ in range(5 - len(ch)): ch.insert(0, 0) else: ch.append(c2) return nn.Sequential(*layers), sorted(save) def yaml_model_load(path): """Load a YOLOv8 model from a YAML file.""" path = Path(path) if path.stem in (f"yolov{d}{x}6" for x in "nsmlx" for d in (5, 8)): new_stem = re.sub(r"(\d+)([nslmx])6(.+)?$", r"\1\2-p6\3", path.stem) LOGGER.warning(f"WARNING ⚠️ Ultralytics YOLO P6 models now use -p6 suffix. Renaming {path.stem} to {new_stem}.") path = path.with_name(new_stem + path.suffix) unified_path = re.sub(r"(\d+)([nslmx])(.+)?$", r"\1\3", str(path)) # i.e. yolov8x.yaml -> yolov8.yaml yaml_file = check_yaml(unified_path, hard=False) or check_yaml(path) d = yaml_load(yaml_file) # model dict d["scale"] = guess_model_scale(path) d["yaml_file"] = str(path) return d def guess_model_scale(model_path): """ Takes a path to a YOLO model's YAML file as input and extracts the size character of the model's scale. The function uses regular expression matching to find the pattern of the model scale in the YAML file name, which is denoted by n, s, m, l, or x. The function returns the size character of the model scale as a string. Args: model_path (str | Path): The path to the YOLO model's YAML file. Returns: (str): The size character of the model's scale, which can be n, s, m, l, or x. """ try: return re.search(r"yolo[v]?\d+([nslmx])", Path(model_path).stem).group(1) # noqa, returns n, s, m, l, or x except AttributeError: return "" def guess_model_task(model): """ Guess the task of a PyTorch model from its architecture or configuration. Args: model (nn.Module | dict): PyTorch model or model configuration in YAML format. Returns: (str): Task of the model ('detect', 'segment', 'classify', 'pose'). Raises: SyntaxError: If the task of the model could not be determined. """ def cfg2task(cfg): """Guess from YAML dictionary.""" m = cfg["head"][-1][-2].lower() # output module name if m in {"classify", "classifier", "cls", "fc"}: return "classify" if "detect" in m: return "detect" if m == "segment": return "segment" if m == "pose": return "pose" if m == "obb": return "obb" # Guess from model cfg if isinstance(model, dict): with contextlib.suppress(Exception): return cfg2task(model) # Guess from PyTorch model if isinstance(model, nn.Module): # PyTorch model for x in "model.args", "model.model.args", "model.model.model.args": with contextlib.suppress(Exception): return eval(x)["task"] for x in "model.yaml", "model.model.yaml", "model.model.model.yaml": with contextlib.suppress(Exception): return cfg2task(eval(x)) for m in model.modules(): if isinstance(m, Segment): return "segment" elif isinstance(m, Classify): return "classify" elif isinstance(m, Pose): return "pose" elif isinstance(m, OBB): return "obb" elif isinstance(m, (Detect, WorldDetect, v10Detect)): return "detect" # Guess from model filename if isinstance(model, (str, Path)): model = Path(model) if "-seg" in model.stem or "segment" in model.parts: return "segment" elif "-cls" in model.stem or "classify" in model.parts: return "classify" elif "-pose" in model.stem or "pose" in model.parts: return "pose" elif "-obb" in model.stem or "obb" in model.parts: return "obb" elif "detect" in model.parts: return "detect" # Unable to determine task from model LOGGER.warning( "WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. " "Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify','pose' or 'obb'." ) return "detect" # assume detect 这上面是task.py文件的内容。下面是LSKNet.py文件的内容。 import torch import torch.nn as nn from torch.nn.modules.utils import _pair as to_2tuple from timm.models.layers import DropPath, to_2tuple from functools import partial import warnings __all__ = ['LSKNET_T', 'LSKNET_S'] class Mlp(nn.Module): def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.): super().__init__() out_features = out_features or in_features hidden_features = hidden_features or in_features self.fc1 = nn.Conv2d(in_features, hidden_features, 1) self.dwconv = DWConv(hidden_features) self.act = act_layer() self.fc2 = nn.Conv2d(hidden_features, out_features, 1) self.drop = nn.Dropout(drop) def forward(self, x): x = self.fc1(x) x = self.dwconv(x) x = self.act(x) x = self.drop(x) x = self.fc2(x) x = self.drop(x) return x class LSKblock(nn.Module): def __init__(self, dim): super().__init__() self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim) self.conv_spatial = nn.Conv2d(dim, dim, 7, stride=1, padding=9, groups=dim, dilation=3) self.conv1 = nn.Conv2d(dim, dim // 2, 1) self.conv2 = nn.Conv2d(dim, dim // 2, 1) self.conv_squeeze = nn.Conv2d(2, 2, 7, padding=3) self.conv = nn.Conv2d(dim // 2, dim, 1) def forward(self, x): attn1 = self.conv0(x) attn2 = self.conv_spatial(attn1) attn1 = self.conv1(attn1) attn2 = self.conv2(attn2) attn = torch.cat([attn1, attn2], dim=1) avg_attn = torch.mean(attn, dim=1, keepdim=True) max_attn, _ = torch.max(attn, dim=1, keepdim=True) agg = torch.cat([avg_attn, max_attn], dim=1) sig = self.conv_squeeze(agg).sigmoid() attn = attn1 * sig[:, 0, :, :].unsqueeze(1) + attn2 * sig[:, 1, :, :].unsqueeze(1) attn = self.conv(attn) return x * attn class Attention(nn.Module): def __init__(self, d_model): super().__init__() self.proj_1 = nn.Conv2d(d_model, d_model, 1) self.activation = nn.GELU() self.spatial_gating_unit = LSKblock(d_model) self.proj_2 = nn.Conv2d(d_model, d_model, 1) def forward(self, x): shorcut = x.clone() x = self.proj_1(x) x = self.activation(x) x = self.spatial_gating_unit(x) x = self.proj_2(x) x = x + shorcut return x class Block(nn.Module): def __init__(self, dim, mlp_ratio=4., drop=0., drop_path=0., act_layer=nn.GELU, norm_cfg=None): super().__init__() if norm_cfg: self.norm1 = nn.BatchNorm2d(norm_cfg, dim) self.norm2 = nn.BatchNorm2d(norm_cfg, dim) else: self.norm1 = nn.BatchNorm2d(dim) self.norm2 = nn.BatchNorm2d(dim) self.attn = Attention(dim) self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() mlp_hidden_dim = int(dim * mlp_ratio) self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) layer_scale_init_value = 1e-2 self.layer_scale_1 = nn.Parameter( layer_scale_init_value * torch.ones((dim)), requires_grad=True) self.layer_scale_2 = nn.Parameter( layer_scale_init_value * torch.ones((dim)), requires_grad=True) def forward(self, x): x = x + self.drop_path(self.layer_scale_1.unsqueeze(-1).unsqueeze(-1) * self.attn(self.norm1(x))) x = x + self.drop_path(self.layer_scale_2.unsqueeze(-1).unsqueeze(-1) * self.mlp(self.norm2(x))) return x class OverlapPatchEmbed(nn.Module): """ Image to Patch Embedding """ def __init__(self, img_size=224, patch_size=7, stride=4, in_chans=3, embed_dim=768, norm_cfg=None): super().__init__() patch_size = to_2tuple(patch_size) self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=stride, padding=(patch_size[0] // 2, patch_size[1] // 2)) if norm_cfg: self.norm = nn.BatchNorm2d(norm_cfg, embed_dim) else: self.norm = nn.BatchNorm2d(embed_dim) def forward(self, x): x = self.proj(x) _, _, H, W = x.shape x = self.norm(x) return x, H, W class LSKNet(nn.Module): def __init__(self, img_size=224, in_chans=3, dim=None, embed_dims=[64, 128, 256, 512], mlp_ratios=[8, 8, 4, 4], drop_rate=0., drop_path_rate=0., norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3, 4, 6, 3], num_stages=4, pretrained=None, init_cfg=None, norm_cfg=None): super().__init__() assert not (init_cfg and pretrained), \ 'init_cfg and pretrained cannot be set at the same time' if isinstance(pretrained, str): warnings.warn('DeprecationWarning: pretrained is deprecated, ' 'please use "init_cfg" instead') self.init_cfg = dict(type='Pretrained', checkpoint=pretrained) elif pretrained is not None: raise TypeError('pretrained must be a str or None') self.depths = depths self.num_stages = num_stages dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] # stochastic depth decay rule cur = 0 for i in range(num_stages): patch_embed = OverlapPatchEmbed(img_size=img_size if i == 0 else img_size // (2 ** (i + 1)), patch_size=7 if i == 0 else 3, stride=4 if i == 0 else 2, in_chans=in_chans if i == 0 else embed_dims[i - 1], embed_dim=embed_dims[i], norm_cfg=norm_cfg) block = nn.ModuleList([Block( dim=embed_dims[i], mlp_ratio=mlp_ratios[i], drop=drop_rate, drop_path=dpr[cur + j], norm_cfg=norm_cfg) for j in range(depths[i])]) norm = norm_layer(embed_dims[i]) cur += depths[i] setattr(self, f"patch_embed{i + 1}", patch_embed) setattr(self, f"block{i + 1}", block) setattr(self, f"norm{i + 1}", norm) self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))] def freeze_patch_emb(self): self.patch_embed1.requires_grad = False @torch.jit.ignore def no_weight_decay(self): return {'pos_embed1', 'pos_embed2', 'pos_embed3', 'pos_embed4', 'cls_token'} # has pos_embed may be better def get_classifier(self): return self.head def reset_classifier(self, num_classes, global_pool=''): self.num_classes = num_classes self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity() def forward_features(self, x): B = x.shape[0] outs = [] for i in range(self.num_stages): patch_embed = getattr(self, f"patch_embed{i + 1}") block = getattr(self, f"block{i + 1}") norm = getattr(self, f"norm{i + 1}") x, H, W = patch_embed(x) for blk in block: x = blk(x) x = x.flatten(2).transpose(1, 2) x = norm(x) x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous() outs.append(x) return outs def forward(self, x): x = self.forward_features(x) # x = self.head(x) return x class DWConv(nn.Module): def __init__(self, dim=768): super(DWConv, self).__init__() self.dwconv = nn.Conv2d(dim, dim, 3, 1, 1, bias=True, groups=dim) def forward(self, x): x = self.dwconv(x) return x def _conv_filter(state_dict, patch_size=16): """ convert patch embedding weight from manual patchify + linear proj to conv""" out_dict = {} for k, v in state_dict.items(): if 'patch_embed.proj.weight' in k: v = v.reshape((v.shape[0], 3, patch_size, patch_size)) out_dict[k] = v return out_dict def LSKNET_T(): model = LSKNet(depths=[2, 2, 2, 2]) return model def LSKNET_S(): model = LSKNet() return model if __name__ == '__main__': model = LSKNet() inputs = torch.randn((1, 3, 640, 640)) for i in model(inputs): print(i.size()) 最下面是BiFPN.py文件,请你结合这三个文件和上面刚刚的运行错误解决这个问题。 import torch.nn as nn import torch class swish(nn.Module): def forward(self, x): return x * torch.sigmoid(x) class BiFPN(nn.Module): def __init__(self, length): super().__init__() self.weight = nn.Parameter(torch.ones(length, dtype=torch.float32), requires_grad=True) self.swish = swish() self.epsilon = 0.0001 def forward(self, x): weights = self.weight / (torch.sum(self.swish(self.weight), dim=0) + self.epsilon) weighted_feature_maps = [weights[i] * x[i] for i in range(len(x))] stacked_feature_maps = torch.stack(weighted_feature_maps, dim=0) result = torch.sum(stacked_feature_maps, dim=0) return result
09-01
我希望你能帮助我改进YOLOv12模型,一个识别水稻稻苗的模型,这个模型我是用来识别之后用来计数苗数的。现在我已经将其骨干网络换为了LSKNet结构,请你在这个基础上加入或者替换结构帮我进一步提升精度,记得后面加入的模块要适配LSKNet结构,下面是一些关于该结构的文件。 LSKNet.py import torch import torch.nn as nn from torch.nn.modules.utils import _pair as to_2tuple from timm.models.layers import DropPath, to_2tuple from functools import partial import warnings __all__ = ['LSKNET_T', 'LSKNET_S'] class Mlp(nn.Module): def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.): super().__init__() out_features = out_features or in_features hidden_features = hidden_features or in_features self.fc1 = nn.Conv2d(in_features, hidden_features, 1) self.dwconv = DWConv(hidden_features) self.act = act_layer() self.fc2 = nn.Conv2d(hidden_features, out_features, 1) self.drop = nn.Dropout(drop) def forward(self, x): x = self.fc1(x) x = self.dwconv(x) x = self.act(x) x = self.drop(x) x = self.fc2(x) x = self.drop(x) return x class LSKblock(nn.Module): def __init__(self, dim): super().__init__() self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim) self.conv_spatial = nn.Conv2d(dim, dim, 7, stride=1, padding=9, groups=dim, dilation=3) self.conv1 = nn.Conv2d(dim, dim // 2, 1) self.conv2 = nn.Conv2d(dim, dim // 2, 1) self.conv_squeeze = nn.Conv2d(2, 2, 7, padding=3) self.conv = nn.Conv2d(dim // 2, dim, 1) def forward(self, x): attn1 = self.conv0(x) attn2 = self.conv_spatial(attn1) attn1 = self.conv1(attn1) attn2 = self.conv2(attn2) attn = torch.cat([attn1, attn2], dim=1) avg_attn = torch.mean(attn, dim=1, keepdim=True) max_attn, _ = torch.max(attn, dim=1, keepdim=True) agg = torch.cat([avg_attn, max_attn], dim=1) sig = self.conv_squeeze(agg).sigmoid() attn = attn1 * sig[:, 0, :, :].unsqueeze(1) + attn2 * sig[:, 1, :, :].unsqueeze(1) attn = self.conv(attn) return x * attn class Attention(nn.Module): def __init__(self, d_model): super().__init__() self.proj_1 = nn.Conv2d(d_model, d_model, 1) self.activation = nn.GELU() self.spatial_gating_unit = LSKblock(d_model) self.proj_2 = nn.Conv2d(d_model, d_model, 1) def forward(self, x): shorcut = x.clone() x = self.proj_1(x) x = self.activation(x) x = self.spatial_gating_unit(x) x = self.proj_2(x) x = x + shorcut return x class Block(nn.Module): def __init__(self, dim, mlp_ratio=4., drop=0., drop_path=0., act_layer=nn.GELU, norm_cfg=None): super().__init__() if norm_cfg: self.norm1 = nn.BatchNorm2d(norm_cfg, dim) self.norm2 = nn.BatchNorm2d(norm_cfg, dim) else: self.norm1 = nn.BatchNorm2d(dim) self.norm2 = nn.BatchNorm2d(dim) self.attn = Attention(dim) self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() mlp_hidden_dim = int(dim * mlp_ratio) self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) layer_scale_init_value = 1e-2 self.layer_scale_1 = nn.Parameter( layer_scale_init_value * torch.ones((dim)), requires_grad=True) self.layer_scale_2 = nn.Parameter( layer_scale_init_value * torch.ones((dim)), requires_grad=True) def forward(self, x): x = x + self.drop_path(self.layer_scale_1.unsqueeze(-1).unsqueeze(-1) * self.attn(self.norm1(x))) x = x + self.drop_path(self.layer_scale_2.unsqueeze(-1).unsqueeze(-1) * self.mlp(self.norm2(x))) return x class OverlapPatchEmbed(nn.Module): """ Image to Patch Embedding """ def __init__(self, img_size=224, patch_size=7, stride=4, in_chans=3, embed_dim=768, norm_cfg=None): super().__init__() patch_size = to_2tuple(patch_size) self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=stride, padding=(patch_size[0] // 2, patch_size[1] // 2)) if norm_cfg: self.norm = nn.BatchNorm2d(norm_cfg, embed_dim) else: self.norm = nn.BatchNorm2d(embed_dim) def forward(self, x): x = self.proj(x) _, _, H, W = x.shape x = self.norm(x) return x, H, W class LSKNet(nn.Module): def __init__(self, img_size=224, in_chans=3, dim=None, embed_dims=[64, 128, 256, 512], mlp_ratios=[8, 8, 4, 4], drop_rate=0., drop_path_rate=0., norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3, 4, 6, 3], num_stages=4, pretrained=None, init_cfg=None, norm_cfg=None): super().__init__() assert not (init_cfg and pretrained), \ 'init_cfg and pretrained cannot be set at the same time' if isinstance(pretrained, str): warnings.warn('DeprecationWarning: pretrained is deprecated, ' 'please use "init_cfg" instead') self.init_cfg = dict(type='Pretrained', checkpoint=pretrained) elif pretrained is not None: raise TypeError('pretrained must be a str or None') self.depths = depths self.num_stages = num_stages dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] # stochastic depth decay rule cur = 0 for i in range(num_stages): patch_embed = OverlapPatchEmbed(img_size=img_size if i == 0 else img_size // (2 ** (i + 1)), patch_size=7 if i == 0 else 3, stride=4 if i == 0 else 2, in_chans=in_chans if i == 0 else embed_dims[i - 1], embed_dim=embed_dims[i], norm_cfg=norm_cfg) block = nn.ModuleList([Block( dim=embed_dims[i], mlp_ratio=mlp_ratios[i], drop=drop_rate, drop_path=dpr[cur + j], norm_cfg=norm_cfg) for j in range(depths[i])]) norm = norm_layer(embed_dims[i]) cur += depths[i] setattr(self, f"patch_embed{i + 1}", patch_embed) setattr(self, f"block{i + 1}", block) setattr(self, f"norm{i + 1}", norm) self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))] def freeze_patch_emb(self): self.patch_embed1.requires_grad = False @torch.jit.ignore def no_weight_decay(self): return {'pos_embed1', 'pos_embed2', 'pos_embed3', 'pos_embed4', 'cls_token'} # has pos_embed may be better def get_classifier(self): return self.head def reset_classifier(self, num_classes, global_pool=''): self.num_classes = num_classes self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity() def forward_features(self, x): B = x.shape[0] outs = [] for i in range(self.num_stages): patch_embed = getattr(self, f"patch_embed{i + 1}") block = getattr(self, f"block{i + 1}") norm = getattr(self, f"norm{i + 1}") x, H, W = patch_embed(x) for blk in block: x = blk(x) x = x.flatten(2).transpose(1, 2) x = norm(x) x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous() outs.append(x) return outs def forward(self, x): x = self.forward_features(x) # x = self.head(x) return x class DWConv(nn.Module): def __init__(self, dim=768): super(DWConv, self).__init__() self.dwconv = nn.Conv2d(dim, dim, 3, 1, 1, bias=True, groups=dim) def forward(self, x): x = self.dwconv(x) return x def _conv_filter(state_dict, patch_size=16): """ convert patch embedding weight from manual patchify + linear proj to conv""" out_dict = {} for k, v in state_dict.items(): if 'patch_embed.proj.weight' in k: v = v.reshape((v.shape[0], 3, patch_size, patch_size)) out_dict[k] = v return out_dict def LSKNET_T(): model = LSKNet(depths=[2, 2, 2, 2]) return model def LSKNET_S(): model = LSKNet() return model if __name__ == '__main__': model = LSKNet() inputs = torch.randn((1, 3, 640, 640)) for i in model(inputs): print(i.size()) task.py # Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license import contextlib import pickle import re import types from copy import deepcopy from pathlib import Path from .AddModules import * import thop import torch import torch.nn as nn from ultralytics.nn.modules import ( AIFI, C1, C2, C2PSA, C3, C3TR, ELAN1, OBB, PSA, SPP, SPPELAN, SPPF, AConv, ADown, Bottleneck, BottleneckCSP, C2f, C2fAttn, C2fCIB, C2fPSA, C3Ghost, C3k2, C3x, CBFuse, CBLinear, Classify, Concat, Conv, Conv2, ConvTranspose, Detect, DWConv, DWConvTranspose2d, Focus, GhostBottleneck, GhostConv, HGBlock, HGStem, ImagePoolingAttn, Index, Pose, RepC3, RepConv, RepNCSPELAN4, RepVGGDW, ResNetLayer, RTDETRDecoder, SCDown, Segment, TorchVision, WorldDetect, v10Detect, A2C2f, ) from ultralytics.utils import DEFAULT_CFG_DICT, DEFAULT_CFG_KEYS, LOGGER, colorstr, emojis, yaml_load from ultralytics.utils.checks import check_requirements, check_suffix, check_yaml from ultralytics.utils.loss import ( E2EDetectLoss, v8ClassificationLoss, v8DetectionLoss, v8OBBLoss, v8PoseLoss, v8SegmentationLoss, ) from ultralytics.utils.ops import make_divisible from ultralytics.utils.plotting import feature_visualization from ultralytics.utils.torch_utils import ( fuse_conv_and_bn, fuse_deconv_and_bn, initialize_weights, intersect_dicts, model_info, scale_img, time_sync, ) from .AddModules import * class BaseModel(nn.Module): """The BaseModel class serves as a base class for all the models in the Ultralytics YOLO family.""" def forward(self, x, *args, **kwargs): """ Perform forward pass of the model for either training or inference. If x is a dict, calculates and returns the loss for training. Otherwise, returns predictions for inference. Args: x (torch.Tensor | dict): Input tensor for inference, or dict with image tensor and labels for training. *args (Any): Variable length argument list. **kwargs (Any): Arbitrary keyword arguments. Returns: (torch.Tensor): Loss if x is a dict (training), or network predictions (inference). """ if isinstance(x, dict): # for cases of training and validating while training. return self.loss(x, *args, **kwargs) return self.predict(x, *args, **kwargs) def predict(self, x, profile=False, visualize=False, augment=False, embed=None): """ Perform a forward pass through the network. Args: x (torch.Tensor): The input tensor to the model. profile (bool): Print the computation time of each layer if True, defaults to False. visualize (bool): Save the feature maps of the model if True, defaults to False. augment (bool): Augment image during prediction, defaults to False. embed (list, optional): A list of feature vectors/embeddings to return. Returns: (torch.Tensor): The last output of the model. """ if augment: return self._predict_augment(x) return self._predict_once(x, profile, visualize, embed) def _predict_once(self, x, profile=False, visualize=False, embed=None): y, dt, embeddings = [], [], [] # outputs for m in self.model: if m.f != -1: # if not from previous layer x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers if profile: self._profile_one_layer(m, x, dt) if hasattr(m, 'backbone'): x = m(x) if len(x) != 5: # 0 - 5 x.insert(0, None) for index, i in enumerate(x): if index in self.save: y.append(i) else: y.append(None) x = x[-1] # 最后一个输出传给下一层 else: x = m(x) # run y.append(x if m.i in self.save else None) # save output if visualize: feature_visualization(x, m.type, m.i, save_dir=visualize) if embed and m.i in embed: embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1)) # flatten if m.i == max(embed): return torch.unbind(torch.cat(embeddings, 1), dim=0) return x def _predict_augment(self, x): """Perform augmentations on input image x and return augmented inference.""" LOGGER.warning( f"WARNING ⚠️ {self.__class__.__name__} does not support 'augment=True' prediction. " f"Reverting to single-scale prediction." ) return self._predict_once(x) def _profile_one_layer(self, m, x, dt): """ Profile the computation time and FLOPs of a single layer of the model on a given input. Appends the results to the provided list. Args: m (nn.Module): The layer to be profiled. x (torch.Tensor): The input data to the layer. dt (list): A list to store the computation time of the layer. Returns: None """ c = m == self.model[-1] and isinstance(x, list) # is final layer list, copy input as inplace fix flops = thop.profile(m, inputs=[x.copy() if c else x], verbose=False)[0] / 1e9 * 2 if thop else 0 # GFLOPs t = time_sync() for _ in range(10): m(x.copy() if c else x) dt.append((time_sync() - t) * 100) if m == self.model[0]: LOGGER.info(f"{'time (ms)':>10s} {'GFLOPs':>10s} {'params':>10s} module") LOGGER.info(f"{dt[-1]:10.2f} {flops:10.2f} {m.np:10.0f} {m.type}") if c: LOGGER.info(f"{sum(dt):10.2f} {'-':>10s} {'-':>10s} Total") def fuse(self, verbose=True): """ Fuse the `Conv2d()` and `BatchNorm2d()` layers of the model into a single layer, in order to improve the computation efficiency. Returns: (nn.Module): The fused model is returned. """ if not self.is_fused(): for m in self.model.modules(): if isinstance(m, (Conv, Conv2, DWConv)) and hasattr(m, "bn"): if isinstance(m, Conv2): m.fuse_convs() m.conv = fuse_conv_and_bn(m.conv, m.bn) # update conv delattr(m, "bn") # remove batchnorm m.forward = m.forward_fuse # update forward if isinstance(m, ConvTranspose) and hasattr(m, "bn"): m.conv_transpose = fuse_deconv_and_bn(m.conv_transpose, m.bn) delattr(m, "bn") # remove batchnorm m.forward = m.forward_fuse # update forward if isinstance(m, RepConv): m.fuse_convs() m.forward = m.forward_fuse # update forward if isinstance(m, RepVGGDW): m.fuse() m.forward = m.forward_fuse self.info(verbose=verbose) return self def is_fused(self, thresh=10): """ Check if the model has less than a certain threshold of BatchNorm layers. Args: thresh (int, optional): The threshold number of BatchNorm layers. Default is 10. Returns: (bool): True if the number of BatchNorm layers in the model is less than the threshold, False otherwise. """ bn = tuple(v for k, v in nn.__dict__.items() if "Norm" in k) # normalization layers, i.e. BatchNorm2d() return sum(isinstance(v, bn) for v in self.modules()) < thresh # True if < 'thresh' BatchNorm layers in model def info(self, detailed=False, verbose=True, imgsz=640): """ Prints model information. Args: detailed (bool): if True, prints out detailed information about the model. Defaults to False verbose (bool): if True, prints out the model information. Defaults to False imgsz (int): the size of the image that the model will be trained on. Defaults to 640 """ return model_info(self, detailed=detailed, verbose=verbose, imgsz=imgsz) def _apply(self, fn): """ Applies a function to all the tensors in the model that are not parameters or registered buffers. Args: fn (function): the function to apply to the model Returns: (BaseModel): An updated BaseModel object. """ self = super()._apply(fn) m = self.model[-1] # Detect() if isinstance(m, Detect): # includes all Detect subclasses like Segment, Pose, OBB, WorldDetect m.stride = fn(m.stride) m.anchors = fn(m.anchors) m.strides = fn(m.strides) return self def load(self, weights, verbose=True): """ Load the weights into the model. Args: weights (dict | torch.nn.Module): The pre-trained weights to be loaded. verbose (bool, optional): Whether to log the transfer progress. Defaults to True. """ model = weights["model"] if isinstance(weights, dict) else weights # torchvision models are not dicts csd = model.float().state_dict() # checkpoint state_dict as FP32 csd = intersect_dicts(csd, self.state_dict()) # intersect self.load_state_dict(csd, strict=False) # load if verbose: LOGGER.info(f"Transferred {len(csd)}/{len(self.model.state_dict())} items from pretrained weights") def loss(self, batch, preds=None): """ Compute loss. Args: batch (dict): Batch to compute loss on preds (torch.Tensor | List[torch.Tensor]): Predictions. """ if getattr(self, "criterion", None) is None: self.criterion = self.init_criterion() preds = self.forward(batch["img"]) if preds is None else preds return self.criterion(preds, batch) def init_criterion(self): """Initialize the loss criterion for the BaseModel.""" raise NotImplementedError("compute_loss() needs to be implemented by task heads") class DetectionModel(BaseModel): """YOLOv8 detection model.""" def __init__(self, cfg="yolov8n.yaml", ch=3, nc=None, verbose=True): # model, input channels, number of classes """Initialize the YOLOv8 detection model with the given config and parameters.""" super().__init__() self.yaml = cfg if isinstance(cfg, dict) else yaml_model_load(cfg) # cfg dict if self.yaml["backbone"][0][2] == "Silence": LOGGER.warning( "WARNING ⚠️ YOLOv9 `Silence` module is deprecated in favor of nn.Identity. " "Please delete local *.pt file and re-download the latest model checkpoint." ) self.yaml["backbone"][0][2] = "nn.Identity" # Define model ch = self.yaml["ch"] = self.yaml.get("ch", ch) # input channels if nc and nc != self.yaml["nc"]: LOGGER.info(f"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}") self.yaml["nc"] = nc # override YAML value self.model, self.save = parse_model(deepcopy(self.yaml), ch=ch, verbose=verbose) # model, savelist self.names = {i: f"{i}" for i in range(self.yaml["nc"])} # default names dict self.inplace = self.yaml.get("inplace", True) self.end2end = getattr(self.model[-1], "end2end", False) # Build strides m = self.model[-1] # Detect() if isinstance(m, Detect): # includes all Detect subclasses like Segment, Pose, OBB, WorldDetect s = 256 # 2x min stride m.inplace = self.inplace def _forward(x): """Performs a forward pass through the model, handling different Detect subclass types accordingly.""" if self.end2end: return self.forward(x)["one2many"] return self.forward(x)[0] if isinstance(m, (Segment, Pose, OBB)) else self.forward(x) m.stride = torch.tensor([s / x.shape[-2] for x in _forward(torch.zeros(1, ch, s, s))]) # forward self.stride = m.stride m.bias_init() # only run once else: self.stride = torch.Tensor([32]) # default stride for i.e. RTDETR # Init weights, biases initialize_weights(self) if verbose: self.info() LOGGER.info("") def _predict_augment(self, x): """Perform augmentations on input image x and return augmented inference and train outputs.""" if getattr(self, "end2end", False) or self.__class__.__name__ != "DetectionModel": LOGGER.warning("WARNING ⚠️ Model does not support 'augment=True', reverting to single-scale prediction.") return self._predict_once(x) img_size = x.shape[-2:] # height, width s = [1, 0.83, 0.67] # scales f = [None, 3, None] # flips (2-ud, 3-lr) y = [] # outputs for si, fi in zip(s, f): xi = scale_img(x.flip(fi) if fi else x, si, gs=int(self.stride.max())) yi = super().predict(xi)[0] # forward yi = self._descale_pred(yi, fi, si, img_size) y.append(yi) y = self._clip_augmented(y) # clip augmented tails return torch.cat(y, -1), None # augmented inference, train @staticmethod def _descale_pred(p, flips, scale, img_size, dim=1): """De-scale predictions following augmented inference (inverse operation).""" p[:, :4] /= scale # de-scale x, y, wh, cls = p.split((1, 1, 2, p.shape[dim] - 4), dim) if flips == 2: y = img_size[0] - y # de-flip ud elif flips == 3: x = img_size[1] - x # de-flip lr return torch.cat((x, y, wh, cls), dim) def _clip_augmented(self, y): """Clip YOLO augmented inference tails.""" nl = self.model[-1].nl # number of detection layers (P3-P5) g = sum(4**x for x in range(nl)) # grid points e = 1 # exclude layer count i = (y[0].shape[-1] // g) * sum(4**x for x in range(e)) # indices y[0] = y[0][..., :-i] # large i = (y[-1].shape[-1] // g) * sum(4 ** (nl - 1 - x) for x in range(e)) # indices y[-1] = y[-1][..., i:] # small return y def init_criterion(self): """Initialize the loss criterion for the DetectionModel.""" return E2EDetectLoss(self) if getattr(self, "end2end", False) else v8DetectionLoss(self) class OBBModel(DetectionModel): """YOLOv8 Oriented Bounding Box (OBB) model.""" def __init__(self, cfg="yolov8n-obb.yaml", ch=3, nc=None, verbose=True): """Initialize YOLOv8 OBB model with given config and parameters.""" super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def init_criterion(self): """Initialize the loss criterion for the model.""" return v8OBBLoss(self) class SegmentationModel(DetectionModel): """YOLOv8 segmentation model.""" def __init__(self, cfg="yolov8n-seg.yaml", ch=3, nc=None, verbose=True): """Initialize YOLOv8 segmentation model with given config and parameters.""" super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def init_criterion(self): """Initialize the loss criterion for the SegmentationModel.""" return v8SegmentationLoss(self) class PoseModel(DetectionModel): """YOLOv8 pose model.""" def __init__(self, cfg="yolov8n-pose.yaml", ch=3, nc=None, data_kpt_shape=(None, None), verbose=True): """Initialize YOLOv8 Pose model.""" if not isinstance(cfg, dict): cfg = yaml_model_load(cfg) # load model YAML if any(data_kpt_shape) and list(data_kpt_shape) != list(cfg["kpt_shape"]): LOGGER.info(f"Overriding model.yaml kpt_shape={cfg['kpt_shape']} with kpt_shape={data_kpt_shape}") cfg["kpt_shape"] = data_kpt_shape super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def init_criterion(self): """Initialize the loss criterion for the PoseModel.""" return v8PoseLoss(self) class ClassificationModel(BaseModel): """YOLOv8 classification model.""" def __init__(self, cfg="yolov8n-cls.yaml", ch=3, nc=None, verbose=True): """Init ClassificationModel with YAML, channels, number of classes, verbose flag.""" super().__init__() self._from_yaml(cfg, ch, nc, verbose) def _from_yaml(self, cfg, ch, nc, verbose): """Set YOLOv8 model configurations and define the model architecture.""" self.yaml = cfg if isinstance(cfg, dict) else yaml_model_load(cfg) # cfg dict # Define model ch = self.yaml["ch"] = self.yaml.get("ch", ch) # input channels if nc and nc != self.yaml["nc"]: LOGGER.info(f"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}") self.yaml["nc"] = nc # override YAML value elif not nc and not self.yaml.get("nc", None): raise ValueError("nc not specified. Must specify nc in model.yaml or function arguments.") self.model, self.save = parse_model(deepcopy(self.yaml), ch=ch, verbose=verbose) # model, savelist self.stride = torch.Tensor([1]) # no stride constraints self.names = {i: f"{i}" for i in range(self.yaml["nc"])} # default names dict self.info() @staticmethod def reshape_outputs(model, nc): """Update a TorchVision classification model to class count 'n' if required.""" name, m = list((model.model if hasattr(model, "model") else model).named_children())[-1] # last module if isinstance(m, Classify): # YOLO Classify() head if m.linear.out_features != nc: m.linear = nn.Linear(m.linear.in_features, nc) elif isinstance(m, nn.Linear): # ResNet, EfficientNet if m.out_features != nc: setattr(model, name, nn.Linear(m.in_features, nc)) elif isinstance(m, nn.Sequential): types = [type(x) for x in m] if nn.Linear in types: i = len(types) - 1 - types[::-1].index(nn.Linear) # last nn.Linear index if m[i].out_features != nc: m[i] = nn.Linear(m[i].in_features, nc) elif nn.Conv2d in types: i = len(types) - 1 - types[::-1].index(nn.Conv2d) # last nn.Conv2d index if m[i].out_channels != nc: m[i] = nn.Conv2d(m[i].in_channels, nc, m[i].kernel_size, m[i].stride, bias=m[i].bias is not None) def init_criterion(self): """Initialize the loss criterion for the ClassificationModel.""" return v8ClassificationLoss() class RTDETRDetectionModel(DetectionModel): """ RTDETR (Real-time DEtection and Tracking using Transformers) Detection Model class. This class is responsible for constructing the RTDETR architecture, defining loss functions, and facilitating both the training and inference processes. RTDETR is an object detection and tracking model that extends from the DetectionModel base class. Attributes: cfg (str): The configuration file path or preset string. Default is 'rtdetr-l.yaml'. ch (int): Number of input channels. Default is 3 (RGB). nc (int, optional): Number of classes for object detection. Default is None. verbose (bool): Specifies if summary statistics are shown during initialization. Default is True. Methods: init_criterion: Initializes the criterion used for loss calculation. loss: Computes and returns the loss during training. predict: Performs a forward pass through the network and returns the output. """ def __init__(self, cfg="rtdetr-l.yaml", ch=3, nc=None, verbose=True): """ Initialize the RTDETRDetectionModel. Args: cfg (str): Configuration file name or path. ch (int): Number of input channels. nc (int, optional): Number of classes. Defaults to None. verbose (bool, optional): Print additional information during initialization. Defaults to True. """ super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def init_criterion(self): """Initialize the loss criterion for the RTDETRDetectionModel.""" from ultralytics.models.utils.loss import RTDETRDetectionLoss return RTDETRDetectionLoss(nc=self.nc, use_vfl=True) def loss(self, batch, preds=None): """ Compute the loss for the given batch of data. Args: batch (dict): Dictionary containing image and label data. preds (torch.Tensor, optional): Precomputed model predictions. Defaults to None. Returns: (tuple): A tuple containing the total loss and main three losses in a tensor. """ if not hasattr(self, "criterion"): self.criterion = self.init_criterion() img = batch["img"] # NOTE: preprocess gt_bbox and gt_labels to list. bs = len(img) batch_idx = batch["batch_idx"] gt_groups = [(batch_idx == i).sum().item() for i in range(bs)] targets = { "cls": batch["cls"].to(img.device, dtype=torch.long).view(-1), "bboxes": batch["bboxes"].to(device=img.device), "batch_idx": batch_idx.to(img.device, dtype=torch.long).view(-1), "gt_groups": gt_groups, } preds = self.predict(img, batch=targets) if preds is None else preds dec_bboxes, dec_scores, enc_bboxes, enc_scores, dn_meta = preds if self.training else preds[1] if dn_meta is None: dn_bboxes, dn_scores = None, None else: dn_bboxes, dec_bboxes = torch.split(dec_bboxes, dn_meta["dn_num_split"], dim=2) dn_scores, dec_scores = torch.split(dec_scores, dn_meta["dn_num_split"], dim=2) dec_bboxes = torch.cat([enc_bboxes.unsqueeze(0), dec_bboxes]) # (7, bs, 300, 4) dec_scores = torch.cat([enc_scores.unsqueeze(0), dec_scores]) loss = self.criterion( (dec_bboxes, dec_scores), targets, dn_bboxes=dn_bboxes, dn_scores=dn_scores, dn_meta=dn_meta ) # NOTE: There are like 12 losses in RTDETR, backward with all losses but only show the main three losses. return sum(loss.values()), torch.as_tensor( [loss[k].detach() for k in ["loss_giou", "loss_class", "loss_bbox"]], device=img.device ) def predict(self, x, profile=False, visualize=False, batch=None, augment=False, embed=None): """ Perform a forward pass through the model. Args: x (torch.Tensor): The input tensor. profile (bool, optional): If True, profile the computation time for each layer. Defaults to False. visualize (bool, optional): If True, save feature maps for visualization. Defaults to False. batch (dict, optional): Ground truth data for evaluation. Defaults to None. augment (bool, optional): If True, perform data augmentation during inference. Defaults to False. embed (list, optional): A list of feature vectors/embeddings to return. Returns: (torch.Tensor): Model's output tensor. """ y, dt, embeddings = [], [], [] # outputs for m in self.model[:-1]: # except the head part if m.f != -1: # if not from previous layer x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers if profile: self._profile_one_layer(m, x, dt) x = m(x) # run y.append(x if m.i in self.save else None) # save output if visualize: feature_visualization(x, m.type, m.i, save_dir=visualize) if embed and m.i in embed: embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1)) # flatten if m.i == max(embed): return torch.unbind(torch.cat(embeddings, 1), dim=0) head = self.model[-1] x = head([y[j] for j in head.f], batch) # head inference return x class WorldModel(DetectionModel): """YOLOv8 World Model.""" def __init__(self, cfg="yolov8s-world.yaml", ch=3, nc=None, verbose=True): """Initialize YOLOv8 world model with given config and parameters.""" self.txt_feats = torch.randn(1, nc or 80, 512) # features placeholder self.clip_model = None # CLIP model placeholder super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def set_classes(self, text, batch=80, cache_clip_model=True): """Set classes in advance so that model could do offline-inference without clip model.""" try: import clip except ImportError: check_requirements("git+https://github.com/ultralytics/CLIP.git") import clip if ( not getattr(self, "clip_model", None) and cache_clip_model ): # for backwards compatibility of models lacking clip_model attribute self.clip_model = clip.load("ViT-B/32")[0] model = self.clip_model if cache_clip_model else clip.load("ViT-B/32")[0] device = next(model.parameters()).device text_token = clip.tokenize(text).to(device) txt_feats = [model.encode_text(token).detach() for token in text_token.split(batch)] txt_feats = txt_feats[0] if len(txt_feats) == 1 else torch.cat(txt_feats, dim=0) txt_feats = txt_feats / txt_feats.norm(p=2, dim=-1, keepdim=True) self.txt_feats = txt_feats.reshape(-1, len(text), txt_feats.shape[-1]) self.model[-1].nc = len(text) def predict(self, x, profile=False, visualize=False, txt_feats=None, augment=False, embed=None): """ Perform a forward pass through the model. Args: x (torch.Tensor): The input tensor. profile (bool, optional): If True, profile the computation time for each layer. Defaults to False. visualize (bool, optional): If True, save feature maps for visualization. Defaults to False. txt_feats (torch.Tensor): The text features, use it if it's given. Defaults to None. augment (bool, optional): If True, perform data augmentation during inference. Defaults to False. embed (list, optional): A list of feature vectors/embeddings to return. Returns: (torch.Tensor): Model's output tensor. """ txt_feats = (self.txt_feats if txt_feats is None else txt_feats).to(device=x.device, dtype=x.dtype) if len(txt_feats) != len(x): txt_feats = txt_feats.repeat(len(x), 1, 1) ori_txt_feats = txt_feats.clone() y, dt, embeddings = [], [], [] # outputs for m in self.model: # except the head part if m.f != -1: # if not from previous layer x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers if profile: self._profile_one_layer(m, x, dt) if isinstance(m, C2fAttn): x = m(x, txt_feats) elif isinstance(m, WorldDetect): x = m(x, ori_txt_feats) elif isinstance(m, ImagePoolingAttn): txt_feats = m(x, txt_feats) else: x = m(x) # run y.append(x if m.i in self.save else None) # save output if visualize: feature_visualization(x, m.type, m.i, save_dir=visualize) if embed and m.i in embed: embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1)) # flatten if m.i == max(embed): return torch.unbind(torch.cat(embeddings, 1), dim=0) return x def loss(self, batch, preds=None): """ Compute loss. Args: batch (dict): Batch to compute loss on. preds (torch.Tensor | List[torch.Tensor]): Predictions. """ if not hasattr(self, "criterion"): self.criterion = self.init_criterion() if preds is None: preds = self.forward(batch["img"], txt_feats=batch["txt_feats"]) return self.criterion(preds, batch) class Ensemble(nn.ModuleList): """Ensemble of models.""" def __init__(self): """Initialize an ensemble of models.""" super().__init__() def forward(self, x, augment=False, profile=False, visualize=False): """Function generates the YOLO network's final layer.""" y = [module(x, augment, profile, visualize)[0] for module in self] # y = torch.stack(y).max(0)[0] # max ensemble # y = torch.stack(y).mean(0) # mean ensemble y = torch.cat(y, 2) # nms ensemble, y shape(B, HW, C) return y, None # inference, train output # Functions ------------------------------------------------------------------------------------------------------------ @contextlib.contextmanager def temporary_modules(modules=None, attributes=None): """ Context manager for temporarily adding or modifying modules in Python's module cache (`sys.modules`). This function can be used to change the module paths during runtime. It's useful when refactoring code, where you've moved a module from one location to another, but you still want to support the old import paths for backwards compatibility. Args: modules (dict, optional): A dictionary mapping old module paths to new module paths. attributes (dict, optional): A dictionary mapping old module attributes to new module attributes. Example: ```python with temporary_modules({"old.module": "new.module"}, {"old.module.attribute": "new.module.attribute"}): import old.module # this will now import new.module from old.module import attribute # this will now import new.module.attribute ``` Note: The changes are only in effect inside the context manager and are undone once the context manager exits. Be aware that directly manipulating `sys.modules` can lead to unpredictable results, especially in larger applications or libraries. Use this function with caution. """ if modules is None: modules = {} if attributes is None: attributes = {} import sys from importlib import import_module try: # Set attributes in sys.modules under their old name for old, new in attributes.items(): old_module, old_attr = old.rsplit(".", 1) new_module, new_attr = new.rsplit(".", 1) setattr(import_module(old_module), old_attr, getattr(import_module(new_module), new_attr)) # Set modules in sys.modules under their old name for old, new in modules.items(): sys.modules[old] = import_module(new) yield finally: # Remove the temporary module paths for old in modules: if old in sys.modules: del sys.modules[old] class SafeClass: """A placeholder class to replace unknown classes during unpickling.""" def __init__(self, *args, **kwargs): """Initialize SafeClass instance, ignoring all arguments.""" pass def __call__(self, *args, **kwargs): """Run SafeClass instance, ignoring all arguments.""" pass class SafeUnpickler(pickle.Unpickler): """Custom Unpickler that replaces unknown classes with SafeClass.""" def find_class(self, module, name): """Attempt to find a class, returning SafeClass if not among safe modules.""" safe_modules = ( "torch", "collections", "collections.abc", "builtins", "math", "numpy", # Add other modules considered safe ) if module in safe_modules: return super().find_class(module, name) else: return SafeClass def torch_safe_load(weight, safe_only=False): """ Attempts to load a PyTorch model with the torch.load() function. If a ModuleNotFoundError is raised, it catches the error, logs a warning message, and attempts to install the missing module via the check_requirements() function. After installation, the function again attempts to load the model using torch.load(). Args: weight (str): The file path of the PyTorch model. safe_only (bool): If True, replace unknown classes with SafeClass during loading. Example: ```python from ultralytics.nn.tasks import torch_safe_load ckpt, file = torch_safe_load("path/to/best.pt", safe_only=True) ``` Returns: ckpt (dict): The loaded model checkpoint. file (str): The loaded filename """ from ultralytics.utils.downloads import attempt_download_asset check_suffix(file=weight, suffix=".pt") file = attempt_download_asset(weight) # search online if missing locally try: with temporary_modules( modules={ "ultralytics.yolo.utils": "ultralytics.utils", "ultralytics.yolo.v8": "ultralytics.models.yolo", "ultralytics.yolo.data": "ultralytics.data", }, attributes={ "ultralytics.nn.modules.block.Silence": "torch.nn.Identity", # YOLOv9e "ultralytics.nn.tasks.YOLOv10DetectionModel": "ultralytics.nn.tasks.DetectionModel", # YOLOv10 "ultralytics.utils.loss.v10DetectLoss": "ultralytics.utils.loss.E2EDetectLoss", # YOLOv10 }, ): if safe_only: # Load via custom pickle module safe_pickle = types.ModuleType("safe_pickle") safe_pickle.Unpickler = SafeUnpickler safe_pickle.load = lambda file_obj: SafeUnpickler(file_obj).load() with open(file, "rb") as f: ckpt = torch.load(f, pickle_module=safe_pickle) else: ckpt = torch.load(file, map_location="cpu") except ModuleNotFoundError as e: # e.name is missing module name if e.name == "models": raise TypeError( emojis( f"ERROR ❌️ {weight} appears to be an Ultralytics YOLOv5 model originally trained " f"with https://github.com/ultralytics/yolov5.\nThis model is NOT forwards compatible with " f"YOLOv8 at https://github.com/ultralytics/ultralytics." f"\nRecommend fixes are to train a new model using the latest 'ultralytics' package or to " f"run a command with an official Ultralytics model, i.e. 'yolo predict model=yolov8n.pt'" ) ) from e LOGGER.warning( f"WARNING ⚠️ {weight} appears to require '{e.name}', which is not in Ultralytics requirements." f"\nAutoInstall will run now for '{e.name}' but this feature will be removed in the future." f"\nRecommend fixes are to train a new model using the latest 'ultralytics' package or to " f"run a command with an official Ultralytics model, i.e. 'yolo predict model=yolov8n.pt'" ) check_requirements(e.name) # install missing module ckpt = torch.load(file, map_location="cpu") if not isinstance(ckpt, dict): # File is likely a YOLO instance saved with i.e. torch.save(model, "saved_model.pt") LOGGER.warning( f"WARNING ⚠️ The file '{weight}' appears to be improperly saved or formatted. " f"For optimal results, use model.save('filename.pt') to correctly save YOLO models." ) ckpt = {"model": ckpt.model} return ckpt, file def attempt_load_weights(weights, device=None, inplace=True, fuse=False): """Loads an ensemble of models weights=[a,b,c] or a single model weights=[a] or weights=a.""" ensemble = Ensemble() for w in weights if isinstance(weights, list) else [weights]: ckpt, w = torch_safe_load(w) # load ckpt args = {**DEFAULT_CFG_DICT, **ckpt["train_args"]} if "train_args" in ckpt else None # combined args model = (ckpt.get("ema") or ckpt["model"]).to(device).float() # FP32 model # Model compatibility updates model.args = args # attach args to model model.pt_path = w # attach *.pt file path to model model.task = guess_model_task(model) if not hasattr(model, "stride"): model.stride = torch.tensor([32.0]) # Append ensemble.append(model.fuse().eval() if fuse and hasattr(model, "fuse") else model.eval()) # model in eval mode # Module updates for m in ensemble.modules(): if hasattr(m, "inplace"): m.inplace = inplace elif isinstance(m, nn.Upsample) and not hasattr(m, "recompute_scale_factor"): m.recompute_scale_factor = None # torch 1.11.0 compatibility # Return model if len(ensemble) == 1: return ensemble[-1] # Return ensemble LOGGER.info(f"Ensemble created with {weights}\n") for k in "names", "nc", "yaml": setattr(ensemble, k, getattr(ensemble[0], k)) ensemble.stride = ensemble[int(torch.argmax(torch.tensor([m.stride.max() for m in ensemble])))].stride assert all(ensemble[0].nc == m.nc for m in ensemble), f"Models differ in class counts {[m.nc for m in ensemble]}" return ensemble def attempt_load_one_weight(weight, device=None, inplace=True, fuse=False): """Loads a single model weights.""" ckpt, weight = torch_safe_load(weight) # load ckpt args = {**DEFAULT_CFG_DICT, **(ckpt.get("train_args", {}))} # combine model and default args, preferring model args model = (ckpt.get("ema") or ckpt["model"]).to(device).float() # FP32 model # Model compatibility updates model.args = {k: v for k, v in args.items() if k in DEFAULT_CFG_KEYS} # attach args to model model.pt_path = weight # attach *.pt file path to model model.task = guess_model_task(model) if not hasattr(model, "stride"): model.stride = torch.tensor([32.0]) model = model.fuse().eval() if fuse and hasattr(model, "fuse") else model.eval() # model in eval mode # Module updates for m in model.modules(): if hasattr(m, "inplace"): m.inplace = inplace elif isinstance(m, nn.Upsample) and not hasattr(m, "recompute_scale_factor"): m.recompute_scale_factor = None # torch 1.11.0 compatibility # Return model and ckpt return model, ckpt def parse_model(d, ch, verbose=True): # model_dict, input_channels(3) """Parse a YOLO model.yaml dictionary into a PyTorch model.""" import ast # Args legacy = True # backward compatibility for v3/v5/v8/v9 models max_channels = float("inf") nc, act, scales = (d.get(x) for x in ("nc", "activation", "scales")) depth, width, kpt_shape = (d.get(x, 1.0) for x in ("depth_multiple", "width_multiple", "kpt_shape")) if scales: scale = d.get("scale") if not scale: scale = tuple(scales.keys())[0] LOGGER.warning(f"WARNING ⚠️ no model scale passed. Assuming scale='{scale}'.") depth, width, max_channels = scales[scale] if act: Conv.default_act = eval(act) # redefine default activation, i.e. Conv.default_act = nn.SiLU() if verbose: LOGGER.info(f"{colorstr('activation:')} {act}") # print if verbose: LOGGER.info(f"\n{'':>3}{'from':>20}{'n':>3}{'params':>10} {'module':<45}{'arguments':<30}") ch = [ch] layers, save, c2 = [], [], ch[-1] # layers, savelist, ch out backbone = False for i, (f, n, m, args) in enumerate(d["backbone"] + d["head"]): # from, number, module, args t=m m = getattr(torch.nn, m[3:]) if "nn." in m else globals()[m] # get module for j, a in enumerate(args): if isinstance(a, str): with contextlib.suppress(ValueError): args[j] = locals()[a] if a in locals() else ast.literal_eval(a) n = n_ = max(round(n * depth), 1) if n > 1 else n # depth gain if m in { Classify, Conv, ConvTranspose, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, C2fPSA, C2PSA, DWConv, Focus, BottleneckCSP, C1, C2, C2f, C3k2, RepNCSPELAN4, ELAN1, ADown, AConv, SPPELAN, C2fAttn, C3, C3TR, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x, RepC3, PSA, SCDown, C2fCIB, A2C2f, }: c1, c2 = ch[f], args[0] if c2 != nc: # if c2 not equal to number of classes (i.e. for Classify() output) c2 = make_divisible(min(c2, max_channels) * width, 8) if m is C2fAttn: args[1] = make_divisible(min(args[1], max_channels // 2) * width, 8) # embed channels args[2] = int( max(round(min(args[2], max_channels // 2 // 32)) * width, 1) if args[2] > 1 else args[2] ) # num heads args = [c1, c2, *args[1:]] if m in { BottleneckCSP, C1, C2, C2f, C3k2, C2fAttn, C3, C3TR, C3Ghost, C3x, RepC3, C2fPSA, C2fCIB, C2PSA, A2C2f, }: args.insert(2, n) # number of repeats n = 1 if m is C3k2: # for M/L/X sizes legacy = False if scale in "mlx": args[3] = True if m is A2C2f: legacy = False if scale in "lx": # for L/X sizes args.append(True) args.append(1.2) elif m is AIFI: args = [ch[f], *args] elif m in {HGStem, HGBlock}: c1, cm, c2 = ch[f], args[0], args[1] args = [c1, cm, c2, *args[2:]] if m is HGBlock: args.insert(4, n) # number of repeats n = 1 elif m is ResNetLayer: c2 = args[1] if args[3] else args[1] * 4 elif m is nn.BatchNorm2d: args = [ch[f]] #LSKNet elif m in {LSKNET_T,LSKNET_S}: m = m(*args) c2 = m.width_list backbone =True elif m is Concat: c2 = sum(ch[x] for x in f) elif m in {Detect, WorldDetect, Segment, Pose, OBB, ImagePoolingAttn, v10Detect}: args.append([ch[x] for x in f]) if m is Segment: args[2] = make_divisible(min(args[2], max_channels) * width, 8) if m in {Detect, Segment, Pose, OBB}: m.legacy = legacy elif m is RTDETRDecoder: # special case, channels arg must be passed in index 1 args.insert(1, [ch[x] for x in f]) elif m in {CBLinear, TorchVision, Index}: c2 = args[0] c1 = ch[f] args = [c1, c2, *args[1:]] elif m is CBFuse: c2 = ch[f[-1]] else: c2 = ch[f] # m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args) # module # t = str(m)[8:-2].replace("__main__.", "") # module type # m_.np = sum(x.numel() for x in m_.parameters()) # number params # m_.i, m_.f, m_.type = i, f, t # attach index, 'from' index, type # if verbose: # LOGGER.info(f"{i:>3}{str(f):>20}{n_:>3}{m_.np:10.0f} {t:<45}{str(args):<30}") # print # save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1) # append to savelist # layers.append(m_) # if i == 0: # ch = [] # ch.append(c2) if isinstance(c2, list): backbone = True m_ = m m_.backbone = True else: m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args) # module t = str(m)[8:-2].replace('__main__.', '') # module type m.np = sum(x.numel() for x in m_.parameters()) # number params m_.i, m_.f, m_.type = i + 4 if backbone else i, f, t # attach index, 'from' index, type if verbose: LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f} {t:<45}{str(args):<30}') # print save.extend(x % (i + 4 if backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1) # append to savelist layers.append(m_) if i == 0: ch = [] if isinstance(c2, list): ch.extend(c2) for _ in range(5 - len(ch)): ch.insert(0, 0) else: ch.append(c2) return nn.Sequential(*layers), sorted(save) def yaml_model_load(path): """Load a YOLOv8 model from a YAML file.""" path = Path(path) if path.stem in (f"yolov{d}{x}6" for x in "nsmlx" for d in (5, 8)): new_stem = re.sub(r"(\d+)([nslmx])6(.+)?$", r"\1\2-p6\3", path.stem) LOGGER.warning(f"WARNING ⚠️ Ultralytics YOLO P6 models now use -p6 suffix. Renaming {path.stem} to {new_stem}.") path = path.with_name(new_stem + path.suffix) unified_path = re.sub(r"(\d+)([nslmx])(.+)?$", r"\1\3", str(path)) # i.e. yolov8x.yaml -> yolov8.yaml yaml_file = check_yaml(unified_path, hard=False) or check_yaml(path) d = yaml_load(yaml_file) # model dict d["scale"] = guess_model_scale(path) d["yaml_file"] = str(path) return d def guess_model_scale(model_path): """ Takes a path to a YOLO model's YAML file as input and extracts the size character of the model's scale. The function uses regular expression matching to find the pattern of the model scale in the YAML file name, which is denoted by n, s, m, l, or x. The function returns the size character of the model scale as a string. Args: model_path (str | Path): The path to the YOLO model's YAML file. Returns: (str): The size character of the model's scale, which can be n, s, m, l, or x. """ try: return re.search(r"yolo[v]?\d+([nslmx])", Path(model_path).stem).group(1) # noqa, returns n, s, m, l, or x except AttributeError: return "" def guess_model_task(model): """ Guess the task of a PyTorch model from its architecture or configuration. Args: model (nn.Module | dict): PyTorch model or model configuration in YAML format. Returns: (str): Task of the model ('detect', 'segment', 'classify', 'pose'). Raises: SyntaxError: If the task of the model could not be determined. """ def cfg2task(cfg): """Guess from YAML dictionary.""" m = cfg["head"][-1][-2].lower() # output module name if m in {"classify", "classifier", "cls", "fc"}: return "classify" if "detect" in m: return "detect" if m == "segment": return "segment" if m == "pose": return "pose" if m == "obb": return "obb" # Guess from model cfg if isinstance(model, dict): with contextlib.suppress(Exception): return cfg2task(model) # Guess from PyTorch model if isinstance(model, nn.Module): # PyTorch model for x in "model.args", "model.model.args", "model.model.model.args": with contextlib.suppress(Exception): return eval(x)["task"] for x in "model.yaml", "model.model.yaml", "model.model.model.yaml": with contextlib.suppress(Exception): return cfg2task(eval(x)) for m in model.modules(): if isinstance(m, Segment): return "segment" elif isinstance(m, Classify): return "classify" elif isinstance(m, Pose): return "pose" elif isinstance(m, OBB): return "obb" elif isinstance(m, (Detect, WorldDetect, v10Detect)): return "detect" # Guess from model filename if isinstance(model, (str, Path)): model = Path(model) if "-seg" in model.stem or "segment" in model.parts: return "segment" elif "-cls" in model.stem or "classify" in model.parts: return "classify" elif "-pose" in model.stem or "pose" in model.parts: return "pose" elif "-obb" in model.stem or "obb" in model.parts: return "obb" elif "detect" in model.parts: return "detect" # Unable to determine task from model LOGGER.warning( "WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. " "Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify','pose' or 'obb'." ) return "detect" # assume detect __init__.py from .LSKNet import * yolov12-LSKNet.yaml # YOLOv12 🚀, AGPL-3.0 license # YOLOv12 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect # Parameters nc: 1 # number of classes scales: # model compound scaling constants, i.e. 'model=yolov12n.yaml' will call yolov12.yaml with scale 'n' # [depth, width, max_channels] n: [0.50, 0.25, 1024] # summary: 465 layers, 2,603,056 parameters, 2,603,040 gradients, 6.7 GFLOPs s: [0.50, 0.50, 1024] # summary: 465 layers, 9,285,632 parameters, 9,285,616 gradients, 21.7 GFLOPs m: [0.50, 1.00, 512] # summary: 501 layers, 20,201,216 parameters, 20,201,200 gradients, 68.1 GFLOPs l: [1.00, 1.00, 512] # summary: 831 layers, 26,454,880 parameters, 26,454,864 gradients, 89.7 GFLOPs x: [1.00, 1.50, 512] # summary: 831 layers, 59,216,928 parameters, 59,216,912 gradients, 200.3 GFLOPs # YOLO12n backbone backbone: # [from, repeats, module, args] - [-1, 1, LSKNET_T, []] # YOLO12n head head: - [-1, 1, nn.Upsample, [None, 2, "nearest"]] - [[-1, 3], 1, Concat, [1]] # cat backbone P4 - [-1, 2, A2C2f, [512, False, -1]] # 8 - [-1, 1, nn.Upsample, [None, 2, "nearest"]] - [[-1, 2], 1, Concat, [1]] # cat backbone P3 - [-1, 2, A2C2f, [256, False, -1]] # 11 - [-1, 1, Conv, [256, 3, 2]] - [[-1, 7], 1, Concat, [1]] # cat head P4 - [-1, 2, A2C2f, [512, False, -1]] # 14 - [-1, 1, Conv, [512, 3, 2]] - [[-1, 4], 1, Concat, [1]] # cat head P5 - [-1, 2, C3k2, [1024, True]] # 17 (P5/32-large) - [[10, 13, 16], 1, Detect, [nc]] # Detect(P3, P4, P5) 下面是我训练时所使用到的代码 train.py from ultralytics import YOLO def main(): model = YOLO("yolov12-LSKNet.yaml") results = model.train( data='./data.yaml', epochs=200, imgsz=640, batch=16, amp=False, device=0, save=True, project='runs/train', name='rice_seedling-200L+NWD', verbose=True, workers=2, cache=True, ) if __name__ == '__main__': from multiprocessing import freeze_support freeze_support() main()
最新发布
09-01
<think>我们正在处理一个TypeError: can't convert float to int错误,这通常发生在尝试将浮点数直接转换为整数时,因为Python不允许隐式地将浮点数转换为整数(尤其是在某些需要整数的上下文中)。在时间戳处理中,我们经常需要将时间戳(通常是浮点数)转换为整数以便进行某些操作(如索引、整数运算等),但必须使用显式转换。 根据用户提供的引用信息,特别是引用[1]和引用[2]涉及时间处理和类型转换,我们可以推测用户可能在处理时间数据时遇到了浮点数转整数的问题。 解决方案步骤: 1. 确认错误发生的位置:检查代码中尝试将浮点数作为整数使用的地方,例如在索引、数学运算中要求整数(如整除、位运算)或函数参数要求整数时。 2. 使用显式转换:在需要整数的地方,使用int()函数将浮点数转换为整数。但注意,这样会截断小数部分(向下取整)。如果需要进行四舍五入,可以使用round()函数后再转换为整数,或者使用math.floor()或math.ceil()。 在时间戳处理中,常见的情况是: - 将时间戳(以秒为单位的浮点数)转换为整数时间戳(毫秒、微秒等)时,需要乘以一个因子然后取整。 - 或者将时间戳转换为整数用于其他计算。 例如,将秒级时间戳转换为毫秒级时间戳(整数): timestamp_ms = int(timestamp_sec * 1000) # 这样会截断小数部分,可能丢失精度 或者使用四舍五入: timestamp_ms = round(timestamp_sec * 1000) 但是,注意四舍五入可能不是在所有场景都适用,比如时间戳通常不需要四舍五入,而是直接截断或向上取整,具体取决于业务需求。 3. 检查数据来源:确保你处理的数据确实是数值类型。有时,数据可能是字符串形式,需要先转换为浮点数再转换为整数。 4. 使用数学模块:对于复杂的取整需求,可以使用math模块中的函数。 5. 异常处理:如果转换可能失败(例如,数据不是数字),可以使用try-except块捕获异常。 示例代码: 假设我们有一个浮点数的时间戳,我们想把它转换为整数(秒级时间戳的整数部分): timestamp_float = 1609459200.1234 timestamp_int = int(timestamp_float) # 得到1609459200 但是,如果我们需要毫秒级,则: timestamp_millis = int(timestamp_float * 1000) # 1609459200123(注意,这里小数部分0.1234乘以1000后是123.4,截断后为123) 如果我们需要四舍五入到最近的整数毫秒: timestamp_millis = round(timestamp_float * 1000) # 四舍五入,然后得到整数(但返回的是浮点数,需要再转换) # 或者 timestamp_millis = int(round(timestamp_float * 1000)) 但是注意,round()函数返回的是整数(如果ndigits参数为0或省略)或浮点数(如果ndigits不为0),所以当ndigits为0时,round返回整数,否则返回浮点数。因此,在需要整数的情况下,可以安全地使用int(round(...))。 另外,在时间处理中,我们经常使用pandas或datetime模块,这些库提供了更丰富的时间处理功能。例如,使用pandas的to_datetime函数可以处理时间戳,并可以指定单位。 根据引用[2]中的例子,我们也可以使用time模块进行转换,但注意time模块处理的是秒级时间戳(整数或浮点数),但它的某些函数要求整数时间戳(如gmtime, localtime等),这时我们可以用int()转换,但这样会丢失小数部分(即秒的小数部分)。所以,如果我们需要保留小数部分,应该使用浮点数,但某些函数不支持。因此,在需要精确到毫秒或微秒时,建议使用datetime模块。 使用datetime的例子: from datetime import datetime # 将浮点数时间戳转换为datetime对象 dt = datetime.fromtimestamp(timestamp_float) # 这里timestamp_float可以是浮点数,小数部分表示毫秒和微秒 然后,如果我们想提取整数部分(例如,只取到秒),可以: dt = dt.replace(microsecond=0) # 去掉微秒部分 或者,如果我们想得到整数时间戳(秒级): timestamp_int = int(dt.timestamp()) # 这样得到整数秒 但是注意,这样转换后,小数部分就丢失了。 总结:在时间戳处理中,遇到float转int的错误,通常是因为在需要整数的位置使用了浮点数。解决方法是在转换前确保使用显式的int()或round(),或者使用datetime等库进行更精确的处理。 根据用户的问题,我们提供以下解决方案步骤: 步骤1:检查错误发生的位置,找到导致错误的变量。 步骤2:确定该变量的类型和值(使用type()和print())。 步骤3:根据需求,决定如何转换(直接截断、四舍五入、向上取整等)。 步骤4:使用适当的函数进行转换。 步骤5:如果数据可能不是数字,添加异常处理。 示例代码片段: ```python try: # 假设timestamp是一个浮点数 timestamp_float = 1609459200.1234 # 需求:转换为整数秒(截断小数) timestamp_int = int(timestamp_float) # 或者,转换为毫秒(四舍五入) timestamp_millis = int(round(timestamp_float * 1000)) # 或者,转换为毫秒(截断) timestamp_millis_trunc = int(timestamp_float * 1000) except TypeError as e: print(f"转换错误: {e}") # 处理错误,例如给一个默认值或记录日志 ``` 另外,如果错误发生在其他上下文(比如在列表索引中使用浮点数),同样需要转换: index = int(round(some_float)) 但是,注意:在索引中,我们通常使用截断(int())而不是四舍五入,因为四舍五入可能会导致索引超出范围(例如,当浮点数恰好为两个整数的中间值时,四舍五入可能取较大的整数,而实际索引最大为长度减一)。 因此,在索引中使用时,通常直接使用int()截断,但前提是浮点数在合理的范围内。如果浮点数可能为负或超出索引范围,还需要进行范围检查。 最后,根据引用[4]中的调试建议,我们可以使用print语句输出变量的类型和值,以便于调试。 例如,在转换前: print(f"timestamp_float的类型: {type(timestamp_float)}, 值: {timestamp_float}") 这样可以帮助我们确认变量确实是浮点数,而不是其他类型(如字符串)。 如果变量是字符串,我们需要先将其转换为浮点数,然后再转换为整数: timestamp_float = float(timestamp_str) 所以,完整的处理流程可能是: ```python # 假设我们有一个可能是字符串或浮点数的时间戳变量 timestamp_str_or_float = "1609459200.1234" # 或者 1609459200.1234 try: # 如果它是字符串,先转成浮点数 if isinstance(timestamp_str_or_float, str): timestamp_float = float(timestamp_str_or_float) else: timestamp_float = timestamp_str_or_float # 现在转换为整数秒(截断) timestamp_int = int(timestamp_float) except ValueError: print("无法将字符串转换为浮点数") except TypeError: print("类型错误,可能是不支持的类型") ``` 这样,我们就可以处理多种输入类型。 因此,针对用户的问题,解决方案的核心是使用显式类型转换并处理可能的异常。 相关问题: 1. 在Python中,如何将浮点数安全地转换为整数? 2. 时间戳处理中,如何避免精度丢失? 3. 除了int()和round(),还有哪些处理浮点数的方法? 4. 在pandas中处理时间戳数据时,如何避免类型转换错误?</think>### Python TypeError: can't convert float to int 时间戳处理错误解决方案 这个错误通常发生在尝试将浮点数直接用于需要整数的操作(如索引、时间戳转换等)时。以下是分步解决方案: #### 1. **明确错误原因** ```python import time timestamp = time.time() # 返回浮点数(如 1678901234.567) int_timestamp = int(timestamp) # 正确:显式转换 wrong_use = timestamp // 1 # 错误://操作要求整数操作数 ``` #### 2. **时间戳处理方案** **方案1:显式类型转换** ```python # 浮点时间戳 → 整数(毫秒级) float_ts = 1678901234.567 int_ms = int(float_ts * 1000) # 乘以1000后转整数 ``` **方案2:使用`math.floor()`向下取整** ```python import math truncated_ts = math.floor(float_ts) # 舍去小数部分 ``` **方案3:四舍五入取整** ```python rounded_ts = round(float_ts) # 四舍五入 ``` #### 3. **时间类型转换最佳实践** ```python # 字符串 → 时间戳(引用[2]方法) from datetime import datetime dt_obj = datetime.strptime("2023-01-01 12:30:45", "%Y-%m-%d %H:%M:%S") int_timestamp = int(dt_obj.timestamp()) # 显式转整数 ``` #### 4. **调试技巧(引用[4])** ```python def process_timestamp(ts): print(f"[DEBUG] 输入类型: {type(ts)}, 值: {ts}") # 类型检查 try: return int(ts) # 显式转换 except TypeError: print("错误:必须显式转换浮点数为整数") return int(float(ts)) # 双重保险 print(process_timestamp(1678901234.567)) ``` #### 5. **Pandas数据处理** 当使用DataFrame时(引用[3]): ```python import pandas as pd df = pd.DataFrame({"timestamp": [1678901234.567, 1678905678.901]}) # 正确:创建整数时间戳列 df["int_ts"] = df["timestamp"].astype(int) # 显式转换 ``` > **关键点**:Python不会自动将float转为int,必须显式使用`int()`或取整函数。时间戳处理中需注意精度要求(秒/毫秒/微秒)。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值