Filter mapping specifies an unknown filter name CharaterEncodingFilter

博客提及在 web.xml 中,某字段需放在另一字段前面,强调了 web.xml 里字段顺序的重要性,属于前端开发中配置文件相关内容。

在web.xml中
字段要放在字段前面

profile M和profile G在以下这个函数上的区别GetProfiles Description: Retrieve the profile with the specified token or all defined media profiles. If no Type is provided the returned profiles shall contain no configuration information. If a single Type with value 'All' is provided the returned profiles shall include all associated configurations. Otherwise the requested list of configurations shall for each profile include the configurations present as Type. SOAP action: http://www.onvif.org/ver20/media/wsdl/GetProfiles Input: [GetProfiles] Token - optional; [ReferenceToken] Optional token of the requested profile. Type - optional, unbounded; [string] The types shall be provided as defined by tr2:ConfigurationEnumeration. Output: [GetProfilesResponse] Profiles - optional, unbounded; [MediaProfile] Lists all profiles that exist in the media service. The response provides the requested types of Configurations as far as available. If a profile doesn't contain a type no error shall be provided. token - required; [ReferenceToken] Unique identifier of the profile. fixed [boolean] A value of true signals that the profile cannot be deleted. Default is false. Name [Name] User readable name of the profile. Configurations - optional; [ConfigurationSet] The configurations assigned to the profile. VideoSource - optional; [VideoSourceConfiguration] Optional configuration of the Video input. token - required; [ReferenceToken] Token that uniquely references this configuration. Length up to 64 characters. Name [Name] User readable name. Length up to 64 characters. UseCount [int] Number of internal references currently using this configuration. This informational parameter is read-only. Deprecated for Media2 Service. ViewMode [string] Readonly parameter signalling Source configuration's view mode, for devices supporting different view modes as defined in tt:viewModes. SourceToken [ReferenceToken] Reference to the physical input. Bounds [IntRectangle] Rectangle specifying the Video capturing area. The capturing area shall not be larger than the whole Video source area. x - required; [int] y - required; [int] width - required; [int] height - required; [int] Extension - optional; [VideoSourceConfigurationExtension] Rotate - optional; [Rotate] Optional element to configure rotation of captured image. What resolutions a device supports shall be unaffected by the Rotate parameters. If a device is configured with Rotate=AUTO, the device shall take control over the Degree parameter and automatically update it so that a client can query current rotation. The device shall automatically apply the same rotation to its pan/tilt control direction depending on the following condition: if Reverse=AUTO in PTControlDirection or if the device doesn’t support Reverse in PTControlDirection Mode [RotateMode] Parameter to enable/disable Rotation feature. - enum { 'OFF', 'ON', 'AUTO' } Degree - optional; [int] Optional parameter to configure how much degree of clockwise rotation of image for On mode. Omitting this parameter for On mode means 180 degree rotation. Extension - optional; [RotateExtension] Extension - optional; [VideoSourceConfigurationExtension2] LensDescription - optional, unbounded; [LensDescription] Optional element describing the geometric lens distortion. Multiple instances for future variable lens support. FocalLength [float] Optional focal length of the optical system. Offset [LensOffset] Offset of the lens center to the imager center in normalized coordinates. x [float] Optional horizontal offset of the lens center in normalized coordinates. y [float] Optional vertical offset of the lens center in normalized coordinates. Projection - unbounded; [LensProjection] Radial description of the projection characteristics. The resulting curve is defined by the B-Spline interpolation over the given elements. The element for Radius zero shall not be provided. The projection points shall be ordered with ascending Radius. Items outside the last projection Radius shall be assumed to be invisible (black). Angle [float] Angle of incidence. Radius [float] Mapping radius as a consequence of the emergent angle. Transmittance - optional; [float] Optional ray absorption at the given angle due to vignetting. A value of one means no absorption. XFactor [float] Compensation of the x coordinate needed for the ONVIF normalized coordinate system. SceneOrientation - optional; [SceneOrientation] Optional element describing the scene orientation in the camera’s field of view. Mode [SceneOrientationMode] Parameter to assign the way the camera determines the scene orientation. - enum { 'MANUAL', 'AUTO' } Orientation - optional; [string] Assigned or determined scene orientation based on the Mode. When assigning the Mode to AUTO, this field is optional and will be ignored by the device. When assigning the Mode to MANUAL, this field is required and the device will return an InvalidArgs fault if missing. AudioSource - optional; [AudioSourceConfiguration] Optional configuration of the Audio input. token - required; [ReferenceToken] Token that uniquely references this configuration. Length up to 64 characters. Name [Name] User readable name. Length up to 64 characters. UseCount [int] Number of internal references currently using this configuration. This informational parameter is read-only. Deprecated for Media2 Service. SourceToken [ReferenceToken] Token of the Audio Source the configuration applies to VideoEncoder - optional; [VideoEncoder2Configuration] Optional configuration of the Video encoder. token - required; [ReferenceToken] Token that uniquely references this configuration. Length up to 64 characters. Name [Name] User readable name. Length up to 64 characters. UseCount [int] Number of internal references currently using this configuration. This informational parameter is read-only. Deprecated for Media2 Service. GovLength [int] Group of Video frames length. Determines typically the interval in which the I-Frames will be coded. An entry of 1 indicates I-Frames are continuously generated. An entry of 2 indicates that every 2nd image is an I-Frame, and 3 only every 3rd frame, etc. The frames in between are coded as P or B Frames. AnchorFrameDistance [int] Distance between anchor frames of type I-Frame and P-Frame. '1' indicates no B-Frames, '2' indicates that every 2nd frame is encoded as B-Frame, '3' indicates a structure like IBBPBBP..., etc. Profile [string] The encoder profile as defined in tt:VideoEncodingProfiles. GuaranteedFrameRate [boolean] A value of true indicates that frame rate is a fixed value rather than an upper limit, and that the video encoder shall prioritize frame rate over all other adaptable configuration values such as bitrate. Default is false. Signed [boolean] Indicates if this stream will be signed according to the Media Signing Specification. Encoding [string] Video Media Subtype for the video format. For definitions see tt:VideoEncodingMimeNames and IANA Media Types. Resolution [VideoResolution2] Configured video resolution Width [int] Number of the columns of the Video image. If there is a 90-degree rotation, this represents the number of lines of the Video image. Height [int] Number of the lines of the Video image. If there is a 90-degree rotation, this represents the number of columns of the Video image. RateControl - optional; [VideoRateControl2] Optional element to configure rate control related parameters. ConstantBitRate [boolean] Enforce constant bitrate. FrameRateLimit [float] Desired frame rate in fps. The actual rate may be lower due to e.g. performance limitations. BitrateLimit [int] the maximum output bitrate in kbps Multicast - optional; [MulticastConfiguration] Defines the multicast settings that could be used for video streaming. Address [IPAddress] The multicast address (if this address is set to 0 no multicast streaming is enaled) Type [IPType] Indicates if the address is an IPv4 or IPv6 address. - enum { 'IPv4', 'IPv6' } IPv4Address - optional; [IPv4Address] IPv4 address. IPv6Address - optional; [IPv6Address] IPv6 address Port [int] The RTP mutlicast destination port. A device may support RTCP. In this case the port value shall be even to allow the corresponding RTCP stream to be mapped to the next higher (odd) destination port number as defined in the RTSP specification. TTL [int] In case of IPv6 the TTL value is assumed as the hop limit. Note that for IPV6 and administratively scoped IPv4 multicast the primary use for hop limit / TTL is to prevent packets from (endlessly) circulating and not limiting scope. In these cases the address contains the scope. AutoStart [boolean] Read only property signalling that streaming is persistant. Use the methods StartMulticastStreaming and StopMulticastStreaming to switch its state. Quality [float] Relative value for the video quantizers and the quality of the video. A high value within supported quality range means higher quality AudioEncoder - optional; [AudioEncoder2Configuration] Optional configuration of the Audio encoder. token - required; [ReferenceToken] Token that uniquely references this configuration. Length up to 64 characters. Name [Name] User readable name. Length up to 64 characters. UseCount [int] Number of internal references currently using this configuration. This informational parameter is read-only. Deprecated for Media2 Service. Encoding [string] Audio Media Subtype for the audio format. For definitions see tt:AudioEncodingMimeNames and IANA Media Types. Multicast - optional; [MulticastConfiguration] Optional multicast configuration of the audio stream. Address [IPAddress] The multicast address (if this address is set to 0 no multicast streaming is enaled) Type [IPType] Indicates if the address is an IPv4 or IPv6 address. - enum { 'IPv4', 'IPv6' } IPv4Address - optional; [IPv4Address] IPv4 address. IPv6Address - optional; [IPv6Address] IPv6 address Port [int] The RTP mutlicast destination port. A device may support RTCP. In this case the port value shall be even to allow the corresponding RTCP stream to be mapped to the next higher (odd) destination port number as defined in the RTSP specification. TTL [int] In case of IPv6 the TTL value is assumed as the hop limit. Note that for IPV6 and administratively scoped IPv4 multicast the primary use for hop limit / TTL is to prevent packets from (endlessly) circulating and not limiting scope. In these cases the address contains the scope. AutoStart [boolean] Read only property signalling that streaming is persistant. Use the methods StartMulticastStreaming and StopMulticastStreaming to switch its state. Bitrate [int] The output bitrate in kbps. SampleRate [int] The output sample rate in kHz. Analytics - optional; [VideoAnalyticsConfiguration] Optional configuration of the analytics module and rule engine. token - required; [ReferenceToken] Token that uniquely references this configuration. Length up to 64 characters. Name [Name] User readable name. Length up to 64 characters. UseCount [int] Number of internal references currently using this configuration. This informational parameter is read-only. Deprecated for Media2 Service. AnalyticsEngineConfiguration [AnalyticsEngineConfiguration] AnalyticsModule - optional, unbounded; [Config] Name - required; [string] Name of the configuration. Type - required; [QName] The Type attribute specifies the type of rule and shall be equal to value of one of Name attributes of ConfigDescription elements returned by GetSupportedRules and GetSupportedAnalyticsModules command. Parameters [ItemList] List of configuration parameters as defined in the corresponding description. SimpleItem - optional, unbounded; Value name pair as defined by the corresponding description. Name - required; [string] Item name. Value - required; [anySimpleType] Item value. The type is defined in the corresponding description. ElementItem - optional, unbounded; Complex value structure. Name - required; [string] Item name. Extension - optional; [ItemListExtension] Extension - optional; [AnalyticsEngineConfigurationExtension] RuleEngineConfiguration [RuleEngineConfiguration] Rule - optional, unbounded; [Config] Name - required; [string] Name of the configuration. Type - required; [QName] The Type attribute specifies the type of rule and shall be equal to value of one of Name attributes of ConfigDescription elements returned by GetSupportedRules and GetSupportedAnalyticsModules command. Parameters [ItemList] List of configuration parameters as defined in the corresponding description. SimpleItem - optional, unbounded; Value name pair as defined by the corresponding description. Name - required; [string] Item name. Value - required; [anySimpleType] Item value. The type is defined in the corresponding description. ElementItem - optional, unbounded; Complex value structure. Name - required; [string] Item name. Extension - optional; [ItemListExtension] Extension - optional; [RuleEngineConfigurationExtension] PTZ - optional; [PTZConfiguration] Optional configuration of the pan tilt zoom unit. token - required; [ReferenceToken] Token that uniquely references this configuration. Length up to 64 characters. Name [Name] User readable name. Length up to 64 characters. UseCount [int] Number of internal references currently using this configuration. This informational parameter is read-only. Deprecated for Media2 Service. MoveRamp [int] The optional acceleration ramp used by the device when moving. PresetRamp [int] The optional acceleration ramp used by the device when recalling presets. PresetTourRamp [int] The optional acceleration ramp used by the device when executing PresetTours. NodeToken [ReferenceToken] A mandatory reference to the PTZ Node that the PTZ Configuration belongs to. DefaultAbsolutePantTiltPositionSpace - optional; [anyURI] If the PTZ Node supports absolute Pan/Tilt movements, it shall specify one Absolute Pan/Tilt Position Space as default. DefaultAbsoluteZoomPositionSpace - optional; [anyURI] If the PTZ Node supports absolute zoom movements, it shall specify one Absolute Zoom Position Space as default. DefaultRelativePanTiltTranslationSpace - optional; [anyURI] If the PTZ Node supports relative Pan/Tilt movements, it shall specify one RelativePan/Tilt Translation Space as default. DefaultRelativeZoomTranslationSpace - optional; [anyURI] If the PTZ Node supports relative zoom movements, it shall specify one Relative Zoom Translation Space as default. DefaultContinuousPanTiltVelocitySpace - optional; [anyURI] If the PTZ Node supports continuous Pan/Tilt movements, it shall specify one Continuous Pan/Tilt Velocity Space as default. DefaultContinuousZoomVelocitySpace - optional; [anyURI] If the PTZ Node supports continuous zoom movements, it shall specify one Continuous Zoom Velocity Space as default. DefaultPTZSpeed - optional; [PTZSpeed] If the PTZ Node supports absolute or relative PTZ movements, it shall specify corresponding default Pan/Tilt and Zoom speeds. PanTilt - optional; [Vector2D] Pan and tilt speed. The x component corresponds to pan and the y component to tilt. If omitted in a request, the current (if any) PanTilt movement should not be affected. Zoom - optional; [Vector1D] A zoom speed. If omitted in a request, the current (if any) Zoom movement should not be affected. DefaultPTZTimeout - optional; [duration] If the PTZ Node supports continuous movements, it shall specify a default timeout, after which the movement stops. PanTiltLimits - optional; [PanTiltLimits] The Pan/Tilt limits element should be present for a PTZ Node that supports an absolute Pan/Tilt. If the element is present it signals the support for configurable Pan/Tilt limits. If limits are enabled, the Pan/Tilt movements shall always stay within the specified range. The Pan/Tilt limits are disabled by setting the limits to –INF or +INF. Range [Space2DDescription] A range of pan tilt limits. URI [anyURI] A URI of coordinate systems. XRange [FloatRange] A range of x-axis. Min [float] Max [float] YRange [FloatRange] A range of y-axis. Min [float] Max [float] ZoomLimits - optional; [ZoomLimits] The Zoom limits element should be present for a PTZ Node that supports absolute zoom. If the element is present it signals the supports for configurable Zoom limits. If limits are enabled the zoom movements shall always stay within the specified range. The Zoom limits are disabled by settings the limits to -INF and +INF. Range [Space1DDescription] A range of zoom limit URI [anyURI] A URI of coordinate systems. XRange [FloatRange] A range of x-axis. Min [float] Max [float] Extension - optional; [PTZConfigurationExtension] PTControlDirection - optional; [PTControlDirection] Optional element to configure PT Control Direction related features. EFlip - optional; [EFlip] Optional element to configure related parameters for E-Flip. Mode [EFlipMode] Parameter to enable/disable E-Flip feature. - enum { 'OFF', 'ON', 'Extended' } Reverse - optional; [Reverse] Optional element to configure related parameters for reversing of PT Control Direction. Mode [ReverseMode] Parameter to enable/disable Reverse feature. - enum { 'OFF', 'ON', 'AUTO', 'Extended' } Extension - optional; [PTControlDirectionExtension] Extension - optional; [PTZConfigurationExtension2] Metadata - optional; [MetadataConfiguration] Optional configuration of the metadata stream. token - required; [ReferenceToken] Token that uniquely references this configuration. Length up to 64 characters. Name [Name] User readable name. Length up to 64 characters. UseCount [int] Number of internal references currently using this configuration. This informational parameter is read-only. Deprecated for Media2 Service. CompressionType [string] Optional parameter to configure compression type of Metadata payload. Use values from enumeration MetadataCompressionType. GeoLocation [boolean] Optional parameter to configure if the metadata stream shall contain the Geo Location coordinates of each target. ShapePolygon [boolean] Optional parameter to configure if the generated metadata stream should contain shape information as polygon. PTZStatus - optional; [PTZFilter] optional element to configure which PTZ related data is to include in the metadata stream Status [boolean] True if the metadata stream shall contain the PTZ status (IDLE, MOVING or UNKNOWN) Position [boolean] True if the metadata stream shall contain the PTZ position FieldOfView - optional; [boolean] True if the metadata stream shall contain the field-of-view information Events - optional; [EventSubscription] Optional element to configure the streaming of events. A client might be interested in receiving all, none or some of the events produced by the device: To get all events: Include the Events element but do not include a filter. To get no events: Do not include the Events element. To get only some events: Include the Events element and include a filter in the element. Filter - optional; [FilterType] SubscriptionPolicy - optional; Analytics - optional; [boolean] Defines whether the streamed metadata will include metadata from the analytics engines (video, cell motion, audio etc.) Multicast [MulticastConfiguration] Defines the multicast settings that could be used for video streaming. Address [IPAddress] The multicast address (if this address is set to 0 no multicast streaming is enaled) Type [IPType] Indicates if the address is an IPv4 or IPv6 address. - enum { 'IPv4', 'IPv6' } IPv4Address - optional; [IPv4Address] IPv4 address. IPv6Address - optional; [IPv6Address] IPv6 address Port [int] The RTP mutlicast destination port. A device may support RTCP. In this case the port value shall be even to allow the corresponding RTCP stream to be mapped to the next higher (odd) destination port number as defined in the RTSP specification. TTL [int] In case of IPv6 the TTL value is assumed as the hop limit. Note that for IPV6 and administratively scoped IPv4 multicast the primary use for hop limit / TTL is to prevent packets from (endlessly) circulating and not limiting scope. In these cases the address contains the scope. AutoStart [boolean] Read only property signalling that streaming is persistant. Use the methods StartMulticastStreaming and StopMulticastStreaming to switch its state. SessionTimeout [duration] The rtsp session timeout for the related audio stream (when using Media2 Service, this value is deprecated and ignored) AnalyticsEngineConfiguration - optional; [AnalyticsEngineConfiguration] Indication which AnalyticsModules shall output metadata. Note that the streaming behavior is undefined if the list includes items that are not part of the associated AnalyticsConfiguration. AnalyticsModule - optional, unbounded; [Config] Name - required; [string] Name of the configuration. Type - required; [QName] The Type attribute specifies the type of rule and shall be equal to value of one of Name attributes of ConfigDescription elements returned by GetSupportedRules and GetSupportedAnalyticsModules command. Parameters [ItemList] List of configuration parameters as defined in the corresponding description. SimpleItem - optional, unbounded; Value name pair as defined by the corresponding description. Name - required; [string] Item name. Value - required; [anySimpleType] Item value. The type is defined in the corresponding description. ElementItem - optional, unbounded; Complex value structure. Name - required; [string] Item name. Extension - optional; [ItemListExtension] Extension - optional; [AnalyticsEngineConfigurationExtension] Extension - optional; [MetadataConfigurationExtension] AudioOutput - optional; [AudioOutputConfiguration] Optional configuration of the Audio output. token - required; [ReferenceToken] Token that uniquely references this configuration. Length up to 64 characters. Name [Name] User readable name. Length up to 64 characters. UseCount [int] Number of internal references currently using this configuration. This informational parameter is read-only. Deprecated for Media2 Service. OutputToken [ReferenceToken] Token of the phsycial Audio output. SendPrimacy - optional; [anyURI] An audio channel MAY support different types of audio transmission. While for full duplex operation no special handling is required, in half duplex operation the transmission direction needs to be switched. The optional SendPrimacy parameter inside the AudioOutputConfiguration indicates which direction is currently active. An NVC can switch between different modes by setting the AudioOutputConfiguration. The following modes for the Send-Primacy are defined: www.onvif.org/ver20/HalfDuplex/Server The server is allowed to send audio data to the client. The client shall not send audio data via the backchannel to the NVT in this mode. www.onvif.org/ver20/HalfDuplex/Client The client is allowed to send audio data via the backchannel to the server. The NVT shall not send audio data to the client in this mode. www.onvif.org/ver20/HalfDuplex/Auto It is up to the device how to deal with sending and receiving audio data. Acoustic echo cancellation is out of ONVIF scope. OutputLevel [int] Volume setting of the output. The applicable range is defined via the option AudioOutputOptions.OutputLevelRange. AudioDecoder - optional; [AudioDecoderConfiguration] Optional configuration of the Audio decoder. token - required; [ReferenceToken] Token that uniquely references this configuration. Length up to 64 characters. Name [Name] User readable name. Length up to 64 characters. UseCount [int] Number of internal references currently using this configuration. This informational parameter is read-only. Deprecated for Media2 Service. Receiver - optional; [ReceiverConfiguration] Optional configuration of the Receiver. Mode [ReceiverMode] The following connection modes are defined: - AutoConnect: The receiver connects on demand, as required by consumers of the media streams. - AlwaysConnect: The receiver attempts to maintain a persistent connection to the configured endpoint. - NeverConnect: The receiver does not attempt to connect. - Unknown: This case should never happen. MediaUri [anyURI] Details of the URI to which the receiver should connect. StreamSetup [StreamSetup] Stream connection parameters. Stream [StreamType] Defines if a multicast or unicast stream is requested - enum { 'RTP-Unicast', 'RTP-Multicast' } Transport [Transport] Protocol [TransportProtocol] Defines the network protocol for streaming, either UDP=RTP/UDP, RTSP=RTP/RTSP/TCP or HTTP=RTP/RTSP/HTTP/TCP - enum { 'UDP', 'TCP', 'RTSP', 'HTTP' } Tunnel - optional; [Transport] Optional element to describe further tunnel options. This element is normally not needed Protocol [TransportProtocol] Defines the network protocol for streaming, either UDP=RTP/UDP, RTSP=RTP/RTSP/TCP or HTTP=RTP/RTSP/HTTP/TCP - enum { 'UDP', 'TCP', 'RTSP', 'HTTP' } Tunnel - optional; [Transport] Optional element to describe further tunnel options. This element is normally not needed ... is recursive token [ReferenceToken]
最新发布
09-19
我希望你能帮助我改进YOLOv12模型,一个识别水稻稻苗的模型,这个模型我是用来识别之后用来计数苗数的。现在我已经将其骨干网络换为了LSKNet结构,请你在这个基础上加入或者替换结构帮我进一步提升精度,记得后面加入的模块要适配LSKNet结构,下面是一些关于该结构的文件。 LSKNet.py import torch import torch.nn as nn from torch.nn.modules.utils import _pair as to_2tuple from timm.models.layers import DropPath, to_2tuple from functools import partial import warnings __all__ = ['LSKNET_T', 'LSKNET_S'] class Mlp(nn.Module): def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.): super().__init__() out_features = out_features or in_features hidden_features = hidden_features or in_features self.fc1 = nn.Conv2d(in_features, hidden_features, 1) self.dwconv = DWConv(hidden_features) self.act = act_layer() self.fc2 = nn.Conv2d(hidden_features, out_features, 1) self.drop = nn.Dropout(drop) def forward(self, x): x = self.fc1(x) x = self.dwconv(x) x = self.act(x) x = self.drop(x) x = self.fc2(x) x = self.drop(x) return x class LSKblock(nn.Module): def __init__(self, dim): super().__init__() self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim) self.conv_spatial = nn.Conv2d(dim, dim, 7, stride=1, padding=9, groups=dim, dilation=3) self.conv1 = nn.Conv2d(dim, dim // 2, 1) self.conv2 = nn.Conv2d(dim, dim // 2, 1) self.conv_squeeze = nn.Conv2d(2, 2, 7, padding=3) self.conv = nn.Conv2d(dim // 2, dim, 1) def forward(self, x): attn1 = self.conv0(x) attn2 = self.conv_spatial(attn1) attn1 = self.conv1(attn1) attn2 = self.conv2(attn2) attn = torch.cat([attn1, attn2], dim=1) avg_attn = torch.mean(attn, dim=1, keepdim=True) max_attn, _ = torch.max(attn, dim=1, keepdim=True) agg = torch.cat([avg_attn, max_attn], dim=1) sig = self.conv_squeeze(agg).sigmoid() attn = attn1 * sig[:, 0, :, :].unsqueeze(1) + attn2 * sig[:, 1, :, :].unsqueeze(1) attn = self.conv(attn) return x * attn class Attention(nn.Module): def __init__(self, d_model): super().__init__() self.proj_1 = nn.Conv2d(d_model, d_model, 1) self.activation = nn.GELU() self.spatial_gating_unit = LSKblock(d_model) self.proj_2 = nn.Conv2d(d_model, d_model, 1) def forward(self, x): shorcut = x.clone() x = self.proj_1(x) x = self.activation(x) x = self.spatial_gating_unit(x) x = self.proj_2(x) x = x + shorcut return x class Block(nn.Module): def __init__(self, dim, mlp_ratio=4., drop=0., drop_path=0., act_layer=nn.GELU, norm_cfg=None): super().__init__() if norm_cfg: self.norm1 = nn.BatchNorm2d(norm_cfg, dim) self.norm2 = nn.BatchNorm2d(norm_cfg, dim) else: self.norm1 = nn.BatchNorm2d(dim) self.norm2 = nn.BatchNorm2d(dim) self.attn = Attention(dim) self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() mlp_hidden_dim = int(dim * mlp_ratio) self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) layer_scale_init_value = 1e-2 self.layer_scale_1 = nn.Parameter( layer_scale_init_value * torch.ones((dim)), requires_grad=True) self.layer_scale_2 = nn.Parameter( layer_scale_init_value * torch.ones((dim)), requires_grad=True) def forward(self, x): x = x + self.drop_path(self.layer_scale_1.unsqueeze(-1).unsqueeze(-1) * self.attn(self.norm1(x))) x = x + self.drop_path(self.layer_scale_2.unsqueeze(-1).unsqueeze(-1) * self.mlp(self.norm2(x))) return x class OverlapPatchEmbed(nn.Module): """ Image to Patch Embedding """ def __init__(self, img_size=224, patch_size=7, stride=4, in_chans=3, embed_dim=768, norm_cfg=None): super().__init__() patch_size = to_2tuple(patch_size) self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=stride, padding=(patch_size[0] // 2, patch_size[1] // 2)) if norm_cfg: self.norm = nn.BatchNorm2d(norm_cfg, embed_dim) else: self.norm = nn.BatchNorm2d(embed_dim) def forward(self, x): x = self.proj(x) _, _, H, W = x.shape x = self.norm(x) return x, H, W class LSKNet(nn.Module): def __init__(self, img_size=224, in_chans=3, dim=None, embed_dims=[64, 128, 256, 512], mlp_ratios=[8, 8, 4, 4], drop_rate=0., drop_path_rate=0., norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3, 4, 6, 3], num_stages=4, pretrained=None, init_cfg=None, norm_cfg=None): super().__init__() assert not (init_cfg and pretrained), \ 'init_cfg and pretrained cannot be set at the same time' if isinstance(pretrained, str): warnings.warn('DeprecationWarning: pretrained is deprecated, ' 'please use "init_cfg" instead') self.init_cfg = dict(type='Pretrained', checkpoint=pretrained) elif pretrained is not None: raise TypeError('pretrained must be a str or None') self.depths = depths self.num_stages = num_stages dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] # stochastic depth decay rule cur = 0 for i in range(num_stages): patch_embed = OverlapPatchEmbed(img_size=img_size if i == 0 else img_size // (2 ** (i + 1)), patch_size=7 if i == 0 else 3, stride=4 if i == 0 else 2, in_chans=in_chans if i == 0 else embed_dims[i - 1], embed_dim=embed_dims[i], norm_cfg=norm_cfg) block = nn.ModuleList([Block( dim=embed_dims[i], mlp_ratio=mlp_ratios[i], drop=drop_rate, drop_path=dpr[cur + j], norm_cfg=norm_cfg) for j in range(depths[i])]) norm = norm_layer(embed_dims[i]) cur += depths[i] setattr(self, f"patch_embed{i + 1}", patch_embed) setattr(self, f"block{i + 1}", block) setattr(self, f"norm{i + 1}", norm) self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))] def freeze_patch_emb(self): self.patch_embed1.requires_grad = False @torch.jit.ignore def no_weight_decay(self): return {'pos_embed1', 'pos_embed2', 'pos_embed3', 'pos_embed4', 'cls_token'} # has pos_embed may be better def get_classifier(self): return self.head def reset_classifier(self, num_classes, global_pool=''): self.num_classes = num_classes self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity() def forward_features(self, x): B = x.shape[0] outs = [] for i in range(self.num_stages): patch_embed = getattr(self, f"patch_embed{i + 1}") block = getattr(self, f"block{i + 1}") norm = getattr(self, f"norm{i + 1}") x, H, W = patch_embed(x) for blk in block: x = blk(x) x = x.flatten(2).transpose(1, 2) x = norm(x) x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous() outs.append(x) return outs def forward(self, x): x = self.forward_features(x) # x = self.head(x) return x class DWConv(nn.Module): def __init__(self, dim=768): super(DWConv, self).__init__() self.dwconv = nn.Conv2d(dim, dim, 3, 1, 1, bias=True, groups=dim) def forward(self, x): x = self.dwconv(x) return x def _conv_filter(state_dict, patch_size=16): """ convert patch embedding weight from manual patchify + linear proj to conv""" out_dict = {} for k, v in state_dict.items(): if 'patch_embed.proj.weight' in k: v = v.reshape((v.shape[0], 3, patch_size, patch_size)) out_dict[k] = v return out_dict def LSKNET_T(): model = LSKNet(depths=[2, 2, 2, 2]) return model def LSKNET_S(): model = LSKNet() return model if __name__ == '__main__': model = LSKNet() inputs = torch.randn((1, 3, 640, 640)) for i in model(inputs): print(i.size()) task.py # Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license import contextlib import pickle import re import types from copy import deepcopy from pathlib import Path from .AddModules import * import thop import torch import torch.nn as nn from ultralytics.nn.modules import ( AIFI, C1, C2, C2PSA, C3, C3TR, ELAN1, OBB, PSA, SPP, SPPELAN, SPPF, AConv, ADown, Bottleneck, BottleneckCSP, C2f, C2fAttn, C2fCIB, C2fPSA, C3Ghost, C3k2, C3x, CBFuse, CBLinear, Classify, Concat, Conv, Conv2, ConvTranspose, Detect, DWConv, DWConvTranspose2d, Focus, GhostBottleneck, GhostConv, HGBlock, HGStem, ImagePoolingAttn, Index, Pose, RepC3, RepConv, RepNCSPELAN4, RepVGGDW, ResNetLayer, RTDETRDecoder, SCDown, Segment, TorchVision, WorldDetect, v10Detect, A2C2f, ) from ultralytics.utils import DEFAULT_CFG_DICT, DEFAULT_CFG_KEYS, LOGGER, colorstr, emojis, yaml_load from ultralytics.utils.checks import check_requirements, check_suffix, check_yaml from ultralytics.utils.loss import ( E2EDetectLoss, v8ClassificationLoss, v8DetectionLoss, v8OBBLoss, v8PoseLoss, v8SegmentationLoss, ) from ultralytics.utils.ops import make_divisible from ultralytics.utils.plotting import feature_visualization from ultralytics.utils.torch_utils import ( fuse_conv_and_bn, fuse_deconv_and_bn, initialize_weights, intersect_dicts, model_info, scale_img, time_sync, ) from .AddModules import * class BaseModel(nn.Module): """The BaseModel class serves as a base class for all the models in the Ultralytics YOLO family.""" def forward(self, x, *args, **kwargs): """ Perform forward pass of the model for either training or inference. If x is a dict, calculates and returns the loss for training. Otherwise, returns predictions for inference. Args: x (torch.Tensor | dict): Input tensor for inference, or dict with image tensor and labels for training. *args (Any): Variable length argument list. **kwargs (Any): Arbitrary keyword arguments. Returns: (torch.Tensor): Loss if x is a dict (training), or network predictions (inference). """ if isinstance(x, dict): # for cases of training and validating while training. return self.loss(x, *args, **kwargs) return self.predict(x, *args, **kwargs) def predict(self, x, profile=False, visualize=False, augment=False, embed=None): """ Perform a forward pass through the network. Args: x (torch.Tensor): The input tensor to the model. profile (bool): Print the computation time of each layer if True, defaults to False. visualize (bool): Save the feature maps of the model if True, defaults to False. augment (bool): Augment image during prediction, defaults to False. embed (list, optional): A list of feature vectors/embeddings to return. Returns: (torch.Tensor): The last output of the model. """ if augment: return self._predict_augment(x) return self._predict_once(x, profile, visualize, embed) def _predict_once(self, x, profile=False, visualize=False, embed=None): y, dt, embeddings = [], [], [] # outputs for m in self.model: if m.f != -1: # if not from previous layer x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers if profile: self._profile_one_layer(m, x, dt) if hasattr(m, 'backbone'): x = m(x) if len(x) != 5: # 0 - 5 x.insert(0, None) for index, i in enumerate(x): if index in self.save: y.append(i) else: y.append(None) x = x[-1] # 最后一个输出传给下一层 else: x = m(x) # run y.append(x if m.i in self.save else None) # save output if visualize: feature_visualization(x, m.type, m.i, save_dir=visualize) if embed and m.i in embed: embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1)) # flatten if m.i == max(embed): return torch.unbind(torch.cat(embeddings, 1), dim=0) return x def _predict_augment(self, x): """Perform augmentations on input image x and return augmented inference.""" LOGGER.warning( f"WARNING ⚠️ {self.__class__.__name__} does not support 'augment=True' prediction. " f"Reverting to single-scale prediction." ) return self._predict_once(x) def _profile_one_layer(self, m, x, dt): """ Profile the computation time and FLOPs of a single layer of the model on a given input. Appends the results to the provided list. Args: m (nn.Module): The layer to be profiled. x (torch.Tensor): The input data to the layer. dt (list): A list to store the computation time of the layer. Returns: None """ c = m == self.model[-1] and isinstance(x, list) # is final layer list, copy input as inplace fix flops = thop.profile(m, inputs=[x.copy() if c else x], verbose=False)[0] / 1e9 * 2 if thop else 0 # GFLOPs t = time_sync() for _ in range(10): m(x.copy() if c else x) dt.append((time_sync() - t) * 100) if m == self.model[0]: LOGGER.info(f"{'time (ms)':>10s} {'GFLOPs':>10s} {'params':>10s} module") LOGGER.info(f"{dt[-1]:10.2f} {flops:10.2f} {m.np:10.0f} {m.type}") if c: LOGGER.info(f"{sum(dt):10.2f} {'-':>10s} {'-':>10s} Total") def fuse(self, verbose=True): """ Fuse the `Conv2d()` and `BatchNorm2d()` layers of the model into a single layer, in order to improve the computation efficiency. Returns: (nn.Module): The fused model is returned. """ if not self.is_fused(): for m in self.model.modules(): if isinstance(m, (Conv, Conv2, DWConv)) and hasattr(m, "bn"): if isinstance(m, Conv2): m.fuse_convs() m.conv = fuse_conv_and_bn(m.conv, m.bn) # update conv delattr(m, "bn") # remove batchnorm m.forward = m.forward_fuse # update forward if isinstance(m, ConvTranspose) and hasattr(m, "bn"): m.conv_transpose = fuse_deconv_and_bn(m.conv_transpose, m.bn) delattr(m, "bn") # remove batchnorm m.forward = m.forward_fuse # update forward if isinstance(m, RepConv): m.fuse_convs() m.forward = m.forward_fuse # update forward if isinstance(m, RepVGGDW): m.fuse() m.forward = m.forward_fuse self.info(verbose=verbose) return self def is_fused(self, thresh=10): """ Check if the model has less than a certain threshold of BatchNorm layers. Args: thresh (int, optional): The threshold number of BatchNorm layers. Default is 10. Returns: (bool): True if the number of BatchNorm layers in the model is less than the threshold, False otherwise. """ bn = tuple(v for k, v in nn.__dict__.items() if "Norm" in k) # normalization layers, i.e. BatchNorm2d() return sum(isinstance(v, bn) for v in self.modules()) < thresh # True if < 'thresh' BatchNorm layers in model def info(self, detailed=False, verbose=True, imgsz=640): """ Prints model information. Args: detailed (bool): if True, prints out detailed information about the model. Defaults to False verbose (bool): if True, prints out the model information. Defaults to False imgsz (int): the size of the image that the model will be trained on. Defaults to 640 """ return model_info(self, detailed=detailed, verbose=verbose, imgsz=imgsz) def _apply(self, fn): """ Applies a function to all the tensors in the model that are not parameters or registered buffers. Args: fn (function): the function to apply to the model Returns: (BaseModel): An updated BaseModel object. """ self = super()._apply(fn) m = self.model[-1] # Detect() if isinstance(m, Detect): # includes all Detect subclasses like Segment, Pose, OBB, WorldDetect m.stride = fn(m.stride) m.anchors = fn(m.anchors) m.strides = fn(m.strides) return self def load(self, weights, verbose=True): """ Load the weights into the model. Args: weights (dict | torch.nn.Module): The pre-trained weights to be loaded. verbose (bool, optional): Whether to log the transfer progress. Defaults to True. """ model = weights["model"] if isinstance(weights, dict) else weights # torchvision models are not dicts csd = model.float().state_dict() # checkpoint state_dict as FP32 csd = intersect_dicts(csd, self.state_dict()) # intersect self.load_state_dict(csd, strict=False) # load if verbose: LOGGER.info(f"Transferred {len(csd)}/{len(self.model.state_dict())} items from pretrained weights") def loss(self, batch, preds=None): """ Compute loss. Args: batch (dict): Batch to compute loss on preds (torch.Tensor | List[torch.Tensor]): Predictions. """ if getattr(self, "criterion", None) is None: self.criterion = self.init_criterion() preds = self.forward(batch["img"]) if preds is None else preds return self.criterion(preds, batch) def init_criterion(self): """Initialize the loss criterion for the BaseModel.""" raise NotImplementedError("compute_loss() needs to be implemented by task heads") class DetectionModel(BaseModel): """YOLOv8 detection model.""" def __init__(self, cfg="yolov8n.yaml", ch=3, nc=None, verbose=True): # model, input channels, number of classes """Initialize the YOLOv8 detection model with the given config and parameters.""" super().__init__() self.yaml = cfg if isinstance(cfg, dict) else yaml_model_load(cfg) # cfg dict if self.yaml["backbone"][0][2] == "Silence": LOGGER.warning( "WARNING ⚠️ YOLOv9 `Silence` module is deprecated in favor of nn.Identity. " "Please delete local *.pt file and re-download the latest model checkpoint." ) self.yaml["backbone"][0][2] = "nn.Identity" # Define model ch = self.yaml["ch"] = self.yaml.get("ch", ch) # input channels if nc and nc != self.yaml["nc"]: LOGGER.info(f"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}") self.yaml["nc"] = nc # override YAML value self.model, self.save = parse_model(deepcopy(self.yaml), ch=ch, verbose=verbose) # model, savelist self.names = {i: f"{i}" for i in range(self.yaml["nc"])} # default names dict self.inplace = self.yaml.get("inplace", True) self.end2end = getattr(self.model[-1], "end2end", False) # Build strides m = self.model[-1] # Detect() if isinstance(m, Detect): # includes all Detect subclasses like Segment, Pose, OBB, WorldDetect s = 256 # 2x min stride m.inplace = self.inplace def _forward(x): """Performs a forward pass through the model, handling different Detect subclass types accordingly.""" if self.end2end: return self.forward(x)["one2many"] return self.forward(x)[0] if isinstance(m, (Segment, Pose, OBB)) else self.forward(x) m.stride = torch.tensor([s / x.shape[-2] for x in _forward(torch.zeros(1, ch, s, s))]) # forward self.stride = m.stride m.bias_init() # only run once else: self.stride = torch.Tensor([32]) # default stride for i.e. RTDETR # Init weights, biases initialize_weights(self) if verbose: self.info() LOGGER.info("") def _predict_augment(self, x): """Perform augmentations on input image x and return augmented inference and train outputs.""" if getattr(self, "end2end", False) or self.__class__.__name__ != "DetectionModel": LOGGER.warning("WARNING ⚠️ Model does not support 'augment=True', reverting to single-scale prediction.") return self._predict_once(x) img_size = x.shape[-2:] # height, width s = [1, 0.83, 0.67] # scales f = [None, 3, None] # flips (2-ud, 3-lr) y = [] # outputs for si, fi in zip(s, f): xi = scale_img(x.flip(fi) if fi else x, si, gs=int(self.stride.max())) yi = super().predict(xi)[0] # forward yi = self._descale_pred(yi, fi, si, img_size) y.append(yi) y = self._clip_augmented(y) # clip augmented tails return torch.cat(y, -1), None # augmented inference, train @staticmethod def _descale_pred(p, flips, scale, img_size, dim=1): """De-scale predictions following augmented inference (inverse operation).""" p[:, :4] /= scale # de-scale x, y, wh, cls = p.split((1, 1, 2, p.shape[dim] - 4), dim) if flips == 2: y = img_size[0] - y # de-flip ud elif flips == 3: x = img_size[1] - x # de-flip lr return torch.cat((x, y, wh, cls), dim) def _clip_augmented(self, y): """Clip YOLO augmented inference tails.""" nl = self.model[-1].nl # number of detection layers (P3-P5) g = sum(4**x for x in range(nl)) # grid points e = 1 # exclude layer count i = (y[0].shape[-1] // g) * sum(4**x for x in range(e)) # indices y[0] = y[0][..., :-i] # large i = (y[-1].shape[-1] // g) * sum(4 ** (nl - 1 - x) for x in range(e)) # indices y[-1] = y[-1][..., i:] # small return y def init_criterion(self): """Initialize the loss criterion for the DetectionModel.""" return E2EDetectLoss(self) if getattr(self, "end2end", False) else v8DetectionLoss(self) class OBBModel(DetectionModel): """YOLOv8 Oriented Bounding Box (OBB) model.""" def __init__(self, cfg="yolov8n-obb.yaml", ch=3, nc=None, verbose=True): """Initialize YOLOv8 OBB model with given config and parameters.""" super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def init_criterion(self): """Initialize the loss criterion for the model.""" return v8OBBLoss(self) class SegmentationModel(DetectionModel): """YOLOv8 segmentation model.""" def __init__(self, cfg="yolov8n-seg.yaml", ch=3, nc=None, verbose=True): """Initialize YOLOv8 segmentation model with given config and parameters.""" super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def init_criterion(self): """Initialize the loss criterion for the SegmentationModel.""" return v8SegmentationLoss(self) class PoseModel(DetectionModel): """YOLOv8 pose model.""" def __init__(self, cfg="yolov8n-pose.yaml", ch=3, nc=None, data_kpt_shape=(None, None), verbose=True): """Initialize YOLOv8 Pose model.""" if not isinstance(cfg, dict): cfg = yaml_model_load(cfg) # load model YAML if any(data_kpt_shape) and list(data_kpt_shape) != list(cfg["kpt_shape"]): LOGGER.info(f"Overriding model.yaml kpt_shape={cfg['kpt_shape']} with kpt_shape={data_kpt_shape}") cfg["kpt_shape"] = data_kpt_shape super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def init_criterion(self): """Initialize the loss criterion for the PoseModel.""" return v8PoseLoss(self) class ClassificationModel(BaseModel): """YOLOv8 classification model.""" def __init__(self, cfg="yolov8n-cls.yaml", ch=3, nc=None, verbose=True): """Init ClassificationModel with YAML, channels, number of classes, verbose flag.""" super().__init__() self._from_yaml(cfg, ch, nc, verbose) def _from_yaml(self, cfg, ch, nc, verbose): """Set YOLOv8 model configurations and define the model architecture.""" self.yaml = cfg if isinstance(cfg, dict) else yaml_model_load(cfg) # cfg dict # Define model ch = self.yaml["ch"] = self.yaml.get("ch", ch) # input channels if nc and nc != self.yaml["nc"]: LOGGER.info(f"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}") self.yaml["nc"] = nc # override YAML value elif not nc and not self.yaml.get("nc", None): raise ValueError("nc not specified. Must specify nc in model.yaml or function arguments.") self.model, self.save = parse_model(deepcopy(self.yaml), ch=ch, verbose=verbose) # model, savelist self.stride = torch.Tensor([1]) # no stride constraints self.names = {i: f"{i}" for i in range(self.yaml["nc"])} # default names dict self.info() @staticmethod def reshape_outputs(model, nc): """Update a TorchVision classification model to class count 'n' if required.""" name, m = list((model.model if hasattr(model, "model") else model).named_children())[-1] # last module if isinstance(m, Classify): # YOLO Classify() head if m.linear.out_features != nc: m.linear = nn.Linear(m.linear.in_features, nc) elif isinstance(m, nn.Linear): # ResNet, EfficientNet if m.out_features != nc: setattr(model, name, nn.Linear(m.in_features, nc)) elif isinstance(m, nn.Sequential): types = [type(x) for x in m] if nn.Linear in types: i = len(types) - 1 - types[::-1].index(nn.Linear) # last nn.Linear index if m[i].out_features != nc: m[i] = nn.Linear(m[i].in_features, nc) elif nn.Conv2d in types: i = len(types) - 1 - types[::-1].index(nn.Conv2d) # last nn.Conv2d index if m[i].out_channels != nc: m[i] = nn.Conv2d(m[i].in_channels, nc, m[i].kernel_size, m[i].stride, bias=m[i].bias is not None) def init_criterion(self): """Initialize the loss criterion for the ClassificationModel.""" return v8ClassificationLoss() class RTDETRDetectionModel(DetectionModel): """ RTDETR (Real-time DEtection and Tracking using Transformers) Detection Model class. This class is responsible for constructing the RTDETR architecture, defining loss functions, and facilitating both the training and inference processes. RTDETR is an object detection and tracking model that extends from the DetectionModel base class. Attributes: cfg (str): The configuration file path or preset string. Default is 'rtdetr-l.yaml'. ch (int): Number of input channels. Default is 3 (RGB). nc (int, optional): Number of classes for object detection. Default is None. verbose (bool): Specifies if summary statistics are shown during initialization. Default is True. Methods: init_criterion: Initializes the criterion used for loss calculation. loss: Computes and returns the loss during training. predict: Performs a forward pass through the network and returns the output. """ def __init__(self, cfg="rtdetr-l.yaml", ch=3, nc=None, verbose=True): """ Initialize the RTDETRDetectionModel. Args: cfg (str): Configuration file name or path. ch (int): Number of input channels. nc (int, optional): Number of classes. Defaults to None. verbose (bool, optional): Print additional information during initialization. Defaults to True. """ super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def init_criterion(self): """Initialize the loss criterion for the RTDETRDetectionModel.""" from ultralytics.models.utils.loss import RTDETRDetectionLoss return RTDETRDetectionLoss(nc=self.nc, use_vfl=True) def loss(self, batch, preds=None): """ Compute the loss for the given batch of data. Args: batch (dict): Dictionary containing image and label data. preds (torch.Tensor, optional): Precomputed model predictions. Defaults to None. Returns: (tuple): A tuple containing the total loss and main three losses in a tensor. """ if not hasattr(self, "criterion"): self.criterion = self.init_criterion() img = batch["img"] # NOTE: preprocess gt_bbox and gt_labels to list. bs = len(img) batch_idx = batch["batch_idx"] gt_groups = [(batch_idx == i).sum().item() for i in range(bs)] targets = { "cls": batch["cls"].to(img.device, dtype=torch.long).view(-1), "bboxes": batch["bboxes"].to(device=img.device), "batch_idx": batch_idx.to(img.device, dtype=torch.long).view(-1), "gt_groups": gt_groups, } preds = self.predict(img, batch=targets) if preds is None else preds dec_bboxes, dec_scores, enc_bboxes, enc_scores, dn_meta = preds if self.training else preds[1] if dn_meta is None: dn_bboxes, dn_scores = None, None else: dn_bboxes, dec_bboxes = torch.split(dec_bboxes, dn_meta["dn_num_split"], dim=2) dn_scores, dec_scores = torch.split(dec_scores, dn_meta["dn_num_split"], dim=2) dec_bboxes = torch.cat([enc_bboxes.unsqueeze(0), dec_bboxes]) # (7, bs, 300, 4) dec_scores = torch.cat([enc_scores.unsqueeze(0), dec_scores]) loss = self.criterion( (dec_bboxes, dec_scores), targets, dn_bboxes=dn_bboxes, dn_scores=dn_scores, dn_meta=dn_meta ) # NOTE: There are like 12 losses in RTDETR, backward with all losses but only show the main three losses. return sum(loss.values()), torch.as_tensor( [loss[k].detach() for k in ["loss_giou", "loss_class", "loss_bbox"]], device=img.device ) def predict(self, x, profile=False, visualize=False, batch=None, augment=False, embed=None): """ Perform a forward pass through the model. Args: x (torch.Tensor): The input tensor. profile (bool, optional): If True, profile the computation time for each layer. Defaults to False. visualize (bool, optional): If True, save feature maps for visualization. Defaults to False. batch (dict, optional): Ground truth data for evaluation. Defaults to None. augment (bool, optional): If True, perform data augmentation during inference. Defaults to False. embed (list, optional): A list of feature vectors/embeddings to return. Returns: (torch.Tensor): Model's output tensor. """ y, dt, embeddings = [], [], [] # outputs for m in self.model[:-1]: # except the head part if m.f != -1: # if not from previous layer x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers if profile: self._profile_one_layer(m, x, dt) x = m(x) # run y.append(x if m.i in self.save else None) # save output if visualize: feature_visualization(x, m.type, m.i, save_dir=visualize) if embed and m.i in embed: embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1)) # flatten if m.i == max(embed): return torch.unbind(torch.cat(embeddings, 1), dim=0) head = self.model[-1] x = head([y[j] for j in head.f], batch) # head inference return x class WorldModel(DetectionModel): """YOLOv8 World Model.""" def __init__(self, cfg="yolov8s-world.yaml", ch=3, nc=None, verbose=True): """Initialize YOLOv8 world model with given config and parameters.""" self.txt_feats = torch.randn(1, nc or 80, 512) # features placeholder self.clip_model = None # CLIP model placeholder super().__init__(cfg=cfg, ch=ch, nc=nc, verbose=verbose) def set_classes(self, text, batch=80, cache_clip_model=True): """Set classes in advance so that model could do offline-inference without clip model.""" try: import clip except ImportError: check_requirements("git+https://github.com/ultralytics/CLIP.git") import clip if ( not getattr(self, "clip_model", None) and cache_clip_model ): # for backwards compatibility of models lacking clip_model attribute self.clip_model = clip.load("ViT-B/32")[0] model = self.clip_model if cache_clip_model else clip.load("ViT-B/32")[0] device = next(model.parameters()).device text_token = clip.tokenize(text).to(device) txt_feats = [model.encode_text(token).detach() for token in text_token.split(batch)] txt_feats = txt_feats[0] if len(txt_feats) == 1 else torch.cat(txt_feats, dim=0) txt_feats = txt_feats / txt_feats.norm(p=2, dim=-1, keepdim=True) self.txt_feats = txt_feats.reshape(-1, len(text), txt_feats.shape[-1]) self.model[-1].nc = len(text) def predict(self, x, profile=False, visualize=False, txt_feats=None, augment=False, embed=None): """ Perform a forward pass through the model. Args: x (torch.Tensor): The input tensor. profile (bool, optional): If True, profile the computation time for each layer. Defaults to False. visualize (bool, optional): If True, save feature maps for visualization. Defaults to False. txt_feats (torch.Tensor): The text features, use it if it's given. Defaults to None. augment (bool, optional): If True, perform data augmentation during inference. Defaults to False. embed (list, optional): A list of feature vectors/embeddings to return. Returns: (torch.Tensor): Model's output tensor. """ txt_feats = (self.txt_feats if txt_feats is None else txt_feats).to(device=x.device, dtype=x.dtype) if len(txt_feats) != len(x): txt_feats = txt_feats.repeat(len(x), 1, 1) ori_txt_feats = txt_feats.clone() y, dt, embeddings = [], [], [] # outputs for m in self.model: # except the head part if m.f != -1: # if not from previous layer x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers if profile: self._profile_one_layer(m, x, dt) if isinstance(m, C2fAttn): x = m(x, txt_feats) elif isinstance(m, WorldDetect): x = m(x, ori_txt_feats) elif isinstance(m, ImagePoolingAttn): txt_feats = m(x, txt_feats) else: x = m(x) # run y.append(x if m.i in self.save else None) # save output if visualize: feature_visualization(x, m.type, m.i, save_dir=visualize) if embed and m.i in embed: embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1)) # flatten if m.i == max(embed): return torch.unbind(torch.cat(embeddings, 1), dim=0) return x def loss(self, batch, preds=None): """ Compute loss. Args: batch (dict): Batch to compute loss on. preds (torch.Tensor | List[torch.Tensor]): Predictions. """ if not hasattr(self, "criterion"): self.criterion = self.init_criterion() if preds is None: preds = self.forward(batch["img"], txt_feats=batch["txt_feats"]) return self.criterion(preds, batch) class Ensemble(nn.ModuleList): """Ensemble of models.""" def __init__(self): """Initialize an ensemble of models.""" super().__init__() def forward(self, x, augment=False, profile=False, visualize=False): """Function generates the YOLO network's final layer.""" y = [module(x, augment, profile, visualize)[0] for module in self] # y = torch.stack(y).max(0)[0] # max ensemble # y = torch.stack(y).mean(0) # mean ensemble y = torch.cat(y, 2) # nms ensemble, y shape(B, HW, C) return y, None # inference, train output # Functions ------------------------------------------------------------------------------------------------------------ @contextlib.contextmanager def temporary_modules(modules=None, attributes=None): """ Context manager for temporarily adding or modifying modules in Python's module cache (`sys.modules`). This function can be used to change the module paths during runtime. It's useful when refactoring code, where you've moved a module from one location to another, but you still want to support the old import paths for backwards compatibility. Args: modules (dict, optional): A dictionary mapping old module paths to new module paths. attributes (dict, optional): A dictionary mapping old module attributes to new module attributes. Example: ```python with temporary_modules({"old.module": "new.module"}, {"old.module.attribute": "new.module.attribute"}): import old.module # this will now import new.module from old.module import attribute # this will now import new.module.attribute ``` Note: The changes are only in effect inside the context manager and are undone once the context manager exits. Be aware that directly manipulating `sys.modules` can lead to unpredictable results, especially in larger applications or libraries. Use this function with caution. """ if modules is None: modules = {} if attributes is None: attributes = {} import sys from importlib import import_module try: # Set attributes in sys.modules under their old name for old, new in attributes.items(): old_module, old_attr = old.rsplit(".", 1) new_module, new_attr = new.rsplit(".", 1) setattr(import_module(old_module), old_attr, getattr(import_module(new_module), new_attr)) # Set modules in sys.modules under their old name for old, new in modules.items(): sys.modules[old] = import_module(new) yield finally: # Remove the temporary module paths for old in modules: if old in sys.modules: del sys.modules[old] class SafeClass: """A placeholder class to replace unknown classes during unpickling.""" def __init__(self, *args, **kwargs): """Initialize SafeClass instance, ignoring all arguments.""" pass def __call__(self, *args, **kwargs): """Run SafeClass instance, ignoring all arguments.""" pass class SafeUnpickler(pickle.Unpickler): """Custom Unpickler that replaces unknown classes with SafeClass.""" def find_class(self, module, name): """Attempt to find a class, returning SafeClass if not among safe modules.""" safe_modules = ( "torch", "collections", "collections.abc", "builtins", "math", "numpy", # Add other modules considered safe ) if module in safe_modules: return super().find_class(module, name) else: return SafeClass def torch_safe_load(weight, safe_only=False): """ Attempts to load a PyTorch model with the torch.load() function. If a ModuleNotFoundError is raised, it catches the error, logs a warning message, and attempts to install the missing module via the check_requirements() function. After installation, the function again attempts to load the model using torch.load(). Args: weight (str): The file path of the PyTorch model. safe_only (bool): If True, replace unknown classes with SafeClass during loading. Example: ```python from ultralytics.nn.tasks import torch_safe_load ckpt, file = torch_safe_load("path/to/best.pt", safe_only=True) ``` Returns: ckpt (dict): The loaded model checkpoint. file (str): The loaded filename """ from ultralytics.utils.downloads import attempt_download_asset check_suffix(file=weight, suffix=".pt") file = attempt_download_asset(weight) # search online if missing locally try: with temporary_modules( modules={ "ultralytics.yolo.utils": "ultralytics.utils", "ultralytics.yolo.v8": "ultralytics.models.yolo", "ultralytics.yolo.data": "ultralytics.data", }, attributes={ "ultralytics.nn.modules.block.Silence": "torch.nn.Identity", # YOLOv9e "ultralytics.nn.tasks.YOLOv10DetectionModel": "ultralytics.nn.tasks.DetectionModel", # YOLOv10 "ultralytics.utils.loss.v10DetectLoss": "ultralytics.utils.loss.E2EDetectLoss", # YOLOv10 }, ): if safe_only: # Load via custom pickle module safe_pickle = types.ModuleType("safe_pickle") safe_pickle.Unpickler = SafeUnpickler safe_pickle.load = lambda file_obj: SafeUnpickler(file_obj).load() with open(file, "rb") as f: ckpt = torch.load(f, pickle_module=safe_pickle) else: ckpt = torch.load(file, map_location="cpu") except ModuleNotFoundError as e: # e.name is missing module name if e.name == "models": raise TypeError( emojis( f"ERROR ❌️ {weight} appears to be an Ultralytics YOLOv5 model originally trained " f"with https://github.com/ultralytics/yolov5.\nThis model is NOT forwards compatible with " f"YOLOv8 at https://github.com/ultralytics/ultralytics." f"\nRecommend fixes are to train a new model using the latest 'ultralytics' package or to " f"run a command with an official Ultralytics model, i.e. 'yolo predict model=yolov8n.pt'" ) ) from e LOGGER.warning( f"WARNING ⚠️ {weight} appears to require '{e.name}', which is not in Ultralytics requirements." f"\nAutoInstall will run now for '{e.name}' but this feature will be removed in the future." f"\nRecommend fixes are to train a new model using the latest 'ultralytics' package or to " f"run a command with an official Ultralytics model, i.e. 'yolo predict model=yolov8n.pt'" ) check_requirements(e.name) # install missing module ckpt = torch.load(file, map_location="cpu") if not isinstance(ckpt, dict): # File is likely a YOLO instance saved with i.e. torch.save(model, "saved_model.pt") LOGGER.warning( f"WARNING ⚠️ The file '{weight}' appears to be improperly saved or formatted. " f"For optimal results, use model.save('filename.pt') to correctly save YOLO models." ) ckpt = {"model": ckpt.model} return ckpt, file def attempt_load_weights(weights, device=None, inplace=True, fuse=False): """Loads an ensemble of models weights=[a,b,c] or a single model weights=[a] or weights=a.""" ensemble = Ensemble() for w in weights if isinstance(weights, list) else [weights]: ckpt, w = torch_safe_load(w) # load ckpt args = {**DEFAULT_CFG_DICT, **ckpt["train_args"]} if "train_args" in ckpt else None # combined args model = (ckpt.get("ema") or ckpt["model"]).to(device).float() # FP32 model # Model compatibility updates model.args = args # attach args to model model.pt_path = w # attach *.pt file path to model model.task = guess_model_task(model) if not hasattr(model, "stride"): model.stride = torch.tensor([32.0]) # Append ensemble.append(model.fuse().eval() if fuse and hasattr(model, "fuse") else model.eval()) # model in eval mode # Module updates for m in ensemble.modules(): if hasattr(m, "inplace"): m.inplace = inplace elif isinstance(m, nn.Upsample) and not hasattr(m, "recompute_scale_factor"): m.recompute_scale_factor = None # torch 1.11.0 compatibility # Return model if len(ensemble) == 1: return ensemble[-1] # Return ensemble LOGGER.info(f"Ensemble created with {weights}\n") for k in "names", "nc", "yaml": setattr(ensemble, k, getattr(ensemble[0], k)) ensemble.stride = ensemble[int(torch.argmax(torch.tensor([m.stride.max() for m in ensemble])))].stride assert all(ensemble[0].nc == m.nc for m in ensemble), f"Models differ in class counts {[m.nc for m in ensemble]}" return ensemble def attempt_load_one_weight(weight, device=None, inplace=True, fuse=False): """Loads a single model weights.""" ckpt, weight = torch_safe_load(weight) # load ckpt args = {**DEFAULT_CFG_DICT, **(ckpt.get("train_args", {}))} # combine model and default args, preferring model args model = (ckpt.get("ema") or ckpt["model"]).to(device).float() # FP32 model # Model compatibility updates model.args = {k: v for k, v in args.items() if k in DEFAULT_CFG_KEYS} # attach args to model model.pt_path = weight # attach *.pt file path to model model.task = guess_model_task(model) if not hasattr(model, "stride"): model.stride = torch.tensor([32.0]) model = model.fuse().eval() if fuse and hasattr(model, "fuse") else model.eval() # model in eval mode # Module updates for m in model.modules(): if hasattr(m, "inplace"): m.inplace = inplace elif isinstance(m, nn.Upsample) and not hasattr(m, "recompute_scale_factor"): m.recompute_scale_factor = None # torch 1.11.0 compatibility # Return model and ckpt return model, ckpt def parse_model(d, ch, verbose=True): # model_dict, input_channels(3) """Parse a YOLO model.yaml dictionary into a PyTorch model.""" import ast # Args legacy = True # backward compatibility for v3/v5/v8/v9 models max_channels = float("inf") nc, act, scales = (d.get(x) for x in ("nc", "activation", "scales")) depth, width, kpt_shape = (d.get(x, 1.0) for x in ("depth_multiple", "width_multiple", "kpt_shape")) if scales: scale = d.get("scale") if not scale: scale = tuple(scales.keys())[0] LOGGER.warning(f"WARNING ⚠️ no model scale passed. Assuming scale='{scale}'.") depth, width, max_channels = scales[scale] if act: Conv.default_act = eval(act) # redefine default activation, i.e. Conv.default_act = nn.SiLU() if verbose: LOGGER.info(f"{colorstr('activation:')} {act}") # print if verbose: LOGGER.info(f"\n{'':>3}{'from':>20}{'n':>3}{'params':>10} {'module':<45}{'arguments':<30}") ch = [ch] layers, save, c2 = [], [], ch[-1] # layers, savelist, ch out backbone = False for i, (f, n, m, args) in enumerate(d["backbone"] + d["head"]): # from, number, module, args t=m m = getattr(torch.nn, m[3:]) if "nn." in m else globals()[m] # get module for j, a in enumerate(args): if isinstance(a, str): with contextlib.suppress(ValueError): args[j] = locals()[a] if a in locals() else ast.literal_eval(a) n = n_ = max(round(n * depth), 1) if n > 1 else n # depth gain if m in { Classify, Conv, ConvTranspose, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, C2fPSA, C2PSA, DWConv, Focus, BottleneckCSP, C1, C2, C2f, C3k2, RepNCSPELAN4, ELAN1, ADown, AConv, SPPELAN, C2fAttn, C3, C3TR, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x, RepC3, PSA, SCDown, C2fCIB, A2C2f, }: c1, c2 = ch[f], args[0] if c2 != nc: # if c2 not equal to number of classes (i.e. for Classify() output) c2 = make_divisible(min(c2, max_channels) * width, 8) if m is C2fAttn: args[1] = make_divisible(min(args[1], max_channels // 2) * width, 8) # embed channels args[2] = int( max(round(min(args[2], max_channels // 2 // 32)) * width, 1) if args[2] > 1 else args[2] ) # num heads args = [c1, c2, *args[1:]] if m in { BottleneckCSP, C1, C2, C2f, C3k2, C2fAttn, C3, C3TR, C3Ghost, C3x, RepC3, C2fPSA, C2fCIB, C2PSA, A2C2f, }: args.insert(2, n) # number of repeats n = 1 if m is C3k2: # for M/L/X sizes legacy = False if scale in "mlx": args[3] = True if m is A2C2f: legacy = False if scale in "lx": # for L/X sizes args.append(True) args.append(1.2) elif m is AIFI: args = [ch[f], *args] elif m in {HGStem, HGBlock}: c1, cm, c2 = ch[f], args[0], args[1] args = [c1, cm, c2, *args[2:]] if m is HGBlock: args.insert(4, n) # number of repeats n = 1 elif m is ResNetLayer: c2 = args[1] if args[3] else args[1] * 4 elif m is nn.BatchNorm2d: args = [ch[f]] #LSKNet elif m in {LSKNET_T,LSKNET_S}: m = m(*args) c2 = m.width_list backbone =True elif m is Concat: c2 = sum(ch[x] for x in f) elif m in {Detect, WorldDetect, Segment, Pose, OBB, ImagePoolingAttn, v10Detect}: args.append([ch[x] for x in f]) if m is Segment: args[2] = make_divisible(min(args[2], max_channels) * width, 8) if m in {Detect, Segment, Pose, OBB}: m.legacy = legacy elif m is RTDETRDecoder: # special case, channels arg must be passed in index 1 args.insert(1, [ch[x] for x in f]) elif m in {CBLinear, TorchVision, Index}: c2 = args[0] c1 = ch[f] args = [c1, c2, *args[1:]] elif m is CBFuse: c2 = ch[f[-1]] else: c2 = ch[f] # m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args) # module # t = str(m)[8:-2].replace("__main__.", "") # module type # m_.np = sum(x.numel() for x in m_.parameters()) # number params # m_.i, m_.f, m_.type = i, f, t # attach index, 'from' index, type # if verbose: # LOGGER.info(f"{i:>3}{str(f):>20}{n_:>3}{m_.np:10.0f} {t:<45}{str(args):<30}") # print # save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1) # append to savelist # layers.append(m_) # if i == 0: # ch = [] # ch.append(c2) if isinstance(c2, list): backbone = True m_ = m m_.backbone = True else: m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args) # module t = str(m)[8:-2].replace('__main__.', '') # module type m.np = sum(x.numel() for x in m_.parameters()) # number params m_.i, m_.f, m_.type = i + 4 if backbone else i, f, t # attach index, 'from' index, type if verbose: LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f} {t:<45}{str(args):<30}') # print save.extend(x % (i + 4 if backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1) # append to savelist layers.append(m_) if i == 0: ch = [] if isinstance(c2, list): ch.extend(c2) for _ in range(5 - len(ch)): ch.insert(0, 0) else: ch.append(c2) return nn.Sequential(*layers), sorted(save) def yaml_model_load(path): """Load a YOLOv8 model from a YAML file.""" path = Path(path) if path.stem in (f"yolov{d}{x}6" for x in "nsmlx" for d in (5, 8)): new_stem = re.sub(r"(\d+)([nslmx])6(.+)?$", r"\1\2-p6\3", path.stem) LOGGER.warning(f"WARNING ⚠️ Ultralytics YOLO P6 models now use -p6 suffix. Renaming {path.stem} to {new_stem}.") path = path.with_name(new_stem + path.suffix) unified_path = re.sub(r"(\d+)([nslmx])(.+)?$", r"\1\3", str(path)) # i.e. yolov8x.yaml -> yolov8.yaml yaml_file = check_yaml(unified_path, hard=False) or check_yaml(path) d = yaml_load(yaml_file) # model dict d["scale"] = guess_model_scale(path) d["yaml_file"] = str(path) return d def guess_model_scale(model_path): """ Takes a path to a YOLO model's YAML file as input and extracts the size character of the model's scale. The function uses regular expression matching to find the pattern of the model scale in the YAML file name, which is denoted by n, s, m, l, or x. The function returns the size character of the model scale as a string. Args: model_path (str | Path): The path to the YOLO model's YAML file. Returns: (str): The size character of the model's scale, which can be n, s, m, l, or x. """ try: return re.search(r"yolo[v]?\d+([nslmx])", Path(model_path).stem).group(1) # noqa, returns n, s, m, l, or x except AttributeError: return "" def guess_model_task(model): """ Guess the task of a PyTorch model from its architecture or configuration. Args: model (nn.Module | dict): PyTorch model or model configuration in YAML format. Returns: (str): Task of the model ('detect', 'segment', 'classify', 'pose'). Raises: SyntaxError: If the task of the model could not be determined. """ def cfg2task(cfg): """Guess from YAML dictionary.""" m = cfg["head"][-1][-2].lower() # output module name if m in {"classify", "classifier", "cls", "fc"}: return "classify" if "detect" in m: return "detect" if m == "segment": return "segment" if m == "pose": return "pose" if m == "obb": return "obb" # Guess from model cfg if isinstance(model, dict): with contextlib.suppress(Exception): return cfg2task(model) # Guess from PyTorch model if isinstance(model, nn.Module): # PyTorch model for x in "model.args", "model.model.args", "model.model.model.args": with contextlib.suppress(Exception): return eval(x)["task"] for x in "model.yaml", "model.model.yaml", "model.model.model.yaml": with contextlib.suppress(Exception): return cfg2task(eval(x)) for m in model.modules(): if isinstance(m, Segment): return "segment" elif isinstance(m, Classify): return "classify" elif isinstance(m, Pose): return "pose" elif isinstance(m, OBB): return "obb" elif isinstance(m, (Detect, WorldDetect, v10Detect)): return "detect" # Guess from model filename if isinstance(model, (str, Path)): model = Path(model) if "-seg" in model.stem or "segment" in model.parts: return "segment" elif "-cls" in model.stem or "classify" in model.parts: return "classify" elif "-pose" in model.stem or "pose" in model.parts: return "pose" elif "-obb" in model.stem or "obb" in model.parts: return "obb" elif "detect" in model.parts: return "detect" # Unable to determine task from model LOGGER.warning( "WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. " "Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify','pose' or 'obb'." ) return "detect" # assume detect __init__.py from .LSKNet import * yolov12-LSKNet.yaml # YOLOv12 🚀, AGPL-3.0 license # YOLOv12 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect # Parameters nc: 1 # number of classes scales: # model compound scaling constants, i.e. 'model=yolov12n.yaml' will call yolov12.yaml with scale 'n' # [depth, width, max_channels] n: [0.50, 0.25, 1024] # summary: 465 layers, 2,603,056 parameters, 2,603,040 gradients, 6.7 GFLOPs s: [0.50, 0.50, 1024] # summary: 465 layers, 9,285,632 parameters, 9,285,616 gradients, 21.7 GFLOPs m: [0.50, 1.00, 512] # summary: 501 layers, 20,201,216 parameters, 20,201,200 gradients, 68.1 GFLOPs l: [1.00, 1.00, 512] # summary: 831 layers, 26,454,880 parameters, 26,454,864 gradients, 89.7 GFLOPs x: [1.00, 1.50, 512] # summary: 831 layers, 59,216,928 parameters, 59,216,912 gradients, 200.3 GFLOPs # YOLO12n backbone backbone: # [from, repeats, module, args] - [-1, 1, LSKNET_T, []] # YOLO12n head head: - [-1, 1, nn.Upsample, [None, 2, "nearest"]] - [[-1, 3], 1, Concat, [1]] # cat backbone P4 - [-1, 2, A2C2f, [512, False, -1]] # 8 - [-1, 1, nn.Upsample, [None, 2, "nearest"]] - [[-1, 2], 1, Concat, [1]] # cat backbone P3 - [-1, 2, A2C2f, [256, False, -1]] # 11 - [-1, 1, Conv, [256, 3, 2]] - [[-1, 7], 1, Concat, [1]] # cat head P4 - [-1, 2, A2C2f, [512, False, -1]] # 14 - [-1, 1, Conv, [512, 3, 2]] - [[-1, 4], 1, Concat, [1]] # cat head P5 - [-1, 2, C3k2, [1024, True]] # 17 (P5/32-large) - [[10, 13, 16], 1, Detect, [nc]] # Detect(P3, P4, P5) 下面是我训练时所使用到的代码 train.py from ultralytics import YOLO def main(): model = YOLO("yolov12-LSKNet.yaml") results = model.train( data='./data.yaml', epochs=200, imgsz=640, batch=16, amp=False, device=0, save=True, project='runs/train', name='rice_seedling-200L+NWD', verbose=True, workers=2, cache=True, ) if __name__ == '__main__': from multiprocessing import freeze_support freeze_support() main()
09-01
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值