
%% 
%% Copyright 2019-2024 Elsevier Ltd
%% 
%% Version 2.4
%% 
%% This file is part of the 'CAS Bundle'.
%% --------------------------------------
%% 
%% It may be distributed under the conditions of the LaTeX Project Public
%% License, either version 1.2 of this license or (at your option) any
%% later version.  The latest version of this license is in
%%    http://www.latex-project.org/lppl.txt
%% and version 1.2 or later is part of all distributions of LaTeX
%% version 1999/12/01 or later.
%% 
%% The list of all files belonging to the 'CAS Bundle' is
%% given in the file `manifest.txt'.
%% 
%% Template article for cas-dc documentclass for 
%% double column output.

%\documentclass[a4paper,fleqn,longmktitle]{cas-dc}
\documentclass[a4paper,fleqn]{cas-dc}

%\usepackage[authoryear,longnamesfirst]{natbib}
%\usepackage[authoryear]{natbib}
\usepackage[numbers]{natbib}

%%%Author definitions
\def\tsc#1{\csdef{#1}{\textsc{\lowercase{#1}}\xspace}}
\tsc{WGM}
\tsc{QE}
\tsc{EP}
\tsc{PMS}
\tsc{BEC}
\tsc{DE}
%%%

\begin{document}
    \let\WriteBookmarks\relax
    \def\floatpagepagefraction{1}
    \def\textpagefraction{.001}
    \shorttitle{B2G-YOLOv11-S strawberry maturity grading}
    \shortauthors{}
    
    \title [mode = title]{B2G-YOLOv11-S: An Efficient Intelligent Grading Model For Strawberry Maturity With Integrated Causal Analysis}                      

    
    % Author 1: Qian Zhao (affiliation a)
    \author[a]{Qian Zhao}[type=editor,
    auid=001,bioid=1,
    role=Researcher,
    orcid=]
    \fnmark[1]
    \ead{1611944365@qq.com}
    \credit{Writing - original draft, Validation, Methodology, Formal analysis, Data curation, Conceptualization}
    
    % Author 2: Chunxu Hao (affiliation a)
    \author[a]{Chunxu Hao}[type=editor,
    auid=002,bioid=2,
    role=Researcher,
    orcid=]
    \fnmark[1]
    \ead{1766493930@qq.com}
    \credit{Writing - original draft, Data curation, Conceptualization}
    
    % Author 3: Jianhua Cui (affiliation a)
    \author[a]{Jianhua Cui}[type=editor,
    auid=003,bioid=3,
    role=Researcher,
    orcid=]
    \fnmark[1]
    \ead{1755773142@qq.com}
    \credit{Validation, Formal analysis, Writing - review \& editing}
    
    % Author 4: Jiangchen Zan (affiliation a)
    \author[a]{Jiangchen Zan}[type=editor,
    auid=004,bioid=4,
    role=Researcher,
    orcid=]
    \fnmark[1]
    \ead{13934790471@163.com}
    \credit{Supervision, Project administration, Investigation}
    
    % Author 5: Xiongwei Han (affiliation b)
    \author[b]{Xiongwei Han}[type=editor,
    auid=005,bioid=5,
    role=Researcher,
    orcid=]
    \fnmark[1]
    \ead{202430004@stu.sxau.edu.cn}
    \credit{Conceptualization, Investigation}
    

    
    % Author 7: Qingqiang Chen (affiliation c)
    \author[c]{Qingqiang Chen}[type=editor,
    auid=007,bioid=7,
    role=Researcher,
    orcid=]
    \fnmark[1]
    \ead{chenqingqiang@example.com}
    \credit{Methodology, Resources}
    
    % Author 8: Xiaoying Zhang (affiliation a)
    \author[a]{Xiaoying Zhang}[type=editor,
    auid=008,bioid=8,
    role=Researcher,
    orcid=0009-0005-2449-7108]
    \fnmark[1]
    \ead{xiaoyingzhang@sxau.edu.cn}
    \credit{Funding acquisition, Resources}
    
    % Author 9: Fuzhong Li (affiliation a, corresponding author)
    \author[a]{Fuzhong Li}[type=editor,
    auid=009,bioid=9,
    role=Corresponding Author,
    orcid=]
    \cormark[1]
    \fnmark[1]
    \ead{lifuzhong@sxau.edu.cn}

    \credit{Supervision, Funding acquisition, Resources}
    
    
    
    % Affiliation information (original format preserved)
    \affiliation[a]{organization={School of Software, Shanxi Agricultural University},
        addressline={}, 
        city={Jinzhong},
        postcode={030801}, 
        state={Shanxi},
        country={China}}
    
    \affiliation[b]{organization={College of Information Science and Engineering, Shanxi Agricultural University},
        addressline={}, 
        city={Jinzhong},
        postcode={030801}, 
        state={Shanxi},
        country={China}}
    
    \affiliation[c]{organization={School of Computer and Information Technology (School of Big Data), Shanxi University},
        addressline={}, 
        city={Taiyuan},
        postcode={030006}, 
        state={Shanxi},
        country={China}}
    
    % Corresponding author note (original markup preserved)


    

    
    \begin{abstract}
        As one of the most economically valuable small fruit crops worldwide, strawberries (Fragaria spp.) suffer significant yield losses in automated harvesting systems due to inaccurate maturity assessment. To address critical challenges in precision agriculture, including the trade-off between detection speed and accuracy, insufficient environmental robustness, and the lack of causal interpretability in decision-making, this study proposes an integrated framework. The solution combines an enhanced YOLOv11 architecture with a dual-stream B2-Net backbone, an efficient HGNetv2-C feature extraction module, and a novel causal analysis metric (Average Causal Effect, ACE). Leveraging multi-source image fusion of aerial and ground-level imagery, the system enables comprehensive maturity evaluation under real-world conditions.
        Experimental evaluations demonstrate that the proposed model achieves state-of-the-art performance: a mAP (mean Average Precision) of 82.9\% across IoU (Intersection-over-Union) thresholds from 50\% to 95\%, peaking at 95.6\% mAP at the 50\% IoU threshold. P (precision) and R (recall) rates reach 89.6\% and 92.2\%, respectively, outperforming existing benchmarks. The model demonstrates exceptional resilience in dense occlusion, environmental clutter, fruit clustering, and long-range small-target detection scenarios.
        Causal analysis via the ACE metric further validated the robustness of the proposed framework. Specifically, under illumination intensity perturbations, the mean absolute percentage change of ACE was rigorously constrained within ±0.5\%Δ, in stark contrast to the ±1.7\%Δ and ±1.4\%Δ fluctuations observed for YOLOv8 and YOLOv11, respectively. This outcome demonstrates the model’s well-balanced performance across metrics, avoiding excessive trade-offs to optimize a single metric, thereby exhibiting superior comprehensive performance and stability.
    \end{abstract}
    
    \begin{graphicalabstract}
        \includegraphics{figs/cas-grabs.pdf}
    \end{graphicalabstract}
    
    \begin{highlights}
        \item Research highlights item 1
        \item Research highlights item 2
        \item Research highlights item 3
    \end{highlights}
    
    \begin{keywords}
        Causal analysis metric \sep Environmental robustness \sep Multi-source image fusion \sep Improved YOLOv11 \sep Object detection
    \end{keywords}
    
    
    \maketitle
    
    \section{Introduction}
    Strawberry (Fragaria spp.), a perennial herbaceous plant belonging to the Rosaceae family, is renowned for its rich profile of bioactive constituents, including vitamin C, anthocyanins, and dietary fiber, which confer significant economic and nutritional value \cite{Wan2010}. However, the strawberry industry faces critical challenges arising from the crop's inherent biological characteristics: notably short postharvest shelf life, high susceptibility to mechanical damage, and labor-intensive harvesting operations. These factors collectively constrain sustainable industrial development. In this context, the development of non-contact, high-precision intelligent fruit maturity grading technology assumes strategic significance for optimizing harvest timing, improving quality control, and reducing postharvest losses, thereby facilitating industrial upgrading.
    
    The rapid development of drone technology and mobile imaging systems has introduced unprecedented opportunities for agricultural applications\cite{Wan2025}. Drones enable real-time monitoring of crop growth through their efficient large-scale data acquisition capabilities, while smartphone cameras provide convenient complementary data sources through their ubiquity. The collaborative application of these technologies constructs a multi-source data framework for strawberry maturity assessment.
    \begin{figure*}[H]
        \centering
        \includegraphics[width=.9\textwidth]{tupian/1.png}
        \caption{Geospatial Overview Map of the Study Area}
        \label{FIG:1}
    \end{figure*}
    
    In recent years, deep learning-based object detection techniques \cite{Zhu2024} have emerged as transformative tools for agricultural visual perception. Current algorithms exhibit a bimodal paradigm: two-stage frameworks like Faster R-CNN \cite{Abimbola2025} achieve precise localization through Region Proposal Networks (RPN), while single-stage architectures such as the YOLO series \cite{Shao2021} enable real-time performance via end-to-end optimization. Wu Xing et al. \cite{Wu2020} introduced a lightweight YOLOv3 model for apple detection, achieving an F1 score of 94.57\% at 116.96 FPS, thereby validating the efficacy of model compression strategies in orchard environments. Li Yangde et al. \cite{Li2023} proposed a MobileNet V3-YOLOv4 based method for pineapple maturity detection during the growth period by replacing the backbone network of YOLOv4 with the lightweight MobileNet V3. Experimental results demonstrate that this approach achieves mean average precisions (mAP) of 87.62\% and 94.21\% for detecting pineapples at the yellow-ripe and green-ripe stages, respectively, revealing the synergistic potential of lightweight networks in agricultural target detection applications. Despite these advancements, three critical challenges persist: (1) inadequate maturity grading granularity: traditional convolutional kernels struggle to capture the subtle color gradient features distinguishing the immature, semi-mature, and mature stages; (2) gaps in environmental robustness: detection confidence degrades significantly under complex conditions (e.g., illumination variations, foliage occlusion); (3) decoupling from agronomic knowledge: current models lack mechanisms to integrate agricultural domain expertise, leading to feature confusion between disease spots and maturity indicators.
    
    The deeper technical bottleneck stems from a causal discontinuity at the methodological level. Although Granger causality tests have been widely applied in economic time series analysis \cite{Ren2021}, and Judea Pearl's causal inference theory has achieved groundbreaking progress in medical diagnostic decision-making \cite{Chen2024}, the agricultural vision field remains entrenched in the cognitive trap of ``correlation equals causation''. This path dependence leads directly to two critical flaws: first, models overfit superficial associations such as color and texture while ignoring causal chains such as variations in light intensity; second, they lack interpretability under data distribution shifts, for example misclassifying pigment deposition caused by fungal infection as a maturity indicator.
    
    To address these issues, this study constructs a "detection-evaluation-decision" full-chain causal evaluation system. Using drone and smartphone-captured images of strawberries at different developmental stages, we innovatively introduce the Average Causal Effect (ACE)\cite{Rubin2014} metric into the YOLOv11 evaluation framework. The experimental design covers four maturity levels of strawberries during the coloring period—unripe, semi-ripe, ripe, and overripe—validating the method's effectiveness on the 'Xiangye' and 'Hongyan' strawberry cultivars. The results demonstrate that the proposed framework achieves an inference speed of 52.36 FPS while improving four-level classification accuracy to 89.6\%, recall to 92.2\%, and mean average precision (mAP) to 95.6\%.
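    The ACE idea used here can be illustrated with a minimal sketch: treat a controlled input perturbation (e.g., an illumination change) as the intervention, and estimate ACE as the difference between the mean model output with and without it. The function and the toy model below are hypothetical illustrations, not the paper's implementation.

```python
import random

def average_causal_effect(model, images, perturb, n_samples=100):
    """Estimate ACE = E[score(perturb(x))] - E[score(x)] over a sample.
    `model` maps an input to a confidence score; `perturb` is the
    intervention (e.g. an illumination offset). Hypothetical interface."""
    batch = random.sample(images, min(n_samples, len(images)))
    baseline = sum(model(x) for x in batch) / len(batch)
    treated = sum(model(perturb(x)) for x in batch) / len(batch)
    return treated - baseline

# Toy usage: "images" are brightness values, the model peaks at mid
# brightness, and the intervention adds a fixed illumination offset.
images = [0.2, 0.4, 0.6, 0.8]
model = lambda x: 1.0 - abs(x - 0.5)
brighten = lambda x: x + 0.1
ace = average_causal_effect(model, images, brighten, n_samples=4)
```

A perturbation-robust detector keeps this quantity close to zero, which is how the abstract's per-model ACE fluctuation ranges should be read.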
    
    
    
    
    \section{Data Collection and Processing}
    \subsection{Data Collection}
    
    The data collection was conducted at the Mobile Fruit and Vegetable Industrial Park in Xiaowang Village, Nandali Township, Xia County, Yuncheng City, Shanxi Province, as illustrated in Figure~\ref{FIG:1}. The experimental samples included two strawberry cultivars: ‘Xiangye’ and ‘Hongyan’. Data acquisition was carried out from late November 2024 to late January 2025, with collections performed every three days to cover multiple growth stages of cultivated strawberries, ranging from unripe to overripe. During this period, a total of 7,885 images were acquired using a Huawei Pocket 2 (resolution: 4096×3072 pixels) and a Redmi K20 Pro (resolution: 4000×3000 pixels). To enhance dataset diversity and model generalization, the collected images included variations such as sunny/cloudy conditions, foliage occlusion, sparse or dense strawberry clusters (group distribution), overhead and ground perspectives, overexposure, and backlighting, as illustrated in Figure~\ref{FIG:2}. Initially, 5,168 images were gathered. Additionally, a DJI Mini SE drone was used to capture 150 images at a distance of 50 cm from the strawberries.
    
    
\begin{figure*}[H]
    \centering
    % Three-column layout: each column 0.31\textwidth, no outer padding
    \begin{tabular}{@{}m{0.31\textwidth}m{0.31\textwidth}m{0.31\textwidth}@{}} 
        % Row 1
        \begin{minipage}[c]{\linewidth}
            \centering
            \includegraphics[width=\linewidth, keepaspectratio]{tupian/2.png}
            \captionof{subfigure}{(a) Sunny Condition}
            \label{subfig:2a}
        \end{minipage} &
        \begin{minipage}[c]{\linewidth}
            \centering
            \includegraphics[width=\linewidth, keepaspectratio]{tupian/3.png}
            \captionof{subfigure}{(b) Cloudy Condition}
            \label{subfig:2b}
        \end{minipage} &
        \begin{minipage}[c]{\linewidth}
            \centering
            \includegraphics[width=\linewidth, keepaspectratio]{tupian/4.png}
            \captionof{subfigure}{(c) Foliage Occlusion}
            \label{subfig:2c}
        \end{minipage} \\[0.5em]
        
        % Row 2
        \begin{minipage}[c]{\linewidth}
            \centering
            \includegraphics[width=\linewidth, keepaspectratio]{tupian/5.png}
            \captionof{subfigure}{(d) Sparse Spatial Distribution}
            \label{subfig:2d}
        \end{minipage} &
        \begin{minipage}[c]{\linewidth}
            \centering
            \includegraphics[width=\linewidth, keepaspectratio]{tupian/6.png}
            \captionof{subfigure}{(e) Dense Spatial Distribution}
            \label{subfig:2e}
        \end{minipage} &
        \begin{minipage}[c]{\linewidth}
            \centering
            \includegraphics[width=\linewidth, keepaspectratio]{tupian/7.png}
            \captionof{subfigure}{(f) Overhead Perspective View}
            \label{subfig:2f}
        \end{minipage} \\[0.5em]
        
        % Row 3
        \begin{minipage}[c]{\linewidth}
            \centering
            \includegraphics[width=\linewidth, keepaspectratio]{tupian/8.png}
            \captionof{subfigure}{(g) Ground Perspective View}
            \label{subfig:2g}
        \end{minipage} &
        \begin{minipage}[c]{\linewidth}
            \centering
            \includegraphics[width=\linewidth, keepaspectratio]{tupian/9.png}
            \captionof{subfigure}{(h) Overexposure}
            \label{subfig:2h}
        \end{minipage} &
        \begin{minipage}[c]{\linewidth}
            \centering
            \includegraphics[width=\linewidth, keepaspectratio]{tupian/10.png}
            \captionof{subfigure}{(i) Backlighting}
            \label{subfig:2i}
        \end{minipage}
    \end{tabular}
    
    % Caption and label for the whole figure (written only once)
    \caption{Strawberry image samples under different acquisition conditions}
    \label{FIG:2}
\end{figure*}


\subsection{Data Processing}
All images were stored in JPEG format and annotated in YOLO format using LabelImg. Based on the grading criteria for strawberry sensory evaluation proposed by Ren Xiao et al. \cite{Ren2023}, this study classified strawberry maturity into four distinct stages, as illustrated in Table \ref{tbl1}. Sample images of strawberries at different maturity stages are illustrated in Figure \ref{FIG:3}.
The dataset was divided into training, validation, and test sets in an 8:1:1 ratio, consisting of 4,346, 536, and 536 images, respectively.
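The 8:1:1 split can be sketched as follows; the shuffling seed and the file-naming scheme are illustrative assumptions, and exact per-split counts depend on rounding.

```python
import random

def split_dataset(paths, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle image paths and partition them into train/val/test
    subsets by the given ratios (8:1:1 as in the paper)."""
    rng = random.Random(seed)
    shuffled = paths[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

paths = [f"img_{i:04d}.jpg" for i in range(5418)]  # 4,346 + 536 + 536 total
train, val, test = split_dataset(paths)
```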

\begin{table*}[H]
    \centering
    \caption{Annotation Types}\label{tbl1}
    \begin{tabular*}{0.9\textwidth}{@{\extracolsep{\fill}} llll @{}}
        \toprule
        Type       & Label    & Peel Color                     & Proportion of Red on the Fruit \\
        \midrule
        Unripe     & Unripe   & Almost entirely green or white & 0\%$\sim$30\%                   \\
        Semi-ripe  & Semi-ripe& A mix of red and white         & 30\%$\sim$60\%                  \\
        Ripe       & Ripe     & Almost entirely red            & 60\%$\sim$90\%                  \\
        Overripe   & Overripe & Entirely red                   & 90\%$\sim$100\%                 \\
        \bottomrule
    \end{tabular*}
\end{table*}
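The grading criteria of Table \ref{tbl1} amount to simple thresholding on the red-peel proportion; a minimal sketch follows (the handling of boundary values is our assumption):

```python
def maturity_label(red_fraction):
    """Map the proportion of red peel (0.0-1.0) to a maturity class,
    following the thresholds in Table 1. How boundary values and the
    90-100% range are assigned is an assumption for illustration."""
    if red_fraction < 0.30:
        return "Unripe"
    elif red_fraction < 0.60:
        return "Semi-ripe"
    elif red_fraction < 0.90:
        return "Ripe"
    else:
        return "Overripe"
```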

\begin{figure*}[H]
    \centering
    \begin{tabular}{@{}cccc@{}} % four columns, no extra spacing
        % Row 1
        \begin{minipage}[b]{0.23\textwidth}
            \centering
            \includegraphics[width=\linewidth]{tupian/fig311.png}
            \captionof{subfigure}{(a) Unripe}
            \label{subfig:10a}
        \end{minipage} &
        \begin{minipage}[b]{0.23\textwidth}
            \centering
            \includegraphics[width=\linewidth]{tupian/fig312.png}
            \captionof{subfigure}{(b) Semi-ripe}
            \label{subfig:10b}
        \end{minipage} &
        \begin{minipage}[b]{0.23\textwidth}
            \centering
            \includegraphics[width=\linewidth]{tupian/fig313.png}
            \captionof{subfigure}{(c) Ripe}
            \label{subfig:10c}
        \end{minipage} &
        \begin{minipage}[b]{0.23\textwidth}
            \centering
            \includegraphics[width=\linewidth]{tupian/fig314.png}
            \captionof{subfigure}{(d) Overripe}
            \label{subfig:10d}
        \end{minipage} 
        
        
    \end{tabular}
    \caption{Different maturity stages of strawberries}
    \label{FIG:3}
\end{figure*}


\section{Research Methodology}
\subsection{YOLOv11 Detection Model}
The YOLO series, as a single-stage object detection algorithm, has been widely adopted for real-time object detection tasks due to its high accuracy and computational efficiency\cite{Wang2023}. In YOLOv11, the backbone network incorporates C3k2 blocks, superseding the C2f blocks used in previous versions. The architectural design of C3k2 blocks significantly enhances computational efficiency, enabling YOLOv11 to perform accelerated feature extraction during image processing while achieving substantial performance gains in complex operational environments.
This study selects YOLOv11s as the baseline model for strawberry maturity detection. The network comprises three core components: the Backbone, Neck, and Head, as illustrated in Figure \ref{FIG:4}.

\subsection{Improved YOLOv11 Network}

\begin{figure*}[H]
    \centering
    \includegraphics[width=.9\textwidth]{tupian/fig415.png}
    \caption{Structure of the YOLOv11 Algorithm
    }
    \label{FIG:4}
\end{figure*}

To address the strawberry maturity grading task, this study proposed the high-precision B2G-YOLOv11-S detection framework. The model incorporated the HGNetv2 network \cite{Zhang2024} and introduced a customized HGNetv2-C network. By integrating the last three layers of HGNetv2-C and YOLOv11's backbone network through CBLinear and CBFuse modules, we constructed a dual-stream B2-Net architecture. To further enhance multi-scale feature fusion, inspired by DAMO-YOLO, we proposed a C3kGFPN structure that replaces the CSPStage module with C3k2 blocks, effectively reducing computational complexity. Furthermore, the C3kGFPN architecture introduced an SDI module, which enhances the model's discriminative capacity and its recognition of diverse targets through feature separation and deep interaction mechanisms. The improved model achieved a balance between lightweight design and improved detection accuracy, demonstrating its suitability for complex agricultural image recognition scenarios. The structure of B2G-YOLOv11-S is illustrated in Figure \ref{FIG:5}.

\begin{figure*}[H]
    \centering
    \includegraphics[width=.9\textwidth]{tupian/fig516.png}
    \caption{Structure Diagram of B2G-YOLOv11-S
    }
    \label{FIG:5}
\end{figure*}

\subsubsection{Improved YOLOv11 Network}
In strawberry maturity classification tasks, the deep convolutional modules of YOLOv11 extracted global features through progressive downsampling. However, this approach exhibited inherent limitations in capturing subtle features such as color gradients and epidermal texture variations of strawberries, particularly under challenging conditions including uneven illumination, severe occlusion, or significant background interference. Consequently, it led to frequent misclassification of leaves or soil as strawberry fruits. To address this issue, this study introduced the HGNetv2 network and proposed an HGNetv2-C architecture. The design replaced the original YOLO backbone with HGStem and HGBlock modules while retaining the C2PSA module and incorporating depthwise separable convolution. By decoupling spatial and channel feature extraction through depthwise and pointwise convolutions, the architecture significantly reduced computational complexity while maintaining feature representation capability \cite{Ping2025}. The structural design of the improved HGNetv2-C is illustrated in Figure \ref{FIG:6}.

\begin{figure*}[H]
    \centering
    \includegraphics[width=.9\textwidth]{tupian/fig617.png}
    \caption{Improved Structure of HGNetv2-C
    }
    \label{FIG:6}
\end{figure*}

The HGStem layer consists of multiple convolutional layers and one max-pooling layer, which is designed to extract initial low-level features from raw input data. This layer performs downsampling through convolutional operations to reduce spatial dimensions while expanding the channel count to an intermediate dimensionality. The input features are then processed through two parallel processing branches: one branch performs downsampling via a max-pooling layer, while the other sequentially passes through two convolutional layers with padding to maintain spatial dimensions. The outputs of both branches are concatenated along the channel dimension to form a richer feature representation. The concatenated features are further processed by a convolutional layer to produce the final output of the Stem layer. Through multi-branch feature extraction and concatenation, this design not only enhances the expressive capacity of initial features but also achieves computational efficiency through strategic downsampling, thereby establishing the foundation for subsequent hierarchical feature extraction.
The HGBlock module comprises multi-branch convolutional layers, feature concatenation layers, progressive convolutional layers, and channel-wise feature recalibration modules, enabling hierarchical feature extraction of data. Input features are independently processed by multiple parallel convolutional branches, each focusing on features of different scales (e.g., local textures, edges, or global structures) to capture both low-level and high-level abstract features. The output feature maps from each branch are concatenated and fused, then subjected to channel-wise dynamic recalibration via a "squeeze-and-excitation" module. This module compresses spatial information through global pooling and generates channel weights, adaptively enhancing critical feature channels (e.g., color channels for strawberries) while suppressing redundant ones.

This design allows HGBlock to achieve multi-level feature extraction and efficient feature utilization while maintaining a lightweight architecture. By incorporating depthwise separable convolution (DWConv), it significantly reduces computational overhead, striking a balance between performance and efficiency. The structure of the lightweight HGBlock module is illustrated in Figure \ref{FIG:7}.
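The squeeze-and-excitation recalibration inside HGBlock can be sketched in a few lines of NumPy; the bottleneck weights below are random placeholders standing in for the learned parameters.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Squeeze-and-excitation channel recalibration, as described for
    HGBlock: global-average-pool each channel ("squeeze"), pass the
    descriptors through a small two-layer bottleneck ("excitation"),
    and rescale channels by the resulting sigmoid weights."""
    squeezed = x.mean(axis=(1, 2))                 # (C,) channel descriptors
    hidden = np.maximum(w1 @ squeezed, 0.0)        # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid channel weights
    return x * gates[:, None, None]                # reweight each channel

# Toy usage: an 8-channel map with a reduction from 8 to 2 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w1 = rng.standard_normal((2, 8))                   # squeeze 8 -> 2
w2 = rng.standard_normal((8, 2))                   # excite 2 -> 8
y = squeeze_excite(x, w1, w2)
```

Because every gate lies in (0, 1), the module can only attenuate channels, adaptively emphasizing informative ones (e.g., strawberry color channels) relative to the rest.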

\begin{figure*}[H]
    \centering
    \includegraphics[width=.9\textwidth]{tupian/fig718.png}
    \caption{Structure Diagram of the Lightweight HGBlock Module
    }
    \label{FIG:7}
\end{figure*}

\subsubsection{B2-Net Network}
Although the HGNetv2-C network architecture enhances local feature extraction capabilities, a single-backbone network still has inherent limitations. The lightweight design results in inadequate global context modeling capability and suboptimal feature fusion efficiency when integrated with YOLO modules, which consequently results in a weak capability of capturing correlations between strawberry fruits and their surrounding environments, including leaves and stems. This reduces the detection capability for partially occluded or densely arranged fruits.

To mitigate this issue, we proposed the B2-Net dual-stream network. The HGNetv2-C branch is responsible for extracting local features of strawberries, such as color and texture, while the YOLO branch captures global contextual information, including arrangement and occlusion, thereby achieving complementary feature extraction. A hierarchical fusion strategy\cite{Sun2024} was employed: the first two layers were predominantly processed by the HGNetv2-C branch to prevent premature introduction of computational overhead from the deep modules of YOLO and to maintain sensitivity to details; the last three layers incorporated the CBLinear module to augment linear feature representation capability and employed the CBFuse module to achieve the fusion of dual-stream features, significantly improving the detection capability for strawberry fruits in complex environments. The structural diagram of the B2-Net dual-stream network is depicted in Figure \ref{FIG:8}.

\begin{figure*}[H]
    \centering
    \includegraphics[width=.9\textwidth]{tupian/fig819.png}
    \caption{Structure Diagram of the B2-Net Dual-Stream Network
    }
    \label{FIG:8}
\end{figure*}

The Silence module acts as a placeholder within the model: a no-operation layer that leaves the feature flow unmodified.

The CBLinear module partitions the features of stage 2 into three branches with channel dimensions [64, 128, 256], stage 3 into four branches with channel dimensions [64, 128, 256, 512], and stage 4 into five branches with channel dimensions [64, 128, 256, 512, 1024] using 1×1 convolutions. This facilitates the standardization of the input format for dynamic fusion with deep-level features.
The CBFuse module adopts the resolution of the last feature as the reference resolution. It adjusts features with varying resolutions (derived from CBLinear outputs or downsampling layers) to a unified resolution through nearest-neighbor interpolation, and subsequently performs element-wise summation fusion to enhance multi-scale contextual information.
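The CBFuse behaviour described above, resizing every input to the reference (last) feature map's resolution via nearest-neighbor interpolation and summing element-wise, can be sketched in NumPy; the function names are illustrative, not the actual module code.

```python
import numpy as np

def nearest_resize(feat, out_h, out_w):
    """Nearest-neighbor interpolation of a (C, H, W) feature map."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows[:, None], cols[None, :]]

def cb_fuse(features):
    """CBFuse-style fusion sketch: take the last feature map's resolution
    as the reference, resize every input to it, and sum element-wise."""
    _, ref_h, ref_w = features[-1].shape
    resized = [nearest_resize(f, ref_h, ref_w) for f in features]
    return np.sum(resized, axis=0)

# Toy usage: fuse an 8x8 map with the 4x4 reference map.
a = np.ones((64, 8, 8))
b = np.full((64, 4, 4), 2.0)
fused = cb_fuse([a, b])  # shape (64, 4, 4)
```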
\subsubsection{C3kGFPN}
Based on the B2-Net network, the original neck of YOLOv11 demonstrates certain limitations in strawberry maturity detection tasks. During multi-scale feature fusion \cite{He2025}, the original neck fails to fully exploit feature information from different levels, resulting in a degree of information loss during feature propagation. This makes it challenging for the model to accurately discriminate strawberries with similar maturity levels. To mitigate this issue, we integrated the neck design from DAMO-YOLO.
The neck of DAMO-YOLO employs a reparameterized generalized feature pyramid network (RepGFPN) architecture \cite{Zhao2024}, as illustrated in Figure \ref{FIG:9}. It attains a balance between computational efficiency and feature representation capability by leveraging multi-scale feature fusion and structural reparameterization techniques. The CSPStage module, the core component of RepGFPN, improves the efficiency of multi-scale feature fusion by combining feature reuse and gradient optimization mechanisms through a cross-stage partial (CSP) connection structure.

\begin{figure*}[H]
    \centering
    \includegraphics[width=.9\textwidth]{tupian/fig920.png}
    \caption{Structure Diagram of the C3kGFPN Network
    }
    \label{FIG:9}
\end{figure*}

In comparison with the C3k2 module employed in YOLOv11, the CSPStage module within the original neck structure of DAMO-YOLO exhibits higher computational complexity, thereby constraining its deployability in resource-constrained scenarios. In an effort to streamline the model, we substituted the CSPStage module with the C3k2 module. The C3k2 module incorporates multi-scale convolutional branches and channel separation techniques, effectively extracting multi-scale features whilst concurrently reducing computational complexity. The resulting improved architecture is designated as C3kGFPN, and its structure is illustrated in Figure \ref{FIG:10}.
\begin{figure*}[H]
    \centering
    \includegraphics[width=.9\textwidth]{tupian/fig1021.png}
    \caption{Architecture Diagram of the C3kGFPN Network
    }
    \label{FIG:10}
\end{figure*}

\subsubsection{C3kGFPN-S Network}
Within the C3kGFPN neck architecture, an SDI (Split-Depthwise Interaction) module is introduced. Its feature separation and depthwise interaction mechanism enables comprehensive mining of key information from the input features, allowing the model to acquire more representative and discriminative features and significantly enhancing its ability to capture features from complex scenes and diverse targets. The resulting improved architecture is designated C3kGFPN-S, and its structure is illustrated in Figure \ref{FIG:11}.
\begin{figure*}[H]
    \centering
    \includegraphics[width=.9\textwidth]{tupian/fig1122.png}
    \caption{Architecture Diagram of the C3kGFPN-S Network
    }
    \label{FIG:11}
\end{figure*}

The SDI module comprises a feature separation layer, a depthwise interaction submodule, and a feature fusion layer. The feature separation layer partitions the input features into several groups evenly via channel grouping. The depthwise interaction submodule incorporates multiple depthwise separable convolution layers, each of which is composed of a depthwise convolution and a pointwise convolution. The depthwise convolution applies convolution kernels of varying sizes to each subgroup to perform local feature extraction \cite{Hu2024}. The pointwise convolution employs 1×1 convolutions to fuse features across different subgroups, thereby facilitating inter-channel information interaction. Activation functions are inserted between the depthwise separable convolution layers to augment the module's nonlinear representation capability. Upon completion of the depthwise interaction, the feature fusion layer recombines all subgroups into a comprehensive feature map. The structure is illustrated in Figure \ref{FIG:12}.
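The SDI pipeline described above (channel grouping, per-channel 3×3 depthwise convolution, 1×1 pointwise mixing for cross-group interaction, activation, and re-concatenation) can be sketched in NumPy. The weights are random placeholders, since the learned parameters are not part of the paper.

```python
import numpy as np

def sdi_block(x, groups=4, seed=0):
    """Sketch of the SDI idea on a (C, H, W) map: depthwise 3x3 conv
    (one kernel per channel), pointwise 1x1 conv mixing all channels
    (cross-group interaction), ReLU, then regroup and re-concatenate."""
    rng = np.random.default_rng(seed)
    c, h, w = x.shape
    assert c % groups == 0, "channels must divide evenly into groups"
    gc = c // groups
    # depthwise 3x3 convolution with zero padding (local feature extraction)
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    kernels = rng.standard_normal((c, 3, 3))
    dw = np.zeros((c, h, w))
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                dw[ch, i, j] = np.sum(padded[ch, i:i + 3, j:j + 3] * kernels[ch])
    # pointwise 1x1 convolution: inter-channel (cross-group) information flow
    pw_weight = rng.standard_normal((c, c))
    mixed = np.einsum("oc,chw->ohw", pw_weight, dw)
    # feature fusion layer: split into groups, activate, re-concatenate
    parts = [np.maximum(mixed[g * gc:(g + 1) * gc], 0.0) for g in range(groups)]
    return np.concatenate(parts, axis=0)
```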

\begin{figure*}[H]
    \centering
    \includegraphics[width=.9\textwidth]{tupian/fig1223.png}
    \caption{Architecture Diagram of the SDI Network
    }
    \label{FIG:12}
\end{figure*}

\section{Experimental Results and Analysis}
\subsection{Experimental Environment and Parameter Settings}

To guarantee fairness in the comparison of experimental results, all experiments were conducted on a single computer with an identical hardware configuration. The configuration of the experimental environment is presented in Table \ref{tbl2}. The training parameters were set as follows: an initial learning rate of 0.01, an SGD optimizer with a momentum of 0.937, a weight decay coefficient of 0.0005, a batch size of 12, an input resolution of 640×640 pixels, 4 worker threads for data loading, and a total of 200 training epochs.
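For reference, these hyperparameters map onto a standard Ultralytics training call roughly as follows; this is a sketch, the dataset YAML and weights file names are placeholders, and the paper's actual training script is not given.

```python
from ultralytics import YOLO  # assumes the Ultralytics package is installed

model = YOLO("yolo11s.pt")    # YOLOv11s baseline weights
model.train(
    data="strawberry.yaml",   # hypothetical dataset config file
    epochs=200,
    imgsz=640,
    batch=12,
    optimizer="SGD",
    lr0=0.01,                 # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,
    workers=4,                # data-loading worker threads
)
```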

\begin{table*}[H]
    \centering
    \caption{Annotation Types}\label{tbl2}
    \begin{tabular*}{0.9\textwidth}{@{\extracolsep{\fill}} llll @{}}
        \toprule
        Type       & Label    & Peel Color                     & Proportion of Red on the Fruit \\
        \midrule
        Unripe     & Unripe   & Almost entirely green or white & 0\%$\sim$30\%                   \\
        Semi-ripe  & Semi-ripe& A mix of red and white         & 30\%$\sim$60\%                  \\
        Ripe       & Ripe     & Almost entirely red            & 60\%$\sim$90\%                  \\
        Overripe   & Overripe & Entirely red                   & 100\%                           \\
        \bottomrule
    \end{tabular*}
\end{table*}

\subsection{Evaluation Metrics}
To assess the performance of the strawberry maturity detection model, this study adopts mean Average Precision (mAP) as the primary evaluation metric, supplemented by Recall (R) and Precision (P) to comprehensively reflect the model's detection capability \cite{Dong2025}. Additionally, the algorithm evaluation includes two key performance indicators: computational complexity in Giga Floating-Point Operations (GFLOPs) and inference speed in Frames Per Second (FPS).
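Precision, Recall, and F1 follow directly from true-positive, false-positive, and false-negative counts at a given IoU threshold; mAP then averages the per-class AP. The counts below are illustrative values only, chosen to show the arithmetic:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, Recall, and F1 from detection counts. A prediction counts
    as a true positive when its IoU with a ground-truth box exceeds the
    chosen threshold (0.5 for mAP50); mAP50-95 averages AP over IoU
    thresholds from 0.5 to 0.95."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

# Illustrative counts only (not taken from the paper's experiments):
p, r, f1 = precision_recall_f1(tp=896, fp=104, fn=76)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.896 0.922 0.909
```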

\subsection{Quantitative Evaluation Results}
To thoroughly evaluate the performance of the enhanced YOLOv11 algorithm in high-altitude strawberry maturity detection tasks, this study employs a test set comprising 5,318 complex-scene images for quantitative assessment. As illustrated in Table \ref{tbl3}, the quantitative assessment results show notable enhancements in comprehensive performance metrics: the model achieved a mean Average Precision (mAP) of 82.9\% over IoU thresholds ranging from 50\% to 95\%, and the mAP further improved to 95.6\% at an IoU threshold of 50\%. In terms of key object detection metrics, Precision reached 89.6\% and Recall achieved 92.2\%, fully demonstrating the improved algorithm's excellent detection performance and practical application value in complex agricultural scenarios.

\begin{table*}[H]
    \centering
    \caption{Quantitative Evaluation Results}\label{tbl3}
    \begin{tabular*}{0.9\textwidth}{@{\extracolsep{\fill}} c c c c c c c @{}}
        \toprule
        Maturity Level Classification & P     & R     & mAP$_{50}$ & mAP$_{50-95}$ & FPS   & GFLOPs \\
        \midrule
        all                           & 0.896 & 0.922 & 0.956      & 0.829         & \multirow{4}{*}{52.36} & \multirow{4}{*}{13.8} \\
        Unripe                        & 0.879 & 0.907 & 0.938      & 0.778         &                      &                      \\
        Semi-ripe                     & 0.91  & 0.918 & 0.954      & 0.829         &                      &                      \\
        Overripe                      & 0.936 & 0.923 & 0.972      & 0.867         &                      &                      \\
        \bottomrule
    \end{tabular*}
\end{table*}

\subsection{Ablation Experiments}
Ablation experiments were carried out to assess the impact of the proposed model on the original YOLOv11 architecture. By employing the control-variable method and maintaining consistent training configurations, each module was sequentially removed and subsequently comparatively analyzed to quantitatively evaluate the effectiveness of individual architectural innovations. The ablation study centered on four key aspects: 1) A performance comparison between HGNetv2-C (Backbone 1), which boasts efficient feature extraction capability, and traditional backbones; 2) Verification of the efficacy of the dual-stream information fusion mechanism in B2-Net (Backbone 2); 3) An analysis of the cross-scale feature interaction mechanism within C3kGFPN (Neck 1); 4) The optimization outcomes of the spatial-context and channel-feature enhancement strategies employed in C3kGFPN-S (Neck 2). All quantitative results were derived under standardized evaluation metrics and identical experimental settings. The detailed performance metrics are provided in Table \ref{tbl4}. The analysis of modular contributions unveils both the synergistic mechanisms and the independent effectiveness of each component within the proposed detection framework.

\begin{table*}[H]
    \centering
    \caption{Ablation Experiments}\label{tbl4}
    \begin{tabular*}{1\textwidth}{@{\extracolsep{\fill}} c c c c c c c c c c @{}}
        \toprule
        \multicolumn{2}{c}{Backbone} & \multicolumn{2}{c}{Neck} & P     & R     & mAP$_{50}$ & mAP$_{50-95}$ & FPS   & GFLOPs \\
        \cmidrule(lr){1-2} \cmidrule(lr){3-4}
        1 & 2 & 1 & 2 &       &       &       &         &       &        \\
        \midrule
        &   &   &   & 0.824 & 0.839 & 0.893 & 0.772   & 88.54 & 6.3    \\
        $\surd$ &   &   &   & 0.833 & 0.881 & 0.911 & 0.787   & 67.3  & 6.9    \\
        $\surd$ & $\surd$ &   &   & 0.866 & 0.871 & 0.923 & 0.799   & 58.64 & 9.4    \\
        &   & $\surd$ &   & 0.84  & 0.871 & 0.921 & 0.797   & 79.96 & 6.7    \\
        &   & $\surd$ & $\surd$ & 0.871 & 0.897 & 0.917 & 0.794   & 78.66 & 10.6   \\
        $\surd$ &   & $\surd$ &   & 0.842 & 0.896 & 0.922 & 0.795   & 77.32 & 5.8    \\
        $\surd$ &   & $\surd$ & $\surd$ & 0.873 & 0.876 & 0.921 & 0.794   & 79.54 & 9.8    \\
        $\surd$ & $\surd$ & $\surd$ &   & 0.884 & 0.913 & 0.94  & 0.816   & 55.68 & 10.0   \\
        $\surd$ & $\surd$ & $\surd$ & $\surd$ & 0.896 & 0.922 & 0.956 & 0.829   & 52.36 & 13.8   \\
        \bottomrule
    \end{tabular*}
\end{table*}


The results of the ablation experiments indicate that the complete model, incorporating a dual-stream backbone network and an efficient feature pyramid, achieves optimal performance across all evaluation metrics. The model reaches an mAP$_{50}$ of 95.6\%, a 6.3\% increase over the baseline model, while maintaining high detection accuracy. Although this design raises computational complexity, its advantages in feature extraction, multi-scale fusion, and contextual information utilization lead to a significant enhancement of the model's detection performance in complex scenarios. The experiments comprehensively validate the efficacy of the synergistic collaboration between the dual-stream backbone network and the efficient feature pyramid, providing important references for designing high-performance object detection models.

\subsection{Performance Comparison}
\begin{figure*}[H]
    \centering
    \begin{tabular}{@{}c@{}}
        % Row 1 (Simple category: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{
                \begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small Simple
            \end{minipage}} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/24.png}
            \end{minipage} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/25.png}
            \end{minipage} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/26.png}
            \end{minipage} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/27.png}
            \end{minipage} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/28.png}
            \end{minipage} \\[0.5em]
            
        \end{tabular} \\[1em]
        
        % Row 2 (Occlusion category: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small Occlusion
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/29.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/30.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/31.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/32.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/33.png}
            \end{minipage}\\[0.5em]
            
        \end{tabular} \\[1em]
        
        
        % Row 3 (Complex Environment category: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small Complex Environment
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/34.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/35.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/36.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/37.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/38.png}
            \end{minipage} \\[0.5em]
            
        \end{tabular} \\[1em]
        
        % Row 4 (Multiple Fruits category: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small Multiple Fruits
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/39.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/40.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/41.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/42.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/43.png}
            \end{minipage} \\[0.5em]
            
        \end{tabular} \\[1em]
        
        % Row 5 (Long-Range category: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small Long-Range
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/44.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/45.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/46.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/47.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/48.png}
            \end{minipage}\\[0.5em]
            
        \end{tabular} \\[1em]
        
        % Row 6 (UAV category: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small UAV
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/49.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/50.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/51.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/52.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/53.png}
            \end{minipage}\\[0.5em]
            & \multicolumn{1}{c}{\small Original Image} & \multicolumn{1}{c}{\small YOLOv5} & \multicolumn{1}{c}{\small YOLOv8} & \multicolumn{1}{c}{\small YOLOv11} & \multicolumn{1}{c}{\small B2G-YOLOv11-S} \\
        \end{tabular} \\[1em]
    \end{tabular}
    
    \caption{ Comparison of Detection Performance Across Different Models}
    \label{FIG:13}
\end{figure*}
In the evaluation of object detection performance in complex agricultural scenarios, the improved YOLOv11 architecture demonstrates significant advantages, as illustrated in Figure \ref{FIG:13}. For densely occluded scenes, when the target fruit overlap rate exceeds 65\%, the model can still accurately identify strawberries that are doubly occluded in the lower-left region of the image, whereas mainstream detectors suffer from missed detections under such extreme conditions.

In complex environmental interference tests, when strawberries grown at elevated positions exhibit texture features resembling rot, YOLOv5 and YOLOv8 show significant semantic confusion~\cite{Ji2025}. Specifically, traditional models misclassify decayed areas, whose spectral features are similar, as overripe fruits. Although the baseline YOLOv11 model avoids such cross-category misclassification, it reveals another flaw in environmental adaptation: it incorrectly classifies green leaf tissue within the complex background of cultivation frames as unripe fruit. In contrast, the B2G-YOLOv11-S model successfully resolves both types of typical misclassification.

In scenarios with dense arrangements of multiple fruits, when the fruit spacing is less than 15\% of the fruit diameter, only the improved algorithm successfully identifies the position of the Semi-ripe strawberry located in the upper-left region, which is partially occluded. For long-range small-object detection, when the pixel area of the fruit accounts for less than 0.32\% of the total image area, the proposed model remains capable of accurately discriminating the maturity of the fruit, achieving a high mAP.


\begin{figure*}[H]
    \centering
    \begin{tabular}{@{}c@{}}
        % Row 1 (Simple category: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{
                \begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small Simple
            \end{minipage}} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/54.png}
            \end{minipage} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/55.png}
            \end{minipage} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/56.png}
            \end{minipage} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/57.png}
            \end{minipage} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/58.png}
            \end{minipage} \\[0.5em]
            
        \end{tabular} \\[1em]
        
        % Row 2 (Occlusion category: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small Occlusion
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/59.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/60.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/61.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/62.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/63.png}
            \end{minipage}\\[0.5em]
            
        \end{tabular} \\[1em]
        
        
        % Row 3 (Complex Environment category: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small Complex Environment
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/64.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/65.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/66.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/67.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/68.png}
            \end{minipage} \\[0.5em]
            
        \end{tabular} \\[1em]
        
        % Row 4 (Multiple Fruits category: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small Multiple Fruits
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/69.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/70.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/71.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/72.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/73.png}
            \end{minipage} \\[0.5em]
            
        \end{tabular} \\[1em]
        
        % Row 5 (Long-Range category: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small Long-Range
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/74.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/75.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/76.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/77.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/78.png}
            \end{minipage}\\[0.5em]
            
        \end{tabular} \\[1em]
        
        % Row 6 (UAV category: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small UAV
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/79.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/80.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/81.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/82.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/83.png}
            \end{minipage}\\[0.5em]
            & \multicolumn{1}{c}{\small Original Image} & \multicolumn{1}{c}{\small YOLOv5} & \multicolumn{1}{c}{\small YOLOv8} & \multicolumn{1}{c}{\small YOLOv11} & \multicolumn{1}{c}{\small B2G-YOLOv11-S} \\
        \end{tabular} \\[1em]
    \end{tabular}
    
    \caption{Comparison of Heatmaps Across Different Models}
    \label{FIG:14}
\end{figure*}
\begin{figure*}[H]
    \centering
    \begin{tabular}{@{}c@{}}
        % Row 1 (0.5x illumination: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{
                \begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small 0.5× Illumination
            \end{minipage}} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/84.png}
            \end{minipage} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/85.png}
            \end{minipage} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/86.png}
            \end{minipage} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/87.png}
            \end{minipage} &
            
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/88.png}
            \end{minipage} \\[0.5em]
            
        \end{tabular} \\[1em]
        
        % Row 2 (0.75x illumination: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small 0.75× Illumination
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/89.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/90.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/91.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/92.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/93.png}
            \end{minipage}\\[0.5em]
            
        \end{tabular} \\[1em]
        
        
        % Row 3 (1.0x illumination: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small 1.0× Illumination
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/94.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/95.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/96.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/97.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/98.png}
            \end{minipage} \\[0.5em]
            
        \end{tabular} \\[1em]
        
        % Row 4 (1.25x illumination: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small 1.25× Illumination
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/99.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/100.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/101.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/102.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/103.png}
            \end{minipage} \\[0.5em]
            
        \end{tabular} \\[1em]
        
        % Row 5 (1.5x illumination: row label + 5 subimages)
        \begin{tabular}{@{}c@{\hspace{0.5em}}ccccc@{}}
            \multirow{2}{*}{\begin{minipage}[b]{0.1\textwidth}
                    \centering
                    \small 1.5× Illumination
            \end{minipage}} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/104.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/105.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/106.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/107.png}
            \end{minipage} &
            \begin{minipage}[b]{0.15\textwidth}
                \centering
                \includegraphics[width=\linewidth]{tupian/108.png}
            \end{minipage}\\[0.5em]
            & \multicolumn{1}{c}{\small Original Image} & \multicolumn{1}{c}{\small YOLOv5} & \multicolumn{1}{c}{\small YOLOv8} & \multicolumn{1}{c}{\small YOLOv11} & \multicolumn{1}{c}{\small B2G-YOLOv11-S} \\
        \end{tabular} \\[1em]
    \end{tabular}
    
    \caption{Detection Results Under Simulated Lighting Conditions}
    \label{FIG:15}
\end{figure*}

\begin{figure*}[H]
    \centering
    \includegraphics[width=.9\textwidth]{tupian/109.png}
    \caption{Detection Results of the ACE Model
    }
    \label{FIG:16}
\end{figure*}
\begin{table*}[H]
    \centering
    \caption{Performance Metrics Comparison Across Different Models}\label{tbl5}
    \begin{tabular*}{0.9\textwidth}{@{\extracolsep{\fill}} l c c c c c @{}}
        \toprule
        \textbf{Model} & \textbf{P} & \textbf{R} & \textbf{mAP$_{50}$} & \textbf{mAP$_{50-95}$} & \textbf{F1-score} \\
        \midrule
        YOLOv5         & 0.819      & 0.844      & 0.899          & 0.77              & 0.831             \\
        YOLOv8         & 0.787      & 0.884      & 0.905          & 0.78              & 0.832             \\
        YOLOv11        & 0.824      & 0.839      & 0.893          & 0.772             & 0.831             \\
        B2G-YOLOv11-S  & 0.896      & 0.922      & 0.956          & 0.829             & 0.909             \\
        \bottomrule
    \end{tabular*}
\end{table*}

Particularly in UAV aerial verification scenarios, the improved model accurately analyzes subtle phenotypic changes during fruit expansion despite challenges such as inter-object occlusion and uneven lighting in elevated cultivation. It achieves comprehensive detection of strawberry targets throughout their entire growth cycle, avoiding the typical error of traditional models, which misclassify this growth stage as unripe fruit.

A comparison of the heatmaps from the processing layers preceding the P3 detection head across multiple models, as illustrated in Figure \ref{FIG:14}, shows that the improved model offers significant advantages in complex environments. When the background contains many interfering objects, such as branches, leaves, and soil with colors similar to those of strawberries, the heatmap exhibits high activation in the key feature areas of strawberries. Bright, concentrated activation points appear in key feature regions such as the area around the strawberry calyx where color changes occur and the region reflecting the color uniformity of the fruit surface. This indicates that the model can penetrate complex backgrounds, accurately lock onto the key features of strawberries, and provide a reliable basis for maturity grading.


\subsection{Causal Analysis Comparison}

In this study, by introducing the ACE metric (Average Causal Effect)\cite{Rubin2014}, the significant advantages of the B2G-YOLOv11-S framework in strawberry maturity grading were rigorously verified through causal reasoning. The ACE metric, defined as the expected difference in predictive outcomes under counterfactual interventions (1), is mathematically formulated as:
\begin{equation}
    \mathrm{ACE} = \mathrm{E}[Y(1) - Y(0)]
\end{equation}
where $Y(1)$ denotes the maturity judgment obtained when phenotypic color features are observed under perturbed illumination intensities (0.5×, 0.75×, 1.25×, and 1.5× the initial intensity), and $Y(0)$ is the baseline outcome under standard lighting. This formulation quantifies the direct causal contribution of color representation to maturity prediction, going beyond the purely correlational nature of traditional metrics such as the F1-score.

The validation set underwent illumination intensity perturbations at 0.5×, 0.75×, 1.25×, and 1.5× the initial intensity, as illustrated in Figure \ref{FIG:15}, with results obtained from the trained model, as illustrated in Figure \ref{FIG:16}. Compared with the baseline models YOLOv5, YOLOv8, and YOLOv11, the proposed B2G-YOLOv11-S exhibited remarkably stable ACE values. Specifically, under illumination perturbation, the mean absolute percentage change of ACE remained within ±0.5\%, whereas YOLOv8 and YOLOv11 fluctuated by more than ±1.5\% and ±1.3\%, respectively. This result confirms that B2G-YOLOv11-S strengthens the causal relationship between phenotypic features and maturity judgment by remaining robust to illumination variation.
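The perturbation-and-ACE procedure can be sketched as follows. The toy grader below (a redness ratio) and the scalar maturity score are illustrative assumptions; in the actual experiments the trained B2G-YOLOv11-S model supplies the predictions:

```python
import numpy as np

SCALES = (0.5, 0.75, 1.25, 1.5)   # illumination multipliers used in the paper

def perturb_illumination(img: np.ndarray, scale: float) -> np.ndarray:
    """Scale pixel intensities and clip back into the valid [0, 255] range."""
    return np.clip(img.astype(np.float32) * scale, 0, 255).astype(np.uint8)

def ace(predict, images) -> dict:
    """ACE = E[Y(1) - Y(0)]: mean shift in predicted maturity per light scale.

    `predict` stands in for the trained grader; it maps one image to a
    scalar maturity score (an assumption made for this sketch).
    """
    y0 = np.mean([predict(im) for im in images])            # baseline Y(0)
    return {s: float(np.mean([predict(perturb_illumination(im, s))
                              for im in images]) - y0)
            for s in SCALES}

# Toy grader: mean red-channel ratio, purely illustrative.
toy_predict = lambda im: im[..., 0].mean() / 255.0
imgs = [np.full((8, 8, 3), 128, dtype=np.uint8)]
effects = ace(toy_predict, imgs)
print({s: round(v, 3) for s, v in effects.items()})
```

A robust model is one whose ACE values stay near zero across all four scales, which is exactly the stability property reported for B2G-YOLOv11-S above.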


\subsection{Different Models Comparison}
In this study, the detection performance of YOLOv5, YOLOv8, YOLOv11, and B2G-YOLOv11-S was compared on the strawberry maturity detection task. As shown in Table \ref{tbl5}, B2G-YOLOv11-S achieves the best results across all core metrics, including precision, recall, mAP50, mAP50-95, and F1-score, significantly outperforming the other models. Its dual-stream guided architecture enables bidirectional information fusion across the feature pyramid, effectively improving the accuracy of strawberry maturity detection. B2G-YOLOv11-S thus offers clear advantages in both precision and efficiency for this task.
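The bidirectional fusion idea can be illustrated with a minimal sketch. The element-wise sum and the nearest-neighbour/average-pool resampling below are simplifying assumptions; the actual C3kGFPN and SDI modules use learned convolutions for the same cross-scale exchange:

```python
import numpy as np

def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling of a (C, H, W) feature map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def bidirectional_fuse(p_low: np.ndarray, p_high: np.ndarray):
    """Fuse two adjacent pyramid levels in both directions at once."""
    low_out = p_low + upsample2x(p_high)     # top-down: semantics into detail
    high_out = p_high + downsample2x(p_low)  # bottom-up: detail into semantics
    return low_out, high_out

p3 = np.ones((8, 16, 16), dtype=np.float32)  # higher-resolution level
p4 = np.ones((8, 8, 8), dtype=np.float32)    # lower-resolution level
lo, hi = bidirectional_fuse(p3, p4)
print(lo.shape, hi.shape)  # each level keeps its own resolution after fusion
```

The key property shown here is that information flows in both directions while each pyramid level retains its own spatial resolution, which is what lets the detection heads at every scale benefit from the exchange.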

\section{Conclusion}

This study presents an innovative approach to the imbalance between accuracy and speed, the limited environmental adaptability, and the lack of causal interpretability in strawberry maturity detection. First, building on an improved YOLOv11, we integrate its backbone with the HGNetv2 network into the B2-Net dual-stream architecture via the CBLinear and CBFuse modules. Drawing on the RepGFPN concept, we propose the C3kGFPN architecture and incorporate the SDI module, which significantly improves the accuracy of strawberry maturity detection. Second, the study introduces the Average Causal Effect (ACE) as a novel evaluation metric: we construct the causal reasoning chain "color representation → light intensity → maturity judgment", offering a fresh perspective for assessing model performance and enhancing both the environmental adaptability and the decision reliability of the model. Finally, by fusing image data from drones and smartphones, the multi-source imaging pipeline supplies rich and precise visual information, helping the model maintain high detection accuracy and robustness under complex environmental conditions.

\appendix

%% Loading bibliography style file
%\bibliographystyle{model1-num-names}
\bibliographystyle{cas-model2-names}

\printcredits


\section{Declaration of Generative AI and AI-assisted technologies in the writing process}
During the preparation of this work the authors used ChatGPT to improve language and readability. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.


\section{Declaration of competing interest}
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

\section{Acknowledgements}
This study received support from the Smart Agricultural Machinery and Rural Governance Project funded by the Key Research and Development Program of Shanxi Province (No. 202202140601021), as well as the Science and Technology Innovation Enhancement Project under the Scientific Research Initiatives of Shanxi Agricultural University (No. CXGC2025057). Additionally, it was bolstered by the Postgraduate Research Innovation Project within the Provincial Postgraduate Education Innovation Plan at Shanxi Agricultural University (No. 2024KY322). The authors extend their heartfelt appreciation for the support provided by these projects.

\section{Data availability}
Data will be made available on request.

% Loading bibliography database
\bibliography{cas-refs}

%\vskip3pt


\end{document}
