[BSPC 2025] A novel framework for enhancing postpartum hemorrhage prediction: Balanced data preproces- (优快云 Blog)

③Next, non-textual features are encoded to handle their differing dimensions: binning for skewed or non-linearly correlated continuous variables, and one-hot encoding for categorical features

④Meanwhile, they define a key-factor symbol (where does this come from? It is just the solid purple table above; how am I supposed to know whether a factor is important?)

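skewed adj. slanted; biased; leaning toward (or weighted toward) something; distorted; crooked; inaccurate; partial. v. to slant; to deviate; to distort; to misinterpret; to affect the accuracy of; to make unfair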
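To make the two encodings in step ③ concrete, here is a minimal NumPy sketch. It assumes equal-frequency (quantile) binning, which is only one possible binning scheme; the notes do not say which one the paper uses. The function names `quantile_bin` and `one_hot` and the toy values are hypothetical.

```python
import numpy as np

def quantile_bin(values, n_bins=4):
    """Map a continuous feature to integer bin indices using quantile edges.

    Assumption: equal-frequency bins, which cope reasonably with skewed data.
    """
    values = np.asarray(values, dtype=float)
    # Interior quantile edges (e.g. the 25th, 50th and 75th percentiles for 4 bins).
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges, right=True)

def one_hot(categories):
    """One-hot encode a 1-D array of category labels."""
    levels, codes = np.unique(np.asarray(categories), return_inverse=True)
    return levels, np.eye(len(levels), dtype=int)[codes]

# Made-up, right-skewed continuous feature and a made-up categorical feature.
skewed_feature = [50, 60, 70, 80, 90, 100, 400, 900]
bins = quantile_bin(skewed_feature, n_bins=4)
levels, encoded = one_hot(["b", "a", "b"])
```

With quantile edges, the two extreme values share a bin with nothing else, so the long tail no longer dominates the feature scale.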

2.4.2. Data balancing

①Two-step oversampling:

where LCFs are lowly correlated features and PFs are promising features

（1）Feature classification

①The $i$-th sample in the dataset $D=\{s_{1},s_{2},\ldots,s_{N}\}$ has $p$ features and a label $y_i$:

$s_i=(x_{i1},x_{i2},\ldots,x_{ip},y_i)$

②The relationship between each feature and the label is measured by the Pearson correlation coefficient:

$r_j=\frac{\sum_{i=1}^N(x_{ij}-\bar{x}_j)(y_i-\bar{y})}{\sqrt{\sum_{i=1}^N\left(x_{ij}-\bar{x}_j\right)^2}\sqrt{\sum_{i=1}^N\left(y_i-\bar{y}\right)^2}}$

(Can they look at the labels directly?? Is this just a statistic?)

③A t-test is applied to the correlation to reduce the influence of random factors:

$t=\frac{r_j\sqrt{N-2}}{\sqrt{1-\left(r_j\right)^2}}$
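The two formulas above can be checked numerically with a short NumPy sketch. It computes $r_j$ and $t$ for a single feature column; how the paper thresholds these values to split features into LCFs and PFs is not stated in these notes, so that step is left out. The function names and the toy data are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between one feature column x and the labels y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / (np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum())))

def t_statistic(r, n):
    """t statistic for a correlation r computed on n samples (n - 2 degrees of freedom)."""
    return float(r * np.sqrt(n - 2) / np.sqrt(1.0 - r ** 2))

# Toy example: one feature and a binary label for N = 5 samples.
x_j = [1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1]
r_j = pearson_r(x_j, y)
t_j = t_statistic(r_j, n=len(y))
```

Note that the t statistic is undefined when $|r_j|=1$, since the denominator becomes zero; a real pipeline would need to guard against that case.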

（2）Part-synthetic oversampling strategy

①LCF values of a newly generated sample:

$s_i^{\prime{(LCF)}}=s_i^{(LCF)}+\lambda_i\cdot\left(s_{in}^{(LCF)}-s_i^{(LCF)}\right)$

where $\lambda_{i}\in(0,1)$ and $s_{in}\in N_{K}\left(s_{i}\right)$ is one of the $K$ nearest neighbors of $s_i$

②New PF value generated by Gaussian perturbation:

$s_i^{\prime{(PF)}}=s_i^{(PF)}+\varepsilon_i$

where $\varepsilon_{i}\sim\mathcal{N}\left(0,\sigma^{2}\right)$

③Algorithm of PSOS:
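As a rough illustration of the two generation rules in ① and ② (not the paper's own algorithm listing), the sketch below interpolates the LCF columns toward a random nearest neighbor and perturbs the PF columns with Gaussian noise. Several details are assumptions: neighbors are searched among minority samples only, distances are Euclidean over all features, and base samples are drawn uniformly at random. The function name `psos_sketch` and its parameters are hypothetical; the defaults `k=5` and `sigma=0.15` mirror the $K$ and $\sigma$ values reported in the experimental configuration.

```python
import numpy as np

def psos_sketch(X_min, lcf_mask, n_new, k=5, sigma=0.15, seed=0):
    """Generate n_new synthetic minority samples.

    X_min    : (n, p) array of minority-class samples
    lcf_mask : length-p boolean mask, True for LCF columns, False for PF columns
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    lcf_mask = np.asarray(lcf_mask, dtype=bool)
    pf_mask = ~lcf_mask
    n = len(X_min)
    k = min(k, n - 1)

    # Pairwise Euclidean distances (assumption); a sample is never its own neighbor.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                  # index i of s_i
    picked = neighbors[base, rng.integers(0, k, size=n_new)]  # index of s_in
    lam = rng.uniform(0.0, 1.0, size=(n_new, 1))           # lambda_i, drawn from [0, 1)

    new = X_min[base].copy()
    # LCF columns: s_i + lambda_i * (s_in - s_i)
    new[:, lcf_mask] += lam * (X_min[picked][:, lcf_mask] - X_min[base][:, lcf_mask])
    # PF columns: s_i + eps_i, with eps_i ~ N(0, sigma^2)
    new[:, pf_mask] += rng.normal(0.0, sigma, size=(n_new, int(pf_mask.sum())))
    return new
```

Because the LCF values are convex combinations of two real samples, they always stay inside the observed range, while the PF values stay close to a real sample and only drift by roughly $\sigma$.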

2.4.3. Model integrating in the prediction framework

①Collaborative label prediction of three models:

$Label=\begin{cases}1 & \mathrm{if}\ P_{psos}^1>0.5 \\ 1 & \mathrm{if}\ \min\left(P_{Ros}^1,P_{Rus}^1\right)>\max\left(P_{psos}^0,\tau\right) \\ 0 & \mathrm{otherwise}\end{cases}$

（emmm）where 1 denotes PPH and 0 denotes NL

②Algorithm of PPH prediction:
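The decision rule in ① can be written as a few lines of Python. This is only a sketch of the rule as transcribed in these notes: it assumes that each argument is the class-1 probability output by the corresponding model and that $P_{psos}^0 = 1 - P_{psos}^1$. The function name `collaborative_label` is hypothetical, and the default `tau=0.75` is the value listed in the experimental configuration.

```python
def collaborative_label(p_psos_1, p_ros_1, p_rus_1, tau=0.75):
    """Return 1 (PPH) or 0 (NL) from the three models' class-1 probabilities."""
    # Assumption: binary probabilities sum to one.
    p_psos_0 = 1.0 - p_psos_1

    # Rule 1: the PSOS-trained model alone predicts PPH.
    if p_psos_1 > 0.5:
        return 1
    # Rule 2: both other models are confident enough to override it.
    if min(p_ros_1, p_rus_1) > max(p_psos_0, tau):
        return 1
    return 0

# The PSOS model says "no" (0.4), but both other models exceed max(0.6, 0.75).
example = collaborative_label(0.4, 0.8, 0.9)
```

The second rule only fires when both of the other models agree, and their weaker vote must beat both the threshold $\tau$ and the PSOS model's confidence in class 0.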

2.5. Experiments and discussions

2.5.1. Dataset and experimental settings

①Dataset: from gynecology and obstetrics at Sichuan Provincial Maternity and Child Health Care Hospital

②Definition of PPH: blood loss of ≥ 500 ml after vaginal delivery of a baby, or ≥ 1000 ml after caesarean section within 2 hours

③Samples: 24,110 in total, of which only 663 (2.75%) are PPH

④Features in the dataset:

⑤Demographics:

（1）Experimental configuration

①Feature dimension: 223

②Hyper-parameter: $\alpha=0.1,\beta=0.3,\sigma=0.15,\rho=25,K=5,\gamma=45,\tau=0.75$

③Neural network (NN): linear layers with hidden dimensions of 200, 64 and 32, with ReLU activations

④Optimizer: Adam with a learning rate of 0.0001 and a weight decay of 0.0001

（2）Evaluation metrics

①Metrics:

$\begin{aligned} & Sensitivity=\frac{TP}{TP+FN}, \\ & F1={\frac{2TP}{2TP+FP+FN}}, \\ & MCC={\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FP)\times(TP+FN)\times(TN+FP)\times(TN+FN)}}}, \\ & Gmean=\sqrt{\frac{TP}{TP+FN}\times\frac{TN}{TN+FP}}, \\ & YI=\frac{TP}{TP+FN}+\frac{TN}{TN+FP}-1, \end{aligned}$
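The five metrics above follow directly from the confusion-matrix counts. The sketch below implements them as written; the function name `imbalance_metrics` and the example counts are made up for illustration and are not results from the paper.

```python
import math

def imbalance_metrics(tp, fp, tn, fn):
    """Compute Sensitivity, F1, MCC, G-mean and Youden's index (YI) from counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report MCC as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den > 0 else 0.0
    return {
        "Sensitivity": sensitivity,
        "F1": f1,
        "MCC": mcc,
        "Gmean": math.sqrt(sensitivity * specificity),
        "YI": sensitivity + specificity - 1,
    }

# Hypothetical counts for an imbalanced test set (50 positives, 950 negatives).
scores = imbalance_metrics(tp=40, fp=10, tn=940, fn=10)
```

All five metrics depend on how well the minority class is recovered, which is why they are preferred over plain accuracy when only 2.75% of the samples are positive.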