[BSPC 2025] A novel framework for enhancing postpartum hemorrhage prediction: Balanced data preprocessing meets neural network modeling

Paper link: A novel framework for enhancing postpartum hemorrhage prediction: Balanced data preprocessing meets neural network modeling - ScienceDirect

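The English here is typed entirely by hand! It is a summary and paraphrase of the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.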

Contents

1. Takeaways

2. Close reading of the paper, section by section

2.1. Abstract

2.2. Introduction

2.3. Related works

2.4. Methodology

2.4.1. Data preprocessing

2.4.2. Data balancing

2.4.3. Model integrating in the prediction framework

2.5. Experiments and discussions

2.5.1. Dataset and experimental settings

2.5.2. Experimental results and analysis

2.5.3. Discussion

2.6. Conclusion

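1. Takeaways

(1)The related work feels a bit dated

(2)Such a vivid figure style

(3)Isn't it a bit too... performance-based? Hmm, nine double-column pages feels a little short, and the experiments seem to talk only about performance. Is there a more practical angle? For example, this task is more urgent than detecting a disease, so shouldn't the model be faster and more lightweight? Does all bleeding have the same cause? Could hemorrhage from different causes be diagnosed separately? (I am a layperson here)

(4)Aren't too many features required? If it is already known which ones matter (the paper seems to mention high-risk factors), then adding so many others is, hmm, not impossible, just a bit odd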

2. Close reading of the paper, section by section

2.1. Abstract

        ①Limitations: class imbalance and insufficient use of neural networks in postpartum hemorrhage (PPH) prediction

postpartum  adj. after childbirth    hemorrhage  n. bleeding; (especially heavy) blood loss  v. to bleed

2.2. Introduction

        ①Demand: timely treatment

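        ②Challenges: low data quality, scarce data, unstructured text records and class imbalance

disseminated  v. spread, circulated (information, knowledge, etc.)  adj. widely dispersed; diffuse

intravascular  adj. within a blood vessel    coagulation  n. clotting; solidification; a clotted mass; flocculation; aggregation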

2.3. Related works

        ①Discriminative features might be hard to collect

        ②Univariate analysis ignores non-linear relationships between the features and the target

        ③Other methods might ignore the class imbalance problem

        ④Lists several undersampling and oversampling methods

placenta  n. placenta (the organ that connects the fetus to the uterine wall)    uterine  adj. of the uterus

2.4. Methodology

        ①The overall framework:

2.4.1. Data preprocessing

        ①Encode missing values into the feature vector:

        ②How to deal with missing data:

where the new category might be an "undefined" class

        ③After this, encode the non-textual features, whose dimensions differ: binning for skewed or non-linearly correlated continuous variables, and one-hot encoding for categorical features

        ④Meanwhile, they define the key factor symbol (where does this come from? It is just the plain purple table above; how would I know whether a factor is important or not)

skewed  adj. slanted; biased; distorted; inaccurate  v. to slant; to deviate; to distort; to make unfair
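The encoding in ②/③ can be illustrated with a small plain-Python sketch. It is only one reading of the notes: the feature names (delivery mode, age), the category list and the bin edges are hypothetical, and treating a missing value as an extra "undefined" slot is an assumption based on ② rather than the paper's exact procedure.

```python
from bisect import bisect_right

def one_hot(value, categories):
    """One-hot encode a categorical value with an extra 'undefined' slot.

    Assumption: missing (None) or unseen values fall into 'undefined'.
    """
    slots = list(categories) + ["undefined"]
    key = value if value in categories else "undefined"
    return [1 if c == key else 0 for c in slots]

def bin_then_one_hot(value, edges):
    """Bin a continuous value by sorted edges, then one-hot the bin index.

    len(edges) + 1 regular bins, plus a final slot for missing values.
    """
    vec = [0] * (len(edges) + 2)
    if value is None:
        vec[-1] = 1                            # 'undefined' slot
    else:
        vec[bisect_right(edges, value)] = 1    # index of the matching bin
    return vec

# Hypothetical features: delivery mode (categorical) and age (continuous).
modes = ["vaginal", "caesarean"]
age_edges = [25, 35]                           # made-up bin boundaries

sample = one_hot("vaginal", modes) + bin_then_one_hot(27, age_edges)
missing = one_hot(None, modes) + bin_then_one_hot(None, age_edges)
print(sample)   # [1, 0, 0, 0, 1, 0, 0]
print(missing)  # [0, 0, 1, 0, 0, 0, 1]
```

Concatenating such blocks over all features is one way a record could grow to a couple of hundred dimensions.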

2.4.2. Data balancing

        ①Two-step oversampling:

where LCFs are lowly correlated features and PFs are promising features

(1)Feature classification

        ①The i-th sample in the dataset D=\{s_{1},s_{2},\ldots,s_{N}\} has p features and a label y_i:

s_i=(x_{i1},x_{i2},\ldots,x_{ip},y_i)

        ②Measure the relationship between each feature and the label with the Pearson correlation:

r_j=\frac{\sum_{i=1}^N(x_{ij}-\bar{x}_j)(y_i-\bar{y})}{\sqrt{\sum_{i=1}^N\left(x_{ij}-\bar{x}_j\right)^2}\sqrt{\sum_{i=1}^N\left(y_i-\bar{y}\right)^2}}

(Can the labels be looked at directly here?? Is this just a statistic?)

        ③Apply a t-test to the correlation, which reduces the influence of random factors:

t=\frac{r_j\sqrt{N-2}}{\sqrt{1-\left(r_j\right)^2}}
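As a rough sketch of this feature-classification step, the snippet below computes r_j and the t statistic for one feature column against the labels in plain Python. The toy data is made up. How the paper turns r_j and t into the LCF/PF split is not spelled out in these notes, so no split rule is implemented here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between one feature column x and the labels y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sd_x * sd_y)

def t_statistic(r, n):
    """t value for testing whether r differs from zero (N - 2 degrees of freedom).

    Undefined for |r| == 1, where the denominator is zero.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Toy example: 5 samples, one feature, binary labels (made-up numbers).
feature = [1, 2, 3, 4, 5]
labels = [0, 0, 1, 1, 1]
r = pearson_r(feature, labels)
t = t_statistic(r, len(feature))
print(round(r, 4), round(t, 4))  # 0.866 3.0
```

The t value would then be compared against a Student t distribution with N-2 degrees of freedom to obtain a p-value.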

(2)Part-synthetic oversampling strategy

        ①LCF values of the newly generated sample:

s_i^{\prime{(LCF)}}=s_i^{(LCF)}+\lambda_i\cdot\left(s_{in}^{(LCF)}-s_i^{(LCF)}\right)

where \lambda_{i}\in(0,1) and s_{in}\in N_{K}\left(s_{i}\right) is one of the K nearest neighbors of s_i

        ②New PF value generated by Gaussian perturbation:

s_i^{\prime{(PF)}}=s_i^{(PF)}+\varepsilon_i

where \varepsilon_{i}\sim\mathcal{N}\left(0,\sigma^{2}\right)

        ③Algorithm of PSOS:
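The two generation rules in ① and ② can be sketched in plain Python as follows. This is not the paper's PSOS algorithm: picking the seed sample at random, using Euclidean distance over all features for the neighbor search, and drawing independent noise per PF are all assumptions, and the function name `psos_sample` is made up for illustration. The defaults K=5 and σ=0.15 are the values listed in the experimental configuration.

```python
import math
import random

def psos_sample(minority, lcf_idx, pf_idx, k=5, sigma=0.15, rng=None):
    """Generate one synthetic minority sample (illustrative sketch only).

    Needs at least two minority samples so that a neighbor exists.
    """
    rng = rng or random.Random()
    i = rng.randrange(len(minority))
    s_i = minority[i]

    # K nearest neighbors of s_i among the other minority samples.
    # Assumption: Euclidean distance over all features.
    others = [s for j, s in enumerate(minority) if j != i]
    neighbors = sorted(others, key=lambda s: math.dist(s, s_i))[:k]
    s_in = rng.choice(neighbors)

    lam = rng.random()  # uniform on [0, 1); hitting exactly 0 is negligible
    new = list(s_i)
    for j in lcf_idx:   # LCFs: SMOTE-style interpolation towards s_in
        new[j] = s_i[j] + lam * (s_in[j] - s_i[j])
    for j in pf_idx:    # PFs: keep the original value, add Gaussian noise
        new[j] = s_i[j] + rng.gauss(0.0, sigma)
    return new

# Toy minority class: features 0 and 1 treated as LCFs, feature 2 as a PF.
toy_minority = [
    [0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0], [0.5, 0.5, 1.0], [0.2, 0.8, 0.0],
]
print(psos_sample(toy_minority, [0, 1], [2], k=3, rng=random.Random(0)))
```

The interpolated LCF values always stay between two real minority samples, while the PF values stay close to the seed sample, which matches the intent of only lightly disturbing the features that correlate with the label.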

2.4.3. Model integrating in the prediction framework

        ①Collaborative label prediction of three models:

Label=\begin{cases}1 & \mathrm{if}\ P_{psos}^1>0.5 \\ 1 & \mathrm{if}\ \min\left(P_{Ros}^1,P_{Rus}^1\right)>\max\left(P_{psos}^0,\tau\right) \\ 0 & \mathrm{otherwise},\end{cases}

(hmm...) where 1 denotes PPH and 0 denotes NL

        ②Algorithm of PPH prediction:
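The decision rule in ① is easy to state as a function. The sketch below assumes each model outputs a probability for class 1 and that P^0 = 1 - P^1 for the PSOS-trained model; the subscripts Ros and Rus presumably refer to models trained on randomly oversampled and randomly undersampled data, which these notes do not confirm. The default τ=0.75 is the value listed in the experimental configuration.

```python
def collaborative_label(p_psos_1, p_ros_1, p_rus_1, tau=0.75):
    """Combine three class-1 probabilities into a final label (1 = PPH)."""
    p_psos_0 = 1.0 - p_psos_1  # assumption: binary output, probabilities sum to 1
    if p_psos_1 > 0.5:
        return 1               # the PSOS model alone is confident
    if min(p_ros_1, p_rus_1) > max(p_psos_0, tau):
        return 1               # both auxiliary models overrule the PSOS model
    return 0

print(collaborative_label(0.6, 0.1, 0.1))  # 1: first branch
print(collaborative_label(0.3, 0.8, 0.9))  # 1: 0.8 > max(0.7, 0.75)
print(collaborative_label(0.1, 0.8, 0.9))  # 0: 0.8 is not above 0.9
```

Under this reading, the second branch can only raise recall: the auxiliary models may flip a negative to positive, never the reverse.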

2.5. Experiments and discussions

2.5.1. Dataset and experimental settings

        ①Dataset: gynecology and obstetrics of Sichuan Provincial Maternity and Child Health Care Hospital

        ②Definition of PPH: blood loss of ≥ 500 ml after vaginal delivery of a baby, or ≥ 1000 ml after caesarean section within 2 hours

        ③Samples: 24,110 in total, of which only 663 (2.75%) are PPH

        ④Features in the dataset:

        ⑤Demographics:

(1)Experimental configuration

        ①Feature dimension: 223

        ②Hyper-parameters: \alpha=0.1,\beta=0.3,\sigma=0.15,\rho=25,K=5,\gamma=45,\tau=0.75

        ③Neural network (NN): linear layers with hidden dimensions of 200, 64 and 32, with ReLU activations

        ④Optimizer: Adam with a learning rate of 0.0001 and a weight decay of 0.0001
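To get a feel for how small this network is, here is a quick parameter count in plain Python. The input size (223) and hidden sizes (200, 64, 32) come from the configuration above; the 2-unit output layer is an assumption, since the notes do not state the output dimension.

```python
def count_mlp_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# 223 -> 200 -> 64 -> 32 -> 2 (the final 2 is assumed, not from the paper)
sizes = [223, 200, 64, 32, 2]
print(count_mlp_parameters(sizes))  # 59810
```

Under these assumptions each network has roughly 60k parameters, so even three of them together would be lightweight.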

(2)Evaluation metrics

        ①Metrics:

\begin{aligned} & Sensitivity=\frac{TP}{TP+FN}, \\ & F1={\frac{2TP}{2TP+FP+FN}}, \\ & MCC={\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FP)\times(TP+FN)\times(TN+FP)\times(TN+FN)}}}, \\ & Gmean=\sqrt{\frac{TP}{TP+FN}\times\frac{TN}{TN+FP}}, \\ & YI=\frac{TP}{TP+FN}+\frac{TN}{TN+FP}-1, \end{aligned}
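The five metrics can be computed directly from a confusion matrix, as in the sketch below. The counts in the example are made up to mimic an imbalanced test set and are not results from the paper.

```python
import math

def imbalance_metrics(tp, fp, tn, fn):
    """Sensitivity, F1, MCC, G-mean and Youden index from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "Sensitivity": sensitivity,
        "F1": f1,
        "MCC": mcc,
        "Gmean": math.sqrt(sensitivity * specificity),
        "YI": sensitivity + specificity - 1,
    }

# Made-up counts: 100 positives, 900 negatives.
m = imbalance_metrics(tp=50, fp=100, tn=800, fn=50)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

All five metrics depend on how the minority class is handled, which is why they are preferred over plain accuracy when only 2.75% of the samples are positive.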

2.5.2. Experimental results and analysis

(1)Results of prediction framework

        ①Performance:

(2)Results of data preprocessing

        ①How text features improve performance:

(3)Results of data balancing strategies

        ①Performance of different sampling strategies:

(4)Results of ablation experiments

        ①Performance of module ablation:

where O is the complete model, and P, B and N denote data preprocessing, data balancing, and a single NN model, respectively

2.5.3. Discussion

        ①The hyper-parameters can be further optimized

2.6. Conclusion

        ~
