Statistics in Clinical Research_use of two-segment piecewise linear models in esti- (CSDN Blog)

Article link: https://blog.youkuaiyun.com/2401_86101889/article/details/140909843

Normality assessment

[7]

Shapiro-Wilk test vs. Kolmogorov-Smirnov test (the two most widely used methods for testing the normality of data)

The Shapiro-Wilk test is more appropriate for small samples (n < 50), while the Kolmogorov-Smirnov test is used for n ≥ 50.

The null hypothesis of both tests is that the data come from a normal distribution. If the p-value is greater than the chosen alpha level (commonly 0.05), you fail to reject the null hypothesis, suggesting that the data do not deviate significantly from normality.

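from scipy.stats import shapiro

# df_test_0, df_test_1: pandas DataFrames for the two groups (created earlier);
# 'mkkk' is the numeric column whose normality is tested.
# Drop missing values first, since shapiro() does not skip NaN.
data0 = df_test_0['mkkk'].dropna()
data1 = df_test_1['mkkk'].dropna()

stat0, p0 = shapiro(data0)
stat1, p1 = shapiro(data1)
print(p0, p1)  # p > alpha (e.g. 0.05): no significant departure from normality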

Comparison of numerical variables

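(Understanding statistics, the t-test, F-test and chi-square test in one article - Zhihu)

t-test: assumes normality, equal variances and independence.

Mann-Whitney (MW) vs. Kolmogorov-Smirnov (KS): the former is sensitive to location, e.g., the position of the median; the latter is sensitive to both location and shape.

MW vs. Kruskal-Wallis (KW): the latter can be used for more than two groups. Is the former better suited to small samples? For two groups, KW gives the same result as the large-sample (normal-approximation) two-sided MW test, but exact MW p-values are available for small samples, whereas KW usually relies on a chi-square approximation.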
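A minimal scipy sketch of the tests compared above, run on simulated data; the group means, SDs, sizes and seed are arbitrary assumptions for illustration, not values from any study. `equal_var=True` gives the classic Student t-test (`equal_var=False` would give Welch's test).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated, hypothetical data: three groups with shifted locations
a = rng.normal(loc=10.0, scale=2.0, size=30)
b = rng.normal(loc=12.0, scale=2.0, size=30)
c = rng.normal(loc=11.0, scale=2.0, size=30)

# Student t-test: assumes normality, equal variances, independent samples
t_res = stats.ttest_ind(a, b, equal_var=True)
# Mann-Whitney U: rank-based, mainly sensitive to a location shift
mw_res = stats.mannwhitneyu(a, b, alternative="two-sided")
# Two-sample Kolmogorov-Smirnov: sensitive to any difference in distribution
ks_res = stats.ks_2samp(a, b)
# Kruskal-Wallis: rank-based comparison of two or more groups
kw_res = stats.kruskal(a, b, c)

for name, res in [("t-test", t_res), ("Mann-Whitney", mw_res),
                  ("KS", ks_res), ("Kruskal-Wallis", kw_res)]:
    print(f"{name:15s} statistic={res.statistic:.3f} p={res.pvalue:.4f}")
```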

Regression

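c test: check for collinearity before fitting the model; use PCA to analyze the collinearity and remove collinear variables; or use the VIF (variance inflation factor) method; or use models such as ridge regression.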
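A minimal numpy sketch of the VIF method, assuming the predictors are the columns of a numeric matrix: each VIF is 1 / (1 − R²) from regressing one predictor on all the others plus an intercept. The helper name `vif` and the simulated data are illustrative only; statsmodels also ships a `variance_inflation_factor` function.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)                    # independent of the others
X = np.column_stack([x1, x2, x3])
v = vif(X)
print(v.round(1))
```

A common rule of thumb treats VIF > 10 (some authors use 5) as a sign of problematic collinearity; here x1 and x2 should far exceed it while x3 stays near 1.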

Comparison of rates (R×C tables)

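n ≥ 40 and all T ≥ 5: Pearson chi-square test.

n ≥ 40 and some 1 ≤ T < 5: chi-square test with continuity (Yates) correction.

n < 40 or any T < 1: Fisher's exact test.

n is the total count; T is the expected (theoretical) frequency of a cell. These thresholds are the usual rule for 2×2 (fourfold) tables, where the continuity correction applies.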
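A sketch of the n / T rule using scipy. It assumes a 2×2 table, since the continuity correction is defined for 2×2 tables; `compare_rates_2x2` is a hypothetical helper name and the example tables are made up. Note that `chi2_contingency` applies Yates' correction by default whenever the table has one degree of freedom, so `correction=False` has to be passed explicitly for the plain Pearson test.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact
from scipy.stats.contingency import expected_freq

def compare_rates_2x2(table):
    """Choose and run a test for a 2x2 table with the n / T rule.

    n = total count; T = expected (theoretical) frequency of each cell.
    Returns (test name, p-value).
    """
    table = np.asarray(table)
    n = table.sum()
    t_min = expected_freq(table).min()
    if n >= 40 and t_min >= 5:
        _, p, _, _ = chi2_contingency(table, correction=False)   # Pearson
        return "Pearson chi-square", p
    if n >= 40 and t_min >= 1:
        _, p, _, _ = chi2_contingency(table, correction=True)    # Yates
        return "chi-square with continuity correction", p
    _, p = fisher_exact(table)
    return "Fisher exact", p

for tbl in ([[20, 30], [25, 25]], [[2, 18], [8, 22]], [[1, 9], [6, 4]]):
    name, p = compare_rates_2x2(tbl)
    print(f"{tbl}: {name}, p = {p:.3f}")
```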

hypothesis testing - Chi-squared test with scipy: what's the difference between chi2_contingency and chisquare? - Cross Validated
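To illustrate the difference discussed in that thread: `chisquare` is a one-way goodness-of-fit test on a single vector of frequencies, while `chi2_contingency` tests independence of rows and columns in a contingency table. The sketch below (hypothetical counts) shows that the two agree when `chisquare` is given the table's expected counts and its degrees of freedom are adjusted with `ddof`.

```python
import numpy as np
from scipy.stats import chisquare, chi2_contingency

# Hypothetical 2x3 table: rows = groups, columns = outcome categories
observed = np.array([[12, 18, 20],
                     [22, 15, 13]])

# Test of independence; expected counts come from the row and column totals
chi2_ind, p_ind, dof, expected = chi2_contingency(observed)

# One-way goodness-of-fit on the flattened table with the same expected counts;
# ddof reduces the degrees of freedom from size - 1 to (R - 1)(C - 1)
chi2_gof, p_gof = chisquare(observed.ravel(), f_exp=expected.ravel(),
                            ddof=observed.size - 1 - dof)
print(chi2_ind, p_ind)
print(chi2_gof, p_gof)

# Typical stand-alone use of chisquare: are three categories equally frequent?
print(chisquare([18, 22, 20]))
```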

Repeated measures ANOVA

Concepts

Used to detect any overall difference between related means, i.e., in related groups rather than independent groups. [6]

One-way ANOVA vs. repeated measures ANOVA: the groups consist of different (independent) subjects vs. the same subjects measured repeatedly.
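A minimal numpy/scipy sketch of one-way repeated measures ANOVA, assuming a complete (n_subjects × k_conditions) array with no missing values and no sphericity correction; `rm_anova_oneway` is a hypothetical helper and the data are simulated. It removes the between-subject sum of squares from the error term, which a one-way ANOVA on independent groups cannot do.

```python
import numpy as np
from scipy import stats

def rm_anova_oneway(data):
    """One-way repeated measures ANOVA.

    data: shape (n_subjects, k_conditions); each row is one subject measured
    under every condition. Returns (F, df_conditions, df_error, p).
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * np.sum((data.mean(axis=0) - grand) ** 2)
    ss_subj = k * np.sum((data.mean(axis=1) - grand) ** 2)
    ss_total = np.sum((data - grand) ** 2)
    ss_error = ss_total - ss_cond - ss_subj     # subject variability removed
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    F = (ss_cond / df_cond) / (ss_error / df_error)
    return F, df_cond, df_error, stats.f.sf(F, df_cond, df_error)

rng = np.random.default_rng(1)
subject_level = rng.normal(50, 10, size=(12, 1))   # large between-subject spread
scores = subject_level + np.array([0.0, 1.5, 3.0]) + rng.normal(0, 1, size=(12, 3))

F_rm, _, _, p_rm = rm_anova_oneway(scores)
# Treating the same data as independent groups leaves subject variability in the error
F_ind, p_ind = stats.f_oneway(*scores.T)
print(f"repeated measures: F={F_rm:.2f}, p={p_rm:.3g}")
print(f"one-way (independent): F={F_ind:.2f}, p={p_ind:.3g}")
```

With only two conditions, the repeated measures F equals the square of the paired t statistic, which is a quick sanity check.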

Pre-test

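Normally distributed data: repeated measures ANOVA; otherwise: Scheirer–Ray–Hare test (for a one-way repeated measures design, the Friedman test is the usual nonparametric alternative).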
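scipy does not, as far as I know, provide a Scheirer–Ray–Hare function, so this sketch shows the Friedman test instead, via `scipy.stats.friedmanchisquare`, next to a hand-computed statistic (valid when there are no ties within a subject). The data layout (subjects in rows, conditions in columns) and the simulated skewed values are assumptions.

```python
import numpy as np
from scipy import stats

def friedman_statistic(data):
    """Friedman chi-square statistic without tie correction.

    data: shape (n_subjects, k_conditions); values are ranked within each subject.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    ranks = np.apply_along_axis(stats.rankdata, 1, data)   # rank within each row
    rank_sums = ranks.sum(axis=0)
    return 12.0 / (n * k * (k + 1)) * np.sum(rank_sums ** 2) - 3.0 * n * (k + 1)

rng = np.random.default_rng(2)
# Skewed (non-normal) hypothetical measurements: 10 subjects x 3 time points
scores = rng.lognormal(mean=0.0, sigma=1.0, size=(10, 3)) * np.array([1.0, 1.5, 2.5])

res = stats.friedmanchisquare(*scores.T)   # one argument per condition
print(res.statistic, friedman_statistic(scores), res.pvalue)
```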

P for interaction (to be completed ...... 240906)

[5][4]

CONCEPTS

Interaction / subgroup analysis: does the effect of the controlled factor (e.g., treatment) on the outcome differ across levels of a baseline / demographic factor? Interaction can be assessed as an additive or a multiplicative effect.

If no P for interaction is reported: was the subgroup analysis prespecified or post hoc?

METHODS

SYNOPTICALLY

Two ways to assess a statistical interaction: 1) stratification: treatment effects are estimated separately within subgroups defined by a baseline / demographic factor; 2) interaction modelling: treatment, the baseline / demographic factor and their interaction term are included together in one statistical model (treatment + baseline factor + treatment × baseline factor).
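A sketch of both approaches on simulated data with a linear (additive-scale) outcome model. `subgroup` stands for a hypothetical binary baseline factor and `ols` is a small hypothetical helper; in practice a regression package would be used. Because this model is saturated, the interaction coefficient equals the difference between the two stratified treatment effects.

```python
import numpy as np
from scipy import stats

def ols(X, y):
    """OLS fit: coefficients, standard errors and two-sided t-test p-values."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    pvals = 2 * stats.t.sf(np.abs(beta / se), df=n - k)
    return beta, se, pvals

rng = np.random.default_rng(3)
n = 400
treat = rng.integers(0, 2, n)        # 0 = control, 1 = treatment
subgroup = rng.integers(0, 2, n)     # hypothetical binary baseline factor
# Simulated outcome: treatment effect 0.5 when subgroup = 0 and 1.5 when subgroup = 1
y = 1.0 + 0.5 * treat + 0.3 * subgroup + 1.0 * treat * subgroup + rng.normal(0, 1, n)

# 1) Stratification: treatment effect (difference in means) within each subgroup
effects = {}
for g in (0, 1):
    s = subgroup == g
    effects[g] = y[s & (treat == 1)].mean() - y[s & (treat == 0)].mean()
    print(f"subgroup {g}: treatment effect = {effects[g]:.2f}")

# 2) Interaction model: y ~ treat + subgroup + treat*subgroup
X = np.column_stack([np.ones(n), treat, subgroup, treat * subgroup])
beta, se, pvals = ols(X, y)
print(f"interaction = {beta[3]:.2f} (SE {se[3]:.2f}), P for interaction = {pvals[3]:.3g}")
```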

SPECIFICALLY

For the same question, interaction results may not be consistent across statistical models, because different models test interaction on different scales:

1) additive scale: risk difference (RD); linear models, e.g., linear regression; 2) multiplicative scale: ratio measures such as the risk ratio (RR); exponential models, e.g., logistic regression (odds ratio) and Cox regression (hazard ratio).
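A worked example with made-up risks shows why the scale matters: if treatment doubles the risk in both subgroups, there is no interaction on the multiplicative scale, yet the risk differences still differ, i.e., there is interaction on the additive scale.

```python
def risk_effects(risk_control, risk_treatment):
    """Return (risk difference, risk ratio) for one subgroup."""
    return risk_treatment - risk_control, risk_treatment / risk_control

# Hypothetical risks: treatment doubles the risk in both subgroups
rd_a, rr_a = risk_effects(0.10, 0.20)   # subgroup A
rd_b, rr_b = risk_effects(0.20, 0.40)   # subgroup B

print(f"RD: A = {rd_a:.2f}, B = {rd_b:.2f}")                 # RD: A = 0.10, B = 0.20
print(f"RR: A = {rr_a:.2f}, B = {rr_b:.2f}")                 # RR: A = 2.00, B = 2.00
# Interaction on each scale
print(f"additive (RD_B - RD_A) = {rd_b - rd_a:.2f}")         # 0.10 -> interaction
print(f"multiplicative (RR_B / RR_A) = {rr_b / rr_a:.2f}")   # 1.00 -> no interaction
```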

TIPS

1) Consider categorizing a continuous baseline factor.
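A minimal numpy sketch of two common ways to categorize a continuous factor before a subgroup analysis: data-driven tertiles and a fixed clinical cut-off. The simulated `age` variable and the 65-year threshold are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
age = rng.normal(60, 12, size=300)        # hypothetical continuous baseline factor

# Tertile cut points from the data, then labels 0, 1, 2 (low / middle / high)
cuts = np.quantile(age, [1 / 3, 2 / 3])
age_tertile = np.digitize(age, cuts)      # 0: < cut1, 1: [cut1, cut2), 2: >= cut2

# A fixed, clinically motivated threshold (assumed here: 65 years)
age_65plus = (age >= 65).astype(int)
print(np.bincount(age_tertile), np.bincount(age_65plus))
```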

Cases / analysis of article methods

A case analysis from article 1

[1] [2] [3]

Kolmogorov-Smirnov test:

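Compares your data with a known distribution and tells you whether they have the same distribution. It does not assume any particular underlying distribution, but it is commonly used as a test for normality. The standard test assumes the reference distribution is fully specified; when the mean and SD are estimated from the same data, the Lilliefors correction should be used.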
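A minimal sketch with `scipy.stats.kstest` on simulated data. Passing the sample mean and SD as `args` is a common shortcut, but because those parameters are estimated from the same data, the resulting p-value is conservative; the Lilliefors variant (e.g., `statsmodels.stats.diagnostic.lilliefors`) corrects for this.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
samples = {
    "normal": rng.normal(100, 15, size=1000),
    "skewed": rng.exponential(15, size=1000),
}

ks = {}
for name, x in samples.items():
    # Reference distribution: normal with the sample's own mean and SD
    ks[name] = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    print(f"{name}: D = {ks[name].statistic:.3f}, p = {ks[name].pvalue:.3g}")
```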

Levene's test:

Used to check whether variances are equal across all samples; because it is less sensitive to non-normality, it is the choice when the data come from a non-normal distribution. Checking the equal-variance assumption is needed before running a test such as one-way ANOVA. If you are fairly certain that the data come from a normal or nearly normal distribution, use Bartlett's test instead.
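A short scipy sketch on simulated groups (the third group is deliberately given a larger SD). scipy's `levene` uses `center='median'` by default, i.e., the Brown-Forsythe variant, which is the more robust choice for skewed data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
g1 = rng.normal(0, 1, size=40)
g2 = rng.normal(0, 1, size=40)
g3 = rng.normal(0, 3, size=40)   # deliberately larger spread

lev = stats.levene(g1, g2, g3, center="median")   # robust to non-normality
bart = stats.bartlett(g1, g2, g3)                 # assumes normal data
print(f"Levene p = {lev.pvalue:.3g}, Bartlett p = {bart.pvalue:.3g}")
```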

Kaplan-Meier analysis:

A commonly used survival analysis method. Compared with the Cox proportional hazards model, the Kaplan-Meier method is intuitive and nonparametric and therefore requires few assumptions. However, apart from a single grouping variable such as treatment (control, treatment1, treatment2, ...), it cannot easily incorporate additional variables into the model or be used for prediction.
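A minimal numpy sketch of the Kaplan-Meier product-limit estimator on a tiny made-up data set; `kaplan_meier` is a hypothetical helper. At each distinct event time the survival estimate is multiplied by (1 − events / number at risk); censored subjects simply leave the risk set. The cumulative incidence reported in clinical papers is 1 − S(t).

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier survival estimate.

    time: follow-up times; event: 1 if the event occurred, 0 if censored.
    Returns the distinct event times and S(t) just after each of them.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv = []
    s = 1.0
    for t in event_times:
        n_at_risk = np.sum(time >= t)            # still under observation at t
        d = np.sum((time == t) & event)          # events at t
        s *= 1.0 - d / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Small hypothetical data set (months); event = 0 means censored
time = [2, 3, 3, 5, 6, 8, 9, 12]
event = [1, 1, 0, 1, 0, 1, 0, 1]
t, s = kaplan_meier(time, event)
print(np.column_stack([t, s.round(3)]))
```

For these data the estimates are 0.875, 0.75, 0.6, 0.4 and 0.0 at months 2, 3, 5, 8 and 12.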

Log-rank test:

Used in survival analysis to compare the distribution of time to event in two or more independent samples.
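A minimal sketch of the two-group log-rank test, assuming a 0/1 group label; `logrank_test` is a hypothetical helper and the data are simulated. At each event time it compares the observed events in group 1 with those expected if both groups shared the same hazard, and sums the hypergeometric variances.

```python
import numpy as np
from scipy import stats

def logrank_test(time, event, group):
    """Two-group log-rank test; group holds 0/1 labels. Returns (chi2, p)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group)
    o_minus_e = 0.0
    var = 0.0
    for t in np.unique(time[event]):             # distinct event times
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        dead = (time == t) & event
        d = dead.sum()
        d1 = (dead & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n             # observed minus expected, group 1
        if n > 1:                                # hypergeometric variance
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, stats.chi2.sf(chi2, df=1)

rng = np.random.default_rng(7)
group = np.repeat([0, 1], 150)
true_time = rng.exponential(np.where(group == 1, 5.0, 10.0))   # group 1: higher hazard
censor = rng.uniform(0, 15, size=300)
time = np.minimum(true_time, censor)
event = true_time <= censor

chi2, p = logrank_test(time, event, group)
print(f"log-rank chi2 = {chi2:.2f}, p = {p:.3g}")
```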

Breslow test:

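In the survival setting (e.g., the Kaplan-Meier comparisons in the article methods below), the Breslow test is the generalized Wilcoxon (Gehan-Breslow) test: like the log-rank test it compares survival curves, but it weights each event time by the number of subjects at risk, so it gives more weight to early differences. It is distinct from the Breslow-Day test, which assesses the homogeneity of odds ratios across strata (e.g., studies in a meta-analysis): it compares the observed cell counts in each stratum with those expected under a common (Mantel-Haenszel) odds ratio, and its null hypothesis is that all strata share the same true odds ratio.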
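In the Kaplan-Meier context of the article methods below ("the log-rank test or the Breslow test"), the Breslow test usually means the generalized Wilcoxon (Gehan-Breslow) test, not the Breslow-Day homogeneity test. A sketch of that survival test, assuming two groups labelled 0/1: it is the log-rank calculation with each event time weighted by the number at risk, so early differences count more. `weighted_logrank` is a hypothetical helper; `weight="logrank"` recovers the unweighted test.

```python
import numpy as np
from scipy import stats

def weighted_logrank(time, event, group, weight="breslow"):
    """Two-group weighted log-rank test; returns (chi2, p).

    weight="breslow": weight = number at risk (generalized Wilcoxon, Gehan-Breslow)
    weight="logrank": weight = 1 (ordinary log-rank test)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group)
    u = 0.0
    var = 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        dead = (time == t) & event
        d = dead.sum()
        d1 = (dead & (group == 1)).sum()
        w = float(n) if weight == "breslow" else 1.0
        u += w * (d1 - d * n1 / n)               # weighted observed minus expected
        if n > 1:
            var += w ** 2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = u ** 2 / var
    return chi2, stats.chi2.sf(chi2, df=1)

rng = np.random.default_rng(9)
group = np.repeat([0, 1], 100)
true_time = rng.exponential(np.where(group == 1, 4.0, 8.0))
censor = rng.uniform(0, 12, size=200)
time = np.minimum(true_time, censor)
event = true_time <= censor

results = {w: weighted_logrank(time, event, group, weight=w) for w in ("breslow", "logrank")}
for w, (chi2, p) in results.items():
    print(f"{w}: chi2 = {chi2:.2f}, p = {p:.3g}")
```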

Cox proportional hazards regression:

Used to investigate the association between patients' survival time and one or more predictor variables. The methods mentioned above, Kaplan-Meier curves and log-rank tests, are examples of univariate analysis: they describe survival according to one factor under investigation but ignore the impact of any others. Additionally, Kaplan-Meier curves and log-rank tests are useful only when the predictor variable is categorical; they do not work easily for quantitative predictors such as gene expression, weight, or age. Cox proportional hazards regression is an alternative that works for both quantitative and categorical predictor variables. Furthermore, the Cox regression model extends survival analysis to assess the effect of several risk factors on survival time simultaneously.
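A compact numpy sketch of fitting a Cox model by Newton-Raphson on the partial likelihood, with two simulated predictors (`treat` and a standardized `age_z`, both hypothetical). It is meant to show what the model estimates (log hazard ratios and their standard errors, hence HRs with 95% CIs), not to replace a tested package such as lifelines or R's survival.

```python
import numpy as np

def cox_ph(time, event, X, max_iter=50, tol=1e-8):
    """Fit a Cox proportional hazards model by Newton-Raphson.

    Maximizes the partial likelihood; tied event times use the Breslow
    approximation (every tied event shares the same risk set).
    Returns (beta, standard errors).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = np.exp(X @ beta)                        # relative hazards
        score = np.zeros_like(beta)
        info = np.zeros((beta.size, beta.size))
        for i in np.flatnonzero(event):
            risk = time >= time[i]                  # risk set at this event time
            wr, Xr = w[risk], X[risk]
            xbar = wr @ Xr / wr.sum()               # risk-weighted covariate mean
            score += X[i] - xbar
            info += (Xr.T * wr) @ Xr / wr.sum() - np.outer(xbar, xbar)
        step = np.linalg.solve(info, score)         # Newton-Raphson update
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    se = np.sqrt(np.diag(np.linalg.inv(info)))      # from the last information matrix
    return beta, se

rng = np.random.default_rng(8)
n = 500
treat = rng.integers(0, 2, n).astype(float)         # hypothetical binary predictor
age_z = rng.normal(size=n)                          # hypothetical standardized age
true_time = rng.exponential(1.0 / np.exp(0.7 * treat + 0.3 * age_z))
censor = rng.uniform(0, 3, n)
time = np.minimum(true_time, censor)
event = true_time <= censor

beta, se = cox_ph(time, event, np.column_stack([treat, age_z]))
for name, b, s in zip(["treat", "age_z"], beta, se):
    lo, hi = np.exp(b - 1.96 * s), np.exp(b + 1.96 * s)
    print(f"{name}: HR = {np.exp(b):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```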

Firth-penalized Cox proportional hazards regression:

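Proportional hazards regression models can suffer from monotone likelihood, in which the likelihood converges to a finite value while at least one parameter estimate diverges to infinity; this typically happens in small samples or when a binary covariate separates the events, e.g., all events occur in one group. Firth's penalized likelihood, originally proposed to reduce the small-sample bias of maximum likelihood estimates, is also used to correct monotone likelihood and obtain finite, convergent parameter estimates.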
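A small illustration of monotone likelihood, using a made-up data set in which every event occurs in the exposed group (x = 1): the Cox partial log-likelihood keeps increasing as β grows but levels off at a finite limit (here −log 24), so the maximum likelihood estimate is infinite. Firth's penalty itself is not implemented here; implementations exist, e.g., the `coxphf` package in R.

```python
import numpy as np

def cox_partial_loglik(beta, time, event, x):
    """Cox partial log-likelihood for one covariate (Breslow handling of ties)."""
    ll = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]
        ll += beta * x[i] - np.log(np.sum(np.exp(beta * x[at_risk])))
    return ll

# Hypothetical data: all three events occur in the exposed group (x = 1),
# a typical cause of monotone likelihood ("separation")
time = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
event = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)
x = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)

betas = [0, 2, 5, 10, 20]
lls = [cox_partial_loglik(b, time, event, x) for b in betas]
for b, ll in zip(betas, lls):
    print(f"beta = {b:2d}: partial log-likelihood = {ll:.6f}")
print(f"limit as beta -> infinity: {-np.log(24):.6f}")
```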

Categorical variables are presented as numbers and relative frequencies (percentages); continuous variables are presented either as mean ± SD or median with IQR according to their distributions, which were checked by using the Kolmogorov-Smirnov and Levene tests. Data were analyzed ...... xxx showed discordant...... excluded from the xxx analysis...... Kaplan-Meier analysis was used to calculate the cumulative incidence of primary and secondary clinical outcomes, and the log-rank test or the Breslow test was used to compare between-group differences. In addition, Cox proportional hazards regression was used to calculate hazard ratios (HRs) and 95% CIs to compare between-group differences. Firth-penalized Cox proportional hazards regression was used for the separation problem.
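A sketch of the descriptive rule in this methods paragraph: report mean ± SD when the Kolmogorov-Smirnov test does not reject normality, otherwise median (IQR). The helper name `describe_continuous`, the 0.05 threshold and the simulated variables are assumptions; the KS p-value here uses parameters estimated from the sample, so it is conservative.

```python
import numpy as np
from scipy import stats

def describe_continuous(x, alpha=0.05):
    """'mean ± SD' if the KS normality test is not rejected, else 'median (Q1-Q3)'."""
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]                                   # ignore missing values
    mean, sd = x.mean(), x.std(ddof=1)
    p = stats.kstest(x, "norm", args=(mean, sd)).pvalue   # parameters estimated from x
    if p > alpha:
        return f"{mean:.1f} ± {sd:.1f}"
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return f"{median:.1f} ({q1:.1f}-{q3:.1f})"

rng = np.random.default_rng(10)
sbp = rng.normal(130, 15, size=500)     # hypothetical, roughly normal variable
los = rng.exponential(5, size=500)      # hypothetical, right-skewed variable
summary = {"SBP": describe_continuous(sbp), "Length of stay": describe_continuous(los)}
print(summary)
```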