An introduction to the statistics behind A/B tests and ANOVA tests, with hands-on code

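This article uses Python code to show how to preprocess data for an A/B test, including checking that each user's group matches the page they were served, handling duplicate records, and computing the minimum required sample size. It then runs hypothesis tests (a z-test and a t-test), which show that the difference in conversion rate between the control and treatment groups is not statistically significant. Finally, it uses an ANOVA test to analyze the effect of time range on conversion and concludes that time range has no significant effect.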



import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.stats.api as sms
import scipy.stats as stats
import warnings
warnings.filterwarnings('ignore')
from math import ceil
# set style and figure size in one call; a later sns.set(...) would reset the style
sns.set_theme(style='dark', rc={'figure.figsize':(20,15)})
data_ab = pd.read_csv('data/ab_test.csv')
data_ab.head()
       id     time  con_treat      page  converted
0  851104  11:48.6    control  old_page          0
1  804228  01:45.2    control  old_page          0
2  661590  55:06.2  treatment  new_page          0
3  853541  28:03.1  treatment  new_page          0
4  864975  52:26.2    control  old_page          1

EDA

Check whether users in the treatment group received only the new page, and check whether duplicate records exist.

data_ab.columns = ["user_id", "timestamp", "group", "landing_page", "converted"]
# num of rows and unique user
print('row number:{0}'.format(data_ab.shape[0]))
print('unique user:{0}'.format(data_ab['user_id'].nunique()))
row number:294478
unique user:290584
data_ab.head()
   user_id timestamp      group landing_page  converted
0   851104   11:48.6    control     old_page          0
1   804228   01:45.2    control     old_page          0
2   661590   55:06.2  treatment     new_page          0
3   853541   28:03.1  treatment     new_page          0
4   864975   52:26.2    control     old_page          1
# check whether mismatch exists
data_ab.groupby(by=['group'],as_index=False).agg({'landing_page':pd.Series.nunique})
       group  landing_page
0    control             2
1  treatment             2
pd.crosstab(data_ab['group'],data_ab['landing_page'],margins=True)
landing_page  new_page  old_page     All
group
control           1928    145274  147202
treatment       145311      1965  147276
All             147239    147239  294478
# check for mismatch 
n_treat = data_ab[data_ab['group']=='treatment'].shape[0]
n_newpage = data_ab[data_ab['landing_page']=='new_page'].shape[0]
diff = n_treat - n_newpage
pd.DataFrame({'n treatment':[n_treat],
             'n newPage':[n_newpage],
             'diff':diff})
   n treatment  n newPage  diff
0       147276     147239    37
page_diff = pd.crosstab(index=data_ab['group'],columns=data_ab['landing_page'])
page_diff.plot.bar()

[Figure: bar chart of landing_page counts for each group]

# clean up the dataset: keep only rows where treatment -> new_page and control -> old_page
matched = ((data_ab['group']=='treatment')&(data_ab['landing_page']=='new_page'))|((data_ab['group']=='control')&(data_ab['landing_page']=='old_page'))
# copy so that later column assignments do not write into a view of data_ab
df = data_ab[matched].copy()
df.shape[0]
290585
# users that still appear more than once
df[df.duplicated(subset=['user_id'])]
      user_id timestamp      group landing_page  converted
2893   773192   55:59.6  treatment     new_page          0
df = df.drop_duplicates('user_id',keep='first')

AB TEST

  • post campaign review
  • calculate minimum sample size before campaign
df['user_id'] = df['user_id'].astype(str)
control_cvt = df[df['group']=='control']['converted'].mean()
print('control_cvt:%.4f'%control_cvt)
control_cvt:0.1204
treat_cvt = df[df['group']=='treatment']['converted'].mean()
print('treat_cvt:%.4f'%treat_cvt)
treat_cvt:0.1188
n_treat = df[df['group']=='treatment']['user_id'].count()
n_control = df[df['group']=='control']['user_id'].count()
n_treat/n_control
1.0002478075911725
n_control
145274

test on minimum sample size

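$$\text{effect\_size} = \frac{\delta}{\text{std}}$$
After a hypothesis test, if p < 5% the effect size should also be computed; if p > 5% the power should be computed.
Effect size: a quantitative measure of the difference or degree of association between samples. The p-value tells you whether a result is statistically significant, while the effect size tells you how large the difference is, which reflects its practical importance. A result can be statistically significant and still have a very small effect size. As a rule of thumb, ES < 0.2 is a small effect, 0.2 to 0.5 a medium effect, and > 0.5 a large effect (Cohen's conventional benchmarks are roughly 0.2, 0.5, and 0.8 for small, medium, and large).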
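To make the effect-size definition concrete for two proportions, here is a minimal sketch of Cohen's h, the arcsine-transformed difference that `proportion_effectsize` is described as using in this post. The helper name `cohens_h` is hypothetical, and the inputs are the rounded conversion rates printed earlier, so the result only approximately matches the library output.

```python
from math import asin, sqrt


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


# Rounded conversion rates from this post (control, treatment).
h = cohens_h(0.1204, 0.1188)
print(f"Cohen's h: {h:.4f}")
```

With rates this close together, h comes out at roughly 0.005, far below the 0.2 threshold for even a small effect.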

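# proportion_effectsize computes the effect size with the Cohen's h formula
effect_size = sms.proportion_effectsize(control_cvt,treat_cvt)
print('effect_size:%.4f'%effect_size)
effect_size:0.0049
# given the sample size, power, and significance level, solve for the effect size
from statsmodels.stats.power import TTestIndPower
power_analysis = TTestIndPower()
effect_size2 = power_analysis.solve_power(effect_size = None,power=0.8,alpha=0.05,nobs1 =n_treat)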

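effect_size and effect_size2 mean different things. effect_size is the effect size implied by a given baseline rate and improved rate; it does not depend on the sample size. effect_size2 is the smallest effect size that can be detected with the given sample size at the chosen power and significance level.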
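The idea behind effect_size2 can be sketched with the normal approximation for two equal-sized groups, where the minimum detectable effect is (z_{1-α/2} + z_{power}) · sqrt(2/n). This is an illustrative approximation, not the t-based solver used above; the function name `min_detectable_effect` is hypothetical, and the group size is the n_control value printed earlier.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n_per_group: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Smallest standardized effect detectable by a two-sided z-test
    with two equal groups (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


mde = min_detectable_effect(145274)
print(f"minimum detectable effect: {mde:.4f}")
```

Under these assumptions the detectable effect is about 0.01, roughly twice the observed effect size of 0.0049, which is why the experiment is underpowered for a difference this small.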

required_n = sms.NormalIndPower().solve_power(effect_size,power=0.8,alpha=0.05,ratio=n_treat/n_control)
required_n = ceil(required_n)
print('required_n:%d'%required_n)
required_n:663492
import statsmodels.stats.power as smp
power_analysis = smp.TTestIndPower()
n_required = power_analysis.solve_power(effect_size = effect_size,power=0.8,alpha=0.05,ratio=n_treat/n_control)
print('n_required:%.4f'%n_required)
n_required:663492.3208

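Therefore, with control_cvt = 0.1204 and treat_cvt = 0.1188, each group needs at least 663,492 users, but the experiment has only about 145,274 per group, so a difference this small cannot be detected reliably.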
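The required sample size can also be approximated in closed form. The sketch below assumes a two-sided z-test with equal group sizes, where n per group is 2 · ((z_{1-α/2} + z_{power}) / effect_size)². The function name `required_n_per_group` is hypothetical, and because it is fed the rounded effect size 0.0049, its result only agrees with the statsmodels output in order of magnitude.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-sided z-test with equal groups
    (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


n_small = required_n_per_group(0.0049)  # rounded effect size from this post
n_medium = required_n_per_group(0.2)    # a conventional "small" effect, for contrast
print(n_small, n_medium)
```

Because the sample size scales with 1/effect_size², halving the effect quadruples the number of users needed.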

Test whether the experiment achieved its goal: is the difference significant?

Run the hypothesis test with a z-test and a t-test.

convert_old = df[(df["converted"] == 1) & (df["landing_page"] == "old_page")]['user_id'].nunique()
convert_new = df[(df["converted"] == 1) & (df["landing_page"] == "new_page")]['user_id'].nunique()
n_old = df[df["landing_page"] == "old_page"]['user_id'].nunique()
n_new = df[df["landing_page"] == "new_page"]['user_id'].nunique()
n_old
145274
convert_old
17489

Method 1: hypothesis test with a z-test

z_score,p_value = sm.stats.proportions_ztest(np.array([convert_new,convert_old]),np.array([n_new,n_old]), alternative = 'larger')
p_value
0.9050583127590245
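For intuition, the two-proportion z-test can be written out by hand. The sketch below uses the pooled proportion for the standard error, which to my understanding is also the default in `proportions_ztest`. The function name and the counts in the usage example are hypothetical, not the dataset's values.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2, alternative="two-sided"):
    """z-test for p1 - p2 using the pooled proportion under H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    cdf = NormalDist().cdf(z)
    if alternative == "larger":     # H1: p1 > p2
        p_value = 1 - cdf
    elif alternative == "smaller":  # H1: p1 < p2
        p_value = cdf
    else:                           # H1: p1 != p2
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


# Hypothetical counts: the first group converts slightly worse than the second.
z_score, p_larger = two_proportion_ztest(110, 1000, 120, 1000, alternative="larger")
print(z_score, p_larger)
```

When the first group's rate is below the second's, z is negative and the one-sided 'larger' p-value exceeds 0.5, which is the same pattern as the 0.905 reported above.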

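Method 2: hypothesis test with a t-test

When the sample size is larger than 30, the t distribution is close to the normal distribution, so a z-test can be used as well.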

from scipy import stats as st
new_gr = df.loc[df['landing_page']=="new_page",'converted'].to_numpy()
old_gr = df.loc[df['landing_page']=="old_page",'converted'].to_numpy()
st.ttest_ind(new_gr,old_gr)
Ttest_indResult(statistic=-1.3109235634981506, pvalue=0.18988462498742617)

Method 3: t-test computed by hand

# per-observation variance of a Bernoulli outcome: p * (1 - p)
var_old = control_cvt*(1-control_cvt)
var_new = treat_cvt*(1-treat_cvt)
p_delta = treat_cvt - control_cvt
print(p_delta)
-0.0015782389853555567
# unpooled (Welch) standard error of the difference, despite the original name pooled_se
se = np.sqrt(var_new/n_new + var_old/n_old)
t_sts = p_delta/se
# Welch-Satterthwaite degrees of freedom
dof = (var_new/n_new + var_old/n_old)**2/((var_new/n_new)**2/(n_new-1) + (var_old/n_old)**2/(n_old-1))
dof
290571.7142957336
# two-sided p-value
pvalue = 2*st.t.cdf(-abs(t_sts),dof)
print(pvalue)
0.18988341360095998

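The hypothesis tests also show that the null hypothesis, that the control and treatment groups do not differ, cannot be rejected.

Compute confidence intervals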

(low_treat,up_treat) = st.norm.interval(0.95,loc=treat_cvt,scale=np.sqrt(var_new/n_new))
(low_control,up_control) = st.norm.interval(0.95,loc=control_cvt,scale=np.sqrt(var_old/n_old))
# pass control (old page) first so the unpacked names match the order of the inputs
(lower_con, lower_treat), (upper_con, upper_treat) = sm.stats.proportion_confint(np.array([convert_old,convert_new]),np.array([n_old,n_new]),alpha=0.05)

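Both the SciPy and the statsmodels methods give confidence intervals for the control and treatment groups that overlap, which is consistent with failing to reject the null hypothesis.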
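Overlapping intervals are only an informal check. A more direct view is a confidence interval for the difference in conversion rates itself, sketched here with the normal approximation. The function name `diff_confint` is hypothetical, the rates are the rounded values printed in this post, and the treatment group size of 145310 is an assumption derived from the printed n_treat/n_control ratio.

```python
from math import sqrt
from statistics import NormalDist


def diff_confint(p_a, n_a, p_b, n_b, level=0.95):
    """Normal-approximation confidence interval for p_b - p_a."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# control: rate 0.1204, n = 145274; treatment: rate 0.1188, n = 145310 (assumed)
low, high = diff_confint(0.1204, 145274, 0.1188, 145310)
print(f"95% CI for treatment - control: ({low:.5f}, {high:.5f})")
```

If the interval contains zero, as it does under these assumptions, the data are compatible with no difference between the two pages.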

ANOVA TEST

Test whether the time range affects conversion, splitting the time into three ranges: 0-20, 20-40, and 40-60.

# "hour" holds the leading field of the timestamp string (the part before the first colon)
df['hour'] = [int(i.split(':')[0]) for i in df['timestamp']]
df['time_range'] = pd.cut(df['hour'],bins=[0,20,40,60],include_lowest=True,labels=['0-20','20-40','40-60'])
df.head()
   user_id timestamp      group landing_page  converted  hour time_range
0   851104   11:48.6    control     old_page          0    11       0-20
1   804228   01:45.2    control     old_page          0     1       0-20
2   661590   55:06.2  treatment     new_page          0    55      40-60
3   853541   28:03.1  treatment     new_page          0    28      20-40
4   864975   52:26.2    control     old_page          1    52      40-60

Method 1

fvalue,pvl = stats.f_oneway(df[df['time_range']=='0-20']['converted'],df[df['time_range']=='20-40']['converted'],df[df['time_range']=='40-60']['converted'])
print('fvalue:{0} pvalue:{1}'.format(fvalue,pvl))
fvalue:0.6913612944665806 pvalue:0.5008945648250425
from statsmodels.formula.api import ols
df_melt = pd.melt(df,id_vars=['time_range'],value_vars=['converted'])
df_melt.head()
  time_range   variable  value
0       0-20  converted      0
1       0-20  converted      0
2      40-60  converted      0
3      20-40  converted      0
4      40-60  converted      1

Method 2

model = ols('converted~C(time_range)',data=df[['time_range','converted']]).fit()
anova_table = sm.stats.anova_lm(model,typ=2)
anova_table
                     sum_sq        df         F    PR(>F)
C(time_range)      0.145593       2.0  0.691361  0.500895
Residual       30596.496834  290581.0       NaN       NaN

The p-value is greater than 0.05, so with the current split into time ranges, time range has no significant effect on conversion.
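The F statistic reported by both methods is the between-group mean square divided by the within-group mean square. The sketch below computes it by hand in plain Python on hypothetical toy groups, then recomputes F from the sum_sq and df columns of the ANOVA table above. The function name `one_way_f` is illustrative.

```python
def one_way_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical toy groups whose means differ by one unit.
f_stat = one_way_f([1, 2, 3], [2, 3, 4], [3, 4, 5])

# Same ratio using sum_sq and df from the ANOVA table in this post.
f_from_table = (0.145593 / 2) / (30596.496834 / 290581)
print(f_stat, round(f_from_table, 4))
```

Recomputing the ratio from the table reproduces the reported F of about 0.6914, which shows how little of the variation in conversion lies between the time ranges.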

Demo: two-way ANOVA test

model_twoway = ols('converted~C(time_range) + C(landing_page) + C(time_range):C(landing_page)',data=df[['time_range','converted','landing_page']]).fit()
anova_table_twoway = sm.stats.anova_lm(model_twoway,typ=2)
anova_table_twoway
                                     sum_sq        df         F    PR(>F)
C(time_range)                      0.145309       2.0  0.690017  0.501568
C(landing_page)                    0.180666       1.0  1.715825  0.190232
C(time_range):C(landing_page)      0.184478       2.0  0.876015  0.416440
Residual                       30596.131690  290578.0       NaN       NaN

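The results show that the time range, the landing page, and the interaction between time range and landing page all have no significant effect on conversion.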
