P-value and Confidence Level

Original post, first published 2025-01-26 11:54:27, last modified 2025-01-26 13:19:31.

License: CC 4.0 BY-SA

Tags: #Data Analysis


P-value

The p-value is a measure used in statistical hypothesis testing to help you determine the significance of your results. Here’s a step-by-step breakdown of what it represents:

Null Hypothesis (H0): This is the default assumption that there is no effect or no difference. For example, if you’re testing whether a new drug is effective, the null hypothesis might state that the new drug has no effect compared to a placebo.
Alternative Hypothesis (H1): This is the hypothesis that there is an effect or a difference. Continuing with the drug example, the alternative hypothesis would state that the new drug is effective.
Calculation of the P-value: When you perform a statistical test, you compute a test statistic from the data and then a p-value, which quantifies the evidence against the null hypothesis. The p-value is the probability of obtaining a test statistic at least as extreme as the one computed from the observed data, assuming that the null hypothesis is true.
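To make the "at least as extreme" definition concrete, here is a minimal Python sketch of an exact two-sided test. The scenario is hypothetical and not from the original post: H0 says a coin is fair, and we observe 60 heads in 100 flips. The helper name `binomial_two_sided_p` is also made up for illustration; it sums the null probabilities of every outcome at least as far from the expected 50 heads as the observed count.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for H0: heads probability is 0.5 (hypothetical helper)."""
    expected = n / 2
    observed_distance = abs(k - expected)
    # Under H0 every outcome x has probability comb(n, x) / 2**n, so we can
    # count the outcomes at least as extreme as k and divide once at the end.
    extreme_count = sum(
        comb(n, x) for x in range(n + 1) if abs(x - expected) >= observed_distance
    )
    return extreme_count / 2 ** n


p_value = binomial_two_sided_p(60, 100)
print(f"p-value for 60 heads in 100 flips: {p_value:.4f}")
```

Here the p-value lands a little above 0.05. At α = 0.05 the data would therefore not be enough to reject the fair-coin hypothesis, even though 60 heads looks lopsided.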
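Put plainly: assume the null hypothesis is true, compute the test statistic from the observed data, and ask how probable a value at least as extreme would be under the null distribution. That probability is the p-value. To reject the null hypothesis, the significance level α (chosen before looking at the data) must be at least as large as p, i.e. we reject when p ≤ α. α is also the Type I error rate: the probability of rejecting the null hypothesis when it is actually true (claiming an effect when there is none).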
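The claim that α is the Type I error rate can be checked by simulation. The sketch below uses only the Python standard library and assumes a one-sample z-test with a known standard deviation, a setup chosen for simplicity rather than taken from the post. It generates many datasets for which H0 is true by construction and counts how often the rule p ≤ α wrongly rejects it. The function name `z_test_p_value` is hypothetical.

```python
import random
from statistics import NormalDist


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value of a one-sample z-test (sigma assumed known)."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    # Probability of a |Z| at least as large as the observed |z| under H0.
    return 2 * (1 - NormalDist().cdf(abs(z)))


random.seed(0)
alpha = 0.05
trials = 2000
rejections = 0
for _ in range(trials):
    # H0 (mean = 0) is true by construction: the data really come from N(0, 1).
    sample = [random.gauss(0, 1) for _ in range(30)]
    if z_test_p_value(sample, mu0=0, sigma=1) <= alpha:
        rejections += 1  # a Type I error: rejecting a true H0

type_i_rate = rejections / trials
print(f"Rejected a true H0 in {type_i_rate:.1%} of experiments")
```

With α = 0.05 the rejection rate should come out close to 5%; the exact figure varies slightly with the random seed.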
Interpretation:
- Low P-value (typically ≤ 0.05): This suggests that the observed data is unlikely under the null hypothesis, leading you to reject the null hypothesis in favor of the alternative hypothesis.
- High P-value (typically > 0.05): This indicates that the observed data is consistent with the null hypothesis, so there is not enough evidence to reject it. This does not prove that the null hypothesis is true.
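  Suppose an A/B test finds that groups A and B differ by 5% and the difference is significant. We still cannot claim that the true difference is 5%. The null hypothesis was "no difference", so the test only lets us reject "no difference"; it does not confirm that the effect is 5%.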
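The point about the 5% difference can be illustrated with a two-proportion z-test. The numbers below are hypothetical (200 of 1,000 users convert in group A and 250 of 1,000 in group B, a 5-percentage-point gap), and `ab_test` is an illustrative helper, not a library function. Alongside the p-value it reports a simple Wald 95% confidence interval for the difference, which shows how loosely the data pin down the actual size of the effect.

```python
from statistics import NormalDist


def ab_test(conv_a, n_a, conv_b, n_b, level=0.95):
    """Two-proportion z-test plus a Wald interval for the difference p_b - p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Pooled standard error for the test, since H0 assumes p_a == p_b.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval.
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, p_value, (diff - z_crit * se, diff + z_crit * se)


diff, p_value, (low, high) = ab_test(200, 1000, 250, 1000)
print(f"difference = {diff:.3f}, p = {p_value:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The p-value is well below 0.05, so "no difference" is rejected, yet the interval spans roughly 1.3 to 8.7 percentage points. The data support "B is better" much more firmly than "B is exactly 5 points better".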

Confidence Level

The confidence level is associated with confidence intervals and reflects how confident you are that a parameter lies within a specified range. Here’s how it works:
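In short: a confidence interval is a range computed from the sample data, and the confidence level describes how often intervals constructed this way contain the true population parameter.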
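A minimal sketch of building such an interval for a population mean, using only Python's standard library. It assumes the sample is large enough for the normal approximation to be reasonable (for small samples a t critical value, e.g. from `scipy.stats.t.ppf`, would be the usual choice). The simulated data and the helper name `mean_confidence_interval` are hypothetical. Computing the interval at 90%, 95% and 99% also shows that a higher confidence level gives a wider interval.

```python
import random
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(data, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(data)
    center = mean(data)
    standard_error = stdev(data) / n ** 0.5
    # Two-sided critical value, e.g. about 1.96 for level=0.95.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return center - z_crit * standard_error, center + z_crit * standard_error


# Simulated measurements standing in for real sample data.
random.seed(42)
data = [random.gauss(12.0, 0.5) for _ in range(100)]

for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(data, level)
    print(f"{level:.0%} CI: ({low:.3f}, {high:.3f}), width {high - low:.3f}")
```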

Confidence Interval: This is a range of values, derived from the sample data, that is likely to contain the true population parameter (e.g., mean, proportion) with a certain level of confidence.
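Confidence Level: This is the long-run proportion of intervals, built by the same procedure from repeated samples, that contain the true population parameter. Equivalently, it is the probability, before the sample is drawn, that the procedure will produce an interval covering the parameter; once a particular interval has been computed, it either contains the parameter or it does not. Common confidence levels are 90%, 95%, and 99%.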
- 95% Confidence Level: If you were to draw 100 independent samples of the same size and compute a 95% confidence interval from each one, approximately 95 of those intervals would contain the true population parameter.
Interpretation: A higher confidence level means that you can be more certain that the interval contains the parameter, but it also results in a wider interval. Conversely, a lower confidence level means a narrower interval but less certainty.
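The repeated-sampling reading of a 95% confidence level can be checked with a short simulation. This sketch assumes a normal population with known standard deviation; the values of `TRUE_MEAN`, `SIGMA` and `N` are arbitrary choices for illustration. It draws 1,000 samples, builds a 95% interval around each sample mean, and counts how many intervals contain the true mean.

```python
import random
from statistics import NormalDist

random.seed(1)
TRUE_MEAN, SIGMA, N = 10.0, 2.0, 25      # hypothetical population and sample size
z_crit = NormalDist().inv_cdf(0.975)     # about 1.96 for a two-sided 95% interval
half_width = z_crit * SIGMA / N ** 0.5   # sigma treated as known, so width is fixed

covered = 0
repeats = 1000
for _ in range(repeats):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    center = sum(sample) / N
    # Does this particular interval contain the true population mean?
    if center - half_width <= TRUE_MEAN <= center + half_width:
        covered += 1

coverage = covered / repeats
print(f"{covered} of {repeats} intervals contain the true mean ({coverage:.1%})")
```

Roughly 950 of the 1,000 intervals should cover the true mean, while each individual interval either contains it or does not.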

Relationship Between P-value and Confidence Level

Both concepts are related to statistical inference but serve different purposes:

The p-value helps you decide whether to reject the null hypothesis.
The confidence level helps you estimate a range within which the true parameter lies.
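The two ideas are also linked in a concrete way: for many standard tests, a two-sided test at level α rejects H0: μ = μ0 exactly when the (1 - α) confidence interval excludes μ0. The sketch below checks this for a one-sample z-test with known σ, where the equivalence is exact. The data are simulated and `z_test_and_ci` is a hypothetical helper; for other tests the correspondence holds only when the interval is built by inverting the same test.

```python
import random
from statistics import NormalDist


def z_test_and_ci(sample, mu0, sigma, alpha=0.05):
    """Return (p-value, (1 - alpha) CI) for a one-sample z-test with known sigma."""
    n = len(sample)
    center = sum(sample) / n
    se = sigma / n ** 0.5
    z = (center - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return p_value, (center - z_crit * se, center + z_crit * se)


random.seed(2)
agree = 0
for _ in range(500):
    # True mean 0.3, so H0: mu = 0 is sometimes rejected and sometimes not.
    sample = [random.gauss(0.3, 1) for _ in range(20)]
    p_value, (low, high) = z_test_and_ci(sample, mu0=0, sigma=1)
    rejects = p_value < 0.05
    excludes_mu0 = not (low <= 0 <= high)
    agree += rejects == excludes_mu0   # count samples where both decisions match
print(f"test and interval agree in {agree} of 500 samples")
```

Every simulated sample should give the same decision either way, because p < 0.05, |z| > 1.96, and "μ0 lies outside the 95% interval" are the same condition written three ways.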