What Is Statistical Significance? P Value Defined and How to Calculate It

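The P value is an important concept in statistical analysis, used to infer statistical significance. It is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one in the data. A low P value means such a result would be unlikely under the null hypothesis, which may justify rejecting it. This article uses hypothesis testing, a Chi-squared example, and a discussion of common misconceptions to explain how P values are calculated and why interpreting them correctly matters.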


P values are one of the most widely used concepts in statistical analysis. Researchers, analysts and statisticians use them to draw insights from data and make informed decisions.

Along with statistical significance, they are also one of the most widely misused and misunderstood concepts in statistical analysis.

This article will explain:

  • how a P value is used for inferring statistical significance
  • how P values are calculated
  • and how to avoid some common misconceptions

Recap: Hypothesis testing

Hypothesis testing is a standard approach to drawing insights from data. It is used in virtually every quantitative discipline, and has a rich history going back over one hundred years.

The usual approach to hypothesis testing is to define a question in terms of the variables you are interested in. Then, you can form two opposing hypotheses to answer it.

  • The null hypothesis claims there is no statistically significant relationship between the variables

  • The alternative hypothesis claims there is a statistically significant relationship between the variables

For example, say you are testing whether caffeine affects programming productivity. There are two variables you are interested in: the dose of caffeine, and the productivity of a group of software developers.

The null hypothesis would be:

  • "Caffeine intake has no significant effect on programming productivity".

The alternative hypothesis would be:

  • "Caffeine intake does have a significant effect on productivity".

The word 'significant' has a very specific meaning here. It refers to a relationship between variables that exists due to something more than chance alone.

That is, the relationship exists (at least in part) due to 'real' differences or effects between the variables.

The next step is to collect some data to test the hypotheses. This could come from an experiment or survey, or from a set of data you already have access to.

The final step is to calculate a test statistic from the data. This is a single number that summarizes some characteristic of your data. Examples include the statistics produced by the t-test, the Chi-squared test, and the Kruskal-Wallis test, among many others.

Exactly which one to calculate will depend on the question you are asking, the structure of your data, and the distribution of your data.

Here's a handy cheatsheet for your reference.

In the caffeine example, a suitable test might be a two-sample t-test.

You will end up with a single test statistic from your data. All that is left to do is interpret this result to determine whether it is consistent with the null hypothesis or gives grounds to reject it.
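As a rough illustration of what "a single test statistic" means for the caffeine example, the sketch below computes a two-sample t statistic by hand and compares it with `scipy.stats.ttest_ind`. The productivity scores, group sizes, and variable names are invented for illustration, and it assumes Welch's version of the test (unequal variances), which is one reasonable choice rather than the only one.

```python
import numpy as np
from scipy import stats

# Hypothetical productivity scores (e.g. tasks completed per day); not real data.
rng = np.random.default_rng(seed=0)
no_caffeine = rng.normal(loc=10.0, scale=2.0, size=30)
caffeine = rng.normal(loc=11.0, scale=2.0, size=30)

# Welch's t statistic: difference in sample means divided by its standard error.
mean_diff = caffeine.mean() - no_caffeine.mean()
std_err = np.sqrt(
    caffeine.var(ddof=1) / caffeine.size
    + no_caffeine.var(ddof=1) / no_caffeine.size
)
t_manual = mean_diff / std_err

# The same statistic from SciPy (equal_var=False selects Welch's test).
t_scipy = stats.ttest_ind(caffeine, no_caffeine, equal_var=False).statistic

print(f"t (by hand) = {t_manual:.4f}, t (SciPy) = {t_scipy:.4f}")
```

Both numbers should agree; the point is only that the whole dataset is reduced to one value that can then be judged against the null hypothesis.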

This is where P values come into play.

How unlikely is this statistic?

Recall that you have calculated a test statistic, which represents some characteristic of your data. You want to understand whether it is consistent with the null hypothesis or gives grounds to reject it.

The approach taken is to assume the null hypothesis is true. That is, assume there are no significant relationships between the variables you are interested in.

Then, look at the data you have collected. How likely would your test statistic be if the null hypothesis really is true?

Let's refer back to the caffeine intake example from before.

  • Say that productivity levels were split about evenly between developers, regardless of whether they drank caffeine or not (graph A). This result would be likely to occur by chance if the null hypothesis were true.

  • However, suppose that almost all of the highest productivity was seen in developers who drank caffeine (graph B). This is a more 'extreme' result, and would be unlikely to occur just by chance if the null hypothesis were true.

But how 'extreme' does a result need to be before it is considered too unlikely to support the null hypothesis?

This is what a P value lets you estimate. It provides a numerical answer to the question: "if the null hypothesis is true, what is the probability of a result this extreme or more extreme?"

P values are probabilities, so they are always between 0 and 1.

  • A high P value indicates the observed results are likely to occur by chance under the null hypothesis.

  • A low P value indicates that the results are less likely to occur by chance under the null hypothesis.
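One way to make "the probability of a result this extreme or more extreme" concrete is a permutation test. The sketch below is an illustration under stated assumptions, not the article's own method: the scores are invented, the test statistic is simply the difference in group means, and the null hypothesis is simulated by shuffling the group labels.

```python
import numpy as np

# Hypothetical productivity scores; not real data.
no_caffeine = np.array([9.1, 10.4, 8.7, 11.0, 9.8, 10.2, 9.5, 10.9])
caffeine = np.array([11.2, 12.5, 10.1, 12.9, 11.7, 10.8, 12.2, 11.5])

# Observed test statistic: difference in group means.
observed = caffeine.mean() - no_caffeine.mean()

# Under the null hypothesis the labels are interchangeable, so shuffle them.
rng = np.random.default_rng(seed=1)
pooled = np.concatenate([caffeine, no_caffeine])
n_caffeine = caffeine.size
n_permutations = 10_000

count_extreme = 0
for _ in range(n_permutations):
    shuffled = rng.permutation(pooled)
    diff = shuffled[:n_caffeine].mean() - shuffled[n_caffeine:].mean()
    # Two-sided: count shuffles at least as extreme as the observed difference.
    if abs(diff) >= abs(observed):
        count_extreme += 1

# Add-one correction keeps the estimate strictly above zero.
p_value = (count_extreme + 1) / (n_permutations + 1)
print(f"observed difference = {observed:.4f}, estimated P value = {p_value:.4f}")
```

The estimated P value is just the fraction of shuffled datasets that look at least as extreme as the real one, which is the definition given above turned into a counting exercise.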

Usually, a threshold is chosen to determine statistical significance. This threshold is often denoted α.

If the P value is below the threshold, your results are 'statistically significant'. This means you can reject the null hypothesis in favor of the alternative hypothesis.

There is no one-size-fits-all threshold suitable for all applications. Usually, a somewhat arbitrary threshold will be used that is appropriate for the context.

For example, in fields such as ecology and evolution, it is difficult to control experimental conditions because many factors can affect the outcome. It can also be difficult to collect very large sample sizes. In these fields, a threshold of 0.05 will often be used.

In other contexts such as physics and engineering, a threshold of 0.01 or even lower will be more appropriate.
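The decision rule itself is a one-line comparison. The short sketch below uses a hypothetical helper, `is_significant`, and a made-up P value of 0.03 to show how the same result can lead to different conclusions under the two thresholds mentioned above.

```python
def is_significant(p_value: float, alpha: float) -> bool:
    """Return True when the P value falls below the chosen threshold alpha."""
    return p_value < alpha


p_value = 0.03  # hypothetical result, for illustration only

for alpha in (0.05, 0.01):
    decision = "reject" if is_significant(p_value, alpha) else "fail to reject"
    print(f"alpha = {alpha}: {decision} the null hypothesis")
```

The data have not changed between the two lines of output; only the threshold has.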

Chi-squared example

In this example, there are two (fictional) variables: region, and political party membership. It uses the Chi-squared test to see if there's a relationship between region and political party membership.


  • Null hypothesis: "there is no significant relationship between region and political party membership"

  • Alternative hypothesis: "there is a significant relationship between region and political party membership"
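A test of these two hypotheses can be sketched with `scipy.stats.chi2_contingency`. The membership counts, region names, and party labels below are invented for illustration and are not the figures from the original example.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical membership counts: rows are regions, columns are parties.
observed = np.array([
    [120, 90, 40],   # Region 1: Party A, Party B, Party C
    [110, 95, 45],   # Region 2: Party A, Party B, Party C
])

# Returns the Chi-squared statistic, the P value, the degrees of freedom,
# and the counts expected if region and party were independent.
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

print(f"chi-squared = {chi2_stat:.3f}, dof = {dof}, P value = {p_value:.4f}")
print("expected counts under the null hypothesis:")
print(np.round(expected, 1))
```

Changing the counts so that one party is concentrated in one region should raise the Chi-squared statistic and lower the P value, while near-identical rows should do the opposite.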


Common misconceptions and how to avoid them

There are several mistakes that even experienced practitioners often make when using P values and hypothesis testing. This section aims to clear those up.

❌ The null hypothesis is uninteresting - if the data is good and the analysis is done right, then a null result is a valid conclusion in its own right.

A question worth answering should have an interesting answer - whatever the outcome.

❌ The P value is the probability of the null hypothesis being true - a P value represents "the probability of the results, given the null hypothesis being true". This is not the same as "the probability of the null hypothesis being true, given the results".

P(Data | Hypothesis) ≠ P(Hypothesis | Data)

This means a low P value tells you: "if the null hypothesis is true, these results are unlikely". It does not tell you: "given these results, the null hypothesis is unlikely".
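A small Bayes' rule calculation shows how far apart the two conditional probabilities can be. All three inputs below (the prior probability that the null hypothesis is true, the threshold, and the power of the test) are assumptions picked purely for illustration, and the calculation conditions on "a significant result" rather than on an exact P value.

```python
# Assumed inputs, chosen only for illustration.
prior_null = 0.9  # P(H0): assumed share of tested hypotheses where the null is true
alpha = 0.05      # P(significant | H0): false positive rate at the threshold
power = 0.8       # P(significant | H1): chance of detecting a real effect

# Bayes' rule: P(H0 | significant) = P(significant | H0) * P(H0) / P(significant)
p_significant = alpha * prior_null + power * (1 - prior_null)
p_null_given_significant = alpha * prior_null / p_significant

print(f"P(significant | H0) = {alpha:.2f}")
print(f"P(H0 | significant) = {p_null_given_significant:.2f}")
```

With these assumed inputs the first probability is 0.05 while the second is 0.36, so a significant result still leaves a substantial chance that the null hypothesis is true.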

❌ You can use the same significance threshold for multiple comparisons - remember the definition of the P value. It is the probability of observing a test statistic at least this extreme by chance alone.

If you use a threshold of α = 0.05 (or 1-in-20) and you carry out, say, 20 stats tests, you might expect to find at least one low P value by chance alone.

You should use a lower threshold if you are carrying out multiple comparisons. There are correction methods that will let you calculate how much lower the threshold should be.
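The arithmetic behind this can be sketched in a few lines. The example assumes the 20 tests are independent and that every null hypothesis is true, and it uses the Bonferroni correction as one simple example of a correction method; other corrections exist.

```python
alpha = 0.05
n_tests = 20

# Expected number of false positives if every null hypothesis is true.
expected_false_positives = alpha * n_tests

# Chance of at least one false positive across all tests (about 0.64 here).
familywise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: divide the threshold by the number of tests.
bonferroni_alpha = alpha / n_tests
familywise_error_corrected = 1 - (1 - bonferroni_alpha) ** n_tests

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"chance of at least one false positive: {familywise_error:.3f}")
print(f"Bonferroni threshold per test: {bonferroni_alpha:.4f}")
print(f"chance of at least one false positive after correction: "
      f"{familywise_error_corrected:.3f}")
```

Under these assumptions the corrected per-test threshold of 0.0025 brings the overall chance of a false positive back below 0.05.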

❌ The significance threshold means anything at all - it is entirely arbitrary. 0.05 is just a convention. The difference between p = 0.049 and p = 0.051 is pretty much the same as between p = 0.039 and p = 0.041.

This is one of the biggest weaknesses of hypothesis testing this way. It forces you to draw a line in the sand, even though no line can easily be drawn.

Therefore, always consider significance thresholds for what they are - totally arbitrary.

❌ Statistical significance means chance plays no part - far from it. Often, there are many causes for a given outcome. Some will be random, others less so.

Finding one non-random cause doesn't mean it explains all the differences between your variables. It is important not to confuse statistical significance with "effect size".
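The gap between significance and effect size is easy to see with a large sample. The simulation below is a sketch under assumed numbers: two groups of 100,000 simulated values whose true means differ by only 0.05 standard deviations, with Cohen's d used as one common effect size measure.

```python
import numpy as np
from scipy import stats

# Simulated data with a deliberately tiny true difference between groups.
rng = np.random.default_rng(seed=42)
n = 100_000
group_a = rng.normal(loc=0.00, scale=1.0, size=n)
group_b = rng.normal(loc=0.05, scale=1.0, size=n)

# Statistical significance: two-sample t-test.
p_value = stats.ttest_ind(group_b, group_a).pvalue

# Effect size: Cohen's d, the mean difference in units of pooled standard deviation.
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

print(f"P value = {p_value:.2e}")
print(f"Cohen's d = {cohens_d:.3f}")
```

The P value comes out far below any usual threshold, yet the effect size stays well under 0.2, which is conventionally regarded as small. A significant result here says the difference is unlikely to be pure chance, not that it is large.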

❌ P values are the only way to determine statistical significance - there are other approaches which are sometimes better.

As well as classical hypothesis testing, consider other approaches - such as using Bayes factors, or False Positive Risk instead.

Translated from: https://www.freecodecamp.org/news/what-is-statistical-significance-p-value-defined-and-how-to-calculate-it/


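### Why the P value cannot be computed exactly in connection operations

In statistics, when complex data structures or interactions between models are involved, and especially when some form of "connection" is used to build a more complex inference framework, the P value can be hard to compute exactly. This stems mainly from the following:

#### 1. Data dependence and non-independence

Many classical hypothesis tests (such as the t-test and the Chi-squared test) rest on one core premise: the observations are independent of one another. With connection operations, however, for example when the results of several sub-models are combined into a final conclusion, the different parts can introduce implicit correlation or dependence[^2]. This correlation violates the independent and identically distributed (i.i.d.) condition that traditional tests require, so the traditional P value calculation no longer applies.

For example, in some ensemble learning methods (such as random forests or boosting), the base classifiers are trained separately but remain correlated with one another. Similarly, state transitions between successive time steps in time series analysis produce the same kind of effect. In these cases the error terms become more complex and are hard to express analytically for the subsequent probability evaluation[^3].

#### 2. Approximation under complex distributions

Even if the interaction pattern among all variables and the corresponding joint probability density function (pdf) can be specified clearly, the actual computation is still very challenging. For multivariate normal or other special distributions in high-dimensional spaces, directly integrating to obtain the cumulative distribution function (CDF) is often very difficult, or the amount of manual derivation makes it impractical. As a result, numerical simulation or other simplifications are often used to estimate the value instead of obtaining an exact answer[^1].

Specifically, when several simple events are chained together into a new compound proposition, and the aim is to derive the cutoff for rejecting the null hypothesis H₀ at significance level α, the behavior of the overall system may fall outside the usual expectations if each component already has a fairly complex internal structure. Relying solely on table lookup for critical values is then not accurate or reliable enough.

#### 3. Computational resource limits and efficiency

Beyond the theoretical obstacles, practical factors also limit how precisely the true p-value can be determined in a given scenario. Modern research often deals with very large datasets and with target conditions that keep changing, so that schemes must be adjusted, updated, and re-optimized repeatedly, all of which consumes a large number of CPU/GPU cycles. Trading a little precision for speed is therefore sometimes an entirely acceptable choice.

In summary, for the reasons above, in many cases we can only obtain an approximate interval estimate of the probability that a given statistic S is at least this extreme under the null hypothesis, rather than a single exact value.

```python
import scipy.stats as stats


# Example of calculating a two-tailed p-value from a t-statistic
# with given degrees of freedom.
def calculate_p_value(t_stat, df):
    """
    Calculate the two-tailed p-value based on the provided t-statistic
    and degrees of freedom.

    Parameters:
        t_stat : float
            The calculated t-statistic value.
        df : int
            Degrees of freedom associated with the sample data used to
            compute `t_stat`.

    Returns:
        float: Two-tailed p-value corresponding to the input parameters.
    """
    # Survival function gives the upper-tail probability P(T > |t_stat|).
    prob_one_tail = stats.t.sf(abs(t_stat), df=df)
    return 2 * prob_one_tail


example_t_stat = 2.086
degrees_of_freedom = 9
calculated_p_val = calculate_p_value(example_t_stat, degrees_of_freedom)
print(f"P-value for t={example_t_stat} at {degrees_of_freedom} DF is approximately {calculated_p_val:.4f}.")
```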