What Is Statistical Significance? P Value Defined and How to Calculate It

Original article: https://www.freecodecamp.org/news/what-is-statistical-significance-p-value-defined-and-how-to-calculate-it/

P values are one of the most widely used concepts in statistical analysis. They are used by researchers, analysts and statisticians to draw insights from data and make informed decisions.

Along with statistical significance, they are also one of the most widely misused and misunderstood concepts in statistical analysis.

This article will explain:

how a P value is used for inferring statistical significance
how P values are calculated
and how to avoid some common misconceptions

Recap: Hypothesis testing

Hypothesis testing is a standard approach to drawing insights from data. It is used in virtually every quantitative discipline, and has a rich history going back over one hundred years.

The usual approach to hypothesis testing is to define a question in terms of the variables you are interested in. Then, you can form two opposing hypotheses to answer it.

The null hypothesis claims there is no statistically significant relationship between the variables
The alternative hypothesis claims there is a statistically significant relationship between the variables

For example, say you are testing whether caffeine affects programming productivity. There are two variables you are interested in - the dose of caffeine, and the productivity of a group of software developers.

The null hypothesis would be:

"Caffeine intake has no significant effect on programming productivity".

The alternative hypothesis would be:

"Caffeine intake does have a significant effect on productivity".

The word 'significant' has a very specific meaning here. It refers to a relationship between variables that exists because of something more than chance alone.

That is, the relationship exists (at least in part) because of 'real' differences or effects between the variables.

The next step is to collect some data to test the hypotheses. This could be collected from an experiment or survey, or from a set of data you have access to.

The final step is to calculate a test statistic from the data. This is a single number that summarizes some characteristic of your data. Examples include the statistics computed by the t-test, the Chi-squared test and the Kruskal-Wallis test, among many others.

Exactly which one to calculate will depend on the question you are asking, the structure of your data, and the distribution of your data.

Here's a handy cheatsheet for your reference.

In the caffeine example, a suitable test might be a two-sample t-test.
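
As a minimal sketch of that final step, the snippet below computes a two-sample t statistic using only Python's standard library. It assumes Welch's variant of the t-test (which does not require the two groups to have equal variances), and the productivity scores are made up for illustration, not data from the article. The function name `welch_t` is hypothetical. Turning the statistic into a P value needs the t distribution; a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` performs both steps.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance uses the n - 1 denominator (sample variance)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Made-up productivity scores, for illustration only
caffeine = [7.1, 6.8, 8.0, 7.5, 6.9, 7.7]
no_caffeine = [6.2, 6.9, 6.5, 7.0, 6.1, 6.4]

t = welch_t(caffeine, no_caffeine)
print(f"t = {t:.3f}")
```

A positive t here means the caffeine group's mean score is higher; how surprising that value is under the null hypothesis is exactly what the P value, discussed next, measures.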

You will end up with a single test statistic from your data. All that is left to do is interpret this result to determine whether it gives you grounds to reject the null hypothesis.

This is where P values come into play.

How unlikely is this statistic?

Recall that you have calculated a test statistic, which represents some characteristic of your data. You want to know whether it is consistent with the null hypothesis or gives you grounds to reject it.

The approach taken is to assume the null hypothesis is true. That is, assume there are no significant relationships between the variables you are interested in.

Then, look at the data you have collected. How likely would your test statistic be if the null hypothesis really were true?

Let's refer back to the caffeine intake example from before.

Say that productivity levels were split about evenly between developers, regardless of whether they drank caffeine or not (graph A). This result would be likely to occur by chance if the null hypothesis were true.
However, suppose that almost all of the highest productivity scores were seen in developers who drank caffeine (graph B). This is a more 'extreme' result, and would be unlikely to occur just by chance if the null hypothesis were true.

But how 'extreme' does a result need to be before it is considered too unlikely to support the null hypothesis?

This is what a P value lets you estimate. It provides a numerical answer to the question: "if the null hypothesis is true, what is the probability of a result this extreme or more extreme?"
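
A P value can also be estimated straight from this definition by simulation. The sketch below uses a permutation test, which is a different technique from the t-test mentioned earlier: if the null hypothesis is true, the 'caffeine' and 'no caffeine' labels are interchangeable, so shuffling them many times shows how often a difference in means at least as extreme as the observed one arises by chance alone. The function name, the made-up scores, the two-sided comparison and the fixed random seed are all assumptions for illustration.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided P value for a difference in means by shuffling group labels."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, any relabelling of the pooled data is equally likely
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:  # "this extreme or more extreme"
            at_least_as_extreme += 1
    # The +1 terms count the observed labelling itself, so the estimate is never exactly 0
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up productivity scores, for illustration only
caffeine = [7.1, 6.8, 8.0, 7.5, 6.9, 7.7]
no_caffeine = [6.2, 6.9, 6.5, 7.0, 6.1, 6.4]
p = permutation_p_value(caffeine, no_caffeine)
print(f"estimated P value = {p:.4f}")
```

The estimate is the fraction of shuffled datasets whose difference is at least as large as the real one, which is the probability described above, approximated by sampling rather than read from a distribution table.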

P values are probabilities, so they are always between 0 and 1.

A high P value indicates the observed results are likely to occur by chance under the null hypothesis.
A low P value indicates that the results are less likely to occur by chance under the null hypothesis.

Usually, a threshold is chosen to determine statistical significance. This threshold is often denoted α.

If the P value is below the threshold, your results are 'statistically significant'. This means you can reject the null hypothesis (and accept the alternative hypothesis).

There is no one-size-fits-all threshold suitable for all applications. Usually, a conventional threshold is chosen to suit the context, and the exact value is ultimately arbitrary.

For example, in fields such as ecology and evolution, it is difficult to control experimental conditions because many factors can affect the outcome. It can also be difficult to collect very large sample sizes. In these fields, a threshold of 0.05 will often be used.

In other contexts, such as physics and engineering, a threshold of 0.01 or even lower will be more appropriate.
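
The decision rule itself is simple to write down. In this minimal sketch the function name is hypothetical and α defaults to 0.05 only as an example; the two calls show that the same P value can be significant at one conventional threshold and not at a stricter one.

```python
def is_statistically_significant(p_value, alpha=0.05):
    """Return True if p_value is below the chosen significance threshold alpha."""
    if not 0.0 <= p_value <= 1.0:
        # P values are probabilities, so anything outside [0, 1] signals a bug upstream
        raise ValueError(f"P value must lie in [0, 1], got {p_value}")
    if not 0.0 < alpha < 1.0:
        raise ValueError(f"alpha must lie strictly between 0 and 1, got {alpha}")
    return p_value < alpha


p = 0.03  # an example P value
print(is_statistically_significant(p, alpha=0.05))  # True: 0.03 < 0.05
print(is_statistically_significant(p, alpha=0.01))  # False: 0.03 >= 0.01
```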

Chi-squared example

In this example, there are two (fictional) variables: region and political party membership. It uses the Chi-squared test to see whether there is a relationship between region and political party membership.

Null hypothesis: "there is no significant relationship between region and political party membership"
Alternative hypothesis: "there is a significant relationship between region and political party membership"
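
The calculation behind this example can be sketched in a few lines. The snippet below runs a Chi-squared test of independence on a table of made-up membership counts (rows are regions, columns are parties). To stay within the standard library, it computes the upper-tail Chi-squared probability with a power series for the regularized incomplete gamma function, which is adequate for moderate statistics; the function names are hypothetical. In practice `scipy.stats.chi2_contingency` does the same job, though note that it applies Yates' continuity correction to 2x2 tables by default and this sketch does not.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a Chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    # Power series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z**a * exp(-z) / Gamma(a) * sum_n z**n / (a * (a + 1) * ... * (a + n))
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)  # upper tail = 1 - lower tail


def chi_squared_test(table):
    """Chi-squared test of independence for a table of observed counts (rows x columns).

    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if region and party were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, chi2_sf(statistic, dof)


# Made-up counts: rows are regions, columns are parties A, B, C
members = [
    [30, 20, 10],  # hypothetical region 1
    [20, 25, 15],  # hypothetical region 2
]
statistic, dof, p = chi_squared_test(members)
print(f"chi-squared = {statistic:.3f}, df = {dof}, P = {p:.4f}")
```

Changing the counts and rerunning shows the same behaviour the article describes: tables that drift further from the "no relationship" expected counts give larger statistics and smaller P values.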

Common misconceptions and how to avoid them

There are several mistakes that even experienced practitioners often make about the use of P values and hypothesis testing. This section will aim to clear those up.

❌The null hypothesis is uninteresting - if the data is good and analysis is done right, then it is a valid conclusion in its own right.

✅A question worth answering should have an interesting answer - whatever the outcome.

❌P value is the probability of the null hypothesis being true - a P value represents "the probability of the results, given the null hypothesis being true". This is not the same as "the probability of the null hypothesis being true, given the results".

P(Data | Hypothesis) ≠ P(Hypothesis | Data)

✅This means a low P value tells you: "if the null hypothesis is true, these results are unlikely". It does not tell you: "given these results, the null hypothesis is unlikely to be true".
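
A small worked example with Bayes' theorem shows how far apart the two probabilities can be. The inputs are assumptions chosen for illustration, not figures from the article: suppose 90% of the null hypotheses being tested are actually true, the test has 80% power (it detects a real effect 80% of the time), and α = 0.05. The function name is hypothetical.

```python
def prob_null_given_significant(alpha, power, prior_null):
    """P(null hypothesis true | P value below alpha), via Bayes' theorem.

    alpha      -- P(significant | null true): the false positive rate
    power      -- P(significant | real effect): the true positive rate
    prior_null -- P(null true) before seeing the data (an assumption)
    """
    false_positives = alpha * prior_null
    true_positives = power * (1.0 - prior_null)
    return false_positives / (false_positives + true_positives)


# Assumed, illustrative inputs: not taken from the article
risk = prob_null_given_significant(alpha=0.05, power=0.8, prior_null=0.9)
print(f"P(null true | significant result) = {risk:.2f}")
```

Under these assumed numbers, about 36% of "significant" results come from true null hypotheses, even though each test used a 5% threshold. Different priors or power give different answers, which is the point: P(Data | Hypothesis) alone does not determine P(Hypothesis | Data).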

❌You can use the same significance threshold for multiple comparisons - remember the definition of the P value. It is the probability of observing a test statistic at least as extreme as yours by chance alone, assuming the null hypothesis is true.

If you use a threshold of α = 0.05 (or 1-in-20) and you carry out, say, 20 stats tests in which every null hypothesis is true, you would expect about one of them to produce a low P value by chance alone.

✅You should use a lower threshold if you are carrying out multiple comparisons. There are correction methods that will let you calculate how much lower the threshold should be.
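
The arithmetic behind this is short. The sketch below assumes the tests are independent and that every null hypothesis is true, and it uses the Bonferroni correction (dividing α by the number of tests) as one example of a correction method; Bonferroni is simple but conservative, and other methods such as Holm or Benjamini-Hochberg exist. The function names are illustrative.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one P value below alpha across n_tests independent
    tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the chance of any false positive at or below alpha."""
    return alpha / n_tests


alpha, n_tests = 0.05, 20
print(f"expected false positives:   {alpha * n_tests:.1f}")
print(f"P(at least one):            {family_wise_error_rate(alpha, n_tests):.3f}")
corrected = bonferroni_threshold(alpha, n_tests)
print(f"Bonferroni threshold:       {corrected:.4f}")
print(f"P(at least one), corrected: {family_wise_error_rate(corrected, n_tests):.3f}")
```

With α = 0.05 and 20 tests, the chance of at least one false positive is about 64%; with the Bonferroni threshold of 0.0025 per test, it drops just below 5%.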

❌The significance threshold means anything at all - it is entirely arbitrary. 0.05 is just a convention. The difference between p = 0.049 and p = 0.051 is pretty much the same as the difference between p = 0.039 and p = 0.041, yet only the first pair falls on opposite sides of the 0.05 line.

This is one of the biggest weaknesses of hypothesis testing done this way. It forces you to draw a line in the sand, even though no line can easily be drawn.

✅Therefore, always see significance thresholds for what they are - totally arbitrary.

❌Statistical significance means chance plays no part - far from it. Often, there are many causes for a given outcome. Some will be random, others less so.

✅Finding one non-random cause doesn't mean it explains all the differences between your variables. It is important not to confuse statistical significance with "effect size": a statistically significant effect can still be very small.
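
One way to see the difference is to compare an effect-size measure with the test statistic. The sketch below uses Cohen's d (the difference in means divided by the pooled standard deviation), a common effect-size measure that the article does not specifically name. For two equal-sized groups of n each, the pooled two-sample t statistic equals d multiplied by sqrt(n / 2), so a fixed, tiny effect produces an ever larger t, and therefore an ever smaller P value, as the sample grows. The function names are illustrative.

```python
import math
import statistics


def cohens_d(sample_a, sample_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd


def t_from_effect_size(d, n_per_group):
    """Pooled two-sample t statistic for two groups of equal size n_per_group."""
    return d * math.sqrt(n_per_group / 2)


# The same tiny effect (d = 0.05) at increasing sample sizes
for n in (100, 10_000, 1_000_000):
    print(f"n per group = {n:>9,}: t = {t_from_effect_size(0.05, n):.2f}")
```

The effect size stays at 0.05 throughout; only the sample size changes, yet t moves from well under 1 to far beyond any conventional critical value.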

❌P values are the only way to determine statistical significance - there are other approaches which are sometimes better.

✅As well as classical hypothesis testing, consider other approaches, such as Bayes factors or the False Positive Risk.
