1. Overview and Descriptive Statistics

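This article introduces the basic concepts of statistics, including populations, samples, and variables, and explains the difference between descriptive and inferential statistics. It also covers data collection and methods for displaying data, such as stem-and-leaf displays, dotplots, and histograms.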


1. Populations, Samples, and Processes

An investigation will typically focus on a well-defined collection of objects constituting a population of interest.

When desired information is available for all objects in the population, we have what is called a census.

A subset of the population -- a sample -- is selected in some prescribed manner.

 

A variable is any characteristic whose value may change from one object to another in the population.

  • A univariate data set consists of observations on a single variable.
  • Bivariate data arise when observations are made on each of two variables.
  • Multivariate data arise when observations are made on more than two variables.

Branches of Statistics

  • An investigator who has collected data may wish simply to summarize and describe important features of the data. This entails using methods from descriptive statistics.
  • Techniques for generalizing from a sample to a population are gathered within the branch of our discipline called inferential statistics.

Enumerative Versus Analytic Studies

  • In enumerative studies, interest is focused on a finite, identifiable, unchanging collection of individuals or objects that make up a population.
  • Analytic studies are often carried out with the objective of improving a future product by taking action on a process of some sort.

Collecting Data

 

2. Pictorial and Tabular Methods in Descriptive Statistics

Notation

The number of observations in a single sample will often be denoted by n.

Given a data set consisting of n observations on some variable x, the individual observations will be denoted by x_1, x_2, x_3, ..., x_n.

Stem-and-Leaf Displays

Steps for Constructing a Stem-and-Leaf Display

  1. Select one or more leading digits for the stem values. The trailing digits become the leaves.
  2. List possible stem values in a vertical column.
  3. Record the leaf for every observation beside the corresponding stem value.
  4. Indicate the units for stems and leaves someplace in the display.
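The four steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the original notes: the data are non-negative integers, the ones digit is the leaf, and all leading digits form the stem. The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display.

    Assumes non-negative integers; the ones digit is the leaf and the
    remaining leading digits are the stem (step 1).
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)

    lines = []
    # Step 2: list every possible stem in a column, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 3: record the leaves beside the corresponding stem.
        row = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}".rstrip())
    # Step 4: indicate the units for stems and leaves.
    lines.append("Stem: tens digit   Leaf: ones digit")
    return lines


for line in stem_and_leaf([12, 15, 21, 23, 23, 40]):
    print(line)
```

Listing the empty stem (here, 3) keeps gaps in the data visible, which is one reason for listing all possible stems rather than only the observed ones.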

Dotplots

A dotplot is an attractive summary of numerical data when the data set is reasonably small or there are relatively few distinct data values. Each observation is represented by a dot above the corresponding location on a horizontal measurement scale.
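As a rough illustration, a dotplot can be drawn as text by stacking one dot per observation above its value. The sketch below assumes integer-valued data so that each value maps to one character column; `text_dotplot` is a hypothetical helper, not something from the original notes.

```python
from collections import Counter


def text_dotplot(values):
    """Return the rows of a text dotplot for integer data.

    Each column is one integer location on the horizontal scale, and
    repeated values are stacked vertically.
    """
    counts = Counter(values)
    lo, hi = min(counts), max(counts)
    height = max(counts.values())

    rows = []
    for level in range(height, 0, -1):
        row = "".join(
            "." if counts.get(x, 0) >= level else " "
            for x in range(lo, hi + 1)
        )
        rows.append(row.rstrip())
    rows.append("-" * (hi - lo + 1))  # the horizontal measurement scale
    return rows


for row in text_dotplot([1, 1, 2, 4]):
    print(row)
```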

Histograms

  • A variable is discrete if its set of possible values either is finite or else can be listed in an infinite sequence.
  • A variable is continuous if its possible values consist of an entire interval on the number line.

Consider data consisting of observations on a discrete variable x.

  • The frequency of any particular x value is the number of times that value occurs in the data set.
  • The relative frequency of a value is the fraction or proportion of times the value occurs.
  • A frequency distribution is a tabulation of the frequencies and/or relative frequencies.
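The three definitions above translate directly into code. The following is a minimal sketch using the standard library's `Counter`; the function name `frequency_distribution` and the sample data are made up for illustration.

```python
from collections import Counter


def frequency_distribution(observations):
    """Map each distinct value to (frequency, relative frequency)."""
    counts = Counter(observations)
    n = len(observations)
    return {
        value: (freq, freq / n)
        for value, freq in sorted(counts.items())
    }


# Hypothetical data: eight observations on a discrete variable x.
table = frequency_distribution([0, 1, 1, 2, 1, 0, 3, 1])
for value, (freq, rel) in table.items():
    print(value, freq, rel)
```

The relative frequencies always sum to 1 (up to floating-point rounding), which is a quick sanity check on any hand-built table.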

Histogram Shapes

Histograms come in a variety of shapes.

  • A unimodal histogram is one that rises to a single peak and then declines.
  • A bimodal histogram has two different peaks.
  • A multimodal histogram has more than two peaks.

 

  • A histogram is symmetric if the left half is the mirror image of the right half.
  • A unimodal histogram is positively skewed if the right or upper tail is stretched out compared with the left or lower tail, and negatively skewed if the stretching is to the left.

Qualitative Data

Multivariate Data

 

3. Measures of Location

The Mean

For a given set of numbers x_1, x_2, x_3, ..., x_n, the most familiar and useful measure of the center is the mean, or arithmetic average, of the set.

The Median

The word median is synonymous with "middle", and the sample median is indeed the middle value when the observations are ordered from smallest to largest.
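A small sketch of the mean and the median in Python follows. One detail is an assumption not spelled out in the notes: when n is even there is no single middle value, so the code uses the common convention of averaging the two middle values. The function names are illustrative.

```python
def sample_mean(xs):
    """Arithmetic average of the observations."""
    return sum(xs) / len(xs)


def sample_median(xs):
    """Middle value of the ordered observations.

    For even n, the two middle values are averaged (a common
    convention, assumed here).
    """
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [1, 2, 3, 4, 100]  # hypothetical sample with one extreme value
print(sample_mean(data))    # 22.0
print(sample_median(data))  # 3
```

The single extreme value pulls the mean far from the bulk of the data, while the median is unaffected.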

Other Measures of Location: Quartiles, Percentiles, and Trimmed Means

A trimmed mean is a compromise between the mean and the median. A 10% trimmed mean, for example, would be computed by eliminating the smallest 10% and the largest 10% of the sample and then averaging what is left over.
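The trimming procedure can be sketched as below. The notes do not say what to do when the trimming percentage does not correspond to a whole number of observations; this sketch simply rounds the count down, which is an assumption (other treatments, such as interpolating between two trimmed means, exist). The name `trimmed_mean` and the integer `percent` parameter are illustrative.

```python
def trimmed_mean(xs, percent):
    """Mean after dropping the smallest and largest `percent`% of the sample.

    `percent` is an integer such as 10. The number of observations
    dropped from each end is rounded down (an assumption).
    """
    ordered = sorted(xs)
    k = len(ordered) * percent // 100  # observations dropped from each end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming removed every observation")
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical sample
print(trimmed_mean(data, 10))  # 5.5
```

With `percent=0` the function returns the ordinary mean, and as the percentage approaches 50 the result approaches the median.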

Categorical Data and Sample Proportions

 

4. Measures of Variability

Measures of Variability for Sample Data

The simplest measure of variability in a sample is the range, which is the difference between the largest and smallest sample values.

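The sample variance, denoted by s^2, is the sum of squared deviations from the sample mean x̄ divided by n − 1: s^2 = Σ(x_i − x̄)^2 / (n − 1).

The sample standard deviation, denoted by s, is the positive square root of the variance: s = √(s^2).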
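The range, sample variance, and sample standard deviation can be computed as in the sketch below. It assumes the usual definition of the sample variance, in which the sum of squared deviations from the sample mean is divided by n − 1. The function names are illustrative.

```python
import math


def sample_range(xs):
    """Difference between the largest and smallest sample values."""
    return max(xs) - min(xs)


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


data = [1, 2, 3, 4, 5]  # hypothetical sample
print(sample_range(data))     # 4
print(sample_variance(data))  # 2.5
print(sample_std(data))
```

The standard library's `statistics.variance` and `statistics.stdev` use the same n − 1 divisor and can serve as a cross-check.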

Motivation for s^2

We will use σ^2 to denote the population variance and σ to denote the population standard deviation.

It is customary to refer to s^2 as being based on n − 1 degrees of freedom (df).

This terminology results from the fact that although s^2 is based on the n quantities x_1 − x̄, x_2 − x̄, ..., x_n − x̄, these sum to 0, so specifying the values of any n − 1 of the quantities determines the remaining value. For example, if n = 4 and x_1 − x̄ = 8, x_2 − x̄ = −6, and x_4 − x̄ = −4, then automatically x_3 − x̄ = 2, so only 3 of the 4 values of x_i − x̄ are freely determined (3 df).

A Computing Formula for s^2
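The notes leave this heading without a formula. The shortcut usually given under this name, stated here as an assumption about what was intended, rewrites the numerator of the sample variance as S_xx = Σ(x_i − x̄)^2 = Σx_i^2 − (Σx_i)^2 / n, so the deviations never have to be computed one by one. The sketch below checks that both forms agree on a small made-up sample; the function names are illustrative.

```python
def sxx_from_deviations(xs):
    """S_xx computed from the definition: sum of squared deviations."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)


def sxx_shortcut(xs):
    """S_xx computed as (sum of squares) - (square of the sum) / n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


def variance_shortcut(xs):
    """Sample variance via the shortcut numerator, divided by n - 1."""
    return sxx_shortcut(xs) / (len(xs) - 1)


data = [1, 2, 3, 4, 5]  # hypothetical sample
print(sxx_from_deviations(data))  # 10.0
print(sxx_shortcut(data))         # 10.0
print(variance_shortcut(data))    # 2.5
```

In floating-point arithmetic the shortcut can lose precision when the values are large relative to their spread, so the definitional form is often safer in software even though the shortcut is convenient by hand.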

Boxplots

After the n observations in a data set are ordered from smallest to largest, the lower fourth and upper fourth are given by:

lower fourth:

  • median of the smallest n/2 observations, n even
  • median of the smallest (n+1)/2 observations, n odd

upper fourth:

  • median of the largest n/2 observations, n even
  • median of the largest (n+1)/2 observations, n odd

That is, the lower (upper) fourth is the median of the smallest (largest) half of the data, where the median is included in both halves if n is odd. A measure of spread that is resistant to outliers is the fourth spread f_s, given by:

f_s = upper fourth − lower fourth
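The rule for the fourths can be sketched as follows: each half contains n/2 observations when n is even and (n+1)/2 when n is odd, so the middle observation is shared by both halves in the odd case. The median helper averages the two middle values for even-sized halves, which is an assumed convention; the function names are illustrative.

```python
def median(xs):
    """Middle value; the two middle values are averaged for even n."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def fourths(xs):
    """Return (lower fourth, upper fourth) of the sample."""
    ordered = sorted(xs)
    n = len(ordered)
    half = (n + 1) // 2  # n/2 if n is even, (n+1)/2 if n is odd
    lower = median(ordered[:half])
    upper = median(ordered[n - half:])
    return lower, upper


def fourth_spread(xs):
    """Upper fourth minus lower fourth."""
    lower, upper = fourths(xs)
    return upper - lower


print(fourths([1, 2, 3, 4, 5, 6, 7, 8]))     # (2.5, 6.5)
print(fourths([1, 2, 3, 4, 5, 6, 7]))        # (2.5, 5.5)
print(fourth_spread([1, 2, 3, 4, 5, 6, 7]))  # 3.0
```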

Boxplots that Show Outliers

Any observation farther than 1.5 f_s from the closest fourth is an outlier. An outlier is extreme if it is more than 3 f_s from the nearest fourth, and it is mild otherwise.
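The outlier rule can be sketched as a small classifier. The code recomputes the fourths as described earlier in this section (each half includes the median when n is odd) and measures how far an observation lies below the lower fourth or above the upper fourth. The function names and the sample data are hypothetical.

```python
def median(xs):
    """Middle value; the two middle values are averaged for even n."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def fourths(xs):
    """Return (lower fourth, upper fourth) of the sample."""
    ordered = sorted(xs)
    n = len(ordered)
    half = (n + 1) // 2
    return median(ordered[:half]), median(ordered[n - half:])


def classify_outliers(xs):
    """Split observations into mild and extreme outliers."""
    lower, upper = fourths(xs)
    fs = upper - lower  # fourth spread
    result = {"mild": [], "extreme": []}
    for x in sorted(xs):
        # Distance beyond the closest fourth; 0 for values between the fourths.
        distance = max(lower - x, x - upper, 0)
        if distance > 3 * fs:
            result["extreme"].append(x)
        elif distance > 1.5 * fs:
            result["mild"].append(x)
    return result


data = [1, 2, 3, 4, 5, 6, 7, 8, 17, 30]  # hypothetical sample
print(classify_outliers(data))  # {'mild': [17], 'extreme': [30]}
```

Here the fourths are 3 and 8, so f_s = 5; the value 17 lies 9 units above the upper fourth (more than 7.5 but not more than 15) and 30 lies 22 units above it.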

Comparative Boxplots

A comparative or side-by-side boxplot is a very effective way of revealing similarities and differences between two or more data sets consisting of observations on the same variable.

Reposted from: https://www.cnblogs.com/cyoutetsu/p/6801925.html
