Numerical Measures in R

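This article analyzes the waiting times between eruptions and the eruption durations of the Old Faithful geyser in Yellowstone National Park, Wyoming, USA. It computes the mean, median, and quartiles to describe the eruption pattern, and then examines the relationship between waiting time and eruption duration, quantifying it with the variance, standard deviation, covariance, and correlation coefficient.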

Data: faithful, the waiting time between eruptions and the duration of each eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.

> head(faithful)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
> summary(faithful)
   eruptions        waiting    
 Min.   :1.600   Min.   :43.0  
 1st Qu.:2.163   1st Qu.:58.0  
 Median :4.000   Median :76.0  
 Mean   :3.488   Mean   :70.9  
 3rd Qu.:4.454   3rd Qu.:82.0  
 Max.   :5.100   Max.   :96.0  

Mean, median, quantile

> mean(faithful$eruptions)
[1] 3.487783
> median(faithful$eruptions)
[1] 4
> quantile(faithful$eruptions)
     0%     25%     50%     75%    100% 
1.60000 2.16275 4.00000 4.45425 5.10000 
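
The quartiles reported by quantile() can also be used to derive the interquartile range, and the same function accepts other probabilities through its probs argument. The following is a minimal sketch using only base R; the variable names iqr_manual and deciles are illustrative and not part of the original session.

```r
# Sketch: interquartile range and custom quantiles for eruption durations.
x <- faithful$eruptions

# First and third quartiles (default interpolation, type 7).
q <- quantile(x, probs = c(0.25, 0.75))

# Interquartile range computed from the quartiles.
iqr_manual <- unname(q[2] - q[1])

# 10th and 90th percentiles via the probs argument.
deciles <- quantile(x, probs = c(0.10, 0.90))

print(iqr_manual)
print(IQR(x))      # built-in equivalent of iqr_manual
print(deciles)
```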
 

The sample variance is defined as

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

> var(faithful$eruptions)
[1] 1.302728

The standard deviation of an observation variable is the square root of its variance.

> sd(faithful$eruptions)
[1] 1.141371
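
As a check on var() and sd(), both quantities can be computed directly from the data. The sketch below assumes the usual sample definition with an n - 1 denominator; s2_manual and sd_manual are illustrative names.

```r
# Sketch: sample variance and standard deviation computed by hand.
x <- faithful$eruptions
n <- length(x)
xbar <- mean(x)

# Sum of squared deviations from the mean, divided by n - 1.
s2_manual <- sum((x - xbar)^2) / (n - 1)

# The standard deviation is the square root of the variance.
sd_manual <- sqrt(s2_manual)

print(c(variance = s2_manual, sd = sd_manual))
print(all.equal(s2_manual, var(x)))
print(all.equal(sd_manual, sd(x)))
```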

The covariance of two variables x and y in a data sample measures how the two are linearly related. A positive covariance indicates a positive linear relationship between the variables, and a negative covariance indicates the opposite.

The sample covariance is defined in terms of the sample means as:

$$ s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$

> cov(faithful$eruptions, faithful$waiting)
[1] 13.97781
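
The same value can be reproduced from the deviations about the two sample means. This is a minimal sketch in base R, assuming the n - 1 denominator that cov() uses by default; cov_manual is an illustrative name.

```r
# Sketch: sample covariance of eruption duration and waiting time by hand.
x <- faithful$eruptions
y <- faithful$waiting
n <- length(x)

# Average product of paired deviations, with the n - 1 denominator.
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

print(cov_manual)
print(all.equal(cov_manual, cov(x, y)))
```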

The correlation coefficient of two variables in a data sample is their covariance divided by the product of their individual standard deviations. It is a normalized measurement of how the two are linearly related.

The sample correlation coefficient is defined by the following formula, where s_x and s_y are the sample standard deviations, and s_xy is the sample covariance.

$$ r_{xy} = \frac{s_{xy}}{s_x s_y} $$

> cor(faithful$eruptions, faithful$waiting)
[1] 0.9008112
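
To illustrate the normalization, the sketch below rebuilds the coefficient from cov() and sd() and then rescales the waiting time from minutes to seconds. The covariance changes with the units while the correlation does not. The names r_manual and y_sec are illustrative.

```r
# Sketch: correlation as normalized covariance, and its invariance to units.
x <- faithful$eruptions
y <- faithful$waiting

# Covariance divided by the product of the standard deviations.
r_manual <- cov(x, y) / (sd(x) * sd(y))
print(r_manual)
print(all.equal(r_manual, cor(x, y)))

# Express waiting time in seconds instead of minutes.
y_sec <- y * 60
print(cov(x, y_sec))   # scales with the unit change
print(cor(x, y_sec))   # unchanged by the unit change
```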

The k-th central moment (or moment about the mean) of a data sample is:

$$ \mu_k = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^k $$

In particular, the second central moment of a population is its variance.

> library(moments)
> moment(faithful$eruptions, order = 3, central = TRUE)
[1] -0.6149059
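
Central moments are simple enough to compute without an extra package. The sketch below defines a hypothetical helper, central_moment(), that averages the k-th power of the deviations with a 1/n denominator. It also shows that the second central moment differs from var() by the factor (n - 1)/n, since var() divides by n - 1.

```r
# Sketch: k-th central moment with a 1/n denominator (hypothetical helper).
central_moment <- function(x, k) {
  mean((x - mean(x))^k)
}

x <- faithful$eruptions
n <- length(x)

m2 <- central_moment(x, 2)
m3 <- central_moment(x, 3)
print(m3)

# var() uses n - 1, so it must be rescaled to match the second central moment.
print(all.equal(m2, var(x) * (n - 1) / n))
```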

The skewness of a data population is defined by the following formula, where μ2 and μ3 are the second and third central moments.

$$ \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} $$

Intuitively, the skewness is a measure of asymmetry. As a rule of thumb, negative skewness indicates that the mean of the data values is less than the median, and the data distribution is left-skewed. Positive skewness indicates that the mean of the data values is larger than the median, and the data distribution is right-skewed.

> skewness(faithful$eruptions)
[1] -0.415841
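
The skewness can be rebuilt from the second and third central moments, which also makes it easy to check the mean-versus-median rule of thumb on this data. This is a minimal sketch in base R, assuming 1/n central moments; skew_manual is an illustrative name.

```r
# Sketch: skewness from the second and third central moments.
x <- faithful$eruptions

m2 <- mean((x - mean(x))^2)   # second central moment
m3 <- mean((x - mean(x))^3)   # third central moment

skew_manual <- m3 / m2^(3 / 2)
print(skew_manual)

# Rule-of-thumb check: is the mean below the median?
print(mean(x) < median(x))
```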


The kurtosis of a univariate population is defined by the following formula, where μ2 and μ4 are the second and fourth central moments.

$$ \gamma_2 = \frac{\mu_4}{\mu_2^2} - 3 $$

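Intuitively, the kurtosis is a measure of the peakedness of the data distribution. Negative kurtosis indicates a flat data distribution, which is said to be platykurtic. Positive kurtosis indicates a peaked distribution, which is said to be leptokurtic. Under this excess-kurtosis convention, the normal distribution has zero kurtosis and is said to be mesokurtic. Note that kurtosis() in the moments package returns the ratio μ4/μ2² without subtracting 3, so a normal distribution scores 3 on that scale and the value below should be compared against 3 rather than 0.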

> kurtosis(faithful$eruptions)
[1] 1.4994
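
Because kurtosis conventions differ by a constant of 3, it helps to compute both versions by hand. The sketch below uses 1/n central moments in base R; kurt_pearson and kurt_excess are illustrative names. The first matches the value printed above, and the second is the excess kurtosis, whose negative sign suggests a flatter-than-normal (platykurtic) distribution for the eruption durations.

```r
# Sketch: kurtosis with and without the "minus 3" correction.
x <- faithful$eruptions

m2 <- mean((x - mean(x))^2)   # second central moment
m4 <- mean((x - mean(x))^4)   # fourth central moment

kurt_pearson <- m4 / m2^2       # normal distribution scores 3 on this scale
kurt_excess  <- kurt_pearson - 3  # normal distribution scores 0 on this scale

print(kurt_pearson)
print(kurt_excess)
```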
