【Statistics-Distributions】

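These notes cover basic concepts in statistics: individuals, the classification of variables (categorical and quantitative), the distribution of a variable, graphical displays of data (bar graphs, pie charts, stemplots, and so on), and how numerical measures (mean, median, standard deviation) describe the features of a data set.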

Simple is best.


Key points:

1.

Individuals are the objects described in a set of data. Individuals are
sometimes people. When the objects that we want to study are not
people, we often call them cases.


A variable is any characteristic of an individual. A variable can take different
values for different individuals.


2.

CATEGORICAL AND QUANTITATIVE VARIABLES

A categorical variable places an individual into one of two or more
groups or categories.

A quantitative variable takes numerical values for which arithmetic operations
such as adding and averaging make sense.


The distribution of a variable tells us what values it takes and how often
it takes these values.  


3. rate

An interesting example:

Accidents for passenger cars and motorcycles. The government’s
Fatal Accident Reporting System says that 27,102 passenger cars were involved
in fatal accidents in 2002. Only 3339 motorcycles had fatal accidents
that year. Does this mean that motorcycles are safer than cars? Not at all:
there are many more cars than motorcycles, so we expect cars to have a
higher count of fatal accidents.


A better measure of the dangers of driving is a rate, the number of fatal
accidents divided by the number of vehicles on the road. In 2002, passenger
cars had about 21 fatal accidents for each 100,000 vehicles registered. There
were about 67 fatal accidents for each 100,000 motorcycles registered. The
rate for motorcycles is more than three times the rate for cars. Motorcycles
are, as we might guess, much more dangerous than cars.
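The comparison above can be checked with a few lines of Python. This is only a sketch: the function name `rate_per_100k` and the fleet used in the last line are made up for illustration, while the two rates (21 and 67 per 100,000) are the ones quoted in the passage.

```python
def rate_per_100k(fatal_accidents, registered_vehicles):
    """Fatal accidents per 100,000 registered vehicles."""
    return fatal_accidents * 100_000 / registered_vehicles

# Rates quoted in the text, per 100,000 registered vehicles (2002).
car_rate = 21
motorcycle_rate = 67

# How many times higher the motorcycle rate is than the car rate.
ratio = motorcycle_rate / car_rate
print(f"motorcycle rate / car rate = {ratio:.2f}")

# Hypothetical fleet, for illustration only: 50 fatal accidents
# among 1,000,000 registered vehicles.
print(rate_per_100k(50, 1_000_000))
```

The raw counts point one way and the rates point the other, which is why the denominator matters.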



4. Graphs

1. bar graph

2. pie chart

Pie charts require that you include all the categories that make up a whole.

3. stemplot

Stemplots do not work well for large data sets, where each stem must hold a large number of leaves.
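To see how a stemplot is built, here is a minimal sketch for small sets of non-negative integers. It assumes the simplest convention (stem = all digits but the last, leaf = the last digit); the function name `stemplot` and the data are invented for illustration.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers: stem = tens, leaf = units."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Include every stem between the smallest and largest, even empty ones,
    # so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines

scores = [12, 15, 21, 23, 23, 27, 30, 34, 48]  # made-up data
print("\n".join(stemplot(scores)))
```

With thousands of observations each row would become a very long string of leaves, which is the weakness noted above.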


--------------------------------------------------------------------------------------------------------------------------------------------------------

1. mean x̄: To find the mean x̄ of a set of observations, add their values and divide
by the number of observations. If the n observations are $x_1, x_2, \ldots, x_n$,
their mean is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Property: the mean is sensitive to the influence of a few extreme observations.
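A small sketch of the formula and of the sensitivity just mentioned. The data are made up, and `mean` here is a hand-written helper rather than a library function.

```python
def mean(observations):
    """Add the values and divide by the number of observations."""
    return sum(observations) / len(observations)

data = [10, 12, 11, 13, 14]    # made-up observations
with_extreme = data + [100]    # same data plus one extreme value

print(mean(data))
print(mean(with_extreme))
```

One extreme observation out of six is enough to pull the mean above every one of the original five values.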


2. resistant measure:

Its value does not respond strongly to changes in a few observations, no matter
how large those changes may be.
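The median is the standard example of a resistant measure. The sketch below, using Python's `statistics` module and made-up data, makes one observation larger and larger: the median stays put while the mean follows the extreme value.

```python
from statistics import mean, median

base = [10, 11, 12, 13]  # made-up observations

# Replace the largest observation with ever more extreme values.
for extreme in (14, 140, 14_000):
    data = base + [extreme]
    print(f"largest={extreme:>6}  mean={mean(data):>8}  median={median(data)}")
```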



3. mean and median

The mean is the average value; the median is the typical (middle) value.


outliers: extreme values


4. Five-number summary: Minimum, Q1, M, Q3, Maximum. A boxplot is a graph of these five numbers.
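A sketch of how the five numbers might be computed. It assumes one common textbook convention: Q1 is the median of the observations to the left of the overall median's position, and Q3 is the median of those to its right. Software packages often use slightly different quartile rules, so results can differ a little. The function name is made up.

```python
from statistics import median

def five_number_summary(observations):
    """Return (minimum, Q1, M, Q3, maximum) for at least two observations."""
    xs = sorted(observations)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # observations left of the median's position
    upper = xs[half + n % 2:]   # observations right of it (skip the middle one if n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```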

5. THE 1.5 × IQR RULE FOR OUTLIERS

The interquartile range IQR is the distance between the quartiles, IQR = Q3 − Q1.
Call an observation a suspected outlier if it falls more than 1.5 × IQR
above the third quartile or below the first quartile.
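The rule translates almost directly into code. In this sketch the quartiles are passed in as arguments so that it does not depend on any particular quartile convention; the function name and the data are made up, and the values 2.5 and 7.5 are the medians of the lower and upper halves of that data.

```python
def suspected_outliers(observations, q1, q3):
    """Flag observations more than 1.5 * IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in observations if x < low_fence or x > high_fence]

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]   # made-up observations
# Q1 = 2.5 and Q3 = 7.5, so IQR = 5 and the fences are -5 and 15.
print(suspected_outliers(data, q1=2.5, q3=7.5))
```

The rule only flags observations as suspected outliers; deciding what to do with them still takes judgment.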


6. standard deviation s

s, like the mean x̄, is not resistant. A few outliers can make s very large.

The use of squared deviations renders s even more sensitive than x̄ to a few extreme observations.
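A sketch of this sensitivity, assuming the usual definition of s (the square root of the sum of squared deviations from x̄ divided by n − 1), which these notes do not spell out. The helper name and the data are made up.

```python
from math import sqrt

def sample_std(observations):
    """Standard deviation s with the n - 1 divisor."""
    n = len(observations)
    xbar = sum(observations) / n
    return sqrt(sum((x - xbar) ** 2 for x in observations) / (n - 1))

data = [10, 11, 12, 13, 14]        # made-up observations
with_outlier = [10, 11, 12, 13, 100]

for xs in (data, with_outlier):
    xbar = sum(xs) / len(xs)
    print(f"mean={xbar:.1f}  s={sample_std(xs):.2f}")
```

In this made-up example the single outlier roughly doubles the mean but multiplies s by more than twenty.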


7. How to choose

CHOOSING A SUMMARY
The five-number summary is usually better than the mean and standard
deviation for describing a skewed distribution or a distribution with
strong outliers. Use x̄ and s only for reasonably symmetric distributions
that are free of outliers.


8. Linear transformations do not change the shape of a distribution.
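One way to see this: a linear transformation x_new = a + b·x with b > 0 leaves every observation at the same position relative to the center and spread. The sketch below checks that with made-up temperatures converted from Celsius to Fahrenheit (a = 32, b = 1.8); the helper `standardized` is introduced only for this illustration.

```python
from statistics import mean, stdev

def standardized(xs):
    """Express each observation as a number of standard deviations from the mean."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

celsius = [12.0, 15.0, 19.0, 21.0, 30.0]          # made-up observations
fahrenheit = [32 + 1.8 * c for c in celsius]       # x_new = a + b*x

z_c = standardized(celsius)
z_f = standardized(fahrenheit)
same_shape = all(abs(p - q) < 1e-9 for p, q in zip(z_c, z_f))
print(same_shape)
```

The center and spread change with the units, but the relative positions of the observations do not.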


9. Linear transformations of distributions


------------------------------------------------------------------------------

1. Density curves

One way to think of a density curve is as a smooth approximation to the irregular
bars of a histogram.

A density curve is a curve that
• is always on or above the horizontal axis and
• has area exactly 1 underneath it.
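Both conditions can be checked numerically. The sketch below uses a made-up triangular density on the interval from 0 to 2 and a simple trapezoid rule; neither the density nor the helper names come from the original notes.

```python
def triangle_density(x):
    """Hypothetical density: rises from 0 at x=0 to 1 at x=1, then falls to 0 at x=2."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def area_under(f, a, b, steps=1000):
    """Trapezoid-rule approximation of the area under f between a and b."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * ((f(a) + f(b)) / 2 + inner)

# Condition 1: the curve is never below the horizontal axis.
grid = [i / 100 for i in range(-100, 301)]
never_negative = all(triangle_density(x) >= 0 for x in grid)

# Condition 2: the total area under the curve is 1.
total_area = area_under(triangle_density, 0, 2)
print(never_negative, round(total_area, 6))
```

Areas under the curve between two values then represent proportions of observations.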


Information that can be read from a density curve:

mode: A mode of a distribution described by a density curve is a peak point of
the curve, the location where the curve is highest.

median: the median is the point with half the total area on each side.


mean: the balance point (center of mass) of the curve.

A passage I once struggled with:

A density curve is an idealized description of a distribution of data. For
example, the symmetric density curve in Figure 1.25 is exactly symmetric,
but the histogram of vocabulary scores is only approximately symmetric. We
therefore need to distinguish between the mean and standard deviation of the
density curve and the numbers x̄ and s computed from the actual observations.
The usual notation for the mean of an idealized distribution is μ (the Greek
letter mu). We write the standard deviation of a density curve as σ (the Greek
letter sigma).


2. Normal distributions

   The curve with the larger standard deviation is more spread out.

    How to estimate σ by eye: the points at which the curvature changes (the inflection points) are located at distance
σ on either side of the mean μ.


THE 68–95–99.7 RULE
In the Normal distribution with mean μ and standard deviation σ:
• Approximately 68% of the observations fall within σ of the mean μ.
• Approximately 95% of the observations fall within 2σ of μ.
• Approximately 99.7% of the observations fall within 3σ of μ.
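The three percentages can be reproduced from the Normal distribution itself. For any Normal distribution, the proportion of observations within k standard deviations of the mean is erf(k / √2); the sketch below uses `math.erf` from the Python standard library, and the function name `prob_within` is made up.

```python
from math import erf, sqrt

def prob_within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
```

The exact values are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule says "approximately".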



  In fact, all Normal distributions are the same if we measure in units of size
σ about the mean μ as center.

 Observations larger than the mean are positive when standardized, and observations smaller than the mean
are negative.
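Standardizing means measuring an observation in units of σ from the mean, z = (x − μ)/σ. A minimal sketch, with made-up values for μ and σ:

```python
def standardize(x, mu, sigma):
    """Standardized value z: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 64.5, 2.5   # hypothetical mean and standard deviation

for x in (68, 64.5, 60):
    print(x, round(standardize(x, mu, sigma), 2))
```

An observation above the mean gets a positive z, one equal to the mean gets 0, and one below gets a negative z.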






