Continuing Step 5: Statistics And Linear Algebra / Probability And Statistics In Python: Intermediate
Introduction to probability
Calculating Probabilities
>> Dataset: Bike Sharing Dataset (link: here)
Floor division uses //, e.g. 5//4 = 1
Factorial: math.factorial(N)
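import math
p = .6
q = .4
def calc_prob(total, days):
    # Probability of one specific sequence with `days` successes out of `total` trials
    per_pro = (p**days) * (q**(total-days))
    # Number of such sequences: total! / (days! * (total-days)!)
    num = math.factorial(total) / math.factorial(days) / math.factorial(total-days)
    return per_pro * num
prob_8 = calc_prob(10, 8)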
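The count of combinations can also be kept as an exact integer, which ties in with the floor-division note above. A minimal sketch, assuming Python 3.8+ for math.comb; the name calc_prob_exact is made up for this note and is not part of the course code:

```python
import math

p = .6
q = .4

def calc_prob_exact(total, days):
    # Floor division keeps the count an int; the division is exact here
    count = math.factorial(total) // (math.factorial(days) * math.factorial(total - days))
    # math.comb computes the same binomial coefficient directly
    assert count == math.comb(total, days)
    return count * (p ** days) * (q ** (total - days))

prob_8 = calc_prob_exact(10, 8)
print(prob_8)
```

With / the count becomes a float, which is fine for small N but loses precision once the factorials get large.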
Probability distributions
import math
# Each item in this list represents one k, starting from 0 and going up to and including 30.
outcome_counts = list(range(31))
# Binomial distribution written by hand
def calc_prob(N, k, p, q):
    prob = (p**k) * (q**(N-k))
    count = math.factorial(N) / math.factorial(k) / math.factorial(N-k)
    return prob * count
outcome_probs = []
for i in outcome_counts:
    outcome_probs.append(calc_prob(30, i, .39, .61))
# Solving the binomial distribution with scipy
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom
# Create a range of numbers from 0 to 30, with 31 elements (each number has one entry).
# numpy.linspace replaces the deprecated scipy.linspace alias.
outcome_counts = np.linspace(0, 30, 31)
outcome_probs = binom.pmf(outcome_counts, 30, 0.39)
plt.bar(outcome_counts, outcome_probs)
plt.show()
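# The binomial distribution has mean Np and variance Npq
# With enough trials, the binomial distribution approximates a normal distribution
# Cumulative distribution function: binom.cdf()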
# The sum of all the probabilities to the left of k, including k.
left = binom.cdf(k,N,p)
# The sum of all probabilities to the right of k.
right = 1 - left
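The mean, variance, and cumulative sums noted above can be checked with the standard library alone. This is a rough cross-check, not scipy's implementation; N and p reuse the values from the pmf example, while the cutoff k = 10 is a hypothetical value picked for illustration:

```python
import math

N, p = 30, 0.39
q = 1 - p
k = 10  # hypothetical cutoff, chosen only for this example

# Probability of exactly i successes, for i = 0..N
pmf = [math.comb(N, i) * p**i * q**(N - i) for i in range(N + 1)]

mean = sum(i * pr for i, pr in enumerate(pmf))
variance = sum((i - mean) ** 2 * pr for i, pr in enumerate(pmf))

left = sum(pmf[:k + 1])  # P(X <= k), the quantity binom.cdf(k, N, p) represents
right = 1 - left         # P(X > k)

print(mean, N * p)
print(variance, N * p * q)
print(left, right)
```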
Significance Testing: the concepts of p-value and confidence interval
Chi-squared tests
numpy.random.random((a, b)) generates random floats in [0.0, 1.0) and returns an ndarray of shape (a, b); the shape is passed as a single tuple
# Building the chi-squared sampling distribution by hand
import numpy
import matplotlib.pyplot as plt
chi_squared_values = []
for i in range(1000):
    numbers = numpy.random.random(32561,)
    # Map each draw to 0 (male) or 1 (female) with equal probability
    for j in range(len(numbers)):
        if numbers[j] < 0.5:
            numbers[j] = 0
        else:
            numbers[j] = 1
    mal = 32561 - numpy.sum(numbers)
    femal = numpy.sum(numbers)
    male_diff = (mal-16280.5)**2 / 16280.5
    female_diff = (femal-16280.5)**2 / 16280.5
    chi_squared_values.append(male_diff + female_diff)
plt.hist(chi_squared_values)
plt.show()
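The element-by-element loop is slow in pure Python. One possible shortcut, sketched here as a swapped-in technique rather than the course's method: the number of 1s among 32561 fair coin flips is a binomial draw, so the counts can be sampled directly. The fixed seed and the use of numpy.random.default_rng are assumptions made for repeatability:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption) so the run is repeatable
total = 32561
expected = total / 2            # 16280.5 per category

# Number of 1s ("female") among `total` fair coin flips, drawn 1000 times
female = rng.binomial(total, 0.5, size=1000)
male = total - female

# Chi-squared value for each of the 1000 simulated samples
chi_squared_values = (male - expected) ** 2 / expected + (female - expected) ** 2 / expected
print(chi_squared_values[:5])
```

The resulting array can be passed to plt.hist in the same way as the list built by the loop.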
# Computing the chi-squared value with scipy
import numpy as np
from scipy.stats import chisquare
observed = np.array([5, 10, 15])
expected = np.array([7, 11, 12])
chisquare_value, pvalue = chisquare(observed, expected)
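For reference, the statistic itself is just the sum of (observed - expected)^2 / expected over the categories. A minimal sketch with the same numbers, using plain Python lists instead of arrays:

```python
observed = [5, 10, 15]
expected = [7, 11, 12]

# Sum of squared differences, each scaled by the expected count
chisquare_value = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chisquare_value)
```

This only reproduces the statistic; the p-value still needs the chi-squared distribution, which is what scipy.stats.chisquare adds.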
Multi category chi-squared tests
pandas.crosstab computes a frequency table (cross-tabulation) of the categories in DataFrame columns
import pandas
table = pandas.crosstab(income["sex"], [income["race"]])
print(table)
scipy.stats.chi2_contingency returns the chi-squared value, the p-value, the degrees of freedom, and the expected frequencies
from scipy.stats import chi2_contingency
table=pandas.crosstab(income['sex'],[income['race']])
chisq_value, pvalue, df, expected= chi2_contingency(table)
pvalue_gender_race=pvalue
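The expected frequencies and degrees of freedom follow the textbook formulas: expected = row total * column total / grand total, and df = (rows - 1) * (columns - 1). A rough sketch on a made-up 2x3 table of counts (hypothetical numbers, not the income data):

```python
# Hypothetical 2x3 table of counts, for illustration only
table = [
    [20, 30, 50],
    [30, 20, 50],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected count per cell if rows and columns were independent
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Degrees of freedom for an r x c table
df = (len(table) - 1) * (len(table[0]) - 1)

chisq_value = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
print(expected, df, chisq_value)
```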
Guided Project: Winning Jeopardy
list.remove() modifies the list in place and removes the first matching item; it has no return value (it returns None)
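A short illustration of that behavior, with a made-up list:

```python
items = ["a", "b", "a", "c"]

# remove() mutates the list and returns None
result = items.remove("a")
print(items)
print(result)

# Removing a value that is not present raises ValueError
try:
    items.remove("z")
    missing_raised = False
except ValueError:
    missing_raised = True
```

Because the return value is None, writing items = items.remove("a") would throw the list away.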
Solving Systems of Equations with Matrices/vectors
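# Matrix row operations
import numpy as np
matrix = np.asarray([
    [2, 1, 25],
    [3, 2, 40]
], dtype=np.float32)
matrix[0] *= 2
matrix[0] -= matrix[1]
matrix[1] -= (matrix[0]*3)
matrix[1] /= 2
# Swapping rows 0 and 2 (needs a matrix with at least three rows; the 2-row matrix above has no row 2)
matrix[[0,2]] = matrix[[2,0]]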
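Two quick checks on the snippets above, offered as a sketch: np.linalg.solve should agree with the row-reduced result for the 2x + y = 25, 3x + 2y = 40 system, and the row swap can be tried on a made-up 3x3 matrix (the values in m are hypothetical):

```python
import numpy as np

# Coefficients and right-hand side of the system from the augmented matrix
A = np.array([[2, 1], [3, 2]], dtype=np.float64)
b = np.array([25, 40], dtype=np.float64)
solution = np.linalg.solve(A, b)
print(solution)

# Row swap on a 3-row matrix; fancy indexing on the right-hand side
# makes a copy, so the two rows are exchanged safely
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)
m[[0, 2]] = m[[2, 0]]
print(m)
```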
# Plotting several vectors
import numpy as np
import matplotlib.pyplot as plt
# We're going to plot three vectors
# The first will start at origin 0,0, then go over 1 and up 2
# The second will start at 1,2 (the tip of the first), then go over 3 and up 2
# The third will start at origin 0,0, then go over 4 and up 4
X = [0,1,0]
Y = [0,2,0]
U = [1,3,4]
V = [2,2,4]
plt.quiver(X, Y, U, V, angles='xy', scale_units='xy', scale=1)
plt.xlim([0,6])
plt.ylim([0,6])
plt.show()
# Matrix multiplication: numpy.dot(A, B)
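A small example of the call, with two made-up 2x2 matrices; for 2-D arrays the @ operator gives the same product:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Row-by-column matrix product
product = np.dot(A, B)
print(product)

# Element-wise multiplication is a different operation
elementwise = A * B
print(elementwise)
```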