Dataquest Study Notes [7]

Continuing Step 5: Statistics And Linear Algebra / Probability And Statistics In Python: Intermediate

 Introduction to probability

Calculating Probabilities

>> Dataset: Bike Sharing Dataset, available here

Floor division uses the // operator, e.g. 5 // 4 = 1
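A small sketch of how floor division behaves beyond the positive-integer case. The variable names are illustrative only.

```python
# Floor division rounds toward negative infinity, not toward zero.
positive_quotient = 5 // 4      # 1
negative_quotient = -5 // 4     # -2, because -1.25 is floored to -2
float_quotient = 5.0 // 4       # 1.0, the result stays a float

print(positive_quotient, negative_quotient, float_quotient)
```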

Compute a factorial with math.factorial(N)

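import math

p = .6  # probability of success in a single trial
q = .4  # probability of failure in a single trial (1 - p)

def calc_prob(total, days):
    # Probability of one specific sequence containing `days` successes
    per_pro = (p ** days) * (q ** (total - days))
    # Number of such sequences: total! / (days! * (total - days)!)
    num = math.factorial(total) / math.factorial(days) / math.factorial(total - days)
    return per_pro * num

prob_8 = calc_prob(10, 8)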
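The calculation above multiplies the probability of one specific sequence by the number of possible sequences. A minimal sketch of the same idea using math.comb, which needs Python 3.8 or later. The name binomial_probability is hypothetical, and p is passed as an argument here instead of being read from a global.

```python
import math

def binomial_probability(total, successes, p):
    """Probability of exactly `successes` successes in `total` independent trials."""
    q = 1 - p
    # math.comb returns an exact integer count of combinations
    combinations = math.comb(total, successes)
    return combinations * (p ** successes) * (q ** (total - successes))

prob_8 = binomial_probability(10, 8, 0.6)
print(prob_8)

# The probabilities over every possible count should add up to 1.
total_probability = sum(binomial_probability(10, k, 0.6) for k in range(11))
print(total_probability)
```

Using math.comb avoids the float division of large factorials, which can lose precision when the number of trials grows.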
Probability distributions

import math
# Each item in this list represents one k, starting from 0 and going up to and including 30.
outcome_counts = list(range(31))
# Hand-written binomial distribution code
def calc_prob(N,k,p,q):
    prob=(p**k)*(q**(N-k))
    count=math.factorial(N)/math.factorial(k)/math.factorial(N-k)
    return prob*count
outcome_probs=[]
for i in outcome_counts:
    outcome_probs.append(calc_prob(30,i,.39,.61))

# Computing the binomial distribution with scipy
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom
# Create a range of numbers from 0 to 30, with 31 elements (each number has one entry).
# scipy.linspace was only an alias for numpy.linspace and is not available in newer SciPy releases.
outcome_counts = np.linspace(0, 30, 31)
outcome_probs = binom.pmf(outcome_counts, 30, 0.39)
plt.bar(outcome_counts, outcome_probs)
plt.show()

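# The mean of a binomial distribution is N*p and its variance is N*p*q
# When the number of trials is large enough, the binomial distribution approximates a normal distribution
# Cumulative distribution function: binom.cdf()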
# The sum of all the probabilities to the left of k, including k.
left = binom.cdf(k,N,p)
# The sum of all probabilities to the right of k.
right = 1 - left
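A short sketch that checks the notes above numerically, reusing N = 30 and p = 0.39 from the earlier example. The choice k = 10 is arbitrary. binom.sf is SciPy's survival function, which computes 1 - cdf directly.

```python
from scipy.stats import binom

N, p = 30, 0.39
q = 1 - p

# Mean and variance reported by SciPy
mean, variance = binom.stats(N, p, moments="mv")
print(float(mean), N * p)          # both should equal N*p
print(float(variance), N * p * q)  # both should equal N*p*q

k = 10  # arbitrary cut-off chosen for illustration
left = binom.cdf(k, N, p)          # P(X <= k)
right = binom.sf(k, N, p)          # P(X > k), same as 1 - left
# The cdf is the sum of the pmf values from 0 up to and including k.
manual_left = sum(binom.pmf(i, N, p) for i in range(k + 1))
print(left, manual_left, right)
```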
Significance Testing: the concepts of p-value and confidence interval

Chi-squared tests

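numpy.random.random((a, b)) generates random floats in [0.0, 1.0) and returns an ndarray of shape (a, b). The shape is passed as a single tuple; numpy.random.rand(a, b) is the variant that takes the dimensions as separate arguments.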
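A quick sketch of the two call styles. The shape (2, 3) is an arbitrary choice for illustration.

```python
import numpy

# Shape passed as one tuple
samples = numpy.random.random((2, 3))
# Dimensions passed as separate arguments
same_shape = numpy.random.rand(2, 3)

print(samples.shape, same_shape.shape)
# Every value falls in the half-open interval [0.0, 1.0)
print(samples.min() >= 0.0, samples.max() < 1.0)
```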

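# Manually build a chi-squared sampling distribution
import numpy
import matplotlib.pyplot as plt
chi_squared_values = []
for i in range(1000):
    numbers = numpy.random.random(32561,)
    # Values below 0.5 become 0 (counted as male), the rest become 1 (counted as female)
    numbers = numpy.where(numbers < 0.5, 0, 1)
    mal = 32561 - numpy.sum(numbers)
    femal = numpy.sum(numbers)
    # Expected count for each category is 32561 / 2 = 16280.5
    male_diff = (mal - 16280.5) ** 2 / 16280.5
    female_diff = (femal - 16280.5) ** 2 / 16280.5
    chi_squared_values.append(male_diff + female_diff)
plt.hist(chi_squared_values)
plt.show()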

# Computing the chi-squared value with scipy
import numpy as np
from scipy.stats import chisquare
observed = np.array([5, 10, 15])
expected = np.array([7, 11, 12])
chisquare_value, pvalue = chisquare(observed, expected)
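To connect the manual approach with the scipy call, here is a sketch that computes the statistic by hand for the same observed and expected arrays and compares it with chisquare. The names manual_value and scipy_value are illustrative.

```python
import numpy as np
from scipy.stats import chisquare

observed = np.array([5, 10, 15])
expected = np.array([7, 11, 12])

# Chi-squared statistic: sum over categories of (observed - expected)^2 / expected
manual_value = np.sum((observed - expected) ** 2 / expected)
scipy_value, pvalue = chisquare(observed, expected)

print(manual_value, scipy_value, pvalue)
```

Note that chisquare expects the observed and expected totals to agree, as they do here (both sum to 30).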
Multi category chi-squared tests
pandas.crosstab computes a frequency table (cross-tabulation) of the values in DataFrame columns

import pandas
table = pandas.crosstab(income["sex"], [income["race"]])
print(table)
scipy.stats.chi2_contingency returns the chi-squared statistic, the p-value, the degrees of freedom, and the table of expected frequencies

from scipy.stats import chi2_contingency
table=pandas.crosstab(income['sex'],[income['race']])
chisq_value, pvalue, df, expected= chi2_contingency(table)
pvalue_gender_race=pvalue
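Since the income DataFrame is not included in these notes, the following sketch uses a tiny made-up DataFrame (toy, with hypothetical race labels "A" and "B") to show what crosstab and chi2_contingency produce.

```python
import pandas
from scipy.stats import chi2_contingency

# Hypothetical toy data standing in for the `income` DataFrame
toy = pandas.DataFrame({
    "sex": ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"],
    "race": ["A", "A", "A", "B", "A", "B", "B", "B"],
})

# Rows are the values of "sex", columns the values of "race", cells are counts
table = pandas.crosstab(toy["sex"], toy["race"])
print(table)

chisq_value, pvalue, dof, expected = chi2_contingency(table)
# For an r x c table the degrees of freedom are (r - 1) * (c - 1)
print(chisq_value, pvalue, dof)
print(expected)
```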
Guided Project: Winning Jeopardy

Code: here    Dataset: here

list.remove() modifies the list in place and removes the first matching item; it has no return value (it returns None)
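A minimal sketch of this behavior. The list contents are made up for illustration.

```python
values = [3, 1, 3, 2]

# Only the first 3 is removed; the call itself returns None
result = values.remove(3)
print(values)
print(result)

# Removing a value that is not present raises ValueError
try:
    values.remove(99)
    missing_raised = False
except ValueError:
    missing_raised = True
print(missing_raised)
```

Because the return value is None, writing values = values.remove(3) would discard the list.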

Solving Systems of Equations with Matrices/vectors

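# Matrix row operations
import numpy as np
matrix = np.asarray([
    [2, 1, 25],
    [3, 2, 40]
], dtype=np.float32)
matrix[0] *= 2                # double the first row
matrix[0] -= matrix[1]        # subtract the second row from the first
matrix[1] -= (matrix[0] * 3)  # subtract three times the first row from the second
matrix[1] /= 2                # halve the second row
# Swapping rows (this matrix has only two rows, so the valid indices are 0 and 1)
matrix[[0, 1]] = matrix[[1, 0]]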
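Assuming the matrix above is the augmented matrix of the system 2x + y = 25, 3x + 2y = 40 (an interpretation based on the section title), the result of the row operations can be cross-checked with numpy.linalg.solve. The names coefficients and constants are illustrative.

```python
import numpy as np

# Left-hand side coefficients and right-hand side constants of the system
coefficients = np.array([[2.0, 1.0],
                         [3.0, 2.0]])
constants = np.array([25.0, 40.0])

solution = np.linalg.solve(coefficients, constants)
print(solution)

# Substituting the solution back should reproduce the constants
print(np.allclose(coefficients @ solution, constants))
```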

# Plotting multiple vectors
import numpy as np
import matplotlib.pyplot as plt
# We're going to plot three vectors
# The first will start at origin 0,0, then go over 1 and up 2
# The second will start at 1,2, then go over 3 and up 2
# The third will start at origin 0,0, then go over 4 and up 4
X = [0,1,0]
Y = [0,2,0]
U = [1,3,4]
V = [2,2,4]
plt.quiver(X, Y, U, V, angles='xy', scale_units='xy', scale=1)
plt.xlim([0,6])
plt.ylim([0,6])
plt.show()
# Matrix multiplication: numpy.dot(A, B)
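A brief sketch of matrix multiplication with numpy.dot, using two made-up 2x2 matrices. It also shows that the * operator is element-wise, not a matrix product.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

product = np.dot(A, B)   # matrix product
same_product = A @ B     # the @ operator gives the same matrix product
elementwise = A * B      # element-wise multiplication, a different operation

print(product)
print(elementwise)
```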