Andrew Ng CS229 | Week 8 Programming Assignment (Python)

Exercise 8: Anomaly Detection and Recommender Systems

Contents:

1. Files included

2. Anomaly detection

3. Recommender systems

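1. Files Included

File name                     Description
ex8.py                        Anomaly detection experiment
ex8_cofi.py                   Recommender system experiment
ex8data1.mat                  Anomaly detection dataset 1
ex8data2.mat                  Anomaly detection dataset 2
ex8_movies.mat                Movie ratings dataset
ex8_movieParams.mat           Parameters (used for optimization and debugging)
multivariateGaussian.py       Multivariate Gaussian distribution
visualizeFit.py               Data visualization
checkCostFunction.py          Gradient checking for collaborative filtering
computeNumericalGradient.py   Numerical gradient approximation
loadMovieList.py              Loads the movie list
movie_ids.txt                 List of movie names
normalizeRatings.py           Mean normalization for collaborative filtering
estimateGaussian.py           Gaussian parameter estimation
selectThreshold.py            Threshold selection for anomaly detection
cofiCostFunc.py               Collaborative filtering cost function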

Note: the files shown in red are the ones you need to complete yourself.

2. Anomaly Detection

  • Import the required packages and initialize:
import matplotlib.pyplot as plt
import numpy as np
import scipy.io as scio

import estimateGaussian as eg
import multivariateGaussian as mvg
import visualizeFit as vf
import selectThreshold as st

plt.ion()
# np.set_printoptions(formatter={'float': '{: 0.6f}'.format})

2.1 Visualizing the Data

# ===================== Part 1: Load Example Dataset =====================
# We start this exercise by using a small dataset that is easy to visualize.
#
# Our example case consists of two network server statistics across
# several machines: the latency and throughput of each machine.
# This exercise will help us find possibly faulty (or very fast) machines
#

print('Visualizing example dataset for outlier detection.')

#  The following command loads the dataset. You should now have the
#  variables X, Xval, yval in your environment.
data = scio.loadmat('ex8data1.mat')
X = data['X']
Xval = data['Xval']
yval = data['yval'].flatten()

# Visualize the example dataset
plt.figure()
plt.scatter(X[:, 0], X[:, 1], c='b', marker='x', s=15, linewidth=1)
plt.axis([0, 30, 0, 30])
plt.xlabel('Latency (ms)')
plt.ylabel('Throughput (mb/s)')

input('Program paused. Press ENTER to continue')
  • Visualization result

2.2 Estimating the Probability Distribution

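  • To perform anomaly detection, we first fit a model to the distribution of the data. The Gaussian distribution is:

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

  • The mean of the i-th feature can be estimated over the m training examples with:

$$\mu_i = \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)}$$

  • and the variance with:

$$\sigma_i^2 = \frac{1}{m} \sum_{j=1}^{m} \left(x_i^{(j)} - \mu_i\right)^2$$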

  • Write the parameter estimation program estimateGaussian.py:
import numpy as np


def estimate_gaussian(X):
    # Useful variables
    m, n = X.shape

    # You should return these values correctly
    mu = np.zeros(n)
    sigma2 = np.zeros(n)

    # ===================== Your Code Here =====================
    # Instructions: Compute the mean of the data and the variances
    #               In particular, mu[i] should contain the mean of
    #               the data for the i-th feature and sigma2[i]
    #               should contain variance of the i-th feature
    #
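    # Per-feature mean, shape (n,), matching the initialization above
    mu = X.mean(axis=0)

    # Per-feature variance with 1/m normalization (not 1/(m-1)), shape (n,)
    sigma2 = ((X - mu) ** 2).mean(axis=0)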

    # ==========================================================

    return mu, sigma2
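
As a quick sanity check of the estimator, the sketch below compares a standalone copy of the same computation against NumPy's built-in statistics on synthetic data. The function name `estimate_gaussian_sketch` and the data `X_demo` are assumptions made up for illustration; they are not part of the exercise files.

```python
import numpy as np


def estimate_gaussian_sketch(X):
    """Per-feature mean and 1/m-normalized variance (illustrative copy)."""
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).mean(axis=0)
    return mu, sigma2


# Synthetic two-feature data standing in for latency and throughput (made up)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[14.0, 15.0], scale=[1.5, 2.0], size=(1000, 2))

mu_demo, sigma2_demo = estimate_gaussian_sketch(X_demo)

# np.var uses ddof=0 by default, i.e. the same 1/m normalization
check_ok = (np.allclose(mu_demo, np.mean(X_demo, axis=0))
            and np.allclose(sigma2_demo, np.var(X_demo, axis=0)))
print(mu_demo.shape, sigma2_demo.shape, check_ok)
```

Note that the exercise divides by m rather than m - 1, so the result matches `np.var` with its default `ddof=0`.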
  • Estimate the probability distribution of the training set
# ===================== Part 2: Estimate the dataset statistics =====================
# For this exercise, we assume a Gaussian distribution for the dataset.
#
# We first estimate the parameters of our assumed Gaussian distribution,
# then compute the probabilities for each of the points and then visualize
# both the overall distribution and where each of the points falls in
# terms of that distribution
#
print('Visualizing Gaussian fit.')

# Estimate mu and sigma2
mu, sigma2 = eg.estimate_gaussian(X)

# Returns the density of the multivariate normal at each data point(row) of X
p = mvg.multivariate_gaussian(X, mu, sigma2)

# Visualize the fit
vf.visualize_fit(X, mu, sigma2)
plt.xlabel('Latency (ms)')
plt.ylabel('Throughput (mb/s)')

input('Program paused. Press ENTER to continue')
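
Once the densities `p` are available, anomaly detection comes down to comparing them with a threshold. The sketch below shows only that comparison, under stated assumptions: the density values in `p_demo` and the threshold `epsilon_demo` are made-up numbers. In the exercise, the threshold is the job of selectThreshold.py, which is not reproduced here.

```python
import numpy as np

# Hypothetical densities for five machines (made-up values, not exercise output)
p_demo = np.array([8.0e-2, 6.5e-2, 3.0e-6, 7.2e-2, 1.0e-9])

# Hypothetical threshold; the exercise selects it in selectThreshold.py
epsilon_demo = 1.0e-4

# A point is flagged as an anomaly when its density falls below the threshold
is_outlier = p_demo < epsilon_demo
outlier_idx = np.flatnonzero(is_outlier)
print(outlier_idx)
```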
  • Review the program that computes the probability density, multivariateGaussian.py
import numpy as np


def multivariate_gaussian(X, mu, sigma2):
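    # Number of features
    k = mu.size

    # For the univariate (per-feature) Gaussian model, sigma2 is a vector of variances:
    # convert it to a diagonal covariance matrix and plug it into the multivariate formula.
    # In that case the univariate and multivariate models are equivalent.
    # For the multivariate model, sigma2 is already a covariance matrix and is used directly.
    if sigma2.ndim == 1 or (sigma2.ndim == 2 and (sigma2.shape[1] == 1 or sigma2.shape[0] == 1)):
        # ravel() first: np.diag on a 2-D row or column array would extract its diagonal
        # instead of building a diagonal matrix
        sigma2 = np.diag(sigma2.ravel())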

    x = X - mu
    p = (2 * np.pi) ** (-k / 2) * np.linalg.det(sigma2) ** (-0.5) * np.exp(-0.5*np.sum(np.dot(x, np.linalg.pinv(sigma2)) * x, axis=1))

    return p
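
The equivalence mentioned in the comments above can be checked numerically: with a diagonal covariance matrix, the multivariate density should equal the product of the per-feature univariate densities. The following is a minimal sketch of that check; the helper names and the sample values are assumptions for illustration and do not come from the exercise files.

```python
import numpy as np


def univariate_gaussian(x, mu, sigma2):
    """Univariate Gaussian density, evaluated element-wise."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)


def multivariate_gaussian_sketch(X, mu, Sigma):
    """Multivariate Gaussian density for each row of X (full covariance Sigma)."""
    k = mu.size
    diff = X - mu
    # Row-wise quadratic form diff_i^T Sigma^{-1} diff_i
    quad = np.sum((diff @ np.linalg.inv(Sigma)) * diff, axis=1)
    norm = (2 * np.pi) ** (-k / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * quad)


# Made-up sample points and parameters
X_demo = np.array([[13.0, 15.0], [14.5, 14.0], [20.0, 8.0]])
mu_demo = np.array([14.0, 15.0])
sigma2_demo = np.array([1.8, 1.7])

# Product of independent per-feature densities
p_product = np.prod(univariate_gaussian(X_demo, mu_demo, sigma2_demo), axis=1)

# Multivariate density with the variances on the diagonal
p_multi = multivariate_gaussian_sketch(X_demo, mu_demo, np.diag(sigma2_demo))

print(np.allclose(p_product, p_multi))
```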
  • Review the data visualization program visualizeFit.py
import matplotlib.pyplot as plt
import numpy as np
import multivariateGaussian as mvg


def visualize_fit(X, mu