O2O Coupon Prediction

Allenlzcoder

Published 2025-06-23 00:56:32


License: CC 4.0 BY-SA

Category: Alibaba Cloud Tianchi Competition. Tag: machine learning

Article link: https://blog.youkuaiyun.com/Allenlzcoder/article/details/148835584


Alibaba Cloud Tianchi Competition - Problem Analysis
Conventional import aliases for the commonly used libraries:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import date
import datetime as dt
from scipy import stats

1.Data exploration

Code link: https://tianchi.aliyun.com/notebook/129415

1.1.Box plot example

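fig = plt.figure(figsize=(4, 6))  # set the figure width and height (inches)
# keep labelled rows with a known distance; passing the column as y= draws a vertical box
sns.boxplot(y=dftrain[(dftrain.label>=0)&(dftrain.distance>=0)]['distance'], width=0.5)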
plt.show()
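
To read a box plot like the one above, it helps to know which numbers it draws. The following is a minimal sketch, assuming the default Tukey rule (whiskers reach the most extreme points within 1.5 x IQR of the quartiles). The helper `box_stats` and the small `demo` array are hypothetical and only stand in for the filtered `distance` column.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Return the summary numbers that a Tukey-style box plot displays."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Points inside the fences determine where the whiskers end
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "n_outliers": int(values.size - inside.size),
    }


# Hypothetical sample, not taken from the competition data
demo = np.array([0, 0, 1, 1, 2, 2, 3, 4, 10])
summary = box_stats(demo)
print(summary)
```

Points counted in `n_outliers` are the ones a box plot draws as individual markers beyond the whiskers.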

1.2.Histogram and Q-Q plot

plt.figure(figsize=(10,5))
# ax=plt.subplot(1,2,1)
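# Note: sns.distplot is deprecated since seaborn 0.11 and emits a warning
sns.distplot(
    dftrain[(dftrain.label>=0)&(dftrain.distance>=0)]['distance'],
    fit=stats.norm)  # overlay a fitted normal distribution curve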
plt.show()

plt.figure(figsize=(10,5))
# ax=plt.subplot(1,2,2)
res = stats.probplot(dftrain[(dftrain.label>=0)&(dftrain.distance>=0)]['distance'], plot=plt)
plt.show()

1.3.Probability plot

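stats.probplot is the SciPy function for generating a probability plot. It plots the ordered sample values against the quantiles of a theoretical distribution (the normal distribution by default), so it is mainly used to check whether data follow that distribution: the closer the points lie to the fitted straight line, the better the match. The call below applies it to discount_rate: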

plt.figure(figsize=(10,5))
res = stats.probplot(dftrain[(dftrain.label>=0)&(dftrain.discount_rate>=0)]['discount_rate'], plot=plt)
plt.show()
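
Besides drawing the figure, `stats.probplot` also returns the plotted points and the least-squares line through them, which gives a rough numeric check of normality. The sketch below uses synthetic samples (not the competition data) and the hypothetical helper `probplot_r` to compare the correlation coefficient `r` for a normal and a skewed sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=500)
skewed_sample = rng.exponential(scale=1.0, size=500)


def probplot_r(sample):
    """Return the correlation between ordered data and normal quantiles."""
    # osm: theoretical quantiles, osr: ordered sample values
    (osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
    return r


r_normal = probplot_r(normal_sample)
r_skewed = probplot_r(skewed_sample)
print(f"r for normal sample: {r_normal:.4f}")
print(f"r for skewed sample: {r_skewed:.4f}")
```

An `r` close to 1 means the points hug the reference line; a visibly lower value, as for the skewed sample, signals a departure from normality.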

1.4.Comparing distributions

# fill=True replaces the deprecated shade=True; label= supplies the legend entries
ax = sns.kdeplot(dftrain[(dftrain.label>=0)&(dftrain.discount_rate>=0)]['discount_rate'], color="red", fill=True, label="train")
ax = sns.kdeplot(dftest[(dftest.discount_rate>=0)]['discount_rate'], color="blue", fill=True, label="test")
ax.set_xlabel('discount_rate')
ax.set_ylabel("Density")  # a KDE curve shows density, not frequency
ax.legend()
plt.show()

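seaborn.kdeplot() is the Seaborn function for drawing kernel density estimate (KDE) plots. It shows the probability density of univariate or bivariate data as a smooth curve and is a common visualization tool in data exploration and statistical modeling.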
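
To make the idea of a kernel density estimate concrete, the sketch below computes the curves numerically with `scipy.stats.gaussian_kde` and measures how far two estimates are apart. It is only an illustration: the three samples are synthetic, the helper names are hypothetical, and `gaussian_kde` is used as a stand-in whose bandwidth may differ from what a given seaborn version draws.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for discount_rate values in [0, 1]
train_rate = rng.beta(8, 2, size=2000)
test_rate = rng.beta(8, 2, size=1000)      # same distribution as train
shifted_rate = rng.beta(2, 8, size=1000)   # deliberately different distribution

grid = np.linspace(0.0, 1.0, 201)


def density_on_grid(sample):
    """Evaluate a Gaussian KDE of `sample` on the shared grid."""
    return stats.gaussian_kde(sample)(grid)


def max_density_gap(a, b):
    """Largest absolute difference between the two estimated densities."""
    return float(np.max(np.abs(density_on_grid(a) - density_on_grid(b))))


gap_same = max_density_gap(train_rate, test_rate)
gap_shifted = max_density_gap(train_rate, shifted_rate)
# Share of the estimated probability mass that falls inside [0, 1]
mass_in_unit_interval = stats.gaussian_kde(train_rate).integrate_box_1d(0.0, 1.0)

print(f"gap, same distribution:    {gap_same:.3f}")
print(f"gap, shifted distribution: {gap_shifted:.3f}")
print(f"mass inside [0, 1]:        {mass_in_unit_interval:.3f}")
```

A small gap corresponds to the two filled curves nearly overlapping in the plot, which is what one hopes to see when comparing a train and a test feature.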

1.5.Visualizing a linear relationship

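fig, ax = plt.subplots(figsize=(8,4))  # create the Axes that regplot draws on
sns.regplot(x='distance', y='label', data=dftrain[(dftrain.label>=0)&(dftrain.distance>=0)][['distance','label']], ax=ax,
            marker='.',  # marker is a regplot argument rather than a scatter_kws entry
            scatter_kws={'s':3,'alpha':0.3},
            line_kws={'color':'k'})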
plt.xlabel('distance')
plt.ylabel('label')