Machine Learning with Zero Code? A Step-by-Step Guide to Building a Zero-Code Intelligent Platform

正牌强哥

Last modified: 2025-01-26 12:55:21

Category: AI. Tags: artificial intelligence

First published: 2025-01-26 12:54:52

Article link: https://blog.youkuaiyun.com/qq_44493378/article/details/145367595

Language: Python

Tools: pandas, PyCaret, Streamlit

1. Background

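In day-to-day work, business teams sometimes ask for classification or regression predictions on their business data. The techniques for these tasks are mature in the AI field, but business users usually lack the AI background and coding skills to apply them, which keeps them from using AI on their own data. To remove that barrier and let business users analyze and process data by themselves, I built a zero-code machine learning platform. Users upload their data, and the platform trains models automatically and saves the trained model. Later they can use the saved model to predict on new data, so the whole workflow is covered in one place.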

2. Implementation approach

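The platform uses the PyCaret library for the machine learning work: it preprocesses the data, trains a set of candidate models automatically, compares their performance, and selects the best one. The best model is saved to a file automatically, so when new data arrives the saved model can be loaded and used for prediction directly. To make the process visual and accessible to people with no programming background, Streamlit provides the interface: users upload data, train models, and view results through simple interactive controls, without writing any code.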
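The train, save, load, and predict cycle described above can be sketched outside Streamlit as follows. This is a minimal sketch rather than the platform code itself: `MODEL_NAMES`, `model_name`, `train_and_save`, and `predict_with_saved` are hypothetical helpers introduced for illustration, while the PyCaret calls (`setup`, `compare_models`, `save_model`, `load_model`, `predict_model`) are the same ones the platform code uses. PyCaret is imported lazily, so the module can be loaded even where PyCaret is not installed.

```python
import importlib

# Hypothetical mapping from task type to the file name of the saved model.
MODEL_NAMES = {
    "classification": "best_classification_model",
    "regression": "best_regression_model",
}


def model_name(task):
    """Return the model file name (without .pkl) for a task type."""
    if task not in MODEL_NAMES:
        raise ValueError(f"unknown task type: {task!r}")
    return MODEL_NAMES[task]


def train_and_save(data, target, task, train_size=0.7):
    """Compare candidate models on `data` and save the best one to disk."""
    name = model_name(task)
    # Import pycaret.classification or pycaret.regression only when needed.
    pc = importlib.import_module(f"pycaret.{task}")
    pc.setup(data=data, target=target, train_size=train_size, session_id=123)
    best = pc.compare_models()   # trains and ranks the candidate models
    pc.save_model(best, name)    # writes '<name>.pkl' in the working directory
    return best


def predict_with_saved(new_data, task):
    """Load the saved model for `task` and predict on `new_data`."""
    name = model_name(task)
    pc = importlib.import_module(f"pycaret.{task}")
    model = pc.load_model(name)  # expects the name without the .pkl extension
    return pc.predict_model(model, data=new_data)
```

For example, `train_and_save(df, "label", "classification")` followed by `predict_with_saved(new_df, "classification")`, where `df`, `new_df`, and the column name `label` are placeholders for your own data.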

3. Implementation steps

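Install the pandas, pycaret, and streamlit libraries, plus openpyxl, which the code below uses as the engine for reading .xlsx files:

pip install pandas pycaret streamlit openpyxl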
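After installing, a quick check that the packages can be found may save a failed first run. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper name, and `openpyxl` is on the list because the platform code reads .xlsx files with `engine='openpyxl'`.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Top-level import names of the packages the platform depends on.
    required = ["pandas", "pycaret", "streamlit", "openpyxl"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```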

Code for the zero-code platform. Create a new file named AI_quick.py:

import os

import pandas as pd
import streamlit as st

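# Classification task: compare candidate models and save the best one
def classification_task(data, target_variable, train_size):
    from pycaret.classification import setup, compare_models, save_model, pull
    # session_id fixes the random seed; normalize=True scales the numeric features
    setup(data=data, target=target_variable, session_id=123, normalize=True, train_size=train_size)
    best_model = compare_models()
    # pull() returns the most recent score grid, here the compare_models() leaderboard
    model_comparison = pull()
    st.write("Best model:", best_model)
    # save_model writes 'best_classification_model.pkl'
    save_model(best_model, 'best_classification_model')

    st.write("Model comparison results:")
    st.dataframe(model_comparison)

    return best_model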

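# Regression task: compare candidate models and save the best one
def regression_task(data, target_variable, train_size):
    from pycaret.regression import setup, compare_models, save_model, pull
    # No session_id or normalize is passed here, so PyCaret's defaults apply
    setup(data=data, target=target_variable, train_size=train_size)
    best_model = compare_models()
    # pull() returns the most recent score grid, here the compare_models() leaderboard
    model_comparison = pull()
    st.write("Best model:", best_model)
    # save_model writes 'best_regression_model.pkl'
    save_model(best_model, 'best_regression_model')
    st.write("Model comparison results:")
    st.dataframe(model_comparison)
    return best_model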

# Prediction: load a saved model and score newly uploaded data
def prediction(model_path, prediction_file):
    # save_model wrote '<model_path>.pkl'; load_model expects the name without the extension
    if os.path.exists(f'{model_path}.pkl'):
        if 'classification' in model_path:
            from pycaret.classification import load_model, predict_model
        else:
            from pycaret.regression import load_model, predict_model
        loaded_model = load_model(model_path)
        st.write("Model loaded successfully.")

        # Read the data to predict on; the uploader only accepts csv and xlsx,
        # and the extension is compared case-insensitively
        if prediction_file.name.lower().endswith('.csv'):
            prediction_data = pd.read_csv(prediction_file, encoding='utf-8-sig')
        else:
            prediction_data = pd.read_excel(prediction_file, engine='openpyxl')

        predictions = predict_model(loaded_model, data=prediction_data)
        st.write("Prediction results:")
        st.write(predictions)
    else:
        st.write("No saved model file found. Please train a model first.")

# Main entry point of the Streamlit app
def main():
    st.title("PyCaret Model Platform")

    # Upload the training data
    uploaded_file = st.file_uploader("Upload a dataset (CSV or Excel)", type=["csv", "xlsx"])

    if uploaded_file is not None:
        # Read the file according to its extension (the uploader only accepts csv and xlsx)
        if uploaded_file.name.lower().endswith('.csv'):
            data = pd.read_csv(uploaded_file, encoding='utf-8-sig')
        else:
            data = pd.read_excel(uploaded_file, engine='openpyxl')

        st.markdown("Data preview (first 10 rows):")
        st.write(data.head(10))

        # Choose the task type
        task_type = st.selectbox("Task type", ["Classification", "Regression"])

        # Choose the target variable
        target_variable = st.selectbox("Target variable", data.columns)

        # Training-set fraction; 0 would leave no training data and 1 no test data
        train_size = st.number_input("Training-set fraction (0.01 to 0.99)", min_value=0.01, max_value=0.99, value=0.7, step=0.01)

        # Train the models
        if st.button("Train model"):
            if task_type == "Classification":
                best_model = classification_task(data, target_variable, train_size)
            else:
                best_model = regression_task(data, target_variable, train_size)

        # Load the saved best model and predict
        if st.checkbox("Load the best model for prediction"):
            if task_type == "Classification":
                model_path = 'best_classification_model'
            else:
                model_path = 'best_regression_model'

            # Upload the data to predict on
            prediction_file = st.file_uploader("Upload data to predict on", type=["csv", "xlsx"])
            if prediction_file is not None:
                prediction(model_path, prediction_file)

if __name__ == "__main__":
    main()
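To start the platform, run the script through Streamlit's `streamlit run` command rather than plain `python AI_quick.py`. The sketch below wraps that command in a small launcher. It assumes Streamlit is installed for the current interpreter and that AI_quick.py is in the working directory; `build_command` and `launch` are hypothetical helper names.

```python
import subprocess
import sys


def build_command(script="AI_quick.py"):
    """Build the command line equivalent to `streamlit run <script>`."""
    # `python -m streamlit` runs Streamlit from the current interpreter.
    return [sys.executable, "-m", "streamlit", "run", script]


def launch(script="AI_quick.py"):
    """Start the Streamlit app and block until the server is stopped."""
    return subprocess.run(build_command(script), check=False).returncode
```

Calling `launch()` has the same effect as typing `streamlit run AI_quick.py` in a terminal: Streamlit serves the app locally and prints the URL to open in a browser.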