Algorithms: Fine-tuning Chinese BERT for classification with transformers

哥德巴赫的猜想

Last modified: 2022-02-24 00:36:45


Tags: algorithms, bert, classification

First published: 2022-02-24 00:16:46

Article link: https://blog.youkuaiyun.com/godlovebinlee/article/details/123102224


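This article covers the training-environment setup and data handling for a Chinese information-security classification competition using the bert-base-chinese model. On Windows 10, the author installs PyTorch 1.10.2 built for CUDA 11.3 under Python 3.8 and walks through reading and preprocessing the data. The aim is to deepen understanding of NLP tasks through hands-on practice.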


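Algorithms: A Chinese information-security classification competition entry built on bert-base-chinese

Preface

After studying NLP for a while and getting to know its tasks and technical landscape, I wanted to write code to practice the common NLP tasks. I started with classification, the most basic one. The competition is named 面向数据安全治理的数据内容智能发现与分级分类 (Intelligent Discovery and Graded Classification of Data Content for Data Security Governance).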

I. Windows 10 training environment setup

Choosing the CUDA / cuDNN / PyTorch versions

1. CUDA and cuDNN choice: install the PyTorch 1.10.2 build for CUDA 11.3 (cu113):
pip3 install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio===0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
2. Python version choice:
I tested the bert-base-chinese model from the transformers library under both Python 3.6 and Python 3.8. transformers 4.16.2 does not run on Python 3.6, so Python 3.8 is used.
3. Local CUDA toolkit version:
(py38_pt_common) D:\dev\envs\py38_pt_common>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:25:35_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0
4. Compatible CUDA and cuDNN version combinations can be found by searching online.
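
Note that nvcc reports toolkit 11.4 while the wheels are built for cu113. The pip wheels ship their own CUDA runtime, so what PyTorch actually uses is best checked from Python. The following is a minimal sketch of such a check; the function name `describe_environment` is an illustrative assumption, and the torch fields are left empty when PyTorch is not installed.

```python
import importlib.util
import sys


def describe_environment():
    """Collect interpreter details and, if installed, PyTorch/CUDA details."""
    info = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        # Python 3.8 is the interpreter version chosen in this setup.
        "python_at_least_3_8": sys.version_info >= (3, 8),
        "torch": None,
        "torch_cuda_build": None,
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__              # e.g. 1.10.2+cu113
        info["torch_cuda_build"] = torch.version.cuda  # CUDA version the wheel was built with
        info["cuda_available"] = torch.cuda.is_available()
    return info


env = describe_environment()
for key, value in env.items():
    print(f"{key}: {value}")
```

If `cuda_available` is False on a machine with an NVIDIA GPU, the usual suspects are a CPU-only wheel or a GPU driver too old for the wheel's CUDA build.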

Setup command log

conda create --prefix=D:\dev\envs\py38_pt_common python=3.8
conda activate D:\dev\envs\py38_pt_common
pip3 install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio===0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install scikit-learn
pip install transformers
pip install sentencepiece
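
Once these packages are installed, loading bert-base-chinese for classification could look like the sketch below. It is not the author's code: the helper names `build_label_maps` and `load_classifier` and the label strings are hypothetical, and the competition's real classes are not listed in this article. `BertTokenizer` and `BertForSequenceClassification` are the transformers classes for BERT checkpoints.

```python
def build_label_maps(labels):
    """Map each distinct label to a stable integer id (sorted order)."""
    label2id = {label: idx for idx, label in enumerate(sorted(set(labels)))}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


def load_classifier(num_labels, model_name="bert-base-chinese"):
    """Load the tokenizer and a BERT model with a new classification head."""
    # Imported lazily so build_label_maps works without transformers installed.
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels
    )
    return tokenizer, model


# Hypothetical label names, for illustration only.
label2id, id2label = build_label_maps(["finance", "sports", "finance"])
print(label2id)
```

Calling `load_classifier(len(label2id))` downloads the checkpoint on first use. After that, `tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")` produces a batch that can be passed as `model(**batch)`, whose `.logits` has one column per class. The classification head starts untrained, so the logits mean nothing until the model is fine-tuned.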

II. Understanding the data

1. Import libraries

The code is as follows (example):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
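import warnings
warnings.filterwarnings('ignore')  # silence library warnings in the output
import ssl
# Skip HTTPS certificate verification so the remote CSV can be fetched
ssl._create_default_https_context = ssl._create_unverified_context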
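
The assignment to `ssl._create_default_https_context` switches off certificate checking for HTTPS requests made through the standard library in this process. That is convenient for a quick experiment but unsafe for anything else. The short sketch below only compares the two kinds of context to show what changes; the variable names are illustrative.

```python
import ssl

# Default context: certificates and host names are verified.
verified = ssl.create_default_context()
# Context produced by the private helper used above: no verification at all.
unverified = ssl._create_unverified_context()

print("verified:  ", verified.verify_mode, verified.check_hostname)
print("unverified:", unverified.verify_mode, unverified.check_hostname)
```

When the download source has a valid certificate, dropping the override and keeping the default context is the safer choice.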

2. Read the data

The code is as follows (example):

data = pd.read_csv(
    'https://labfile.oss.aliyuncs.com/courses/1283/adult.data.csv')
print(data.head())

The data here is fetched from a URL over the network.
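
For a text classification task, a first look at the data usually covers class balance and text length. The sketch below assumes a CSV with hypothetical columns `class_label` and `content`, since the real competition file layout is not shown in this article. The inline sample rows are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the competition file (made-up rows and column names).
SAMPLE_CSV = io.StringIO(
    "id,class_label,content\n"
    "0,finance,银行上调存款利率\n"
    "1,sports,球队赢得了冠军\n"
    "2,finance,股市今日小幅上涨\n"
)

df = pd.read_csv(SAMPLE_CSV)

# Class balance: strongly skewed labels usually call for a stratified split.
label_counts = df["class_label"].value_counts()

# Character length of each text, used to choose a truncation length.
df["text_len"] = df["content"].str.len()

print(label_counts)
print(df["text_len"].describe())
```

bert-base-chinese splits Chinese text roughly one token per character and accepts at most 512 positions, so the character-length statistics give a rough guide for choosing `max_length` when tokenizing.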
