Zhejiang University PAT Advanced Level 1070. Mooncake (25)

This post works through PAT Advanced Level problem 1070 Mooncake: given the inventory amount and total price of each kind of mooncake and the maximum market demand, maximize the profit by greedily selling the kinds with the highest unit price first. A C++ solution is included.

Problem link: click to open the link

Mooncake is a Chinese bakery product traditionally eaten during the Mid-Autumn Festival. Many types of fillings and crusts can be found in traditional mooncakes according to the region's culture. Now given the inventory amounts and the prices of all kinds of the mooncakes, together with the maximum total demand of the market, you are supposed to tell the maximum profit that can be made.

Note: partial inventory storage can be taken. The sample shows the following situation: given three kinds of mooncakes with inventory amounts being 180, 150, and 100 thousand tons, and the prices being 7.5, 7.2, and 4.5 billion yuans. If the market demand can be at most 200 thousand tons, the best we can do is to sell 150 thousand tons of the second kind of mooncake, and 50 thousand tons of the third kind. Hence the total profit is 7.2 + 4.5/2 = 9.45 (billion yuans).

Input Specification:

Each input file contains one test case. For each case, the first line contains 2 positive integers N (<=1000), the number of different kinds of mooncakes, and D (<=500 thousand tons), the maximum total demand of the market. Then the second line gives the positive inventory amounts (in thousand tons), and the third line gives the positive prices (in billion yuans) of N kinds of mooncakes. All the numbers in a line are separated by a space.

Output Specification:

For each test case, print the maximum profit (in billion yuans) in one line, accurate up to 2 decimal places.

Sample Input:
3 200
180 150 100
7.5 7.2 4.5
Sample Output:
9.45
This problem also appeared in the Basic Level exam: click to open the link.
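How the greedy strategy reaches 9.45 on the sample: the unit prices are 7.5/180 ≈ 0.0417, 7.2/150 = 0.048, and 4.5/100 = 0.045 billion yuans per thousand tons. Sorting by unit price in descending order, the second kind is sold first (all 150 thousand tons for 7.2), then the remaining 50 thousand tons of demand are filled with the third kind for 50 × 0.045 = 2.25, giving 7.2 + 2.25 = 9.45 billion yuans. Since any thousand tons of demand can be filled by any kind, it is always at least as good to fill it with the kind of highest unit price, which is why the greedy choice is optimal.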

My C++ code:

#include<iostream>
#include<cstdio>     // printf
#include<algorithm>
using namespace std;
struct mooncake
{
	double reserve;    // inventory amount (thousand tons)
	double all_price;  // total price of the whole inventory (billion yuans)
	double unit_price; // unit price (billion yuans per thousand tons)
};
bool compare(mooncake a, mooncake b)
{
	return a.unit_price > b.unit_price;
}
int main()
{
	int i = 0, n;
	double requirement; // read the demand as a double so that subtracting fractional inventory amounts does not truncate it
	double income = 0.0;
	mooncake cake[1001];
	cin >> n >> requirement;
	for (i = 0; i < n; i++)
	{
		cin >> cake[i].reserve;
	}
	for (i = 0; i < n; i++)
	{
		cin >> cake[i].all_price;
		cake[i].unit_price = cake[i].all_price / cake[i].reserve; // unit price of this kind
	}
	sort(cake, cake + n, compare); // most profitable kinds first
	// greedy: take from the kinds with the highest unit price until the demand is used up
	i = 0;
	while (requirement > 0 && i < n)
	{
		if (requirement >= cake[i].reserve)
		{
			// sell the whole inventory of this kind
			income = income + cake[i].reserve * cake[i].unit_price;
			requirement = requirement - cake[i].reserve;
			i++;
		}
		else
		{
			// sell only as much as the remaining demand allows
			income = income + requirement * cake[i].unit_price;
			requirement = 0;
		}
	}
	printf("%.2f", income);
	//system("pause");
	return 0;
}
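A quick way to sanity-check the program against the sample is to compile it and pipe the sample input in (the file name mooncake.cpp below is just an assumed name for the source file):

g++ -O2 -o mooncake mooncake.cpp
printf '3 200\n180 150 100\n7.5 7.2 4.5\n' | ./mooncake    # prints 9.45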

