Torch - Installing torch-hdf5, loadcaffe, matio and nccl on Ubuntu

This post covers how to install the Torch extension libraries torch-hdf5, loadcaffe and matio on Ubuntu, including detailed steps for resolving an error that can come up when installing matio. It also covers NCCL, which speeds up multi-GPU training, and how to fix a "libnccl.so not found" error.


torch-hdf5

sudo apt-get install libhdf5-serial-dev hdf5-tools 
git clone https://github.com/deepmind/torch-hdf5 
cd torch-hdf5 
sudo luarocks make hdf5-0-0.rockspec LIBHDF5_LIBDIR="/usr/lib/x86_64-linux-gnu/"
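To check that the rock installed correctly, a small round-trip test in the th interpreter is enough; the /tmp/test.h5 path and the /data dataset name below are just examples:

require 'torch'
require 'hdf5'

-- write a small tensor to an HDF5 file
local f = hdf5.open('/tmp/test.h5', 'w')
f:write('/data', torch.rand(5, 5))
f:close()

-- read it back and confirm the shape
local g = hdf5.open('/tmp/test.h5', 'r')
local data = g:read('/data'):all()
g:close()
print(data:size())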

loadcaffe

git clone https://github.com/szagoruyko/loadcaffe.git 
cd loadcaffe 
sudo apt-get install libprotobuf-dev protobuf-compiler 
luarocks install loadcaffe 
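Once installed, a Caffe model can be converted to a Torch module like this; deploy.prototxt and bvlc_alexnet.caffemodel are placeholders for whatever model files you actually have:

require 'loadcaffe'

-- the third argument selects the backend: 'nn', 'cudnn' or 'ccn2'
local model = loadcaffe.load('deploy.prototxt', 'bvlc_alexnet.caffemodel', 'nn')
print(model)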

matio

luarocks install matio
Detailed error that can appear with matio, and the fix

Error:

/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/trepl/init.lua:389: /root/torch/install/share/lua/5.1/trepl/init.lua:389: /root/torch/install/share/lua/5.1/trepl/init.lua:389: /root/torch/install/share/lua/5.1/matio/ffi.lua:24: Could not find libmatio. Please make sure that you installd MatIO and you have the shared libraries (libmatio.so or libmatio.dylib) in your library path

Solution:

sudo apt-get install libmatio2
luarocks install matio
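Once libmatio is found, reading a MATLAB file from Torch is a one-liner; data.mat and the variable name x below are placeholders for your own file and variable:

local matio = require 'matio'

-- load a single variable from the .mat file as a torch tensor
local x = matio.load('data.mat', 'x')
print(x:size())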

nccl

Multi-GPU training is faster with NCCL:

git clone https://github.com/NVIDIA/nccl.git
cd nccl
make 
make install
luarocks install nccl
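A hedged sketch of how the NCCL bindings are typically used from Torch, via the useNCCL flag of nn.DataParallelTable in cunn; it assumes two visible GPUs and that both the cunn and nccl rocks are installed:

require 'cunn'
require 'nccl'   -- Torch bindings on top of libnccl.so

local model = nn.Sequential():add(nn.Linear(100, 10)):cuda()

-- dim = 1 (split the minibatch along dimension 1), flattenParams = true,
-- useNCCL = true so parameter synchronisation goes through NCCL
local dpt = nn.DataParallelTable(1, true, true)
   :add(model, {1, 2})   -- replicate the model onto GPUs 1 and 2
   :cuda()

local output = dpt:forward(torch.CudaTensor(8, 100):uniform())
print(output:size())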

If you see a "libnccl.so not found" error, set LD_LIBRARY_PATH in ~/.bashrc so it includes the NCCL install directory, for example:
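Assuming NCCL was installed under the default /usr/local prefix (adjust the path to wherever libnccl.so actually ended up):

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH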
