Background
- I recently got access to an 8-GPU machine to test audio-to-text transcription and found that Whisper was quite slow, which led to the experiments below.
Configuration
- 8x NVIDIA A30

First attempt
- Transcribe a 95 MB mp4 file with openai-whisper
import time

import torch
import whisper

def transcribe_audio(audio_path, model_name="medium"):
    start_time = time.time()
    device = torch.device("cuda")
    print(f"Using device: {device}")

    # Load the model onto the GPU and time it separately.
    model_start_time = time.time()
    model = whisper.load_model(model_name, device=device)
    model_end_time = time.time()
    print(f"Model load time: {model_end_time - model_start_time:.2f} s")

    # Time the transcription on its own.
    transcribe_start_time = time.time()
    result = model.transcribe(audio_path)
    transcribe_end_time = time.time()
    print(f"Transcription time: {transcribe_end_time - transcribe_start_time:.2f} s")
    print("Transcript:", result["text"])

    end_time = time.time()
    total_time = end_time - start_time
    print(f"Total time: {total_time:.2f} s")

if __name__ == "__main__":
    transcribe_audio("a.mp4", model_name="medium")
Output:
Using device: cuda
Model load time: 9.73 s
Transcription time: 96.64 s
Total time: 106.37 s
Second attempt
import time

import torch
import whisper

def transcribe_audio(audio_path, model_name="medium"):
    start_time = time.time()
    device = torch.device("cuda")
    print(f"Using device: {device}")

    model_start_time = time.time()
    model = whisper.load_model(model_name, device=device)
    # Wrap the model in DataParallel across four of the GPUs...
    model = torch.nn.DataParallel(model, device_ids=[0, 1, 2, 3])
    # ...but transcribe() only exists on the underlying Whisper model,
    # so immediately unwrap it again here.
    m = model.module
    model_end_time = time.time()
    print(f"Model load time: {model_end_time - model_start_time:.2f} s")

    transcribe_start_time = time.time()
    result = m.transcribe(audio_path)
    transcribe_end_time = time.time()
    print(f"Transcription time: {transcribe_end_time - transcribe_start_time:.2f} s")
    print("Transcript:", result["text"])

    end_time = time.time()
    total_time = end_time - start_time
    print(f"Total time: {total_time:.2f} s")

if __name__ == "__main__":
    transcribe_audio("a.mp4", model_name="medium")
The numbers are essentially unchanged; most likely this is just the wrong way to use DataParallel:
Using device: cuda
Model load time: 9.68 s
Transcription time: 96.43 s
Total time: 106.10 s
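And indeed it is: torch.nn.DataParallel only parallelizes forward() calls by splitting the input along the batch dimension across the listed devices, while model.module is simply the original unwrapped model, so m.transcribe(audio_path) bypasses the wrapper entirely and runs on a single GPU as before. A minimal sketch of the one thing DataParallel does parallelize (assuming at least four visible GPUs; the layer and shapes are made up for illustration):

import torch

# DataParallel replicates the module and splits the input batch
# across the listed GPUs inside forward().
layer = torch.nn.Linear(16, 4).cuda()
dp = torch.nn.DataParallel(layer, device_ids=[0, 1, 2, 3])

x = torch.randn(32, 16, device="cuda")  # batch of 32 -> 8 rows per GPU
y = dp(x)                               # replicas run concurrently
print(y.shape)                          # torch.Size([32, 4])

# dp.module is just the original, unwrapped model, so calling
# dp.module.transcribe(...) would skip DataParallel entirely.
assert dp.module is layer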
Third attempt
import time

from faster_whisper import WhisperModel

def transcribe_audio(audio_path, model_name="medium"):
    start_time = time.time()

    model_start_time = time.time()
    model = WhisperModel(model_name, device="cuda")
    model_end_time = time.time()
    print(f"Model load time: {model_end_time - model_start_time:.2f} s")

    transcribe_start_time = time.time()
    segments, info = model.transcribe(audio_path, beam_size=16)
    print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
    # segments is a generator: the actual decoding happens while we
    # iterate, so the loop must stay inside the timed region.
    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    transcribe_end_time = time.time()
    print(f"Transcription time: {transcribe_end_time - transcribe_start_time:.2f} s")

    end_time = time.time()
    total_time = end_time - start_time
    print(f"Total time: {total_time:.2f} s")

if __name__ == "__main__":
    transcribe_audio("a.mp4", model_name="medium")
Output:
Model load time: 2.76 s
Transcription time: 12.81 s
Total time: 15.57 s
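As an aside, WhisperModel also accepts a compute_type argument; on Ampere cards such as the A30, float16 (or int8_float16) inference is a common further optimization, though the actual gain depends on the model and the audio. A one-line variation on the loader above:

from faster_whisper import WhisperModel

# compute_type="float16" runs inference in FP16; "int8_float16" also
# quantizes the weights to INT8. Both are supported on CUDA devices.
model = WhisperModel("medium", device="cuda", compute_type="float16")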
- Open question: watching the second approach, only 1-2 of the GPUs ever seem to be doing any work. Is that because no batch size was specified? (See the sketch below.)
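One reading of this: as noted after the second attempt, DataParallel only splits the input batch, and transcribe() decodes a single audio stream with an effective batch size of 1, so there is nothing to split and specifying a batch size would not help; the brief activity on the other cards is likely just weight replication. A more practical way to keep all eight A30s busy is one worker process per GPU, each with its own faster-whisper model, pulling files from a shared queue. A rough sketch under that assumption (the input file list is hypothetical):

import multiprocessing as mp

from faster_whisper import WhisperModel

NUM_GPUS = 8  # one worker per A30

def worker(gpu_id, tasks):
    # Each process loads its own copy of the model, pinned to one GPU.
    model = WhisperModel("medium", device="cuda", device_index=gpu_id)
    while True:
        path = tasks.get()
        if path is None:  # sentinel: no more work for this worker
            return
        segments, _info = model.transcribe(path, beam_size=16)
        # segments is lazy; joining the texts forces the actual decode.
        text = "".join(s.text for s in segments)
        print(f"[GPU {gpu_id}] {path}: {len(text)} chars")

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # safer than fork when CUDA is involved
    tasks = ctx.Queue()
    for path in ["a.mp4", "b.mp4", "c.mp4"]:  # hypothetical input files
        tasks.put(path)
    for _ in range(NUM_GPUS):
        tasks.put(None)
    workers = [ctx.Process(target=worker, args=(i, tasks)) for i in range(NUM_GPUS)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()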