Installing Whisper on a Mac M1 Pro and Enabling MPS

2025/3/23 0:26:27 / Source: https://www.cnblogs.com/seliote/p/18787377

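Background

I needed to transcribe long recordings: a dozen or so audio files, about seven hours in total. Online services are almost all paid and fairly expensive, and none of the ones I looked at seemed suitable, so I set up Whisper on my own M1 Pro with MPS enabled to see how fast it would run.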

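Installation

The overall process follows the OpenAI Whisper repository on GitHub. Note that PyTorch only added MPS support in v1.12, whereas the official Whisper documentation says the models were trained and tested on v1.10.1, so a PyTorch release at least two minor versions newer is required. According to the documentation, newer versions are expected to work as well.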

Install ffmpeg: brew install ffmpeg
Create a Python environment with Conda: list the available Python versions with conda search python, create a 3.10 environment with conda create -n whisper python=3.10.14, activate it with conda activate whisper, and confirm that it is an arm64 environment with python -c "import platform; print(platform.uname()[4])"
Install PyTorch v1.13.1: pip install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1, then check that the build supports MPS with python -c "import torch;print(torch.backends.mps.is_built())"
Install Whisper and its dependencies: pip install -U openai-whisper and pip install setuptools-rust
The installed NumPy version conflicts with the dependencies and has to be downgraded: pip install numpy==1.26.4
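The two one-line checks in the steps above (arm64 and MPS) can be combined into one small script. This is only a sketch: the function name describe_environment is made up for illustration, and it also reports torch.backends.mps.is_available(), which the original steps do not use. A build can include MPS (is_built) while the device is still not usable at runtime (is_available).

```python
import importlib.util
import platform


def describe_environment():
    """Report the CPU architecture and, if PyTorch is installed, its MPS status."""
    info = {
        "machine": platform.machine(),  # "arm64" for a native Apple Silicon Python
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "mps_built": None,
        "mps_available": None,
    }
    if info["torch_installed"]:
        import torch

        # torch.backends.mps exists from PyTorch 1.12 onwards; guard for older builds.
        mps = getattr(torch.backends, "mps", None)
        if mps is not None:
            info["mps_built"] = mps.is_built()
            info["mps_available"] = mps.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If machine is not arm64, the Conda environment is probably running under Rosetta, and MPS will not be available regardless of the PyTorch version.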
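Testing

I need high accuracy and Chinese support, so I use the large-v3 model: whisper ./audio/-1.m4a --language Chinese --model large
In practice, however, the GPU was not used at all. By default it still runs on the CPU and is very slow: an audio clip of under two minutes took more than six minutes.
Adding the --device mps option fails immediately with an error. I then found the post "Shocking! Whisper does not support MPS as advertised!": the M1 Pro is simply left behind, and on the M2 it even runs slower.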
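When driving Whisper from Python instead of the command line, a failing device can be handled by trying devices in order and keeping the first one that loads. The following is a rough sketch and not part of the original setup: load_with_fallback and its load argument are hypothetical names, and it does not make MPS work, it only falls back to the CPU cleanly.

```python
def load_with_fallback(load, devices=("mps", "cpu")):
    """Call load(device) for each device in order and return (device, model)
    for the first call that succeeds."""
    errors = {}
    for device in devices:
        try:
            return device, load(device)
        except RuntimeError as exc:
            # NotImplementedError is a subclass of RuntimeError, so errors about
            # operators missing on a backend are caught here as well.
            errors[device] = exc
    raise RuntimeError(f"no usable device among {list(devices)}: {errors}")
```

With openai-whisper, load could be lambda device: whisper.load_model("large", device=device). If the error only appears during transcription and not while loading, load would also need to run a short test transcription before returning the model.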
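Installation, v2

Switch to another project instead: ggerganov's whisper.cpp on GitHub. Its large number of stars suggests it is a solid piece of work.
Clone the project: git clone https://github.com/ggerganov/whisper.cpp.git code
Download the model: cd code && bash ./models/download-ggml-model.sh large-v3
Build: make large-v3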
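Testing, v2

whisper.cpp only accepts 16-bit WAV files, so the audio has to be converted with ffmpeg first: ffmpeg -i ./audio/1.m4a -ar 16000 -ac 1 -c:a pcm_s16le 1.wav
Run it, specifying the language: ./main -m models/ggml-large-v3.bin -l Chinese -f 1.wav
The same audio clip was transcribed in 46 seconds, which is acceptable.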
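To put the two runs on the same scale, one can compute a real-time factor: processing time divided by audio length. The sketch below uses assumed round figures, because the post only gives "under two minutes" for the clip and "more than six minutes" for the CPU run. The 46 seconds is the only exact number.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio length; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Assumed round figures: the clip length and the CPU time are approximations.
AUDIO_SECONDS = 120
runs = {
    "openai-whisper on CPU": 360,
    "whisper.cpp": 46,  # the only exact figure given in the post
}

for name, seconds in runs.items():
    print(f"{name}: RTF ~ {real_time_factor(seconds, AUDIO_SECONDS):.2f}")
```

With these assumed durations, the CPU run is about three times slower than real time, while whisper.cpp needs roughly 0.4 of the clip's length.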
Final batch-processing script

import os
import shutil
import subprocess
from pathlib import Path

# Input audio, final transcripts, and intermediate WAV files.
source_audio_dir = './source_audio'
output_txt_dir = './output_txt/'

tmp_audio_dir = './tmp/'

# Paths to the whisper.cpp binary and model on the author's machine.
whisper_main = '/Users/seliote/Projects/whisper/output/main'
whisper_model = '/Users/seliote/Projects/whisper/output/models/large-v3.bin'


def clean_cache():
    """Delete and recreate the temporary and output directories."""
    shutil.rmtree(tmp_audio_dir, ignore_errors=True)
    shutil.rmtree(output_txt_dir, ignore_errors=True)
    Path(tmp_audio_dir).mkdir(parents=True, exist_ok=True)
    Path(output_txt_dir).mkdir(parents=True, exist_ok=True)


def find_source_audio_file(source_dir):
    """Yield every non-hidden file under source_dir, recursively."""
    for root, _, files in os.walk(source_dir):
        for filename in files:
            if filename.startswith('.'):
                continue
            yield os.path.join(root, filename)


def cvt_2_wav(source_file, wav_file):
    """Convert to the 16 kHz mono 16-bit PCM WAV that whisper.cpp expects."""
    execute_shell('ffmpeg', '-i', source_file, '-ar', '16000', '-ac', '1', '-c:a', 'pcm_s16le', wav_file)


def wav_2_txt(wav, txt):
    """Transcribe wav with whisper.cpp and save its stdout to txt."""
    txt_content = execute_shell(whisper_main, '-m', whisper_model, '-l', 'Chinese', '-f', wav)
    with open(txt, 'w', encoding='utf-8') as f:
        f.write(txt_content)


def execute_shell(*args):
    """Run a command and return its stdout as text; raises on a non-zero exit."""
    return subprocess.check_output(list(args), encoding='utf-8')


if __name__ == '__main__':
    clean_cache()
    for source_audio_file in find_source_audio_file(source_audio_dir):
        # ./tmp/<name>.wav for the converted audio
        wav_file = Path(os.path.join(tmp_audio_dir, os.path.basename(source_audio_file)))
        wav_file = wav_file.with_suffix('.wav')
        cvt_2_wav(str(source_audio_file), str(wav_file))
        # ./output_txt/<name>.txt for the transcript
        txt_file = Path(os.path.join(output_txt_dir, os.path.basename(wav_file)))
        txt_file = txt_file.with_suffix('.txt')
        wav_2_txt(str(wav_file), str(txt_file))
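Because whisper.cpp depends on the input format, it can help to verify each file produced by cvt_2_wav before transcribing it. The sketch below uses only Python's standard wave module. The function name is_whisper_cpp_wav is made up for illustration, and the check mirrors the ffmpeg options used in the script (16 kHz, mono, 16-bit PCM), not any official validation routine from whisper.cpp.

```python
import wave


def is_whisper_cpp_wav(path, expected_rate=16000):
    """Return True if path is a mono, 16-bit PCM WAV at the expected sample rate.

    Missing or unreadable files are reported as False.
    """
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getnchannels() == 1
                and wav.getsampwidth() == 2  # 2 bytes per sample = 16-bit
                and wav.getframerate() == expected_rate
                and wav.getcomptype() == "NONE"  # uncompressed PCM
            )
    except (wave.Error, EOFError, OSError):
        return False
```

In the main loop this could be called right after cvt_2_wav, skipping or logging any file for which it returns False.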
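Bug

The roughly seven hours of audio, spread across several long files, took three hours and twenty minutes in total. There is one bug: once a passage in the middle fails to transcribe, the output stays stuck on that same sentence from then on. This is a problem with the Whisper model itself, though, and has nothing to do with whisper.cpp.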
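Stuck passages like this are easy to miss in long transcripts, so a small check over the output files can flag them. The sketch below looks for runs of identical consecutive segments. It assumes each output line has the form "[00:00:00.000 --> 00:00:05.000]   text"; the names segment_text and find_repeated_runs and the threshold of three repeats are made up for illustration.

```python
import re

# Assumed line format: "[00:00:00.000 --> 00:00:05.000]   text".
TIMESTAMP_PREFIX = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}\]\s*"
)


def segment_text(line):
    """Drop the leading timestamp bracket, if any, and surrounding whitespace."""
    return TIMESTAMP_PREFIX.sub("", line).strip()


def find_repeated_runs(lines, min_repeats=3):
    """Return (first_line_index, text, count) for each run of at least
    min_repeats identical consecutive non-empty segments."""
    texts = [segment_text(line) for line in lines]
    runs = []
    start = 0
    while start < len(texts):
        end = start + 1
        while end < len(texts) and texts[end] == texts[start]:
            end += 1
        if texts[start] and end - start >= min_repeats:
            runs.append((start, texts[start], end - start))
        start = end
    return runs
```

For a transcript written by the batch script, the input would be Path(txt_file).read_text(encoding='utf-8').splitlines(). Any reported run marks a region of the audio worth re-transcribing or checking by hand.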
