DND Transcribe
The goal of this project is to create a tool that transcribes audio recordings of DND games.
Our initial approach is rather naive: it uses pre-trained wav2vec 2.0 models to perform automatic speech recognition (ASR).
Installation Instructions
Optional: Install CUDA
To use an NVIDIA GPU through CUDA, install CUDA before installing this package.
Install with pip
virtualenv .env
.env/bin/pip install .
Usage Examples
The following examples use a WAV file with a 16 kHz sample rate.
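If a recording is not already a 16 kHz WAV file (for example, an MP3 exported from a voice chat app), it can be converted first. The sketch below assumes ffmpeg is installed; it is not a dependency of this project. It also downmixes to mono with -ac 1, which is an assumption: this README only specifies the sample rate. to_16k_wav and wav_sample_rate are hypothetical helper names, and the header check assumes a canonical PCM WAV layout where the "fmt " chunk directly follows the RIFF header, as is typical for files ffmpeg produces.

```shell
# Hypothetical helper: convert any ffmpeg-readable recording to a 16 kHz WAV.
# -ar sets the output sample rate; -ac 1 downmixes to mono (an assumption).
to_16k_wav() {
    ffmpeg -i "$1" -ar 16000 -ac 1 "$2"
}

# Hypothetical helper: print the sample rate stored in a WAV header.
# Assumption: canonical layout, so the 32-bit little-endian sample rate
# sits at byte offset 24 (RIFF header 12 bytes + fmt chunk fields).
wav_sample_rate() {
    # Read bytes 24..27 as four unsigned decimal values.
    # Left unquoted on purpose so the output splits into four fields.
    set -- $(od -A n -t u1 -j 24 -N 4 "$1")
    # Combine the bytes least-significant first (little-endian).
    echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}
```

For example, running to_16k_wav session.mp3 example.wav and then wav_sample_rate example.wav should print 16000. Reading the bytes with od rather than a dedicated audio tool keeps the check usable with nothing beyond a POSIX shell.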
Run against the large Facebook wav2vec2 model using just the CPU
Of the models listed here, this is the most versatile and the most reliable at transcribing human speech. However, it is large and relatively slow.
Note: this is the default model when --model is not provided.
./.env/bin/dnd_transcribe \
--audio-file example.wav \
--model "facebook/wav2vec2-large-960h-lv60-self" \
--no-gpu
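Rather than deciding by hand whether to pass --no-gpu, a small wrapper can ask PyTorch whether CUDA is usable. This is a sketch under two assumptions: that dnd_transcribe runs its models on PyTorch installed into the .env virtualenv, and that it uses the GPU by default when --no-gpu is absent. gpu_flag is a hypothetical helper, not part of this project.

```shell
# Hypothetical helper, not part of dnd_transcribe: print "--no-gpu" unless
# the PyTorch inside the given virtualenv reports a usable CUDA device.
# Assumption: dnd_transcribe runs its models on PyTorch installed in that env.
gpu_flag() {
    # torch.cuda.is_available() decides the exit status; a missing python
    # or missing torch also counts as "no GPU".
    if ! "$1/bin/python" -c 'import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)' 2>/dev/null; then
        printf '%s\n' --no-gpu
    fi
    # Otherwise print nothing and leave dnd_transcribe on its default device.
}
```

Usage: ./.env/bin/dnd_transcribe --audio-file example.wav $(gpu_flag .env). The substitution is deliberately unquoted so that an empty result adds no argument at all.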
Run against the base wav2vec2 model
This is the base variant of the Facebook wav2vec2 model. It is smaller and less accurate than the large model, but small enough to run on a GPU with limited memory.
./.env/bin/dnd_transcribe \
--audio-file example.wav \
--model "facebook/wav2vec2-base-960h"
Run against the small OpenAI whisper model
For more about the Whisper models, see: https://huggingface.co/openai/whisper-large-v3
This is a small, English-only Whisper model from OpenAI. Because of its size, it can run on a GPU with limited memory.
./.env/bin/dnd_transcribe \
--audio-file example.wav \
--model "openai/whisper-small.en"