DND Transcribe

The goal of this project is to create a tool that transcribes audio recordings of DND games.

Our initial approach is rather naive: it uses pre-trained wav2vec 2.0 models to perform automated speech recognition.

Installation Instructions

Optional: Install CUDA

If you would like to use an Nvidia GPU through CUDA, make sure to install CUDA first.

Install with `pip`

virtualenv .env
.env/bin/pip install .
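
After installing, it can be useful to confirm that a CUDA device is actually visible before running a model on the GPU. The snippet below is a minimal sketch that assumes the project's dependencies include PyTorch (the README does not state this explicitly); the helper name `cuda_status` is made up for illustration. Run it with `.env/bin/python` so it uses the same environment as the tool.

```python
import importlib.util


def cuda_status() -> str:
    """Describe whether PyTorch (assumed dependency) can see a CUDA device."""
    # Avoid a hard failure if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch

    if torch.cuda.is_available():
        # Report the name of the first visible CUDA device.
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA not available; the CPU will have to be used"


print(cuda_status())
```

If this reports that CUDA is not available, passing `--no-gpu` as in the usage examples is the safer choice.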

Usage Examples

The following examples use a WAV file with a 16 kHz sample rate.
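
Because the examples assume 16 kHz input, it may help to check a recording's sample rate before transcribing it. This is a small sketch using only Python's standard `wave` module; the function names `wav_sample_rate` and `is_16khz` are hypothetical helpers and are not part of this project.

```python
import wave

# Sample rate assumed by the usage examples in this README.
EXPECTED_RATE_HZ = 16000


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (in Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16khz(path: str) -> bool:
    """Return True if the WAV file is sampled at 16 kHz."""
    return wav_sample_rate(path) == EXPECTED_RATE_HZ
```

For example, `is_16khz("example.wav")` returns `False` for a file recorded at another rate, in which case the file would need to be resampled with an audio tool of your choice first.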

Run against the large Facebook wav2vec2 model using just the CPU

This model is the most versatile and the most reliable at transcribing human speech in multiple languages. However, it is large and relatively slow.

Note: this is the default model when --model is not provided.

./.env/bin/dnd_transcribe \
    --audio-file example.wav \
    --model "facebook/wav2vec2-large-960h-lv60-self" \
    --no-gpu

Run against the base wav2vec2 model

This is the base Facebook wav2vec2 model. It is smaller and less precise, but small enough to run on a GPU with limited memory.

./.env/bin/dnd_transcribe \
    --audio-file example.wav \
    --model "facebook/wav2vec2-base-960h"

Run against the small OpenAI whisper model

See more about the whisper models: https://huggingface.co/openai/whisper-large-v3

This model from OpenAI is smaller and supports only English, but its smaller size lets it run on a GPU with limited memory.

./.env/bin/dnd_transcribe \
    --audio-file example.wav \
    --model "openai/whisper-small.en"
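
To compare the models above on the same recording, the three commands can be driven from a short script. The sketch below uses only the executable path and flags shown in the examples (`--audio-file`, `--model`, `--no-gpu`); the function names are hypothetical, and how the tool reports its transcript is not assumed here.

```python
import subprocess

# Executable path and model names are copied from the usage examples.
CLI = "./.env/bin/dnd_transcribe"
MODELS = [
    "facebook/wav2vec2-large-960h-lv60-self",
    "facebook/wav2vec2-base-960h",
    "openai/whisper-small.en",
]


def build_command(audio_file, model, use_gpu=True):
    """Build the argument list for one dnd_transcribe invocation."""
    command = [CLI, "--audio-file", audio_file, "--model", model]
    if not use_gpu:
        # --no-gpu forces CPU execution, as in the first usage example.
        command.append("--no-gpu")
    return command


def transcribe_with_each_model(audio_file, use_gpu=True):
    """Run the CLI once per model, stopping if any run fails."""
    for model in MODELS:
        print(f"== {model} ==")
        subprocess.run(build_command(audio_file, model, use_gpu), check=True)
```

Calling `transcribe_with_each_model("example.wav", use_gpu=False)` runs all three models on the CPU in sequence.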
