# DND Transcribe
The goal of this project is to create a tool that transcribes audio recordings
of DND games.
Our initial approach is rather naive: it uses wav2vec 2.0 pre-trained models to
perform automated speech recognition.
## Installation Instructions
### Optional: Install CUDA
If you would like to make use of an Nvidia GPU through CUDA, make sure to
install CUDA first.
### Install with `pip`
```
virtualenv .env
.env/bin/pip install .
```
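After installing, it can be useful to confirm that a GPU is actually visible before running the larger models. The sketch below assumes the project runs its models through PyTorch, which this README does not state; if `torch` is not installed in the virtualenv, the check simply reports that no GPU is available.

```python
def cuda_available() -> bool:
    """Report whether PyTorch can see a CUDA-capable GPU.

    Assumption: the models are executed through PyTorch. If torch is not
    importable, this returns False instead of raising.
    """
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


print("CUDA available:", cuda_available())
```

Run it with the virtualenv's interpreter (`.env/bin/python`) so that it sees the same packages as `dnd_transcribe`. If it prints `False`, the `--no-gpu` flag shown in the usage examples is the safer choice.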
## Usage Examples
The following examples use a WAV file with a 16 kHz sample rate.
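To check whether a recording already matches the 16 kHz rate used in these examples, a small helper built on Python's standard `wave` module may help. This is only a sketch: the function names are illustrative and are not part of `dnd_transcribe`, and whether the tool resamples other rates on its own is not covered here.

```python
import wave

EXPECTED_RATE = 16_000  # 16 kHz, the rate used by the examples in this README


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (frames per second) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def is_16khz(path: str) -> bool:
    """True if the WAV file at `path` is sampled at 16 kHz."""
    return wav_sample_rate(path) == EXPECTED_RATE
```

If a file uses a different rate, one option is to resample it with an external tool first, for example `ffmpeg -i input.wav -ar 16000 example.wav`.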
### Run against the large Facebook wav2vec2 model using just the CPU
This is the most versatile model and the most reliable at transcribing
human speech in multiple languages. However, it is large and relatively slow.
Note: this is the default model when `--model` is not provided.
```
./.env/bin/dnd_transcribe \
--audio-file example.wav \
--model "facebook/wav2vec2-large-960h-lv60-self" \
--no-gpu
```
### Run against the base wav2vec2 model
This is the base Facebook wav2vec2 model. It is smaller and less precise than
the large model, but it is small enough to run on a GPU with limited memory.
```
./.env/bin/dnd_transcribe \
--audio-file example.wav \
--model "facebook/wav2vec2-base-960h"
```
### Run against the small OpenAI whisper model
See more about the whisper models: https://huggingface.co/openai/whisper-large-v3
This is a smaller model provided by OpenAI that only supports English. Because
of its size, it is able to run on a GPU with limited memory.
```
./.env/bin/dnd_transcribe \
--audio-file example.wav \
--model "openai/whisper-small.en"
```
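For a campaign with many session recordings, the commands above can be wrapped in a short script. The sketch below uses only the flags shown in this README (`--audio-file`, `--model`, `--no-gpu`) and assumes the executable lives at `./.env/bin/dnd_transcribe`; the helper names `build_command` and `transcribe_all` are hypothetical and not part of the project.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Path used in the examples above; adjust if your virtualenv lives elsewhere.
DND_TRANSCRIBE = "./.env/bin/dnd_transcribe"


def build_command(audio_file, model: Optional[str] = None,
                  use_gpu: bool = True) -> List[str]:
    """Build the argument list for one dnd_transcribe invocation."""
    cmd = [DND_TRANSCRIBE, "--audio-file", str(audio_file)]
    if model is not None:
        # Omitting --model falls back to the tool's default model.
        cmd += ["--model", model]
    if not use_gpu:
        cmd.append("--no-gpu")
    return cmd


def transcribe_all(directory, model: Optional[str] = None,
                   use_gpu: bool = True) -> None:
    """Run dnd_transcribe once per .wav file in `directory`, in name order."""
    for wav in sorted(Path(directory).glob("*.wav")):
        # check=True stops the batch at the first failing file.
        subprocess.run(build_command(wav, model, use_gpu), check=True)
```

For example, `transcribe_all("recordings", model="openai/whisper-small.en")` would process every WAV file in a hypothetical `recordings` directory with the small whisper model. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.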