# DND Transcribe

The goal of this project is to create a tool that transcribes audio recordings
of DND games.

Our initial approach is rather naive: it uses pre-trained wav2vec 2.0 models to
perform automatic speech recognition (ASR).
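
wav2vec 2.0 checkpoints fine-tuned for ASR, such as the `-960h` models used
below, are typically trained with CTC (connectionist temporal classification).
The model predicts one token for roughly every 20 ms of audio, and the text is
recovered by merging repeated tokens and then removing a special "blank" token.
The sketch below shows greedy CTC decoding on a hand-written frame sequence. It
assumes the token names used in the `-960h` vocabularies (`<pad>` as the blank,
`|` as the word separator) and is an illustration of the technique, not this
project's code.

```python
from itertools import groupby

# Assumed special tokens, following the facebook/wav2vec2-*-960h vocabularies:
# "<pad>" acts as the CTC blank and "|" marks a word boundary.
BLANK = "<pad>"
WORD_DELIMITER = "|"


def ctc_greedy_decode(frame_tokens: list[str]) -> str:
    """Turn a per-frame (argmax) token sequence into text with greedy CTC."""
    # 1. Merge runs of identical consecutive tokens into a single token.
    collapsed = [token for token, _ in groupby(frame_tokens)]
    # 2. Drop blanks; a blank between two equal letters keeps both letters.
    chars = [token for token in collapsed if token != BLANK]
    # 3. Map the word delimiter to a space and normalise whitespace.
    text = "".join(" " if c == WORD_DELIMITER else c for c in chars)
    return " ".join(text.split())


frames = ["H", "H", "<pad>", "E", "L", "L", "<pad>", "L", "O", "O", "|", "|",
          "W", "O", "<pad>", "R", "L", "D", "D"]
print(ctc_greedy_decode(frames))  # HELLO WORLD
```

Note how the blank between the two `L` runs is what preserves the double letter
in "HELLO".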

## Installation Instructions

### Optional: Install CUDA

If you would like to use an NVIDIA GPU through CUDA, make sure to install CUDA
first.
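
After installing CUDA, a quick way to check that the GPU is visible is to ask
PyTorch from inside the project's virtual environment. This sketch assumes the
models run on PyTorch, which is typical for Hugging Face wav2vec2 and Whisper
checkpoints but is not stated in this README. If it reports `False`, the
`--no-gpu` examples below are the safer choice.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA device."""
    # Assumption: PyTorch is the backend. If torch is not installed in this
    # environment, report "no GPU" instead of raising ImportError.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


print("CUDA available:", cuda_available())
```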

### Install with `pip`

```
virtualenv .env
.env/bin/pip install .
```

## Usage Examples

The following examples use a WAV file with a 16 kHz sample rate.
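
Before transcribing, it can help to confirm that a recording really is 16 kHz.
The sketch below reads the WAV header with Python's standard `wave` module; the
function names `describe_wav` and `is_16khz` are hypothetical helpers, not part
of `dnd_transcribe`.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # sample rate assumed by the examples in this README


def describe_wav(path: str) -> dict:
    """Read basic format information from a WAV file's header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }


def is_16khz(path: str) -> bool:
    """True if the WAV file uses the 16 kHz rate the examples expect."""
    return describe_wav(path)["sample_rate_hz"] == EXPECTED_RATE_HZ
```

For example, `describe_wav("example.wav")` returns the sample rate, channel
count and duration in seconds; a file recorded at 44.1 kHz or 48 kHz would need
resampling first.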

### Run against the large Facebook wav2vec2 model using just the CPU

Of the models shown here, this is the most reliable at transcribing human
speech. It was trained on English audio (LibriSpeech and Libri-Light), and it
is large and relatively slow.

Note: this is the default model when `--model` is not provided.

```
./.env/bin/dnd_transcribe \
    --audio-file example.wav \
    --model "facebook/wav2vec2-large-960h-lv60-self" \
    --no-gpu
```
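
To make the defaulting behaviour concrete, here is a hypothetical `argparse`
parser that mirrors only the flags documented in this README: `--audio-file`,
`--model` (defaulting to the large wav2vec2 model) and `--no-gpu`. It is not
the project's actual parser, which may accept other options or treat
`--audio-file` differently; `required=True` is an assumption.

```python
import argparse

# Default documented above: used when --model is not provided.
DEFAULT_MODEL = "facebook/wav2vec2-large-960h-lv60-self"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(prog="dnd_transcribe")
    # Assumed to be mandatory, since there is nothing to transcribe without it.
    parser.add_argument("--audio-file", required=True,
                        help="path to a 16 kHz WAV recording")
    parser.add_argument("--model", default=DEFAULT_MODEL,
                        help="Hugging Face model ID to transcribe with")
    parser.add_argument("--no-gpu", action="store_true",
                        help="run on the CPU only")
    return parser


args = build_parser().parse_args(["--audio-file", "example.wav"])
print(args.model)   # facebook/wav2vec2-large-960h-lv60-self
print(args.no_gpu)  # False
```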

### Run against the base wav2vec2 model

This is the base-size Facebook wav2vec2 model. It is smaller and less accurate
than the large model, but small enough to run on a GPU with limited memory.

```
./.env/bin/dnd_transcribe \
    --audio-file example.wav \
    --model "facebook/wav2vec2-base-960h"
```
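
A rough way to see why the base model fits on a smaller GPU is to estimate the
memory needed just to hold the weights: parameter count times bytes per
parameter. The sketch below uses the approximate sizes given in the wav2vec 2.0
paper (about 95M parameters for BASE, 317M for LARGE). It ignores activations
and framework overhead, and whether this project loads weights in float32 or
float16 is an assumption, so treat the results as lower bounds.

```python
# Approximate parameter counts from the wav2vec 2.0 paper.
PARAMS = {
    "facebook/wav2vec2-base-960h": 95_000_000,
    "facebook/wav2vec2-large-960h-lv60-self": 317_000_000,
}

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(n_params: int, dtype: str = "float32") -> float:
    """Memory needed only for the weights, in GiB (activations excluded)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for name, n in PARAMS.items():
    print(f"{name}: ~{weight_memory_gib(n):.2f} GiB of float32 weights")
```

In float32 this comes to roughly 0.35 GiB for the base model versus about
1.18 GiB for the large one, before any activation memory is counted.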

### Run against the small OpenAI Whisper model

See more about the Whisper models: https://huggingface.co/openai/whisper-large-v3

This model, provided by OpenAI, is smaller than the large Whisper models and
supports English only. Because it is smaller, it can run on a GPU with limited
memory.

```
./.env/bin/dnd_transcribe \
    --audio-file example.wav \
    --model "openai/whisper-small.en"
```