This page is home to ongoing speech recognition research for the Talon project. I trained these models over more than 10 months, using thousands of hours of audio data and months of high-end V100 GPU time very generously donated by packet.com.

If you find these models useful, please donate on Patreon or send a thank you to @aegis on the Talon Slack or @lunixbochs on Twitter.

You can also contribute your own voice to these models at speech.talonvoice.com.

The code used to pre-process the public datasets and set up a machine for training can be found at github.com/talonvoice/speech and in the librispeech recipe in the facebookresearch/wav2letter repository.

recommended wav2letter models (tokens.txt)
60MB - Epoch 65 - words (TER 5.88) - common voice 1k (15.67) - librispeech (4.03) - librispeech other (11.55) - speech commands (3.17) - private (10.50) - private2 (6.45) - tatoeba (0.67) - tedlium (9.65) - tts (2.37)
77MB - Epoch 289 - words (TER 8.47) - common voice 1k (21.73) - librispeech (5.52) - librispeech other (15.13) - speech commands (2.93) - private (13.96) - tatoeba (0.78) - tedlium (12.36) - tts (3.44)
400MB - Epoch 245 - common voice original (TER 14.91) - librispeech (3.49) - librispeech other (11.60) - speech commands (2.12) - private (10.53) - tatoeba (0.48) - tedlium (9.32)
1.6GB - Epoch 40 - librispeech (TER 2.64)
1.6GB - Epoch 127 - SpecAugment - words (TER 5.89) - common voice 1k (12.63) - librispeech (3.00) - librispeech other (9.28) - speech commands (2.16) - private (9.41) - tatoeba (0.45) - tedlium (8.42) - tts (2.12)
kenlm language models
1.1GB - 4-gram English Wikipedia (April 2019)
248MB - 3-gram English Wikipedia (April 2019)
104MB - 2-gram English Wikipedia (April 2019)
all wav2letter models (tokens.txt)
1.6GB - Epoch 127 - SpecAugment - words (TER 5.89) - common voice 1k (12.63) - librispeech (3.00) - librispeech other (9.28) - speech commands (2.16) - private (9.41) - tatoeba (0.45) - tedlium (8.42) - tts (2.12)
1.6GB - Epoch 40 - librispeech (TER 2.64)
1.6GB - Epoch 91 - librispeech (TER 2.80) + common voice (11.74) + speech commands (3.76)
1.6GB - Epoch 2 - librispeech (TER 20)
830MB - Epoch 15 - librispeech (TER 4.98)
400MB - Epoch 45 - librispeech (TER 4.50)
400MB - Epoch 245 - common voice original (TER 14.91) - librispeech (3.49) - librispeech other (11.40) - speech commands (2.12) - private (10.53) - tatoeba (0.48) - tedlium (9.32)
77MB - Epoch 289 - words (TER 8.47) - common voice 1k (21.73) - librispeech (5.52) - librispeech other (15.13) - speech commands (2.93) - private (13.96) - tatoeba (0.78) - tedlium (12.36) - tts (3.44)
77MB - Epoch 337 - librispeech (TER 4.62) - librispeech other (13.93)
77MB - Epoch 210 - words (TER 8.93) - common voice 1k (22.33) - librispeech (5.64) - librispeech other (15.38) - speech commands (2.78) - private (14.29) - tatoeba (0.79) - tedlium (12.66)
77MB - Epoch 147 - words (TER 11.01) - common voice 1k (28.08) - librispeech (5.53) - librispeech other (16.02) - speech commands (3.01) - private (14.06) - tatoeba (0.73) - tedlium (12.62)
77MB - Epoch 21 - common voice 1k (TER 26.48) - librispeech (7.69) - librispeech other (19.12) - speech commands (4.29) - private (17.79) - tatoeba (1.22) - tedlium (16.12)
77MB - Epoch 169 - single words (TER 5.73)
60MB - Epoch 65 - words (TER 5.88) - common voice 1k (15.67) - librispeech (4.03) - librispeech other (11.55) - speech commands (3.17) - private (10.50) - private2 (6.45) - tatoeba (0.67) - tedlium (9.65) - tts (2.37)
21MB - Epoch 90 - words (TER 19.73) - common voice 1k (34.60) - librispeech (13.23) - librispeech other (27.25) - speech commands (8.68) - private (25.79) - tatoeba (2.75) - tedlium (23.03) - tts (9.78)
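
For readers comparing the TER figures in the listings above, here is a minimal sketch of how a token error rate can be computed. It assumes TER means the edit distance between the reference and predicted token sequences, divided by the number of reference tokens and expressed as a percentage. The function names `edit_distance` and `token_error_rate` are hypothetical helpers for illustration, not part of wav2letter or the Talon code, and the exact scoring used for the numbers on this page may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j tokens of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # drop a reference token (deletion)
                curr[j - 1] + 1,     # extra hypothesis token (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def token_error_rate(ref_tokens, hyp_tokens):
    """Assumed definition: 100 * edit distance / number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


if __name__ == "__main__":
    # Treat each character as a token, as a letter-based model would.
    reference = list("hello world")
    hypothesis = list("hallo word")
    print(f"TER: {token_error_rate(reference, hypothesis):.2f}")
```

Under this definition, lower is better, and a value such as 5.88 would mean roughly 6 token edits per 100 reference tokens.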