# Transcription DevTools

Includes:
* __JiWER__ CLI Node.js wrapper
* Benchmark tool to test multiple transcription engines
* TypeScript classes to evaluate the word error rate of files generated by the transcription engines

## Build

```sh
npm run build
```

## Benchmark

A benchmark of the available __transcribers__ can be run with:
```sh
npm run benchmark
```
```
┌────────────────────────┬──────────────────────┬──────────────────────┬──────────┬────────┬───────────────────────┐
│ (index)                │ WER                  │ CER                  │ duration │ model  │ engine                │
├────────────────────────┼──────────────────────┼──────────────────────┼──────────┼────────┼───────────────────────┤
│ 5yZGBYqojXe7nuhq1TuHvz │ '28.39506172839506%' │ '9.62457337883959%'  │ '41s'    │ 'tiny' │ 'openai-whisper'      │
│ x6qREJ2AkTU4e5YmvfivQN │ '29.75206611570248%' │ '10.46195652173913%' │ '15s'    │ 'tiny' │ 'whisper-ctranslate2' │
└────────────────────────┴──────────────────────┴──────────────────────┴──────────┴────────┴───────────────────────┘
```

The benchmark can be run with multiple built-in model sizes:

```sh
MODELS=tiny,small,large npm run benchmark
```
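
Conceptually, each benchmark row boils down to a WER/CER comparison between a reference transcript and an engine's output. The following is only a sketch of that comparison using the `JiwerClI` wrapper documented in the next section; the import path, file names and engine list are assumptions for illustration, not the actual benchmark implementation.

```typescript
// Illustrative sketch only: the import path and file names are assumptions.
import { JiwerClI } from './jiwer-cli.js'

const reference = './reference.txt'
const hypotheses = [
  { engine: 'openai-whisper', file: './openai-whisper.txt' },
  { engine: 'whisper-ctranslate2', file: './whisper-ctranslate2.txt' }
]

const rows = await Promise.all(
  hypotheses.map(async ({ engine, file }) => {
    // Compare the engine output against the reference transcript
    const jiwerCLI = new JiwerClI(reference, file)

    return {
      engine,
      WER: `${(await jiwerCLI.wer()) * 100}%`,
      CER: `${(await jiwerCLI.cer()) * 100}%`
    }
  })
)

// Prints a table similar to the benchmark output above
console.table(rows)
```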

## Jiwer

> *JiWER is a python tool for computing the word-error-rate of ASR systems.*
> https://jitsi.github.io/jiwer/cli/

__JiWER__ serves as a reference implementation to calculate the error rates between two text files:
- WER (Word Error Rate)
- CER (Character Error Rate)

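For reference, WER is the word-level edit distance between the hypothesis and the reference (substitutions + deletions + insertions) divided by the number of words in the reference; CER is the same computation at the character level. Below is a minimal, self-contained illustration of the WER formula; it is not the package's or JiWER's actual implementation.

```typescript
// Illustrative only: WER = (substitutions + deletions + insertions) / reference word count
function wordErrorRate (reference: string, hypothesis: string): number {
  const ref = reference.split(/\s+/)
  const hyp = hypothesis.split(/\s+/)

  // Classic dynamic-programming edit distance, computed over words instead of characters
  const d: number[][] = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  )

  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    }
  }

  return d[ref.length][hyp.length] / ref.length
}

// One substitution ("fox" -> "dog") and one insertion ("cat") over 4 reference words
console.log(wordErrorRate('the quick brown fox', 'the quick brown dog cat')) // 0.5
```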

### Usage

```typescript
const jiwerCLI = new JiwerClI('./reference.txt', './hypothesis.txt')

// WER as a percentage, e.g. 0.03 -> 3%
console.log(await jiwerCLI.wer())

// CER as a percentage, e.g. 0.01 -> 1%
console.log(await jiwerCLI.cer())

// Detailed comparison report
console.log(await jiwerCLI.alignment())
```

## Resources

- https://jitsi.github.io/jiwer/
- https://github.com/rapidfuzz/RapidFuzz