# Transcription DevTools

Includes:

* Node.js wrapper for the __JiWER__ CLI
* Benchmark tool to test multiple transcription engines
* TypeScript classes to evaluate the word error rate (WER) of transcripts generated by the transcription engines

## Build

```sh
npm run build
```

## Benchmark

A benchmark of the available __transcribers__ can be run with:

```sh
npm run benchmark
```

Sample output:

```
┌────────────────────────┬──────────────────────┬──────────────────────┬──────────┬────────┬───────────────────────┐
│        (index)         │         WER          │         CER          │ duration │ model  │        engine         │
├────────────────────────┼──────────────────────┼──────────────────────┼──────────┼────────┼───────────────────────┤
│ 5yZGBYqojXe7nuhq1TuHvz │ '28.39506172839506%' │ '9.62457337883959%'  │  '41s'   │ 'tiny' │   'openai-whisper'    │
│ x6qREJ2AkTU4e5YmvfivQN │ '29.75206611570248%' │ '10.46195652173913%' │  '15s'   │ 'tiny' │ 'whisper-ctranslate2' │
└────────────────────────┴──────────────────────┴──────────────────────┴──────────┴────────┴───────────────────────┘
```
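
The table appears to be Node's `console.table` output, which prints string values in quotes. A minimal sketch of how such rows could be assembled, assuming each run yields WER and CER as ratios and a duration in milliseconds; `BenchmarkResult`, `toPercent` and `toTableRows` are hypothetical names, not this package's API.

```typescript
// Hypothetical shape of one benchmark measurement; the field names are
// assumptions chosen to mirror the columns of the table above.
interface BenchmarkResult {
  id: string;         // run identifier, rendered as the (index) column
  wer: number;        // word error rate as a ratio, e.g. 0.25 -> 25%
  cer: number;        // character error rate as a ratio
  durationMs: number; // wall-clock transcription time in milliseconds
  model: string;      // model size, e.g. 'tiny'
  engine: string;     // transcription engine name
}

// Convert a ratio to a percentage string such as '25%'.
function toPercent(ratio: number): string {
  return `${ratio * 100}%`;
}

// Build a keyed object: console.table uses the keys as the (index) column.
function toTableRows(results: BenchmarkResult[]): Record<string, Record<string, string>> {
  const table: Record<string, Record<string, string>> = {};
  for (const r of results) {
    table[r.id] = {
      WER: toPercent(r.wer),
      CER: toPercent(r.cer),
      duration: `${Math.round(r.durationMs / 1000)}s`,
      model: r.model,
      engine: r.engine,
    };
  }
  return table;
}

// Illustrative input values only, not real benchmark results.
console.table(toTableRows([
  { id: 'run-1', wer: 0.25, cer: 0.1, durationMs: 41000, model: 'tiny', engine: 'openai-whisper' },
]));
```

Keying the object by run ID, rather than passing an array, is what makes `console.table` show the IDs in the index column instead of `0`, `1`, ….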

The benchmark can be run with several built-in model sizes by listing them, comma-separated, in the `MODELS` environment variable:

```sh
MODELS=tiny,small,large npm run benchmark
```
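
A sketch of how a comma-separated `MODELS` value could be split into individual model sizes. `parseModels` and its `'tiny'` fallback are illustrative assumptions, not the benchmark's documented behavior.

```typescript
// Hypothetical helper: turn a value like "tiny,small,large" into a list of
// model sizes. The 'tiny' fallback is an assumption, not a documented default.
function parseModels(raw: string | undefined, fallback: string[] = ['tiny']): string[] {
  const parsed = (raw ?? '')
    .split(',')
    .map((m) => m.trim())         // tolerate spaces around commas
    .filter((m) => m.length > 0); // drop empty entries such as "tiny,,small"
  return parsed.length > 0 ? parsed : fallback;
}

// e.g. MODELS=tiny,small,large npm run benchmark
const models = parseModels(process.env.MODELS);
for (const model of models) {
  console.log(`benchmarking model size: ${model}`);
}
```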

## JiWER

> *JiWER is a python tool for computing the word-error-rate of ASR systems.*
>
> https://jitsi.github.io/jiwer/cli/

__JiWER__ serves as the reference implementation for calculating error rates between two text files:

- WER (Word Error Rate)
- CER (Character Error Rate)
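
Both metrics are edit distances normalized by the length of the reference: WER = (substitutions + deletions + insertions) / number of reference words, and CER is the same ratio computed over characters. The sketch below is a plain Levenshtein implementation for illustration only; it splits on whitespace and applies no other normalization, so it is not guaranteed to reproduce JiWER's numbers on every input. `editDistance`, `wer` and `cer` are hypothetical names.

```typescript
// Levenshtein edit distance over token arrays: the minimum number of
// substitutions, deletions and insertions that turn `ref` into `hyp`.
function editDistance<T>(ref: readonly T[], hyp: readonly T[]): number {
  // prev[j] = distance between the first i-1 ref tokens and the first j hyp tokens
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// WER: word-level edits divided by the reference word count (assumes a non-empty reference).
function wer(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  return editDistance(ref, hyp) / ref.length;
}

// CER: character-level edits divided by the reference character count.
function cer(reference: string, hypothesis: string): number {
  const ref = Array.from(reference.trim());
  const hyp = Array.from(hypothesis.trim());
  return editDistance(ref, hyp) / ref.length;
}

console.log(wer('the cat sat on the mat', 'the cat sit on mat')); // 0.3333333333333333
console.log(cer('kitten', 'sitting')); // 0.5
```

In the first example one substitution (`sat` → `sit`) and one deletion (`the`) over six reference words give 2/6; multiplying by 100 gives the percentage form used in the benchmark table.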

### Usage

```typescript
// Assumes JiwerCLI is imported from this package and that this code runs
// as an ES module, where top-level await is allowed.
const jiwerCLI = new JiwerCLI('./reference.txt', './hypothesis.txt')
// WER as a ratio, e.g. 0.03 -> 3%
console.log(await jiwerCLI.wer())
// CER as a ratio, e.g. 0.01 -> 1%
console.log(await jiwerCLI.cer())
// Detailed alignment report
console.log(await jiwerCLI.alignment())
```
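
A hypothetical sketch of how a wrapper like this could shell out to the `jiwer` executable with Node's `child_process.execFile`. The `--reference`, `--hypothesis`, `--cer` and `--align` options are assumed from the JiWER CLI documentation linked above; check `jiwer --help` for your installed version. `buildJiwerArgs` and `runJiwer` are illustrative names, not this package's implementation.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

type JiwerMode = 'wer' | 'cer' | 'alignment';

// Hypothetical helper: build the argv for the jiwer CLI. The option names are
// assumptions based on the JiWER CLI docs; verify them with `jiwer --help`.
function buildJiwerArgs(reference: string, hypothesis: string, mode: JiwerMode): string[] {
  const args = ['--reference', reference, '--hypothesis', hypothesis];
  if (mode === 'cer') args.push('--cer');         // character-level instead of word-level
  if (mode === 'alignment') args.push('--align'); // print the per-sentence alignment
  return args;
}

// Run jiwer without a shell (execFile) and return its trimmed stdout.
async function runJiwer(reference: string, hypothesis: string, mode: JiwerMode): Promise<string> {
  const { stdout } = await execFileAsync('jiwer', buildJiwerArgs(reference, hypothesis, mode));
  return stdout.trim();
}
```

For example, `Number(await runJiwer('./reference.txt', './hypothesis.txt', 'wer'))` would yield the WER as a ratio, assuming jiwer prints a single number in that mode. Using `execFile` rather than `exec` passes the file paths as separate arguments, so paths containing spaces or shell metacharacters are not interpreted by a shell.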

## Resources

- https://jitsi.github.io/jiwer/
- https://github.com/rapidfuzz/RapidFuzz