# Transcription

Video **transcription** consists in transcribing the audio content of a video to text.

> This process might be called __Automatic Speech Recognition__ or __Speech to Text__ in a more general context.

Provides a common API to multiple transcription backends, currently:

- `openai-whisper` CLI
- `faster-whisper` (*via* the `whisper-ctranslate2` CLI)

> Potential candidates could be: whisper-cpp, vosk, ...
## Requirements

- Python 3
- PIP

And at least one of the following transcription backends:

- Python:
  - `openai-whisper`
  - `whisper-ctranslate2>=0.4.3`
## Usage

Create a transcriber manually:

```typescript
import { OpenaiTranscriber } from '@peertube/peertube-transcription'

(async () => {
  // Optional: use a local installation of the transcription engine
  const binDirectory = 'local/pip/path/bin'

  // Create a transcriber powered by the OpenAI Whisper CLI
  const transcriber = new OpenaiTranscriber({
    name: 'openai-whisper',
    command: 'whisper',
    languageDetection: true,
    binDirectory
  })

  // If not installed globally, install the transcription engine (uses pip under the hood)
  await transcriber.install('local/pip/path')

  // Transcribe
  const transcriptFile = await transcriber.transcribe({
    mediaFilePath: './myVideo.mp4',
    model: 'tiny',
    format: 'txt'
  })

  console.log(transcriptFile.path)
  console.log(await transcriptFile.read())
})()
```
Using a local model file:

```typescript
import { WhisperBuiltinModel } from '@peertube/peertube-transcription/dist'

const transcriptFile = await transcriber.transcribe({
  mediaFilePath: './myVideo.mp4',
  model: await WhisperBuiltinModel.fromPath('./models/large.pt'),
  format: 'txt'
})
```
You may use the built-in factory if you're happy with the default configuration:

```typescript
import { transcriberFactory } from '@peertube/peertube-transcription'

transcriberFactory.createFromEngineName({
  engineName: transcriberName,
  logger: compatibleWinstonLogger,
  transcriptDirectory: '/tmp/transcription'
})
```
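To illustrate the dispatch-by-engine-name idea behind such a factory, here is a minimal self-contained sketch. The class names and registry below are illustrative assumptions for the example, not the package's actual implementation:

```typescript
// Sketch of an engine-name → constructor registry, showing how a factory
// can pick a transcriber class from a string such as 'openai-whisper'.
// These classes are stand-ins, not the real @peertube/peertube-transcription API.

interface TranscriberOptions {
  name: string
  command: string
}

class FakeOpenaiTranscriber {
  constructor (public readonly options: TranscriberOptions) {}
}

class FakeCtranslate2Transcriber {
  constructor (public readonly options: TranscriberOptions) {}
}

type TranscriberConstructor = new (options: TranscriberOptions) => object

const registry: Record<string, TranscriberConstructor> = {
  'openai-whisper': FakeOpenaiTranscriber,
  'whisper-ctranslate2': FakeCtranslate2Transcriber
}

function createFromEngineName (engineName: string): object {
  const Ctor = registry[engineName]
  if (!Ctor) {
    throw new Error(`Unknown transcription engine: ${engineName}`)
  }
  return new Ctor({ name: engineName, command: engineName })
}

const transcriber = createFromEngineName('openai-whisper')
console.log(transcriber.constructor.name) // prints "FakeOpenaiTranscriber"
```

An unknown engine name fails fast with an explicit error, which is the behaviour you would want from a factory fed with user-supplied configuration.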
> For further usage, see [../tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts](../tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts)
## Lexicon

- ONNX: Open Neural Network eXchange, a model specification; the ONNX Runtime runs these models.
- GPT: Generative Pre-trained Transformer
- LLM: Large Language Model
- NLP: Natural Language Processing
- MLP: Multilayer Perceptron
- ASR: Automatic Speech Recognition
- WER: Word Error Rate
- CER: Character Error Rate