# Transcription

Video **transcription** consists of converting the audio content of a video into text.

> This process might be called __Automatic Speech Recognition__ or __Speech to Text__ in a more general context.

This package provides a common API to multiple transcription backends, currently:
- `openai-whisper` CLI
- `faster-whisper` (*via* the `whisper-ctranslate2` CLI)

> Potential future candidates include: whisper-cpp, vosk, ...

## Requirements
- Python 3
- pip

And at least one of the following transcription backends:
- Python:
  - `openai-whisper`
  - `whisper-ctranslate2>=0.4.3`

## Usage

Create a transcriber manually:

```typescript
import { OpenaiTranscriber } from '@peertube/peertube-transcription'

(async () => {
  // Optional: only needed if you want to use a local installation of the transcription engines
  const binDirectory = 'local/pip/path/bin'

  // Create a transcriber powered by the OpenAI Whisper CLI
  const transcriber = new OpenaiTranscriber({
    name: 'openai-whisper',
    command: 'whisper',
    languageDetection: true,
    binDirectory
  })

  // If not installed globally, install the transcription engine (uses pip under the hood)
  await transcriber.install('local/pip/path')

  // Transcribe
  const transcriptFile = await transcriber.transcribe({
    mediaFilePath: './myVideo.mp4',
    model: 'tiny',
    format: 'txt'
  })

  console.log(transcriptFile.path)
  console.log(await transcriptFile.read())
})()
```

Using a local model file (reusing the `transcriber` created above):

```typescript
import { WhisperBuiltinModel } from '@peertube/peertube-transcription/dist'

const transcriptFile = await transcriber.transcribe({
  mediaFilePath: './myVideo.mp4',
  model: await WhisperBuiltinModel.fromPath('./models/large.pt'),
  format: 'txt'
})
```

You may use the built-in factory if you're happy with the default configuration. In the snippet below, `transcriberName` is the name of the engine to use (e.g. `openai-whisper`) and `compatibleWinstonLogger` is any Winston-compatible logger instance:

```typescript
import { transcriberFactory } from '@peertube/peertube-transcription'

transcriberFactory.createFromEngineName({
  engineName: transcriberName,
  logger: compatibleWinstonLogger,
  transcriptDirectory: '/tmp/transcription'
})
```
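For context, here is a sketch of how a factory-created transcriber could then be used end to end. This is not the package's documented API but a minimal sketch under two assumptions: `createFromEngineName` returns a transcriber exposing the same `transcribe()` method as `OpenaiTranscriber` above, and the logger is any Winston-compatible logger instance:

```typescript
import winston from 'winston'
import { transcriberFactory } from '@peertube/peertube-transcription'

(async () => {
  // A Winston console logger stands in for the `compatibleWinstonLogger` placeholder above
  const logger = winston.createLogger({ transports: [ new winston.transports.Console() ] })

  // Assumption: the factory returns a transcriber with the same interface
  // as the manually created OpenaiTranscriber shown earlier
  const transcriber = transcriberFactory.createFromEngineName({
    engineName: 'openai-whisper',
    logger,
    transcriptDirectory: '/tmp/transcription'
  })

  const transcriptFile = await transcriber.transcribe({
    mediaFilePath: './myVideo.mp4',
    model: 'tiny',
    format: 'txt'
  })

  console.log(await transcriptFile.read())
})()
```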
> For further usage examples, see [../tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts](../tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts)

## Lexicon
- ONNX: Open Neural Network eXchange. A specification for machine learning models; the ONNX Runtime runs these models.
- GPTs: Generative Pre-trained Transformers
- LLM: Large Language Model
- NLP: Natural Language Processing
- MLP: Multilayer Perceptron
- ASR: Automatic Speech Recognition
- WER: Word Error Rate
- CER: Character Error Rate
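> As a reminder of how these two evaluation metrics are usually defined (the standard formulation, not anything specific to this package), with `S`, `D`, `I` the number of substituted, deleted and inserted words against a reference transcript and `N` the number of words in the reference:

```math
\mathrm{WER} = \frac{S + D + I}{N}
```

CER is computed the same way at the character level.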