Video transcription consists of transcribing the audio content of a video to text.
In a more general context, this process is called Automatic Speech Recognition (ASR) or Speech to Text (STT).
This package provides a common API to several transcription backends, currently:
openai-whisper CLI
faster-whisper (via whisper-ctranslate2 CLI)
Potential candidates could be: whisper-cpp, vosk, ...
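To illustrate what a common API over several CLI backends can look like, here is a minimal TypeScript sketch. It is not the package's actual implementation: `TranscribeArgs`, `CliBackend`, `cliBackends` and `buildCommand` are hypothetical names introduced for illustration. It assumes both CLIs accept the `--model`, `--output_format` and `--output_dir` options, which whisper-ctranslate2 mirrors from the openai-whisper CLI.

```typescript
// Hypothetical types for illustration; not the package's real API.
interface TranscribeArgs {
  mediaFilePath: string
  model: string
  format: 'txt' | 'vtt' | 'srt'
  outputDirectory: string
}

interface CliBackend {
  name: string
  command: string
}

const cliBackends: CliBackend[] = [
  { name: 'openai-whisper', command: 'whisper' },
  { name: 'whisper-ctranslate2', command: 'whisper-ctranslate2' }
]

// Build the argument vector for a backend.
// Assumption: both CLIs share these option names.
function buildCommand (backend: CliBackend, args: TranscribeArgs): string[] {
  return [
    backend.command,
    args.mediaFilePath,
    '--model', args.model,
    '--output_format', args.format,
    '--output_dir', args.outputDirectory
  ]
}

const exampleArgs: TranscribeArgs = {
  mediaFilePath: './myVideo.mp4',
  model: 'tiny',
  format: 'txt',
  outputDirectory: '/tmp/transcription'
}

// The same request is turned into one command line per backend.
for (const backend of cliBackends) {
  console.log(buildCommand(backend, exampleArgs).join(' '))
}
```

Because callers only deal with `TranscribeArgs`, switching backend does not change the calling code, which is the main benefit of a shared API.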
At least one of the following transcription backends must be installed:
openai-whisper
whisper-ctranslate2>=0.4.3
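The `>=0.4.3` constraint on whisper-ctranslate2 can be checked programmatically once the installed version string is known. The sketch below is a hypothetical helper (`isAtLeast` is not part of the package) and assumes plain dotted numeric versions such as `0.4.3`, without pre-release suffixes.

```typescript
// Hypothetical helper: compare dotted numeric versions such as "0.4.3".
function isAtLeast (installed: string, minimum: string): boolean {
  const installedParts = installed.split('.').map(Number)
  const minimumParts = minimum.split('.').map(Number)
  const length = Math.max(installedParts.length, minimumParts.length)

  for (let i = 0; i < length; i++) {
    // Missing components count as 0, so "0.5" is treated as "0.5.0"
    const left = installedParts[i] ?? 0
    const right = minimumParts[i] ?? 0
    if (left !== right) return left > right
  }

  return true // Both versions are equal
}

console.log(isAtLeast('0.4.3', '0.4.3')) // true
console.log(isAtLeast('0.4.10', '0.4.3')) // true
console.log(isAtLeast('0.3.9', '0.4.3')) // false
```

Comparing components numerically rather than as strings matters here: a string comparison would rank `0.4.10` below `0.4.3`.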
Create a transcriber manually:
import { OpenaiTranscriber } from '@peertube/peertube-transcription'

(async () => {
  // Optional if you want to use a local installation of the transcription engines
  const binDirectory = 'local/pip/path/bin'

  // Create a transcriber powered by the OpenAI Whisper CLI
  const transcriber = new OpenaiTranscriber({
    name: 'openai-whisper',
    command: 'whisper',
    languageDetection: true,
    binDirectory
  });

  // If not installed globally, install the transcriber engine (uses pip under the hood)
  await transcriber.install('local/pip/path')

  // Transcribe
  const transcriptFile = await transcriber.transcribe({
    mediaFilePath: './myVideo.mp4',
    model: 'tiny',
    format: 'txt'
  });

  console.log(transcriptFile.path);
  console.log(await transcriptFile.read());
})();
Using a local model file:
import { WhisperBuiltinModel } from '@peertube/peertube-transcription/dist'

const transcriptFile = await transcriber.transcribe({
  mediaFilePath: './myVideo.mp4',
  model: await WhisperBuiltinModel.fromPath('./models/large.pt'),
  format: 'txt'
});
You may use the built-in factory if you’re happy with the default configuration:
import { transcriberFactory } from '@peertube/peertube-transcription'
transcriberFactory.createFromEngineName({
  engineName: transcriberName,
  logger: compatibleWinstonLogger,
  transcriptDirectory: '/tmp/transcription'
})
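As a rough illustration of the idea behind creating a transcriber from an engine name, the sketch below maps names to default settings and rejects unknown names. All identifiers here (`EngineDefaults`, `engineDefaults`, `createFromName`) are hypothetical, and the whisper-ctranslate2 defaults are assumptions; the real `transcriberFactory` may behave differently.

```typescript
// Hypothetical shape of per-engine defaults; not the package's real types.
interface EngineDefaults {
  name: string
  command: string
  languageDetection: boolean
}

// The openai-whisper values mirror the manual example; the others are assumed.
const engineDefaults: Record<string, EngineDefaults> = {
  'openai-whisper': { name: 'openai-whisper', command: 'whisper', languageDetection: true },
  'whisper-ctranslate2': { name: 'whisper-ctranslate2', command: 'whisper-ctranslate2', languageDetection: true }
}

// Look up the defaults for an engine and apply optional overrides.
function createFromName (engineName: string, overrides: Partial<EngineDefaults> = {}): EngineDefaults {
  const defaults = engineDefaults[engineName]
  if (!defaults) throw new Error(`Unknown engine: ${engineName}`)

  return { ...defaults, ...overrides }
}

console.log(createFromName('openai-whisper'))
console.log(createFromName('whisper-ctranslate2', { languageDetection: false }))
```

Failing early on an unknown engine name gives a clearer error than a missing command at transcription time.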
For further usage examples, see ../tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts