ニジカ投稿局 https://tv.nizika.tv

Transcription

Video transcription consists of transcribing the audio content of a video to text.

In a more general context, this process is also called Automatic Speech Recognition (ASR) or Speech to Text.

This package provides a common API to several transcription backends, currently:

openai-whisper CLI
faster-whisper (via whisper-ctranslate2 CLI)

Potential candidates could be: whisper-cpp, vosk, ...
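
To illustrate what a common API over CLI backends can look like, here is a minimal sketch. The `TranscribeRequest` type and the `buildCommand` helper are hypothetical names and are not part of this package. The sketch only assumes that both CLIs accept the `--model`, `--output_format` and `--output_dir` options.

```typescript
// Hypothetical sketch: one request shape mapped onto two CLI backends.
// None of these names come from @peertube/peertube-transcription.
type EngineName = 'openai-whisper' | 'whisper-ctranslate2'

interface TranscribeRequest {
  mediaFilePath: string
  model: string
  format: 'txt' | 'vtt' | 'srt'
  outputDirectory: string
}

// Executable name for each engine
const commands: Record<EngineName, string> = {
  'openai-whisper': 'whisper',
  'whisper-ctranslate2': 'whisper-ctranslate2'
}

// Builds the command line for a request without running it
function buildCommand (engine: EngineName, request: TranscribeRequest): { command: string, args: string[] } {
  return {
    command: commands[engine],
    args: [
      request.mediaFilePath,
      '--model', request.model,
      '--output_format', request.format,
      '--output_dir', request.outputDirectory
    ]
  }
}
```

Keeping the request shape independent from the engine is what would let a caller switch backends by changing a single name.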

Requirements

Python 3
pip

And at least one of the following transcription backends:

Python:
- openai-whisper
- whisper-ctranslate2>=0.4.3
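
The minimum version constraint above can be checked before using a backend. The following is a minimal sketch, assuming plain dotted numeric versions such as `0.4.3`. `isAtLeast` is a hypothetical helper, not part of this package, and the installed version string is assumed to have been read elsewhere (for example from the `Version:` line printed by `pip show whisper-ctranslate2`).

```typescript
// Hypothetical helper: compares dotted numeric versions such as '0.4.3'.
// It ignores pre-release and local version suffixes, so it is not a full PEP 440 implementation.
function isAtLeast (installed: string, minimum: string): boolean {
  const a = installed.split('.').map(part => parseInt(part, 10) || 0)
  const b = minimum.split('.').map(part => parseInt(part, 10) || 0)
  const length = Math.max(a.length, b.length)

  for (let i = 0; i < length; i++) {
    const left = a[i] ?? 0
    const right = b[i] ?? 0
    // The first differing component decides the comparison
    if (left !== right) return left > right
  }

  return true // equal versions satisfy '>='
}
```

Components are compared as numbers rather than strings, so `0.10.0` is correctly treated as newer than `0.4.3`.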

Usage

Create a transcriber manually:

import { OpenaiTranscriber } from '@peertube/peertube-transcription'

(async () => {
  // Optional: set this to use a local installation of the transcription engines
  const binDirectory = 'local/pip/path/bin'

  // Create a transcriber powered by OpenAI Whisper CLI
  const transcriber = new OpenaiTranscriber({
    name: 'openai-whisper',
    command: 'whisper',
    languageDetection: true,
    binDirectory
  });

  // If not installed globally, install the transcription engine (uses pip under the hood)
  await transcriber.install('local/pip/path')

  // Transcribe
  const transcriptFile = await transcriber.transcribe({
    mediaFilePath: './myVideo.mp4',
    model: 'tiny',
    format: 'txt'
  });

  console.log(transcriptFile.path);
  console.log(await transcriptFile.read());
})();

Using a local model file:

import { WhisperBuiltinModel } from '@peertube/peertube-transcription/dist'

const transcriptFile = await transcriber.transcribe({
  mediaFilePath: './myVideo.mp4',
  model: await WhisperBuiltinModel.fromPath('./models/large.pt'),
  format: 'txt'
});

You may use the built-in factory if you’re happy with the default configuration:

import { transcriberFactory } from '@peertube/peertube-transcription'

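// transcriberName and compatibleWinstonLogger are placeholders to define in your own code
const transcriber = transcriberFactory.createFromEngineName({
  engineName: transcriberName, // for example 'openai-whisper'
  logger: compatibleWinstonLogger,
  transcriptDirectory: '/tmp/transcription'
})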

For further usage examples, see ../tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts

Lexicon

ONNX: Open Neural Network eXchange. A specification for representing models; the ONNX Runtime runs models stored in this format.
GPTs: Generative Pre-Trained Transformers
LLM: Large Language Model
NLP: Natural Language Processing
MLP: Multilayer Perceptron
ASR: Automatic Speech Recognition
WER: Word Error Rate
CER: Character Error Rate
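
WER and CER are both an edit distance divided by the length of the reference: the number of substitutions, deletions and insertions needed to turn the reference into the transcript, counted over words for WER and over characters for CER. A minimal sketch follows, assuming a non-empty reference and whitespace-separated words with no text normalization. The function names are illustrative and not part of this package.

```typescript
// Levenshtein distance between two token sequences
// (substitutions, deletions and insertions all cost 1)
function editDistance (reference: string[], hypothesis: string[]): number {
  let previous = Array.from({ length: hypothesis.length + 1 }, (_, j) => j)

  for (let i = 1; i <= reference.length; i++) {
    const current = [ i ]

    for (let j = 1; j <= hypothesis.length; j++) {
      const cost = reference[i - 1] === hypothesis[j - 1] ? 0 : 1

      current[j] = Math.min(
        previous[j] + 1, // deletion
        current[j - 1] + 1, // insertion
        previous[j - 1] + cost // substitution or match
      )
    }

    previous = current
  }

  return previous[hypothesis.length]
}

// Assumes the reference contains at least one word
function wordErrorRate (reference: string, hypothesis: string): number {
  const referenceWords = reference.trim().split(/\s+/)
  const hypothesisWords = hypothesis.trim().split(/\s+/)

  return editDistance(referenceWords, hypothesisWords) / referenceWords.length
}

// Assumes the reference contains at least one character
function characterErrorRate (reference: string, hypothesis: string): number {
  const referenceChars = Array.from(reference)

  return editDistance(referenceChars, Array.from(hypothesis)) / referenceChars.length
}
```

For example, `wordErrorRate('the cat sat on the mat', 'the cat sit on mat')` counts one substitution and one deletion over six reference words, so it returns 2/6. Both rates can exceed 1 when the transcript contains many insertions.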