Audio operations#

This page lists all components related to audio processing.

Note

For more details about all sub-packages, refer to medkit.audio.

Pre-processing operations#

This section describes how to use the audio pre-processing modules.

Note

For more details about public APIs, refer to medkit.audio.preprocessing.

Downmixer#

For more details, refer to medkit.audio.preprocessing.downmixer.
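As a conceptual illustration only, downmixing collapses a multichannel signal into a single channel. The sketch below averages the channels sample by sample in pure Python. The function name `downmix_to_mono` is hypothetical and is not part of medkit, and averaging is an assumption: this page does not state whether the Downmixer averages or sums the channels.

```python
from typing import List, Sequence


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equally sized channels sample by sample.

    Hypothetical helper for illustration, not medkit's implementation.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    nb_samples = len(channels[0])
    if any(len(channel) != nb_samples for channel in channels):
        raise ValueError("all channels must have the same number of samples")
    nb_channels = len(channels)
    # zip(*channels) yields one tuple per time step, with one value per channel
    return [sum(samples) / nb_channels for samples in zip(*channels)]


# Two channels of three samples each
stereo = [[0.5, -0.5, 1.0], [0.5, 0.5, 0.0]]
mono = downmix_to_mono(stereo)
```

Averaging rather than summing keeps the result within the amplitude range of the input channels, which avoids clipping.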

Power normalizer#

For more details, refer to medkit.audio.preprocessing.power_normalizer.
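As a conceptual illustration only, power normalization rescales a signal so that its power reaches a chosen target. The sketch below scales a mono signal to a target root-mean-square (RMS) value in pure Python. The function name `normalize_power` and the `target_rms` parameter are hypothetical, and the exact definition of power used by the medkit operation may differ.

```python
import math
from typing import List, Sequence


def normalize_power(samples: Sequence[float], target_rms: float = 1.0) -> List[float]:
    """Scale samples so that their RMS equals target_rms.

    Hypothetical helper for illustration, not medkit's implementation.
    """
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        # A silent signal cannot be rescaled, so return it unchanged
        return list(samples)
    gain = target_rms / rms
    return [s * gain for s in samples]


quiet = [0.1, -0.1, 0.1, -0.1]
normalized = normalize_power(quiet, target_rms=1.0)
```

Applying a single gain to every sample changes the loudness but preserves the shape of the waveform.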

Resampler#

Important

Resampler needs additional dependencies that can be installed with pip install medkit-lib[resampler]

For more details, refer to medkit.audio.preprocessing.resampler.

Segmentation operations#

This section lists audio segmentation operations. They are part of the medkit.audio.segmentation module.

WebRTC voice detector#

For more details, refer to medkit.audio.segmentation.webrtc_voice_detector.

Pyannote speaker detector#

Important

PASpeakerDetector is an experimental feature. It depends on a version of pyannote-audio that has not yet been released on PyPI.

To install it, you may use the JSALT2023 tag:

pip install https://github.com/pyannote/pyannote-audio/archive/refs/tags/JSALT2023.tar.gz

For more details, refer to medkit.audio.segmentation.pa_speaker_detector.

Audio Transcription#

This section lists the operations and other components used to perform audio transcription. They are part of the medkit.audio.transcription module.

DocTranscriber is the operation that transforms AudioDocument instances into TranscribedTextDocument instances (a subclass of TextDocument).

The actual conversion from audio to text is delegated to an operation complying with the TranscriptionOperation protocol. HFTranscriber and SBTranscriber are implementations of TranscriptionOperation that make it possible to use HuggingFace transformer models and speechbrain models, respectively.
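The delegation pattern described above can be sketched with a structural protocol. This is a minimal illustration in plain Python, not medkit's actual API: the names `AudioClip`, `Transcriber`, `DummyTranscriber` and `DocumentTranscriber`, as well as their method signatures, are hypothetical stand-ins. Refer to the API pages for the real classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class AudioClip:
    """Hypothetical stand-in for an audio document."""

    name: str
    duration_s: float


class Transcriber(Protocol):
    """Hypothetical protocol: anything with a matching transcribe() complies."""

    def transcribe(self, clips: List[AudioClip]) -> List[str]: ...


class DummyTranscriber:
    """Toy implementation returning placeholder text instead of running a model."""

    def transcribe(self, clips: List[AudioClip]) -> List[str]:
        return [f"<{clip.name}: {clip.duration_s:.1f}s of speech>" for clip in clips]


class DocumentTranscriber:
    """Handles the documents and delegates audio-to-text conversion."""

    def __init__(self, transcriber: Transcriber) -> None:
        self.transcriber = transcriber

    def run(self, clips: List[AudioClip]) -> Dict[str, str]:
        texts = self.transcriber.transcribe(clips)
        # Pair each clip name with the text produced for that clip
        return {clip.name: text for clip, text in zip(clips, texts)}


doc_transcriber = DocumentTranscriber(transcriber=DummyTranscriber())
result = doc_transcriber.run([AudioClip(name="a", duration_s=1.5)])
```

Because the protocol is structural, a transcriber backed by a different model can be swapped in without changing the document-level operation.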

DocTranscriber#

For more details, refer to medkit.audio.transcription.doc_transcriber.

TranscribedTextDocument#

For more details, refer to medkit.audio.transcription.transcribed_text_document.

HFTranscriber#

Important

HFTranscriber needs additional dependencies that can be installed with pip install medkit-lib[hf-transcriber]

For more details, refer to medkit.audio.transcription.hf_transcriber.

SBTranscriber#

Important

SBTranscriber needs additional dependencies that can be installed with pip install medkit-lib[sb-transcriber]

For more details, refer to medkit.audio.transcription.sb_transcriber.

Metrics#

The module medkit.audio.metrics provides components to evaluate audio annotations.
