Audio operations#
This page lists all components related to audio processing.
Note
For more details about all sub-packages, refer to medkit.audio.
Pre-processing operations#
This section describes the pre-processing operations available for audio.
Note
For more details about public APIs, refer to medkit.audio.preprocessing.
Downmixer#
For more details, refer to medkit.audio.preprocessing.downmixer.
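As a rough illustration of what downmixing means in general (and not of how medkit implements it), the sketch below collapses several channels into a single mono channel in pure Python. The function name `downmix_to_mono` and its `average` flag are hypothetical and chosen for this example only; refer to the API page above for the actual operation and its parameters.

```python
from typing import List


def downmix_to_mono(channels: List[List[float]], average: bool = True) -> List[float]:
    """Collapse equal-length channels into one mono channel (hypothetical helper).

    Samples are summed across channels. When `average` is True, the sum is
    divided by the number of channels so the result stays within the
    amplitude range of the inputs.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    nb_samples = len(channels[0])
    if any(len(channel) != nb_samples for channel in channels):
        raise ValueError("all channels must have the same number of samples")

    # zip(*channels) yields one tuple per sample position, across channels
    mono = [sum(samples) for samples in zip(*channels)]
    if average:
        mono = [sample / len(channels) for sample in mono]
    return mono


stereo = [[0.5, -0.5, 1.0], [0.5, 0.5, -1.0]]
print(downmix_to_mono(stereo))  # [0.5, 0.0, 0.0]
```

Averaging rather than plain summing is a common way to keep the downmixed signal from exceeding the original amplitude range.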
Power normalizer#
For more details, refer to medkit.audio.preprocessing.power_normalizer.
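Power normalization generally means rescaling a signal so that its root-mean-square (RMS) power reaches a target value. The pure-Python sketch below illustrates that idea only; `normalize_power` and `target_rms` are hypothetical names for this example, and the sketch makes no claim about medkit's actual implementation or parameters.

```python
import math
from typing import List


def rms(signal: List[float]) -> float:
    """Root-mean-square value of a signal (0.0 for an empty signal)."""
    if not signal:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in signal) / len(signal))


def normalize_power(signal: List[float], target_rms: float = 1.0) -> List[float]:
    """Scale `signal` so that its RMS equals `target_rms` (hypothetical helper)."""
    current = rms(signal)
    if current == 0.0:
        # A silent (or empty) signal cannot be rescaled; return a copy unchanged
        return list(signal)
    gain = target_rms / current
    return [sample * gain for sample in signal]


quiet = [0.1, -0.1, 0.1, -0.1]
louder = normalize_power(quiet, target_rms=0.5)
print(math.isclose(rms(louder), 0.5))  # True
```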
Resampler#
Important
Resampler needs additional dependencies, which can be installed with:
pip install medkit-lib[resampler]
For more details, refer to medkit.audio.preprocessing.resampler.
Segmentation operations#
This section lists audio segmentation operations. They are part of the medkit.audio.segmentation module.
WebRTC voice detector#
For more details, refer to medkit.audio.segmentation.webrtc_voice_detector.
Pyannote speaker detector#
Important
PASpeakerDetector is an experimental feature.
It depends on a version of pyannote-audio that has not yet been released on PyPI.
To install it, you may use the JSALT2023 tag:
pip install https://github.com/pyannote/pyannote-audio/archive/refs/tags/JSALT2023.tar.gz
For more details, refer to medkit.audio.segmentation.pa_speaker_detector.
Audio Transcription#
This section lists the operations and other components used to perform audio transcription. They are part of the medkit.audio.transcription module.
DocTranscriber is the operation that transforms AudioDocument instances into TranscribedTextDocument instances (a subclass of TextDocument).
The actual conversion from audio to text is delegated to an operation complying with the TranscriptionOperation protocol. HFTranscriber and SBTranscriber are implementations of TranscriptionOperation that rely on HuggingFace transformer models and speechbrain models, respectively.
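The delegation pattern described above can be sketched in pure Python. Everything in this block is a hypothetical stand-in written for illustration (`AudioSegment`, `SpeechToText`, `FakeTranscriber`, `DocLevelTranscriber`); none of it is medkit's API, and the real class signatures are documented in the module pages referenced in this section.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class AudioSegment:
    """Hypothetical stand-in for one audio segment of a document."""
    audio_id: str


class SpeechToText(Protocol):
    """Hypothetical protocol: any object with a matching run() complies."""

    def run(self, segments: List[AudioSegment]) -> List[str]:
        ...


class FakeTranscriber:
    """Toy backend returning canned text instead of calling a speech model."""

    def __init__(self, canned: Dict[str, str]) -> None:
        self.canned = canned

    def run(self, segments: List[AudioSegment]) -> List[str]:
        return [self.canned.get(segment.audio_id, "") for segment in segments]


class DocLevelTranscriber:
    """Builds a document-level text and delegates speech-to-text to a backend."""

    def __init__(self, backend: SpeechToText) -> None:
        self.backend = backend

    def run(self, segments: List[AudioSegment]) -> str:
        texts = self.backend.run(segments)
        # One line of text per transcribed segment
        return "\n".join(texts)


backend = FakeTranscriber({"seg1": "hello", "seg2": "world"})
doc_transcriber = DocLevelTranscriber(backend)
text = doc_transcriber.run([AudioSegment("seg1"), AudioSegment("seg2")])
print(text.splitlines())  # ['hello', 'world']
```

Because the document-level class depends only on the protocol, a backend based on a different speech model can be swapped in without changing the document-level logic, which is the role the text above assigns to HFTranscriber and SBTranscriber.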
DocTranscriber#
For more details, refer to medkit.audio.transcription.doc_transcriber.
TranscribedTextDocument#
For more details, refer to medkit.audio.transcription.transcribed_text_document.
HFTranscriber#
Important
HFTranscriber needs additional dependencies, which can be installed with:
pip install medkit-lib[hf-transcriber]
For more details, refer to medkit.audio.transcription.hf_transcriber.
SBTranscriber#
Important
SBTranscriber needs additional dependencies, which can be installed with:
pip install medkit-lib[sb-transcriber]
For more details, refer to medkit.audio.transcription.sb_transcriber.
Metrics#
The medkit.audio.metrics module provides components to evaluate audio annotations.