medkit.audio.segmentation.webrtc_voice_detector
This module requires extra dependencies that are not installed with the core dependencies of medkit. To install them, run pip install medkit-lib[webrtc-voice-detector].
Classes:
WebRTCVoiceDetector – Voice Activity Detection operation relying on the webrtcvad package.
- class WebRTCVoiceDetector(output_label, aggressiveness=2, frame_duration=30, nb_frames_in_window=10, switch_ratio=0.9, uid=None)
Voice Activity Detection operation relying on the webrtcvad package.
Per-frame VAD results of webrtcvad are aggregated with a switch algorithm considering the percentage of speech/non-speech frames in a wider sliding window.
Input segments must be mono, with a sample rate of 8 kHz, 16 kHz, 32 kHz or 48 kHz.
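The aggregation described above can be illustrated with a minimal sketch. This is not medkit's actual implementation: the function name aggregate_frames, the choice of segment boundaries (the start of the window that triggers each switch) and the clearing of the window after a switch are all assumptions made for illustration. Only the parameter names nb_frames_in_window and switch_ratio come from the class signature.

```python
from collections import deque
from typing import Iterable, List, Tuple


def aggregate_frames(
    frame_is_speech: Iterable[bool],
    nb_frames_in_window: int = 10,
    switch_ratio: float = 0.9,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: turn per-frame VAD flags into speech runs.

    Returns (start_frame, end_frame) pairs, with end_frame exclusive.
    """
    window = deque(maxlen=nb_frames_in_window)  # most recent per-frame flags
    threshold = switch_ratio * nb_frames_in_window
    in_speech = False
    start = 0
    nb_frames = 0
    segments = []

    for index, is_speech in enumerate(frame_is_speech):
        nb_frames = index + 1
        window.append(is_speech)
        if len(window) < nb_frames_in_window:
            continue  # wait until the window is full before deciding

        window_start = index - nb_frames_in_window + 1
        nb_speech = sum(window)
        nb_non_speech = nb_frames_in_window - nb_speech

        if not in_speech and nb_speech >= threshold:
            # Enough speech frames in the window: switch to the speech state.
            in_speech = True
            start = window_start
            window.clear()
        elif in_speech and nb_non_speech >= threshold:
            # Enough non-speech frames in the window: close the current run.
            in_speech = False
            segments.append((start, window_start))
            window.clear()

    if in_speech:
        # The input ended while still in the speech state.
        segments.append((start, nb_frames))
    return segments
```

Frame indices can be converted to times by multiplying them by the frame duration. Because the decision is taken on a whole window, isolated misclassified frames do not split or create segments.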
- Parameters
output_label (str) – Label of output speech segments.
aggressiveness (Literal[0, 1, 2, 3]) – Aggressiveness param passed to webrtcvad (the higher, the more likely to detect speech).
frame_duration (Literal[10, 20, 30]) – Duration in milliseconds of frames passed to webrtcvad.
nb_frames_in_window (int) – Number of frames in the sliding window used when aggregating per-frame VAD results.
switch_ratio (float) – Percentage of speech/non-speech frames required to switch the window speech state when aggregating per-frame VAD results.
uid (str) – Identifier of the detector.
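As a rough illustration of how these settings relate to each other, the sketch below checks a configuration against the documented constraints and derives the number of samples per frame and the sliding window duration. The helper check_settings and the constant names are hypothetical and not part of medkit; the allowed values are taken from the parameter types and the input requirements stated above.

```python
# Allowed values, taken from the documented constraints.
SUPPORTED_SAMPLE_RATES = (8000, 16000, 32000, 48000)  # Hz
SUPPORTED_FRAME_DURATIONS = (10, 20, 30)  # ms
SUPPORTED_AGGRESSIVENESS = (0, 1, 2, 3)


def check_settings(
    sample_rate: int,
    aggressiveness: int = 2,
    frame_duration: int = 30,
    nb_frames_in_window: int = 10,
):
    """Hypothetical helper: validate settings and derive frame/window sizes.

    Returns (nb_samples_per_frame, window_duration_in_ms).
    """
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"Unsupported sample rate: {sample_rate} Hz")
    if aggressiveness not in SUPPORTED_AGGRESSIVENESS:
        raise ValueError(f"Unsupported aggressiveness: {aggressiveness}")
    if frame_duration not in SUPPORTED_FRAME_DURATIONS:
        raise ValueError(f"Unsupported frame duration: {frame_duration} ms")
    if nb_frames_in_window < 1:
        raise ValueError("nb_frames_in_window must be at least 1")

    # Number of audio samples in one frame handed to the per-frame detector.
    nb_samples_per_frame = sample_rate * frame_duration // 1000
    # Total duration covered by the sliding window.
    window_duration = frame_duration * nb_frames_in_window
    return nb_samples_per_frame, window_duration
```

With the default values and 16 kHz audio, each frame holds 480 samples and the sliding window spans 300 ms, which gives an idea of the shortest speech or silence the aggregation can react to.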
Methods:
run(segments) – Return all speech segments detected for all input segments.
set_prov_tracer(prov_tracer) – Enable provenance tracing.
Attributes:
description – Contains all the operation init parameters.
- property description: medkit.core.operation_desc.OperationDescription
Contains all the operation init parameters.
- Return type
OperationDescription
- set_prov_tracer(prov_tracer)
Enable provenance tracing.
- Parameters
prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.