medkit.audio.metrics.transcription

This module needs extra dependencies that are not installed with the core dependencies of medkit. To install them, use `pip install medkit-lib[metrics-transcription]`.

Classes:

`TranscriptionEvaluator`([speech_label, ...])	Word Error Rate (WER) and Character Error Rate (CER) computation based on speechbrain.
`TranscriptionEvaluatorResult`(wer, ...)	Results returned by `TranscriptionEvaluator`

class TranscriptionEvaluator(speech_label='speech', transcription_label='transcription', case_sensitive=False, remove_punctuation=True, replace_unicode=False)

Word Error Rate (WER) and Character Error Rate (CER) computation based on speechbrain.

The WER is the ratio of prediction errors at the word level, taking into account:

words present in the reference transcription but missing from the prediction;
extra predicted words not present in the reference;
reference words mistakenly replaced by other words in the prediction.

The CER is identical to the WER but computed at the character level rather than at the word level.
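The definitions above can be illustrated with a minimal, self-contained sketch. It is not the speechbrain-based implementation used by this module: `edit_distance` and `error_rate` are hypothetical helpers, and the example assumes a non-empty reference and whitespace tokenization.

```python
def edit_distance(ref, hyp):
    """Minimum number of insertions, deletions and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion: reference token missing from prediction
                curr[j - 1] + 1,          # insertion: extra predicted token
                prev[j - 1] + (r != h),   # substitution, or match when tokens are equal
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Number of errors divided by the number of reference tokens (assumed non-empty)."""
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


reference = "the patient has a fever"
predicted = "the patient had fever"

# Word level: one substitution ("has" -> "had") and one deletion ("a") over 5 words
wer = error_rate(reference.split(), predicted.split())

# Character level: same computation on characters, ignoring whitespace
cer = error_rate(reference.replace(" ", ""), predicted.replace(" ", ""))
```

Here `wer` is 2/5 and `cer` is 2/19: the same two errors weigh far less at the character level because the reference contains many more characters than words.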

This component expects as input reference documents containing speech segments with reference transcription attributes, as well as corresponding speech segments with predicted transcription attributes.

Parameters

speech_label (str) – Label of the speech segments on the reference documents
transcription_label (str) – Label of the transcription attributes on the reference and predicted speech segments
case_sensitive (bool) – Whether to take case into consideration when comparing reference and prediction
remove_punctuation (bool) – If True, punctuation in reference and predictions is removed before comparing (based on string.punctuation)
replace_unicode (bool) – If True, special unicode characters in reference and predictions are replaced by their closest ASCII characters (when possible) before comparing
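As a rough illustration of what the three normalization parameters imply, the sketch below applies equivalent steps with the standard library only. The `normalize` function is hypothetical, and the library's own implementation may differ (for instance in how it maps unicode characters to ASCII or in the order of the steps).

```python
import string
import unicodedata


def normalize(text, case_sensitive=False, remove_punctuation=True, replace_unicode=False):
    """Hypothetical stand-in for the text normalization applied before comparison."""
    if not case_sensitive:
        # Case is ignored: compare lowercased strings
        text = text.lower()
    if remove_punctuation:
        # Drop every character listed in string.punctuation
        text = text.translate(str.maketrans("", "", string.punctuation))
    if replace_unicode:
        # Assumption: decompose accented characters, then drop whatever is not ASCII
        text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return text


default_result = normalize("Hello, World!")
ascii_result = normalize("Fièvre, élevée!", replace_unicode=True)
untouched = normalize("Hello, World!", case_sensitive=True, remove_punctuation=False)
```

With the defaults, "Hello, World!" and "hello world" compare as equal, so differences in case or punctuation alone do not count as errors.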

compute(reference, predicted)

Compute and return the WER and CER for predicted transcription attributes, against reference annotated documents.

Parameters

reference (Sequence[AudioDocument]) – Reference documents containing speech segments with speech_label as label, each of them containing a transcription attribute with transcription_label as label.
predicted (Sequence[Sequence[Segment]]) – Predicted segments, each containing a transcription attribute with transcription_label as label. This is a list of lists that must have the same length and ordering as reference.

Return type

TranscriptionEvaluatorResult

Returns

TranscriptionEvaluatorResult – Computed metrics

class TranscriptionEvaluatorResult(wer, word_insertions, word_deletions, word_substitutions, word_support, cer, char_insertions, char_deletions, char_substitutions, char_support)

Results returned by TranscriptionEvaluator

Variables

wer (float) – Word Error Rate, combination of word insertions, deletions and substitutions
word_insertions (float) – Ratio of extra words in prediction (over word_support)
word_deletions (float) – Ratio of missing words in prediction (over word_support)
word_substitutions (float) – Ratio of replaced words in prediction (over word_support)
word_support (int) – Total number of words
cer (float) – Character Error Rate, same as wer but at character level
char_insertions (float) – Identical to word_insertions but at character level
char_deletions (float) – Identical to word_deletions but at character level
char_substitutions (float) – Identical to word_substitutions but at character level
char_support (int) – Total number of characters (excluding whitespace, counted after punctuation removal and unicode replacement)
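To show how the ratio fields relate to the support, here is a hedged sketch that counts each error type by backtracking through an edit-distance table. `ErrorBreakdown` and `breakdown` are hypothetical names, the support is assumed to be the number of reference tokens, and the overall rate is assumed to be the sum of the three ratios. The actual values are computed by speechbrain and may be aligned differently.

```python
from dataclasses import dataclass


@dataclass
class ErrorBreakdown:
    """Hypothetical container mirroring the word-level fields described above."""
    rate: float
    insertions: float
    deletions: float
    substitutions: float
    support: int


def breakdown(ref, hyp):
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between the first i reference tokens
    # and the first j predicted tokens
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j] + 1,
                dist[i][j - 1] + 1,
                dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
            )

    # Walk back from the bottom-right corner, counting each kind of error
    ins = dele = sub = 0
    i, j = n, m
    while i > 0 or j > 0:
        differs = i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + differs:
            if differs:
                sub += 1  # reference token replaced by another token
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dele += 1     # reference token missing from the prediction
            i -= 1
        else:
            ins += 1      # extra token in the prediction
            j -= 1

    support = n  # assumption: support counts reference tokens, and is non-zero
    return ErrorBreakdown(
        rate=(ins + dele + sub) / support,
        insertions=ins / support,
        deletions=dele / support,
        substitutions=sub / support,
        support=support,
    )


result = breakdown(
    "no known drug allergies".split(),
    "no known drugs allergies reported".split(),
)
```

In this example there is one substitution ("drug" to "drugs") and one insertion ("reported") over a support of 4 words, so the insertion and substitution ratios are both 0.25 and the overall rate is 0.5.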