medkit.audio.metrics.transcription

This module requires extra dependencies that are not installed with the core medkit package. To install them, run pip install medkit-lib[metrics-transcription].

Classes:

TranscriptionEvaluator([speech_label, ...])

Word Error Rate (WER) and Character Error Rate (CER) computation based on speechbrain.

TranscriptionEvaluatorResult(wer, ...)

Results returned by TranscriptionEvaluator

class TranscriptionEvaluator(speech_label='speech', transcription_label='transcription', case_sensitive=False, remove_punctuation=True, replace_unicode=False)

Word Error Rate (WER) and Character Error Rate (CER) computation based on speechbrain.

The WER is the ratio of prediction errors at the word level, taking into account:

  • words present in the reference transcription but missing from the prediction;

  • extra predicted words not present in the reference;

  • reference words mistakenly replaced by other words in the prediction.

The CER is identical to the WER but computed at the character level rather than at the word level.
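The three error types listed above correspond to the deletions, insertions and substitutions of an edit distance. The following is a minimal pure-Python sketch of that idea, not the speechbrain implementation used by this module; the helper names edit_distance and error_rate are hypothetical, and the reference is assumed to be non-empty.

```python
def edit_distance(ref, hyp):
    """Minimum number of insertions, deletions and substitutions turning ref into hyp."""
    # prev[j] is the distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: r is missing from the prediction
                curr[j - 1] + 1,         # insertion: h is an extra predicted item
                prev[j - 1] + (r != h),  # substitution, or a free match when r == h
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance divided by the number of reference items (assumed non-empty)."""
    return edit_distance(ref, hyp) / len(ref)


reference = "the patient has a fever"
predicted = "the patient had fever"

# WER: compare sequences of words
wer = error_rate(reference.split(), predicted.split())

# CER: compare sequences of characters, ignoring whitespace
cer = error_rate(list(reference.replace(" ", "")), list(predicted.replace(" ", "")))

print(wer)  # 0.4 (one substitution and one deletion over 5 reference words)
print(cer)  # 2 errors over 19 reference characters
```

Here the prediction replaces "has" with "had" and drops "a", so two of the five reference words are wrong. At the character level the same two mistakes are spread over 19 reference characters, which is why the CER is much lower than the WER.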

This component expects as input reference documents containing speech segments with reference transcription attributes, as well as corresponding speech segments with predicted transcription attributes.

Parameters
  • speech_label (str) – Label of the speech segments on the reference documents

  • transcription_label (str) – Label of the transcription attributes on the reference and predicted speech segments

  • case_sensitive (bool) – Whether to take case into consideration when comparing reference and prediction

  • remove_punctuation (bool) – If True, punctuation in reference and predictions is removed before comparing (based on string.punctuation)

  • replace_unicode (bool) – If True, special unicode characters in reference and predictions are replaced by their closest ASCII characters (when possible) before comparing
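To illustrate what the three normalization flags do to a transcription before comparison, here is a rough standard-library sketch. The normalize function is hypothetical and is not medkit's code: in particular, the NFKD-based ASCII folding used for replace_unicode is an assumption, and medkit may rely on a different transliteration method or apply the steps in a different order.

```python
import string
import unicodedata


def normalize(text, case_sensitive=False, remove_punctuation=True, replace_unicode=False):
    """Hypothetical normalization mirroring the evaluator's flags."""
    if not case_sensitive:
        # Ignore case differences between reference and prediction
        text = text.lower()
    if remove_punctuation:
        # Drop every character listed in string.punctuation
        text = text.translate(str.maketrans("", "", string.punctuation))
    if replace_unicode:
        # Assumption: decompose accented characters, then drop anything non-ASCII
        text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return text


print(normalize("Fièvre, élevée!"))                        # fièvre élevée
print(normalize("Fièvre, élevée!", replace_unicode=True))  # fievre elevee
```

With the default flags, "Fièvre, élevée!" and "fièvre élevée" compare as identical, so differences in case and punctuation do not count as transcription errors.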

Methods:

compute(reference, predicted)

Compute and return the WER and CER for predicted transcription attributes, against reference annotated documents.

compute(reference, predicted)

Compute and return the WER and CER for predicted transcription attributes, against reference annotated documents.

Parameters
  • reference (Sequence[AudioDocument]) – Reference documents containing speech segments with speech_label as label, each of them containing a transcription attribute with transcription_label as label.

  • predicted (Sequence[Sequence[Segment]]) – Predicted segments, each containing a transcription attribute with transcription_label as label. This is a list of lists that must have the same length and ordering as reference.

Return type

TranscriptionEvaluatorResult

Returns

TranscriptionEvaluatorResult – Computed metrics

class TranscriptionEvaluatorResult(wer, word_insertions, word_deletions, word_substitutions, word_support, cer, char_insertions, char_deletions, char_substitutions, char_support)

Results returned by TranscriptionEvaluator

Variables
  • wer (float) – Word Error Rate, combination of word insertions, deletions and substitutions

  • word_insertions (float) – Ratio of extra words in prediction (over word_support)

  • word_deletions (float) – Ratio of missing words in prediction (over word_support)

  • word_substitutions (float) – Ratio of replaced words in prediction (over word_support)

  • word_support (int) – Total number of words

  • cer (float) – Character Error Rate, same as wer but at character level

  • char_insertions (float) – Identical to word_insertions but at character level

  • char_deletions (float) – Identical to word_deletions but at character level

  • char_substitutions (float) – Identical to word_substitutions but at character level

  • char_support (int) – Total number of characters (excluding whitespace, counted after punctuation removal and unicode replacement)
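The word-level fields can be read together as follows. This sketch assumes that wer is the sum of the three ratios and that word_support counts the words of the reference; the description above only says "combination" and "total number of words", so treat both as assumptions. WordErrorSummary and summarize are hypothetical stand-ins, not part of medkit.

```python
from dataclasses import dataclass


@dataclass
class WordErrorSummary:
    """Hypothetical stand-in for the word-level fields of the result."""
    wer: float
    word_insertions: float
    word_deletions: float
    word_substitutions: float
    word_support: int


def summarize(insertions, deletions, substitutions, support):
    """Turn raw error counts into ratios over the support."""
    return WordErrorSummary(
        # Assumption: the WER combines the three error types by summing them
        wer=(insertions + deletions + substitutions) / support,
        word_insertions=insertions / support,
        word_deletions=deletions / support,
        word_substitutions=substitutions / support,
        word_support=support,
    )


summary = summarize(insertions=1, deletions=2, substitutions=3, support=20)
print(summary.wer)  # 6 errors over 20 words
```

Under these assumptions, the breakdown shows which kind of error dominates: in this example substitutions account for half of the total error rate. The character-level fields would follow the same pattern with char_support as the denominator.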