medkit.audio.transcription.doc_transcriber

Classes:

DocTranscriber(input_label, output_label, ...)

Speech-to-text transcriber generating text documents from audio documents.

TranscriptionOperation(*args, **kwargs)

Protocol for operations in charge of the actual speech-to-text transcription, to be used with DocTranscriber.

class DocTranscriber(input_label, output_label, transcription_operation, attrs_to_copy=None, uid=None)

Speech-to-text transcriber generating text documents from audio documents.

For each audio document, all audio segments with a specific label are transcribed into text segments and grouped into a corresponding new text document. The texts of these segments are concatenated to form the full raw text of the new document.

Generated text documents are instances of TranscribedTextDocument (subclass of TextDocument) with additional info such as the identifier of the original audio document and a mapping between audio spans and text spans.
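The concatenation and span bookkeeping described above can be sketched in plain Python. This is only an illustration under stated assumptions, not medkit's implementation: the tuple-based `AudioSpan` and `TextSpan` types, the `build_full_text` helper, the newline joiner and the direction of the mapping (text span to audio span) are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins, not medkit classes:
# an audio span is (start_sec, end_sec), a text span is (start_char, end_char).
AudioSpan = Tuple[float, float]
TextSpan = Tuple[int, int]


def build_full_text(
    transcribed: List[Tuple[AudioSpan, str]], joiner: str = "\n"
) -> Tuple[str, Dict[TextSpan, AudioSpan]]:
    """Concatenate segment texts and record where each one lands in the full text."""
    full_text = ""
    span_mapping: Dict[TextSpan, AudioSpan] = {}
    for audio_span, segment_text in transcribed:
        # Add joining text before every segment except the first
        # (the joiner itself is an assumption of this sketch).
        if full_text:
            full_text += joiner
        start = len(full_text)
        full_text += segment_text
        # Remember which audio span produced this stretch of text.
        span_mapping[(start, len(full_text))] = audio_span
    return full_text, span_mapping


segments = [((0.0, 2.5), "Hello."), ((3.0, 6.0), "How are you?")]
full_text, span_mapping = build_full_text(segments)
```

With such a mapping, any character range of the generated text can be traced back to the portion of audio it was transcribed from.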

The methods create_text_segment() and augment_full_text_for_next_segment() can be overridden to customize how the text segments are created and how they are concatenated to form the full text.

The actual transcription task is delegated to a TranscriptionOperation that must be provided, for instance HFTranscriber (medkit.audio.transcription.hf_transcriber) or SBTranscriber (medkit.audio.transcription.sb_transcriber).

Parameters
  • input_label (str) – Label of audio segments that should be transcribed.

  • output_label (str) – Label of generated text segments.

  • transcription_operation (TranscriptionOperation) – Transcription operation in charge of actually transcribing each audio segment.

  • attrs_to_copy (Optional[List[str]]) – Labels of attributes that should be copied from the original audio segments to the transcribed text segments.

  • uid (Optional[str]) – Identifier of the transcriber.
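To make the roles of input_label, output_label and attrs_to_copy concrete, here is a minimal self-contained sketch. It does not use medkit: the `Seg` dataclass, the `transcribe_labelled` helper and the `transcribe` callback are hypothetical stand-ins that only mimic the behavior described by the parameters above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Seg:
    """Hypothetical stand-in for a segment; not a medkit class."""
    label: str
    attrs: Dict[str, str] = field(default_factory=dict)
    text: str = ""


def transcribe_labelled(
    audio_segs: List[Seg],
    input_label: str,
    output_label: str,
    transcribe: Callable[[Seg], str],
    attrs_to_copy: Optional[List[str]] = None,
) -> List[Seg]:
    """Transcribe only segments labelled input_label and copy selected attributes."""
    text_segs = []
    for audio_seg in audio_segs:
        # Segments with any other label are ignored.
        if audio_seg.label != input_label:
            continue
        text_seg = Seg(label=output_label, text=transcribe(audio_seg))
        # Copy the requested attributes when the audio segment has them.
        for attr_label in attrs_to_copy or []:
            if attr_label in audio_seg.attrs:
                text_seg.attrs[attr_label] = audio_seg.attrs[attr_label]
        text_segs.append(text_seg)
    return text_segs


audio_segs = [Seg("speech", {"speaker": "A"}), Seg("noise"), Seg("speech")]
text_segs = transcribe_labelled(
    audio_segs, "speech", "transcription", lambda seg: "hi", attrs_to_copy=["speaker"]
)
```

Here the segment labelled "noise" is skipped, and only the first "speech" segment carries a "speaker" attribute to copy.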

Methods:

augment_full_text_for_next_segment(...)

Append intermediate joining text to full text before the next segment is concatenated to it.

run(audio_docs)

Return a transcribed text document for each document in audio_docs.

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(audio_docs)

Return a transcribed text document for each document in audio_docs.

Parameters

audio_docs (List[AudioDocument]) – Audio documents to transcribe

Return type

List[TranscribedTextDocument]

Returns

List[TranscribedTextDocument] – Transcribed text documents (one per document in audio_docs)

augment_full_text_for_next_segment(full_text, segment_text, audio_segment)

Append intermediate joining text to full text before the next segment is concatenated to it. Override for custom behavior.

Return type

str
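The override pattern for this hook can be sketched as follows. The `BaseJoiner` class below is a hypothetical stand-in written for this example, not DocTranscriber itself, and its newline default is an assumption of the sketch rather than a documented behavior. Only the hook name and its three parameters are taken from the signature above.

```python
class BaseJoiner:
    """Hypothetical stand-in exposing the same hook; not medkit's DocTranscriber."""

    def augment_full_text_for_next_segment(self, full_text, segment_text, audio_segment):
        # Assumed default for this sketch: separate segments with a newline.
        if full_text:
            full_text += "\n"
        return full_text

    def join(self, segment_texts):
        full_text = ""
        for segment_text in segment_texts:
            # The hook runs before each segment is appended.
            # This sketch has no audio segment object, so None is passed.
            full_text = self.augment_full_text_for_next_segment(
                full_text, segment_text, None
            )
            full_text += segment_text
        return full_text


class SpaceJoiner(BaseJoiner):
    def augment_full_text_for_next_segment(self, full_text, segment_text, audio_segment):
        # Custom behavior: join segments with a single space instead.
        if full_text:
            full_text += " "
        return full_text


default_text = BaseJoiner().join(["Hello.", "How are you?"])
custom_text = SpaceJoiner().join(["Hello.", "How are you?"])
```

Because the hook receives the next segment's text and its audio segment, an override could also vary the joining text per segment, for example based on an attribute of the audio segment.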

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class TranscriptionOperation(*args, **kwargs)

Protocol for operations in charge of the actual speech-to-text transcription, to be used with DocTranscriber.

Attributes:

output_label

Label to use for generated transcription attributes.

Methods:

run(segments)

Add a transcription attribute to each segment with a text value containing the transcribed text.

output_label: str

Label to use for generated transcription attributes.

run(segments)

Add a transcription attribute to each segment with a text value containing the transcribed text.

Parameters

segments (List[Segment]) – List of segments to transcribe
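The shape of a class satisfying this protocol can be sketched without medkit. In the example below, `FakeSegment`, `TranscriptionLike` and `ConstantTranscriber` are hypothetical names introduced for illustration: `TranscriptionLike` is a local mirror of the protocol described above (an output_label attribute and a run(segments) method), and storing attributes in a plain dict is an assumption of this sketch, not how medkit segments hold attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, runtime_checkable


@dataclass
class FakeSegment:
    """Hypothetical stand-in for an audio segment; not medkit's Segment."""
    label: str
    attrs: Dict[str, str] = field(default_factory=dict)


@runtime_checkable
class TranscriptionLike(Protocol):
    """Local mirror of the protocol described above (illustration only)."""
    output_label: str

    def run(self, segments: List[FakeSegment]) -> None: ...


class ConstantTranscriber:
    """Toy operation that 'transcribes' every segment to the same text."""

    def __init__(self, output_label: str = "transcribed_text"):
        self.output_label = output_label

    def run(self, segments: List[FakeSegment]) -> None:
        for segment in segments:
            # Attach the transcription under output_label, in place.
            segment.attrs[self.output_label] = "placeholder transcription"


op = ConstantTranscriber()
segs = [FakeSegment("speech"), FakeSegment("speech")]
op.run(segs)
```

Since the protocol is structural, `ConstantTranscriber` does not need to inherit from anything: having the right attribute and method is enough for it to be accepted wherever the protocol is expected.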