medkit.io.srt

Classes:

SRTInputConverter([turn_segment_label, ...])

Convert .srt files containing transcription information into turn segments with transcription attributes.

SRTOutputConverter([segment_turn_label, ...])

Build .srt files containing transcription information from Segment objects.

class SRTInputConverter(turn_segment_label='turn', transcription_attr_label='transcribed_text', converter_id=None)

Convert .srt files containing transcription information into turn segments with transcription attributes.

For each turn in a .srt file, a Segment will be created, with an associated Attribute holding the transcribed text as value. The segments can be retrieved directly or as part of an AudioDocument instance.

If a ProvTracer is set, provenance information will be added for each segment and each attribute (referencing the input converter as the operation).

Parameters
  • turn_segment_label (str) – Label to use for segments representing turns in the .srt file.

  • transcription_attr_label (str) – Label to use for the segment attributes containing the transcribed text.

  • converter_id (Optional[str]) – Identifier of the converter.
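The mapping described above (one turn segment per .srt entry, carrying its transcribed text as an attribute) can be illustrated with a small, standard-library-only sketch. It is not medkit's implementation: TurnSketch and parse_srt are hypothetical names, and the sketch assumes the usual .srt layout (an index line, a "start --> end" timing line, then one or more text lines, with entries separated by blank lines) and represents each span as start/end times in seconds.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical stand-in for a medkit Segment (NOT the real class):
# a label, a time span in seconds, and attributes keyed by label.
@dataclass
class TurnSketch:
    label: str
    start: float
    end: float
    attrs: Dict[str, str] = field(default_factory=dict)


# Matches a .srt timing line such as "00:00:01,500 --> 00:00:04,250".
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert HH, MM, SS and milliseconds strings to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text: str,
              turn_segment_label: str = "turn",
              transcription_attr_label: str = "transcribed_text") -> List[TurnSketch]:
    """Turn each .srt entry into one labelled turn with a text attribute."""
    turns = []
    # Assumption: entries are separated by blank lines (index, timing, text).
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        match = _TIMING.search(lines[1])
        if match is None:
            continue  # skip malformed entries in this sketch
        g = match.groups()
        start, end = _to_seconds(*g[:4]), _to_seconds(*g[4:])
        # Multi-line subtitles are joined into a single transcription string.
        transcript = " ".join(line.strip() for line in lines[2:])
        turns.append(TurnSketch(turn_segment_label, start, end,
                                {transcription_attr_label: transcript}))
    return turns


SAMPLE = """1
00:00:01,500 --> 00:00:04,250
Hello,
how are you?

2
00:01:02,000 --> 00:01:05,750
Fine, thanks.
"""

for turn in parse_srt(SAMPLE):
    print(turn)
```

Passing different labels to parse_srt mirrors the role of turn_segment_label and transcription_attr_label: they only change how the resulting turns and attributes are named, not how the file is read.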

Attributes:

description

Contains all the input converter init parameters.

Methods:

load(srt_dir[, audio_dir, audio_ext])

Load all .srt files in a directory into a list of AudioDocument objects.

load_doc(srt_file, audio_file)

Load a single .srt file into an AudioDocument containing turn segments with transcription attributes.

load_segments(srt_file, audio_file)

Load a .srt file and return a list of Segment objects corresponding to turns, with transcription attributes.

set_prov_tracer(prov_tracer)

Enable provenance tracing.

property description: medkit.core.operation_desc.OperationDescription

Contains all the input converter init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

load(srt_dir, audio_dir=None, audio_ext='.wav')

Load all .srt files in a directory into a list of AudioDocument objects.

For each .srt file, there must be a corresponding audio file with the same basename, either in the same directory or in a separate audio directory.

Parameters
  • srt_dir (Union[str, Path]) – Directory containing the .srt files.

  • audio_dir (Union[str, Path, None]) – Directory containing the audio files corresponding to the .srt files, if they are not in srt_dir.

  • audio_ext (str) – File extension to use for audio files.

Return type

List[AudioDocument]

Returns

List[AudioDocument] – List of generated documents.
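The basename rule for pairing .srt and audio files can be sketched as follows. pair_srt_with_audio is a hypothetical helper, not part of medkit; it assumes audio files are named "<srt basename><audio_ext>" and, because the behaviour for a missing audio file is not documented here, it simply raises FileNotFoundError.

```python
from pathlib import Path
from typing import List, Optional, Tuple, Union

PathLike = Union[str, Path]


def pair_srt_with_audio(srt_dir: PathLike,
                        audio_dir: Optional[PathLike] = None,
                        audio_ext: str = ".wav") -> List[Tuple[Path, Path]]:
    """Return (srt_file, audio_file) pairs matched by basename.

    Hypothetical helper mirroring the documented rule; not medkit code.
    """
    srt_dir = Path(srt_dir)
    # Without an explicit audio_dir, audio files are looked up next to the .srt files.
    audio_dir = Path(audio_dir) if audio_dir is not None else srt_dir
    pairs = []
    for srt_file in sorted(srt_dir.glob("*.srt")):
        # "interview.srt" is paired with "interview" + audio_ext, e.g. "interview.wav".
        audio_file = audio_dir / (srt_file.stem + audio_ext)
        if not audio_file.exists():
            # Assumption: a missing audio file is treated as an error.
            raise FileNotFoundError(f"No audio file {audio_file} for {srt_file}")
        pairs.append((srt_file, audio_file))
    return pairs
```

Each resulting pair could then be passed to load_doc(srt_file, audio_file) to build one AudioDocument.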

load_doc(srt_file, audio_file)

Load a single .srt file into an AudioDocument containing turn segments with transcription attributes.

Parameters
  • srt_file (Union[str, Path]) – Path to the .srt file.

  • audio_file (Union[str, Path]) – Path to the corresponding audio file.

Return type

AudioDocument

Returns

AudioDocument – Generated document.

load_segments(srt_file, audio_file)

Load a .srt file and return a list of Segment objects corresponding to turns, with transcription attributes.

Parameters
  • srt_file (Union[str, Path]) – Path to the .srt file.

  • audio_file (Union[str, Path]) – Path to the corresponding audio file.

Return type

List[Segment]

Returns

List[Segment] – Turn segments as found in the .srt file, with transcription attributes attached.

class SRTOutputConverter(segment_turn_label='turn', transcription_attr_label='transcribed_text')

Build .srt files containing transcription information from Segment objects.

There must be a segment for each turn, with an associated Attribute holding the transcribed text as value. The segments can be passed directly or as part of AudioDocument instances.

Parameters
  • segment_turn_label (str) – Label of segments representing turns in the audio documents.

  • transcription_attr_label (str) – Label of the segment attributes containing the transcribed text.
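On the output side, each turn segment becomes one numbered .srt entry. The sketch below shows that rendering with plain (start, end, text) tuples standing in for turn segments and their transcription attributes. format_timestamp and build_srt are hypothetical names, not medkit's code; the HH:MM:SS,mmm timestamp layout is the standard .srt one, while rounding to the nearest millisecond is an assumption of the sketch.

```python
from typing import List, Tuple


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an .srt timestamp HH:MM:SS,mmm."""
    # Assumption: times are rounded to the nearest millisecond.
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(turns: List[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) turns as numbered .srt entries.

    Hypothetical stand-in: a real converter would read the span and the
    transcription attribute from each turn segment instead of tuples.
    """
    entries = []
    for index, (start, end, text) in enumerate(turns, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        entries.append(f"{index}\n{timing}\n{text}\n")
    # A blank line separates consecutive entries.
    return "\n".join(entries)


print(build_srt([(1.5, 4.25, "Hello"), (62.0, 65.75, "Fine")]))
```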

Methods:

save(docs, srt_dir[, doc_names])

Save AudioDocument instances as .srt files in a directory.

save_doc(doc, srt_file)

Save a single AudioDocument as a .srt file.

save_segments(segments, srt_file)

Save Segment objects representing turns into a .srt file.

save(docs, srt_dir, doc_names=None)

Save AudioDocument instances as .srt files in a directory.

Parameters
  • docs (List[AudioDocument]) – List of audio documents to save.

  • srt_dir – Directory into which the generated .srt files will be stored.

  • doc_names (Optional[List[str]]) – Optional list of names to use as basenames for the generated .srt files.
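The role of doc_names as basenames can be illustrated with a hypothetical helper that writes one already-rendered .srt text per document. The fallback names (doc_0, doc_1, ...) used when doc_names is omitted, the length check, and the creation of a missing srt_dir are all assumptions of this sketch; the default naming used by save is not documented here.

```python
from pathlib import Path
from typing import List, Optional, Union


def write_srt_files(srt_texts: List[str],
                    srt_dir: Union[str, Path],
                    doc_names: Optional[List[str]] = None) -> List[Path]:
    """Write one .srt file per document into srt_dir.

    Hypothetical helper, not medkit code.
    """
    srt_dir = Path(srt_dir)
    # Assumption: the output directory is created if it does not exist.
    srt_dir.mkdir(parents=True, exist_ok=True)
    if doc_names is None:
        # Assumption: placeholder basenames, NOT medkit's documented default.
        doc_names = [f"doc_{i}" for i in range(len(srt_texts))]
    if len(doc_names) != len(srt_texts):
        raise ValueError("doc_names must have one entry per document")
    paths = []
    for name, text in zip(doc_names, srt_texts):
        # Each name is used as the basename of the generated file.
        path = srt_dir / f"{name}.srt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```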

save_doc(doc, srt_file)

Save a single AudioDocument as a .srt file.

Parameters
  • doc (AudioDocument) – Audio document to save.

  • srt_file (Union[str, Path]) – Path of the generated .srt file.

save_segments(segments, srt_file)

Save Segment objects representing turns into a .srt file.

Parameters
  • segments (List[Segment]) – Turn segments to save.

  • srt_file (Union[str, Path]) – Path of the generated .srt file.