medkit.text.segmentation.section_tokenizer

Classes:

SectionModificationRule(section_name, ...)

SectionTokenizer([section_dict, ...])

Section segmentation annotator based on keyword rules

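class SectionModificationRule(section_name, new_section_name, other_sections, order)

Rule for renaming a section (section_name becomes new_section_name) according to its order relative to other_sections.
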
class SectionTokenizer(section_dict=None, output_label='section', section_rules=(), strip_chars='.;,?! \n\r\t', uid=None)

Section segmentation annotator based on keyword rules

Initialize the Section Tokenizer

Parameters
  • section_dict (Optional[Dict[str, List[str]]]) – Dictionary containing the section name as key and the list of equivalent strings (keywords) as value. If None, the content of default_section_definition.yml will be used.

  • output_label (str) – Segment label to use for annotation output.

  • section_rules (Iterable[SectionModificationRule]) – List of rules for modifying a section name according to its order relative to the other sections. If section_dict is None, the content of default_section_definition.yml will be used.

  • strip_chars (str) – Characters (given as a single string) to strip from the beginning of each returned segment.

  • uid (Optional[str]) – Identifier of the tokenizer.
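To make the roles of section_dict and strip_chars concrete, here is a minimal, hypothetical sketch of keyword-based section detection in plain Python. It is not medkit's implementation: the function find_sections is invented for illustration, and it assumes exact case-sensitive matching, that each section starts at a keyword occurrence and runs until the next one, and that text before the first keyword is ignored.

```python
import re
from typing import Dict, List, Tuple


def find_sections(
    text: str,
    section_dict: Dict[str, List[str]],
    strip_chars: str = ".;,?! \n\r\t",
) -> List[Tuple[str, int, int]]:
    """Hypothetical sketch: return (section_name, start, end) character spans."""
    # Collect every keyword occurrence as (start, end, section_name).
    # Matching is exact and case-sensitive (an assumption of this sketch).
    hits = []
    for name, keywords in section_dict.items():
        for keyword in keywords:
            for match in re.finditer(re.escape(keyword), text):
                hits.append((match.start(), match.end(), name))
    hits.sort()

    sections = []
    for i, (start, _, name) in enumerate(hits):
        # A section runs until the next keyword occurrence or the end of the
        # text; any text before the first keyword is ignored here.
        end = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        # Skip leading characters listed in strip_chars.
        while start < end and text[start] in strip_chars:
            start += 1
        # Drop empty spans (e.g. two keywords matching at the same position).
        if start < end:
            sections.append((name, start, end))
    return sections
```

For example, find_sections("History: diabetes. Treatment: insulin.", {"history": ["History"], "treatment": ["Treatment"]}) yields one "history" span and one "treatment" span. The real tokenizer may resolve overlapping keywords and leading text differently.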

Methods:

load_section_definition(filepath[, encoding])

Load the section definition stored in a YAML file

run(segments)

Return sections detected in segments.

save_section_definition(section_dict, ...[, ...])

Save the section definition to a YAML file

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(segments)

Return sections detected in segments. Each section is a segment with an attached attribute whose label is self.output_label and whose value is the name of the section.

Parameters

segments (List[Segment]) – List of segments in which to look for sections

Return type

List[Segment]

Returns

List[Segment] – Section segments found in segments
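The shape of the returned sections can be illustrated with a small, hypothetical stand-in. The classes Attribute and SectionSegment and the helper to_section_segments below are invented for this sketch and are not medkit's Segment or attribute classes; the sketch only mirrors what the description above states: each section segment is labelled with output_label and carries one attribute whose label is output_label and whose value is the section name.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Attribute:
    # Hypothetical stand-in for an attribute: a label and a value.
    label: str
    value: str


@dataclass
class SectionSegment:
    # Hypothetical stand-in for a returned section segment.
    label: str
    text: str
    attrs: List[Attribute] = field(default_factory=list)


def to_section_segments(
    text: str,
    spans: List[Tuple[str, int, int]],
    output_label: str = "section",
) -> List[SectionSegment]:
    """Wrap (section_name, start, end) spans as labelled section segments."""
    return [
        SectionSegment(
            label=output_label,
            text=text[start:end],
            # The section name is stored as the attribute value.
            attrs=[Attribute(label=output_label, value=name)],
        )
        for name, start, end in spans
    ]
```

Downstream code would then read the section name from the attribute value rather than from the segment text.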

static load_section_definition(filepath, encoding=None)

Load the section definition stored in a YAML file

Parameters
  • filepath (Path) – Path to a YAML file containing the sections (names and equivalent strings) and rules

  • encoding (Optional[str]) – Encoding of the file to open

Return type

Tuple[Dict[str, List[str]], Tuple[SectionModificationRule, …]]

Returns

Tuple[Dict[str, List[str]], Tuple[SectionModificationRule, …]] – Tuple containing:
  • the dictionary where each key is a section name and each value is the list of all equivalent strings;
  • the section modification rules. These rules allow renaming some sections according to their order.
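The exact semantics of the order field are not spelled out on this page. The following hypothetical sketch shows one plausible reading: a rule renames section_name to new_section_name when one of other_sections appears after it ("BEFORE"), before it ("AFTER"), or on both sides ("BEFORE_AND_AFTER"). The order values, the Rule stand-in, and the apply_rules helper are all assumptions made for illustration, not medkit's documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    # Hypothetical stand-in with the same fields as SectionModificationRule.
    section_name: str
    new_section_name: str
    other_sections: List[str]
    order: str  # assumed values: "BEFORE", "AFTER" or "BEFORE_AND_AFTER"


def apply_rules(names: List[str], rules: List[Rule]) -> List[str]:
    """Rename section names according to their order relative to other sections.

    Conditions are evaluated on the original names, so one rule's renaming
    does not affect whether another rule matches.
    """
    result = list(names)
    for rule in rules:
        others = set(rule.other_sections)
        for i, name in enumerate(names):
            if name != rule.section_name:
                continue
            # "BEFORE": one of the other sections appears later on.
            is_before = bool(others & set(names[i + 1:]))
            # "AFTER": one of the other sections appears earlier.
            is_after = bool(others & set(names[:i]))
            if (
                (rule.order == "BEFORE" and is_before)
                or (rule.order == "AFTER" and is_after)
                or (rule.order == "BEFORE_AND_AFTER" and is_before and is_after)
            ):
                result[i] = rule.new_section_name
    return result
```

Under this reading, Rule("exam", "discharge_exam", ["treatment"], "AFTER") turns ["exam", "treatment", "exam"] into ["exam", "treatment", "discharge_exam"].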

static save_section_definition(section_dict, section_rules, filepath, encoding=None)

Save the section definition to a YAML file

Parameters
  • section_dict (Dict[str, List[str]]) – Dictionary containing the section name as key and the list of equivalent strings (keywords) as value (see the content of default_section_definition.yml for an example)

  • section_rules (Iterable[SectionModificationRule]) – List of rules for modifying a section name according to its order relative to the other sections.

  • filepath (Path) – Path to the file to save

  • encoding (Optional[str]) – File encoding. Default: None
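A section definition is a pair (section_dict, section_rules), matching the return type of load_section_definition. The hypothetical sketch below converts such a pair to a plain mapping that a YAML dumper could serialize, and back again. The top-level keys "sections" and "rules" are an assumption about the file layout, and the Rule class is a local stand-in for SectionModificationRule. With PyYAML installed, such a mapping could be written with yaml.safe_dump(mapping, f, allow_unicode=True), but that is not necessarily what medkit does internally.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Rule:
    # Hypothetical stand-in with the same four fields as SectionModificationRule.
    section_name: str
    new_section_name: str
    other_sections: List[str]
    order: str


def definition_to_mapping(
    section_dict: Dict[str, List[str]], section_rules: Iterable[Rule]
) -> dict:
    """Build a plain mapping; the "sections"/"rules" keys are assumed."""
    return {
        "sections": {name: list(kws) for name, kws in section_dict.items()},
        "rules": [asdict(rule) for rule in section_rules],
    }


def mapping_to_definition(mapping: dict) -> Tuple[Dict[str, List[str]], Tuple[Rule, ...]]:
    """Inverse of definition_to_mapping, returning (section_dict, rules)."""
    section_dict = {name: list(kws) for name, kws in mapping["sections"].items()}
    rules = tuple(Rule(**r) for r in mapping.get("rules", []))
    return section_dict, rules
```

Converting a definition to a mapping and back yields an equal (section_dict, rules) pair, which is a quick sanity check for a custom definition before saving it.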

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.