medkit.text.segmentation.syntagma_tokenizer

Classes:

SyntagmaTokenizer([separators, ...])

Syntagma segmentation annotator based on provided separators

class SyntagmaTokenizer(separators=None, output_label='syntagma', strip_chars='.;,?! \n\r\t', attrs_to_copy=None, uid=None)

Syntagma segmentation annotator based on provided separators

Instantiate the syntagma tokenizer

Parameters
  • separators (Tuple[str, ...]) – The tuple of regular expressions corresponding to separators. If None is provided, the rules in “default_syntagma_definition.yml” will be used.

  • output_label (str, Optional) – The output label of the created annotations.

  • strip_chars (str) – The characters to strip from the beginning of each returned segment.

  • attrs_to_copy (Optional[List[str]]) – Labels of the attributes that should be copied from the input segment to the derived segment. Useful, for example, for propagating the section name.

  • uid (str, Optional) – Identifier of the tokenizer
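
The interplay between separators and strip_chars can be illustrated with a minimal standard-library sketch. It is not medkit's implementation: the helper name split_on_separators is hypothetical, and it assumes that each separator match opens a new chunk and that only leading characters are stripped, as the strip_chars description states.

```python
import re
from typing import List, Tuple


def split_on_separators(
    text: str,
    separators: Tuple[str, ...],
    strip_chars: str = ".;,?! \n\r\t",
) -> List[Tuple[int, int, str]]:
    """Hypothetical helper: return (start, end, text) triples for each chunk."""
    # Combine all separator regexes into a single alternation.
    pattern = re.compile("|".join(f"(?:{sep})" for sep in separators))
    # Assumption: a chunk boundary sits at the start of every separator match,
    # so the separator itself belongs to the chunk that follows it.
    boundaries = [match.start() for match in pattern.finditer(text)]
    starts = [0] + boundaries
    ends = boundaries + [len(text)]

    chunks = []
    for start, end in zip(starts, ends):
        chunk = text[start:end]
        # Strip leading characters only, then shift the start offset to match.
        stripped = chunk.lstrip(strip_chars)
        if not stripped:
            continue  # nothing left once the leading characters are removed
        start += len(chunk) - len(stripped)
        chunks.append((start, end, stripped))
    return chunks


chunks = split_on_separators("No fever, but a cough.", (r",", r"\bbut\b"))
for start, end, chunk_text in chunks:
    print(start, end, repr(chunk_text))
```

In this sketch the comma produces a chunk that contains only strippable characters, so it is dropped, and the offsets of the remaining chunks still point into the original text.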

Methods:

load_syntagma_definition(filepath[, encoding])

Load the syntagma definition stored in a YAML file

run(segments)

Return syntagmas detected in segments.

save_syntagma_definition(syntagma_seps, filepath[, encoding])

Save the syntagma definition to a YAML file

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(segments)

Return syntagmas detected in segments.

Parameters

segments (List[Segment]) – List of segments in which to look for syntagmas

Return type

List[Segment]

Returns

List[Segment] – Syntagma segments found in the input segments

static load_syntagma_definition(filepath, encoding=None)

Load the syntagma definition stored in a YAML file

Parameters
  • filepath (Path) – Path to a YAML file containing the syntagma separators

  • encoding (Optional[str]) – Encoding of the file to open

Return type

Tuple[str, …]

Returns

Tuple[str, …] – Tuple containing the separators

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

static save_syntagma_definition(syntagma_seps, filepath, encoding=None)

Save the syntagma definition to a YAML file

Parameters
  • syntagma_seps (Tuple[str, …]) – The tuple of regular expressions corresponding to separators

  • filepath (Path) – The path of the file to save

  • encoding (Optional[str]) – The encoding of the file. Default: None

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.