medkit.text.preprocessing.eds_cleaner

Classes:

EDSCleaner([output_label, keep_endlines, ...])

EDS pre-processing annotation module

class EDSCleaner(output_label='clean_text', keep_endlines=False, handle_parentheses_eds=True, handle_points_eds=True, uid=None)

EDS pre-processing annotation module

This non-destructive module removes and cleans selected points (periods) and newline characters. It preserves span information: the cleaned text is stored in a new text-bound annotation whose spans record how it was modified relative to the input text.
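To illustrate what "non-destructive" means here, the following is a minimal, self-contained sketch, not medkit's implementation. The hypothetical helper replace_tracking_spans builds a new string instead of editing the input, and records for each chunk of the output which range of the original text it came from, so offsets in the cleaned text can be mapped back. The Piece record is a simplified, illustrative stand-in for medkit's span objects.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Piece:
    # Hypothetical, simplified span record (not medkit's own span class):
    # output range [out_start, out_end) came from original range [orig_start, orig_end).
    out_start: int
    out_end: int
    orig_start: int
    orig_end: int
    replaced: bool  # True for replacement text, False for a verbatim copy


def replace_tracking_spans(text: str, pattern: str, repl: str) -> Tuple[str, List[Piece]]:
    """Return a new string where every match of `pattern` is replaced by `repl`,
    plus the pieces linking the new text to the original. `text` is not modified."""
    parts: List[str] = []
    pieces: List[Piece] = []
    out_pos = last = 0

    def emit(chunk: str, orig_start: int, orig_end: int, replaced: bool) -> None:
        nonlocal out_pos
        parts.append(chunk)
        pieces.append(Piece(out_pos, out_pos + len(chunk), orig_start, orig_end, replaced))
        out_pos += len(chunk)

    for m in re.finditer(pattern, text):
        if m.start() > last:
            emit(text[last:m.start()], last, m.start(), False)  # unchanged text
        emit(repl, m.start(), m.end(), True)  # replacement for the matched range
        last = m.end()
    if last < len(text):
        emit(text[last:], last, len(text), False)  # trailing unchanged text
    return "".join(parts), pieces


def to_original(pieces: List[Piece], out_index: int) -> Tuple[int, int]:
    """Map one character of the cleaned text back to a range of the original text."""
    for p in pieces:
        if p.out_start <= out_index < p.out_end:
            if p.replaced:
                return p.orig_start, p.orig_end  # the whole replaced range
            start = p.orig_start + (out_index - p.out_start)
            return start, start + 1
    raise IndexError(out_index)


raw = "Patient vu.\n\n\nPas de fièvre"
clean, pieces = replace_tracking_spans(raw, r"\n{2,}", " ")
print(clean)                    # Patient vu. Pas de fièvre
print(to_original(pieces, 11))  # (11, 14): the space stands for "\n\n\n"
```

The original string stays intact, so any annotation on the cleaned text can still be located in the raw document through the recorded pieces.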

Instantiate the EDS cleaner.

Parameters
  • output_label (str) – The output label of the created annotations.

  • keep_endlines (bool) – If True, replace multiple endlines with .\n (a period followed by a newline). If False (default), replace them with .\s (a period followed by a whitespace).

  • handle_parentheses_eds (bool) – If True (default), modify the text near parentheses or keywords according to predefined rules for French documents. If False, the text near parentheses or keywords is not modified.

  • handle_points_eds (bool) – Whether to modify points (periods) near predefined keywords in French documents. If True (default), points near keywords are modified. If False, they are left unchanged.

  • uid (str) – Identifier of the pre-processing module.
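A rough sketch of what the keep_endlines switch toggles, assuming that "multiple endlines" means runs of two or more newline characters and that a period already ending the line is reused rather than doubled. The helper collapse_newlines is hypothetical: it is not medkit's implementation, and it ignores span tracking as well as the French-specific rules behind handle_parentheses_eds and handle_points_eds.

```python
import re


def collapse_newlines(text: str, keep_endlines: bool = False) -> str:
    # Assumption: "multiple endlines" = two or more consecutive "\n".
    # Assumption: a period already ending the sentence is absorbed, not doubled.
    replacement = ".\n" if keep_endlines else ". "
    return re.sub(r"\.?\n{2,}", replacement, text)


text = "Antécédents\n\nHTA\n\n\nTraitement"
print(collapse_newlines(text))                            # Antécédents. HTA. Traitement
print(repr(collapse_newlines(text, keep_endlines=True)))  # 'Antécédents.\nHTA.\nTraitement'
```

Single newlines are left alone in this sketch; only runs of blank lines are turned into sentence boundaries.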

Methods:

run(segments)

Run the module on a list of segments provided as input and return a new list of segments.

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation's init parameters.

run(segments)

Run the module on a list of segments provided as input and return a new list of segments.

Parameters

segments (List[Segment]) – List of segments to clean.

Return type

List[Segment]

Returns

List[medkit.core.text.Segment] – List of cleaned segments.
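The list-in, list-out contract of run can be sketched with simplified stand-ins. ToySegment and run_sketch are hypothetical names, and the sketch assumes one cleaned segment per input segment, labelled with output_label ("clean_text" by default). The real method also tracks span modifications (and provenance, when a tracer is set), which are omitted here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ToySegment:
    # Hypothetical, simplified stand-in for medkit.core.text.Segment.
    label: str
    text: str


def run_sketch(
    segments: List[ToySegment],
    clean: Callable[[str], str],
    output_label: str = "clean_text",
) -> List[ToySegment]:
    # Build a new segment per input segment; the inputs are never modified.
    return [ToySegment(label=output_label, text=clean(seg.text)) for seg in segments]


inputs = [ToySegment("raw_text", "Bonjour\n\nDocteur")]
outputs = run_sketch(inputs, lambda t: t.replace("\n\n", ". "))
print(outputs[0])                              # ToySegment(label='clean_text', text='Bonjour. Docteur')
print(inputs[0].text == "Bonjour\n\nDocteur")  # True: input left untouched
```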

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation's init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.