medkit.text.preprocessing.eds_cleaner
Classes:
EDSCleaner – EDS pre-processing annotation module
- class EDSCleaner(output_label='clean_text', keep_endlines=False, handle_parentheses_eds=True, handle_points_eds=True, uid=None)
EDS pre-processing annotation module
This module non-destructively removes or cleans selected periods and newline characters. It keeps track of every modification by creating a new text-bound annotation whose spans record how the cleaned text maps back to the input text.
Instantiate the EDS cleaner.
- Parameters
output_label (str) – The output label of the created annotations.
keep_endlines (bool) – If True, replace multiple endlines with .n. If False (default), replace multiple endlines with whitespace (.s).
handle_parentheses_eds (bool) – If True (default), modify the text near parentheses or keywords according to predefined rules for French documents. If False, the text near parentheses or keywords is not modified.
handle_points_eds (bool) – If True (default), modify the periods near predefined keywords for French documents. If False, the periods near keywords are not modified.
uid (str) – Identifier of the pre-processing module.
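The non-destructive cleaning described above can be illustrated with a small standard-library sketch. It is not medkit's implementation: the function name `collapse_newlines` and the tuple-based mapping are hypothetical, chosen only to show how each piece of cleaned text can keep a pointer back to the original character range it came from.

```python
import re
from typing import List, Tuple

# Hypothetical mapping record: (clean_start, clean_end, orig_start, orig_end)
Mapping = Tuple[int, int, int, int]


def collapse_newlines(text: str, replacement: str = " ") -> Tuple[str, List[Mapping]]:
    """Replace each run of newlines and record where every piece came from."""
    pieces: List[str] = []
    mappings: List[Mapping] = []
    clean_pos = 0  # current offset in the cleaned text
    orig_pos = 0   # current offset in the original text

    for match in re.finditer(r"\n+", text):
        # Copy the untouched text that precedes the newline run
        if match.start() > orig_pos:
            chunk = text[orig_pos:match.start()]
            pieces.append(chunk)
            mappings.append((clean_pos, clean_pos + len(chunk), orig_pos, match.start()))
            clean_pos += len(chunk)
        # Emit the replacement, mapped to the whole newline run it stands for
        pieces.append(replacement)
        mappings.append((clean_pos, clean_pos + len(replacement), match.start(), match.end()))
        clean_pos += len(replacement)
        orig_pos = match.end()

    # Copy whatever remains after the last newline run
    if orig_pos < len(text):
        chunk = text[orig_pos:]
        pieces.append(chunk)
        mappings.append((clean_pos, clean_pos + len(chunk), orig_pos, len(text)))

    return "".join(pieces), mappings


clean, mappings = collapse_newlines("Douleur\n\nthoracique")
print(clean)     # Douleur thoracique
print(mappings)  # [(0, 7, 0, 7), (7, 8, 7, 9), (8, 18, 9, 19)]
```

Because the original text is never altered and every replaced range is recorded, an annotation found in the cleaned text can still be located in the source document.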
Methods:
run(segments) – Run the module on a list of segments provided as input and return a new list of segments.
set_prov_tracer(prov_tracer) – Enable provenance tracing.
Attributes:
description – Contains all the operation init parameters.
- run(segments)
Run the module on a list of segments provided as input and return a new list of segments.
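A typical call might look like the sketch below. The helper name `clean_eds_text` is hypothetical, and the use of `TextDocument` and its `raw_segment` attribute from `medkit.core.text` is an assumption, since neither is described on this page. The imports sit inside the function so the snippet can be loaded even where medkit is not installed.

```python
def clean_eds_text(text, keep_endlines=False):
    """Clean raw text with EDSCleaner and return the first resulting segment."""
    # Assumption: TextDocument exposes the full text as `raw_segment`
    from medkit.core.text import TextDocument
    from medkit.text.preprocessing.eds_cleaner import EDSCleaner

    doc = TextDocument(text=text)
    cleaner = EDSCleaner(output_label="clean_text", keep_endlines=keep_endlines)

    # run() takes a list of segments and returns a new list of segments
    clean_segments = cleaner.run([doc.raw_segment])
    return clean_segments[0]
```

Under these assumptions, the returned segment would carry the `clean_text` label, with the cleaned string in its `text` attribute and the span modification information in its `spans` attribute.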
- property description: medkit.core.operation_desc.OperationDescription
Contains all the operation init parameters.
- Return type
OperationDescription
- set_prov_tracer(prov_tracer)
Enable provenance tracing.
- Parameters
prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.