medkit.text.context.negation_detector

Classes:

NegationDetector(output_label[, rules, uid])

Annotator creating negation Attributes with boolean values indicating whether a negation has been found.

NegationDetectorRule(regexp[, ...])

Regexp-based rule to use with NegationDetector

NegationMetadata(_typename[, _fields])

Metadata dict added to negation attributes with True value.

class NegationDetector(output_label, rules=None, uid=None)

Annotator creating negation Attributes with boolean values indicating whether a negation has been found.

Because negation attributes are attached to whole annotations, each input annotation should be “local” enough (i.e. a sentence or a syntagma) rather than a big chunk of text.

To detect negation, the module uses rules that may or may not be unicode-sensitive. When a rule is not unicode-sensitive, the detector tries to convert unicode characters to the closest ASCII characters. However, some characters need to be pre-processed beforehand (e.g., a symbol that should be replaced by “number”). So, if the converted text and the original text have different lengths, the detector falls back on the original unicode text for detection, even if the rule is not unicode-sensitive. In this case, a warning is logged recommending that the data be pre-processed.
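The fallback described above can be sketched as follows. This is a minimal illustration, not medkit's implementation: `to_closest_ascii` and `text_for_matching` are hypothetical names, and the NFKD-based conversion from the standard library is only a stand-in for whatever conversion the detector actually performs.

```python
import logging
import unicodedata

logger = logging.getLogger(__name__)


def to_closest_ascii(text: str) -> str:
    # Hypothetical helper: decompose accented characters (NFKD),
    # then drop every code point that has no ASCII representation.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def text_for_matching(text: str, unicode_sensitive: bool) -> str:
    # Unicode-sensitive rules are always matched on the original text.
    if unicode_sensitive:
        return text
    ascii_text = to_closest_ascii(text)
    # If the conversion changed the length, fall back on the original
    # unicode text and log a warning, as the description above states.
    if len(ascii_text) != len(text):
        logger.warning(
            "ASCII conversion changed the text length; "
            "falling back on unicode text (consider pre-processing)"
        )
        return text
    return ascii_text
```

With this sketch, a text containing only accented letters is matched in its ASCII form, while a text containing a character that cannot be mapped one-to-one is matched as-is.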

Instantiate the negation detector

Parameters
  • output_label (str) – The label of the created attributes

  • rules (Optional[List[NegationDetectorRule]]) – The set of rules to use when detecting negation. If none are provided, the rules in “negation_detector_default_rules.yml” will be used

  • uid (str) – Identifier of the detector

Methods:

check_rules_sanity(rules)

Check consistency of a set of rules

load_rules(path_to_rules[, encoding])

Load all rules stored in a yml file

run(segments)

Add a negation attribute to each segment with a boolean value indicating whether a negation has been found.

save_rules(rules, path_to_rules[, encoding])

Store rules in a yml file

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(segments)

Add a negation attribute to each segment with a boolean value indicating whether a negation has been found.

Negation attributes with a True value have a metadata dict with fields described in NegationMetadata.

Parameters

segments (List[Segment]) – List of segments in which to detect negation

static load_rules(path_to_rules, encoding=None)

Load all rules stored in a yml file

Parameters
  • path_to_rules (Path) – Path to a yml file containing a list of mappings with the same structure as NegationDetectorRule

  • encoding (Optional[str]) – Encoding of the file to open

Return type

List[NegationDetectorRule]

Returns

List[NegationDetectorRule] – List of all the rules in path_to_rules, can be used to init a NegationDetector
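Since the file is described as a list of mappings with the same structure as NegationDetectorRule, turning parsed mappings into rule objects can be sketched as below. `RuleSketch` and `rules_from_mappings` are hypothetical stand-ins, the rule ids and patterns are invented for illustration, and `parsed` is hand-written here in place of the output of a YAML parser.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class RuleSketch:
    # Hypothetical stand-in mirroring the documented fields of NegationDetectorRule
    regexp: str
    exclusion_regexps: List[str] = field(default_factory=list)
    id: Optional[str] = None
    case_sensitive: bool = False
    unicode_sensitive: bool = False


def rules_from_mappings(mappings: List[Dict[str, Any]]) -> List[RuleSketch]:
    # Each mapping's keys are passed as keyword arguments; omitted keys
    # take the defaults shown in the class signature.
    return [RuleSketch(**mapping) for mapping in mappings]


# Assumed shape of a parsed two-rule file (illustrative values only)
parsed = [
    {"id": "neg_no", "regexp": r"\bno\b", "exclusion_regexps": [r"\bno doubt\b"]},
    {"id": "neg_without", "regexp": r"\bwithout\b"},
]
rules = rules_from_mappings(parsed)
```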

static check_rules_sanity(rules)

Check consistency of a set of rules
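The exact checks are not listed here. As a sketch under stated assumptions, two checks suggested by the rule parameters documented on this page would be that rule ids are unique and that rules which are not unicode-sensitive contain only ASCII patterns. `RuleSketch` and `check_rules_sanity_sketch` are hypothetical names, and raising `ValueError` is an arbitrary choice for the illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RuleSketch:
    # Hypothetical stand-in mirroring the documented fields of NegationDetectorRule
    regexp: str
    exclusion_regexps: List[str] = field(default_factory=list)
    id: Optional[str] = None
    case_sensitive: bool = False
    unicode_sensitive: bool = False


def check_rules_sanity_sketch(rules: List[RuleSketch]) -> None:
    # Assumed check 1: ids, when present, must be unique
    ids = [rule.id for rule in rules if rule.id is not None]
    if len(ids) != len(set(ids)):
        raise ValueError("Rule ids must be unique")
    # Assumed check 2: patterns matched on ASCII-converted text must be ASCII
    for rule in rules:
        if rule.unicode_sensitive:
            continue
        patterns = [rule.regexp, *rule.exclusion_regexps]
        if not all(pattern.isascii() for pattern in patterns):
            raise ValueError(
                f"Non-ASCII pattern in a rule that is not unicode-sensitive: {rule.regexp!r}"
            )
```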

static save_rules(rules, path_to_rules, encoding=None)

Store rules in a yml file

Parameters
  • rules (List[NegationDetectorRule]) – The rules to save

  • path_to_rules (Path) – Path to a .yml file that will contain the rules

  • encoding (Optional[str]) – Encoding of the .yml file

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class NegationDetectorRule(regexp, exclusion_regexps=<factory>, id=None, case_sensitive=False, unicode_sensitive=False)

Regexp-based rule to use with NegationDetector

Input text may be converted before detecting rule.

Parameters
  • regexp (str) – The regexp pattern used to match a negation

  • exclusion_regexps (List[str]) – Optional exclusion patterns

  • id (Optional[str]) – Unique identifier of the rule to store in the metadata of the entities

  • case_sensitive (bool) – Whether to consider case when running regexp and exclusion_regexps

  • unicode_sensitive (bool) – If True, rule matches are searched on unicode text. If False, rule matches are searched on the closest ASCII text when possible (cf. NegationDetector), so regexp and exclusion_regexps shall not contain non-ASCII chars because they would never be matched.
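How these fields could interact when a rule is applied can be sketched with the standard `re` module. This is an illustration only: `RuleSketch` and `rule_matches` are hypothetical names, the patterns are invented, and the assumption that any matching exclusion pattern cancels the rule is not confirmed by this page.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RuleSketch:
    # Hypothetical stand-in mirroring the documented fields of NegationDetectorRule
    regexp: str
    exclusion_regexps: List[str] = field(default_factory=list)
    id: Optional[str] = None
    case_sensitive: bool = False
    unicode_sensitive: bool = False


def rule_matches(rule: RuleSketch, text: str) -> bool:
    # case_sensitive applies to both the main pattern and the exclusions
    flags = 0 if rule.case_sensitive else re.IGNORECASE
    if re.search(rule.regexp, text, flags) is None:
        return False
    # Assumed semantics: a match on any exclusion pattern cancels the rule
    return not any(
        re.search(pattern, text, flags) for pattern in rule.exclusion_regexps
    )


rule = RuleSketch(regexp=r"\bno\b", exclusion_regexps=[r"\bno doubt\b"], id="neg_no")
strict_rule = RuleSketch(regexp=r"\bno\b", case_sensitive=True)
```

Here `rule` would match "No fever" but not "There is no doubt about the diagnosis", while `strict_rule` would not match "No fever" because of the capital letter.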

class NegationMetadata(_typename, _fields=None, /, **kwargs)

Metadata dict added to negation attributes with True value.

Parameters

rule_id (Union[str, int]) – Identifier of the rule used to detect a negation. If the rule has no id, then the index of the rule in the list of rules is used instead.
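The fallback from rule identifier to rule index can be sketched as follows. `MetadataSketch` and `metadata_for` are hypothetical names used only for this illustration, and the list of ids stands in for the detector's list of rules.

```python
from typing import List, Optional, TypedDict, Union


class MetadataSketch(TypedDict):
    # Hypothetical stand-in for the documented metadata dict
    rule_id: Union[str, int]


def metadata_for(rule_ids: List[Optional[str]], matched_index: int) -> MetadataSketch:
    # Use the rule's own identifier when it has one,
    # otherwise its position in the list of rules.
    rule_id = rule_ids[matched_index]
    return MetadataSketch(
        rule_id=rule_id if rule_id is not None else matched_index
    )
```

For a list of ids such as `["neg_no", None]`, a match on the first rule yields `{"rule_id": "neg_no"}` and a match on the second yields `{"rule_id": 1}`.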