medkit.text.context

APIs

To access these APIs, use an import like this:

from medkit.text.context import <api_to_import>

Classes:

`FamilyDetector`(output_label[, rules, uid])	Annotator creating family attributes with boolean values indicating if a family reference has been detected.
`FamilyDetectorRule`(regexp[, ...])	Regexp-based rule to use with FamilyDetector
`FamilyMetadata`(_typename[, _fields])	Metadata dict added to family attributes with True value.
`HypothesisDetector`([output_label, rules, ...])	Annotator creating hypothesis Attributes with boolean values indicating if a hypothesis has been found.
`HypothesisDetectorRule`(regexp[, ...])	Regexp-based rule to use with HypothesisDetector
`HypothesisRuleMetadata`(_typename[, _fields])	Metadata dict added to hypothesis attributes with True value detected by a rule
`HypothesisVerbMetadata`(_typename[, _fields])	Metadata dict added to hypothesis attributes with True value detected by a verb.
`NegationDetector`(output_label[, rules, uid])	Annotator creating negation Attributes with boolean values indicating if a negation has been found.
`NegationDetectorRule`(regexp[, ...])	Regexp-based rule to use with NegationDetector
`NegationMetadata`(_typename[, _fields])	Metadata dict added to negation attributes with True value.

class FamilyDetector(output_label, rules=None, uid=None)

Annotator creating family attributes with boolean values indicating if a family reference has been detected.

Because family attributes will be attached to whole annotations, each input annotation should be “local” enough (i.e. a sentence or a syntagma) rather than a big chunk of text.

For detecting family references, the module uses rules that may or may not be sensitive to unicode. When a rule is not unicode-sensitive, unicode chars are converted to the closest ASCII chars. However, some characters need to be pre-processed beforehand (e.g., n° -> number). So, if the converted text length differs from the original one, detection falls back on the initial unicode text even if the rule is not unicode-sensitive. In this case, a warning is logged recommending to pre-process the data.
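The conversion-with-fallback behavior described above can be sketched as follows. This is a minimal illustration, not medkit's code: it uses the standard library's NFKD normalization as a stand-in for whatever closest-ASCII conversion medkit applies (an assumption), and the helper name `to_ascii_or_original` is hypothetical.

```python
import logging
import unicodedata
from typing import Tuple

logger = logging.getLogger(__name__)


def to_ascii_or_original(text: str) -> Tuple[str, bool]:
    """Return (text_to_match, converted).

    Hypothetical helper: approximates the closest-ASCII conversion with NFKD
    decomposition followed by dropping non-ASCII code points.
    """
    # "è" decomposes into "e" + a combining accent; the accent is then dropped
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    if len(ascii_text) != len(text):
        # Lengths differ ("°" has no ASCII decomposition and is dropped), so
        # character offsets would no longer line up: keep the unicode text
        logger.warning(
            "Could not convert %r to ASCII without changing its length; "
            "consider pre-processing the data",
            text,
        )
        return text, False
    return ascii_text, True


print(to_ascii_or_original("mère"))  # ('mere', True)
print(to_ascii_or_original("n° 3"))  # ('n° 3', False)
```

Comparing lengths is a cheap way to detect that the conversion was not one-to-one, which is presumably why the detector logs a warning rather than silently matching on shifted text.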

Note that for better results, family detection should be run at the sentence level (i.e. on sentence segments) rather than at the syntagma level [1].

[1] N. Garcelon, A. Neuraz, V. Benoit, R. Salomon, A. Burgun, “Improving a full-text search engine: the importance of negation detection and family history context to identify cases in a biomedical data warehouse”, Journal of the American Medical Informatics Association, Volume 24, Issue 3, May 2017

Parameters

output_label (str) – The label of the created attributes
rules (Optional[List[FamilyDetectorRule]]) – The set of rules to use when detecting family references. If none are provided, the rules in “family_detector_default_rules.yml” will be used
uid (str) – Identifier of the detector

Methods:

`check_rules_sanity`(rules)	Check consistency of a set of rules
`load_rules`(path_to_rules[, encoding])	Load all rules stored in a yml file
`run`(segments)	Add a family attribute to each segment with a boolean value indicating if a family reference has been detected.
`save_rules`(rules, path_to_rules[, encoding])	Store rules in a yml file
`set_prov_tracer`(prov_tracer)	Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(segments)

Add a family attribute to each segment with a boolean value indicating if a family reference has been detected.

Family attributes with a True value have a metadata dict with fields described in FamilyMetadata.

Parameters: segments (List[Segment]) – List of segments to detect as being family references or not

static load_rules(path_to_rules, encoding=None)

Load all rules stored in a yml file

Parameters

path_to_rules (Path) – Path to a yml file containing a list of mappings with the same structure as FamilyDetectorRule
encoding (Optional[str]) – Encoding of the file to open

Return type

List[FamilyDetectorRule]

Returns

List[FamilyDetectorRule] – List of all the rules in path_to_rules, can be used to init a FamilyDetector

static check_rules_sanity(rules)

Check consistency of a set of rules
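The documentation does not spell out what "consistency" means here. A plausible reading, and an assumption in this sketch, is that rule ids must be either all set or all unset and must be unique, since metadata falls back on the rule index when a rule has no id (see FamilyMetadata). The helper `check_ids_sanity` is hypothetical and only checks ids.

```python
from typing import List, Optional


def check_ids_sanity(rule_ids: List[Optional[str]]) -> None:
    """Hypothetical consistency check on rule ids.

    Assumes "consistent" means: either every rule has an id or none does,
    and ids are unique. Raises ValueError otherwise.
    """
    with_id = [rid for rid in rule_ids if rid is not None]
    # Mixing ids and index fallback would make rule_id values ambiguous
    if with_id and len(with_id) != len(rule_ids):
        raise ValueError("Provide ids for all rules or for none of them")
    if len(set(with_id)) != len(with_id):
        raise ValueError("Rule ids must be unique")


check_ids_sanity(["father", "mother"])  # OK
check_ids_sanity([None, None])  # OK: rules are identified by their index
```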

static save_rules(rules, path_to_rules, encoding=None)

Store rules in a yml file

Parameters

rules (List[FamilyDetectorRule]) – The rules to save
path_to_rules (Path) – Path to a .yml file that will contain the rules
encoding (Optional[str]) – Encoding of the .yml file
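The rules file is described as a list of mappings with the same structure as FamilyDetectorRule. The round trip below sketches that shape with a hypothetical `SketchRule` dataclass mirroring the rule's fields. It swaps YAML for JSON from the standard library so it runs without PyYAML; medkit itself reads and writes .yml files, and these `save_rules`/`load_rules` functions are illustrations, not medkit's implementation.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class SketchRule:
    # Hypothetical stand-in with the same fields as FamilyDetectorRule
    regexp: str
    exclusion_regexps: List[str] = field(default_factory=list)
    id: Optional[str] = None
    case_sensitive: bool = False
    unicode_sensitive: bool = False


def save_rules(rules: List[SketchRule], path: Path, encoding: Optional[str] = None) -> None:
    # One mapping per rule, keyed by the rule's field names
    data = [asdict(rule) for rule in rules]
    path.write_text(json.dumps(data, ensure_ascii=False), encoding=encoding)


def load_rules(path: Path, encoding: Optional[str] = None) -> List[SketchRule]:
    # Each mapping is turned back into a rule object
    return [SketchRule(**item) for item in json.loads(path.read_text(encoding=encoding))]


rules = [SketchRule(regexp=r"\bmere\b", id="mother")]
with tempfile.TemporaryDirectory() as tmp:
    rules_path = Path(tmp) / "rules.json"
    save_rules(rules, rules_path, encoding="utf-8")
    print(load_rules(rules_path, encoding="utf-8") == rules)  # True
```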

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type: OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters: prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class FamilyDetectorRule(regexp, exclusion_regexps=<factory>, id=None, case_sensitive=False, unicode_sensitive=False)

Regexp-based rule to use with FamilyDetector

Input text may be converted to its closest ASCII equivalent before the rule is applied (see unicode_sensitive).

Parameters

regexp (str) – The regexp pattern used to match a family reference
exclusion_regexps (List[str]) – Optional exclusion patterns
id (Optional[str]) – Unique identifier of the rule to store in the metadata of the attributes
case_sensitive (bool) – Whether to consider case when running regexp and exclusion_regexps
unicode_sensitive (bool) – If True, rule matches are searched on unicode text. If False, rule matches are searched on the closest ASCII text when possible (cf. FamilyDetector); in that case, regexp and exclusion_regexps shall not contain non-ASCII chars because they could not be matched against the converted text.
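The way these fields combine can be sketched as a single matching function. This is an illustration under stated assumptions, not medkit's implementation: `SketchRule` and `rule_matches` are hypothetical names, NFKD normalization stands in for the closest-ASCII conversion, and exclusion patterns are checked against the whole segment text, which is an assumption about medkit's behavior.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchRule:
    # Hypothetical stand-in mirroring the FamilyDetectorRule fields
    regexp: str
    exclusion_regexps: List[str] = field(default_factory=list)
    id: Optional[str] = None
    case_sensitive: bool = False
    unicode_sensitive: bool = False


def _ascii(text: str) -> str:
    # Approximation of the closest-ASCII conversion (assumption)
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def rule_matches(rule: SketchRule, text: str) -> bool:
    flags = 0 if rule.case_sensitive else re.IGNORECASE
    if not rule.unicode_sensitive:
        converted = _ascii(text)
        # Keep the original text if the conversion changed its length
        if len(converted) == len(text):
            text = converted
    if re.search(rule.regexp, text, flags) is None:
        return False
    # A match is discarded if any exclusion pattern also matches
    return not any(re.search(p, text, flags) for p in rule.exclusion_regexps)


rule = SketchRule(regexp=r"\bmere\b", exclusion_regexps=[r"\bmere porteuse\b"])
print(rule_matches(rule, "Antécédents chez la Mère"))  # True
print(rule_matches(rule, "mère porteuse"))  # False
```

Note that the patterns are plain ASCII ("mere", not "mère"): with unicode_sensitive=False they run on the converted text, where "è" no longer exists.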

class FamilyMetadata(_typename, _fields=None, /, **kwargs)

Metadata dict added to family attributes with True value.

Parameters: rule_id (Union[str, int]) – Identifier of the rule used to detect a family reference. If the rule has no id, then the index of the rule in the list of rules is used instead.
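The rule_id fallback can be sketched as follows. The function `detect` is hypothetical: rules are simplified to (id, regexp) pairs, the first matching rule is assumed to win, and a False value carries no metadata in this sketch.

```python
import re
from typing import Dict, List, Optional, Tuple, Union


def detect(
    text: str, rules: List[Tuple[Optional[str], str]]
) -> Tuple[bool, Optional[Dict[str, Union[str, int]]]]:
    """Return (value, metadata) for one segment.

    Hypothetical sketch: each rule is an (id, regexp) pair and the first
    matching rule wins (an assumption about rule ordering).
    """
    for index, (rule_id, regexp) in enumerate(rules):
        if re.search(regexp, text, re.IGNORECASE):
            # Fall back on the rule's position when it has no id
            return True, {"rule_id": rule_id if rule_id is not None else index}
    return False, None


rules = [("father", r"\bpere\b"), (None, r"\bmere\b")]
print(detect("Mere diabetique", rules))  # (True, {'rule_id': 1})
print(detect("Patient diabetique", rules))  # (False, None)
```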

clear() → None. Remove all items from D.

copy() → a shallow copy of D

fromkeys(iterable, value=None, /): Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /): Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

pop(k[, d]) → v: Remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None: Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values

class HypothesisDetector(output_label='hypothesis', rules=None, verbs=None, modes_and_tenses=None, max_length=150, uid=None)

Annotator creating hypothesis Attributes with boolean values indicating if a hypothesis has been found.

Hypothesis will be considered present either because of the presence of a certain text pattern in a segment, or because of the usage of a certain verb at a specific mode and tense (for instance, the conditional).

Because hypothesis attributes will be attached to whole segments, each input segment should be “local” enough (i.e. a sentence or a syntagma) rather than a big chunk of text.

Instantiate the hypothesis detector

Parameters

output_label (str) – The label of the created attributes
rules (Optional[List[HypothesisDetectorRule]]) – The set of rules to use when detecting hypothesis. If none are provided, the rules in “hypothesis_detector_default_rules.yml” will be used
verbs (Optional[Dict[str, Dict[str, Dict[str, List[str]]]]]) – Conjugated verb forms, to be used in association with modes_and_tenses. Conjugated forms of a verb at a specific mode and tense must be provided in nested dicts with the 1st key being the verb’s root, the 2nd key the mode and the 3rd key the tense. For instance, verbs[“aller”][“indicatif”][“présent”] would hold the list [“vais”, “vas”, “va”, “allons”, “allez”, “vont”]. When verbs is provided, modes_and_tenses must also be provided. If none are provided, the verbs in “hypothesis_detector_default_verbs.yml” will be used.
modes_and_tenses (Optional[List[Tuple[str, str]]]) – List of tuples of all modes and tenses associated with hypothesis. Will be used to select conjugated forms in verbs that denote hypothesis.
max_length (int) – Maximum number of characters in a hypothesis segment. Segments longer than this will never be considered as a hypothesis.
uid (str) – Identifier of the detector
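The interplay of verbs, modes_and_tenses and max_length can be sketched as below. The functions `hypothesis_forms` and `matched_verb` are hypothetical, and matching on lowercase word tokens is an assumption about how conjugated forms are looked up. The conditional forms of "aller" are added here for illustration only.

```python
import re
from typing import Dict, List, Optional, Tuple

# Nested structure described above: verb root -> mode -> tense -> forms
Verbs = Dict[str, Dict[str, Dict[str, List[str]]]]


def hypothesis_forms(verbs: Verbs, modes_and_tenses: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each conjugated form denoting hypothesis to its verb root."""
    forms: Dict[str, str] = {}
    for root, modes in verbs.items():
        for mode, tense in modes_and_tenses:
            # Only forms at the selected modes and tenses are kept
            for form in modes.get(mode, {}).get(tense, []):
                forms[form] = root
    return forms


def matched_verb(text: str, forms: Dict[str, str], max_length: int = 150) -> Optional[str]:
    # Segments longer than max_length are never considered hypothesis
    if len(text) > max_length:
        return None
    for token in re.findall(r"\w+", text.lower()):
        if token in forms:
            return forms[token]
    return None


verbs: Verbs = {
    "aller": {
        "indicatif": {"présent": ["vais", "vas", "va", "allons", "allez", "vont"]},
        "conditionnel": {"présent": ["irais", "irait", "irions", "iriez", "iraient"]},
    }
}
forms = hypothesis_forms(verbs, [("conditionnel", "présent")])
print(matched_verb("Le patient irait mieux", forms))  # aller
print(matched_verb("Le patient va mieux", forms))  # None
```

Selecting forms once from modes_and_tenses turns each per-segment check into a set of dictionary lookups, which is one reason to precompute them.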

Methods:

`check_rules_sanity`(rules)	Check consistency of a set of rules
`get_example`()	Instantiate an HypothesisDetector with example rules and verbs, designed for usage with EDS documents
`load_rules`(path_to_rules[, encoding])	Load all rules stored in a yml file
`load_verbs`(path_to_verbs[, encoding])	Load all conjugated verb forms stored in a yml file.
`run`(segments)	Add a hypothesis attribute to each segment with a boolean value indicating if a hypothesis has been detected.
`save_rules`(rules, path_to_rules[, encoding])	Store rules in a yml file
`set_prov_tracer`(prov_tracer)	Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(segments)

Add a hypothesis attribute to each segment with a boolean value indicating if a hypothesis has been detected.

Hypothesis attributes with a True value have a metadata dict with fields described in either HypothesisRuleMetadata or HypothesisVerbMetadata.
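Because the two metadata layouts share a "type" key, downstream code can dispatch on it. In this sketch, `RuleMetadata` and `VerbMetadata` are local TypedDict stand-ins for the documented fields of HypothesisRuleMetadata and HypothesisVerbMetadata, and `describe` is a hypothetical consumer.

```python
from typing import Literal, TypedDict, Union


class RuleMetadata(TypedDict):
    # Mirrors the fields documented for HypothesisRuleMetadata
    type: Literal["rule"]
    rule_id: Union[str, int]


class VerbMetadata(TypedDict):
    # Mirrors the fields documented for HypothesisVerbMetadata
    type: Literal["verb"]
    matched_verb: str


def describe(metadata: Union[RuleMetadata, VerbMetadata]) -> str:
    # The "type" key tells which of the two layouts is present
    if metadata["type"] == "rule":
        return f"rule {metadata['rule_id']}"
    return f"verb {metadata['matched_verb']}"


print(describe({"type": "rule", "rule_id": 3}))  # rule 3
print(describe({"type": "verb", "matched_verb": "pouvoir"}))  # verb pouvoir
```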

Parameters: segments (List[Segment]) – List of segments to detect as being hypothesis or not

static load_verbs(path_to_verbs, encoding=None)

Load all conjugated verb forms stored in a yml file. Conjugated verb forms at a specific mode and tense must be stored in nested mappings with the 1st key being the verb root, the 2nd key the mode and the 3rd key the tense.

Parameters

path_to_verbs (Path) – Path to a yml file containing conjugated verb forms, arranged by verb root, mode and tense.
encoding (Optional[str]) – Encoding of the file to open

Return type

Dict[str, Dict[str, Dict[str, List[str]]]]

Returns

Dict[str, Dict[str, Dict[str, List[str]]]] – Conjugated verb forms in path_to_verbs, can be used to init a HypothesisDetector

static load_rules(path_to_rules, encoding=None)

Load all rules stored in a yml file

Parameters

path_to_rules (Path) – Path to a yml file containing a list of mappings with the same structure as HypothesisDetectorRule
encoding (Optional[str]) – Encoding of the file to open

Return type

List[HypothesisDetectorRule]

Returns

List[HypothesisDetectorRule] – List of all the rules in path_to_rules, can be used to init an HypothesisDetector

classmethod get_example()

Instantiate an HypothesisDetector with example rules and verbs, designed for usage with EDS documents

Return type: HypothesisDetector

static check_rules_sanity(rules)

Check consistency of a set of rules

static save_rules(rules, path_to_rules, encoding=None)

Store rules in a yml file

Parameters

rules (List[HypothesisDetectorRule]) – The rules to save
path_to_rules (Path) – Path to a .yml file that will contain the rules
encoding (Optional[str]) – Encoding of the .yml file

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type: OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters: prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class HypothesisDetectorRule(regexp, exclusion_regexps=<factory>, id=None, case_sensitive=False, unicode_sensitive=False)

Regexp-based rule to use with HypothesisDetector

Variables

regexp (str) – The regexp pattern used to match a hypothesis
exclusion_regexps (List[str]) – Optional exclusion patterns
id (Optional[str]) – Unique identifier of the rule to store in the metadata of the attributes
case_sensitive (bool) – Whether to consider case when running regexp and exclusion_regexps
unicode_sensitive (bool) – If False, all non-ASCII chars of the input text are replaced by the closest ASCII chars before running regexp and exclusion_regexps, so these patterns shouldn’t contain non-ASCII chars because they could not be matched. If True, the patterns are run on the original unicode text.

class HypothesisRuleMetadata(_typename, _fields=None, /, **kwargs)

Metadata dict added to hypothesis attributes with True value detected by a rule

Parameters

type (Literal['rule']) – Metadata type, here “rule” (used to differentiate between rule/verb metadata dicts)
rule_id (Union[str, int]) – Identifier of the rule used to detect a hypothesis. If the rule has no id, then the index of the rule in the list of rules is used instead

clear() → None. Remove all items from D.

copy() → a shallow copy of D

fromkeys(iterable, value=None, /): Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /): Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

pop(k[, d]) → v: Remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None: Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values

class HypothesisVerbMetadata(_typename, _fields=None, /, **kwargs)

Metadata dict added to hypothesis attributes with True value detected by a verb.

Parameters

type (Literal['verb']) – Metadata type, here “verb” (used to differentiate between rule/verb metadata dicts).
matched_verb (str) – Root of the verb used to detect a hypothesis.

clear() → None. Remove all items from D.

copy() → a shallow copy of D

fromkeys(iterable, value=None, /): Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /): Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

pop(k[, d]) → v: Remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None: Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values

class NegationDetector(output_label, rules=None, uid=None)

Annotator creating negation Attributes with boolean values indicating if a negation has been found.

Because negation attributes will be attached to whole annotations, each input annotation should be “local” enough (i.e. a sentence or a syntagma) rather than a big chunk of text.
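Why locality matters can be shown with a toy example: a single negation cue in a long chunk flags the whole chunk, while per-sentence segments keep the attribute where it belongs. The cue list and the naive sentence splitter below are hypothetical simplifications, not medkit's default rules or segmenters.

```python
import re
from typing import List


def is_negated(text: str) -> bool:
    # Toy negation cues, for illustration only
    return re.search(r"\b(pas|aucun|absence)\b", text, re.IGNORECASE) is not None


def split_sentences(text: str) -> List[str]:
    # Naive split after ".", "!" or "?" (a real sentence tokenizer is preferable)
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


report = "Pas de fièvre. Douleur thoracique depuis 2 jours."
print(is_negated(report))  # True: the whole report would be flagged as negated
print([(s, is_negated(s)) for s in split_sentences(report)])
# [('Pas de fièvre.', True), ('Douleur thoracique depuis 2 jours.', False)]
```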

For detecting negation, the module uses rules that may or may not be sensitive to unicode. When a rule is not unicode-sensitive, unicode chars are converted to the closest ASCII chars. However, some characters need to be pre-processed beforehand (e.g., n° -> number). So, if the converted text length differs from the original one, detection falls back on the initial unicode text even if the rule is not unicode-sensitive. In this case, a warning is logged recommending to pre-process the data.

Instantiate the negation detector

Parameters

output_label (str) – The label of the created attributes
rules (Optional[List[NegationDetectorRule]]) – The set of rules to use when detecting negation. If none are provided, the rules in “negation_detector_default_rules.yml” will be used
uid (str) – Identifier of the detector

Methods:

`check_rules_sanity`(rules)	Check consistency of a set of rules
`load_rules`(path_to_rules[, encoding])	Load all rules stored in a yml file
`run`(segments)	Add a negation attribute to each segment with a boolean value indicating if a negation has been found.
`save_rules`(rules, path_to_rules[, encoding])	Store rules in a yml file
`set_prov_tracer`(prov_tracer)	Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(segments)

Add a negation attribute to each segment with a boolean value indicating if a negation has been found.

Negation attributes with a True value have a metadata dict with fields described in NegationMetadata.

Parameters: segments (List[Segment]) – List of segments to detect as being negated or not

static load_rules(path_to_rules, encoding=None)

Load all rules stored in a yml file

Parameters

path_to_rules (Path) – Path to a yml file containing a list of mappings with the same structure as NegationDetectorRule
encoding (Optional[str]) – Encoding of the file to open

Return type

List[NegationDetectorRule]

Returns

List[NegationDetectorRule] – List of all the rules in path_to_rules, can be used to init a NegationDetector

static check_rules_sanity(rules)

Check consistency of a set of rules

static save_rules(rules, path_to_rules, encoding=None)

Store rules in a yml file

Parameters

rules (List[NegationDetectorRule]) – The rules to save
path_to_rules (Path) – Path to a .yml file that will contain the rules
encoding (Optional[str]) – Encoding of the .yml file

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type: OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters: prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class NegationDetectorRule(regexp, exclusion_regexps=<factory>, id=None, case_sensitive=False, unicode_sensitive=False)

Regexp-based rule to use with NegationDetector

Input text may be converted to its closest ASCII equivalent before the rule is applied (see unicode_sensitive).

Parameters

regexp (str) – The regexp pattern used to match a negation
exclusion_regexps (List[str]) – Optional exclusion patterns
id (Optional[str]) – Unique identifier of the rule to store in the metadata of the attributes
case_sensitive (bool) – Whether to consider case when running regexp and exclusion_regexps
unicode_sensitive (bool) – If True, rule matches are searched on unicode text. If False, rule matches are searched on the closest ASCII text when possible (cf. NegationDetector); in that case, regexp and exclusion_regexps shall not contain non-ASCII chars because they could not be matched against the converted text.

class NegationMetadata(_typename, _fields=None, /, **kwargs)

Metadata dict added to negation attributes with True value.

Parameters: rule_id (Union[str, int]) – Identifier of the rule used to detect a negation. If the rule has no id, then the index of the rule in the list of rules is used instead.

clear() → None. Remove all items from D.

copy() → a shallow copy of D

fromkeys(iterable, value=None, /): Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /): Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

pop(k[, d]) → v: Remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None: Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values

Subpackages / Submodules

`medkit.text.context.family_detector`
`medkit.text.context.hypothesis_detector`
`medkit.text.context.negation_detector`
