medkit.text.ner

APIs

To access these APIs, use an import such as:

from medkit.text.ner import <api_to_import>

Classes:

ADICAPNormAttribute(code[, sampling_mode, ...])

Attribute describing tissue sample using the ADICAP (Association pour le Développement de l'Informatique en Cytologie et Anatomo-Pathologie) coding.

DateAttribute(label[, year, month, day, ...])

Attribute representing an absolute date or time associated with a segment or entity.

DucklingMatcher(output_label, version[, ...])

Entity annotator using Duckling (https://github.com/facebook/duckling).

DurationAttribute(label[, years, months, ...])

Attribute representing a time quantity associated with a segment or entity.

IAMSystemMatcher(matcher[, label_provider, ...])

Entity annotator and linker based on the iamsystem library

MedkitKeyword(label, kb_id, kb_name, ent_label)

A recommended implementation of iamsystem's IEntity.

RegexpMatcher([rules, attrs_to_copy, name, uid])

Entity annotator relying on regexp-based rules

RegexpMatcherNormalization(kb_name, kb_id[, ...])

Descriptor of normalization attributes to attach to entities created from a RegexpMatcherRule

RegexpMatcherRule(regexp, label[, term, id, ...])

Regexp-based rule to use with RegexpMatcher

RegexpMetadata(_typename[, _fields])

Metadata dict added to entities matched by RegexpMatcher

RelativeDateAttribute(label, direction[, ...])

Attribute representing a relative date or time associated with a segment or entity, i.e. a date/time offset from an (unknown) reference date/time, with a direction.

RelativeDateDirection(value)

Direction of a RelativeDateAttribute

SimstringMatcher(rules[, threshold, ...])

Entity matcher relying on string similarity

SimstringMatcherNormalization(kb_name, kb_id)

Descriptor of normalization attributes to attach to entities created from a SimstringMatcherRule

SimstringMatcherRule(term, label[, ...])

Rule to use with SimstringMatcher

UMLSMatcher(umls_dir, cache_dir, language[, ...])

Entity annotator identifying UMLS concepts using the simstring fuzzy matching algorithm (http://chokkan.org/software/simstring/).

class ADICAPNormAttribute(code, sampling_mode=None, technic=None, organ=None, pathology=None, pathology_type=None, behaviour_type=None, metadata=None, uid=None)

Attribute describing tissue sample using the ADICAP (Association pour le Développement de l’Informatique en Cytologie et Anatomo-Pathologie) coding.

See https://smt.esante.gouv.fr/wp-json/ans/terminologies/document?terminologyId=terminologie-adicap&fileName=cgts_sem_adicap_fiche-detaillee.pdf for a complete description of the coding.

This class replicates EDS-NLP’s Adicap class as a medkit Attribute.

The code field fully describes the tissue sample. Additional information derived from the code is provided in human-readable fields (sampling_mode, technic, organ, pathology, pathology_type, behaviour_type).

Variables
  • uid – Identifier of the attribute

  • label – The attribute label, always set to EntityNormAttribute.LABEL

  • value – ADICAP code prefixed with “adicap:” (ex: “adicap:BHGS0040”)

  • code – ADICAP code as a string (ex: “BHGS0040”)

  • kb_id – Same as code

  • sampling_mode (Optional[str]) – Sampling mode (ex: “BIOPSIE CHIRURGICALE”)

  • technic (Optional[str]) – Sampling technique (ex: “HISTOLOGIE ET CYTOLOGIE PAR INCLUSION”)

  • organ (Optional[str]) – Organ and regions (ex: “SEIN (ÉGALEMENT UTILISÉ CHEZ L’HOMME)”)

  • pathology (Optional[str]) – General pathology (ex: “PATHOLOGIE GÉNÉRALE NON TUMORALE”)

  • pathology_type (Optional[str]) – Pathology type (ex: “ETAT SUBNORMAL - LESION MINEURE”)

  • behaviour_type (Optional[str]) – Behaviour type (ex: “CARACTERES GENERAUX”)

  • metadata – Metadata of the attribute
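The relationship between code, kb_id and value listed above can be illustrated with a small standalone sketch. It does not use medkit: AdicapSketch and split_adicap_code are hypothetical names, and the split of the 8-character code into sampling mode (1 char), technic (1 char), organ (2 chars) and lesion (4 chars) is an assumption about the ADICAP layout that should be checked against the reference linked above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AdicapSketch:
    """Hypothetical stand-in for ADICAPNormAttribute (not the medkit class)."""

    code: str
    # Derived fields, computed from code rather than passed to __init__
    kb_id: str = field(init=False)
    value: str = field(init=False)

    def __post_init__(self) -> None:
        self.kb_id = self.code  # kb_id is the bare code
        self.value = f"adicap:{self.code}"  # value adds the "adicap:" prefix


def split_adicap_code(code: str) -> Dict[str, str]:
    """Split a code into raw parts, assuming a 1/1/2/4-character layout."""
    if len(code) != 8:
        raise ValueError(f"expected an 8-character ADICAP code, got {code!r}")
    return {
        "sampling_mode": code[0],  # e.g. "B"
        "technic": code[1],  # e.g. "H"
        "organ": code[2:4],  # e.g. "GS"
        "lesion": code[4:],  # e.g. "0040"
    }
```

Turning these raw parts into human-readable labels such as “BIOPSIE CHIRURGICALE” requires the ADICAP dictionaries, which this sketch deliberately leaves out.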

Methods:

copy()

Create a new attribute that is a copy of the current instance, but with a new identifier

from_dict(adicap_dict)

Creates an Attribute from a dict

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

to_brat()

Return a value compatible with the brat format

to_spacy()

Return a value compatible with spaCy

copy()

Create a new attribute that is a copy of the current instance, but with a new identifier

This is used when we want to duplicate an existing attribute onto a different annotation.

Return type

Attribute
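The contract of copy() described above (same content, new identifier) can be sketched with a plain dataclass. This is not medkit's implementation: AttributeSketch is hypothetical, and both the uuid4-based identifier and the copying of the metadata dict are choices made for this sketch.

```python
import dataclasses
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


def _new_uid() -> str:
    return str(uuid.uuid4())


@dataclass
class AttributeSketch:
    """Hypothetical stand-in for a medkit Attribute, for illustration only."""

    label: str
    value: Any = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    uid: str = field(default_factory=_new_uid)

    def copy(self) -> "AttributeSketch":
        # Same label/value, fresh uid; metadata is copied so that the two
        # instances do not share one mutable dict (a choice of this sketch)
        return dataclasses.replace(self, metadata=dict(self.metadata), uid=_new_uid())
```

The copy can then be attached to a different annotation without two annotations sharing one attribute identifier.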

classmethod get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

Parameters

data_dict (Dict[str, Any]) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type

Optional[Type[Self]]

Returns

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.
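The lookup described above can be pictured as a registry of subclasses keyed by class name. The sketch below is hypothetical: BaseSketch, DateSketch and the "_class_name" key are assumptions for illustration, not medkit's actual serialization format.

```python
from typing import Any, Dict, Optional, Type


class BaseSketch:
    """Hypothetical base class that registers its subclasses by name."""

    _registry: Dict[str, Type["BaseSketch"]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        BaseSketch._registry[cls.__name__] = cls

    def to_dict(self) -> Dict[str, Any]:
        # "_class_name" is an assumed key name for this sketch
        return {"_class_name": type(self).__name__}

    @classmethod
    def get_subclass_for_data_dict(
        cls, data_dict: Dict[str, Any]
    ) -> Optional[Type["BaseSketch"]]:
        class_name = data_dict.get("_class_name")
        if class_name is None or class_name == cls.__name__:
            return None  # the dict was produced by the class itself
        return cls._registry[class_name]  # KeyError for unknown names


class DateSketch(BaseSketch):
    pass
```

A deserializer can then call get_subclass_for_data_dict first and delegate to the returned subclass's own from_dict when the result is not None.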

to_brat()

Return a value compatible with the brat format

Return type

str

to_spacy()

Return a value compatible with spaCy

Return type

str

classmethod from_dict(adicap_dict)

Creates an Attribute from a dict

Parameters

adicap_dict (dict) – A dictionary from a serialized Attribute as generated by to_dict()

Return type

Self

class DucklingMatcher(output_label, version, url='http://localhost:8000', locale='fr_FR', dims=None, attrs_to_copy=None, uid=None)

Entity annotator using Duckling (https://github.com/facebook/duckling).

This annotator can parse several types of information in multiple languages:

amount of money, credit card numbers, distance, duration, email, numeral, ordinal, phone number, quantity, temperature, time, url, volume.

This annotator currently requires a running Duckling server. The easiest way to start one is to run a Docker container:

docker run --rm -d -p <PORT>:8000 --name duckling rasa/duckling:<TAG>

This command starts a Duckling server listening on port <PORT>. The server version is given by <TAG>.
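Once the server is up, it can be queried over HTTP. The sketch below only builds the request, using the /parse endpoint and the form-encoded text, locale and dims parameters described in the Duckling README; build_parse_request is a hypothetical helper, and this is not necessarily how DucklingMatcher talks to the server internally.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_parse_request(
    text: str,
    url: str = "http://localhost:8000",
    locale: str = "fr_FR",
    dims: Optional[List[str]] = None,
) -> Request:
    """Build (but do not send) a POST request to a Duckling /parse endpoint."""
    params = {"text": text, "locale": locale}
    if dims is not None:
        # dims is sent as a JSON-encoded list, e.g. ["time", "duration"]
        params["dims"] = json.dumps(dims)
    data = urlencode(params).encode("utf-8")
    return Request(f"{url.rstrip('/')}/parse", data=data, method="POST")
```

Passing the request to urllib.request.urlopen against a running server should return a JSON list of matches, which can be decoded with json.loads.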

Instantiate the Duckling matcher

Parameters
  • version (str) – Version of the Duckling server.

  • output_label (str) – Label to use for attributes created by this annotator.

  • url (str) – URL of the server. Defaults to “http://localhost:8000”.

  • locale (str) – Locale of the text to parse, made of an ISO 639-1 language code and a country code, e.g. “fr_FR”

  • dims (Optional[List[str]]) – List of dimensions to extract. If None, all available dimensions will be extracted.

  • attrs_to_copy (Optional[List[str]]) – Labels of the attributes that should be copied from the source segment to the created entity. Useful for propagating context attributes (negation, antecedent, etc.)

Methods:

run(segments)

Return entities for each match in segments

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(segments)

Return entities for each match in segments

Parameters

segments (List[Segment]) – List of segments into which to look for matches

Return type

List[Entity]

Returns

entities (List[Entity]) – Entities found in segments

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class RegexpMatcher(rules=None, attrs_to_copy=None, name=None, uid=None)

Entity annotator relying on regexp-based rules

For detecting entities, the module uses rules that may or may not be unicode-sensitive. When a rule is not unicode-sensitive, the matcher tries to convert unicode chars to the closest ASCII chars. However, converting some characters changes the length of the text, so such characters need to be pre-processed beforehand. If the converted text length differs from the original one, detection falls back on the original unicode text even if the rule is not unicode-sensitive, and a warning is logged recommending that the data be pre-processed.
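The fallback strategy described above can be sketched in plain Python. This is an illustration, not RegexpMatcher's code: find_spans and to_closest_ascii are hypothetical helpers, and the ASCII conversion used here (NFKD decomposition, then dropping non-ASCII code points) is an assumption standing in for whatever transliteration medkit actually applies.

```python
import logging
import re
import unicodedata
from typing import List, Tuple

logger = logging.getLogger(__name__)


def to_closest_ascii(text: str) -> str:
    """Strip accents and drop other non-ASCII chars (assumed conversion)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def find_spans(
    text: str,
    pattern: str,
    unicode_sensitive: bool = True,
    case_sensitive: bool = True,
) -> List[Tuple[int, int]]:
    flags = 0 if case_sensitive else re.IGNORECASE
    search_text = text
    if not unicode_sensitive:
        ascii_text = to_closest_ascii(text)
        if len(ascii_text) == len(text):
            # Same length: offsets found in ascii_text are assumed valid in text
            search_text = ascii_text
        else:
            logger.warning(
                "ASCII conversion changed the text length, falling back on the "
                "unicode text; consider pre-processing the data"
            )
    return [m.span() for m in re.finditer(pattern, search_text, flags)]
```

For instance, “Fièvre élevée” keeps its length once converted, so an ASCII-only pattern such as fievre matches it, whereas “œ” is dropped by this conversion, which triggers the fallback.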

Instantiate the regexp matcher

Parameters
  • rules (Optional[List[RegexpMatcherRule]]) – The set of rules to use when matching entities. If None, the rules in “regexp_matcher_default_rules.yml” will be used.

  • attrs_to_copy (Optional[List[str]]) – Labels of the attributes that should be copied from the source segment to the created entity. Useful for propagating context attributes (negation, antecedent, etc)

  • name (Optional[str]) – Name describing the matcher (defaults to the class name)

  • uid (str) – Identifier of the matcher

Methods:

check_rules_sanity(rules)

Check consistency of a set of rules

load_rules(path_to_rules[, encoding])

Load all rules stored in a yml file

run(segments)

Return entities (with optional normalization attributes) matched in segments

save_rules(rules, path_to_rules[, encoding])

Store rules in a yml file

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(segments)

Return entities (with optional normalization attributes) matched in segments

Parameters

segments (List[Segment]) – List of segments into which to look for matches

Return type

List[Entity]

Returns

entities (List[Entity]) – Entities found in segments (with optional normalization attributes). Entities have a metadata dict with fields described in RegexpMetadata.

static load_rules(path_to_rules, encoding=None)

Load all rules stored in a yml file

Parameters
  • path_to_rules (Path) – Path to a yml file containing a list of mappings with the same structure as RegexpMatcherRule

  • encoding (Optional[str]) – Encoding of the file to open

Return type

List[RegexpMatcherRule]

Returns

List[RegexpMatcherRule] – List of all the rules in path_to_rules, which can be used to initialize a RegexpMatcher

static check_rules_sanity(rules)

Check consistency of a set of rules

static save_rules(rules, path_to_rules, encoding=None)

Store rules in a yml file

Parameters
  • rules (List[RegexpMatcherRule]) – The rules to save

  • path_to_rules (Path) – Path to a .yml file that will contain the rules

  • encoding (Optional[str]) – Encoding of the .yml file

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class RegexpMatcherRule(regexp, label, term=None, id=None, version=None, index_extract=0, case_sensitive=True, unicode_sensitive=True, exclusion_regexp=None, normalizations=<factory>)

Regexp-based rule to use with RegexpMatcher

Variables
  • regexp (str) – The regexp pattern used to match entities

  • label (str) – The label to attribute to entities created based on this rule

  • term (Optional[str]) – The optional normalized version of the entity text

  • id (Optional[str]) – Unique identifier of the rule to store in the metadata of the entities

  • version (Optional[str]) – Version string to store in the metadata of the entities

  • index_extract (int) – If the regexp has groups, the index of the group to use to extract the entity

  • case_sensitive (bool) – Whether to take case into account when running regexp and exclusion_regexp

  • unicode_sensitive (bool) – If True, regexp rule matches are searched on the unicode text. If False, regexp rule matches are searched on the closest ASCII text when possible, so regexp and exclusion_regexp should not contain non-ASCII chars because they would never be matched. (cf. RegexpMatcher)

  • exclusion_regexp (Optional[str]) – An optional exclusion pattern. Note that this exclusion pattern will be executed on the whole input annotation, so when relying on exclusion_regexp, make sure the input annotations passed to RegexpMatcher are “local” enough (sentences or syntagmas) rather than the whole text or paragraphs.

  • normalizations (List[medkit.text.ner.regexp_matcher.RegexpMatcherNormalization]) – Optional list of normalization attributes that should be attached to the created entities
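How regexp, index_extract, case_sensitive and exclusion_regexp interact can be sketched as follows. RuleSketch and apply_rule are hypothetical, unicode handling is left out, and the assumption that a single exclusion hit anywhere in the input text discards every match of the rule is based only on the note about exclusion_regexp above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RuleSketch:
    """Hypothetical subset of RegexpMatcherRule's fields, for illustration."""

    regexp: str
    label: str
    index_extract: int = 0
    case_sensitive: bool = True
    exclusion_regexp: Optional[str] = None


def apply_rule(rule: RuleSketch, text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) for each match of rule in text."""
    flags = 0 if rule.case_sensitive else re.IGNORECASE
    # The exclusion pattern is searched in the whole input text: if it is
    # found anywhere, no entity is produced from this text
    if rule.exclusion_regexp and re.search(rule.exclusion_regexp, text, flags):
        return []
    matches = []
    for m in re.finditer(rule.regexp, text, flags):
        start, end = m.span(rule.index_extract)  # 0 = whole match
        if start == -1:  # the requested group did not participate
            continue
        matches.append((start, end, rule.label))
    return matches
```

Because the exclusion check covers the whole input text, running this on a full document rather than on a sentence would discard far more matches, which is what the note above warns about.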

class RegexpMatcherNormalization(kb_name, kb_id, kb_version=None)

Descriptor of normalization attributes to attach to entities created from a RegexpMatcherRule

Variables
  • kb_name (str) – The name of the knowledge base we are referencing. Ex: “umls”

  • kb_version (Optional[str]) – The version of the knowledge base we are referencing. Ex: “202AB”

  • kb_id (Any) – The id of the entity in the knowledge base, for instance a CUI

class RegexpMetadata(_typename, _fields=None, /, **kwargs)

Metadata dict added to entities matched by RegexpMatcher

Parameters
  • rule_id (Union[str, int]) – Identifier of the rule used to match an entity. If the rule has no id, then the index of the rule in the list of rules is used instead.

  • version (Optional[str]) – Optional version of the rule used to match an entity
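Since RegexpMetadata is a dict-like type (hence the inherited dict methods documented here), the two fields above can be mirrored with a TypedDict. RegexpMetadataSketch and make_metadata are hypothetical; the sketch only shows the documented fallback of rule_id on the rule's index.

```python
from typing import Optional, TypedDict, Union


class RegexpMetadataSketch(TypedDict):
    """Hypothetical mirror of the two documented RegexpMetadata fields."""

    rule_id: Union[str, int]
    version: Optional[str]


def make_metadata(
    rule_index: int, rule_id: Optional[str], version: Optional[str]
) -> RegexpMetadataSketch:
    # Fall back on the rule's position in the rule list when it has no id
    return RegexpMetadataSketch(
        rule_id=rule_id if rule_id is not None else rule_index,
        version=version,
    )
```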

clear() -> None. Remove all items from D.

copy() -> a shallow copy of D

fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() -> a set-like object providing a view on D's items

keys() -> a set-like object providing a view on D's keys

pop(k[, d]) -> v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) -> None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]

values() -> an object providing a view on D's values

class SimstringMatcher(rules, threshold=0.9, min_length=3, max_length=50, similarity='jaccard', spacy_tokenization_language=None, blacklist=None, same_beginning=False, attrs_to_copy=None, name=None, uid=None)

Entity matcher relying on string similarity

Uses the simstring fuzzy matching algorithm (http://chokkan.org/software/simstring/).

Note that setting spacy_tokenization_language (instead of leaving it to None) might reduce the number of false positives. This requires the spacy optional dependency, which can be installed with pip install medkit-lib[spacy].

Parameters
  • rules (List[SimstringMatcherRule]) – Rules to use for matching entities.

  • min_length (int) – Minimum number of chars in matched entities.

  • max_length (int) – Maximum number of chars in matched entities.

  • threshold (float) – Minimum similarity (between 0.0 and 1.0) between a rule term and the text of an entity matched on that rule.

  • similarity (Literal['cosine', 'dice', 'jaccard', 'overlap']) – Similarity metric to use.

  • spacy_tokenization_language (Optional[str]) – 2-letter code (ex: “fr”, “en”, etc.) designating the language of the spacy model to use for tokenization. If provided, spacy will be used to tokenize input segments and filter out some tokens based on their part-of-speech tags, such as determiners, conjunctions and prepositions. If None, a simple regexp-based tokenization will be used, which is faster but might give more false positives.

  • blacklist (Optional[List[str]]) – Optional list of exact terms to ignore.

  • same_beginning (bool) – Ignore all matches that start with a different character than the term of the rule. This can be convenient to get rid of false positives on words that are very similar but have opposite meanings because of a prefix, for instance “activation” and “inactivation”.

  • attrs_to_copy (Optional[List[str]]) – Labels of the attributes that should be copied from the source segment to the created entity. Useful for propagating context attributes (negation, antecedent, etc.).

  • name (Optional[str]) – Name describing the matcher (defaults to the class name).

  • uid (str) – Identifier of the matcher.
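The effect of threshold, similarity, min_length, max_length and same_beginning can be illustrated on whole strings. This is a simplified sketch, not the simstring algorithm: it compares plain sets of character trigrams (real simstring builds its own n-gram features and retrieves candidates from an index), and similarity and is_match are hypothetical helpers.

```python
import math
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Set of character trigrams (no padding, unlike some simstring setups)."""
    return {text[i : i + 3] for i in range(len(text) - 2)}


def similarity(a: str, b: str, measure: str = "jaccard") -> float:
    x, y = trigrams(a), trigrams(b)
    if not x or not y:
        return 0.0
    common = len(x & y)
    if measure == "jaccard":
        return common / len(x | y)
    if measure == "dice":
        return 2 * common / (len(x) + len(y))
    if measure == "cosine":
        return common / math.sqrt(len(x) * len(y))
    if measure == "overlap":
        return common / min(len(x), len(y))
    raise ValueError(f"unknown similarity measure: {measure!r}")


def is_match(
    candidate: str,
    term: str,
    threshold: float = 0.9,
    measure: str = "jaccard",
    min_length: int = 3,
    max_length: int = 50,
    same_beginning: bool = False,
) -> bool:
    if not min_length <= len(candidate) <= max_length:
        return False
    if same_beginning and candidate[:1] != term[:1]:
        return False
    return similarity(candidate, term, measure) >= threshold
```

In this sketch, “inactivation” scores 0.8 against “activation” with jaccard but 1.0 with overlap, so with the overlap measure only same_beginning keeps it from matching.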

Methods:

load_rules(path_to_rules[, encoding])

Load all rules stored in a yml file

run(segments)

Return entities (with optional normalization attributes) matched in segments

save_rules(rules, path_to_rules[, encoding])

Store rules in a yml file

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

static load_rules(path_to_rules, encoding=None)

Load all rules stored in a yml file

Parameters
  • path_to_rules (Path) – The path to a yml file containing a list of mappings with the same structure as SimstringMatcherRule

  • encoding (Optional[str]) – The encoding of the file to open

Return type

List[SimstringMatcherRule]

Returns

List[SimstringMatcherRule] – List of all the rules in path_to_rules, which can be used to initialize a SimstringMatcher

static save_rules(rules, path_to_rules, encoding=None)

Store rules in a yml file

Parameters
  • rules (List[SimstringMatcherRule]) – The rules to save

  • path_to_rules (Path) – The path to a yml file that will contain the rules

  • encoding (Optional[str]) – The encoding of the yml file

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

run(segments)

Return entities (with optional normalization attributes) matched in segments

Parameters

segments (List[Segment]) – List of segments into which to look for matches

Return type

List[Entity]

Returns

entities (List[Entity]) – Entities found in segments (with optional normalization attributes)

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class SimstringMatcherRule(term, label, case_sensitive=False, unicode_sensitive=False, normalizations=<factory>)

Rule to use with SimstringMatcher

Variables
  • term (str) – Term to match using similarity-based fuzzy matching

  • label (str) – Label to use for the entities created when a match is found

  • case_sensitive (bool) – Whether to take case into account when looking for matches.

  • unicode_sensitive (bool) – Whether to take non-ASCII chars into account when looking for matches. If False, ASCII-only versions of the rule term and input texts are used (non-ASCII chars replaced by closest ASCII chars).

  • normalizations (List[medkit.text.ner._base_simstring_matcher.BaseSimstringMatcherNormalization]) – Optional list of normalization attributes that should be attached to the entities created

Methods:

from_dict(data)

Creates a SimstringMatcherRule from a dict.

static from_dict(data)

Creates a SimstringMatcherRule from a dict.

Return type

SimstringMatcherRule
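A from_dict of this kind typically maps dict keys onto fields and converts nested normalization dicts into objects. The sketch below assumes exactly that layout (keys named after the fields listed above); SimstringRuleSketch and NormSketch are hypothetical, and the real SimstringMatcherRule.from_dict may accept a different structure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union


@dataclass
class NormSketch:
    kb_name: str
    kb_id: Union[int, str]
    kb_version: Optional[str] = None
    term: Optional[str] = None


@dataclass
class SimstringRuleSketch:
    term: str
    label: str
    case_sensitive: bool = False
    unicode_sensitive: bool = False
    normalizations: List[NormSketch] = field(default_factory=list)

    @staticmethod
    def from_dict(data: Dict[str, Any]) -> "SimstringRuleSketch":
        # Missing optional keys fall back on the same defaults as the fields;
        # nested normalization dicts become NormSketch objects
        return SimstringRuleSketch(
            term=data["term"],
            label=data["label"],
            case_sensitive=data.get("case_sensitive", False),
            unicode_sensitive=data.get("unicode_sensitive", False),
            normalizations=[NormSketch(**n) for n in data.get("normalizations", [])],
        )
```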

class SimstringMatcherNormalization(kb_name, kb_id, kb_version=None, term=None)

Descriptor of normalization attributes to attach to entities created from a SimstringMatcherRule

Variables
  • kb_name (str) – The name of the knowledge base we are referencing. Ex: “umls”

  • kb_version (Optional[str]) – The version of the knowledge base we are referencing. Ex: “202AB”

  • kb_id (Union[int, str]) – The id of the entity in the knowledge base, for instance a CUI

  • term (Optional[str]) – Optional normalized version of the entity text in the knowledge base

Methods:

from_dict(data)

Creates a SimstringMatcherNormalization object from a dict

to_attribute(score)

Create a normalization attribute based on the normalization descriptor

static from_dict(data)

Creates a SimstringMatcherNormalization object from a dict

Return type

SimstringMatcherNormalization

to_attribute(score)

Create a normalization attribute based on the normalization descriptor

Parameters

score (float) – Score of similarity between the normalized term and the entity text

Return type

EntityNormAttribute

Returns

EntityNormAttribute – Normalization attribute to add to entity
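Conceptually, to_attribute copies the descriptor fields and attaches the similarity score of the match. In the sketch below, NormalizationSketch and NormAttributeSketch are hypothetical stand-ins, and the "kb_name:kb_id" format of value is an assumption modeled on the “adicap:BHGS0040” example given for ADICAPNormAttribute.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class NormAttributeSketch:
    """Hypothetical stand-in for EntityNormAttribute."""

    kb_name: Optional[str]
    kb_id: Union[int, str]
    kb_version: Optional[str] = None
    term: Optional[str] = None
    score: Optional[float] = None

    @property
    def value(self) -> str:
        # Assumed "<kb_name>:<kb_id>" format, bare kb_id when there is no kb_name
        return f"{self.kb_name}:{self.kb_id}" if self.kb_name else str(self.kb_id)


@dataclass
class NormalizationSketch:
    kb_name: str
    kb_id: Union[int, str]
    kb_version: Optional[str] = None
    term: Optional[str] = None

    def to_attribute(self, score: float) -> NormAttributeSketch:
        # Copy the descriptor fields and attach the similarity score
        return NormAttributeSketch(self.kb_name, self.kb_id, self.kb_version, self.term, score)
```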

class UMLSMatcher(umls_dir, cache_dir, language, threshold=0.9, min_length=3, max_length=50, similarity='jaccard', lowercase=True, normalize_unicode=False, spacy_tokenization=False, semgroups=('ANAT', 'CHEM', 'DEVI', 'DISO', 'PHYS', 'PROC'), blacklist=None, same_beginning=False, output_labels_by_semgroup=None, attrs_to_copy=None, name=None, uid=None)

Entity annotator identifying UMLS concepts using the simstring fuzzy matching algorithm (http://chokkan.org/software/simstring/).

This operation is heavily inspired by the QuickUMLS library (https://github.com/Georgetown-IR-Lab/QuickUMLS).

By default, only terms belonging to the ANAT (anatomy), CHEM (Chemicals & Drugs), DEVI (Devices), DISO (Disorders), PHYS (Physiology) and PROC (Procedures) semgroups will be considered. This behavior can be changed with the semgroups parameter.

Note that setting spacy_tokenization to True might reduce the number of false positives. This requires the spacy optional dependency, which can be installed with pip install medkit-lib[spacy].

Parameters
  • umls_dir (Union[str, Path]) – Path to the UMLS directory containing the MRCONSO.RRF and MRSTY.RRF files.

  • cache_dir (Union[str, Path]) – Path to the directory into which the umls database will be cached. If it doesn’t exist yet, the database will be automatically generated (it can take a long time) and stored there, ready to be reused on further instantiations. If it already exists, a check will be done to make sure the params used when the database was generated are consistent with the params of the current instance. If you want to rebuild the database with new params using the same cache dir, you will have to manually delete it first.

  • language (str) – Language to consider as found in the MRCONSO.RRF file. Example: “FRE”. Will trigger a regeneration of the database if changed.

  • min_length (int) – Minimum number of chars in matched entities.

  • max_length (int) – Maximum number of chars in matched entities.

  • threshold (float) – Minimum similarity threshold (between 0.0 and 1.0) between a UMLS term and the text of a matched entity.

  • similarity (Literal['cosine', 'dice', 'jaccard', 'overlap']) – Similarity metric to use.

  • same_beginning (bool) – Ignore all matches that start with a different character than the matched UMLS term. This can be convenient to get rid of false positives on words that are very similar but have opposite meanings because of a prefix, for instance “activation” and “inactivation”.

  • lowercase (bool) – Whether to use lowercased versions of UMLS terms and input entities (except for acronyms for which the uppercase term is always used). Will trigger a regeneration of the database if changed.

  • normalize_unicode (bool) – Whether to use ASCII-only versions of UMLS terms and input entities (non-ASCII chars replaced by closest ASCII chars). Will trigger a regeneration of the database if changed.

  • spacy_tokenization (bool) – If True, spacy will be used to tokenize input segments and filter out some tokens based on their part-of-speech tags, such as determiners, conjunctions and prepositions. If False, a simple regexp-based tokenization will be used, which is faster but might give more false positives.

  • semgroups (Optional[Sequence[str]]) – Ids of UMLS semantic groups that matched concepts should belong to (cf. https://lhncbc.nlm.nih.gov/semanticnetwork/download/sg_archive/SemGroups-v04.txt). The default value is (“ANAT”, “CHEM”, “DEVI”, “DISO”, “PHYS”, “PROC”). If set to None, all concepts can be matched. Will trigger a regeneration of the database if changed.

  • blacklist (Optional[List[str]]) – Optional list of exact terms to ignore.

  • output_labels_by_semgroup (Union[str, Dict[str, str], None]) – By default, medkit.text.ner.umls.SEMGROUP_LABELS will be used as entity labels. Use this parameter to override them. Example: {“DISO”: “problem”, “PROC”: “test”}. If output_labels_by_semgroup is a string, all entities will use this string as label instead. Will trigger a regeneration of the database if changed.

  • attrs_to_copy (Optional[List[str]]) – Labels of the attributes that should be copied from the source segment to the created entity. Useful for propagating context attributes (negation, antecedent, etc)

  • name (Optional[str]) – Name describing the matcher (defaults to the class name).

  • uid (str) – Identifier of the matcher.
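The cache_dir rule stated above (reuse the cached database if it was built with consistent params, otherwise ask the user to delete it) can be modeled in a few lines. This is a hypothetical sketch, not UMLSMatcher's code: open_cache, the params.json file name and the ValueError are assumptions, and the actual database generation is elided.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

PARAMS_FILE = "params.json"  # hypothetical name for the stored params


def open_cache(cache_dir: Path, params: Dict[str, Any]) -> str:
    """Return "built" or "reused"; refuse to reuse a cache built with other params."""
    # Round-trip through JSON so that e.g. tuples and lists compare equal
    expected = json.loads(json.dumps(params))
    params_path = cache_dir / PARAMS_FILE
    if params_path.exists():
        stored = json.loads(params_path.read_text(encoding="utf-8"))
        if stored != expected:
            raise ValueError(
                f"{cache_dir} was built with different params; "
                "delete it manually to rebuild the database"
            )
        return "reused"
    cache_dir.mkdir(parents=True, exist_ok=True)
    # ... the slow database generation from MRCONSO.RRF / MRSTY.RRF would go here ...
    params_path.write_text(json.dumps(expected), encoding="utf-8")
    return "built"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "umls_cache"
        params = {"language": "FRE", "lowercase": True, "semgroups": ("ANAT", "DISO")}
        print(open_cache(cache, params))  # built
        print(open_cache(cache, params))  # reused
```

Only the params marked above as triggering a regeneration (language, lowercase, normalize_unicode, semgroups, output_labels_by_semgroup) would need to be stored for such a check.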

Attributes:

description

Contains all the operation init parameters.

Methods:

run(segments)

Return entities (with optional normalization attributes) matched in segments

set_prov_tracer(prov_tracer)

Enable provenance tracing.

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

run(segments)

Return entities (with optional normalization attributes) matched in segments

Parameters

segments (List[Segment]) – List of segments into which to look for matches

Return type

List[Entity]

Returns

entities (List[Entity]) – Entities found in segments (with optional normalization attributes)

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class IAMSystemMatcher(matcher, label_provider=None, attrs_to_copy=None, name=None, uid=None)

Entity annotator and linker based on the iamsystem library

Instantiate the operation supporting the iamsystem matcher

Parameters
  • matcher (Matcher) – iamsystem Matcher

  • label_provider (Optional[Callable[[Sequence[IKeyword]], Optional[str]]]) – Callable providing the output label to set for a detected entity. Because the iamsystem matcher may return several keywords for a single annotation, this callable determines one entity label whatever the number of matched keywords. In medkit, the detected keywords themselves are represented as normalization attributes.

  • attrs_to_copy (Optional[List[str]]) – Labels of the attributes that should be copied from the input segment to the created entity. Useful for propagating context attributes (negation, antecedent, etc).

  • name (Optional[str]) – Name describing the matcher (defaults to the class name)

  • uid (str) – Identifier of the operation
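A label_provider only has to map a sequence of matched keywords to one optional label. The sketch below shows one possible policy (return the ent_label shared by all keywords, or None if they disagree); KeywordSketch is a hypothetical stand-in exposing the MedkitKeyword fields, and how IAMSystemMatcher handles a None label is not described here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class KeywordSketch:
    """Hypothetical stand-in for a keyword exposing the MedkitKeyword fields."""

    label: str
    kb_id: Optional[str] = None
    kb_name: Optional[str] = None
    ent_label: Optional[str] = None


def label_provider(keywords: Sequence[KeywordSketch]) -> Optional[str]:
    """Return the ent_label shared by all matched keywords, or None if they disagree."""
    labels = {kw.ent_label for kw in keywords if kw.ent_label is not None}
    return labels.pop() if len(labels) == 1 else None
```

Other policies (majority vote, priority order between categories) fit the same Callable[[Sequence[IKeyword]], Optional[str]] shape.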

Attributes:

description

Contains all the operation init parameters.

Methods:

set_prov_tracer(prov_tracer)

Enable provenance tracing.

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class MedkitKeyword(label, kb_id, kb_name, ent_label)

A recommended implementation of iamsystem’s IEntity.

This class allows the user to attach a kb_id and/or a kb_name to an iamsystem keyword. An entity label (ent_label) may also be provided to define a category for the searched keyword (e.g., a “drug” label for the “Vicodin” keyword).

It also implements the SupportEntLabel and SupportKBName protocols.

class DateAttribute(label, year=None, month=None, day=None, hour=None, minute=None, second=None, metadata=None, uid=None)

Attribute representing an absolute date or time associated with a segment or entity.

The date or time can be incomplete: each date/time component is optional but at least one must be provided.

Variables
  • uid (str) – Identifier of the attribute

  • label (str) – Label of the attribute

  • value (Optional[Any]) – String representation of the date with YYYY-MM-DD format for the date part and HH:MM:SS for the time part, if present. Missing components are replaced with question marks.

  • year (Optional[int]) – Year component of the date

  • month (Optional[int]) – Month component of the date

  • day (Optional[int]) – Day component of the date

  • hour (Optional[int]) – Hour component of the time

  • minute (Optional[int]) – Minute component of the time

  • second (Optional[int]) – Second component of the time

  • metadata (Dict[str, Any]) – Metadata of the attribute
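The value format described above (YYYY-MM-DD, then HH:MM:SS when a time component is present, with question marks for missing components) can be sketched as follows. format_date_value is a hypothetical helper written from that description; DateAttribute's exact output may differ in details.

```python
from typing import Optional


def format_date_value(
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    hour: Optional[int] = None,
    minute: Optional[int] = None,
    second: Optional[int] = None,
) -> str:
    if all(v is None for v in (year, month, day, hour, minute, second)):
        raise ValueError("at least one date/time component must be provided")

    def fmt(v: Optional[int], width: int) -> str:
        # Zero-padded number, or one "?" per missing digit
        return "?" * width if v is None else f"{v:0{width}d}"

    value = f"{fmt(year, 4)}-{fmt(month, 2)}-{fmt(day, 2)}"
    if hour is not None or minute is not None or second is not None:
        value += f" {fmt(hour, 2)}:{fmt(minute, 2)}:{fmt(second, 2)}"
    return value
```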

Methods:

copy()

Create a new attribute that is a copy of the current instance, but with a new identifier

from_dict(date_dict)

Creates an Attribute from a dict

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

to_brat()

Return a value compatible with the brat format

to_spacy()

Return a value compatible with spaCy

to_brat()

Return a value compatible with the brat format

Return type

str

to_spacy()

Return a value compatible with spaCy

Return type

str

classmethod from_dict(date_dict)

Creates an Attribute from a dict

Parameters

date_dict (dict) – A dictionary from a serialized Attribute as generated by to_dict()

Return type

Self

copy()

Create a new attribute that is a copy of the current instance, but with a new identifier

This is used when we want to duplicate an existing attribute onto a different annotation.

Return type

Attribute

classmethod get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

Parameters

data_dict (Dict[str, Any]) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type

Optional[Type[Self]]

Returns

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

class DurationAttribute(label, years=0, months=0, weeks=0, days=0, hours=0, minutes=0, seconds=0, metadata=None, uid=None)

Attribute representing a time quantity associated with a segment or entity.

Each date/time component is optional but at least one must be provided.

Variables
  • uid (str) – Identifier of the attribute

  • label (str) – Label of the attribute

  • value (Optional[Any]) – String representation of the duration (ex: “1 year 10 months 2 days”)

  • years (int) – Year component of the date quantity

  • months (int) – Month component of the date quantity

  • weeks (int) – Week component of the date quantity

  • days (int) – Day component of the date quantity

  • hours (int) – Hour component of the time quantity

  • minutes (int) – Minute component of the time quantity

  • seconds (int) – Second component of the time quantity

  • metadata (Dict[str, Any]) – Metadata of the attribute
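The value example above (“1 year 10 months 2 days”) suggests that zero components are omitted and units are pluralized. format_duration_value is a hypothetical helper built on that reading; treating an all-zero duration as invalid is an assumption of this sketch.

```python
from typing import List, Tuple


def format_duration_value(
    years: int = 0,
    months: int = 0,
    weeks: int = 0,
    days: int = 0,
    hours: int = 0,
    minutes: int = 0,
    seconds: int = 0,
) -> str:
    components: List[Tuple[int, str]] = [
        (years, "year"),
        (months, "month"),
        (weeks, "week"),
        (days, "day"),
        (hours, "hour"),
        (minutes, "minute"),
        (seconds, "second"),
    ]
    # Zero components are omitted; units are pluralized when the count is above 1
    parts = [f"{n} {unit}{'s' if n > 1 else ''}" for n, unit in components if n]
    if not parts:
        raise ValueError("at least one component must be non-zero")
    return " ".join(parts)
```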

Methods:

copy()

Create a new attribute that is a copy of the current instance, but with a new identifier

from_dict(duration_dict)

Creates an Attribute from a dict

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

to_brat()

Return a value compatible with the brat format

to_spacy()

Return a value compatible with spaCy

to_brat()

Return a value compatible with the brat format

Return type

str

to_spacy()

Return a value compatible with spaCy

Return type

str

classmethod from_dict(duration_dict)

Creates an Attribute from a dict

Parameters

duration_dict (dict) – A dictionary from a serialized Attribute as generated by to_dict()

Return type

Self

copy()

Create a new attribute that is a copy of the current instance, but with a new identifier

This is used when we want to duplicate an existing attribute onto a different annotation.

Return type

Attribute

classmethod get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

Parameters

data_dict (Dict[str, Any]) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type

Optional[Type[Self]]

Returns

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

class RelativeDateAttribute(label, direction, years=0, months=0, weeks=0, days=0, hours=0, minutes=0, seconds=0, metadata=None, uid=None)

Attribute representing a relative date or time associated with a segment or entity, i.e. a date/time offset from an (unknown) reference date/time, with a direction.

At least one date/time component must be non-zero.

Variables
  • uid (str) – Identifier of the attribute

  • label (str) – Label of the attribute

  • value (Optional[Any]) – String representation of the relative date (ex: “+ 1 year 10 months 2 days”)

  • direction (medkit.text.ner.date_attribute.RelativeDateDirection) – Direction of the relative date. Ex: “2 years ago” corresponds to the PAST direction and “in 2 weeks” to the FUTURE direction.

  • years (int) – Year component of the date offset

  • months (int) – Month component of the date offset

  • weeks (int) – Week component of the date offset

  • days (int) – Day component of the date offset

  • hours (int) – Hour component of the time offset

  • minutes (int) – Minute component of the time offset

  • seconds (int) – Second component of the time offset

  • metadata (Dict[str, Any]) – Metadata of the attribute
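Combining a direction with the offset components gives values like the “+ 1 year 10 months 2 days” example above. In this sketch, DirectionSketch mirrors only the PAST and FUTURE members named above (their values are assumptions), and using “-” as the sign for the PAST direction is an assumption as well.

```python
import enum
from typing import List, Tuple


class DirectionSketch(enum.Enum):
    """Hypothetical mirror of RelativeDateDirection; member values are assumptions."""

    PAST = "past"
    FUTURE = "future"


def format_relative_date_value(
    direction: DirectionSketch,
    years: int = 0,
    months: int = 0,
    weeks: int = 0,
    days: int = 0,
    hours: int = 0,
    minutes: int = 0,
    seconds: int = 0,
) -> str:
    components: List[Tuple[int, str]] = [
        (years, "year"),
        (months, "month"),
        (weeks, "week"),
        (days, "day"),
        (hours, "hour"),
        (minutes, "minute"),
        (seconds, "second"),
    ]
    parts = [f"{n} {unit}{'s' if n > 1 else ''}" for n, unit in components if n]
    if not parts:
        raise ValueError("at least one date/time component must be non-zero")
    # "+" for an offset into the future, "-" (assumed) for the past
    sign = "+" if direction is DirectionSketch.FUTURE else "-"
    return f"{sign} {' '.join(parts)}"
```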

Methods:

copy()

Create a new attribute that is a copy of the current instance, but with a new identifier

from_dict(date_dict)

Creates an Attribute from a dict

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

to_brat()

Return a value compatible with the brat format

to_spacy()

Return a value compatible with spaCy

copy()

Create a new attribute that is a copy of the current instance, but with a new identifier

This is used when we want to duplicate an existing attribute onto a different annotation.

Return type

Attribute

classmethod get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

Parameters

data_dict (Dict[str, Any]) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type

Optional[Type[Self]]

Returns

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

to_brat()

Return a value compatible with the brat format

Return type

str

to_spacy()

Return a value compatible with spaCy

Return type

str

classmethod from_dict(date_dict)

Creates an Attribute from a dict

Parameters

date_dict (dict) – A dictionary from a serialized Attribute as generated by to_dict()

Return type

Self

class RelativeDateDirection(value)

Direction of a RelativeDateAttribute

Subpackages / Submodules

medkit.text.ner.adicap_norm_attribute

medkit.text.ner.date_attribute

medkit.text.ner.duckling_matcher

medkit.text.ner.edsnlp_date_matcher

This module needs extra-dependencies not installed as core dependencies of medkit.

medkit.text.ner.edsnlp_tnm_matcher

This module needs extra-dependencies not installed as core dependencies of medkit.

medkit.text.ner.hf_entity_matcher

This module needs extra-dependencies not installed as core dependencies of medkit.

medkit.text.ner.hf_entity_matcher_trainable

This module needs extra-dependencies not installed as core dependencies of medkit.

medkit.text.ner.hf_tokenization_utils

medkit.text.ner.iamsystem_matcher

medkit.text.ner.quick_umls_matcher

This module needs extra-dependencies not installed as core dependencies of medkit.

medkit.text.ner.regexp_matcher

medkit.text.ner.simstring_matcher

medkit.text.ner.tnm_attribute

This package needs extra-dependencies not installed as core dependencies of medkit.

medkit.text.ner.umls_coder_normalizer

This module needs extra-dependencies not installed as core dependencies of medkit.

medkit.text.ner.umls_matcher

medkit.text.ner.umls_utils