medkit.text.ner.quick_umls_matcher

This module needs extra dependencies that are not installed as core dependencies of medkit. To install them, use pip install medkit-lib[quick-umls-matcher].

Classes:

QuickUMLSMatcher(version, language[, ...])

Entity annotator relying on QuickUMLS.

class QuickUMLSMatcher(version, language, lowercase=False, normalize_unicode=False, overlapping='length', threshold=0.9, window=5, similarity='jaccard', accepted_semtypes=quickumls.constants.ACCEPTED_SEMTYPES, attrs_to_copy=None, output_label=None, name=None, uid=None)

Entity annotator relying on QuickUMLS.

This annotator requires a QuickUMLS installation performed with python -m quickumls.install, with settings matching the version, language, lowercase and normalize_unicode parameters passed at init. QuickUMLS installations must be registered with the add_install class method.

For instance, to use QuickUMLSMatcher with a French lowercase QuickUMLS install based on UMLS version 2021AB, first create this installation with:

$ python -m quickumls.install --language FRE --lowercase /path/to/umls/2021AB/data /path/to/quick/umls/install

then register this install with:

>>> from medkit.text.ner.quick_umls_matcher import QuickUMLSMatcher
>>> QuickUMLSMatcher.add_install(
...     "/path/to/quick/umls/install",
...     version="2021AB",
...     language="FRE",
...     lowercase=True,
... )

and finally instantiate the matcher with:

>>> matcher = QuickUMLSMatcher(
...     version="2021AB",
...     language="FRE",
...     lowercase=True,
... )

This mechanism makes it possible to record in the OperationDescription how the QuickUMLS install in use was created, and to re-instantiate the same matcher in a different environment where an equivalent install is available.
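The registration mechanism can be pictured as a dictionary keyed by install settings, so that a matcher described only by version, language, lowercase and normalize_unicode can look up the local path at init. The sketch below is a hypothetical illustration, not medkit's implementation: InstallKey, InstallRegistry and find_install are invented names; only add_install and clear_installs mirror methods documented here.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Union


@dataclass(frozen=True)
class InstallKey:
    """Settings identifying a QuickUMLS install (hypothetical helper type)."""

    version: str
    language: str
    lowercase: bool = False
    normalize_unicode: bool = False


class InstallRegistry:
    """Hypothetical registry mapping install settings to local install paths."""

    def __init__(self) -> None:
        self._installs: Dict[InstallKey, Path] = {}

    def add_install(self, path: Union[str, Path], version: str, language: str,
                    lowercase: bool = False, normalize_unicode: bool = False) -> None:
        # The key holds only the settings, never the environment-specific path.
        key = InstallKey(version, language, lowercase, normalize_unicode)
        self._installs[key] = Path(path)

    def clear_installs(self) -> None:
        self._installs.clear()

    def find_install(self, version: str, language: str,
                     lowercase: bool = False, normalize_unicode: bool = False) -> Path:
        key = InstallKey(version, language, lowercase, normalize_unicode)
        if key not in self._installs:
            raise ValueError(f"No QuickUMLS install registered for {key}")
        return self._installs[key]


registry = InstallRegistry()
registry.add_install("/path/to/quick/umls/install",
                     version="2021AB", language="FRE", lowercase=True)
# A matcher described only by its settings can resolve the local path at init.
path = registry.find_install(version="2021AB", language="FRE", lowercase=True)
print(path)
```

Because the path is excluded from the key, the same settings can point to different folders on different machines, which is what allows the settings alone to be stored in the operation description.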

Instantiate the QuickUMLS matcher

Parameters
  • version (str) – UMLS version of the QuickUMLS install to use, for instance “2021AB”. Used to select which registered QuickUMLS install to use.

  • language (str) – Language flag of the QuickUMLS install to use, for instance “ENG”. Used to select which registered QuickUMLS install to use.

  • lowercase (bool) – Whether to use a QuickUMLS install with lowercased concepts. Used to select which registered QuickUMLS install to use.

  • normalize_unicode (bool) – Whether to use a QuickUMLS install in which non-ASCII characters in concepts were converted to the closest ASCII characters. Used to select which registered QuickUMLS install to use.

  • overlapping (Literal['length', 'score']) – Criterion for ranking overlapping candidate matches (see the QuickUMLS documentation)

  • threshold (float) – Minimum similarity score for a match to be kept (see the QuickUMLS documentation)

  • window (int) – Maximum number of tokens per match (see the QuickUMLS documentation)

  • similarity (Literal['dice', 'jaccard', 'cosine', 'overlap']) – Similarity measure to use (see the QuickUMLS documentation)

  • accepted_semtypes (List[str]) – UMLS semantic types that matched concepts should belong to (see the QuickUMLS documentation).

  • attrs_to_copy (Optional[List[str]]) – Labels of the attributes that should be copied from the source segment to the created entity. Useful for propagating context attributes (negation, antecedent, etc.).

  • output_label (Union[str, Dict[str, str], None]) – By default, medkit.text.ner.umls.SEMGROUP_LABELS will be used as entity labels. Use this parameter to override them, for example {"DISO": "problem", "PROC": "test"}. If output_label is a string, all entities will use this string as their label instead.

  • name (Optional[str]) – Name describing the matcher (defaults to the class name)

  • uid (str) – Identifier of the matcher
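The threshold, window and similarity parameters interact roughly as follows: candidate spans of at most window tokens are compared with a concept string using a set similarity over character n-grams, and spans scoring below threshold are discarded. The sketch below is a simplified brute-force illustration, not QuickUMLS's implementation (which relies on a simstring index plus semantic-type filtering and overlap resolution); the space-padded trigram scheme and the helper names char_ngrams, similarity and match_spans are assumptions.

```python
import math
from typing import List, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of `text`, space-padded so short strings still yield features."""
    padded = " " * (n - 1) + text + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(x: Set[str], y: Set[str], measure: str = "jaccard") -> float:
    """Standard set-similarity measures named like the `similarity` parameter."""
    common = len(x & y)
    if measure == "jaccard":
        return common / len(x | y)
    if measure == "dice":
        return 2 * common / (len(x) + len(y))
    if measure == "cosine":
        return common / math.sqrt(len(x) * len(y))
    if measure == "overlap":
        return common / min(len(x), len(y))
    raise ValueError(f"Unknown similarity measure: {measure!r}")


def match_spans(tokens: List[str], concept: str, window: int = 5,
                threshold: float = 0.9,
                measure: str = "jaccard") -> List[Tuple[int, int, float]]:
    """Score every span of at most `window` tokens against `concept` and keep
    (start, end, score) for spans scoring at least `threshold`."""
    target = char_ngrams(concept)
    matches = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + window, len(tokens)) + 1):
            candidate = " ".join(tokens[start:end])
            score = similarity(char_ngrams(candidate), target, measure)
            if score >= threshold:
                matches.append((start, end, score))
    return matches


tokens = "patient presents with type 2 diabete".split()
print(match_spans(tokens, "type 2 diabetes", threshold=0.7))
```

Under this feature scheme, the misspelled span "type 2 diabete" shares 14 of 19 distinct trigrams with "type 2 diabetes" (Jaccard 14/19, about 0.74), so it is kept with threshold=0.7 but would be rejected with the default threshold=0.9, or with window=2 since it spans three tokens.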
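The three accepted forms of output_label can be sketched as a small resolution function. This is an assumption about the behaviour, not medkit's code: DEFAULT_SEMGROUP_LABELS is a hypothetical stand-in for medkit.text.ner.umls.SEMGROUP_LABELS (whose actual values may differ), and the sketch assumes a dict overrides only the semantic groups it lists.

```python
from typing import Dict, Union

# Hypothetical stand-in for medkit.text.ner.umls.SEMGROUP_LABELS.
DEFAULT_SEMGROUP_LABELS: Dict[str, str] = {
    "DISO": "disorder",
    "PROC": "procedure",
    "CHEM": "chemical",
}


def resolve_label(semgroup: str,
                  output_label: Union[str, Dict[str, str], None] = None) -> str:
    """Pick the entity label for a matched concept's UMLS semantic group."""
    if isinstance(output_label, str):
        # A single string is used as the label for every entity.
        return output_label
    labels = dict(DEFAULT_SEMGROUP_LABELS)
    if output_label is not None:
        # Assumed: a dict overrides only the semantic groups it mentions.
        labels.update(output_label)
    return labels[semgroup]


print(resolve_label("DISO"))
print(resolve_label("DISO", {"DISO": "problem", "PROC": "test"}))
print(resolve_label("CHEM", {"DISO": "problem"}))
print(resolve_label("PROC", "umls_concept"))
```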

Methods:

add_install(path, version, language[, ...])

Register the path and settings of a QuickUMLS installation performed with python -m quickumls.install

clear_installs()

Remove all QuickUMLS installations registered with add_install

run(segments)

Return entities (with UMLS normalization attributes) for each match in segments

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

classmethod add_install(path, version, language, lowercase=False, normalize_unicode=False)

Register the path and settings of a QuickUMLS installation performed with python -m quickumls.install

Parameters
  • path (Union[str, Path]) – The path to the destination folder passed to the install command

  • version (str) – The version of the UMLS database, for instance “2021AB”

  • language (str) – The language flag passed to the install command, for instance “ENG”

  • lowercase (bool) – Whether the --lowercase flag was passed to the install command (concepts are lowercased to increase recall)

  • normalize_unicode (bool) – Whether the --normalize-unicode flag was passed to the install command (non-ASCII chars in concepts are converted to the closest ASCII chars)
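The correspondence between add_install arguments and the install command can be made explicit by building the command line from the same settings. build_install_command below is a hypothetical helper, not part of medkit or QuickUMLS; it only uses the --language, --lowercase and --normalize-unicode flags and the two positional paths shown in this page. Note that version has no flag of its own: in the example above it only appears in the UMLS data path.

```python
from pathlib import Path
from typing import List, Union


def build_install_command(umls_data_dir: Union[str, Path],
                          dest_dir: Union[str, Path],
                          language: str = "ENG",
                          lowercase: bool = False,
                          normalize_unicode: bool = False) -> List[str]:
    """Return the argv of a `python -m quickumls.install` call matching these settings."""
    cmd = ["python", "-m", "quickumls.install", "--language", language]
    if lowercase:
        cmd.append("--lowercase")
    if normalize_unicode:
        cmd.append("--normalize-unicode")
    # Positional arguments: UMLS source data first, then the destination folder.
    cmd += [str(umls_data_dir), str(dest_dir)]
    return cmd


cmd = build_install_command("/path/to/umls/2021AB/data", "/path/to/quick/umls/install",
                            language="FRE", lowercase=True)
print(" ".join(cmd))
```

The destination folder and the language, lowercase and normalize_unicode values used here are exactly what should then be passed to add_install, together with the UMLS version.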

classmethod clear_installs()

Remove all QuickUMLS installations registered with add_install

run(segments)

Return entities (with UMLS normalization attributes) for each match in segments

Parameters

segments (List[Segment]) – List of segments in which to look for matches

Return type

List[Entity]

Returns

entities (List[Entity]) – Entities found in segments, with UMLSNormAttribute attributes.

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.