medkit.text.ner.quick_umls_matcher
This module requires extra dependencies that are not installed with the core dependencies of medkit. To install them, use pip install medkit-lib[quick-umls-matcher].
Classes:
QuickUMLSMatcher – Entity annotator relying on QuickUMLS.
- class QuickUMLSMatcher(version, language, lowercase=False, normalize_unicode=False, overlapping='length', threshold=0.9, window=5, similarity='jaccard', accepted_semtypes=quickumls.constants.ACCEPTED_SEMTYPES, attrs_to_copy=None, output_label=None, name=None, uid=None)
Entity annotator relying on QuickUMLS.
This annotator requires a QuickUMLS installation performed with python -m quickumls.install with flags corresponding to the params language, version, lowercase and normalize_unicode passed at init. QuickUMLS installations must be registered with the add_install class method.
For instance, to use QuickUMLSMatcher with a French lowercase QuickUMLS install based on UMLS version 2021AB, we must first create this installation with:
$ python -m quickumls.install --language FRE --lowercase /path/to/umls/2021AB/data /path/to/quick/umls/install
then register this install with:
>>> QuickUMLSMatcher.add_install(
...     "/path/to/quick/umls/install",
...     version="2021AB",
...     language="FRE",
...     lowercase=True,
... )
and finally instantiate the matcher with:
>>> matcher = QuickUMLSMatcher(
...     version="2021AB",
...     language="FRE",
...     lowercase=True,
... )
This mechanism makes it possible to store in the OperationDescription how the QuickUMLS install in use was created, and to reinstantiate the same matcher in a different environment if a similar install is available there.
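The registration mechanism described above can be pictured as a lookup table keyed by the install settings. The following is only a minimal sketch under that assumption: `InstallKey` and `InstallRegistry` are hypothetical names introduced for illustration and are not part of medkit, whose internal implementation may differ.

```python
from pathlib import Path
from typing import Dict, NamedTuple, Union


class InstallKey(NamedTuple):
    """Settings identifying one QuickUMLS install (hypothetical helper)."""

    version: str
    language: str
    lowercase: bool
    normalize_unicode: bool


class InstallRegistry:
    """Hypothetical stand-in for a class-level registry; not medkit's code."""

    def __init__(self) -> None:
        self._paths: Dict[InstallKey, Path] = {}

    def add_install(
        self,
        path: Union[str, Path],
        version: str,
        language: str,
        lowercase: bool = False,
        normalize_unicode: bool = False,
    ) -> None:
        # Store the install folder under the settings it was created with
        key = InstallKey(version, language, lowercase, normalize_unicode)
        self._paths[key] = Path(path)

    def get_path(
        self,
        version: str,
        language: str,
        lowercase: bool = False,
        normalize_unicode: bool = False,
    ) -> Path:
        # Only the settings are needed to find the install again
        key = InstallKey(version, language, lowercase, normalize_unicode)
        try:
            return self._paths[key]
        except KeyError:
            raise LookupError(f"No QuickUMLS install registered for {key}") from None


registry = InstallRegistry()
registry.add_install(
    "/path/to/quick/umls/install",
    version="2021AB",
    language="FRE",
    lowercase=True,
)
found = registry.get_path(version="2021AB", language="FRE", lowercase=True)
```

Because the matcher is described by its settings rather than by a filesystem path, the same description can be resolved on another machine as long as an install with matching settings has been registered there.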
Instantiate the QuickUMLS matcher
- Parameters
version (str) – UMLS version of the QuickUMLS install to use, for instance "2021AB". Used to select which QuickUMLS install to use.
language (str) – Language flag of the QuickUMLS install to use, for instance "ENG". Used to select which QuickUMLS install to use.
lowercase (bool) – Whether to use a QuickUMLS install with lowercased concepts. Used to select which QuickUMLS install to use.
normalize_unicode (bool) – Whether to use a QuickUMLS install in which non-ASCII characters in concepts are converted to the closest ASCII characters. Used to select which QuickUMLS install to use.
overlapping (Literal['length', 'score']) – Criterion for sorting multiple potential matches (cf. QuickUMLS doc).
threshold (float) – Minimum similarity (cf. QuickUMLS doc).
window (int) – Maximum number of tokens per match (cf. QuickUMLS doc).
similarity (Literal['dice', 'jaccard', 'cosine', 'overlap']) – Similarity measure to use (cf. QuickUMLS doc).
accepted_semtypes (List[str]) – UMLS semantic types that matched concepts should belong to (cf. QuickUMLS doc).
attrs_to_copy (Optional[List[str]]) – Labels of the attributes that should be copied from the source segment to the created entity. Useful for propagating context attributes (negation, antecedent, etc.).
output_label (Union[str, Dict[str, str], None]) – By default, medkit.text.ner.umls.SEMGROUP_LABELS will be used as entity labels. Use this parameter to override them. Example: {"DISO": "problem", "PROC": "test"}. If output_label is a string, all entities will use this string as label instead.
name (Optional[str]) – Name describing the matcher (defaults to the class name).
uid (str) – Identifier of the matcher.
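To give an intuition for how the similarity and threshold parameters interact, the sketch below computes the four named measures over sets of character trigrams. This is an assumption-laden illustration using the textbook set-based definitions: `char_ngrams` and `similarity` are hypothetical helpers, and the scores QuickUMLS actually computes may differ (for example in how n-grams are extracted or padded).

```python
import math
from typing import Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of `text` (assumed feature set)."""
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def similarity(a: str, b: str, measure: str = "jaccard") -> float:
    """Set-based similarity between two strings (textbook definitions)."""
    x, y = char_ngrams(a), char_ngrams(b)
    common = len(x & y)
    if measure == "jaccard":
        return common / len(x | y)
    if measure == "dice":
        return 2 * common / (len(x) + len(y))
    if measure == "cosine":
        return common / math.sqrt(len(x) * len(y))
    if measure == "overlap":
        return common / min(len(x), len(y))
    raise ValueError(f"Unknown similarity measure: {measure}")


# Compare a text span to a concept term and apply the default threshold
threshold = 0.9
for measure in ("jaccard", "dice", "cosine", "overlap"):
    score = similarity("diabetes", "diabete", measure)
    print(f"{measure}: {score:.3f} accepted={score >= threshold}")
```

Under these definitions, "diabetes" and "diabete" share 5 trigrams out of 6 distinct ones, so the Jaccard score is 5/6 and falls below a 0.9 threshold, while the Dice score is 10/11 and passes. The choice of measure can therefore change which candidates survive a given threshold.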
Methods:
add_install(path, version, language[, ...]) – Register the path and settings of a QuickUMLS installation performed with python -m quickumls.install
Remove all QuickUMLS installations registered with add_install
run(segments) – Return entities (with UMLS normalization attributes) for each match in segments
set_prov_tracer(prov_tracer) – Enable provenance tracing.
Attributes:
description – Contains all the operation init parameters.
- classmethod add_install(path, version, language, lowercase=False, normalize_unicode=False)
Register path and settings of a QuickUMLS installation performed with python -m quickumls.install
- Parameters
path (Union[str, Path]) – The path to the destination folder passed to the install command.
version (str) – The version of the UMLS database, for instance "2021AB".
language (str) – The language flag passed to the install command, for instance "ENG".
lowercase (bool) – Whether the --lowercase flag was passed to the install command (concepts are lowercased to increase recall).
normalize_unicode (bool) – Whether the --normalize-unicode flag was passed to the install command (non-ASCII characters in concepts are converted to the closest ASCII characters).
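Since the values passed to add_install must mirror the flags used at install time, it can help to derive the install command from the same settings. The helper below is a sketch, not part of medkit or QuickUMLS: `build_install_command` is a hypothetical name, and it only assembles the argument list using the flags mentioned in this page. The UMLS version has no flag here; it is assumed to be determined by the UMLS data folder passed to the command.

```python
import sys
from typing import List


def build_install_command(
    umls_data_dir: str,
    install_dir: str,
    language: str = "ENG",
    lowercase: bool = False,
    normalize_unicode: bool = False,
) -> List[str]:
    """Assemble the quickumls.install command line (hypothetical helper)."""
    cmd = [sys.executable, "-m", "quickumls.install", "--language", language]
    if lowercase:
        cmd.append("--lowercase")
    if normalize_unicode:
        cmd.append("--normalize-unicode")
    # Positional arguments: UMLS data folder first, destination folder second
    cmd += [umls_data_dir, install_dir]
    return cmd


# Same settings as the French lowercase example for UMLS 2021AB
settings = {"language": "FRE", "lowercase": True, "normalize_unicode": False}
cmd = build_install_command(
    "/path/to/umls/2021AB/data",
    "/path/to/quick/umls/install",
    **settings,
)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run, and the same `settings` dictionary (plus the version) reused for add_install, so that the registered settings cannot drift from the ones used to build the install.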
- run(segments)
Return entities (with UMLS normalization attributes) for each match in segments
- property description: medkit.core.operation_desc.OperationDescription
Contains all the operation init parameters.
- Return type
OperationDescription
- set_prov_tracer(prov_tracer)
Enable provenance tracing.
- Parameters
prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.