medkit.text.ner.hf_entity_matcher

This module requires extra dependencies that are not installed with the core medkit package. To install them, run pip install medkit-lib[hf-entity-matcher].

Classes:

HFEntityMatcher(model[, ...])

Entity matcher based on a HuggingFace transformers model.

class HFEntityMatcher(model, aggregation_strategy='max', attrs_to_copy=None, device=-1, batch_size=1, hf_auth_token=None, cache_dir=None, name=None, uid=None)

Entity matcher based on a HuggingFace transformers model.

Any token classification model from the HuggingFace hub can be used (for instance “samrawal/bert-base-uncased_clinical-ner”).

Parameters
  • model (Union[str, Path]) – Name (on the HuggingFace models hub) or path of the NER model. Must be a model compatible with the TokenClassification transformers class.

  • aggregation_strategy (Literal['none', 'simple', 'first', 'average', 'max']) – Strategy used to fuse tokens based on the model prediction, passed to TokenClassificationPipeline. Defaults to “max”. See https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TokenClassificationPipeline.aggregation_strategy for details.

  • attrs_to_copy (Optional[List[str]]) – Labels of the attributes that should be copied from the input segment to the created entity. Useful for propagating context attributes (negation, antecedent, etc.).

  • device (int) – Device to use for the transformer model. Follows the HuggingFace convention: -1 for “cpu”, or the device number for a GPU (for instance 0 for “cuda:0”).

  • batch_size (int) – Number of segments per batch passed to the transformer model.

  • hf_auth_token (Optional[str]) – HuggingFace authentication token, needed to access private models on the hub.

  • cache_dir (Union[str, Path, None]) – Directory in which to store downloaded models. If not set, the default HuggingFace cache dir is used.

  • name (Optional[str]) – Name describing the matcher (defaults to the class name).

  • uid (str) – Identifier of the matcher.
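As an illustration of the parameters above, the following is a minimal sketch of how a matcher might be configured. The helper names pick_device and build_matcher are hypothetical, and the attribute label "is_negated" is an assumed example of a context attribute, not something this page defines. The medkit import is deferred to call time because the module needs the hf-entity-matcher extra.

```python
from pathlib import Path
from typing import Optional, Union


def pick_device() -> int:
    """Return 0 (first GPU, "cuda:0") when CUDA is usable, else -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without torch there is no GPU backend to query; fall back to CPU.
        return -1
    return 0 if torch.cuda.is_available() else -1


def build_matcher(
    model: Union[str, Path] = "samrawal/bert-base-uncased_clinical-ner",
    cache_dir: Optional[Union[str, Path]] = None,
):
    """Create an HFEntityMatcher using the documented init parameters."""
    # Deferred import: requires `pip install medkit-lib[hf-entity-matcher]`.
    from medkit.text.ner.hf_entity_matcher import HFEntityMatcher

    return HFEntityMatcher(
        model=model,
        aggregation_strategy="max",      # documented default
        attrs_to_copy=["is_negated"],    # assumed attribute label, for illustration
        device=pick_device(),            # -1 for CPU, 0 for "cuda:0"
        batch_size=8,                    # segments per batch
        cache_dir=cache_dir,             # None uses the default HuggingFace cache
    )
```

Calling build_matcher() downloads the model from the hub on first use unless it is already present in the cache directory.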

Methods:

make_trainable(model_name_or_path, labels, ...)

Return the trainable component of the operation.

run(segments)

Return entities for each match in segments.

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(segments)

Return entities for each match in segments.

Parameters

segments (List[Segment]) – List of segments in which to look for matches.

Return type

List[Entity]

Returns

List[Entity] – Entities found in segments.
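A typical call to run might look like the sketch below. It assumes, beyond what this page states, that medkit's TextDocument exposes the full text as a raw_segment attribute and that each returned entity has label and text attributes; find_entities is a hypothetical helper name.

```python
from typing import List, Tuple


def find_entities(
    text: str,
    model: str = "samrawal/bert-base-uncased_clinical-ner",
) -> List[Tuple[str, str]]:
    """Run the matcher on one text and return (label, matched text) pairs."""
    # Deferred imports: require `pip install medkit-lib[hf-entity-matcher]`.
    from medkit.core.text import TextDocument
    from medkit.text.ner.hf_entity_matcher import HFEntityMatcher

    # Assumption: raw_segment is the segment covering the whole document text.
    doc = TextDocument(text=text)
    matcher = HFEntityMatcher(model=model)

    # run() takes a list of segments and returns a flat list of entities.
    entities = matcher.run([doc.raw_segment])
    return [(entity.label, entity.text) for entity in entities]
```

For example, find_entities("The patient has asthma and takes ventoline.") would return one pair per entity predicted by the model; the labels depend on the model used.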

static make_trainable(model_name_or_path, labels, tagging_scheme, tag_subtokens=False, tokenizer_max_length=None, hf_auth_token=None, device=-1)

Return the trainable component of the operation. This component can be trained using Trainer, and then used in a new HFEntityMatcher operation.
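The sketch below shows how the trainable component might be obtained using the signature above. This page does not list the accepted values for tagging_scheme, so "iob2" is an assumption, as are the helper name build_trainable and the example labels in the usage note. Training itself (with Trainer) is not shown.

```python
from typing import List


def build_trainable(
    labels: List[str],
    base_model: str = "samrawal/bert-base-uncased_clinical-ner",
):
    """Return the trainable component for fine-tuning on the given labels."""
    # Deferred import: requires `pip install medkit-lib[hf-entity-matcher]`.
    from medkit.text.ner.hf_entity_matcher import HFEntityMatcher

    return HFEntityMatcher.make_trainable(
        model_name_or_path=base_model,
        labels=labels,
        tagging_scheme="iob2",  # assumed value; check the accepted schemes
        tag_subtokens=False,    # documented default
        device=-1,              # CPU, per the HuggingFace convention
    )
```

A call such as build_trainable(["problem", "treatment", "test"]) would then give a component to pass to Trainer; once trained, the resulting model can be loaded in a new HFEntityMatcher.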

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.