medkit.text.ner.hf_entity_matcher
This module requires extra dependencies that are not installed with the core dependencies of medkit. To install them, run pip install medkit-lib[hf-entity-matcher].
Classes:
HFEntityMatcher(model, ...) – Entity matcher based on HuggingFace transformers model
- class HFEntityMatcher(model, aggregation_strategy='max', attrs_to_copy=None, device=-1, batch_size=1, hf_auth_token=None, cache_dir=None, name=None, uid=None)
Entity matcher based on a HuggingFace transformers model.
Any token classification model from the HuggingFace hub can be used (for instance “samrawal/bert-base-uncased_clinical-ner”).
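A minimal usage sketch, assuming the hf-entity-matcher extra is installed and the model can be downloaded from the hub. The helper name find_entities and the "sentence" segment label are illustrative choices and not part of medkit; the Segment and Span imports from medkit.core.text are assumed to match the installed medkit version.

```python
def find_entities(text, model="samrawal/bert-base-uncased_clinical-ner"):
    """Run an HFEntityMatcher over a single piece of text (illustrative helper)."""
    # Imported lazily so that this sketch can be defined even when the
    # optional dependencies are not installed.
    from medkit.core.text import Segment, Span
    from medkit.text.ner.hf_entity_matcher import HFEntityMatcher

    # Wrap the raw text in one segment covering the whole string.
    segment = Segment(label="sentence", text=text, spans=[Span(0, len(text))])

    # Only the model is given; the other parameters keep their defaults.
    matcher = HFEntityMatcher(model=model)

    # run() takes a list of segments and returns the entities found in them.
    entities = matcher.run([segment])
    return [(entity.label, entity.text) for entity in entities]
```

Calling find_entities on a clinical sentence should return (label, text) pairs; the labels that come back depend on the model chosen.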
- Parameters
model (Union[str, Path]) – Name (on the HuggingFace models hub) or path of the NER model. Must be a model compatible with the TokenClassification transformers class.
aggregation_strategy (Literal['none', 'simple', 'first', 'average', 'max']) – Strategy used to fuse tokens based on the model prediction, passed to TokenClassificationPipeline. Defaults to “max”; see https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TokenClassificationPipeline.aggregation_strategy for details.
attrs_to_copy (Optional[List[str]]) – Labels of the attributes that should be copied from the input segment to the created entity. Useful for propagating context attributes (negation, antecedent, etc.).
device (int) – Device to use for the transformer model. Follows the HuggingFace convention (-1 for “cpu” and device number for gpu, for instance 0 for “cuda:0”).
batch_size (int) – Number of segments in each batch processed by the transformer model.
hf_auth_token (Optional[str]) – HuggingFace authentication token (to access private models on the hub).
cache_dir (Union[str, Path, None]) – Directory where downloaded models are stored. If not set, the default HuggingFace cache dir is used.
name (Optional[str]) – Name describing the matcher (defaults to the class name).
uid (str) – Identifier of the matcher.
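The device and batch_size conventions described above can be illustrated with a small pure-Python sketch. The helpers device_name and iter_batches are hypothetical and exist only for illustration; they are not part of medkit or transformers.

```python
def device_name(device: int) -> str:
    """Translate the integer device convention into a readable device name."""
    # -1 is the documented CPU value; treating every negative number the
    # same way is an assumption of this sketch.
    if device < 0:
        return "cpu"
    return f"cuda:{device}"


def iter_batches(items, batch_size):
    """Yield successive groups of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


print(device_name(-1))  # cpu
print(device_name(0))   # cuda:0
print(list(iter_batches(["s1", "s2", "s3"], batch_size=2)))  # [['s1', 's2'], ['s3']]
```

With the default batch_size of 1, each segment forms its own batch.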
Methods:
make_trainable(model_name_or_path, labels, ...) – Return the trainable component of the operation.
run(segments) – Return entities for each match in segments.
set_prov_tracer(prov_tracer) – Enable provenance tracing.
Attributes:
description – Contains all the operation init parameters.
- static make_trainable(model_name_or_path, labels, tagging_scheme, tag_subtokens=False, tokenizer_max_length=None, hf_auth_token=None, device=-1)
Return the trainable component of the operation. This component can be trained using Trainer, and then used in a new HFEntityMatcher operation.
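A hedged sketch of obtaining the trainable component. The helper name build_trainable and the "iob2" value for tagging_scheme are assumptions made for illustration; check which tagging schemes the installed medkit version accepts. Training with Trainer is not shown.

```python
def build_trainable(model_name_or_path, labels, tagging_scheme="iob2"):
    """Return the trainable component for fine-tuning (illustrative helper)."""
    # Imported lazily so that this sketch can be defined even when the
    # optional dependencies are not installed.
    from medkit.text.ner.hf_entity_matcher import HFEntityMatcher

    # make_trainable is a static method, so no matcher instance is needed.
    return HFEntityMatcher.make_trainable(
        model_name_or_path=model_name_or_path,
        labels=labels,
        tagging_scheme=tagging_scheme,  # assumed value, see the note above
        tag_subtokens=False,  # default taken from the signature
        device=-1,  # CPU
    )
```

The labels argument lists the entity labels the fine-tuned model should predict, for example a hypothetical ["problem", "treatment"].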
- property description: medkit.core.operation_desc.OperationDescription
Contains all the operation init parameters.
- Return type
OperationDescription
- set_prov_tracer(prov_tracer)
Enable provenance tracing.
- Parameters
prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.
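A short sketch of enabling provenance tracing before running a matcher. The helper name run_with_provenance is hypothetical, and the ProvTracer import path and its get_prov method are assumed to match the installed medkit version.

```python
def run_with_provenance(matcher, segments):
    """Run a matcher with provenance tracing enabled (illustrative helper)."""
    # Imported lazily so that this sketch can be defined even when medkit
    # is not installed.
    from medkit.core import ProvTracer

    # Attach the tracer before running so that created entities are traced.
    prov_tracer = ProvTracer()
    matcher.set_prov_tracer(prov_tracer)

    entities = matcher.run(segments)

    # Pair each entity with the provenance record kept by the tracer.
    # get_prov() is assumed to accept the uid of a traced data item.
    return [(entity, prov_tracer.get_prov(entity.uid)) for entity in entities]
```

The matcher argument is expected to be an already constructed HFEntityMatcher, and segments a list of medkit segments.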