medkit.text.spacy

APIs

To access these APIs, import them as follows:

from medkit.text.spacy import <api_to_import>

This package needs extra dependencies that are not installed with the core dependencies of medkit. To install them, run pip install medkit-lib[spacy].

Classes:

SpacyDocPipeline(nlp[, medkit_labels_anns, ...])

DocPipeline to obtain annotations created using spacy

SpacyPipeline(nlp[, spacy_entities, ...])

Segment annotator relying on a Spacy pipeline

class SpacyDocPipeline(nlp, medkit_labels_anns=None, medkit_attrs=None, spacy_entities=None, spacy_span_groups=None, spacy_attrs=None, medkit_attribute_factories=None, name=None, uid=None)

DocPipeline to obtain annotations created using spacy

Initialize the pipeline

Parameters
  • nlp (Language) – Language object with the loaded pipeline from Spacy

  • medkit_labels_anns (Optional[List[str]]) – Labels of medkit annotations to include in the spacy document. If None (default), all annotations are included.

  • medkit_attrs (Optional[List[str]]) – Labels of medkit attributes to add to the included annotations. If None (default), all attributes are added as custom attributes on each included annotation.

  • spacy_entities (Optional[List[str]]) – Labels of new spacy entities (doc.ents) to convert into medkit entities. If None (default), all new spacy entities are converted and added to the medkit document they originate from.

  • spacy_span_groups (Optional[List[str]]) – Names of new spacy span groups (doc.spans) to convert into medkit segments. If None (default), all new spacy span groups are converted and added to the medkit document they originate from.

  • spacy_attrs (Optional[List[str]]) – Names of span extensions to convert into medkit attributes. If None (default), all non-None extensions are added for each annotation with a medkit ID.

  • medkit_attribute_factories (Optional[Dict[str, Callable[[Span, str], Attribute]]]) – Mapping of factories in charge of converting spacy attributes to medkit attributes. Each factory receives a spacy span and an attribute label when called. The key in the mapping is the attribute label.

  • name (Optional[str]) – Name describing the pipeline (defaults to the class name).

  • uid (str) – Identifier of the pipeline
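The medkit_attribute_factories mapping described above can be pictured with the following minimal sketch. It assumes a hypothetical custom span extension named "is_negated" and assumes that medkit.core exposes an Attribute class accepting label and value keyword arguments; the helper names normalize_negation and negation_factory are illustrative only and are not part of medkit. medkit is imported lazily so the module can be loaded without the extras installed.

```python
from typing import Any, Callable, Dict


def normalize_negation(raw_value: Any) -> bool:
    """Turn the raw value of the (hypothetical) spacy extension into a bool."""
    if isinstance(raw_value, str):
        return raw_value.strip().lower() in {"true", "yes", "neg", "negated"}
    return bool(raw_value)


def negation_factory(spacy_span: Any, attr_label: str) -> Any:
    """Factory taking a spacy span and an attribute label, as described above."""
    # Lazy import; assumes medkit.core provides Attribute(label=..., value=...).
    from medkit.core import Attribute

    # Read the custom extension stored on the spacy span.
    raw_value = spacy_span._.get(attr_label)
    return Attribute(label=attr_label, value=normalize_negation(raw_value))


# The key is the attribute label that the factory is responsible for.
attribute_factories: Dict[str, Callable[[Any, str], Any]] = {
    "is_negated": negation_factory,
}
```

Under these assumptions, the mapping would be passed as medkit_attribute_factories=attribute_factories when initializing the pipeline.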

Methods:

run(medkit_docs)

Run a spacy pipeline on a list of medkit documents.

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(medkit_docs)

Run a spacy pipeline on a list of medkit documents. Each medkit document is converted to a spacy document (Doc object) with the selected annotations and attributes. The spacy pipeline is then executed, and the new annotations and attributes are converted back into medkit annotations.

Parameters

medkit_docs (List[TextDocument]) – List of TextDocuments on which to run the pipeline

Return type

None
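The round trip performed by run can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the rule-based "entity_ruler" component, the DISEASE label, and the helper names build_nlp and annotate_documents are hypothetical choices, and the sketch assumes that TextDocument can be imported from medkit.core.text and built from a text keyword argument. Third-party imports are kept inside the functions so the module loads even when the extras are missing.

```python
from typing import Any, Dict, List

# Hypothetical rule: tag the token "asthma" as a DISEASE entity.
ENTITY_PATTERNS: List[Dict[str, str]] = [{"label": "DISEASE", "pattern": "asthma"}]
ENTITY_LABELS: List[str] = sorted({p["label"] for p in ENTITY_PATTERNS})


def build_nlp() -> Any:
    """Build a small spacy pipeline containing only a rule-based entity matcher."""
    import spacy

    nlp = spacy.blank("en")
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(ENTITY_PATTERNS)
    return nlp


def annotate_documents(texts: List[str]) -> List[Any]:
    """Run the spacy pipeline on medkit documents built from raw texts."""
    # Assumption: TextDocument lives in medkit.core.text and accepts text=...
    from medkit.core.text import TextDocument
    from medkit.text.spacy import SpacyDocPipeline

    docs = [TextDocument(text=text) for text in texts]
    pipeline = SpacyDocPipeline(nlp=build_nlp(), spacy_entities=ENTITY_LABELS)
    # run returns None: the converted annotations are added to each document.
    pipeline.run(docs)
    return docs
```

Because run returns None, the sketch hands back the same document objects, which now carry the converted entities among their annotations.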

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class SpacyPipeline(nlp, spacy_entities=None, spacy_span_groups=None, spacy_attrs=None, medkit_attribute_factories=None, name=None, uid=None)

Segment annotator relying on a Spacy pipeline

Initialize the segment annotator

Parameters
  • nlp (Language) – Language object with the loaded pipeline from Spacy

  • spacy_entities (Optional[List[str]]) – Labels of new spacy entities (doc.ents) to convert into medkit entities. If None (default), all new spacy entities are converted.

  • spacy_span_groups (Optional[List[str]]) – Names of new spacy span groups (doc.spans) to convert into medkit segments. If None (default), all new spacy span groups are converted.

  • spacy_attrs (Optional[List[str]]) – Names of span extensions to convert into medkit attributes. If None (default), all non-None extensions are added for each annotation with a medkit ID.

  • medkit_attribute_factories (Optional[Dict[str, Callable[[Span, str], Attribute]]]) – Mapping of factories in charge of converting spacy attributes to medkit attributes. Each factory receives a spacy span and an attribute label when called. The key in the mapping is the attribute label.

  • name (Optional[str]) – Name describing the pipeline (defaults to the class name).

  • uid (str) – Identifier of the pipeline

Methods:

run(segments)

Run a spacy pipeline on a list of input segments and return a new list of segments.

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(segments)

Run a spacy pipeline on a list of input segments and return a new list of segments. Each segment is converted to a spacy document (Doc object). The spacy pipeline is then executed, and the new annotations and attributes are converted into medkit annotations.

Parameters

segments (List[Segment]) – List of segments on which to run the spacy pipeline

Return type

List[Segment]

Returns

List[Segment] – List of new annotations
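A segment-level call might look like the sketch below. It assumes that Segment and Span can be imported from medkit.core.text, that Segment accepts label, text and spans keyword arguments, and that Span takes start and end offsets; the "sentence" label and the helper names full_text_bounds and annotate_segment are hypothetical. The nlp argument is any loaded spacy Language object.

```python
from typing import Any, List, Tuple


def full_text_bounds(text: str) -> Tuple[int, int]:
    """Return character offsets covering the whole text."""
    return 0, len(text)


def annotate_segment(text: str, nlp: Any) -> List[Any]:
    """Wrap a text in a single medkit segment and run the spacy pipeline on it."""
    # Assumption: these classes live in medkit.core.text with these keywords.
    from medkit.core.text import Segment, Span
    from medkit.text.spacy import SpacyPipeline

    start, end = full_text_bounds(text)
    segment = Segment(label="sentence", text=text, spans=[Span(start, end)])

    # With the default arguments, all new entities and span groups are converted.
    pipeline = SpacyPipeline(nlp=nlp)
    return pipeline.run([segment])
```

Unlike SpacyDocPipeline.run, this method returns the new annotations as a list, so the caller decides where to attach them.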

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

Subpackages / Submodules

medkit.text.spacy.displacy_utils

medkit.text.spacy.doc_pipeline

medkit.text.spacy.edsnlp

This package needs extra dependencies that are not installed with the core dependencies of medkit.

medkit.text.spacy.pipeline

medkit.text.spacy.spacy_utils