medkit.io.spacy

This module requires extra dependencies that are not installed as core dependencies of medkit. To install them, use pip install medkit-lib[spacy].

Classes:

SpacyInputConverter([entities, span_groups, ...])

Class in charge of converting spaCy documents into a list of TextDocuments.

SpacyOutputConverter(nlp[, apply_nlp_spacy, ...])

Class in charge of converting a list of TextDocuments into a list of spaCy documents.

class SpacyInputConverter(entities=None, span_groups=None, attrs=None, uid=None)

Class in charge of converting spaCy documents into a list of TextDocuments.

Initialize the spaCy input converter.

Parameters
  • entities (Optional[List[str]]) – Labels of spaCy entities (doc.ents) to convert into medkit entities. If None (default), all spaCy entities are converted and added to the medkit document created from their spaCy document.

  • span_groups (Optional[List[str]]) – Names of the groups of spaCy spans (doc.spans) to convert into medkit segments. If None (default), all span groups are converted and added to the medkit document.

  • attrs (Optional[List[str]]) – Names of span extensions to convert into medkit attributes. If None (default), all extensions with a non-None value are added to each annotation.

  • uid (Optional[str]) – Identifier of the converter
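The three selection parameters follow the same rule: None selects everything, while a list restricts the conversion to the names it contains. The snippet below is a simplified, hypothetical model of that rule in plain Python. FakeSpan, keep and select_attrs are illustrative names, not part of medkit or spaCy, and the behaviour for an explicit attrs list (keep exactly the named extensions) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a spaCy span: a label plus custom extension values."""
    label: str
    extensions: Dict[str, Any] = field(default_factory=dict)


def keep(name: str, selected: Optional[List[str]]) -> bool:
    """None selects every name; a list selects only the names it contains."""
    return selected is None or name in selected


def select_attrs(span: FakeSpan, attrs: Optional[List[str]]) -> Dict[str, Any]:
    """Extensions that would become medkit attributes for this span."""
    if attrs is None:
        # Default: every extension whose value is not None
        return {k: v for k, v in span.extensions.items() if v is not None}
    # Assumption: with an explicit list, only the named extensions are kept
    return {k: v for k, v in span.extensions.items() if k in attrs}


ents = [
    FakeSpan("PER", {"is_negated": False, "score": None}),
    FakeSpan("LOC", {"is_negated": True, "score": 0.9}),
]
print([e.label for e in ents if keep(e.label, None)])     # ['PER', 'LOC']
print([e.label for e in ents if keep(e.label, ["LOC"])])  # ['LOC']
print(select_attrs(ents[0], None))                        # {'is_negated': False}
print(select_attrs(ents[1], ["score"]))                   # {'score': 0.9}
```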

Methods:

load(spacy_docs)

Create a list of TextDocuments from a list of spaCy Doc objects.

load(spacy_docs)

Create a list of TextDocuments from a list of spaCy Doc objects. Depending on the converter's configuration, only the selected annotations and attributes are included in the documents.

Parameters

spacy_docs (List[Doc]) – A list of spaCy documents to convert

Return type

List[TextDocument]

Returns

List[TextDocument] – A list of TextDocuments
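As a rough illustration of what load puts in each document, the sketch below reduces a spaCy document to its text, its entities and its span groups (each span being a hypothetical (label, start_char, end_char) tuple) and builds plain dictionaries in place of medkit objects. load_sketch is not the medkit implementation; it assumes one output document per input Doc, in order, and labelling each segment with its span group name is also an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical reduced form of a spaCy span: (label, start_char, end_char)
SpanTuple = Tuple[str, int, int]
# Hypothetical reduced form of a spaCy Doc: (text, doc.ents, doc.spans)
DocTuple = Tuple[str, List[SpanTuple], Dict[str, List[SpanTuple]]]


def load_sketch(
    docs: List[DocTuple],
    entities: Optional[List[str]] = None,
    span_groups: Optional[List[str]] = None,
) -> List[dict]:
    """Build one medkit-like document (a plain dict) per input document."""
    out = []
    for text, ents, groups in docs:
        doc = {"text": text, "entities": [], "segments": []}
        for label, start, end in ents:
            if entities is None or label in entities:
                # The annotation text is the slice of the document it covers
                doc["entities"].append({"label": label, "text": text[start:end]})
        for name, spans in groups.items():
            if span_groups is None or name in span_groups:
                for _label, start, end in spans:
                    # Assumption: the segment is labelled with its group name
                    doc["segments"].append({"label": name, "text": text[start:end]})
        out.append(doc)
    return out


text = "Paris is in France"
docs = [(text, [("LOC", 0, 5), ("LOC", 12, 18)], {"sentences": [("", 0, 18)]})]
result = load_sketch(docs)
print([e["text"] for e in result[0]["entities"]])  # ['Paris', 'France']
print(result[0]["segments"])  # [{'label': 'sentences', 'text': 'Paris is in France'}]
```

Passing span_groups=[] in this model yields documents with entities but no segments, mirroring how a restricted selection narrows what load transfers.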

class SpacyOutputConverter(nlp, apply_nlp_spacy=False, labels_anns=None, attrs=None, uid=None)

Class in charge of converting a list of TextDocuments into a list of spaCy documents.

Initialize the spaCy output converter.

Parameters
  • nlp (Language) – Language object holding the loaded spaCy pipeline.

  • apply_nlp_spacy (bool) – If True, each component of the nlp pipeline is applied to the new spaCy document. Some features, such as part-of-speech tags, are only added by a pipeline component, so this parameter must be True to include them. If False, the nlp pipeline is not applied to the spaCy document, which then contains only the annotations and attributes transferred from medkit.

  • labels_anns (Optional[List[str]]) – Labels of the medkit annotations to include in the spaCy document. If None (default), all annotations are included.

  • attrs (Optional[List[str]]) – Labels of the medkit attributes to add to the included annotations. If None (default), all attributes are added as custom attributes to each included annotation.

  • uid (Optional[str]) – Identifier of the converter
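The effect of apply_nlp_spacy can be pictured with a toy pipeline. In the hypothetical sketch below, a document is a plain dict, each pipeline component is a function that enriches it (toy_tagger stands in for a component such as a part-of-speech tagger), and the components run only when the flag is True. None of these names belong to medkit or spaCy; with the real converter, the components would be those of the nlp object passed in.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]            # hypothetical stand-in for a spaCy Doc
Component = Callable[[Doc], Doc]   # hypothetical stand-in for a pipeline component


def toy_tagger(doc: Doc) -> Doc:
    # Pretend tagger: marks every whitespace-separated token as a NOUN
    doc["pos"] = ["NOUN" for _ in str(doc["text"]).split()]
    return doc


def build_doc(text: str, ents: list, pipeline: List[Component], apply_nlp_spacy: bool) -> Doc:
    """Create a doc holding medkit annotations, optionally running the pipeline on it."""
    doc: Doc = {"text": text, "ents": list(ents)}
    if apply_nlp_spacy:
        for component in pipeline:
            doc = component(doc)
    return doc


without = build_doc("chest pain", [("problem", 0, 10)], [toy_tagger], apply_nlp_spacy=False)
with_nlp = build_doc("chest pain", [("problem", 0, 10)], [toy_tagger], apply_nlp_spacy=True)
print("pos" in without)   # False: only the medkit annotations were transferred
print(with_nlp["pos"])    # ['NOUN', 'NOUN']
```

In both cases the transferred annotations are present; only the pipeline-derived features depend on the flag.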

Methods:

convert(medkit_docs)

Convert a list of TextDocuments into a list of spaCy Doc objects.

convert(medkit_docs)

Convert a list of TextDocuments into a list of spaCy Doc objects. Depending on the converter's configuration, only the selected annotations and attributes are included in the documents.

Parameters

medkit_docs (List[TextDocument]) – A list of TextDocuments to convert

Return type

List[Doc]

Returns

List[Doc] – A list of spaCy Doc objects
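To show how labels_anns and attrs narrow what convert transfers, the sketch below models the conversion in plain Python. Annotation, convert_sketch and the output dict layout are hypothetical, one output document per input document is assumed, and storing attributes in a per-span "_" dict only loosely mirrors spaCy custom extensions (the span._ namespace).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Annotation:
    """Hypothetical stand-in for a medkit text annotation."""
    label: str
    start: int
    end: int
    attrs: Dict[str, Any] = field(default_factory=dict)


def convert_sketch(
    medkit_docs: List[Tuple[str, List[Annotation]]],
    labels_anns: Optional[List[str]] = None,
    attrs: Optional[List[str]] = None,
) -> List[dict]:
    """Return one spaCy-like doc (a plain dict) per input document, in order."""
    out = []
    for text, anns in medkit_docs:
        spans = []
        for ann in anns:
            if labels_anns is not None and ann.label not in labels_anns:
                continue  # annotation label not selected
            # None selects every attribute; a list keeps only the named ones
            custom = {k: v for k, v in ann.attrs.items() if attrs is None or k in attrs}
            spans.append({"label": ann.label, "text": text[ann.start:ann.end], "_": custom})
        out.append({"text": text, "spans": spans})
    return out


doc = ("no fever today", [
    Annotation("symptom", 3, 8, {"negation": True, "source": "rule"}),
    Annotation("date", 9, 14),
])
result = convert_sketch([doc], labels_anns=["symptom"], attrs=["negation"])
print(result[0]["spans"])  # [{'label': 'symptom', 'text': 'fever', '_': {'negation': True}}]
```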