medkit.text.spacy.spacy_utils
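Functions:
- build_spacy_doc_from_medkit_doc – Create a Spacy Doc from a TextDocument.
- build_spacy_doc_from_medkit_segment – Create a Spacy Doc from a Segment.
- extract_anns_and_attrs_from_spacy_doc – Given a spacy document, convert selected entities or spans into Segments.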
- extract_anns_and_attrs_from_spacy_doc(spacy_doc, medkit_source_ann=None, entities=None, span_groups=None, attrs=None, attribute_factories=None, rebuild_medkit_anns_and_attrs=False)
Given a spacy document, convert selected entities or spans into Segments. Extract attributes for each annotation in the document.
- Parameters
spacy_doc (Doc) – A Spacy Doc with spans to be converted.
medkit_source_ann (Optional[Segment]) – Segment used to rebuild spans referencing the original text.
entities (Optional[List[str]]) – Labels of entities to be extracted. If None (default), all new entities will be extracted as annotations.
span_groups (Optional[List[str]]) – Names of span groups to be extracted. If None (default), all new spans will be extracted as annotations.
attrs (Optional[List[str]]) – Names of custom attributes to extract from the annotations that will be included. If None (default), all the custom attributes will be extracted.
attribute_factories (Optional[Dict[str, Callable[[Span, str], Attribute]]]) – Mapping of factories in charge of converting spacy attributes to medkit attributes. Factories will receive a spacy span and an attribute label when called. The key in the mapping is the attribute label.
rebuild_medkit_anns_and_attrs (bool) – If True, the annotations and attributes with medkit ids will become new annotations/attributes with new ids. If False (default), the annotations and attributes with medkit ids are not rebuilt; only new annotations and attributes are returned.
- Return type
Tuple[List[Segment], Dict[str, List[Attribute]]]
- Returns
annotations (List[Segment]) – Segments extracted from the spacy Doc object.
attributes_by_ann (Dict[str, List[Attribute]]) – Attributes extracted for each annotation; the key is a medkit uid.
- Raises
ValueError – Raised when the given medkit source and the spacy doc do not have the same medkit uid.
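The shape expected for attribute_factories can be illustrated without medkit or spaCy installed. The sketch below is only an illustration under stated assumptions: FakeSpan and FakeAttribute are hypothetical stand-ins for a spaCy Span and a medkit Attribute, and negation_factory is an invented example factory. With real objects, a factory would typically read the custom value through spaCy's `span._.get(label)` and return a medkit attribute instead.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a spaCy Span carrying custom attributes."""
    text: str
    custom: Dict[str, Any]


@dataclass
class FakeAttribute:
    """Hypothetical stand-in for a medkit Attribute."""
    label: str
    value: Any


def negation_factory(span: FakeSpan, label: str) -> FakeAttribute:
    # A factory receives the span and the attribute label, and returns
    # the converted attribute (here the raw value is coerced to a bool).
    return FakeAttribute(label=label, value=bool(span.custom.get(label)))


# The key in the mapping is the attribute label handled by the factory.
attribute_factories: Dict[str, Callable[[FakeSpan, str], FakeAttribute]] = {
    "is_negated": negation_factory,
}

span = FakeSpan(text="fever", custom={"is_negated": 1})
converted = [factory(span, label) for label, factory in attribute_factories.items()]
```

Keeping one factory per label lets each attribute be converted differently, while labels without a factory are left to the default conversion.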
- build_spacy_doc_from_medkit_doc(nlp, medkit_doc, labels_anns=None, attrs=None, include_medkit_info=True)
Create a Spacy Doc from a TextDocument.
- Parameters
nlp (Language) – Language object with the loaded pipeline from Spacy.
medkit_doc (TextDocument) – TextDocument to convert.
labels_anns (Optional[List[str]]) – Labels of annotations to include in the spacy document. If None (default), all the annotations will be included.
attrs (Optional[List[str]]) – Labels of attributes to add in the annotations that will be included. If None (default), all the attributes will be added as custom attributes in each annotation included.
include_medkit_info (bool) – If True, medkitID is included as an extension in the Doc object to identify the medkit source annotation. If False, no information about IDs is included.
- Return type
Doc
- Returns
Doc – A Spacy Doc with the selected annotations included.
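A call to this function might look like the following minimal sketch. The helper name medkit_doc_to_spacy and the choice of a blank English pipeline are assumptions made for illustration; only the keyword arguments come from the signature documented above.

```python
from typing import Any, List, Optional


def medkit_doc_to_spacy(medkit_doc: Any, labels: Optional[List[str]] = None) -> Any:
    """Convert a medkit TextDocument into a spaCy Doc (hypothetical helper)."""
    # Imported locally so that merely defining this helper does not
    # require medkit or spaCy to be installed.
    import spacy
    from medkit.text.spacy.spacy_utils import build_spacy_doc_from_medkit_doc

    # Assumption: a blank English pipeline is enough to tokenize the text.
    nlp = spacy.blank("en")
    return build_spacy_doc_from_medkit_doc(
        nlp=nlp,
        medkit_doc=medkit_doc,
        labels_anns=labels,        # None keeps every annotation
        attrs=None,                # None copies every attribute
        include_medkit_info=True,  # keep medkit ids on the Doc
    )
```

Passing a list of labels restricts the resulting Doc to those annotations, which can keep later spaCy components from seeing annotations they do not need.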
- build_spacy_doc_from_medkit_segment(nlp, segment, annotations=[], attrs=None, include_medkit_info=True)
Create a Spacy Doc from a Segment.
- Parameters
nlp (Language) – Language object with the loaded pipeline from Spacy.
segment (Segment) – Segment to convert; this annotation contains the text used to create the spacy doc.
annotations (List[Segment]) – List of annotations in segment to include.
attrs (Optional[List[str]]) – Labels of attributes to add in the annotations that will be included. If None (default), all the attributes will be added as custom attributes in each annotation included.
include_medkit_info (bool) – If True, medkitID is included as an extension in the Doc object to identify the medkit source annotation. If False, no information about IDs is included.
- Return type
Doc
- Returns
Doc – A Spacy Doc with the selected annotations included.
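The three functions on this page can be combined into a round trip: build a Doc from a Segment, run spaCy components on it, then extract the new annotations. The sketch below is one possible arrangement, not the library's prescribed workflow. The helper name roundtrip_segment is hypothetical, and it assumes the pipeline components in nlp can be applied directly to an existing Doc.

```python
from typing import Any, Dict, List, Tuple


def roundtrip_segment(
    nlp: Any, segment: Any, entity_labels: List[str]
) -> Tuple[List[Any], Dict[str, List[Any]]]:
    """Segment -> spaCy Doc -> new medkit annotations (hypothetical helper)."""
    # Imported locally so that merely defining this helper does not
    # require medkit to be installed.
    from medkit.text.spacy.spacy_utils import (
        build_spacy_doc_from_medkit_segment,
        extract_anns_and_attrs_from_spacy_doc,
    )

    # Keep the medkit id on the Doc so it can be matched against
    # medkit_source_ann during extraction.
    spacy_doc = build_spacy_doc_from_medkit_segment(
        nlp=nlp, segment=segment, annotations=[], include_medkit_info=True
    )

    # Apply each pipeline component to the already-built Doc.
    for _name, component in nlp.pipeline:
        spacy_doc = component(spacy_doc)

    # Only entities that are new to medkit are returned, because
    # rebuild_medkit_anns_and_attrs is left at False.
    annotations, attributes_by_ann = extract_anns_and_attrs_from_spacy_doc(
        spacy_doc=spacy_doc,
        medkit_source_ann=segment,
        entities=entity_labels,
        rebuild_medkit_anns_and_attrs=False,
    )
    return annotations, attributes_by_ann
```

Passing the same segment to both calls matters: extraction raises ValueError when the medkit source and the Doc do not carry the same medkit uid.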