medkit.io.brat
Classes:
BratInputConverter – Class in charge of converting brat annotations
BratOutputConverter – Class in charge of converting a list of TextDocuments into a brat collection file.
- class BratInputConverter(detect_cuis_in_notes=True, notes_label='brat_note', uid=None)
Class in charge of converting brat annotations
- Parameters
notes_label (str) – Label to use for attributes created from annotator notes.
detect_cuis_in_notes (bool) – If True, strings looking like CUIs in annotator notes of entities will be converted to UMLS normalization attributes rather than creating an Attribute with the whole note text as value.
uid (Optional[str]) – Identifier of the converter.
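The effect of detect_cuis_in_notes can be pictured with a small standalone sketch. It is not medkit's implementation: the regular expression (the letter C followed by seven digits), the helper name note_to_attributes and the "umls" label are all assumptions made for illustration, and the result is a list of plain (label, value) tuples rather than medkit attribute objects.

```python
import re

# Assumption: a CUI is the letter "C" followed by exactly seven digits.
CUI_PATTERN = re.compile(r"\bC\d{7}\b")


def note_to_attributes(note, notes_label="brat_note", detect_cuis=True):
    """Turn an annotator note into (label, value) pairs.

    Hypothetical helper: when CUI detection is on and the note contains
    CUI-like strings, one ("umls", cui) pair is produced per CUI;
    otherwise the whole note is kept under notes_label.
    """
    if detect_cuis:
        cuis = CUI_PATTERN.findall(note)
        if cuis:
            return [("umls", cui) for cui in cuis]
    return [(notes_label, note)]


print(note_to_attributes("C0011849 C0004096"))
print(note_to_attributes("to be reviewed"))
```

With detection disabled, the same note is kept verbatim under the notes label, which mirrors the fallback described for the parameter above.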
Methods:
load(dir_path[, ann_ext, text_ext]) – Create a list of TextDocuments from a folder containing text files and associated brat annotations files.
load_annotations(ann_file) – Load a .ann file and return a list of Annotation objects.
load_doc(ann_path, text_path) – Create a TextDocument from a .ann file and its associated .txt file
- load(dir_path, ann_ext='.ann', text_ext='.txt')
Create a list of TextDocuments from a folder containing text files and associated brat annotations files.
- Parameters
dir_path (Union[str, Path]) – The path to the directory containing the text files and the annotation files (.ann)
ann_ext (str) – The extension of the brat annotation file (e.g. .ann)
text_ext (str) – The extension of the text file (e.g. .txt)
- Return type
List[TextDocument]
- Returns
List[TextDocument] – The list of TextDocuments
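To make the role of ann_ext and text_ext concrete, here is a minimal standard-library sketch of how text files and annotation files in one folder could be matched by name. The function pair_brat_files is hypothetical and is not part of medkit; in particular, silently skipping text files that have no annotation file is an assumption, not documented behaviour of load.

```python
from pathlib import Path


def pair_brat_files(dir_path, ann_ext=".ann", text_ext=".txt"):
    """Return (text_path, ann_path) pairs that share the same file stem.

    Hypothetical helper: text files without a matching annotation
    file are skipped (an assumption made for this sketch).
    """
    dir_path = Path(dir_path)
    pairs = []
    for text_path in sorted(dir_path.glob("*" + text_ext)):
        ann_path = text_path.with_suffix(ann_ext)
        if ann_path.exists():
            pairs.append((text_path, ann_path))
    return pairs
```

Each resulting pair corresponds to the two paths that load_doc expects, in the order (ann_path, text_path) once swapped.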
- load_doc(ann_path, text_path)
Create a TextDocument from a .ann file and its associated .txt file
- Parameters
text_path (Union[str, Path]) – The path to the text document file.
ann_path (Union[str, Path]) – The path to the brat annotation file.
- Return type
TextDocument
- Returns
TextDocument – The document containing the text and the annotations
- load_annotations(ann_file)
Load a .ann file and return a list of Annotation objects.
- Parameters
ann_file (Union[str, Path]) – Path to the .ann file.
- Return type
List[TextAnnotation]
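For readers unfamiliar with the .ann format, the following sketch parses the text-bound ("T") lines of a brat standoff file into simple tuples. It is an illustration only, not medkit's load_annotations: the Entity type and parse_entities function are hypothetical, relation, attribute and note lines are ignored, and the result is not a list of medkit annotation objects.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    """Hypothetical container for one text-bound brat annotation."""
    uid: str
    label: str
    spans: List[Tuple[int, int]]
    text: str


def parse_entities(ann_content: str) -> List[Entity]:
    """Parse the "T" lines of a brat .ann file.

    Each such line has three tab-separated fields:
    the id, "<type> <start> <end>[;<start> <end>...]", and the covered text.
    """
    entities = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue  # relations (R), attributes (A) and notes (#) are skipped here
        uid, type_and_spans, text = line.split("\t", 2)
        label, spans_str = type_and_spans.split(" ", 1)
        spans = []
        for fragment in spans_str.split(";"):  # discontinuous spans use ";"
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        entities.append(Entity(uid, label, spans, text))
    return entities
```

A line such as "T1<TAB>Disease 0 8<TAB>diabetes" yields one Entity with label "Disease" and the single span (0, 8).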
- class BratOutputConverter(anns_labels=None, attrs=None, notes_label='brat_note', ignore_segments=True, convert_cuis_to_notes=True, create_config=True, top_values_by_attr=50, uid=None)
Class in charge of converting a list of TextDocuments into a brat collection file.
Hint
Brat checks that the span of each annotation is consistent with the document text. This converter adjusts the text and spans so that annotations display correctly and remain compatible with Brat.
Initialize the Brat output converter
- Parameters
anns_labels (Optional[List[str]]) – Labels of medkit annotations to convert into Brat annotations. If None (default), all the annotations will be converted.
attrs (Optional[List[str]]) – Labels of medkit attributes to add to the annotations that will be included. If None (default), all medkit attributes found in the segments or relations will be converted to Brat attributes.
notes_label (str) – Label of attributes that will be converted to annotator notes.
ignore_segments (bool) – If True, medkit segments will be ignored. Only entities, attributes and relations will be converted to Brat annotations. If False, the medkit segments will be converted to Brat annotations as well.
convert_cuis_to_notes (bool) – If True, UMLS normalization attributes will be converted to annotator notes rather than attributes. For entities with multiple UMLS attributes, CUIs will be separated by spaces (e.g. “C0011849 C0004096”).
create_config (bool) – Whether to create a configuration file for the generated collection. This file defines the types of annotations generated and is necessary for correct visualization in Brat.
top_values_by_attr (int) – Number of most common values per attribute to include in the configuration. This is useful when an attribute has a large number of values: only the most common ones are written to the config. By default, the 50 most common values per attribute are included.
uid (Optional[str]) – Identifier of the converter.
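The idea behind top_values_by_attr, keeping only the most frequent values of an attribute, can be sketched with collections.Counter. The helper top_values below is hypothetical and only illustrates the selection step; how medkit counts values and writes them to the configuration file is not shown here.

```python
from collections import Counter


def top_values(values, top_n=50):
    """Return the top_n most common values, most frequent first.

    Hypothetical helper illustrating the selection described for
    top_values_by_attr; it is not medkit's implementation.
    """
    return [value for value, _count in Counter(values).most_common(top_n)]


observed = ["high", "low", "high", "medium", "high", "low"]
print(top_values(observed, top_n=2))
```

With top_n=2, the least frequent value ("medium") is dropped, which is the trade-off the parameter makes for attributes with many distinct values.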
Methods:
save(docs, dir_path[, doc_names]) – Convert and save a collection or list of TextDocuments into a Brat collection.
- save(docs, dir_path, doc_names=None)
Convert and save a collection or list of TextDocuments into a Brat collection. For each collection or list of documents, a folder is created with ‘.txt’ and ‘.ann’ files; an ‘annotation.conf’ is saved if required.
- Parameters
docs (List[TextDocument]) – List of medkit doc objects to convert.
dir_path (Union[str, Path]) – String or path object indicating where to save the generated files.
doc_names (Optional[List[str]]) – Optional list with the names for the generated files. If None, the uid of each document is used as its name: ‘uid.txt’ holds the raw text of the document and ‘uid.ann’ the Brat annotation file.
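The naming rule for doc_names can be summarised in a few lines. The function resolve_doc_names is a hypothetical illustration, not medkit code; the length check in particular is an assumption about what a sensible implementation would enforce.

```python
def resolve_doc_names(doc_uids, doc_names=None):
    """Return the (text file, annotation file) names for each document.

    Hypothetical helper: falls back to the document uids when no
    names are given, as described for the doc_names parameter.
    """
    if doc_names is None:
        doc_names = list(doc_uids)
    if len(doc_names) != len(doc_uids):
        # Assumption: one name per document is required.
        raise ValueError("doc_names must contain one name per document")
    return [(f"{name}.txt", f"{name}.ann") for name in doc_names]


print(resolve_doc_names(["doc-1", "doc-2"]))
print(resolve_doc_names(["doc-1", "doc-2"], doc_names=["report_a", "report_b"]))
```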