Core components#
This page introduces the core concepts of medkit.
Note
For more details about the public APIs, refer to medkit.core.
Documents, Annotations & Attributes#
Medkit document classes are used to:
access raw data,
store relevant annotations extracted from the raw data.
The Document and Annotation protocols are defined inside medkit.core. They define common properties and methods across all modalities. These protocols are then implemented for each modality (text, audio, image, etc.), with additional logic specific to the modality.
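The idea of modality-agnostic protocols with modality-specific implementations can be sketched as follows. This is a minimal illustration only: the `Toy*` classes and their fields are hypothetical stand-ins and are not the actual definitions found in medkit.core.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class ToyAnnotation(Protocol):
    """Hypothetical protocol: what any annotation exposes, whatever the modality."""

    uid: str
    label: str


@runtime_checkable
class ToyDocument(Protocol):
    """Hypothetical protocol: what any document exposes, whatever the modality."""

    uid: str
    anns: List[ToyAnnotation]


@dataclass
class ToyTextSegment:
    """Text-specific annotation: adds the covered text to the common fields."""

    uid: str
    label: str
    text: str


@dataclass
class ToyTextDocument:
    """Text-specific document: adds the raw text to the common fields."""

    uid: str
    text: str
    anns: List[ToyTextSegment] = field(default_factory=list)


doc = ToyTextDocument(uid="doc-1", text="The patient has asthma.")
doc.anns.append(ToyTextSegment(uid="ann-1", label="disorder", text="asthma"))

# Structural checks: the text classes satisfy the protocols without inheriting from them
is_document = isinstance(doc, ToyDocument)
is_annotation = isinstance(doc.anns[0], ToyAnnotation)
```

Code written against the protocol only relies on the common fields, so it keeps working when another modality is added.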
To facilitate the implementation of the Document protocol, an AnnotationContainer class is provided. It behaves like a list of annotations, with additional filtering methods and support for non-memory storage.
medkit.core also defines the Attribute class, which can be used directly to attach attributes to annotations of any modality. The attributes of an annotation are held in an AttributeContainer. Similarly to AnnotationContainer, the role of this container is to provide additional methods that facilitate access to the list of attributes belonging to an annotation.
Currently, medkit.core.text implements a TextDocument class and a corresponding set of TextAnnotation subclasses, and similarly medkit.core.audio provides an AudioDocument class and a corresponding Segment class.
Both modalities also subclass AnnotationContainer to add some modality-specific logic or filtering.
For more details about each modality, refer to the documentation of medkit.core.text and medkit.core.audio.
Document#
The Document protocol class provides the minimal data structure for a medkit document.
For example, each document (whatever the modality) is linked to an annotation container for the same modality.
The AnnotationContainer class provides a set of methods (e.g., add/get) to be implemented for each modality.
The goal is to provide users with a minimal set of common interfaces for accessing the document annotations, whatever the modality.
Given a document named doc from any modality:
Users can browse the document annotations:
    for ann in doc.anns:
        ...
Users can add a new annotation to the document:
    ann = <my annotation>
    doc.anns.add(ann)
Users can get the document annotations filtered by label:
    anns = doc.anns.get(label="disorder")
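To make the three access patterns above concrete, here is a minimal, self-contained sketch of a container that supports iteration, add and label-filtered get. `ToyAnnotation` and `ToyAnnotationContainer` are hypothetical names used for illustration; the real AnnotationContainer has more features (such as store-backed storage) and may behave differently.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class ToyAnnotation:
    uid: str
    label: str


class ToyAnnotationContainer:
    """List-like container of annotations with label filtering (illustrative only)."""

    def __init__(self) -> None:
        # Insertion-ordered mapping from annotation uid to annotation
        self._anns: Dict[str, ToyAnnotation] = {}

    def add(self, ann: ToyAnnotation) -> None:
        # Assumption of this sketch: attaching the same uid twice is an error
        if ann.uid in self._anns:
            raise ValueError(f"Annotation {ann.uid} is already attached")
        self._anns[ann.uid] = ann

    def get(self, label: Optional[str] = None) -> List[ToyAnnotation]:
        # Without a label, return every annotation
        return [a for a in self._anns.values() if label is None or a.label == label]

    def __iter__(self) -> Iterator[ToyAnnotation]:
        return iter(self._anns.values())

    def __len__(self) -> int:
        return len(self._anns)


anns = ToyAnnotationContainer()
anns.add(ToyAnnotation(uid="a1", label="disorder"))
anns.add(ToyAnnotation(uid="a2", label="drug"))
anns.add(ToyAnnotation(uid="a3", label="disorder"))

disorder_uids = [a.uid for a in anns.get(label="disorder")]
all_labels = [a.label for a in anns]
```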
Note
For more details about their implementation, refer to medkit.core.document.Document and medkit.core.annotation_container.AnnotationContainer.
Annotation & Attribute#
The Annotation protocol class provides the minimal data structure for a medkit annotation.
For example, each annotation is linked to an attribute container.
The AttributeContainer class provides a set of common interfaces for accessing the Attribute objects of an annotation, whatever the modality.
Given an annotation ann from any modality:
Users may browse the annotation attributes:
    for attr in ann.attrs:
        ...
Users may add a new attribute to an annotation:
    attr = <my attribute>
    ann.attrs.add(attr)
Users may get the annotation attributes filtered by label:
    attrs = ann.attrs.get(label="NORMALIZATION")
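The same pattern can be sketched for attributes. In this illustration, `ToyAttribute`, `ToyAttributeContainer` and `ToyAnnotation` are hypothetical stand-ins, and the attribute values are placeholders; the real Attribute and AttributeContainer classes are richer.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Optional


@dataclass
class ToyAttribute:
    label: str
    value: Any = None


class ToyAttributeContainer:
    """Holds the attributes of one annotation (illustrative only)."""

    def __init__(self) -> None:
        self._attrs: List[ToyAttribute] = []

    def add(self, attr: ToyAttribute) -> None:
        self._attrs.append(attr)

    def get(self, label: Optional[str] = None) -> List[ToyAttribute]:
        # Without a label, return every attribute
        return [a for a in self._attrs if label is None or a.label == label]

    def __iter__(self) -> Iterator[ToyAttribute]:
        return iter(self._attrs)


@dataclass
class ToyAnnotation:
    label: str
    # Each annotation gets its own container
    attrs: ToyAttributeContainer = field(default_factory=ToyAttributeContainer)


ann = ToyAnnotation(label="disorder")
ann.attrs.add(ToyAttribute(label="is_negated", value=False))
ann.attrs.add(ToyAttribute(label="NORMALIZATION", value="placeholder-code"))

normalization_values = [a.value for a in ann.attrs.get(label="NORMALIZATION")]
attr_labels = [a.label for a in ann.attrs]
```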
Collection#
The Collection class allows manipulating a set of Document objects.
Warning
This work is still under development. It may be changed in the future.
Operations#
The Operation abstract class groups all the methods necessary for compatibility with medkit processing pipelines and provenance.
We have defined different subclasses depending on the nature of the operation, including text-specific and audio-specific operations in medkit.core.text and medkit.core.audio.
For more details about each modality, refer to the documentation of medkit.core.text and medkit.core.audio.
For all operations inheriting from the Operation abstract class, these 4 lines shall be added to the __init__ method:
    def __init__(self, ..., uid=None):
        ...
        # Pass all arguments to super (remove self)
        init_args = locals()
        init_args.pop("self")
        super().__init__(**init_args)
Each operation is described with an OperationDescription.
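The sketch below shows why forwarding the init arguments is useful: the base class can record them and build a description of the operation from them. `ToyOperation`, `ToyThresholdFilter` and the dictionary used as a description are assumptions made for this example, not the actual Operation or OperationDescription implementation.

```python
import uuid
from typing import Any, Dict, Optional


class ToyOperation:
    """Hypothetical base class: keeps an identifier and the init arguments."""

    def __init__(self, uid: Optional[str] = None, **kwargs: Any) -> None:
        # locals() in a method that calls super() may also expose "__class__";
        # drop it so that only real parameters are recorded
        kwargs.pop("__class__", None)
        self.uid = uid if uid is not None else uuid.uuid4().hex
        # Simplified stand-in for an operation description
        self.description: Dict[str, Any] = {
            "uid": self.uid,
            "name": type(self).__name__,
            "config": kwargs,
        }


class ToyThresholdFilter(ToyOperation):
    def __init__(self, threshold: float = 0.5, uid: Optional[str] = None) -> None:
        # Pass all arguments to super (remove self)
        init_args = locals()
        init_args.pop("self")
        super().__init__(**init_args)

        self.threshold = threshold


op = ToyThresholdFilter(threshold=0.8, uid="op-1")
description = op.description
```

Calling `locals()` as the very first statement matters: any local variable created before the call would be forwarded as if it were an init argument.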
Converters#
Two abstract classes have been defined for managing document conversion between the medkit format and another format.
Note
For more details about the public APIs, refer to medkit.core.conversion.
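As a rough illustration of the two-abstract-class design, the sketch below pairs an input converter with an output converter for a JSON Lines representation. The class names, the `load`/`save` method names and the JSON Lines format are all assumptions chosen for this example; refer to medkit.core.conversion for the real interfaces.

```python
import abc
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ToyDocument:
    uid: str
    text: str


class ToyInputConverter(abc.ABC):
    """Hypothetical base class: external format -> documents."""

    @abc.abstractmethod
    def load(self, data: str) -> List[ToyDocument]:
        ...


class ToyOutputConverter(abc.ABC):
    """Hypothetical base class: documents -> external format."""

    @abc.abstractmethod
    def save(self, docs: List[ToyDocument]) -> str:
        ...


class JsonLinesInputConverter(ToyInputConverter):
    def load(self, data: str) -> List[ToyDocument]:
        # One JSON object per non-empty line
        return [ToyDocument(**json.loads(line)) for line in data.splitlines() if line.strip()]


class JsonLinesOutputConverter(ToyOutputConverter):
    def save(self, docs: List[ToyDocument]) -> str:
        return "\n".join(json.dumps({"uid": d.uid, "text": d.text}) for d in docs)


docs = [ToyDocument("d1", "first"), ToyDocument("d2", "second")]
exported = JsonLinesOutputConverter().save(docs)
reloaded = JsonLinesInputConverter().load(exported)
```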
Pipeline#
A Pipeline allows chaining several operations.
To better understand how to declare and use medkit pipelines, you may refer to the pipeline tutorial.
Note
For more details about the public APIs, refer to medkit.core.pipeline.
The DocPipeline class is a wrapper that runs an annotation pipeline on a list of documents and automatically attaches the output annotations to these documents.
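The relationship between a pipeline and its document-level wrapper can be sketched as follows. This is a deliberately simplified model in which operations are plain functions and annotations are plain strings; `ToyPipeline` and `ToyDocPipeline` are hypothetical and do not reflect the real constructor arguments, which are covered in the pipeline tutorial.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# In this sketch, an operation is any function mapping a list of items to a list of items
Step = Callable[[List[str]], List[str]]


class ToyPipeline:
    """Chains operations: the output of each step feeds the next one."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = list(steps)

    def run(self, items: List[str]) -> List[str]:
        for step in self.steps:
            items = step(items)
        return items


@dataclass
class ToyDocument:
    text: str
    anns: List[str] = field(default_factory=list)


class ToyDocPipeline:
    """Runs a pipeline on each document and attaches the outputs to it."""

    def __init__(self, pipeline: ToyPipeline) -> None:
        self.pipeline = pipeline

    def run(self, docs: List[ToyDocument]) -> None:
        for doc in docs:
            doc.anns.extend(self.pipeline.run([doc.text]))


def split_sentences(texts: List[str]) -> List[str]:
    return [s.strip() for t in texts for s in t.split(".") if s.strip()]


def lowercase(texts: List[str]) -> List[str]:
    return [t.lower() for t in texts]


docs = [ToyDocument("Fever. Cough."), ToyDocument("No symptoms.")]
ToyDocPipeline(ToyPipeline([split_sentences, lowercase])).run(docs)
```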
Store#
A store is an object responsible for keeping the annotations of a document (through an AnnotationContainer) or the attributes of an annotation (through an AttributeContainer).
The Store protocol defines the methods that a store must implement. For now, we only provide a single implementation of this protocol, based on a dictionary, but in the future we will probably provide other implementations relying on databases.
Users can also implement their own store based on their needs.
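A dictionary-backed store satisfying a small protocol could look like the sketch below. The method names `put`, `get` and `remove` are placeholders chosen for this example and are not the methods of the actual Store protocol; see medkit.core.store for those.

```python
from typing import Any, Dict, Optional, Protocol


class ToyStore(Protocol):
    """Hypothetical protocol: the minimum a store must offer in this sketch."""

    def put(self, uid: str, item: Any) -> None: ...

    def get(self, uid: str) -> Optional[Any]: ...

    def remove(self, uid: str) -> None: ...


class ToyDictStore:
    """In-memory implementation backed by a plain dictionary."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def put(self, uid: str, item: Any) -> None:
        self._items[uid] = item

    def get(self, uid: str) -> Optional[Any]:
        return self._items.get(uid)

    def remove(self, uid: str) -> None:
        # Removing an unknown uid is a no-op in this sketch
        self._items.pop(uid, None)


def attach(store: ToyStore, uid: str, item: Any) -> None:
    # Callers depend on the protocol only, so any backend can be swapped in
    store.put(uid, item)


store = ToyDictStore()
attach(store, "ann-1", {"label": "disorder"})
found = store.get("ann-1")
store.remove("ann-1")
missing = store.get("ann-1")
```

A database-backed store would implement the same three methods, leaving the containers that use the store unchanged.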
Warning
This work is still under development. It may be changed in the future.
Note
For more details about the public APIs, refer to medkit.core.store.
Global store#
To store all data items in the same location, a global store is used for your application. If you have not set your own store, the global store will automatically use the simple internal dict store.
If you implement your own store, we suggest calling medkit.core.store.GlobalStore.init_store() before initializing any other medkit component.
GlobalStore provides initialization, access and removal methods for the global store.
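The sketch below illustrates one plausible reason for initializing the global store early: if a component accesses the store first, a default store gets created, and a custom store can no longer be installed. `ToyGlobalStore` is a hypothetical model; apart from `init_store`, the method names and the error behavior are assumptions and may differ from the real GlobalStore.

```python
from typing import Any, Dict, Optional


class ToyGlobalStore:
    """Process-wide holder for a single store (illustrative only)."""

    _store: Optional[Dict[str, Any]] = None

    @classmethod
    def init_store(cls, store: Dict[str, Any]) -> None:
        # Assumption of this sketch: the store can only be set once
        if cls._store is not None:
            raise RuntimeError("The global store has already been initialized")
        cls._store = store

    @classmethod
    def get_store(cls) -> Dict[str, Any]:
        # Fall back to a simple dict store if nothing was set explicitly
        if cls._store is None:
            cls._store = {}
        return cls._store

    @classmethod
    def del_store(cls) -> None:
        cls._store = None


custom_store: Dict[str, Any] = {}
ToyGlobalStore.init_store(custom_store)
uses_custom_store = ToyGlobalStore.get_store() is custom_store

try:
    ToyGlobalStore.init_store({})
    second_init_rejected = False
except RuntimeError:
    second_init_rejected = True

# After removal, the next access recreates the default dict store
ToyGlobalStore.del_store()
fallback_store = ToyGlobalStore.get_store()
```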
Provenance#
Warning
This work is still under development. It may be changed in the future.
Provenance is a medkit concept that tracks all operations and their role in the extraction of new knowledge.
With this mechanism, we will be able to provide provenance information about any generated data item. To log this information, a separate provenance store is used.
To better understand this concept, you may follow the provenance tutorial and/or refer to “how to make your own module” to find out what you need to do to enable provenance.
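As a very rough mental model, provenance tracking amounts to recording, for each generated data item, which operation produced it and from which source items. The sketch below is only that mental model: `ToyProv`, `ToyProvTracer` and their methods are hypothetical and much simpler than what medkit.core.prov_tracer provides.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ToyProv:
    data_item_id: str
    op_name: str
    source_ids: Tuple[str, ...]


class ToyProvTracer:
    """Records which operation produced each data item, and from what."""

    def __init__(self) -> None:
        # Separate store dedicated to provenance records
        self._store: Dict[str, ToyProv] = {}

    def add_prov(self, data_item_id: str, op_name: str, source_ids: List[str]) -> None:
        self._store[data_item_id] = ToyProv(data_item_id, op_name, tuple(source_ids))

    def lineage(self, data_item_id: str) -> List[str]:
        """Return the names of the operations behind an item, oldest first."""
        prov = self._store.get(data_item_id)
        if prov is None:
            # Raw input: nothing was recorded for it
            return []
        ops: List[str] = []
        for source_id in prov.source_ids:
            for op_name in self.lineage(source_id):
                if op_name not in ops:
                    ops.append(op_name)
        ops.append(prov.op_name)
        return ops


tracer = ToyProvTracer()
tracer.add_prov("sentence-1", "SentenceSplitter", ["doc-1"])
tracer.add_prov("entity-1", "EntityMatcher", ["sentence-1"])

entity_lineage = tracer.lineage("entity-1")
raw_lineage = tracer.lineage("doc-1")
```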
Note
For more details about the public APIs, refer to medkit.core.prov_tracer.