medkit.core.doc_pipeline

Classes:

DocPipeline(pipeline[, labels_by_input_key, uid])

Wrapper around the Pipeline class that runs a pipeline on a list (or collection) of documents, retrieving input annotations from each document and attaching output annotations back to documents.

class DocPipeline(pipeline, labels_by_input_key=None, uid=None)

Wrapper around the Pipeline class that runs a pipeline on a list (or collection) of documents, retrieving input annotations from each document and attaching output annotations back to documents.

Initialize the pipeline.

Parameters
  • pipeline (Pipeline) – Pipeline to execute on documents. The annotations passed to the pipeline (corresponding to its input_keys) are retrieved from the documents according to labels_by_input_key. The annotations returned by the pipeline (corresponding to its output_keys) are added to the documents.

  • labels_by_input_key (Optional[Dict[str, List[str]]]) –

    Optional labels of existing annotations that should be retrieved from documents and passed to the pipeline as input. One list of labels per input key.

    When labels_by_input_key is not provided, it is assumed that the pipeline just expects the document raw segments as input.

    For instance, if the documents contain pre-existing sentence segments labelled “SENTENCE” that should be passed to the “sentences” input key of the pipeline:

    >>> doc_pipeline = DocPipeline(
    ...     pipeline,
    ...     labels_by_input_key={"sentences": ["SENTENCE"]},
    ... )

    Because the values of labels_by_input_key are lists (one per input key), annotations with different labels can be passed to the same input key.
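To make the label-to-input-key lookup concrete, here is a minimal, self-contained sketch in plain Python. It does not use medkit: Segment, Doc, get_by_label and collect_inputs are hypothetical stand-ins, and the assumption that a pipeline given no labels_by_input_key has a single input key receiving the raw segment is ours. It only illustrates how several labels can feed one input key.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical stand-ins for medkit documents and annotations (NOT the medkit API).
@dataclass
class Segment:
    label: str
    text: str


@dataclass
class Doc:
    raw_segment: Segment
    anns: List[Segment] = field(default_factory=list)

    def get_by_label(self, label: str) -> List[Segment]:
        return [a for a in self.anns if a.label == label]


def collect_inputs(
    doc: Doc,
    input_keys: List[str],
    labels_by_input_key: Optional[Dict[str, List[str]]] = None,
) -> List[List[Segment]]:
    """Return one list of annotations per pipeline input key, in input_keys order."""
    if labels_by_input_key is None:
        # Assumption: without a mapping, the pipeline has a single input key
        # and receives only the document's raw segment.
        return [[doc.raw_segment]]
    inputs = []
    for key in input_keys:
        anns: List[Segment] = []
        # Annotations of every label listed for this key are concatenated,
        # label by label, in the order the labels are given.
        for label in labels_by_input_key[key]:
            anns.extend(doc.get_by_label(label))
        inputs.append(anns)
    return inputs


doc = Doc(
    raw_segment=Segment("RAW_TEXT", "Patient has asthma. Takes salbutamol."),
    anns=[
        Segment("SENTENCE", "Patient has asthma."),
        Segment("SENTENCE", "Takes salbutamol."),
        Segment("disease", "asthma"),
        Segment("drug", "salbutamol"),
    ],
)
sentences, entities = collect_inputs(
    doc,
    input_keys=["sentences", "entities"],
    labels_by_input_key={"sentences": ["SENTENCE"], "entities": ["disease", "drug"]},
)
print([e.text for e in entities])  # ['asthma', 'salbutamol']
```

Mapping both "disease" and "drug" to "entities" shows how annotations with different labels end up sharing one input key.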

Methods:

run(docs)

Run the pipeline on a list of documents, adding the output annotations to each document.

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer in which provenance information will be recorded.
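Conceptually, provenance tracing links every item an operation produces to that operation and to the items it was derived from, so the chain can be followed back to the source data. The toy class below is a hypothetical illustration of that idea in plain Python; ToyProvTracer, record and sources_of are invented names, and medkit's actual ProvTracer has its own interface, which is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Hypothetical toy tracer, NOT medkit's ProvTracer: it only illustrates the idea.
@dataclass
class ToyProvTracer:
    # id of a produced item -> (id of the producing operation, ids of its inputs)
    records: Dict[str, Tuple[str, List[str]]] = field(default_factory=dict)

    def record(self, output_id: str, op_id: str, input_ids: List[str]) -> None:
        self.records[output_id] = (op_id, list(input_ids))

    def sources_of(self, item_id: str) -> List[str]:
        """Follow the records back to items that no traced operation produced."""
        if item_id not in self.records:
            return [item_id]
        _, input_ids = self.records[item_id]
        sources: List[str] = []
        for input_id in input_ids:
            sources.extend(self.sources_of(input_id))
        return sources


tracer = ToyProvTracer()
tracer.record("sentence-1", op_id="sentence_tokenizer", input_ids=["raw-text"])
tracer.record("entity-1", op_id="ner", input_ids=["sentence-1"])
print(tracer.sources_of("entity-1"))  # ['raw-text']
```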

run(docs)

Run the pipeline on a list of documents, adding the output annotations to each document.

Parameters

docs (List[Document[~AnnotationType]]) – The documents on which to run the pipeline. The labels-to-input-keys mapping (labels_by_input_key) is used to retrieve existing annotations from each document, and all output annotations are added back to the document they were computed from.

Return type

None
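The per-document loop described for run (retrieve inputs, run the pipeline, attach outputs, return None) can be sketched as follows. Everything here is hypothetical: Ann, Doc, run_doc_pipeline and toy_pipeline stand in for medkit's Document, Pipeline and DocPipeline, and the wrapped pipeline is modelled as a plain callable taking one list per input key and returning one list per output key.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-ins (NOT the medkit API).
@dataclass
class Ann:
    label: str
    text: str


@dataclass
class Doc:
    anns: List[Ann] = field(default_factory=list)

    def get_by_label(self, label: str) -> List[Ann]:
        return [a for a in self.anns if a.label == label]


# A pipeline is modelled as: one list per input key in, one list per output key out.
PipelineFn = Callable[..., List[List[Ann]]]


def run_doc_pipeline(
    pipeline: PipelineFn,
    input_keys: List[str],
    labels_by_input_key: Dict[str, List[str]],
    docs: List[Doc],
) -> None:
    """Run the pipeline document by document; outputs are attached in place."""
    for doc in docs:
        # 1. Retrieve this document's input annotations, one list per input key.
        inputs = [
            [ann for label in labels_by_input_key[key] for ann in doc.get_by_label(label)]
            for key in input_keys
        ]
        # 2. Run the pipeline on this document's annotations only.
        outputs = pipeline(*inputs)
        # 3. Attach every output annotation to the document it came from.
        for output_anns in outputs:
            doc.anns.extend(output_anns)


def toy_pipeline(sentences: List[Ann]) -> List[List[Ann]]:
    # Single output key: each capitalised word becomes a "NAME" annotation.
    return [[Ann("NAME", w) for s in sentences for w in s.text.split() if w.istitle()]]


docs = [Doc([Ann("SENTENCE", "Alice met Bob")]), Doc([Ann("SENTENCE", "it rained")])]
run_doc_pipeline(toy_pipeline, ["sentences"], {"sentences": ["SENTENCE"]}, docs)
print([a.text for a in docs[0].get_by_label("NAME")])  # ['Alice', 'Bob']
print(docs[1].get_by_label("NAME"))  # []
```

Processing each document separately is what keeps every output annotation attached to the document its inputs came from, and the function mutates the documents rather than returning anything, matching the None return type.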

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription