medkit.core.prov_tracer

Classes:

Prov(data_item, op_desc, source_data_items, ...)

Provenance information for a specific data item.

ProvTracer([store, _graph])

Provenance tracing component.

class ProvTracer(store=None, _graph=None)

Provenance tracing component.

ProvTracer is intended to gather provenance information about how all data generated by medkit was created. For each data item (for instance an annotation or an attribute), ProvTracer can tell which operation created it, which data items were used to create it and, reciprocally, which data items were derived from it (cf. Prov).

Provenance-compatible operations should inform the provenance tracer of each data item that they create through the add_prov() method.

Users wanting to gather provenance information should instantiate one unique ProvTracer object and provide it to all operations involved in their data processing flow. Once all operations have been executed, they may then retrieve provenance info for specific data items through get_prov(), or for all items with get_provs().
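The flow described above can be sketched with a toy stand-in. The `ToyTracer` and `ToyProv` names below are hypothetical and only mimic the documented behaviour of add_prov(), get_prov() and get_provs(). This is not medkit's implementation, and plain strings replace real data items and operation descriptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyProv:
    # Hypothetical stand-in for Prov; plain strings replace real objects.
    data_item_id: str
    op_desc: str
    source_ids: List[str]
    derived_ids: List[str] = field(default_factory=list)


class ToyTracer:
    # Hypothetical stand-in for ProvTracer (not medkit's implementation).
    def __init__(self) -> None:
        self._provs: Dict[str, ToyProv] = {}

    def add_prov(self, data_item_id: str, op_desc: str, source_ids: List[str]) -> None:
        self._provs[data_item_id] = ToyProv(data_item_id, op_desc, list(source_ids))
        # Record the reverse link on every source already known to the tracer.
        for source_id in source_ids:
            if source_id in self._provs:
                self._provs[source_id].derived_ids.append(data_item_id)

    def get_prov(self, data_item_id: str) -> ToyProv:
        return self._provs[data_item_id]

    def get_provs(self) -> List[ToyProv]:
        return list(self._provs.values())


# One tracer is shared by every operation in the flow.
tracer = ToyTracer()
tracer.add_prov("sentence_1", "sentence_tokenizer", [])
tracer.add_prov("entity_1", "entity_matcher", ["sentence_1"])

entity_prov = tracer.get_prov("entity_1")
sentence_prov = tracer.get_prov("sentence_1")
```

After both calls, `entity_prov` names the operation and source of the entity, while `sentence_prov.derived_ids` lists the entity as an item derived from the sentence.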

Composite operations relying on inner operations (such as pipelines) should not call the add_prov() method. Instead, they should instantiate their own internal ProvTracer and provide it to the operations they rely on, then use add_prov_from_sub_tracer() to integrate information from this internal sub-provenance tracer into the main provenance tracer that was provided to them.

This builds sub-provenance information, which can be retrieved later through get_sub_prov_tracer() or get_sub_prov_tracers(). The inner operations of a composite operation can themselves be composite operations, leading to a tree-like structure of nested provenance tracers.
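A minimal sketch of the composite pattern follows. `ToyTracer`, `external_sources()` and `run_toy_pipeline()` are hypothetical names, and the way sources are resolved from the sub-tracer is an assumption of this sketch rather than medkit's actual logic. It only illustrates that the main tracer sees the pipeline output, while intermediate items stay in the sub-tracer.

```python
from typing import Dict, List, Tuple


class ToyTracer:
    # Hypothetical stand-in for ProvTracer (not medkit's implementation).
    def __init__(self) -> None:
        # data item id -> (operation id, source data item ids)
        self.provs: Dict[str, Tuple[str, List[str]]] = {}
        # composite operation id -> internal tracer of that operation
        self.sub_tracers: Dict[str, "ToyTracer"] = {}

    def add_prov(self, item_id: str, op_id: str, source_ids: List[str]) -> None:
        self.provs[item_id] = (op_id, list(source_ids))

    def external_sources(self, item_id: str) -> List[str]:
        # Assumption: follow source links until reaching items that this
        # tracer has no record of, i.e. the inputs of the composite operation.
        result, stack, seen = [], [item_id], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in self.provs:
                stack.extend(self.provs[current][1])
            else:
                result.append(current)
        return result

    def add_prov_from_sub_tracer(
        self, item_ids: List[str], op_id: str, sub_tracer: "ToyTracer"
    ) -> None:
        self.sub_tracers[op_id] = sub_tracer
        for item_id in item_ids:
            # Seen from outside, the output was created by the composite operation.
            self.provs[item_id] = (op_id, sub_tracer.external_sources(item_id))


def run_toy_pipeline(input_id: str, tracer: ToyTracer) -> str:
    # The composite operation gives its own tracer to its inner operations.
    sub_tracer = ToyTracer()
    sub_tracer.add_prov("sentence_1", "tokenizer", [input_id])
    sub_tracer.add_prov("entity_1", "matcher", ["sentence_1"])
    # Only the final output is reported to the main tracer.
    tracer.add_prov_from_sub_tracer(["entity_1"], "pipeline", sub_tracer)
    return "entity_1"


main_tracer = ToyTracer()
output_id = run_toy_pipeline("doc_text", main_tracer)
outer_view = main_tracer.provs[output_id]
inner_view = main_tracer.sub_tracers["pipeline"].provs[output_id]
```

Here `outer_view` attributes the entity to the pipeline with the document text as its source, whereas `inner_view` shows the finer-grained step (the matcher working on the sentence).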

Parameters

store (Optional[ProvStore]) – Store that will contain all traced data items.

Methods:

add_prov(data_item, op_desc, source_data_items)

Append provenance information about a specific data item.

add_prov_from_sub_tracer(data_items, ...)

Append provenance information about data items created by a composite operation relying on inner operations (such as a pipeline) having its own internal sub-provenance tracer.

get_prov(data_item_id)

Return provenance information about a specific data item.

get_provs()

Return all provenance information about all data items known to the tracer.

get_sub_prov_tracer(operation_id)

Return a sub-provenance tracer containing sub-provenance information from a specific composite operation.

get_sub_prov_tracers()

Return all sub-provenance tracers of the provenance tracer.

has_prov(data_item_id)

Check if the provenance tracer has provenance information about a specific data item.

has_sub_prov_tracer(operation_id)

Check if the provenance tracer has a sub-provenance tracer for a specific composite operation (such as a pipeline).

add_prov(data_item, op_desc, source_data_items)

Append provenance information about a specific data item.

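Parameters
  • data_item (IdentifiableDataItem) – Data item that was created.

  • op_desc (OperationDescription) – Description of the operation that created the data item.

  • source_data_items (List[IdentifiableDataItem]) – Data items that were used by the operation to create the data item.

add_prov_from_sub_tracer(data_items, op_desc, sub_tracer)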

Append provenance information about data items created by a composite operation relying on inner operations (such as a pipeline) having its own internal sub-provenance tracer.

Parameters
  • data_items (List[IdentifiableDataItem]) – Data items created by the composite operation. Should not include internal intermediate data items, only the output of the operation.

  • op_desc (OperationDescription) – Description of the composite operation that created the data items.

  • sub_tracer (ProvTracer) – Internal sub-provenance tracer of the composite operation.

has_prov(data_item_id)

Check if the provenance tracer has provenance information about a specific data item.

Note

This will return False if provenance info about the data item exists only in a sub-provenance tracer.

Parameters

data_item_id (str) – Id of the data item.

Return type

bool

Returns

bool – True if there is provenance info that can be retrieved with get_prov().
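Because has_prov() ignores sub-provenance tracers, a caller who wants to search the whole hierarchy has to recurse. The helper below is a sketch that relies only on has_prov() and get_sub_prov_tracers() as documented on this page. `has_prov_anywhere` and `StubTracer` are hypothetical names, and the stub exists only so the example runs without medkit.

```python
from typing import Any, List


def has_prov_anywhere(tracer: Any, data_item_id: str) -> bool:
    """Check a tracer and, recursively, all of its sub-provenance tracers."""
    if tracer.has_prov(data_item_id):
        return True
    return any(
        has_prov_anywhere(sub_tracer, data_item_id)
        for sub_tracer in tracer.get_sub_prov_tracers()
    )


class StubTracer:
    # Hypothetical stand-in exposing only the two methods used above.
    def __init__(self, item_ids: List[str], sub_tracers: List["StubTracer"]) -> None:
        self._item_ids = set(item_ids)
        self._sub_tracers = sub_tracers

    def has_prov(self, data_item_id: str) -> bool:
        return data_item_id in self._item_ids

    def get_sub_prov_tracers(self) -> List["StubTracer"]:
        return list(self._sub_tracers)


# "sentence_1" is an intermediate item known only to the inner tracer.
inner = StubTracer(["sentence_1", "entity_1"], [])
outer = StubTracer(["entity_1"], [inner])

direct = outer.has_prov("sentence_1")
deep = has_prov_anywhere(outer, "sentence_1")
```

In this setup `direct` is False while `deep` is True, which mirrors the note above.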

get_prov(data_item_id)

Return provenance information about a specific data item.

Parameters

data_item_id (str) – Id of the data item.

Return type

Prov

Returns

Prov – Provenance info about the data item.

get_provs()

Return all provenance information about all data items known to the tracer.

Note

Nested provenance info from sub-provenance tracers will not be returned.

Return type

List[Prov]

Returns

List[Prov] – Provenance info about all known data items.

has_sub_prov_tracer(operation_id)

Check if the provenance tracer has a sub-provenance tracer for a specific composite operation (such as a pipeline).

Note

This will return False if a sub-provenance tracer exists for the operation but is not a direct child of this tracer (i.e. it is deeper in the hierarchy).

Parameters

operation_id (str) – Id of the composite operation.

Return type

bool

Returns

bool – True if there is a sub-provenance tracer for the operation.

get_sub_prov_tracer(operation_id)

Return a sub-provenance tracer containing sub-provenance information from a specific composite operation.

Parameters

operation_id (str) – Id of the composite operation.

Return type

ProvTracer

Returns

ProvTracer – The sub-provenance tracer containing sub-provenance information from the operation.

get_sub_prov_tracers()

Return all sub-provenance tracers of the provenance tracer.

Note

This will not return sub-provenance tracers that are not direct children of this tracer (i.e. that are deeper in the hierarchy).

Return type

List[ProvTracer]

Returns

List[ProvTracer] – All sub-provenance tracers of this provenance tracer.
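Since only direct children are returned, visiting the full tree of nested tracers requires a recursive walk. The generator below is a sketch that uses nothing but get_sub_prov_tracers(). `walk_tracers`, `StubTracer` and its `name` attribute are hypothetical and serve only to make the example runnable without medkit.

```python
from typing import Any, Iterator, List, Tuple


def walk_tracers(tracer: Any, depth: int = 0) -> Iterator[Tuple[int, Any]]:
    """Yield (depth, tracer) pairs for a tracer and all nested sub-tracers."""
    yield depth, tracer
    for sub_tracer in tracer.get_sub_prov_tracers():
        yield from walk_tracers(sub_tracer, depth + 1)


class StubTracer:
    # Hypothetical stand-in exposing only get_sub_prov_tracers().
    def __init__(self, name: str, sub_tracers: List["StubTracer"]) -> None:
        self.name = name
        self._sub_tracers = sub_tracers

    def get_sub_prov_tracers(self) -> List["StubTracer"]:
        return list(self._sub_tracers)


# A pipeline nested inside another pipeline gives a three-level tree.
leaf = StubTracer("inner_pipeline", [])
middle = StubTracer("outer_pipeline", [leaf])
root = StubTracer("main", [middle])

direct_children = [t.name for t in root.get_sub_prov_tracers()]
visited = [(depth, t.name) for depth, t in walk_tracers(root)]
```

`direct_children` contains only the outer pipeline, while `visited` lists the root first and then every nested tracer with its depth, in depth-first order.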

class Prov(data_item, op_desc, source_data_items, derived_data_items)

Provenance information for a specific data item.

Parameters