medkit.text.metrics.irr_utils

Metrics to assess inter-annotator agreement

Functions:

krippendorff_alpha(all_annotators_data)

Compute Krippendorff's alpha: a coefficient of agreement among many annotators.

krippendorff_alpha(all_annotators_data)

Compute Krippendorff’s alpha: a coefficient of agreement among many annotators.

This coefficient is a generalization of several reliability indices. The general form is:

\[\alpha = 1 - \frac{D_o}{D_e}\]

where \(D_o\) is the observed disagreement among labels assigned to units or annotations and \(D_e\) is the disagreement between annotators attributable to chance. The arguments of the disagreement measures are values in coincidence matrices.

This function implements the general computational form proposed in [1], but only supports binary or nominal labels.
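For binary or nominal labels, both \(D_o\) and \(D_e\) can be read off a coincidence matrix o, where o[c][k] accumulates, for each unit, the number of (c, k) label pairs made by different annotators divided by the number of pairable values in that unit minus one. The sketch below is purely illustrative and is not medkit's implementation; the function name nominal_alpha_from_coincidences and the NumPy dependency are assumptions made for this example.

import numpy as np

def nominal_alpha_from_coincidences(o):
    """Alpha for binary or nominal labels, given a square coincidence matrix o."""
    o = np.asarray(o, dtype=float)
    n_c = o.sum(axis=1)                           # total count for each label value
    n = n_c.sum()                                 # total number of pairable values
    d_o = o.sum() - np.trace(o)                   # observed disagreement: off-diagonal mass
    d_e = (n * n - (n_c * n_c).sum()) / (n - 1)   # disagreement expected by chance
    return 1.0 - d_o / d_e

For instance, the reliability data used in the Examples section below yields the coincidence matrix [[7, 2], [2, 3]] for the label values 'yes' and 'no', and the sketch reproduces the documented coefficient:

>>> round(nominal_alpha_from_coincidences([[7, 2], [2, 3]]), 3)
0.422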

Parameters

all_annotators_data (array_like, (m_annotators, n_samples)) – Reliability data: the labels assigned to n_samples items by each of m_annotators, given as a list or array. Missing labels are represented by None.

Return type

float

Returns

alpha (float) – The alpha coefficient, a number between 0 and 1. A value of 0 indicates the absence of reliability, and a value of 1 indicates perfect reliability.

Raises

AssertionError – Raised if any list of labels within all_annotators_data differs in size from the others, or if there is only one label to be compared.

References

[1] K. Krippendorff, “Computing Krippendorff’s alpha-reliability,” ScholarlyCommons, 25-Jan-2011, pp. 8-10. [Online]. Available: https://repository.upenn.edu/asc_papers/43/

Examples

Three annotators labelled six items. Some labels are missing.

>>> annotator_A = ['yes','yes','no','no','yes',None]
>>> annotator_B = [None,'yes','no','yes','yes','no']
>>> annotator_C = ['yes','no','no','yes','yes',None]
>>> krippendorff_alpha([annotator_A,annotator_B,annotator_C])
0.42222222222222217
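
As a hedged follow-up using the same call signature (the exact printed values assume the behaviour documented above): complete agreement between annotators gives the maximum coefficient of 1.0, and lists of different lengths trigger the AssertionError described in the Raises section.

>>> annotator_D = ['yes', 'no', 'no', 'yes']
>>> annotator_E = ['yes', 'no', 'no', 'yes']
>>> krippendorff_alpha([annotator_D, annotator_E])
1.0

>>> try:
...     krippendorff_alpha([['yes', 'no', 'yes'], ['yes', 'no']])
... except AssertionError:
...     print("annotators must label the same number of samples")
annotators must label the same number of samples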