medkit.text.metrics.irr_utils
Metrics to assess inter-annotator agreement
Functions:
- krippendorff_alpha – Compute Krippendorff's alpha: a coefficient of agreement among many annotators.
- krippendorff_alpha(all_annotators_data)
Compute Krippendorff’s alpha: a coefficient of agreement among many annotators.
This coefficient is a generalization of several reliability indices. The general form is:
\[\alpha = 1 - \frac{D_o}{D_e}\]
where \(D_o\) is the observed disagreement among labels assigned to units or annotations, and \(D_e\) is the disagreement between annotators attributable to chance. The arguments of both disagreement measures are values taken from coincidence matrices.
This function implements the general computational form proposed in [1], but only supports binary or nominal labels.
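For intuition, the coincidence-matrix computation for nominal labels can be sketched as follows. This is a simplified, hypothetical re-implementation (the function name `nominal_alpha` and its internals are illustrative, not medkit's actual code): each unit with at least two labels contributes ordered label pairs weighted by 1/(m-1), observed disagreement \(D_o\) is the off-diagonal mass of the coincidence matrix, and expected disagreement \(D_e\) follows from the marginal label totals.

```python
from collections import Counter


def nominal_alpha(all_annotators_data):
    """Sketch of Krippendorff's alpha for nominal labels (illustrative only)."""
    n_units = len(all_annotators_data[0])
    coincidences = Counter()  # (label_a, label_b) -> weighted pair count
    for u in range(n_units):
        # labels assigned to unit u, ignoring missing annotations
        values = [ann[u] for ann in all_annotators_data if ann[u] is not None]
        m = len(values)
        if m < 2:
            continue  # units with fewer than two labels carry no pairable information
        for i, v in enumerate(values):
            for j, w in enumerate(values):
                if i != j:
                    coincidences[(v, w)] += 1.0 / (m - 1)
    n_total = sum(coincidences.values())
    labels = {v for v, _ in coincidences}
    # marginal totals n_c: how often each label occurs in pairable units
    n_label = {v: sum(c for (a, _), c in coincidences.items() if a == v) for v in labels}
    # observed disagreement: off-diagonal coincidences
    d_o = sum(c for (a, b), c in coincidences.items() if a != b)
    # expected disagreement attributable to chance
    d_e = sum(n_label[a] * n_label[b] for a in labels for b in labels if a != b) / (n_total - 1)
    return 1.0 - d_o / d_e
```

On the three-annotator example below, this sketch reproduces the documented value of roughly 0.4222.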
- Parameters
all_annotators_data (array_like, (m_annotators, n_samples)) – Reliability data: a list or array of the labels given to n_samples items by m_annotators. Missing labels are represented with None.
- Return type
float
- Returns
alpha (float) – The alpha coefficient. A value of 1 indicates perfect reliability, and a value of 0 indicates the absence of reliability (agreement no better than chance).
- Raises
AssertionError – Raised if the lists of labels within all_annotators_data differ in size or if there is only one label to be compared.
References
- [1] K. Krippendorff, "Computing Krippendorff's alpha-reliability," ScholarlyCommons, 25-Jan-2011, pp. 8-10. [Online]. Available: https://repository.upenn.edu/asc_papers/43/
Examples
Three annotators labelled six items. Some labels are missing.
>>> annotator_A = ['yes', 'yes', 'no', 'no', 'yes', None]
>>> annotator_B = [None, 'yes', 'no', 'yes', 'yes', 'no']
>>> annotator_C = ['yes', 'no', 'no', 'yes', 'yes', None]
>>> krippendorff_alpha([annotator_A, annotator_B, annotator_C])
0.42222222222222217