medkit.core.text.span_utils

Functions:

clean_up_gaps_in_normalized_spans(spans, text)

Remove small gaps in normalized spans.

concatenate(texts, all_spans)

Concatenate text and span objects

extract(text, spans, ranges)

Extract parts of a text as well as its associated spans

insert(text, spans, positions, insertion_texts)

Insert strings into a text and update its associated spans accordingly

move(text, spans, range, destination)

Move part of a text to another position, also moving its associated spans

normalize_spans(spans)

Return a transformed copy of spans in which all instances of ModifiedSpan are replaced by the spans they refer to, spans are sorted, and contiguous spans are merged.

remove(text, spans, ranges)

Remove parts of a text and update its associated spans accordingly

replace(text, spans, ranges, replacement_texts)

Replace parts of a text and update its associated spans accordingly

replace(text, spans, ranges, replacement_texts)

Replace parts of a text and update its associated spans accordingly

Parameters
  • text (str) – The text in which some parts will be replaced

  • spans (List[AnySpan]) – The spans associated with text

  • ranges (List[Tuple[int, int]]) – The ranges of the parts that will be replaced (end excluded), sorted by ascending order

  • replacement_texts (List[str]) – The strings to use as replacements (must be the same length as ranges)

Return type

Tuple[str, List[AnySpan]]

Returns

  • text – The updated text

  • spans – The spans associated with the updated text

Example

>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> ranges = [(0, 5), (18, 22)]
>>> replacements = ["Hi", "Jane"]
>>> text, spans = replace(text, spans, ranges, replacements)
>>> print(text)
Hi, my name is Jane Doe.

remove(text, spans, ranges)

Remove parts of a text and update its associated spans accordingly

Parameters
  • text (str) – The text in which some parts will be removed

  • spans (List[AnySpan]) – The spans associated with text

  • ranges (List[Tuple[int, int]]) – The ranges of the parts that will be removed (end excluded), sorted by ascending order

Return type

Tuple[str, List[AnySpan]]

Returns

  • text – The updated text

  • spans – The spans associated with the updated text
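
The effect on the text alone can be sketched in plain Python, without medkit. The helper name remove_ranges_from_text is hypothetical, and the span bookkeeping that remove() performs is not reproduced; the sketch only shows how sorted, end-excluded ranges are interpreted against the original text.

```python
from typing import List, Tuple


def remove_ranges_from_text(text: str, ranges: List[Tuple[int, int]]) -> str:
    """Hypothetical helper: drop each [start, end) range from text.

    Ranges are assumed to be sorted in ascending order, non-overlapping,
    and expressed in the coordinates of the original text.
    """
    kept = []
    cursor = 0
    for start, end in ranges:
        kept.append(text[cursor:start])  # keep what lies before the range
        cursor = end                     # skip the range itself
    kept.append(text[cursor:])           # keep the tail after the last range
    return "".join(kept)


text = "Hello, my name is John Doe."
result = remove_ranges_from_text(text, [(5, 6), (22, 26)])
print(result)  # Hello my name is John.
```

Because every range refers to the original text, the second range does not need to be shifted to account for the first removal.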

extract(text, spans, ranges)

Extract parts of a text as well as its associated spans

Parameters
  • text (str) – The text to extract parts from

  • spans (List[AnySpan]) – The spans associated with text

  • ranges (List[Tuple[int, int]]) – The ranges of the parts to extract (end excluded), sorted by ascending order

Return type

Tuple[str, List[AnySpan]]

Returns

  • text – The extracted text

  • spans – The spans associated with the extracted text
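
A plain-Python sketch of the text side of extraction, under the assumption that the extracted parts are joined in range order with nothing inserted between them. The helper name extract_text is hypothetical, and the accompanying span computation is not shown.

```python
from typing import List, Tuple


def extract_text(text: str, ranges: List[Tuple[int, int]]) -> str:
    """Hypothetical helper: join the [start, end) parts of text in order.

    Assumption: parts are concatenated directly, with no separator.
    """
    return "".join(text[start:end] for start, end in ranges)


text = "Hello, my name is John Doe."
extracted = extract_text(text, [(0, 5), (17, 22)])
print(extracted)  # Hello John
```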

insert(text, spans, positions, insertion_texts)

Insert strings into a text and update its associated spans accordingly

Parameters
  • text (str) – The text in which some strings will be inserted

  • spans (List[AnySpan]) – The spans associated with text

  • positions (List[int]) – The positions where the strings will be inserted, sorted by ascending order

  • insertion_texts (List[str]) – The strings to insert (must be the same length as positions)

Return type

Tuple[str, List[AnySpan]]

Returns

  • text – The updated text

  • spans – The spans associated with the updated text

Example

>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> positions = [5]
>>> inserts = [" everybody"]
>>> text, spans = insert(text, spans, positions, inserts)
>>> print(text)
Hello everybody, my name is John Doe.

move(text, spans, range, destination)

Move part of a text to another position, also moving its associated spans

Parameters
  • text (str) – The text in which a part should be moved

  • spans (List[AnySpan]) – The spans associated with text

  • range (Tuple[int, int]) – The range of the part to move (end excluded)

  • destination (int) – The position at which to insert the moved part

Return type

Tuple[str, List[AnySpan]]

Returns

  • text – The updated text

  • spans – The spans associated with the updated text

Example

>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> range = (17, 22)
>>> dest = len(text) - 1
>>> text, spans = move(text, spans, range, dest)
>>> print(text)
Hello, my name is Doe John.
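
The text side of a move can be sketched with plain string slicing. The helper name move_in_text is hypothetical, the span updates are not reproduced, and the sketch assumes the destination is given in original-text coordinates and lies outside the moved range.

```python
from typing import Tuple


def move_in_text(text: str, source_range: Tuple[int, int], destination: int) -> str:
    """Hypothetical helper: move text[start:end] so it starts at destination.

    Assumptions: destination is expressed in the coordinates of the
    original text and does not fall strictly inside the moved range.
    """
    start, end = source_range
    moved = text[start:end]
    remaining = text[:start] + text[end:]
    # Once the part is cut out, positions after it shift left by its length.
    if destination >= end:
        destination -= end - start
    return remaining[:destination] + moved + remaining[destination:]


text = "Hello, my name is John Doe."
moved_text = move_in_text(text, (17, 22), len(text) - 1)
print(moved_text)  # Hello, my name is Doe John.
```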

normalize_spans(spans)

Return a transformed copy of spans in which all instances of ModifiedSpan are replaced by the spans they refer to, spans are sorted, and contiguous spans are merged.

Parameters

spans (List[AnySpan]) – The spans associated with a text, including additional spans if insertions or replacements were performed

Return type

List[Span]

Returns

normalized_spans – Spans in spans normalized as described

Examples

>>> spans = [Span(0, 10), Span(20, 30), ModifiedSpan(8, replaced_spans=[Span(30, 36)])]
>>> spans = normalize_spans(spans)
>>> print(spans)
[Span(0, 10), Span(20, 36)]
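
The sort-and-merge step described above can be sketched on plain (start, end) tuples. This is an illustration only: the helper name sort_and_merge is hypothetical, the replacement of ModifiedSpan instances by the spans they refer to is not modeled, and only exactly contiguous spans (one ends where the next begins) are merged.

```python
from typing import List, Tuple


def sort_and_merge(spans: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Hypothetical helper: sort (start, end) tuples and merge contiguous ones."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and merged[-1][1] == start:
            # The previous span ends exactly where this one starts: extend it.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Tuples standing in for Span(0, 10), Span(20, 30) and the replaced Span(30, 36)
normalized = sort_and_merge([(20, 30), (0, 10), (30, 36)])
print(normalized)  # [(0, 10), (20, 36)]
```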

concatenate(texts, all_spans)

Concatenate text and span objects

Return type

Tuple[str, List[AnySpan]]
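
The parameters are not documented here, so the following is only a guess at the intended behavior, written in plain Python with (start, end) tuples standing in for span objects. It assumes texts is a list of strings, all_spans is a list of span lists (one per text), the texts are joined in order, and the span lists are chained in the same order without being modified. The helper name concatenate_sketch is hypothetical.

```python
from typing import List, Tuple

SpanTuple = Tuple[int, int]  # stand-in for a span object


def concatenate_sketch(
    texts: List[str], all_spans: List[List[SpanTuple]]
) -> Tuple[str, List[SpanTuple]]:
    """Hypothetical helper: join texts and chain their span lists in order.

    Assumption: spans are passed through unchanged, since they describe
    positions in the original text rather than in each fragment.
    """
    text = "".join(texts)
    spans = [span for spans in all_spans for span in spans]
    return text, spans


joined_text, joined_spans = concatenate_sketch(
    ["Hello, ", "world"], [[(0, 7)], [(7, 12)]]
)
print(joined_text)   # Hello, world
print(joined_spans)  # [(0, 7), (7, 12)]
```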

clean_up_gaps_in_normalized_spans(spans, text, max_gap_length=3)

Remove small gaps in normalized spans.

This is useful for converting non-contiguous entity spans with small gaps containing only whitespace or a few meaningless characters (due to clean-up preprocessing or translation) into a single larger span. Gaps shorter than max_gap_length are removed by merging the spans before and after the gap.

Parameters
  • spans (List[Span]) – The normalized spans in which to remove gaps

  • text (str) – The text associated with spans

  • max_gap_length (int) – Max number of characters in gaps, after stripping leading and trailing whitespace.

Examples

>>> text = "heart failure"
>>> spans = [Span(0, 5), Span(6, 13)]
>>> spans = clean_up_gaps_in_normalized_spans(spans, text)
>>> print(spans)
[Span(0, 13)]
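
The gap-merging rule can be sketched on plain (start, end) tuples. The helper name merge_small_gaps is hypothetical, and the strict "less than" comparison follows the wording of the description above rather than a reading of the implementation, so treat the boundary case as an assumption.

```python
from typing import List, Tuple


def merge_small_gaps(
    spans: List[Tuple[int, int]], text: str, max_gap_length: int = 3
) -> List[Tuple[int, int]]:
    """Hypothetical helper: merge sorted spans separated by a small gap.

    A gap is measured after stripping leading and trailing whitespace.
    Assumption: a gap is removed when its stripped length is strictly
    less than max_gap_length.
    """
    merged: List[Tuple[int, int]] = []
    for start, end in spans:
        if merged:
            prev_start, prev_end = merged[-1]
            gap = text[prev_end:start].strip()
            if len(gap) < max_gap_length:
                merged[-1] = (prev_start, end)  # absorb the gap
                continue
        merged.append((start, end))
    return merged


print(merge_small_gaps([(0, 5), (6, 13)], "heart failure"))
# [(0, 13)]
print(merge_small_gaps([(0, 5), (17, 24)], "heart and kidney failure"))
# [(0, 5), (17, 24)]
```

In the second call the gap " and kidney " still holds ten characters after stripping, so the two spans stay separate.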