medkit.core.text.span_utils
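Functions:
- clean_up_gaps_in_normalized_spans(spans, text, max_gap_length=3) – Remove small gaps in normalized spans.
- concatenate(texts, all_spans) – Concatenate texts and their associated spans.
- extract(text, spans, ranges) – Extract parts of a text as well as its associated spans.
- insert(text, spans, positions, insertion_texts) – Insert strings in a text, and update its associated spans accordingly.
- move(text, spans, range, destination) – Move part of a text to another position, also moving its associated spans.
- normalize_spans(spans) – Return a transformed version of spans in which all instances of ModifiedSpan are replaced by the spans they refer to, the spans are sorted, and contiguous spans are merged.
- remove(text, spans, ranges) – Remove parts of a text, and remove the corresponding parts of its associated spans.
- replace(text, spans, ranges, replacement_texts) – Replace parts of a text, and update its associated spans accordingly.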
- replace(text, spans, ranges, replacement_texts)
Replace parts of a text, and update its associated spans accordingly.
- Parameters
  - text (str) – The text in which some parts will be replaced
  - spans (List[AnySpan]) – The spans associated with text
  - ranges (List[Tuple[int, int]]) – The ranges of the parts that will be replaced (end excluded), sorted in ascending order
  - replacement_texts – The strings to use as replacements (must be the same length as ranges)
- Return type
  Tuple[str, List[AnySpan]]
- Returns
  text – The updated text
  spans – The spans associated with the updated text
Example
>>> from medkit.core.text import Span
>>> from medkit.core.text.span_utils import replace
>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> ranges = [(0, 5), (18, 22)]
>>> replacements = ["Hi", "Jane"]
>>> text, spans = replace(text, spans, ranges, replacements)
>>> print(text)
Hi, my name is Jane Doe.
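To make the span bookkeeping behind a replacement concrete, here is a minimal pure-Python sketch. It is not medkit's implementation. It assumes a simplified model in which the input spans are plain [start, end) ranges into an original text whose lengths add up to len(text), and it uses hypothetical Span and Modified dataclasses as stand-ins for medkit's Span and ModifiedSpan; replace_sketch and _to_spans are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for medkit's Span: [start, end) in the original text."""

    start: int
    end: int


@dataclass
class Modified:
    """Hypothetical stand-in for medkit's ModifiedSpan."""

    length: int  # length of the new text
    replaced_spans: List[Span]  # original spans covered by the replaced text


def _to_spans(offsets: List[int]) -> List[Span]:
    """Merge runs of consecutive original offsets into spans."""
    spans: List[Span] = []
    for i in offsets:
        if spans and spans[-1].end == i:
            spans[-1] = Span(spans[-1].start, i + 1)
        else:
            spans.append(Span(i, i + 1))
    return spans


def replace_sketch(
    text: str,
    spans: List[Span],
    ranges: List[Tuple[int, int]],
    replacement_texts: List[str],
) -> Tuple[str, List[Union[Span, Modified]]]:
    assert len(ranges) == len(replacement_texts)
    # One original-text offset per character of the current text.
    offsets = [i for s in spans for i in range(s.start, s.end)]
    assert len(offsets) == len(text), "spans must cover the text exactly"

    parts: List[str] = []
    new_spans: List[Union[Span, Modified]] = []
    cursor = 0
    for (start, end), new in zip(ranges, replacement_texts):
        # Text before the range is unchanged and keeps its original spans.
        parts.append(text[cursor:start])
        new_spans.extend(_to_spans(offsets[cursor:start]))
        # The replacement records the spans of the text it replaces.
        parts.append(new)
        new_spans.append(Modified(len(new), _to_spans(offsets[start:end])))
        cursor = end
    # Text after the last range is unchanged too.
    parts.append(text[cursor:])
    new_spans.extend(_to_spans(offsets[cursor:]))
    return "".join(parts), new_spans


text, spans = replace_sketch(
    "Hello, my name is John Doe.", [Span(0, 27)], [(0, 5), (18, 22)], ["Hi", "Jane"]
)
print(text)  # Hi, my name is Jane Doe.
```

In this sketch, characters outside the replaced ranges keep their original spans, and each replacement string gets a Modified entry whose replaced_spans record where the replaced text came from. That is the kind of information normalize_spans uses when it maps ModifiedSpan instances back to the spans they refer to.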
- remove(text, spans, ranges)
Remove parts of a text, and remove the corresponding parts of its associated spans.
- Parameters
  - text (str) – The text in which some parts will be removed
  - spans (List[AnySpan]) – The spans associated with text
  - ranges (List[Tuple[int, int]]) – The ranges of the parts that will be removed (end excluded), sorted in ascending order
- Return type
  Tuple[str, List[AnySpan]]
- Returns
  text – The updated text
  spans – The spans associated with the updated text
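The effect of a removal on the spans can be illustrated with a small pure-Python sketch. It is not medkit's implementation. It assumes a simplified model in which the spans are plain [start, end) ranges into an original text that together cover the current text exactly, and it uses a hypothetical Span dataclass as a stand-in for medkit's Span; remove_sketch and _to_spans are illustrative names. Each character is mapped to its original offset, the characters inside the removed ranges are dropped, and the remaining offsets are merged back into spans.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for medkit's Span: [start, end) in the original text."""

    start: int
    end: int


def _to_spans(offsets: List[int]) -> List[Span]:
    """Merge runs of consecutive original offsets into spans."""
    spans: List[Span] = []
    for i in offsets:
        if spans and spans[-1].end == i:
            spans[-1] = Span(spans[-1].start, i + 1)
        else:
            spans.append(Span(i, i + 1))
    return spans


def remove_sketch(
    text: str, spans: List[Span], ranges: List[Tuple[int, int]]
) -> Tuple[str, List[Span]]:
    # One original-text offset per character of the current text.
    offsets = [i for s in spans for i in range(s.start, s.end)]
    assert len(offsets) == len(text), "spans must cover the text exactly"
    removed = set()
    for start, end in ranges:
        removed.update(range(start, end))
    # Keep every character (and its original offset) outside the removed ranges.
    kept = [k for k in range(len(text)) if k not in removed]
    new_text = "".join(text[k] for k in kept)
    return new_text, _to_spans([offsets[k] for k in kept])


text, spans = remove_sketch("Hello, my name is John Doe.", [Span(0, 27)], [(5, 6), (17, 22)])
print(text)  # Hello my name is Doe.
print(spans)  # [Span(start=0, end=5), Span(start=6, end=17), Span(start=22, end=27)]
```

A span that a removed range cuts through is split in two, so every remaining character still points to its position in the original text.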
- extract(text, spans, ranges)
Extract parts of a text as well as its associated spans.
- Parameters
  - text (str) – The text to extract parts from
  - spans (List[AnySpan]) – The spans associated with text
  - ranges (List[Tuple[int, int]]) – The ranges of the parts to extract (end excluded), sorted in ascending order
- Return type
  Tuple[str, List[AnySpan]]
- Returns
  text – The extracted text
  spans – The spans associated with the extracted text
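A small pure-Python sketch can show how extracted parts keep pointing at the original text. It is not medkit's implementation. It assumes that the extracted parts are concatenated in order into a single string, that the spans are plain [start, end) ranges into an original text covering the current text exactly, and it uses a hypothetical Span dataclass as a stand-in for medkit's Span; extract_sketch and _to_spans are illustrative names. In the usage line, the text is assumed to start at offset 10 of some larger original document.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for medkit's Span: [start, end) in the original text."""

    start: int
    end: int


def _to_spans(offsets: List[int]) -> List[Span]:
    """Merge runs of consecutive original offsets into spans."""
    spans: List[Span] = []
    for i in offsets:
        if spans and spans[-1].end == i:
            spans[-1] = Span(spans[-1].start, i + 1)
        else:
            spans.append(Span(i, i + 1))
    return spans


def extract_sketch(
    text: str, spans: List[Span], ranges: List[Tuple[int, int]]
) -> Tuple[str, List[Span]]:
    # One original-text offset per character of the current text.
    offsets = [i for s in spans for i in range(s.start, s.end)]
    assert len(offsets) == len(text), "spans must cover the text exactly"
    # Positions (in the current text) of every character to keep, in order.
    picked = [k for start, end in ranges for k in range(start, end)]
    new_text = "".join(text[k] for k in picked)
    return new_text, _to_spans([offsets[k] for k in picked])


text, spans = extract_sketch("Hello, my name is John Doe.", [Span(10, 37)], [(0, 5), (18, 22)])
print(text)  # HelloJohn
print(spans)  # [Span(start=10, end=15), Span(start=28, end=32)]
```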
- insert(text, spans, positions, insertion_texts)
Insert strings in a text, and update its associated spans accordingly.
- Parameters
  - text (str) – The text in which some strings will be inserted
  - spans (List[AnySpan]) – The spans associated with text
  - positions (List[int]) – The positions where the strings will be inserted, sorted in ascending order
  - insertion_texts (List[str]) – The strings to insert (must be the same length as positions)
- Return type
  Tuple[str, List[AnySpan]]
- Returns
  text – The updated text
  spans – The spans associated with the updated text
Example
>>> from medkit.core.text import Span
>>> from medkit.core.text.span_utils import insert
>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> positions = [5]
>>> inserts = [" everybody"]
>>> text, spans = insert(text, spans, positions, inserts)
>>> print(text)
Hello everybody, my name is John Doe.
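Inserted characters have no counterpart in the original text, so the spans need a way to represent them. The pure-Python sketch below is not medkit's implementation: it uses a hypothetical Modified dataclass with an empty replaced_spans list for inserted text (assumed here to mirror how ModifiedSpan could represent a pure insertion), a hypothetical Span dataclass for original ranges, and assumes the input spans are plain [start, end) ranges covering the text exactly. insert_sketch and _to_spans are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for medkit's Span: [start, end) in the original text."""

    start: int
    end: int


@dataclass
class Modified:
    """Hypothetical stand-in for medkit's ModifiedSpan; an insertion replaces nothing."""

    length: int
    replaced_spans: List[Span] = field(default_factory=list)


def _to_spans(offsets: List[Optional[int]]) -> List[Union[Span, Modified]]:
    """Merge runs of offsets into spans; None marks an inserted character."""
    out: List[Union[Span, Modified]] = []
    for i in offsets:
        last = out[-1] if out else None
        if i is None:
            if isinstance(last, Modified):
                last.length += 1
            else:
                out.append(Modified(1))
        elif isinstance(last, Span) and last.end == i:
            out[-1] = Span(last.start, i + 1)
        else:
            out.append(Span(i, i + 1))
    return out


def insert_sketch(
    text: str, spans: List[Span], positions: List[int], insertion_texts: List[str]
) -> Tuple[str, List[Union[Span, Modified]]]:
    assert len(positions) == len(insertion_texts)
    offsets: List[Optional[int]] = [i for s in spans for i in range(s.start, s.end)]
    assert len(offsets) == len(text), "spans must cover the text exactly"
    # Insert from right to left so that the remaining positions stay valid.
    for pos, new in reversed(list(zip(positions, insertion_texts))):
        text = text[:pos] + new + text[pos:]
        offsets[pos:pos] = [None] * len(new)
    return text, _to_spans(offsets)


text, spans = insert_sketch("Hello, my name is John Doe.", [Span(0, 27)], [5], [" everybody"])
print(text)  # Hello everybody, my name is John Doe.
```

Processing the positions from the last to the first is what lets the sketch use positions expressed in the coordinates of the original input text.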
- move(text, spans, range, destination)
Move part of a text to another position, also moving its associated spans.
- Parameters
  - text (str) – The text in which a part should be moved
  - spans (List[AnySpan]) – The spans associated with text
  - range (Tuple[int, int]) – The range of the part to move (end excluded)
  - destination (int) – The position at which the moved part will be inserted
- Return type
  Tuple[str, List[AnySpan]]
- Returns
  text – The updated text
  spans – The spans associated with the updated text
Example
>>> from medkit.core.text import Span
>>> from medkit.core.text.span_utils import move
>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> range = (17, 22)
>>> dest = len(text) - 1
>>> text, spans = move(text, spans, range, dest)
>>> print(text)
Hello, my name is Doe John.
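A pure-Python sketch of a move shows that the moved characters carry their original offsets with them, which typically breaks the spans into out-of-order pieces. It is not medkit's implementation. It assumes spans are plain [start, end) ranges covering the text exactly, that destination is expressed in the coordinates of the text before the move and does not fall strictly inside the moved range, and it uses a hypothetical Span dataclass; move_sketch and _to_spans are illustrative names, and the parameter is called range_ to avoid shadowing Python's built-in range.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for medkit's Span: [start, end) in the original text."""

    start: int
    end: int


def _to_spans(offsets: List[int]) -> List[Span]:
    """Merge runs of consecutive original offsets into spans."""
    spans: List[Span] = []
    for i in offsets:
        if spans and spans[-1].end == i:
            spans[-1] = Span(spans[-1].start, i + 1)
        else:
            spans.append(Span(i, i + 1))
    return spans


def move_sketch(
    text: str, spans: List[Span], range_: Tuple[int, int], destination: int
) -> Tuple[str, List[Span]]:
    start, end = range_
    assert not start < destination < end, "destination must not fall inside the moved range"
    offsets = [i for s in spans for i in range(s.start, s.end)]
    assert len(offsets) == len(text), "spans must cover the text exactly"
    # Pair each character with its original offset so both move together.
    pairs = list(zip(text, offsets))
    moved = pairs[start:end]
    if destination >= end:
        pairs = pairs[:start] + pairs[end:destination] + moved + pairs[destination:]
    else:
        pairs = pairs[:destination] + moved + pairs[destination:start] + pairs[end:]
    new_text = "".join(c for c, _ in pairs)
    return new_text, _to_spans([o for _, o in pairs])


text, spans = move_sketch("Hello, my name is John Doe.", [Span(0, 27)], (17, 22), 26)
print(text)  # Hello, my name is Doe John.
print(spans)
```

The resulting spans are no longer sorted; sorting and merging them is the job of a normalization step such as normalize_spans.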
- normalize_spans(spans)
Return a transformed version of spans in which all instances of ModifiedSpan are replaced by the spans they refer to, the spans are sorted, and contiguous spans are merged.
- Parameters
  - spans (List[AnySpan]) – The spans associated with a text, including additional spans if insertions or replacements were performed
- Return type
  List[Span]
- Returns
  normalized_spans – The spans in spans, normalized as described above
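Examples
>>> from medkit.core.text import ModifiedSpan, Span
>>> from medkit.core.text.span_utils import normalize_spans
>>> spans = [Span(0, 10), Span(20, 30), ModifiedSpan(8, replaced_spans=[Span(30, 36)])]
>>> spans = normalize_spans(spans)
>>> print(spans)
[Span(0, 10), Span(20, 36)]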
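The three normalization steps described above (substitute, sort, merge) can be sketched in a few lines of plain Python. This is not medkit's implementation: Span and Modified are hypothetical dataclasses standing in for medkit's Span and ModifiedSpan, normalize_spans_sketch is an illustrative name, and merging overlapping spans (not only touching ones) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for medkit's Span: [start, end) in the original text."""

    start: int
    end: int


@dataclass
class Modified:
    """Hypothetical stand-in for medkit's ModifiedSpan."""

    length: int
    replaced_spans: List[Span]


def normalize_spans_sketch(spans: List[Union[Span, Modified]]) -> List[Span]:
    # 1. Replace each Modified entry by the original spans it refers to.
    flat: List[Span] = []
    for s in spans:
        if isinstance(s, Modified):
            flat.extend(s.replaced_spans)
        else:
            flat.append(s)
    # 2. Sort by position in the original text.
    flat.sort(key=lambda s: (s.start, s.end))
    # 3. Merge spans that touch (and, as an assumption here, spans that overlap).
    merged: List[Span] = []
    for s in flat:
        if merged and s.start <= merged[-1].end:
            merged[-1] = Span(merged[-1].start, max(merged[-1].end, s.end))
        else:
            merged.append(s)
    return merged


print(normalize_spans_sketch([Span(0, 10), Span(20, 30), Modified(8, [Span(30, 36)])]))
# [Span(start=0, end=10), Span(start=20, end=36)]
```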
- concatenate(texts, all_spans)
Concatenate texts and their associated spans.
- Return type
  Tuple[str, List[AnySpan]]
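A minimal sketch of concatenation, assuming (from the parameter names only) that all_spans holds one list of spans per text, in the same order as texts. Because each text's spans already describe its own characters in order, joining the texts end to end lets the span lists simply be chained. This is not medkit's implementation; concatenate_sketch and the Span dataclass are hypothetical, and the sketch does not merge spans that become adjacent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for medkit's Span: [start, end) in the original text."""

    start: int
    end: int


def concatenate_sketch(texts: List[str], all_spans: List[List[Span]]) -> Tuple[str, List[Span]]:
    assert len(texts) == len(all_spans), "one list of spans per text"
    text = "".join(texts)
    # Chain the span lists in the same order as the texts.
    spans = [span for span_list in all_spans for span in span_list]
    return text, spans


text, spans = concatenate_sketch(["Hello", " world"], [[Span(0, 5)], [Span(20, 26)]])
print(text)  # Hello world
print(spans)  # [Span(start=0, end=5), Span(start=20, end=26)]
```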
- clean_up_gaps_in_normalized_spans(spans, text, max_gap_length=3)
Remove small gaps in normalized spans.
This is useful for converting non-contiguous entity spans with small gaps, containing only whitespace or a few meaningless characters (left over from clean-up preprocessing or translation), into a single larger span. Gaps shorter than max_gap_length characters, after stripping leading and trailing whitespace, are removed by merging the spans before and after them.
- Parameters
  - spans (List[Span]) – The normalized spans in which to remove gaps
  - text (str) – The text associated with spans
  - max_gap_length (int) – Maximum number of characters in a gap, after stripping leading and trailing whitespace
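Examples
>>> from medkit.core.text import Span
>>> from medkit.core.text.span_utils import clean_up_gaps_in_normalized_spans
>>> text = "heart failure"
>>> spans = [Span(0, 5), Span(6, 13)]
>>> spans = clean_up_gaps_in_normalized_spans(spans, text)
>>> print(spans)
[Span(0, 13)]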
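The gap-merging rule can be sketched as a single left-to-right pass. This is not medkit's implementation: clean_up_gaps_sketch and the Span dataclass are hypothetical, the spans are assumed to be sorted and to index directly into text (as in the example above), and the comparison uses the strict "shorter than" reading of the description, whereas the parameter description ("Maximum number of characters") could also be read as an inclusive bound.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for medkit's Span: [start, end) in the text."""

    start: int
    end: int


def clean_up_gaps_sketch(spans: List[Span], text: str, max_gap_length: int = 3) -> List[Span]:
    if not spans:
        return []
    cleaned = [spans[0]]
    for span in spans[1:]:
        prev = cleaned[-1]
        # Characters between the two spans, ignoring surrounding whitespace.
        gap = text[prev.end : span.start].strip()
        if len(gap) < max_gap_length:
            # Small gap: merge the previous span and this one into one span.
            cleaned[-1] = Span(prev.start, span.end)
        else:
            cleaned.append(span)
    return cleaned


print(clean_up_gaps_sketch([Span(0, 5), Span(6, 13)], "heart failure"))
# [Span(start=0, end=13)]
```

A gap such as " / " strips down to one character and is merged, while a gap such as " (left) " strips down to six characters and keeps the spans separate.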