medkit.tools.mtsamples
This module provides helpers for accessing sample mtsamples files available in this repository: https://github.com/neurazlab/mtsamplesFR
Refer to the repository for more information.
This repository contains:
- a version of mtsamples.csv
  Source: https://www.kaggle.com/datasets/tboyle10/medicaltranscriptions
  License: CC0: Public Domain
- a mtsamples_translation.json file, which is a translation into French
Date: 08/04/2022
Functions:
- convert_mtsamples_to_medkit: Convert mtsamples data into a medkit file
- load_mtsamples: Load mtsamples data into medkit text documents
- load_mtsamples(cache_dir='.cache', translated=True, nb_max=None)
Load mtsamples data into medkit text documents.
- Parameters
  cache_dir (Union[Path, str]) – Directory in which to store the mtsamples file. Default: .cache
  translated (bool) – If True (default), mtsamples_translated.json file is used (FR). If False, mtsamples.csv is used (EN)
  nb_max (Optional[int]) – Maximum number of documents to load
- Return type
  List[TextDocument]
- Returns
  The medkit text documents corresponding to mtsamples data
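As an illustration of the signature above, here is a minimal sketch of loading a handful of documents and previewing their text. The helper name `preview_mtsamples` and the 80-character cut-off are hypothetical choices made for this example. The sketch assumes that medkit is installed, that each returned `TextDocument` exposes its raw text through a `text` attribute, and that the first call may need network access to fill the cache directory.

```python
from pathlib import Path


def preview_mtsamples(nb_max=5, translated=True, cache_dir=".cache"):
    """Return the first 80 characters of up to `nb_max` mtsamples documents."""
    # Imported lazily so that medkit is only required when the helper is called.
    from medkit.tools.mtsamples import load_mtsamples

    docs = load_mtsamples(
        cache_dir=Path(cache_dir),  # a str is accepted as well
        translated=translated,      # True: French translation, False: English CSV
        nb_max=nb_max,              # None would load every document
    )
    # Assumption: TextDocument stores its raw text in the `text` attribute.
    return [doc.text[:80] for doc in docs]
```

Calling `preview_mtsamples(nb_max=3, translated=False)` would then return at most three short snippets taken from the English data.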
- convert_mtsamples_to_medkit(output_file, encoding='utf-8', cache_dir='.cache', translated=True)
Convert mtsamples data into a medkit file
- Parameters
  output_file (Union[Path, str]) – Path to the medkit jsonl file to generate
  encoding (Optional[str]) – Encoding of the medkit file to generate
  cache_dir (Union[Path, str]) – Directory where the mtsamples file is cached. Default: .cache
  translated (bool) – If True (default), mtsamples_translated.json file is used (FR). If False, mtsamples.csv is used (EN)
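The conversion function can be exercised in a similar way. The following is a rough sketch, not a reference implementation: the helper name `export_mtsamples` is hypothetical, and the line count relies on the assumption that the generated jsonl file holds one record per non-empty line. medkit must be installed, and the first call may need network access to populate the cache.

```python
from pathlib import Path


def export_mtsamples(output_file, translated=True, cache_dir=".cache"):
    """Write mtsamples to a medkit jsonl file; return (path, non-empty line count)."""
    # Imported lazily so that medkit is only required when the helper is called.
    from medkit.tools.mtsamples import convert_mtsamples_to_medkit

    output_path = Path(output_file)
    convert_mtsamples_to_medkit(
        output_file=output_path,
        encoding="utf-8",
        cache_dir=cache_dir,
        translated=translated,  # True: French translation, False: English CSV
    )
    # Assumption: a jsonl file holds one record per non-empty line.
    with output_path.open(encoding="utf-8") as f:
        nb_lines = sum(1 for line in f if line.strip())
    return output_path, nb_lines
```

For instance, `export_mtsamples("mtsamples_fr.jsonl")` would write the French data to `mtsamples_fr.jsonl` and report how many lines the file contains.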