[scikit-learn] Call for Participation ClinSpEn at Biomedical WMT Shared Task (WMT/EMNLP 2022)

Martin Krallinger krallinger.martin at gmail.com
Wed Jul 20 07:23:18 EDT 2022


Call for Participation ClinSpEn @ Biomedical WMT Shared Task (WMT/EMNLP
2022)

Automatic Translation of Clinical cases, ontologies & medical entities:
Spanish - English

https://temu.bsc.es/clinspen/



ClinSpEn is part of the Biomedical WMT 2022 shared task. It aims to
promote the development and evaluation of machine translation systems
adapted to the medical domain through three sub-tracks: clinical cases,
medical controlled vocabularies/ontologies, and clinical terms and
entities extracted from medical content.



Key information:

   - ClinSpEn sub-track: https://temu.bsc.es/clinspen/
   - Biomedical WMT: https://statmt.org/wmt22/biomedical-translation-task.html
   - Main WMT: https://statmt.org/wmt22/
   - EMNLP conference: https://2022.emnlp.org/

   - Sample/Training Data:
      - Clinical Cases: https://doi.org/10.5281/zenodo.6497350
      - Clinical Terms: https://doi.org/10.5281/zenodo.6497372
      - Ontology Concepts: https://doi.org/10.5281/zenodo.6497388

   - BioWMT Registration Form: https://tinyurl.com/mtvdytmt
   - ClinSpEn Registration Form (for support and updates):
     https://temu.bsc.es/clinspen/registration/


Motivation

Machine translation applied to the clinical domain is an especially
challenging task due to the complexity of medical language and the heavy
use of health-related technical terms and medical expressions. This is why
there is a large community of specialized medical translators able to deal
with medical narratives, terminologies and ambiguous abbreviations and
acronyms.

Given the relevance, impact and diversity of health-related content, as
well as the rapidly growing number of publications, EHRs, clinical trials,
informed consent documents and medical terminologies, there is a pressing
need for more robust medical machine translation resources, together with
independent quality evaluation scenarios.

Recent advances in machine translation technologies, together with the use
of other NLP components, are showing promising results. Domain adaptation
of MT approaches can therefore have a significant impact in unlocking key
information from medical content.

The ClinSpEn sub-task of Biomedical WMT proposes three sub-tracks, each
associated with a highly relevant medical machine translation application
scenario:

   - ClinSpEn-CC (Clinical Cases): translation of clinical case documents
     from English to Spanish, a type of document relevant for processing
     both medical literature and clinical records.

   - ClinSpEn-CT (Clinical Terms): translation of clinical terms and entity
     mentions from Spanish to English. The terms were extracted directly
     from medical literature and clinical records, with a particular focus
     on diseases, symptoms, findings, procedures and professions.

   - ClinSpEn-OC (Ontology Concepts): translation of clinical controlled
     vocabularies and ontology concepts from English to Spanish. Ontologies
     and structured vocabularies are a key resource for semantic
     interoperability, entity linking, biomedical knowledge bases and
     precision medicine, so there is a pressing need to generate
     multilingual biomedical ontologies for a range of clinical
     applications.



A reasonably sized sample set for each data type has been released.
Participants may use it to test their existing systems or try out new ones.

In addition to the test set, manually translated by professional medical
translators, participants will also have access to a larger background
collection for each of the three sub-tracks. These collections can serve
as additional resources and support the assessment of the scalability and
robustness of machine translation technology.


Schedule

   - Test and Background Set Release: July 21st, 2022
   - Participant Predictions Due: July 28th, 2022
   - Paper Submission Deadline: September 7th, 2022
   - Notification of Acceptance (peer reviews): October 9th, 2022
   - Camera-ready Version Due: October 16th, 2022
   - WMT @ EMNLP: December 7th and 8th, 2022


[All deadlines are in AoE (Anywhere on Earth)]


Registration

Participants must register using the official BioWMT Registration Form,
which is available at https://tinyurl.com/mtvdytmt.

Additionally, we’ve created a registration form specific to the ClinSpEn
sub-tracks, which will be used to keep participants updated. Register at:
https://temu.bsc.es/clinspen/registration/.

Publications and WMT workshop


Teams participating in the ClinSpEn sub-track of Biomedical WMT will be
invited to contribute a system description paper for the WMT 2022 Working
Notes proceedings. More information on the paper’s specifications,
formatting guidelines and review process is available at:
https://statmt.org/wmt22/index.html.

If you are interested in Machine Translation, the biomedical domain or
other language combinations, remember to check out the Biomedical WMT site
and the rest of this year’s sub-tracks and language pairs:
https://statmt.org/wmt22/biomedical-translation-task.html


ClinSpEn Organizers

   - Salvador Lima-López (Barcelona Supercomputing Center, Spain)
   - Darryl Johan Estrada (Barcelona Supercomputing Center, Spain)
   - Eulàlia Farré-Maduell (Barcelona Supercomputing Center, Spain)
   - Martin Krallinger (Barcelona Supercomputing Center, Spain)


Biomedical WMT Organizers

   - Rachel Bawden (University of Edinburgh, UK)
   - Giorgio Maria Di Nunzio (University of Padua, Italy)
   - Darryl Johan Estrada (Barcelona Supercomputing Center, Spain)
   - Eulàlia Farré-Maduell (Barcelona Supercomputing Center, Spain)
   - Cristian Grozea (Fraunhofer Institute, Germany)
   - Antonio Jimeno Yepes (University of Melbourne, Australia)
   - Salvador Lima-López (Barcelona Supercomputing Center, Spain)
   - Martin Krallinger (Barcelona Supercomputing Center, Spain)
   - Aurélie Névéol (Université Paris Saclay, CNRS, LISN, France)
   - Mariana Neves (German Federal Institute for Risk Assessment, Germany)
   - Roland Roller (DFKI, Germany)
   - Amy Siu (Beuth University of Applied Sciences, Germany)
   - Philippe Thomas (DFKI, Germany)
   - Federica Vezzani (University of Padua, Italy)
   - Maika Vicente Navarro (Maika Spanish Translator, Melbourne, Australia)
   - Dina Wiemann (Novartis, Switzerland)
   - Lana Yeganova (NCBI/NLM/NIH, USA)