[scikit-learn] MEDDOCAN Shared task for Named Entity Recognition and Classification with Scikit-Learn

Martin Krallinger krallinger.martin at gmail.com
Thu May 2 13:03:34 EDT 2019

*IberLEF/SEPLN: CFP MEDDOCAN track & task prize: named entity recognition
and sensitive personal information identification*

*CFP MEDDOCAN track*

*First Medical Document Anonymization*

*http://temu.bsc.es/meddocan*

*SEAD – Plan TL Sponsoring Track Awards*

Sub-tracks: 1,000€, 500€ and 200€ (first, second, third team)

*Task description*

Scikit-Learn has been successfully used for Named Entity Recognition and
Classification tasks in the past, showing that it is especially competitive
for finding mentions of entities in running text.
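
As a rough illustration of how such a system might be set up (this is not
part of the task materials), the sketch below treats entity recognition as
per-token classification with a Scikit-Learn pipeline. The feature function,
the toy sentences and the label names (NAME, LOCATION, O) are all invented
for the example and are not the MEDDOCAN tag set.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def token_features(tokens, i):
    """Simple, hypothetical features for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_digit": tok.isdigit(),
        "suffix3": tok[-3:].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


# Invented toy sentences with one label per token (O = not an entity).
train = [
    (["Paciente", "Juan", "Perez", "ingresa", "en", "Madrid"],
     ["O", "NAME", "NAME", "O", "O", "LOCATION"]),
    (["La", "doctora", "Ana", "Lopez", "trabaja", "en", "Bilbao"],
     ["O", "O", "NAME", "NAME", "O", "O", "LOCATION"]),
]

X = [token_features(toks, i) for toks, _ in train for i in range(len(toks))]
y = [label for _, labels in train for label in labels]

# DictVectorizer one-hot encodes the string features; the classifier
# then predicts one label per token.
model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

test_tokens = ["Paciente", "Ana", "Perez", "vive", "en", "Madrid"]
predicted = model.predict(
    [token_features(test_tokens, i) for i in range(len(test_tokens))]
)
print(list(zip(test_tokens, predicted)))
```

With only two training sentences the predictions are not meaningful; a real
system would train on the released training set and would probably use a
richer feature set.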

Clinical records with protected health information (PHI) cannot be directly
shared as is, due to privacy constraints, making it particularly cumbersome
to carry out NLP research in the medical domain. A necessary precondition
for accessing clinical records outside of hospitals is their
de-identification, i.e., the exhaustive removal (or replacement) of all
mentioned PHI phrases.
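
To make the "replacement" option concrete, here is a minimal sketch that
substitutes each annotated PHI span with a placeholder tag. The function
name, the (start, end, label) span format and the example sentence are
assumptions made for illustration; they do not describe the MEDDOCAN
annotation format.

```python
def replace_phi(text, spans):
    """Replace each (start, end, label) span with a <LABEL> placeholder.

    Offsets are character positions into `text`, end-exclusive.
    Spans are assumed not to overlap.
    """
    # Work from the end of the text so that earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + "<" + label + ">" + text[end:]
    return text


# Invented example record and hand-written spans.
record = "Paciente Juan Perez, ingresado en Madrid."
spans = [(9, 19, "NAME"), (34, 40, "LOCATION")]

redacted = replace_phi(record, spans)
print(redacted)  # Paciente <NAME>, ingresado en <LOCATION>.
```

Processing the spans from right to left avoids having to recompute offsets
after each substitution changes the length of the text.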

The practical relevance of anonymization or de-identification of clinical
texts motivated the proposal of two shared tasks, the 2006 and 2014
de-identification tracks, organized under the umbrella of the i2b2
(*http://i2b2.org*) community evaluation effort. The i2b2 effort has deeply
influenced the clinical NLP community worldwide, but it focused on
documents in English and covered characteristics of US healthcare data.

As part of the IberLEF 2019 (*https://sites.google.com/view/iberlef-2019*)
initiative, we announce *the first community challenge task specifically
devoted to the anonymization of medical documents in Spanish*, called the
MEDDOCAN (Medical Document Anonymization) track.

To support these tasks, we have prepared a synthetic corpus of 1000
clinical case studies. The cases were selected manually by a practicing
physician and augmented with PHI from discharge summaries and medical
genetics clinical records.

The MEDDOCAN task will be structured into *two sub-tracks*:

   - NER offset and entity type classification
   - Sensitive span detection.
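
One way to read the difference between the two sub-tracks is sketched below:
the first scores a prediction as correct only if both the offsets and the
entity type match, while the second looks at the spans alone. This is an
assumption for illustration only; the released evaluation script is the
reference and may score predictions differently. The function names and the
annotations are invented.

```python
def prf(gold, predicted):
    """Precision, recall and F1 over two sets of annotations."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def strip_types(annotations):
    """Keep only (start, end), dropping the entity type."""
    return {(start, end) for start, end, _ in annotations}


# Invented gold and predicted annotations: (start, end, type).
gold = {(9, 19, "NAME"), (34, 40, "LOCATION")}
pred = {(9, 19, "NAME"), (34, 40, "NAME")}

# Offsets and type must both match: the mistyped span counts as an error.
typed = prf(gold, pred)
# Spans only: the mistyped span still counts as found.
untyped = prf(strip_types(gold), strip_types(pred))

print(typed)    # (0.5, 0.5, 0.5)
print(untyped)  # (1.0, 1.0, 1.0)
```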


Teams will be invited to submit a system description paper for the workshop
proceedings, as in previous *IberEval* events.

We plan to *invite selected works* for full publication in a *Q1 journal
special issue devoted to MEDDOCAN*. Invitations to the special issue will
take into account multiple aspects, such as performance, novelty of the
system, availability of the underlying system (software/web service), and
the workshop presentation.

*Important Dates*

   - March 18, 2019: Sample set and Evaluation script released.
   - March 20, 2019: Training set released.
   - April 4, 2019: Development set released.
   - April 29, 2019: Test set released (includes background set).
   - May 17, 2019: End of evaluation period (system submissions).
   - May 20, 2019: Results posted and Test set with GS annotations released.
   - May 31, 2019: Working notes paper submission.
   - June 14, 2019: Notification of acceptance (peer-reviews).
   - June 28, 2019: Camera ready paper submission.
   - September 24, 2019: IberLEF 2019 Workshop, Bilbao, Spain.

*Task organizers*

   - Aitor Gonzalez-Agirre, Barcelona Supercomputing Center.
   - Ander Intxaurrondo, Barcelona Supercomputing Center.
   - Jose Antonio Lopez-Martin, Hospital 12 de Octubre.
   - Montserrat Marimon, Barcelona Supercomputing Center.
   - Felipe Soares, Barcelona Supercomputing Center.
   - Marta Villegas, Barcelona Supercomputing Center.
   - Martin Krallinger, Barcelona Supercomputing Center.

*Scientific committee*

• Hercules Dalianis, DSV/Stockholm University, Sweden
• Christoph Dieterich, Klaus-Tschira-Institute for Computational
Cardiology, University Hospital Heidelberg, Germany
• Jelena Jacimovic, University of Belgrade, Serbia
• Bradley Malin, Vanderbilt University Medical Center, USA
• Øystein Nytrø, Norwegian University of Science and Technology, Norway
• Patrick Ruch, SIB Text Mining, HES-SO & Swiss Institute of
Bioinformatics, Switzerland
• Angus Roberts, King’s College London, UK
• Arturo Romero Gutiérrez, Ministerio de Sanidad, Servicios Sociales e
Igualdad, Spain
• Ozlem Uzuner, George Mason University, USA
• Alfonso Valencia, Barcelona Supercomputing Center, Spain

Martin Krallinger, Dr.
Head of Biological Text Mining Unit
Structural Biology and BioComputing Programme
Spanish National Cancer Research Centre (CNIO)
Oficina Técnica General (OTG) del Plan TL en el
área de Biomedicina de la Secretaria de Estado de
Telecomunicaciones y para la Sociedad de la