From jeremie.du-boisberranger at inria.fr Fri Apr 1 04:38:25 2022
From: jeremie.du-boisberranger at inria.fr (Jeremie du Boisberranger)
Date: Fri, 1 Apr 2022 10:38:25 +0200
Subject: [scikit-learn] scikit-learn core dev sprint april 6th - 7th

Dear all,

On April 6th and 7th, some of the scikit-learn core developers will participate in a short, review-oriented sprint. Our goal is to focus mostly on pull requests that are in the milestone for the 1.1 release, scheduled in the coming weeks.

For that, we have set up a GitHub project listing the pull requests we plan to look at during the sprint. If you are the author of one of these pull requests, feel free to join us on the scikit-learn Discord (https://discord.gg/VVzhr8cHK8) for more interactive reviews. We'll try to be online on the Discord channel from approximately 9am to 6pm (GMT+2).

Best regards,
Jérémie

From mmccarty at nvidia.com Fri Apr 1 12:37:11 2022
From: mmccarty at nvidia.com (Mike McCarty)
Date: Fri, 1 Apr 2022 16:37:11 +0000
Subject: [scikit-learn] Scikit-learn Developer Position at NVIDIA

Hi all,

I'm excited to share that I'm hiring a remote Scikit-learn developer at NVIDIA to spend most of their time on open-source contributions to the Scikit-learn project. The core development team has graciously allowed me to share this opportunity on this list.

We are continuing to expand our support of the PyData ecosystem and are looking to hire strong engineers who are, or can become, contributors to NumPy, Pandas, Scikit-learn, SciPy, and NetworkX.

Please see the job posting for more details (https://tinyurl.com/nvidia-pydata-job). Applicants based outside the US are eligible in certain countries; I can follow up with individuals to confirm eligibility.

Best,
Mike
From krallinger.martin at gmail.com Mon Apr 11 04:48:28 2022
From: krallinger.martin at gmail.com (Martin Krallinger)
Date: Mon, 11 Apr 2022 10:48:28 +0200
Subject: [scikit-learn] CFP LivingNER shared task: Named entity recognition, normalization & classification of species, pathogens and food

Call for Participation
LivingNER Shared Task (IberLEF2022)
Named entity recognition, normalization & classification of species, pathogens and food
https://temu.bsc.es/livingner/

LivingNER is the first track focusing specifically on the automatic detection of species mentions (humans, plants, animals, insects, pathogens) and their normalization to species taxonomy concepts (NCBI Taxonomy, https://www.ncbi.nlm.nih.gov/taxonomy).

The detection, grounding, and classification of species mentions in documents is highly relevant for a diversity of real-world applications in medicine, biology, biodiversity, nutrition, and pharmacology. For instance, it is a key component of text mining applied to:
- pathogens and tropical diseases (and their mode of transmission)
- animal-caused injuries, bites, etc. on humans
- pets and farm animals (incl. related health issues)
- microorganisms, antibiotic resistance, microbiome
- hospital-acquired or nosocomial infections
- allergies and food (incl. diets, intoxications, certain toxic habits, drug-food interactions)
- epidemiology and family history (incl. close contacts, cohabitants, and relatives)
- etc.

The LivingNER track organizers will also release multilingual resources to highlight the potential for adapting participating systems to languages including and beyond English (thanks to the use of universal scientific species nomenclature).

Following the success of previously organized shared tasks (e.g.,
CANTEMIST, PharmaCoNER, or MEDDOCAN), we are now launching the LivingNER shared task as part of the IberLEF 2022 evaluation initiative (co-located with SEPLN 2022), with the following three sub-tracks:
- LivingNER-Clinical NER: automatic detection of mentions of species (both human and non-human).
- LivingNER-Species Norm: finding mentions of species and mapping them to their corresponding NCBI Taxonomy concept identifiers.
- LivingNER-Clinical IMPACT: classifying documents along 4 axes of clinical relevance (pets/farm animals, animals causing injuries, food, and nosocomial) and retrieving the evidence supporting the classification.

Key information:
- Web: https://temu.bsc.es/livingner/
- Data: https://doi.org/10.5281/zenodo.6376662
- Annotation guidelines: https://doi.org/10.5281/zenodo.6385162
- Registration: https://temu.bsc.es/livingner/registration/
- LivingNER vocabulary: https://doi.org/10.5281/zenodo.6390506
- Machine-translated data (for multilingual purposes): https://doi.org/10.5281/zenodo.6376662

Schedule
- Test set release (start of evaluation period): April 22nd, 2022
- End of the evaluation period (system submissions): May 22nd, 2022
- Working papers submission: June 17th, 2022
- Notification of acceptance (peer reviews): June 26th, 2022
- Camera-ready system descriptions: July 3rd, 2022
- IberLEF @ SEPLN 2022: September 2022

Publications and IberLEF/SEPLN 2022 workshop
Teams participating in LivingNER will be invited to contribute a system description paper to the IberLEF (SEPLN 2022) Working Notes proceedings and to give a short presentation of their approach at the IberLEF 2022 workshop.

Main Organizers
- Martin Krallinger, Barcelona Supercomputing Center, Spain
- Eulàlia Farré, Barcelona Supercomputing Center, Spain
- Salvador Lima, Barcelona Supercomputing Center, Spain
- Antonio Miranda-Escalada, Barcelona Supercomputing Center, Spain

--
=======================================
Martin Krallinger, Dr.
Head of Biological Text Mining Unit
Barcelona Supercomputing Center (BSC-CNS)
=======================================

From solegalli at protonmail.com Mon Apr 11 13:42:30 2022
From: solegalli at protonmail.com (Sole Galli)
Date: Mon, 11 Apr 2022 17:42:30 +0000
Subject: [scikit-learn] intermediate data state in a Pipeline

Hello community,

Say I have a pipeline with 3 data transformations, i.e., SimpleImputer, OrdinalEncoder and StandardScaler, and a Lasso at the end, and I want to obtain a copy of the transformed data that would be input to the Lasso.

Is there a way other than selecting all the steps of the pipeline prior to the Lasso and applying transform sequentially?

Thank you!

Sent with [ProtonMail](https://protonmail.com/) secure email.

From g.lemaitre58 at gmail.com Mon Apr 11 16:26:26 2022
From: g.lemaitre58 at gmail.com (g.lemaitre58 at gmail.com)
Date: Mon, 11 Apr 2022 22:26:26 +0200
Subject: [scikit-learn] intermediate data state in a Pipeline
Message-ID: <0B7F3162-7D2F-44DD-8B6D-4B2F3EA2A3D4@gmail.com>

Using slicing: model[:-1].transform(X)

Sent from my iPhone
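A minimal, self-contained sketch of the slicing approach in the reply above, assuming scikit-learn >= 0.21 (the version that added `Pipeline` slicing). The toy categorical data is made up for illustration. `model[:-1]` is a new `Pipeline` holding the already-fitted imputer, encoder and scaler, so its `transform` output should be exactly what the final `Lasso` receives. The loop over `model.steps[:-1]` is the manual alternative described in the question, included only to compare the two.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Toy categorical data with one missing value (illustrative only).
X = np.array(
    [["a", "x"], ["b", "y"], [np.nan, "x"], ["a", "y"], ["b", "x"]],
    dtype=object,
)
y = np.array([1.0, 2.0, 1.5, 2.5, 1.0])

model = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OrdinalEncoder(),
    StandardScaler(),
    Lasso(alpha=0.1),
)
model.fit(X, y)

# model[:-1] is a new Pipeline containing the already-fitted transformers
# (every step except the final Lasso); nothing is refitted here.
X_lasso_input = model[:-1].transform(X)

# The manual alternative from the question, for comparison only.
X_manual = X
for _, step in model.steps[:-1]:
    X_manual = step.transform(X_manual)

print(X_lasso_input.shape)  # (5, 2)
```

In the same way, `model[-1]` returns the fitted `Lasso` itself, and `model.named_steps` gives access to individual steps by name.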
From zaratruta at gmail.com Thu Apr 14 16:44:22 2022
From: zaratruta at gmail.com (Marcelino Borges)
Date: Thu, 14 Apr 2022 17:44:22 -0300
Subject: [scikit-learn] How can I implement the AUC-PR in a multiclass scenario using scikit-learn?

Hi.

I'm dealing with an image classification problem with 25k images classified into 50 classes. The largest class has 8k samples; the smallest has 140 samples. I would like to compare the performance of different classifiers using AUC-PR. How can I do this with scikit-learn?

Best regards.

From thomasjpfan at gmail.com Thu Apr 21 23:18:39 2022
From: thomasjpfan at gmail.com (Thomas J. Fan)
Date: Thu, 21 Apr 2022 23:18:39 -0400
Subject: [scikit-learn] scikit-learn monthly developer meeting: Monday April 25, 2022

Dear all,

The scikit-learn developer monthly meeting will take place on Monday April 25, 2022 at 14:00 UTC.
- Video call link: https://meet.google.com/ews-uszu-djs
- Meeting notes / agenda: https://hackmd.io/0yokz72CTZSny8y3Re648Q
- Local times: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2022&month=4&day=25&hour=14&min=0&sec=0&p1=1440&p2=240&p3=248&p4=195&p5=179&p6=224

The goal of this meeting is to discuss ongoing development topics for the project. Everybody is welcome. As usual, please follow the code of conduct of the project: https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md

Regards,
Thomas
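Regarding Marcelino Borges' multiclass AUC-PR question above, one common approach is one-vs-rest: binarize the labels with `label_binarize`, then compute `average_precision_score` per class and average it. Note that average precision is scikit-learn's step-wise summary of the precision-recall curve; it is closely related to, but not identical to, a trapezoidal AUC-PR. This is a minimal sketch under stated assumptions: the synthetic, imbalanced 4-class data and the `LogisticRegression` model are placeholders for the 50-class image problem and the classifiers being compared. With strong class imbalance, `average="macro"` gives every class equal weight, while `"weighted"` and `"micro"` are dominated by the large classes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Placeholder data: an imbalanced 4-class problem standing in for the
# 50-class image dataset described in the question.
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=10,
    n_classes=4,
    weights=[0.6, 0.25, 0.1, 0.05],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Any classifier exposing predict_proba (or decision_function) works here.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)  # columns follow clf.classes_

# One-vs-rest: an (n_test, n_classes) 0/1 indicator matrix, same column order.
Y_test = label_binarize(y_test, classes=clf.classes_)

per_class_ap = average_precision_score(Y_test, scores, average=None)
macro_ap = average_precision_score(Y_test, scores, average="macro")
weighted_ap = average_precision_score(Y_test, scores, average="weighted")
micro_ap = average_precision_score(Y_test, scores, average="micro")

print("per-class AP:", np.round(per_class_ap, 3))
print(f"macro: {macro_ap:.3f}  weighted: {weighted_ap:.3f}  micro: {micro_ap:.3f}")
```

If a trapezoidal area is preferred, `precision_recall_curve` followed by `sklearn.metrics.auc(recall, precision)` can be applied to each column of `Y_test` and `scores` instead.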
From krallinger.martin at gmail.com Sat Apr 23 05:54:12 2022
From: krallinger.martin at gmail.com (Martin Krallinger)
Date: Sat, 23 Apr 2022 11:54:12 +0200
Subject: [scikit-learn] CFP: DISTEMIST (BioASQ/CLEF2022) shared task on detection & normalization of disease mentions

(Apologies for cross-posting)

Call for Participation
DISTEMIST Shared Task (CLEF 2022)
Detection and normalization of disease mentions
https://temu.bsc.es/distemist/

DISTEMIST is the first track focusing specifically on the automatic detection of disease mentions in Spanish clinical case reports and their normalization to SNOMED CT. The DISTEMIST data has been used to develop disease taggers that were previously applied to a diversity of medical records.

Key information:
- Web: https://temu.bsc.es/distemist/
- Data: https://doi.org/10.5281/zenodo.6408476
- Annotation guidelines: https://doi.org/10.5281/zenodo.6458078
- DISTEMIST gazetteer: https://doi.org/10.5281/zenodo.6458114
- Registration: https://temu.bsc.es/distemist/registration/

Motivation
Systems able to detect and normalize disease mentions in medical content are crucial for a diversity of applications, such as semantic indexing for improved retrieval and classification, clinical coding, drug repurposing, and relation extraction (disease-symptom, disease-drug/treatment, disease-gene/mutation). It has been estimated that around 20% of PubMed queries are related to diseases, disorders, and anomalies, which stresses the importance of extracting this key information for different users (researchers, clinicians, pharma, biologists, healthcare practitioners, etc.). Disease mention recognition tools are also relevant for processing other kinds of content, such as social media (e.g., the SMM4H/COLING2022 track SocialDisNER).
Disease mention detection systems have been implemented and used to process a diversity of content types, including scientific publications, clinical records, clinical trials, patient forums, and social media, and have become a component of many practically relevant applications, such as:
- health data analytics software and the study of disease trajectories
- disease outbreak monitoring/surveillance and epidemiology tools
- extraction of disease phenotypes or comorbidities
- drug discovery, repurposing, and off-label indications
- occupational health studies
- pharmacogenomics
- clinical coding of diagnoses

The DISTEMIST organizers will release multilingual resources to foster the development of multilingual tools and systems, not only for Spanish but also for English and Romance languages (French, Portuguese, Italian, Catalan, and Romanian): DISTEMIST-English, DISTEMIST-Italian, DISTEMIST-French, DISTEMIST-Portuguese, DISTEMIST-Catalan, and DISTEMIST-Romanian.

We expect that participation in the DISTEMIST track will help generate resources that improve the exploitation of unstructured clinical data and thus unlock valuable health information, assist data curation, and facilitate quality evaluation and interpretability of disease mention detection systems.

Inspired by previous initiatives (n2c2, BioCreative) and shared tasks (CANTEMIST, PharmaCoNER, or CodiEsp), we are launching the DISTEMIST shared task as part of the BioASQ 2022 evaluation initiative (co-located with CLEF 2022), with the following two sub-tracks:
- DISTEMIST-entities: automatic detection of disease mentions.
- DISTEMIST-linking: finding disease mentions and normalizing them to their SNOMED CT concept identifiers.
Schedule
- DISTEMIST-linking 2nd training set release: April 23rd, 2022
- Test set release (DISTEMIST-entities and linking): May 10th, 2022
- Participant test predictions due (DISTEMIST-entities and linking): May 15th, 2022 ("Anywhere on Earth")
- Working papers submission: May 27th, 2022
- Notification of acceptance (peer reviews): June 13th, 2022
- Camera-ready system descriptions: July 1st, 2022
- BioASQ @ CLEF 2022: September 2022

Publications and BioASQ/CLEF2022 workshop
Teams participating in DISTEMIST will be invited to contribute a system description paper to the CLEF 2022 Working Notes proceedings (published on CEUR-WS) and to give a short presentation of their approach at the CLEF 2022 workshop.

Main Organizers
- Martin Krallinger, Barcelona Supercomputing Center, Spain
- Eulàlia Farré-Maduell, Barcelona Supercomputing Center, Spain
- Luis Gascó, Barcelona Supercomputing Center, Spain
- Anastasios Nentidis, National Center for Scientific Research Demokritos, Greece
- Salvador Lima, Barcelona Supercomputing Center, Spain
- Antonio Miranda-Escalada, Barcelona Supercomputing Center, Spain

--
=======================================
Martin Krallinger, Dr.
Head of Biological Text Mining Unit
Barcelona Supercomputing Center (BSC-CNS)
=======================================

From adrin.jalali at gmail.com Mon Apr 25 08:59:42 2022
From: adrin.jalali at gmail.com (Adrin)
Date: Mon, 25 Apr 2022 14:59:42 +0200
Subject: [scikit-learn] PEP 688: Making the buffer protocol accessible in Python

Hi there,

Forwarding from the NumPy mailing list for visibility. This is related to the work we're doing on supporting the Array API (ref: https://github.com/scikit-learn/scikit-learn/pull/22554). If we go down the path of having different execution paths based on the input data type, knowing whether a given type implements the buffer protocol might be relevant.
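To make the buffer-protocol check in Adrin's note concrete: on any Python 3 version, one can test whether an object exposes the buffer protocol by trying to wrap it in a `memoryview`, which raises `TypeError` for types that do not implement it. The helper name `supports_buffer_protocol` is hypothetical and is not a scikit-learn or NumPy API. PEP 688 (forwarded in this thread) proposed a `types.Buffer` type for this purpose; as accepted in Python 3.12, the check is spelled `isinstance(obj, collections.abc.Buffer)`, and the sketch uses that form only when it is available.

```python
import array
import sys

import numpy as np


def supports_buffer_protocol(obj) -> bool:
    """Hypothetical helper: True if ``obj`` can export a buffer right now.

    memoryview() raises TypeError when the type does not implement the
    buffer protocol at all; BufferError/ValueError cover objects whose type
    implements it but which refuse to export a buffer in a particular case.
    """
    try:
        # The context manager releases the export immediately (this matters
        # e.g. for bytearray, which cannot be resized while exported).
        with memoryview(obj):
            pass
    except (TypeError, BufferError, ValueError):
        return False
    return True


candidates = {
    "bytes": b"abc",
    "bytearray": bytearray(b"abc"),
    "array.array": array.array("d", [1.0, 2.0]),
    "numpy.ndarray": np.arange(3.0),
    "list": [1, 2, 3],
    "str": "abc",
}
results = {name: supports_buffer_protocol(obj) for name, obj in candidates.items()}
for name, ok in results.items():
    print(f"{name:>14}: {ok}")

if sys.version_info >= (3, 12):
    # PEP 688 as accepted: collections.abc.Buffer supports isinstance checks.
    from collections.abc import Buffer

    assert isinstance(np.arange(3.0), Buffer)
    assert not isinstance([1, 2, 3], Buffer)
```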
Cheers,
Adrin

---------- Forwarded message ---------
From: Jelle Zijlstra
Date: Mon, Apr 25, 2022 at 2:02 PM
Subject: [Numpy-discussion] PEP 688: Making the buffer protocol accessible in Python
To: Discussion of Numerical Python

I just posted https://peps.python.org/pep-0688/, which proposes adding a types.Buffer type that will make it possible to check in Python code whether a type implements the buffer protocol. I'm reaching out to the NumPy community because NumPy was an important driver for creating the buffer protocol. I'd be happy to hear any feedback or possible use cases for the PEP.

_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion at python.org
To unsubscribe send an email to numpy-discussion-leave at python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: adrin.jalali at gmail.com

From mh4116 at columbia.edu Wed Apr 27 14:24:32 2022
From: mh4116 at columbia.edu (Mingzhe Hu)
Date: Wed, 27 Apr 2022 14:24:32 -0400
Subject: [scikit-learn] Questions on computation precision of DBSCAN

Dear community,

When I call `sklearn.cluster.DBSCAN`, I find that it can use a huge amount of memory. I am trying to reduce the cost by passing my input data as np.float16 and using "precomputed" as the metric. However, it still seems to use float64 during computation when `fit_predict` is called: I get errors saying that a float64 computation leads to a memory allocation failure.

All suggestions for reducing the computation cost are highly appreciated. Thanks.

All the best,
--
Mingzhe HU
Columbia University in the City of New York
M.S. in Electrical Engineering
mingzhe.hu at columbia.edu
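On Mingzhe Hu's DBSCAN memory question above: a dense `precomputed` matrix of n_samples x n_samples float64 values alone needs 8 * n_samples**2 bytes (about 800 MB for 10,000 samples). Rather than relying on a lower-precision dtype, the `DBSCAN` documentation suggests precomputing a sparse neighborhood graph with `NearestNeighbors.radius_neighbors_graph(..., mode="distance")` and passing it with `metric="precomputed"`, so that only pairs within `eps` are stored. A minimal sketch follows. The blob data, `eps` and `min_samples` are placeholders, and the actual savings depend on how many neighbors each point has within `eps`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Placeholder data; eps and min_samples are illustrative values.
X, _ = make_blobs(n_samples=3000, centers=3, cluster_std=0.5, random_state=0)
eps, min_samples = 0.3, 5

# Sparse CSR graph that stores only pairs within eps, instead of a dense
# n_samples x n_samples float64 distance matrix.
nn = NearestNeighbors(radius=eps).fit(X)
D = nn.radius_neighbors_graph(X, mode="distance")

# With sparse input and metric="precomputed", only stored entries are
# considered as candidate neighbors.
labels_sparse = DBSCAN(
    eps=eps, min_samples=min_samples, metric="precomputed"
).fit_predict(D)

# Reference run on the raw features, for comparison only.
labels_direct = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)

dense_bytes = 8 * X.shape[0] ** 2
sparse_bytes = D.data.nbytes + D.indices.nbytes + D.indptr.nbytes
print(f"dense float64: {dense_bytes / 1e6:.0f} MB, sparse graph: {sparse_bytes / 1e6:.1f} MB")
```

For very large inputs, the graph itself can be built in chunks of rows and stacked with `scipy.sparse.vstack`, so that no dense intermediate of full size is ever allocated.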
From jeremie.du-boisberranger at inria.fr Thu Apr 28 10:10:50 2022
From: jeremie.du-boisberranger at inria.fr (Jeremie du Boisberranger)
Date: Thu, 28 Apr 2022 16:10:50 +0200
Subject: [scikit-learn] [ANN] scikit-learn 1.1.0rc1 is online!
Message-ID: <5f1f099d-371a-3127-8bff-dc51d6439cce@inria.fr>

Please help us test the first release candidate for scikit-learn 1.1.0:

pip install scikit-learn==1.1.0rc1

Changelog: https://scikit-learn.org/1.1/whats_new/v1.1.html

In particular, if you maintain a project with a dependency on scikit-learn, please let us know about any regressions.

Thanks to everyone who contributed to this release!

Best,
Jérémie

From olivier.grisel at ensta.org Thu Apr 28 10:34:46 2022
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Thu, 28 Apr 2022 16:34:46 +0200
Subject: [scikit-learn] [ANN] scikit-learn 1.1.0rc1 is online!
In-Reply-To: <5f1f099d-371a-3127-8bff-dc51d6439cce@inria.fr>
References: <5f1f099d-371a-3127-8bff-dc51d6439cce@inria.fr>

Thanks, Jeremie, for leading the effort to get this release out!

--
Olivier

From mlcnworkshop at gmail.com Sat Apr 30 06:30:15 2022
From: mlcnworkshop at gmail.com (MLCN Workshop)
Date: Sat, 30 Apr 2022 12:30:15 +0200
Subject: [scikit-learn] Call for papers for the 5th international workshop on machine learning in clinical neuroimaging

Dear Colleagues,

Please find below the call for papers for the International Workshop of Machine Learning in Clinical Neuroimaging (MLCN), which will be held on *the 18th of September 2022 at MICCAI 2022, Singapore*. We welcome contributions on novel machine learning methods and their applications to clinical neuroimaging data. The submission deadline is *25 June 2022*, and all accepted MLCN papers will be eligible for the best paper award of 500 USD. Top accepted papers will be invited to submit an extended version to the MELBA journal.
For more information, please visit https://mlcnws.com/call-for-papers/.

Best wishes,
The MLCN 2022 steering and organizing committees
Christos Davatzikos, Andre Marquand, Jonas Richiardi, Emma Robinson
Ahmed Abdulkadir, Nicha C. Dvornek, Mohamad Habes, Seyed Mostafa Kia, Vinod Kumar, Thomas Wolfers

*Call for Papers*

The International Workshop of Machine Learning in Clinical Neuroimaging (MLCN 2022), a satellite event of MICCAI (MICCAI 2022), calls for original papers in the field of clinical neuroimaging data analysis with machine learning. The workshop has two tracks, covering methodological innovations as well as clinical applications. This highly interdisciplinary topic provides an excellent platform to connect researchers from varying disciplines and to collectively advance the field in multiple directions.

In the *machine learning* track, we seek novel contributions that address current methodological gaps in analyzing high-dimensional, longitudinal, and heterogeneous clinical neuroscientific data using stable, scalable, and interpretable machine learning models. Topics of interest include but are not limited to:
- Big data
- Spatio-temporal brain data analysis
- Structural data analysis
- Graph theory and complex network analysis
- Longitudinal data analysis
- Model stability and interpretability
- Model scalability in large neuroimaging datasets
- Multi-source data integration and multi-view learning
- Multi-site data analysis, from preprocessing to modeling
- Domain adaptation, data harmonization, and transfer learning in neuroimaging
- Unsupervised methods for stratifying brain disorders
- Deep learning in clinical neuroimaging
- Model uncertainty in clinical predictions

In the *clinical neuroimaging* track, the applications of existing machine learning algorithms are evaluated to move towards precision medicine for complex brain disorders.
The discovery of biological markers is an important challenge across many fields of medicine, and various experimental procedures and designs are used to detect biological signatures that can improve diagnosis and treatment or serve other beneficial ends. However, for most complex brain disorders, we do not have reliable biomarkers today. The application of advanced machine learning methods may help to reach this goal. Therefore, we invite the community to submit contributions on machine learning approaches that aim to improve our understanding of complex brain disorders, moving the field closer to precision medicine. Topics of interest include but are not limited to:
- Biomarker discovery
- Refinement of nosology and diagnostics
- Biological validation of clinical syndromes
- Treatment outcome prediction
- Course prediction
- Analysis of wearable sensors
- Neurogenetics and brain imaging genetics
- Mechanistic modeling
- Brain aging
- The presentation of clinical neuroimaging databases to stimulate developments in machine learning

------------------------------

*Submission Process:*

The workshop seeks high-quality, original, and unpublished work that addresses one or more of the challenges described above. Papers should be submitted electronically in Springer Lecture Notes in Computer Science (LNCS) style (see this link for detailed author guidelines) using the CMT system here. The page limit is 8 pages (text, figures, and tables) plus up to 2 pages of references. Submissions are reviewed in a double-blind process, so please make sure that your submission is anonymous. Accepted papers will be published in joint proceedings with the MICCAI 2022 conference.

------------------------------

*MLCN Special Issue at the MELBA journal:*

We will invite the top accepted papers to submit an extended version of their contribution to the MLCN special issue of the Journal of Machine Learning for Biomedical Imaging (MELBA).
The invited papers will go through an independent review process by the journal.

------------------------------

*Donders Best Paper Award:*

The MLCN's best paper award is sponsored by the Donders Institute. All accepted MLCN papers will be eligible for the best paper award. The recipient of the award will be chosen by the MLCN scientific committee based on the scientific quality, novelty, and clarity of the contributions. The winner will be announced at the end of the workshop and will receive a 500 USD honorarium.

------------------------------

*Important Dates:*
- Paper submission deadline: June 25, 2022
- Notification of acceptance: July 23, 2022
- Camera-ready submission: Aug 6, 2022
- Workshop date: September 18, 2022