From olivier.grisel at ensta.org Wed May 1 06:58:29 2019
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Wed, 1 May 2019 12:58:29 +0200
Subject: [scikit-learn] Release Candidate for Scikit-learn 0.21
In-Reply-To: References: Message-ID:

\o/

From t3kcit at gmail.com Wed May 1 22:13:02 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 1 May 2019 22:13:02 -0400
Subject: [scikit-learn] Release Candidate for Scikit-learn 0.21
In-Reply-To: References: Message-ID:

Thank you for all the amazing work, y'all!

On 4/30/19 10:09 PM, Joel Nothman wrote:
> PyPI now has source and binary releases for Scikit-learn 0.21rc2.
>
> * Documentation at https://scikit-learn.org/0.21
> * Release Notes at https://scikit-learn.org/0.21/whats_new
> * Download source or wheels at https://pypi.org/project/scikit-learn/0.21rc2/
>
> Please try out the software and help us edit the release notes before
> a final release.
>
> Highlights include:
> * neighbors.NeighborhoodComponentsAnalysis for supervised metric
> learning, which learns a weighted Euclidean distance for k-nearest
> neighbors. https://scikit-learn.org/0.21/modules/neighbors.html#nca
> * ensemble.HistGradientBoostingClassifier
> and ensemble.HistGradientBoostingRegressor: experimental
> implementations of efficient binned gradient boosting machines.
> https://scikit-learn.org/0.21/modules/ensemble.html#gradient-tree-boosting
> * impute.IterativeImputer: a non-trivial approach to missing value
> imputation.
> https://scikit-learn.org/0.21/modules/impute.html#multivariate-feature-imputation
> * cluster.OPTICS: a new density-based clustering algorithm.
> https://scikit-learn.org/0.21/modules/clustering.html#optics
> * better printing of estimators as strings, with an option to hide
> default parameters for compactness:
> https://scikit-learn.org/0.21/auto_examples/plot_changed_only_pprint_parameter.html
> * for estimator and library developers: a way to tag your estimator so
> that it can be treated appropriately with check_estimator.
> https://scikit-learn.org/0.21/developers/contributing.html#estimator-tags
>
> There are many other enhancements and fixes listed in the release
> notes (https://scikit-learn.org/0.21/whats_new).
>
> Please note that Scikit-learn has new dependencies:
> * joblib >= 0.11, which used to be vendored within Scikit-learn
> * OpenMP, unless the environment variable SKLEARN_NO_OPENMP=1 is set
> when the code is compiled (and cythonized)
>
> Happy Learning!
>
> From the Scikit-learn core dev team.

From gael.varoquaux at normalesup.org Thu May 2 03:28:25 2019
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Thu, 2 May 2019 09:28:25 +0200
Subject: [scikit-learn] Release Candidate for Scikit-learn 0.21
In-Reply-To: References: Message-ID: <20190502072825.fzheoqwesuppvs4f@phare.normalesup.org>

Thank you all and congratulations indeed.

Because this release comes so soon after the latest one from the 0.20
series, we might have expected it to be a light one. But no! Plenty of
exciting features!

Gaël

On Wed, May 01, 2019 at 10:13:02PM -0400, Andreas Mueller wrote:
> Thank you for all the amazing work, y'all!
> On 4/30/19 10:09 PM, Joel Nothman wrote:
> PyPI now has source and binary releases for Scikit-learn 0.21rc2.
> * Documentation at https://scikit-learn.org/0.21
> * Release Notes at https://scikit-learn.org/0.21/whats_new
> * Download source or wheels at https://pypi.org/project/scikit-learn/0.21rc2/
> Please try out the software and help us edit the release notes before a
> final release.
> Highlights include:
> * neighbors.NeighborhoodComponentsAnalysis for supervised metric learning,
> which learns a weighted Euclidean distance for k-nearest neighbors.
> https://scikit-learn.org/0.21/modules/neighbors.html#nca
> * ensemble.HistGradientBoostingClassifier
> and ensemble.HistGradientBoostingRegressor: experimental implementations of
> efficient binned gradient boosting machines.
> https://scikit-learn.org/0.21/modules/ensemble.html#gradient-tree-boosting
> * impute.IterativeImputer: a non-trivial approach to missing value
> imputation. https://scikit-learn.org/0.21/modules/impute.html#multivariate-feature-imputation
> * cluster.OPTICS: a new density-based clustering algorithm.
> https://scikit-learn.org/0.21/modules/clustering.html#optics
> * better printing of estimators as strings, with an option to hide default
> parameters for compactness:
> https://scikit-learn.org/0.21/auto_examples/plot_changed_only_pprint_parameter.html
> * for estimator and library developers: a way to tag your estimator so that
> it can be treated appropriately with check_estimator.
> https://scikit-learn.org/0.21/developers/contributing.html#estimator-tags
> There are many other enhancements and fixes listed in the release notes
> (https://scikit-learn.org/0.21/whats_new).
> Please note that Scikit-learn has new dependencies:
> * joblib >= 0.11, which used to be vendored within Scikit-learn
> * OpenMP, unless the environment variable SKLEARN_NO_OPENMP=1 is set when
> the code is compiled (and cythonized)
> Happy Learning!
> From the Scikit-learn core dev team.

--
Gael Varoquaux
Senior Researcher, INRIA
http://gael-varoquaux.info
http://twitter.com/GaelVaroquaux

From krallinger.martin at gmail.com Thu May 2 13:03:34 2019
From: krallinger.martin at gmail.com (Martin Krallinger)
Date: Thu, 2 May 2019 19:03:34 +0200
Subject: [scikit-learn] MEDDOCAN Shared task for Named Entity Recognition and Classification with Scikit-Learn
In-Reply-To: References: Message-ID:

*IberLEF/SEPLN: CFP MEDDOCAN track & task prize: named entity recognition
and sensitive personal information identification*

*CFP MEDDOCAN track*
*First Medical Document Anonymization*
*http://temu.bsc.es/meddocan*

*SEAD - Plan TL Sponsoring Track Awards*
Sub-track prizes: €1,000, €500 and €200 (first, second and third team)

*Task description*

Scikit-Learn has been successfully used for Named Entity Recognition and
Classification tasks in the past, showing that it is especially competitive
for finding mentions of entities in running text.

Clinical records with protected health information (PHI) cannot be directly
shared as is, due to privacy constraints, making it particularly cumbersome
to carry out NLP research in the medical domain. A necessary precondition
for accessing clinical records outside of hospitals is their
de-identification, i.e., the exhaustive removal (or replacement) of all
mentioned PHI phrases.
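As a concrete illustration of the kind of scikit-learn pipeline referred to
above, here is a minimal per-token PHI classifier sketch. The feature
scheme, label names and toy sentence are illustrative assumptions only, not
part of the task data:

# Minimal sketch of a per-token PHI classifier with scikit-learn.
# Features, labels and data below are illustrative only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tokens, i):
    """Simple surface features for the i-th token of a sentence."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_digit": tok.isdigit(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Toy training data: one tokenized sentence with per-token PHI tags.
sentence = ["Paciente", "Juan", "Garcia", ",", "45", "años"]
labels = ["O", "NAME", "NAME", "O", "AGE", "O"]

X = [token_features(sentence, i) for i in range(len(sentence))]
clf = make_pipeline(DictVectorizer(), LogisticRegression(solver="liblinear"))
clf.fit(X, labels)
print(clf.predict([token_features(sentence, 1)]))  # e.g. ['NAME']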
The practical relevance of anonymization or de-identification of clinical
texts motivated the proposal of two shared tasks, the 2006 and 2014
de-identification tracks, organized under the umbrella of the i2b2
(*i2b2.org*) community evaluation effort. The i2b2 effort has deeply
influenced the clinical NLP community worldwide, but it was focused on
documents in English and on characteristics of US healthcare data providers.

As part of the IberLEF 2019 (*https://sites.google.com/view/iberlef-2019*)
initiative, we announce *the first community challenge task specifically
devoted to the anonymization of medical documents in Spanish*, called the
MEDDOCAN (Medical Document Anonymization) track.

In order to carry out these tasks we have prepared a synthetic corpus of
1,000 clinical case studies. This corpus was selected manually by a
practicing physician and augmented with PHI information from discharge
summaries and medical genetics clinical records.

The MEDDOCAN task will be structured into *two sub-tracks*:

- NER offset and entity type classification
- Sensitive span detection

*Publications*

Teams will be invited to send a workshop proceedings systems description
paper, similarly to previous *IberEval* events. We plan to *invite selected
works* for full publication in a *Q1 journal special issue devoted to
MEDDOCAN*. Invitations to the special issue will consider multiple aspects
such as performance, novelty of the system, availability of the underlying
system (software/web service) as well as the workshop presentation.

*Important Dates*

- March 18, 2019: Sample set and evaluation script released.
- March 20, 2019: Training set released.
- April 4, 2019: Development set released.
- April 29, 2019: Test set released (includes background set).
- May 17, 2019: End of evaluation period (system submissions).
- May 20, 2019: Results posted and test set with GS annotations released.
- May 31, 2019: Working notes paper submission.
- June 14, 2019: Notification of acceptance (peer reviews).
- June 28, 2019: Camera-ready paper submission.
- September 24, 2019: IberLEF 2019 Workshop, Bilbao, Spain.

*Task organizers*

- Aitor Gonzalez-Agirre, Barcelona Supercomputing Center.
- Ander Intxaurrondo, Barcelona Supercomputing Center.
- Jose Antonio Lopez-Martin, Hospital 12 de Octubre.
- Montserrat Marimon, Barcelona Supercomputing Center.
- Felipe Soares, Barcelona Supercomputing Center.
- Marta Villegas, Barcelona Supercomputing Center.
- Martin Krallinger, Barcelona Supercomputing Center.

*Scientific committee*

- Hercules Dalianis, DSV/Stockholm University, Sweden
- Christoph Dieterich, Klaus-Tschira-Institute for Computational Cardiology, University Hospital Heidelberg, Germany
- Jelena Jacimovic, University of Belgrade, Serbia
- Bradley Malin, Vanderbilt University Medical Center, USA
- Øystein Nytrø, Norwegian University of Science and Technology, Norway
- Patrick Ruch, SIB Text Mining, HES-SO & Swiss Institute of Bioinformatics, Switzerland
- Angus Roberts, King's College London, UK
- Arturo Romero Gutiérrez, Ministerio de Sanidad, Servicios Sociales e Igualdad, Spain
- Ozlem Uzuner, George Mason University, USA
- Alfonso Valencia, Barcelona Supercomputing Center, Spain

============================
Martin Krallinger, Dr.
--------------------------------------------------------------------
Head of Biological Text Mining Unit
Structural Biology and BioComputing Programme
Spanish National Cancer Research Centre (CNIO)
--------------------------------------------------------------------
Oficina Técnica General (OTG) del Plan TL en el área de Biomedicina de la
Secretaría de Estado de Telecomunicaciones y para la Sociedad de la
Información
============================

From pahome.chen at mirlab.org Fri May 3 04:03:05 2019
From: pahome.chen at mirlab.org (lampahome)
Date: Fri, 3 May 2019 16:03:05 +0800
Subject: [scikit-learn] Can I evaluate clustering efficiency incrementally?
Message-ID:

I see that some algorithms, e.g. MiniBatchKMeans and Birch, can cluster
incrementally when the dataset is too huge.

But is there any way to evaluate incrementally?

I found the silhouette coefficient and the Calinski-Harabasz index, because
I don't know the ground truth labels. But they can't be evaluated
incrementally.

From g.lemaitre58 at gmail.com Fri May 3 04:12:09 2019
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Fri, 3 May 2019 10:12:09 +0200
Subject: [scikit-learn] Can I evaluate clustering efficiency incrementally?
In-Reply-To: References: Message-ID:

You can always predict incrementally by predicting on batches of samples.

On Fri, 3 May 2019 at 10:05, lampahome wrote:
> I see that some algorithms, e.g. MiniBatchKMeans and Birch, can cluster
> incrementally when the dataset is too huge.
> But is there any way to evaluate incrementally?
> I found the silhouette coefficient and the Calinski-Harabasz index,
> because I don't know the ground truth labels. But they can't be evaluated
> incrementally.

--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/

From g.lemaitre58 at gmail.com Fri May 3 04:14:28 2019
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Fri, 3 May 2019 10:14:28 +0200
Subject: [scikit-learn] Can I evaluate clustering efficiency incrementally?
In-Reply-To: References: Message-ID:

Oh sorry, I see now that you were asking about evaluating, not predicting.

On Fri, 3 May 2019 at 10:12, Guillaume Lemaître wrote:
> You can always predict incrementally by predicting on batches of samples.
> On Fri, 3 May 2019 at 10:05, lampahome wrote:
>> I see that some algorithms, e.g. MiniBatchKMeans and Birch, can cluster
>> incrementally when the dataset is too huge.
>> But is there any way to evaluate incrementally?
>> I found the silhouette coefficient and the Calinski-Harabasz index,
>> because I don't know the ground truth labels. But they can't be evaluated
>> incrementally.

--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
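A sketch of the pattern under discussion: batch-wise fitting with
MiniBatchKMeans, followed by a subsampled (not truly incremental)
silhouette evaluation. The synthetic data and batch sizes are illustrative
assumptions:

# Fit MiniBatchKMeans batch by batch, then evaluate the silhouette
# coefficient on a random subsample of the data.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score

rng = np.random.RandomState(0)
km = MiniBatchKMeans(n_clusters=3, random_state=0)

for _ in range(20):  # stream of batches that never fit in memory at once
    batch = rng.randn(100, 2) + rng.choice([-5, 0, 5], size=(100, 1))
    km.partial_fit(batch)

# Evaluation still needs data in memory; subsample to keep it cheap.
sample = rng.randn(500, 2) + rng.choice([-5, 0, 5], size=(500, 1))
print(silhouette_score(sample, km.predict(sample), sample_size=300,
                       random_state=0))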
From ugoren at gmail.com Fri May 3 07:27:21 2019
From: ugoren at gmail.com (Uri Goren)
Date: Fri, 3 May 2019 14:27:21 +0300
Subject: [scikit-learn] Can I evaluate clustering efficiency incrementally?
In-Reply-To: References: Message-ID:

I usually use clustering to save costs on labelling. I like to apply
hierarchical clustering, and then label a small sample and fine-tune the
clustering algorithm.

That way, you can evaluate the effectiveness in terms of cluster purity
(how many clusters contain mixed labels).

See an example with sklearn here:
https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU

On Fri, May 3, 2019, 11:03 AM lampahome wrote:
> I see that some algorithms, e.g. MiniBatchKMeans and Birch, can cluster
> incrementally when the dataset is too huge.
> But is there any way to evaluate incrementally?
> I found the silhouette coefficient and the Calinski-Harabasz index,
> because I don't know the ground truth labels. But they can't be evaluated
> incrementally.

From prudhvirajnitjsr at gmail.com Sat May 11 16:28:36 2019
From: prudhvirajnitjsr at gmail.com (prudhviraj nitjsr)
Date: Sun, 12 May 2019 01:58:36 +0530
Subject: [scikit-learn] Proposing Encoder class to encode Ordinal values of an attribute
Message-ID:

Hi All,

Recently, when I was solving an ML problem, I came across an attribute
which has ordinal values. E.g.:

Student ID | Subjects
========================================
1          | ['Math']
2          | ['Math','Python']
3          | ['C']
4          | ['Python','Statistics']
========================================

Here, the attribute Subjects is a list of the subjects the student is
interested in.

We have sklearn.preprocessing.OneHotEncoder, which encodes a single
categorical variable by creating multiple columns. Similarly, I want to
propose a different encoder that encodes this type of list and creates new
columns, one column for each subject. The allowed values are 1/0,
specifying whether the student is interested in that subject or not.

I'm new to open source contribution. Can someone tell me if there is an
existing feature that handles this type of data, or if I can start working
on this feature?

Any response would be appreciated.

Thanks
Prudvi RajKumar

From prudhvirajnitjsr at gmail.com Mon May 13 14:58:34 2019
From: prudhvirajnitjsr at gmail.com (prudhviraj nitjsr)
Date: Tue, 14 May 2019 00:28:34 +0530
Subject: [scikit-learn] Fwd: Proposing Encoder class to encode Ordinal attributes
In-Reply-To: References: Message-ID:

Hi,

Can someone please respond? Any response would be appreciated.

Thanks

---------- Forwarded message ---------
From: prudhviraj nitjsr
Date: Sun, May 12, 2019 at 1:38 AM
Subject: Proposing Encoder class to encode Ordinal attributes
To:

Hi All,

Recently, when I was solving an ML problem, I came across an attribute
which has ordinal values. E.g.:

Student ID | Subjects
========================================
1          | ['Math']
2          | ['Math','Python']
3          | ['C']
4          | ['Python','Statistics']
========================================

Here, the attribute Subjects is a list of the subjects the student is
interested in.

We have sklearn.preprocessing.OneHotEncoder, which encodes a single
categorical variable by creating multiple columns.
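For reference, sklearn.preprocessing.MultiLabelBinarizer already provides
exactly this one-column-per-subject encoding. A minimal sketch with the toy
data above:

# The Subjects column is a multi-label attribute; MultiLabelBinarizer
# produces one 0/1 column per distinct subject.
from sklearn.preprocessing import MultiLabelBinarizer

subjects = [['Math'], ['Math', 'Python'], ['C'], ['Python', 'Statistics']]
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(subjects)
print(mlb.classes_)   # ['C' 'Math' 'Python' 'Statistics']
print(encoded)
# [[0 1 0 0]
#  [0 1 1 0]
#  [1 0 0 0]
#  [0 0 1 1]]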
Similarly, I want to propose a different encoder that encodes this type of
list and creates new columns, one column for each subject. The allowed
values are 1/0, specifying whether the student is interested in that
subject or not.

I'm new to open source contribution. Can someone tell me if there is an
existing feature that handles this type of data, or if I can start working
on this feature?

Any response would be appreciated.

Thanks
Prudvi RajKumar

From joel.nothman at gmail.com Mon May 13 16:30:28 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 14 May 2019 06:30:28 +1000
Subject: [scikit-learn] Fwd: Proposing Encoder class to encode Ordinal attributes
In-Reply-To: References: Message-ID:

There has been an issue and a pull request for something similar in
DictVectorizer. https://github.com/scikit-learn/scikit-learn/pull/8750 got
close to merging and I'm not really sure why it was closed rather than
completed.

From nicholdav at gmail.com Mon May 13 21:35:17 2019
From: nicholdav at gmail.com (David Nicholson)
Date: Mon, 13 May 2019 21:35:17 -0400
Subject: [scikit-learn] Fwd: Proposing Encoder class to encode Ordinal attributes
In-Reply-To: References: Message-ID:

There is also this in scikit-learn-contrib, Categorical Encoding:
https://joss.theoj.org/papers/d57818316816a19a80112892c3d12ed7
https://github.com/scikit-learn-contrib/categorical-encoding

David Nicholson, Ph.D.
https://nicholdav.info/
https://github.com/NickleDave
Prinz lab, Emory University, Atlanta, GA, USA

On Mon, May 13, 2019 at 4:32 PM Joel Nothman wrote:
> There has been an issue and a pull request for something similar in
> DictVectorizer. https://github.com/scikit-learn/scikit-learn/pull/8750
> got close to merging and I'm not really sure why it was closed rather than
> completed.

From pahome.chen at mirlab.org Mon May 13 22:10:22 2019
From: pahome.chen at mirlab.org (lampahome)
Date: Tue, 14 May 2019 10:10:22 +0800
Subject: [scikit-learn] Can I evaluate clustering efficiency incrementally?
In-Reply-To: References: Message-ID:

On Fri, May 3, 2019 at 7:29 PM, Uri Goren wrote:
> I usually use clustering to save costs on labelling. I like to apply
> hierarchical clustering, and then label a small sample and fine-tune the
> clustering algorithm.
> That way, you can evaluate the effectiveness in terms of cluster purity
> (how many clusters contain mixed labels).
> See an example with sklearn here:
> https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU

But if my dataset is too large to load into memory, will it work?

From ugoren at gmail.com Tue May 14 03:06:33 2019
From: ugoren at gmail.com (Uri Goren)
Date: Tue, 14 May 2019 10:06:33 +0300
Subject: [scikit-learn] Can I evaluate clustering efficiency incrementally?
In-Reply-To: References: Message-ID:

Sounds like you need to use Spark; this project looks promising:
https://github.com/xiaocai00/SparkPinkMST

On Tue, May 14, 2019 at 5:12 AM lampahome wrote:
> On Fri, May 3, 2019 at 7:29 PM, Uri Goren wrote:
>> I usually use clustering to save costs on labelling. I like to apply
>> hierarchical clustering, and then label a small sample and fine-tune the
>> clustering algorithm.
>> That way, you can evaluate the effectiveness in terms of cluster purity
>> (how many clusters contain mixed labels).
>> See an example with sklearn here:
>> https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU
> But if my dataset is too large to load into memory, will it work?

From tom.augspurger88 at gmail.com Tue May 14 09:18:24 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Tue, 14 May 2019 08:18:24 -0500
Subject: [scikit-learn] Can I evaluate clustering efficiency incrementally?
In-Reply-To: References: Message-ID:

If anyone is interested in implementing these, dask-ml would welcome
additional metrics that work well with Dask arrays:
https://github.com/dask/dask-ml/issues/213.

On Tue, May 14, 2019 at 2:09 AM Uri Goren wrote:
> Sounds like you need to use Spark; this project looks promising:
> https://github.com/xiaocai00/SparkPinkMST
> On Tue, May 14, 2019 at 5:12 AM lampahome wrote:
>> On Fri, May 3, 2019 at 7:29 PM, Uri Goren wrote:
>>> I usually use clustering to save costs on labelling. I like to apply
>>> hierarchical clustering, and then label a small sample and fine-tune
>>> the clustering algorithm.
>>> That way, you can evaluate the effectiveness in terms of cluster purity
>>> (how many clusters contain mixed labels).
>>> See an example with sklearn here:
>>> https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU
>> But if my dataset is too large to load into memory, will it work?

From joel.nothman at gmail.com Wed May 15 00:14:17 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Wed, 15 May 2019 14:14:17 +1000
Subject: [scikit-learn] Can I evaluate clustering efficiency incrementally?
In-Reply-To: References: Message-ID:

Evaluating on large datasets is easy if the sufficient statistics are just
the contingency matrix.

On Tue., 14 May 2019, 11:19 pm Tom Augspurger, wrote:
> If anyone is interested in implementing these, dask-ml would welcome
> additional metrics that work well with Dask arrays:
> https://github.com/dask/dask-ml/issues/213.
> On Tue, May 14, 2019 at 2:09 AM Uri Goren wrote:
>> Sounds like you need to use Spark; this project looks promising:
>> https://github.com/xiaocai00/SparkPinkMST
>> On Tue, May 14, 2019 at 5:12 AM lampahome wrote:
>>> On Fri, May 3, 2019 at 7:29 PM, Uri Goren wrote:
>>>> I usually use clustering to save costs on labelling. I like to apply
>>>> hierarchical clustering, and then label a small sample and fine-tune
>>>> the clustering algorithm.
>>>> That way, you can evaluate the effectiveness in terms of cluster
>>>> purity (how many clusters contain mixed labels).
>>>> See an example with sklearn here:
>>>> https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU
>>> But if my dataset is too large to load into memory, will it work?
From drh at aiwerkstatt.com Wed May 15 15:18:13 2019
From: drh at aiwerkstatt.com (drh at aiwerkstatt.com)
Date: Wed, 15 May 2019 13:18:13 -0600
Subject: [scikit-learn] Example of a scikit-learn compatible classifier with C++ implementation of the algorithms
Message-ID: <20190515131813.Horde.Li6N562F-XfoEURE43WiHLh@just35.justhost.com>

I use a Python-based ecosystem (scikit-learn, ...) for prototyping, and I
have a C++-based production system. A scikit-learn compatible interface
allows me to take advantage of scikit-learn's ecosystem. Implementing the
algorithms in C++ allows me to develop and test my algorithms already
during prototyping.

I started with scikit-learn's project template to roll my own decision tree
and forest classifier, and implemented the algorithms in a C++ library,
using Cython to create the Python bindings. Starting out with a Python
implementation, I experimented a little bit with implementing the
algorithms in Cython. But I found that, if you are proficient in Python and
C++, implementing the algorithm directly in C++ was much faster than
writing it in Cython.

I made this project available to everybody, because I think it could serve
as an example or template for anybody who would like to roll their own
scikit-learn compatible classifier with a C++-based implementation of the
algorithms, to be re-used in a production system. At least version 1.0.0
should be useful; after that it might become too complex to be used as an
example.

Check it out:
ReadTheDocs: https://koho.readthedocs.io
GitHub: https://github.com/AIWerkstatt/koho

I tried to be consistent with scikit-learn's decision tree and ensemble
modules, and the basic concepts, including the stack, the samples LUT with
in-place partitioning, and incremental histogram updates, used for the
implementation of the classifiers are based on:
G. Louppe, Understanding Random Forests, PhD Thesis, 2014.

Thanks a lot, Gilles, for that comprehensive work on random forests!

From pahome.chen at mirlab.org Wed May 15 21:45:43 2019
From: pahome.chen at mirlab.org (lampahome)
Date: Thu, 16 May 2019 09:45:43 +0800
Subject: [scikit-learn] Can I evaluate clustering efficiency incrementally?
In-Reply-To: References: Message-ID:

On Wed, May 15, 2019 at 12:16 PM, Joel Nothman wrote:
> Evaluating on large datasets is easy if the sufficient statistics are
> just the contingency matrix.

Sorry, I don't understand. Can you explain in detail?
Do you mean we could take a subset of samples for evaluation if the subset
is a contingency (normal distribution) matrix?

From joel.nothman at gmail.com Thu May 16 03:06:37 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Thu, 16 May 2019 17:06:37 +1000
Subject: [scikit-learn] Can I evaluate clustering efficiency incrementally?
In-Reply-To: References: Message-ID:

The contingency matrix
(https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cluster.contingency_matrix.html)
counts how many times each pair of (true cluster, predicted cluster)
occurs. It provides sufficient statistics for every "supervised" (i.e.
ground-truth-based) clustering evaluation metric in Scikit-learn. In an
incremental setting, you can simply add to the contingency matrix with
each new predicted batch (a sketch of this bookkeeping appears after the
release announcement below).

In https://github.com/scikit-learn/scikit-learn/issues/8103 I proposed
that we provide an API for calculating clustering metrics from the
sufficient statistics alone, but it has not come to fruition.

On Thu, 16 May 2019 at 11:47, lampahome wrote:
> On Wed, May 15, 2019 at 12:16 PM, Joel Nothman wrote:
>> Evaluating on large datasets is easy if the sufficient statistics are
>> just the contingency matrix.
> Sorry, I don't understand. Can you explain in detail?
> Do you mean we could take a subset of samples for evaluation if the
> subset is a contingency (normal distribution) matrix?

From joel.nothman at gmail.com Thu May 16 04:03:23 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Thu, 16 May 2019 18:03:23 +1000
Subject: [scikit-learn] ANN: scikit-learn 0.21 released
Message-ID:

Thanks to the work of many, many contributors, we have released
Scikit-learn 0.21. It is available from GitHub, PyPI and Conda-forge, but
is not yet available on the Anaconda defaults channel.

* Documentation at https://scikit-learn.org/0.21
* Release Notes at https://scikit-learn.org/0.21/whats_new
* Download source or wheels at https://pypi.org/project/scikit-learn/
* Install from conda-forge with `conda install -c conda-forge scikit-learn`

Highlights include:
* neighbors.NeighborhoodComponentsAnalysis for supervised metric learning,
which learns a weighted Euclidean distance for k-nearest neighbors.
https://scikit-learn.org/0.21/modules/neighbors.html#nca
* ensemble.HistGradientBoostingClassifier and
ensemble.HistGradientBoostingRegressor: experimental implementations of
efficient binned gradient boosting machines.
https://scikit-learn.org/0.21/modules/ensemble.html#gradient-tree-boosting
* impute.IterativeImputer: an experimental API for a non-trivial approach
to missing value imputation.
https://scikit-learn.org/0.21/modules/impute.html#multivariate-feature-imputation
* cluster.OPTICS: a new density-based clustering algorithm.
https://scikit-learn.org/0.21/modules/clustering.html#optics
* better printing of estimators as strings, with an option to hide default
parameters for compactness:
https://scikit-learn.org/0.21/auto_examples/plot_changed_only_pprint_parameter.html
* for estimator and library developers: a way to tag your estimator so
that it can be treated appropriately with check_estimator.
https://scikit-learn.org/0.21/developers/contributing.html#estimator-tags

There are many other enhancements and fixes listed in the release notes
(https://scikit-learn.org/0.21/whats_new).

Please note that Scikit-learn has new dependencies. It requires:
* joblib >= 0.11, which used to be vendored within Scikit-learn
* OpenMP, unless the environment variable SKLEARN_NO_OPENMP=1 is set when
the code is compiled (and cythonized)
* Python >= 3.5. Installing Scikit-learn from Python 2 will continue to
provide version 0.20.

Thanks again to everyone who contributed and to our sponsors, who helped
us to develop such a great set of features and fixes since version 0.20 in
under 8 months.

Happy Learning!

From the Scikit-learn team.
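Picking up Joel's point about sufficient statistics, here is a sketch of
the incremental bookkeeping: accumulate the contingency matrix over batches
and derive, for example, the adjusted Rand index from those counts alone.
The helper functions are illustrative, not a scikit-learn API, and the
streamed labels are synthetic:

# Accumulate a contingency matrix over batches of (true, predicted)
# labels, then compute the adjusted Rand index from the counts alone.
# Assumes labels are small non-negative integers.
import numpy as np
from scipy.special import comb

def update_contingency(C, y_true, y_pred):
    # Add one count per (true, predicted) pair in this batch.
    np.add.at(C, (y_true, y_pred), 1)

def ari_from_contingency(C):
    sum_comb = comb(C, 2).sum()          # pairs agreeing in both labelings
    a = comb(C.sum(axis=1), 2).sum()     # pairs within true clusters
    b = comb(C.sum(axis=0), 2).sum()     # pairs within predicted clusters
    n_pairs = comb(C.sum(), 2)
    expected = a * b / n_pairs
    max_index = (a + b) / 2
    return (sum_comb - expected) / (max_index - expected)

C = np.zeros((3, 3), dtype=np.int64)
rng = np.random.RandomState(0)
for _ in range(10):                      # stream of labelled batches
    y_true = rng.randint(0, 3, size=1000)
    y_pred = (y_true + (rng.rand(1000) < 0.1)) % 3   # noisy copy of y_true
    update_contingency(C, y_true, y_pred)
print(ari_from_contingency(C))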
From bertrand.thirion at inria.fr Thu May 16 04:21:09 2019
From: bertrand.thirion at inria.fr (bertrand.thirion)
Date: Thu, 16 May 2019 10:21:09 +0200
Subject: [scikit-learn] ANN: scikit-learn 0.21 released
In-Reply-To: Message-ID: <9cac0f$bdkt92@mail2-relais-roc.national.inria.fr>

Congratulations!
Bertrand

Sent from my Samsung Galaxy smartphone.

-------- Original message --------
From: Joel Nothman
Date: 16/05/2019 10:03 (GMT+01:00)
To: Scikit-learn user and developer mailing list
Subject: [scikit-learn] ANN: scikit-learn 0.21 released

Thanks to the work of many, many contributors, we have released
Scikit-learn 0.21. It is available from GitHub, PyPI and Conda-forge, but
is not yet available on the Anaconda defaults channel.
* Documentation at https://scikit-learn.org/0.21
* Release Notes at https://scikit-learn.org/0.21/whats_new
* Download source or wheels at https://pypi.org/project/scikit-learn/
* Install from conda-forge with `conda install -c conda-forge scikit-learn`
Highlights include:
* neighbors.NeighborhoodComponentsAnalysis for supervised metric learning,
which learns a weighted Euclidean distance for k-nearest neighbors.
https://scikit-learn.org/0.21/modules/neighbors.html#nca
* ensemble.HistGradientBoostingClassifier and
ensemble.HistGradientBoostingRegressor: experimental implementations of
efficient binned gradient boosting machines.
https://scikit-learn.org/0.21/modules/ensemble.html#gradient-tree-boosting
* impute.IterativeImputer: an experimental API for a non-trivial approach
to missing value imputation.
https://scikit-learn.org/0.21/modules/impute.html#multivariate-feature-imputation
* cluster.OPTICS: a new density-based clustering algorithm.
https://scikit-learn.org/0.21/modules/clustering.html#optics
* better printing of estimators as strings, with an option to hide default
parameters for compactness:
https://scikit-learn.org/0.21/auto_examples/plot_changed_only_pprint_parameter.html
* for estimator and library developers: a way to tag your estimator so
that it can be treated appropriately with check_estimator.
https://scikit-learn.org/0.21/developers/contributing.html#estimator-tags
There are many other enhancements and fixes listed in the release notes
(https://scikit-learn.org/0.21/whats_new).
Please note that Scikit-learn has new dependencies. It requires:
* joblib >= 0.11, which used to be vendored within Scikit-learn
* OpenMP, unless the environment variable SKLEARN_NO_OPENMP=1 is set when
the code is compiled (and cythonized)
* Python >= 3.5. Installing Scikit-learn from Python 2 will continue to
provide version 0.20.
Thanks again to everyone who contributed and to our sponsors, who helped
us to develop such a great set of features and fixes since version 0.20 in
under 8 months.
Happy Learning!
From the Scikit-learn team.
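A usage note on the experimental estimators highlighted in the
announcement above: in 0.21 they must be enabled explicitly before import.
A minimal sketch on toy data:

# The 0.21 histogram-based gradient boosting estimators are experimental
# and must be enabled explicitly before they can be imported.
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = HistGradientBoostingClassifier(max_iter=100, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))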
From gael.varoquaux at normalesup.org Thu May 16 04:35:00 2019
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Thu, 16 May 2019 10:35:00 +0200
Subject: [scikit-learn] ANN: scikit-learn 0.21 released
In-Reply-To: <9cac0f$bdkt92@mail2-relais-roc.national.inria.fr>
References: <9cac0f$bdkt92@mail2-relais-roc.national.inria.fr>
Message-ID: <20190516083500.t373fnb2vijtgwe2@phare.normalesup.org>

Indeed! Great improvements. And it's a pleasure to see that the releases
are more frequent: a huge value to the community.

Gaël

On Thu, May 16, 2019 at 10:21:09AM +0200, bertrand.thirion wrote:
> Congratulations!
> Bertrand
> Sent from my Samsung Galaxy smartphone.
> -------- Original message --------
> From: Joel Nothman
> Date: 16/05/2019 10:03 (GMT+01:00)
> To: Scikit-learn user and developer mailing list
> Subject: [scikit-learn] ANN: scikit-learn 0.21 released
> Thanks to the work of many, many contributors, we have released
> Scikit-learn 0.21. It is available from GitHub, PyPI and Conda-forge, but
> is not yet available on the Anaconda defaults channel.
> * Documentation at https://scikit-learn.org/0.21
> * Release Notes at https://scikit-learn.org/0.21/whats_new
> * Download source or wheels at https://pypi.org/project/scikit-learn/
> * Install from conda-forge with `conda install -c conda-forge scikit-learn`
> Highlights include:
> * neighbors.NeighborhoodComponentsAnalysis for supervised metric
> learning, which learns a weighted Euclidean distance for k-nearest
> neighbors. https://scikit-learn.org/0.21/modules/neighbors.html#nca
> * ensemble.HistGradientBoostingClassifier and
> ensemble.HistGradientBoostingRegressor: experimental implementations of
> efficient binned gradient boosting machines.
> https://scikit-learn.org/0.21/modules/ensemble.html#gradient-tree-boosting
> * impute.IterativeImputer: an experimental API for a non-trivial approach
> to missing value imputation.
> https://scikit-learn.org/0.21/modules/impute.html#multivariate-feature-imputation
> * cluster.OPTICS: a new density-based clustering algorithm.
> https://scikit-learn.org/0.21/modules/clustering.html#optics
> * better printing of estimators as strings, with an option to hide
> default parameters for compactness:
> https://scikit-learn.org/0.21/auto_examples/plot_changed_only_pprint_parameter.html
> * for estimator and library developers: a way to tag your estimator so
> that it can be treated appropriately with check_estimator.
> https://scikit-learn.org/0.21/developers/contributing.html#estimator-tags
> There are many other enhancements and fixes listed in the release notes
> (https://scikit-learn.org/0.21/whats_new).
> Please note that Scikit-learn has new dependencies. It requires:
> * joblib >= 0.11, which used to be vendored within Scikit-learn
> * OpenMP, unless the environment variable SKLEARN_NO_OPENMP=1 is set when
> the code is compiled (and cythonized)
> * Python >= 3.5. Installing Scikit-learn from Python 2 will continue to
> provide version 0.20.
> Thanks again to everyone who contributed and to our sponsors, who helped
> us to develop such a great set of features and fixes since version 0.20
> in under 8 months.
> Happy Learning!
> From the Scikit-learn team.
--
Gael Varoquaux
Senior Researcher, INRIA
http://gael-varoquaux.info
http://twitter.com/GaelVaroquaux

From maxhalford25 at gmail.com Thu May 16 12:22:35 2019
From: maxhalford25 at gmail.com (Max Halford)
Date: Thu, 16 May 2019 18:22:35 +0200
Subject: [scikit-learn] Introducing creme for online learning
Message-ID:

Hello everyone,

I sometimes see emails where people ask about training models
incrementally. Some friends and I have started a Python library for doing
so-called online learning, named creme: https://github.com/creme-ml/creme.
The code is idiomatic and the API resembles that of sklearn. Online
learning is treated as a first-class citizen, which makes creme more
practical and efficient than sklearn if online learning is your goal. Each
estimator has a fit_one(x, y) method which allows it to train on one
observation at a time.

I just presented it at PyData Amsterdam, where people seemed enthusiastic
about it. The video is not out yet, but here are the slides:
https://maxhalford.github.io/slides/creme-pydata.

Best regards. And congrats on version 0.21!

--
Max Halford
+336 28 25 13 38

From prudhvirajnitjsr at gmail.com Wed May 22 09:56:28 2019
From: prudhvirajnitjsr at gmail.com (prudhviraj nitjsr)
Date: Wed, 22 May 2019 19:26:28 +0530
Subject: [scikit-learn] Regularization in Tree Models
Message-ID:

Hi All,

I've noticed that there is no regularization term for decision tree
estimators in scikit-learn. Are there any plans to introduce one?

Thanks
Prudvi RajKumar M

From t3kcit at gmail.com Wed May 22 11:02:31 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 22 May 2019 11:02:31 -0400
Subject: [scikit-learn] Regularization in Tree Models
In-Reply-To: References: Message-ID: <41317a97-e36e-9158-68a1-db19b0dc5747@gmail.com>

Hi Prudvi.

What exactly do you mean by that? There is regularization in the new
HistGradientBoosting, and we're working on post-pruning for decision
trees. I'm not sure what L2 regularization for decision tree classifiers
or decision tree regressors would mean. Do you have a reference?

Best,
Andy

On 5/22/19 9:56 AM, prudhviraj nitjsr wrote:
> Hi All,
> I've noticed that there is no regularization term for decision tree
> estimators in scikit-learn. Are there any plans to introduce one?
> Thanks
> Prudvi RajKumar M

From ahowe42 at gmail.com Thu May 23 10:39:06 2019
From: ahowe42 at gmail.com (Andrew Howe)
Date: Thu, 23 May 2019 15:39:06 +0100
Subject: [scikit-learn] Version 0.21! and plot_tree!
Message-ID:

I want to say thank you to all the sklearn developers. The breadth and
quality of this software is truly breathtaking.

Specifically, I want to say thank you very much for the plot_tree
function! I have wasted a lot of effort in the past, on multiple OSes,
getting everything to work so I could view the tree.export_graphviz
results. Having this new function to plot trees natively in matplotlib is
extremely useful.

Thanks again!
Andrew

<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
LinkedIn Profile
ResearchGate Profile
Open Researcher and Contributor ID (ORCID)
Github Profile
Personal Website
I live to learn, so I can learn to live.
- me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>

From t3kcit at gmail.com Thu May 23 11:22:21 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Thu, 23 May 2019 11:22:21 -0400
Subject: [scikit-learn] Version 0.21! and plot_tree!
In-Reply-To: References: Message-ID: <428bcef1-fb27-fd98-7d7a-13d33e7fb8c6@gmail.com>

Hey Andrew.

Thanks for saying thanks! I share your frustration with export_graphviz,
in particular for teaching. I feel like plot_tree is not ideal yet,
though. In particular, the layout is not as compact as the graphviz one.
If you have any feedback or suggestions, I'd be very happy to hear them!

Cheers,
Andy

On 5/23/19 10:39 AM, Andrew Howe wrote:
> I want to say thank you to all the sklearn developers. The breadth and
> quality of this software is truly breathtaking.
> Specifically, I want to say thank you very much for the plot_tree
> function! I have wasted a lot of effort in the past, on multiple OSes,
> getting everything to work so I could view the tree.export_graphviz
> results. Having this new function to plot trees natively in matplotlib
> is extremely useful.
> Thanks again!
> Andrew

From anael.beaugnon at ssi.gouv.fr Thu May 23 11:49:34 2019
From: anael.beaugnon at ssi.gouv.fr (Beaugnon Anael)
Date: Thu, 23 May 2019 17:49:34 +0200
Subject: [scikit-learn] decision_path method for tree-based models
Message-ID: <9e074d7f-3ead-2df8-8b8c-3f2554d95d4c@ssi.gouv.fr>

Hi everyone,

The decision_path method is currently available only for
DecisionTreeClassifier, DecisionTreeRegressor, and RandomForest, but not
for IsolationForest and GradientBoostingClassifier. In these cases the
implementation is quite easy (it is exactly the same as for RandomForest),
and I think it would be very handy to have a public method.

What do you think of this proposal? If you are OK with it, I would be
happy to propose a pull request.

Thanks,

--
Anaël Beaugnon
ANSSI - Intrusion Detection Research Laboratory

The personal data collected and processed during this exchange aims solely
at completing a business relationship and is limited to the necessary
duration of that relationship. If you wish to use your rights of
consultation, rectification and deletion of your data, please contact:
contact.rgpd at sgdsn.gouv.fr. If you have received this message in error,
we thank you for informing the sender and destroying the message.
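For context, a minimal sketch of the decision_path API as it exists today
on the estimators that already expose it (toy data; IsolationForest and
GradientBoostingClassifier are the ones still missing it):

# decision_path today: available on decision trees and random forests,
# but not (yet) on IsolationForest or GradientBoostingClassifier.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Sparse indicator of the nodes each sample traverses, plus the offsets
# delimiting each tree's nodes in the concatenated indicator matrix.
indicator, n_nodes_ptr = forest.decision_path(X[:2])
print(indicator.shape)   # (2, total number of nodes across all trees)
print(n_nodes_ptr[:3])   # node offsets of the first trees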
From niourf at gmail.com Thu May 23 12:17:58 2019
From: niourf at gmail.com (Nicolas Hug)
Date: Thu, 23 May 2019 12:17:58 -0400
Subject: [scikit-learn] decision_path method for tree-based models
In-Reply-To: <9e074d7f-3ead-2df8-8b8c-3f2554d95d4c@ssi.gouv.fr>
References: <9e074d7f-3ead-2df8-8b8c-3f2554d95d4c@ssi.gouv.fr>
Message-ID: <1a2dceee-e9c7-81bb-d1a8-4f1a18d759ac@gmail.com>

Hi Anaël, yes, feel free to submit a PR.

On 5/23/19 11:49 AM, Beaugnon Anael wrote:
> Hi everyone,
> The decision_path method is currently available only for
> DecisionTreeClassifier, DecisionTreeRegressor, and RandomForest, but not
> for IsolationForest and GradientBoostingClassifier. In these cases the
> implementation is quite easy (it is exactly the same as for
> RandomForest), and I think it would be very handy to have a public
> method.
> What do you think of this proposal? If you are OK with it, I would be
> happy to propose a pull request.
> Thanks,
> --
> Anaël Beaugnon
> ANSSI - Intrusion Detection Research Laboratory
From anael.beaugnon at gmail.com Thu May 23 16:53:16 2019
From: anael.beaugnon at gmail.com (Anaël Beaugnon)
Date: Thu, 23 May 2019 22:53:16 +0200
Subject: [scikit-learn] decision_path method for tree-based models
In-Reply-To: <1a2dceee-e9c7-81bb-d1a8-4f1a18d759ac@gmail.com>
References: <9e074d7f-3ead-2df8-8b8c-3f2554d95d4c@ssi.gouv.fr> <1a2dceee-e9c7-81bb-d1a8-4f1a18d759ac@gmail.com>
Message-ID:

Hi Nicolas,

Thanks for your quick answer. I have just submitted a PR
(https://github.com/scikit-learn/scikit-learn/pull/13935).

On Thu, May 23, 2019 at 6:21 PM, Nicolas Hug wrote:
> Hi Anaël, yes, feel free to submit a PR.
> On 5/23/19 11:49 AM, Beaugnon Anael wrote:
>> Hi everyone,
>> The decision_path method is currently available only for
>> DecisionTreeClassifier, DecisionTreeRegressor, and RandomForest, but
>> not for IsolationForest and GradientBoostingClassifier. In these cases
>> the implementation is quite easy (it is exactly the same as for
>> RandomForest), and I think it would be very handy to have a public
>> method.
>> What do you think of this proposal? If you are OK with it, I would be
>> happy to propose a pull request.
>> Thanks,

From olivier.grisel at ensta.org Fri May 24 03:38:16 2019
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Fri, 24 May 2019 09:38:16 +0200
Subject: [scikit-learn] ANN: scikit-learn 0.21.2 released
Message-ID:

A quick bugfix release to fix a critical regression in which the
computation of the euclidean distances silently returned incorrect values.

This release also includes other bugfixes listed in the changelog:

https://scikit-learn.org/0.21/whats_new.html#version-0-21-2

The PyPI.org wheels and conda-forge packages are online. The packages for
the default Anaconda channel should follow soon.

Thanks to all the contributors!

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

From mraetz at eonerc.rwth-aachen.de Fri May 24 03:57:27 2019
From: mraetz at eonerc.rwth-aachen.de (Rätz, Martin)
Date: Fri, 24 May 2019 07:57:27 +0000
Subject: [scikit-learn] [Copyright] Scikit-learn graphic
Message-ID:

Dear scikit-learn team,

On the scikit-learn webpage (link to the graphic) you will find a graphic
which I would like to use in a publication in an international journal. I
slightly modified the graphic, as you can see in the appendix. Of course, I
refer to scikit-learn in the caption. The entry in the bibliography is as
follows:

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel,
M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A.
Passos, D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay,
"Scikit-learn: Machine learning in Python," Journal of Machine Learning
Research, vol. 12, no. Oct, pp. 2825-2830, 2011.

I would like to kindly ask for permission to publish the graphic.

Yours sincerely,
Martin Rätz

_______________________________________
Martin Rätz, M.Sc.
Research Associate
T +49 241 80-49794
F +49 241 80-49769
mraetz at eonerc.rwth-aachen.de

RWTH Aachen University
E.ON Energy Research Center
Institute for Energy Efficient Buildings and Indoor Climate
E.ON Energieforschungszentrum
Lehrstuhl für Gebäude- und Raumklimatechnik
Mathieustraße 10
52074 Aachen, Germany
www.eonerc.rwth-aachen.de/ebc
From olivier.grisel at ensta.org Fri May 24 06:11:21 2019
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Fri, 24 May 2019 12:11:21 +0200
Subject: [scikit-learn] [Copyright] Scikit-learn graphic
In-Reply-To: References: Message-ID:

I think it's OK to do as you said.

--
Olivier

From t3kcit at gmail.com Fri May 24 14:16:41 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Fri, 24 May 2019 14:16:41 -0400
Subject: [scikit-learn] Google code reviews
Message-ID:

Hi All.

What do you think of https://www.pullrequest.com/googleserve/? It's
sponsored code reviews. Could be interesting, right?

Best,
Andy

From randalljellis at gmail.com Fri May 24 17:21:50 2019
From: randalljellis at gmail.com (Randy Ellis)
Date: Fri, 24 May 2019 17:21:50 -0400
Subject: [scikit-learn] Highly cited paper - causal random forests
Message-ID:

Would this be difficult for a moderate user to implement in sklearn by
modifying the existing code base?

Estimation and Inference of Heterogeneous Treatment Effects using Random
Forests
342 citations in less than a year (Google Scholar):
https://amstat.tandfonline.com/doi/full/10.1080/01621459.2017.1319839

"In this article, we develop a nonparametric *causal forest* for
estimating heterogeneous treatment effects that extends Breiman's widely
used random forest algorithm. In the potential outcomes framework with
unconfoundedness, we show that causal forests are pointwise consistent for
the true treatment effect and have an asymptotically Gaussian and centered
sampling distribution. We also discuss a practical method for constructing
asymptotic confidence intervals for the true treatment effect that are
centered at the causal forest estimates. Our theoretical results rely on a
generic Gaussian theory for a large family of random forest algorithms. To
our knowledge, this is the first set of results that allows any type of
random forest, including classification and regression forests, to be used
for provably valid statistical inference. In experiments, we find causal
forests to be substantially more powerful than classical methods based on
nearest-neighbor matching, especially in the presence of irrelevant
covariates."

--
*Randall J. Ellis*
PhD Student, Hurd lab, Mount Sinai School of Medicine
Special Volunteer, Michaelides lab, NIDA IRP
Phone: +1-954-260-9891

From gael.varoquaux at normalesup.org Sat May 25 06:21:01 2019
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Sat, 25 May 2019 12:21:01 +0200
Subject: [scikit-learn] Highly cited paper - causal random forests
In-Reply-To: References: Message-ID: <20190525102101.ttssoo2vt5vr3uxo@phare.normalesup.org>

Causal forests are very nice work. However, they deal with causal
inference rather than prediction, hence I am not really sure how we could
implement them in the API of scikit-learn. Do you have a suggestion?

Cheers,
Gaël

On Fri, May 24, 2019 at 05:21:50PM -0400, Randy Ellis wrote:
> Would this be difficult for a moderate user to implement in sklearn by
> modifying the existing code base?
> Estimation and Inference of Heterogeneous Treatment Effects using Random
> Forests
> 342 citations in less than a year (Google Scholar):
> https://amstat.tandfonline.com/doi/full/10.1080/01621459.2017.1319839
> "In this article, we develop a nonparametric causal forest for estimating
> heterogeneous treatment effects that extends Breiman's widely used random
> forest algorithm.
> In the potential outcomes framework with unconfoundedness, we show that
> causal forests are pointwise consistent for the true treatment effect and
> have an asymptotically Gaussian and centered sampling distribution. We
> also discuss a practical method for constructing asymptotic confidence
> intervals for the true treatment effect that are centered at the causal
> forest estimates. Our theoretical results rely on a generic Gaussian
> theory for a large family of random forest algorithms. To our knowledge,
> this is the first set of results that allows any type of random forest,
> including classification and regression forests, to be used for provably
> valid statistical inference. In experiments, we find causal forests to be
> substantially more powerful than classical methods based on
> nearest-neighbor matching, especially in the presence of irrelevant
> covariates."

--
Gael Varoquaux
Senior Researcher, INRIA
http://gael-varoquaux.info
http://twitter.com/GaelVaroquaux

From joel.nothman at gmail.com Sat May 25 08:06:50 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Sat, 25 May 2019 22:06:50 +1000
Subject: [scikit-learn] ANN: Scikit-learn 0.21.2 released
Message-ID:

We've released 0.21.2, primarily to fix an issue with euclidean_distances
(and pairwise_distances). It should be available on PyPI and Conda-Forge.

Full list of changes at https://scikit-learn.org/0.21/whats_new/v0.21.html

Thanks to all who helped fix these issues so quickly after 0.21.1.
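Returning to the causal-forest question above: a deliberately simple
baseline that fits the existing scikit-learn API is the two-model
approach. This is not Wager and Athey's causal forest; it only illustrates
how heterogeneous treatment effect estimation can sit on top of stock
estimators. Everything below (data, true effect, model choice) is an
illustrative assumption:

# "Two-model" baseline for heterogeneous treatment effects: fit one
# forest on treated samples, one on controls, and subtract predictions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.randn(2000, 3)
t = rng.randint(0, 2, size=2000)        # randomized treatment assignment
tau = 2.0 * (X[:, 0] > 0)               # true effect, depends on X
y = X @ [1.0, -1.0, 0.5] + t * tau + rng.randn(2000)

model_t = RandomForestRegressor(n_estimators=100, random_state=0)
model_c = RandomForestRegressor(n_estimators=100, random_state=0)
model_t.fit(X[t == 1], y[t == 1])
model_c.fit(X[t == 0], y[t == 0])

tau_hat = model_t.predict(X) - model_c.predict(X)   # per-sample effect
print(tau_hat[X[:, 0] > 0].mean(), tau_hat[X[:, 0] <= 0].mean())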
From regis_cardos at hotmail.com Wed May 29 08:16:09 2019
From: regis_cardos at hotmail.com (Régis Cardoso)
Date: Wed, 29 May 2019 12:16:09 +0000
Subject: [scikit-learn] Problems with installation Scikit Learn
Message-ID: 

Dear,

I have just subscribed to the scikit-learn mailing list.

My name is Régis. I am studying word2vec and artificial neural networks
using scikit-learn, and I am trying to install scikit-learn on a Raspberry
Pi 3 without success. I have tried all the commands below, and none of the
installations worked.

1st try - pip install -U scikit-learn
2nd try - sudo install scikit-learn
3rd try - sudo apt-get install gfortran libopenblas-dev liblapack-dev
          sudo pip install scikit-learn
4th try - sudo pip3 install scikit-learn
5th try - pip install scikit-learn

I would like to know if there is another way to install scikit-learn on a
Raspberry Pi 3. I have Python 3.6 installed on the board, along with Numpy,
Scipy and Joblib.

Regards,
Cardoso, Regis

From g.lemaitre58 at gmail.com Wed May 29 08:58:49 2019
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Wed, 29 May 2019 14:58:49 +0200
Subject: [scikit-learn] Problems with installation Scikit Learn
In-Reply-To: References: Message-ID: 

Could you install all packages from the system? If you have a Debian
distribution these packages should be available. Somehow, I would expect
apt-get install python-sklearn to work (it should install the
dependencies).

On Wed, 29 May 2019 at 14:34, Régis Cardoso wrote:
> Dear,
>
> I have just subscribed to the scikit-learn mailing list.
>
> My name is Régis. I am studying word2vec and artificial neural networks
> using scikit-learn, and I am trying to install scikit-learn on a Raspberry
> Pi 3 without success. I have tried all the commands below, and none of the
> installations worked.
>
> 1st try - pip install -U scikit-learn
> 2nd try - sudo install scikit-learn
> 3rd try - sudo apt-get install gfortran libopenblas-dev liblapack-dev
>           sudo pip install scikit-learn
> 4th try - sudo pip3 install scikit-learn
> 5th try - pip install scikit-learn
>
> I would like to know if there is another way to install scikit-learn on a
> Raspberry Pi 3. I have Python 3.6 installed on the board, along with
> Numpy, Scipy and Joblib.
>
> Regards,
> Cardoso, Regis
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/

From krallinger.martin at gmail.com Wed May 29 09:42:44 2019
From: krallinger.martin at gmail.com (Martin Krallinger)
Date: Wed, 29 May 2019 15:42:44 +0200
Subject: [scikit-learn] scikit-learn for automatic text indexing/classification shared task: MESINESP/BioASQ
In-Reply-To: References: Message-ID: 

*** Call for Participation: Medical Semantic Indexing in Spanish ***

Medical Semantic Indexing in Spanish
BioASQ MESINESP Task
http://temu.bsc.es/mesinesp/

Task description

Scikit-learn has been successfully used for a variety of text
classification tasks on documents in a range of different languages.
As part of the BioASQ challenges on biomedical semantic indexing and
question answering (http://bioasq.org/), we organize the first task on
semantic indexing of Spanish medical texts. The task will address the
automatic classification/indexing of abstracts from the IBECS and LILACS
databases, written in Spanish, with structured medical vocabularies (DeCS
terms). The main aim is to promote the development of semantic indexing
tools of practical relevance for non-English content, determining the
current state of the art, identifying challenges, and comparing the
strategies and results to those published for English data.

In order to measure classification performance, an on-line evaluation
system will be maintained. As the true annotations of the articles are not
available beforehand, the evaluation procedure will run continuously by
providing online results. The participating systems will be assessed based
on two measures, one hierarchical and one flat: the Lowest Common Ancestor
F-measure (LCA-F) and the label-based micro F-measure, respectively.

Deadlines for submission: The task will run in Autumn 2019 (detailed
schedule TBA). Participants, after downloading the released test sets,
will have to submit results within a limited time window. The task will
run for two consecutive periods (batches) of 5 weeks each. The first batch
will start in October 2019.

For further details, please refer to http://temu.bsc.es/mesinesp/ and
http://participants-area.bioasq.org/general_information/Taskaspanish/

Best regards,
Martin Krallinger

From jesse.livezey at gmail.com Wed May 29 13:34:49 2019
From: jesse.livezey at gmail.com (Jesse Livezey)
Date: Wed, 29 May 2019 10:34:49 -0700
Subject: [scikit-learn] Difference in normalization between Lasso and LogisticRegression + L1
Message-ID: 

Hi everyone,

I noticed recently that in the Lasso implementation (and docs), the MSE
term is normalized by the number of samples
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html

but for LogisticRegression + L1, the logloss does not seem to be normalized
by the number of samples. One consequence is that the strength of the
regularization depends on the number of samples explicitly. For instance,
in Lasso, if you tile a dataset N times, you will learn the same coef, but
in LogisticRegression, you will learn a different coef.

Is this the intended behavior of LogisticRegression? I was surprised by
this. Either way, it would be helpful to document this more clearly in the
Logistic Regression docs (I can make a PR.)
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Jesse

From michael.eickenberg at gmail.com Wed May 29 13:42:04 2019
From: michael.eickenberg at gmail.com (Michael Eickenberg)
Date: Wed, 29 May 2019 10:42:04 -0700
Subject: [scikit-learn] Difference in normalization between Lasso and LogisticRegression + L1
In-Reply-To: References: Message-ID: 

Hi Jesse,

I think there was an effort to compare normalization methods on the data
attachment term between Lasso and Ridge regression back in 2012/13, but
this might not have been finished or extended to Logistic Regression.

If it is not documented well, it could definitely benefit from a
documentation update.
As for changing it to a more consistent state, that would require adding a keyword argument pertaining to this functionality and, after discussion, possibly changing the default value after some deprecation cycles (though this seems like a dangerous one to change at all imho). Michael On Wed, May 29, 2019 at 10:38 AM Jesse Livezey wrote: > Hi everyone, > > I noticed recently that in the Lasso implementation (and docs), the MSE > term is normalized by the number of samples > > https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html > > but for LogisticRegression + L1, the logloss does not seem to be > normalized by the number of samples. One consequence is that the strength > of the regularization depends on the number of samples explicitly. For > instance, in Lasso, if you tile a dataset N times, you will learn the same > coef, but in LogisticRegression, you will learn a different coef. > > Is this the intended behavior of LogisticRegression? I was surprised by > this. Either way, it would be helpful to document this more clearly in the > Logistic Regression docs (I can make a PR.) > > https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html > > Jesse > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Wed May 29 13:48:42 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 29 May 2019 13:48:42 -0400 Subject: [scikit-learn] Difference in normalization between Lasso and LogisticRegression + L1 In-Reply-To: References: Message-ID: That is not very ideal indeed. I think we just went with what liblinear did, and when saga was introduced kept that behavior. It should probably be scaled as in Lasso, I would imagine? On 5/29/19 1:42 PM, Michael Eickenberg wrote: > Hi Jesse, > > I think there was an effort to compare normalization methods on the > data attachment term between Lasso and Ridge regression back in > 2012/13, but this might have not been finished or extended to Logistic > Regression. > > If it is not documented well, it could definitely benefit from a > documentation update. > > As for changing it to a more consistent state, that would require > adding a keyword argument pertaining to this functionality and, after > discussion, possibly changing the default value after some deprecation > cycles (though this seems like a dangerous one to change at all imho). > > Michael > > > On Wed, May 29, 2019 at 10:38 AM Jesse Livezey > > wrote: > > Hi everyone, > > I noticed recently that in the Lasso implementation (and docs), > the MSE term is normalized by the number of samples > https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html > > but for LogisticRegression + L1, the logloss does not seem to be > normalized by the number of samples. One consequence is that the > strength of the regularization depends on the number of samples > explicitly. For instance, in Lasso, if you tile a dataset N times, > you will learn the same coef, but in LogisticRegression, you will > learn a different coef. > > Is this the intended behavior of LogisticRegression? I was > surprised by this. Either way, it would be helpful to document > this more clearly in the Logistic Regression docs (I can make a PR.) 
> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
>
> Jesse
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From stuart at stuartreynolds.net Wed May 29 17:29:39 2019
From: stuart at stuartreynolds.net (Stuart Reynolds)
Date: Wed, 29 May 2019 14:29:39 -0700
Subject: [scikit-learn] Difference in normalization between Lasso and LogisticRegression + L1
In-Reply-To: References: Message-ID: 

I looked into this a while ago. There were differences in which algorithms
regularize the intercept and which do not (I believe liblinear does, lbfgs
does not). All of the algorithms disagreed with logistic regression in
scipy.

- Stuart

On Wed, May 29, 2019 at 10:50 AM Andreas Mueller wrote:
> That is not very ideal indeed.
> I think we just went with what liblinear did, and when saga was
> introduced kept that behavior.
> It should probably be scaled as in Lasso, I would imagine?
>
> On 5/29/19 1:42 PM, Michael Eickenberg wrote:
> > Hi Jesse,
> > I think there was an effort to compare normalization methods on the data
> > attachment term between Lasso and Ridge regression back in 2012/13, but
> > this might not have been finished or extended to Logistic Regression.
> > If it is not documented well, it could definitely benefit from a
> > documentation update.
> > As for changing it to a more consistent state, that would require adding
> > a keyword argument pertaining to this functionality and, after
> > discussion, possibly changing the default value after some deprecation
> > cycles (though this seems like a dangerous one to change at all imho).
> > Michael
> >
> > On Wed, May 29, 2019 at 10:38 AM Jesse Livezey wrote:
> >> Hi everyone,
> >> I noticed recently that in the Lasso implementation (and docs), the MSE
> >> term is normalized by the number of samples
> >> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html
> >> but for LogisticRegression + L1, the logloss does not seem to be
> >> normalized by the number of samples. One consequence is that the
> >> strength of the regularization depends on the number of samples
> >> explicitly. For instance, in Lasso, if you tile a dataset N times, you
> >> will learn the same coef, but in LogisticRegression, you will learn a
> >> different coef.
> >> Is this the intended behavior of LogisticRegression? I was surprised by
> >> this. Either way, it would be helpful to document this more clearly in
> >> the Logistic Regression docs (I can make a PR.)
> >> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
> >> Jesse
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
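To make the normalization difference discussed in this thread concrete,
here is a minimal sketch on invented synthetic data (behavior as of the
0.21-era releases, liblinear solver): duplicating every sample leaves the
Lasso coefficients essentially unchanged, because its squared-error term is
averaged over samples, while the L1-penalized LogisticRegression
coefficients move, because its log-loss is summed:

import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(50, 5)
y_reg = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0]) + rng.randn(50)
y_clf = (y_reg > 0).astype(int)

# Lasso objective: (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1
lasso = Lasso(alpha=0.1)
c1 = lasso.fit(X, y_reg).coef_.copy()
c2 = lasso.fit(np.tile(X, (10, 1)), np.tile(y_reg, 10)).coef_
print(np.allclose(c1, c2, atol=1e-4))  # True (up to solver tolerance)

# L1 LogisticRegression objective: ||w||_1 + C * sum_i logloss_i
# (no 1 / n_samples factor on the loss)
logreg = LogisticRegression(penalty='l1', C=1.0, solver='liblinear')
w1 = logreg.fit(X, y_clf).coef_.copy()
w2 = logreg.fit(np.tile(X, (10, 1)), np.tile(y_clf, 10)).coef_
print(np.allclose(w1, w2, atol=1e-4))  # False: tiling acts like C -> 10 * C

In other words, alpha already carries the per-sample scaling while C
multiplies the summed loss, which is exactly the asymmetry Jesse describes.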
From pahome.chen at mirlab.org Thu May 30 04:42:20 2019
From: pahome.chen at mirlab.org (lampahome)
Date: Thu, 30 May 2019 16:42:20 +0800
Subject: [scikit-learn] MemoryError when evaluating clustering with GridSearchCV
Message-ID: 

I read a large dataset into memory and it costs about 2 GB of RAM (I have
4 GB of RAM). sys.getsizeof(train_X) reports 63963248.

I evaluate clustering with GridSearchCV as below:

import numpy as np
from sklearn import cluster, metrics
from sklearn.model_selection import GridSearchCV

def grid_search_clu(X):
    def cv_scorer(estimator, X):
        estimator.fit(X)
        cluster_labels = (estimator.labels_ if hasattr(estimator, 'labels_')
                          else estimator.predict(X))
        num_labels = len(set(cluster_labels))
        num_samples = len(X)
        if num_labels == 1 or num_labels == num_samples:
            return -1
        return -metrics.davies_bouldin_score(X, cluster_labels)

    m = cluster.Birch(n_clusters=None, compute_labels=True)
    m_param = {'branching_factor': range(10, 60, 10),
               'threshold': np.arange(0.1, 0.6, 0.1).round(decimals=3)}
    clf = GridSearchCV(m, m_param, cv=[(slice(None), slice(None))],
                       scoring=cv_scorer, verbose=1, n_jobs=1,
                       return_train_score=False).fit(X)
    return clf

And I get a MemoryError. How should I solve this? Should I adjust the
parameters' ranges?

thx

From regis_cardos at hotmail.com Thu May 30 07:08:25 2019
From: regis_cardos at hotmail.com (Régis Cardoso)
Date: Thu, 30 May 2019 11:08:25 +0000
Subject: Re: [scikit-learn] scikit-learn Digest, Vol 38, Issue 18
In-Reply-To: References: Message-ID: 

Dear,

Which dependencies are you talking about? I have installed Numpy, Scipy
and Joblib; these are the necessary packages, right?

I also followed the guide below. It is a very nice article about setting
up the scientific Python stack on a Raspberry Pi, but it isn't working:
when I try $ pytest sklearn, it fails because it cannot find sklearn. I
don't know what to do now; I need sklearn for my work and have run out of
ideas to resolve this problem.

https://geoffboeing.com/2016/03/scientific-python-raspberry-pi/

Cardoso, Regis
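On the Raspberry Pi installation question above, two routes that are
commonly suggested for Raspbian are the Debian package, or pip pointed at
the piwheels repository of pre-built ARM wheels. These commands are a
sketch to try, not a verified recipe; package names and behavior depend on
the exact Raspbian release:

# Route 1: the distribution package (pulls in compatible dependencies)
sudo apt-get install python3-sklearn

# Route 2: pre-built ARM wheels from piwheels (avoids compiling from source)
pip3 install --index-url https://www.piwheels.org/simple scikit-learn

# Either way, verify the installation afterwards
python3 -c "import sklearn; print(sklearn.__version__)"

If the import check prints a version number, scikit-learn is installed and
importable for that interpreter.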
From sepand.haghighi at yahoo.com Thu May 30 11:15:57 2019
From: sepand.haghighi at yahoo.com (Sepand Haghighi)
Date: Thu, 30 May 2019 15:15:57 +0000 (UTC)
Subject: [scikit-learn] PyCM 2.2 released: A general benchmark-based comparison of classification models
References: <1134551450.7155028.1559229357976.ref@mail.yahoo.com>
Message-ID: <1134551450.7155028.1559229357976@mail.yahoo.com>

Hi folks

Recently we have released a new version of PyCM, a library for confusion
matrix statistical analysis. I thought you might find it interesting.

http://www.pycm.ir
https://github.com/sepandhaghighi/pycm

Changelog:
- Negative likelihood ratio interpretation (NLRI) added
- Cramer's benchmark (SOA5) added
- Matthews correlation coefficient interpretation (MCCI) added #204
- Matthews's benchmark (SOA6) added #204
- F1 macro added
- F1 micro added
- Accuracy macro added #205
- Compare class score calculation modified
- Parameters recommendation for multi-class dataset modified
- Parameters recommendation for imbalance dataset modified
- README.md modified
- Document modified
- Logo updated

Best Regards,
Sepand Haghighi

From omkar.kumbhar at innoplexus.com Fri May 31 06:14:57 2019
From: omkar.kumbhar at innoplexus.com (Omkar Kumbhar)
Date: Fri, 31 May 2019 15:44:57 +0530
Subject: [scikit-learn] Mahalanobis distance metric in OPTICS
Message-ID: 

Hello,

I was having issues while fitting OPTICS using the Mahalanobis metric. I
tried many things and had a hard time fitting it to my data distribution.
I have replicated the issue in the ipython notebook below. You could also
take a look at the html version of the notebook to look at the stack
traces. Can you guide me on how to resolve this bug?

PFA,
ipython notebook to replicate the issue
html of ipynb to look at stack traces.

Thanks & Regards,
Omkar Kumbhar
Associate Data Scientist
Innoplexus Consulting Services Pvt. Ltd.
www.innoplexus.com
Mob: +91-9579464473
Landline: +91-20-66527300
From adrin.jalali at gmail.com Fri May 31 12:54:05 2019
From: adrin.jalali at gmail.com (Adrin)
Date: Fri, 31 May 2019 18:54:05 +0200
Subject: [scikit-learn] Mahalanobis distance metric in OPTICS
In-Reply-To: References: Message-ID: 

Mahalanobis is always tricky: the covariance matrix is between the
features, not the samples. This works:

import numpy as np
from sklearn.cluster import OPTICS

# test_array is the data from the attached notebook
OPTICS(metric='mahalanobis',
       metric_params={'VI': np.linalg.inv(np.cov(test_array.T))}).fit(test_array)

Not sure why it wouldn't work when you pass V, which it suggests as an
alternative.

On Fri, May 31, 2019 at 12:16 PM Omkar Kumbhar wrote:
> Hello,
>
> I was having issues while fitting OPTICS using the Mahalanobis metric. I
> tried many things and had a hard time fitting it to my data distribution.
> I have replicated the issue in the ipython notebook below. You could also
> take a look at the html version of the notebook to look at the stack
> traces. Can you guide me on how to resolve this bug?
>
> PFA,
> ipython notebook to replicate the issue
> html of ipynb to look at stack traces.
>
> Thanks & Regards,
> Omkar Kumbhar
> Associate Data Scientist
> Innoplexus Consulting Services Pvt. Ltd.
> www.innoplexus.com
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From tmrsg11 at gmail.com Fri May 31 20:54:32 2019
From: tmrsg11 at gmail.com (C W)
Date: Fri, 31 May 2019 20:54:32 -0400
Subject: [scikit-learn] How is linear regression in scikit-learn done? Do you need train and test split?
Message-ID: 

Hello everyone,

I'm new to scikit-learn. I see that many tutorials in scikit-learn follow
a workflow along the lines of:
1) transform the data
2) split the data: train, test
3) instantiate the sklearn object and fit
4) predict and tune parameters

But linear regression is fit by least squares, so I don't think a
train/test split is necessary. So, I guess I can just use the entire
dataset?

Thanks in advance!
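For readers of the last question: the four-step workflow described in the
message maps onto scikit-learn as in the minimal sketch below (invented
synthetic data; all names hypothetical). The least-squares fit itself does
not require a split, but the held-out test set is what gives an honest
estimate of out-of-sample error, which the training fit alone cannot:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Invented data for illustration
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.randn(100)

# 2) split first, so the test set stays untouched
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1) fit the transform on the training data only, to avoid leakage
scaler = StandardScaler().fit(X_train)

# 3) instantiate the estimator and fit
model = LinearRegression().fit(scaler.transform(X_train), y_train)

# 4) predict and evaluate on the held-out data
print(mean_squared_error(y_test, model.predict(scaler.transform(X_test))))

Note the split happens before the scaler is fit; fitting the transform on
the full dataset would leak test-set statistics into training.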