From hadrien.lorenzo at inria.fr  Mon Mar  1 08:19:28 2021
From: hadrien.lorenzo at inria.fr (Hadrien Lorenzo)
Date: Mon, 1 Mar 2021 14:19:28 +0100 (CET)
Subject: [scikit-learn] Sparse Partial Least Squares (PLS)
In-Reply-To: <766112199.7795329.1614601648384.JavaMail.zimbra@inria.fr>
References: <766112199.7795329.1614601648384.JavaMail.zimbra@inria.fr>
Message-ID: <658313534.7832967.1614604768708.JavaMail.zimbra@inria.fr>

Dear Maintainers, 

I work on sparse PLS from now many years (doc+postDoc in INRIA and INSERM, see [ https://hadrienlorenzo.netlify.app/ | https://hadrienlorenzo.netlify.app ] for light view) and published about applications. Main problems are about dealing with missing values in the multi-output and degenerate n<<p contexts for multi-block structures. 
I wrote packages 

    * in R : [ https://cran.r-project.org/web/packages/ddsPLS/index.html | https://cran.r-project.org/web/packages/ddsPLS/index.html ] , 
    * and in Python : [ https://pypi.org/project/py-ddspls/ | https://pypi.org/project/py-ddspls/ ] . 

In the both objectives of offering sparse PLS opportunities to the Python community and to improve my Python skills, I would like to propose you a python version of the algorithm for which specificities are the following. 

    * Modification of the PLS2 algorithm. 
    * Soft-thresholding of the empirical covariance matrices. 
    * Automatic-running of the number of component and the the sparsity parameter through bootstrap sampling. 
    * Sparsity both in X and in Y with a single parameter. 

This algorithm has been developed with J?r?me Sarraco (Pr INRIA) and Rodolphe Thi?baut (PUPH Inserm) and recently with Olivier Cloarec (Sartorius, Research Group for Chemometrics, Institute of Chemistry, Ume? University, Ume?, S-901 87, Su?de) 

Would you be interested in this sparse version of the PLS algorithm ? I am more than eager to discuss about this project with you, so do not hesitate to contact me. 

Bests, 

Hadrien Lorenzo 
+33 6 49 09 55 78 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20210301/9993d255/attachment.html>

From j.tan at rug.nl  Mon Mar  1 12:12:31 2021
From: j.tan at rug.nl (Tan, J.)
Date: Mon, 1 Mar 2021 18:12:31 +0100
Subject: [scikit-learn] Invitation to participate in a survey about
 Scikit-Learn
Message-ID: <CAJvbMuN_Le9qZ=q3D3E2eOug=vB6gG-FAHo0DVDZc=ona=xUJw@mail.gmail.com>

Dear Scikit-Learn contributor,

We are doing research on understanding how developers manage a special kind
of Technical Debt in *Python.*

We kindly ask 15-20 minutes of your time to fill out our survey. To help
you decide whether to fill it in, we clarify two points.

?Why should I answer this survey??

Your participation is essential for us to correctly understand how
developers manage Technical Debt.

?What is in it for me??

Your valuable contributions to *Scikit-Learn* are part of the information
we analyzed for this study. Thus, if you help us further by answering
this survey, there are two immediate benefits:

   - you help to improve the efficiency of maintaining the quality of
   *Scikit-Learn*.
   - the results will be used to propose recommendations to manage
   technical debt and create tool support.

Here is the link to the survey
<https://docs.google.com/forms/d/e/1FAIpQLSc-L24rO0W2eicLw5xxpSyg2MqXuunhxM8e-S4nz1UCJG6mVA/viewform?usp=sf_link>
.

Thank you for your time and attention.

Kind regards,
Jie Tan, Daniel Feitosa and Paris Avgeriou

Software Engineering and Architecture group <http://www.cs.rug.nl/search>
Faculty of Science and Engineering
University of Groningen, the Netherlands
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20210301/c123b37a/attachment.html>

From seralouk at hotmail.com  Sat Mar  6 15:00:10 2021
From: seralouk at hotmail.com (Serafeim Loukas)
Date: Sat, 6 Mar 2021 20:00:10 +0000
Subject: [scikit-learn] Consensus Clustering
In-Reply-To: <0E057EC7-9B81-470D-8FB4-E723362D2728@hotmail.com>
References: <1071563725.605939.1604163089727.ref@mail.yahoo.com>
 <1071563725.605939.1604163089727@mail.yahoo.com>
 <0E057EC7-9B81-470D-8FB4-E723362D2728@hotmail.com>
Message-ID: <9C03210A-25F2-41B9-81DC-D99153B92753@hotmail.com>

Hi all,

Is there an implemented method for Consensus Clustering (https://link.springer.com/article/10.1023%2FA%3A1023949509487<https://link.springer.com/article/10.1023/A:1023949509487>)?

Cheers,
Makis
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20210306/5fb3d22f/attachment.html>

From reshama.stat at gmail.com  Mon Mar  8 08:00:00 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Mon, 8 Mar 2021 08:00:00 -0500
Subject: [scikit-learn] Data Umbrella: AFME Sprint Report
Message-ID: <CAKPCsug+1eowKw9i2EutHh3cWFqLRXwizufo7XkPWGcHMdREzw@mail.gmail.com>

Hello,

The Data Umbrella AFME (Africa & Middle East) scikit-learn sprint was on
February 6, 2021, and the report is now available. [a]

For folks who are interested in contributing, or an educator who would like
to share with students, there are resources for getting started in
contributing to scikit-learn. [b]

[a]
https://reshamas.github.io/data-umbrella-afme-2021-scikit-learn-sprint-report/

[b]
https://www.dataumbrella.org/open-source/contributing-to-scikit-learn

Best,
Reshama
---
Reshama Shaikh
she/her
Blog <https://reshamas.github.io> | Twitter <https://twitter.com/reshamas>
| LinkedIn <https://www.linkedin.com/in/reshamas/> | GitHub
<https://github.com/reshamas>

Data Umbrella <https://www.dataumbrella.org>
NYC PyLadies
<https://meet.meetup.com/wf/click?upn=pEEcc35imY7Cq0tG1vyTt6zEs68RbcMfjPcajNHTKtn9NmwqQbJhe15mAZ1gz2La_s50GiGgQPBz9c9AKCDbbu2LRERFOLQHDZ3rAVGAkUEIFdmeKWgLQ1JD-2FBfVxXpI86J1oyur7RYRzToaqco1fWUx-2FWPOn-2FLCyCICxwu5bjlHJvtSvVekt71L43UiQL8dMjr0HfGP-2FSeiGQFG0QQxzS-2FX5o4Q8Ch-2BHrlA5hsa9VyPXC5FvBn1cNbkmil3SgwH7HWFmXsKFJ7RYrzZR0EwWFIMarRA8-2BTgd8yXJYlfxogk-3D>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20210308/b5641d54/attachment.html>

From marmochiaskl at gmail.com  Tue Mar  9 06:13:17 2021
From: marmochiaskl at gmail.com (Chiara Marmo)
Date: Tue, 9 Mar 2021 12:13:17 +0100
Subject: [scikit-learn] scikit-learn github label wiki
Message-ID: <CAGfF149zpBx_KpqHvp-19RNicm-mmbr0SrdjpwxWfV8Z11HuJg@mail.gmail.com>

Dear list,
dear triagers,
dear core-devs

I have started a page on the scikit-learn wiki, about the use of labels in
the scikit-learn github repository:
https://github.com/scikit-learn/scikit-learn/wiki/label
The page is meant to provide help in labeling issues and pull requests,
improving the standardization of the workflow and perhaps accelerate the
automation process (thanks to the identification of 'obvious' procedures).
The idea in the end is to make the review more understandable for the
contributors but also making reviewers trust the system a bit more ... :)

Triagers and core devs , feel free to comment and edit there.
List, feel free to comment and ask questions about the triaging process in
this mailing list, any help is welcome!

Thanks for your attention.

Best,

Chiara
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20210309/98d8c457/attachment.html>

From joel.nothman at gmail.com  Tue Mar  9 18:43:06 2021
From: joel.nothman at gmail.com (Joel Nothman)
Date: Wed, 10 Mar 2021 10:43:06 +1100
Subject: [scikit-learn] [Vote] SLEP006: Routing sample-aligned metadata
In-Reply-To: <CAAkaFLW=ur_Casw83za8Rr=Xp9_tF8G60sWAbgnQwP1DRpua8g@mail.gmail.com>
References: <CAAkaFLWypE2dv75N3y5ACCQZ11HkdEKB24rMZsPjPMrM7vDYsQ@mail.gmail.com>
 <CAAkaFLW=ur_Casw83za8Rr=Xp9_tF8G60sWAbgnQwP1DRpua8g@mail.gmail.com>
Message-ID: <CAAkaFLU1uW5HNkwAQw2Z9BF3rRvwk2+C1xuZeKZ5J=4HjhzOdw@mail.gmail.com>

We are now twenty days into the voting period. I assume ten remain?

While votes are still scarce, issues that have been raised include:

   - The naming of "metadata"
   <https://github.com/scikit-learn/scikit-learn/pull/16079#discussion_r589290894>.
   There is some support for the word "auxiliary" instead of "meta". (I agree
   that weight/groups are not typically what I would call metadata.)
   - Whether the API for setting requests should be more generic, e.g.
   rolled into the "tags" concept:
   https://github.com/scikit-learn/scikit-learn/pull/16079#issuecomment-794091868
   .
   - Areas in which the WIP PR is still immature and needs substantial
   review: even if the SLEP is accepted, this is unlikely to be ready for 1.0,
   and the API (although not the structure of the model) will continue to be
   refined were the SLEP accepted.

Thanks all for your considered critique and contributions.

On Sat, 27 Feb 2021 at 20:42, Joel Nothman <joel.nothman at gmail.com> wrote:

> Hi all,
>
> Just a reminder that we are ten days into the month-long voting period,
> with one vote on record. Core devs, please find time to consider this
> proposal. Thanks to Andy's suggestion, we have added an example of the new
> API to the opening section:
>
>
> This SLEP proposes an API where users can request certain metadata to be
> passed to its consumer by the meta-estimator it is wrapped in.
>
> The following example illustrates the new request_metadata parameter for
> making scorers, the request_sample_weight estimator method, the metadata parameter
> replacing fit_params in cross_validate, and the automatic passing of
> groups <https://scikit-learn.org/stable/glossary.html#term-groups> to
> GroupKFold to enable nested grouped cross validation. Here, the user
> requests that the sample_weight
> <https://scikit-learn.org/stable/glossary.html#term-sample_weight> metadata
> key should be passed to a customised accuracy scorer (although a predefined
> ?weighted_accuracy? scorer could be introduced), and to the
> LogisticRegressionCV. GroupKFold requests groups
> <https://scikit-learn.org/stable/glossary.html#term-groups> by default.
>
> >>> from sklearn.metrics import accuracy_score, make_scorer>>> from sklearn.model_selection import cross_validate, GroupKFold>>> from sklearn.linear_model import LogisticRegressionCV>>> weighted_acc = make_scorer(accuracy_score,...                            request_metadata=['sample_weight'])>>> group_cv = GroupKFold()>>> lr = LogisticRegressionCV(...    cv=group_cv,...    scoring=weighted_acc,... ).request_sample_weight(fit=True)>>> cross_validate(lr, X, y, cv=group_cv,...                metadata={'sample_weight': my_weights,...                          'groups': my_groups},...                scoring=weighted_acc)
>
>
> On Thu, 18 Feb 2021 at 00:08, Joel Nothman <joel.nothman at gmail.com> wrote:
>
>> With thanks to Alex, Adrin and Christian, we have a proposal to implement
>> what we used to call "sample props" that should be expressive enough for us
>> to resolve tens of issues and PRs, but will be largely unobtrusive for most
>> current users.
>>
>> Core developers, please cast your vote in this PR
>> <https://github.com/scikit-learn/enhancement_proposals/pull/52> after
>> considering the proposal here
>> <https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep006/proposal.html>,
>> which has a partial implementation in #16079
>> <https://github.com/scikit-learn/scikit-learn/pull/16079>.
>>
>>
>> In brief, the problem we are trying to solve:
>>
>> Scikit-learn has limited support for information pertaining to each
>> sample (henceforth ?sample properties?) to be passed through an estimation
>> pipeline. The user can, for instance, pass fit parameters to all members of
>> a FeatureUnion, or to a specified member of a Pipeline using dunder (__)
>> prefixing:
>>
>> >>> from sklearn.pipeline import Pipeline>>> from sklearn.linear_model import LogisticRegression>>> pipe = Pipeline([('clf', LogisticRegression())])>>> pipe.fit([[1, 2], [3, 4]], [5, 6],...          clf__sample_weight=[.5, .7])
>>
>> Several other meta-estimators, such as GridSearchCV, support forwarding
>> these fit parameters to their base estimator when fitting. Yet a number of
>> important use cases are currently not supported.
>>
>> Features we currently do not support and wish to include:
>>
>>    - passing sample properties (e.g. sample_weight
>>    <https://scikit-learn.org/stable/glossary.html#term-sample_weight>)
>>    to a scorer used in cross-validation
>>    - passing sample properties (e.g. groups
>>    <https://scikit-learn.org/stable/glossary.html#term-groups>) to a CV
>>    splitter in nested cross validation
>>    - passing sample properties (e.g. sample_weight
>>    <https://scikit-learn.org/stable/glossary.html#term-sample_weight>)
>>    to some scorers and not others in a multi-metric cross-validation setup
>>
>> Solution: Each consumer requests
>>
>> A meta-estimator provides along to its children only what they request. A
>> meta-estimator needs to request, on behalf of its children, any metadata
>> that descendant consumers request.
>>
>> Each object that could receive metadata should have a method called
>> get_metadata_request() which returns a dict that specifies which
>> metadata is consumed by each of its methods (keys of this dictionary are
>> therefore method names, e.g. fit
>> <https://scikit-learn.org/stable/glossary.html#term-fit>, transform
>> <https://scikit-learn.org/stable/glossary.html#term-transform> etc.).
>> Estimators supporting weighted fitting may return {} by default, but
>> have a method called request_sample_weight which allows the user to
>> specify the requested sample_weight
>> <https://scikit-learn.org/stable/glossary.html#term-sample_weight> in
>> each of its methods. make_scorer accepts request_metadata as keyword
>> parameter through which the user can specify what metadata is requested.
>>
>> Regards,
>>
>> Joel
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20210310/ec60a197/attachment-0001.html>

From phamgiaminh0112 at gmail.com  Sun Mar 21 09:57:58 2021
From: phamgiaminh0112 at gmail.com (Pham Gia Minh)
Date: Sun, 21 Mar 2021 20:57:58 +0700
Subject: [scikit-learn] Outlier Detection Problem
Message-ID: <CACj3k1d=23xmBrcd+zS9xhEeixXT-BQ7mdfRR8pfO4j+WGeEkw@mail.gmail.com>

Hello scikit-learn group. I am a beginner to libsvm. I'm having an
outlier detection problem. I train the data in the format <label>
<index1>: <value1> <index2>: <value2> ... in one-class SVM and get a
model file, then I predict test file and model file to get be
accuracy. I want to draw one-class borders using the same interface as
this article: https://scikit-learn.org/stable/auto_examples/miscellaneous/plot_anomaly_comparison.html
but different input got me error (not in type <index:value>). Can you
help me? I hope for help in solving this problem. Thank you very much.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20210321/83fa5fc8/attachment.html>

From marmochiaskl at gmail.com  Mon Mar 22 04:51:24 2021
From: marmochiaskl at gmail.com (Chiara Marmo)
Date: Mon, 22 Mar 2021 09:51:24 +0100
Subject: [scikit-learn] Monthly meeting March 29th 2021
Message-ID: <CAGfF14_SMpv+ki7p8-nveFJmPCiqt2PCCxUSKw8mJNh5cTciqg@mail.gmail.com>

Dear list,

The scikit-learn monthly meeting will take place on Monday March 29th at
8PM UTC:
<https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=01&day=04&hour=20&min=0&sec=0&p1=179&p2=240&p3=195&p4=224>
<https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=01&day=25&hour=20&min=0&sec=0&p1=179&p2=240&p3=195&p4=224>
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=03&day=29&hour=20&min=0&sec=0&p1=179&p2=240&p3=195&p4=224
<https://www.google.com/url?q=https://www.timeanddate.com/worldclock/meetingtime.html?month%3D10%26day%3D26%26year%3D2020%26p1%3D179%26p2%3D240%26p3%3D195%26p4%3D224%26iv%3D0&sa=D&source=calendar&usd=2&usg=AOvVaw1nSuFYBdv9nvJzi_QftPp->
<https://www.google.com/url?q=https://www.timeanddate.com/worldclock/meetingtime.html?month%3D10%26day%3D26%26year%3D2020%26p1%3D179%26p2%3D240%26p3%3D195%26p4%3D224%26iv%3D0&sa=D&source=calendar&usd=2&usg=AOvVaw1nSuFYBdv9nvJzi_QftPp->

While these meetings are mainly for core-devs to discuss the current
topics, non core-devs and other project maintainers are welcome.

Feel free to join, using the following link:

https://meet.google.com/xhq-yoga-rtf

If you plan to attend and you would like to discuss something specific
about your contribution please add your name (or github pseudo) in the "
Contributors <https://hackmd.io/3KKinuJ-TSeVY_Rs9IeyqQ#Contributors>"
section, of the public pad:

<https://hackmd.io/qtKt7pTNSXanU-MJOIMxbw>
https://hackmd.io/3KKinuJ-TSeVY_Rs9IeyqQ

Best

Chiara
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20210322/b5451259/attachment.html>

From leibniz01 at gmail.com  Tue Mar 23 22:54:36 2021
From: leibniz01 at gmail.com (James Bunn)
Date: Tue, 23 Mar 2021 22:54:36 -0400
Subject: [scikit-learn] running examples
Message-ID: <CALfXuVusBHsKdRFjgt+7cmfHDSxswE=T78FP5FTwXi7PaLpGBw@mail.gmail.com>

Hi,

I am a new user trying to run the Visualization of MLP weights on MNIST
example for neural networks.

I am not able to get the example to run.  I loaded the scikitlearn and
matplotlib packages called in the program, but still it will not work.

Is there any more I need to do?

My error text is below.

Thank you,

James

"C:\Users\James\PycharmProjects\MATH541 Project\venv\Scripts\python.exe"
C:/Users/James/Documents/MATH541/plot_mnist_filters.py
C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\datasets\_openml.py:65:
RuntimeWarning: Invalid cache, redownloading file
  warn("Invalid cache, redownloading file", RuntimeWarning)

=====================================
Visualization of MLP weights on MNIST
=====================================

Sometimes looking at the learned coefficients of a neural network can
provide
insight into the learning behavior. For example if weights look
unstructured,
maybe some were not used at all, or if very large coefficients exist, maybe
regularization was too low or the learning rate too high.

This example shows how to plot some of the first layer weights in a
MLPClassifier trained on the MNIST dataset.

The input data consists of 28x28 pixel handwritten digits, leading to 784
features in the dataset. Therefore the first layer weight matrix have the
shape
(784, hidden_layer_sizes[0]).  We can therefore visualize a single column of
the weight matrix as a 28x28 pixel image.

To make the example run faster, we use very few hidden units, and train only
for a very short time. Training longer would result in weights with a much
smoother spatial appearance. The example will throw a warning because it
doesn't converge, in this case this is what we want because of CI's time
constraints.

Traceback (most recent call last):
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\utils\__init__.py", line 1081, in
check_pandas_support
    import pandas  # noqa
ModuleNotFoundError: No module named 'pandas'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 61, in
wrapper
    return f(*args, **kw)
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 518, in
_load_arff_response
    parsed_arff = parse_arff(arff)
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 332, in
_convert_arff_data_dataframe
    pd = check_pandas_support('fetch_openml with as_frame=True')
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\utils\__init__.py", line 1084, in
check_pandas_support
    raise ImportError(
ImportError: fetch_openml with as_frame=True requires pandas.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\utils\__init__.py", line 1081, in
check_pandas_support
    import pandas  # noqa
ModuleNotFoundError: No module named 'pandas'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\James\Documents\MATH541\plot_mnist_filters.py", line 36,
in <module>
    X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\utils\validation.py", line 63, in
inner_f
    return f(*args, **kwargs)
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 915, in
fetch_openml
    bunch = _download_data_to_bunch(url, return_sparse, data_home,
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 633, in
_download_data_to_bunch
    out = _retry_with_clean_cache(url, data_home)(
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 69, in
wrapper
    return f(*args, **kw)
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 518, in
_load_arff_response
    parsed_arff = parse_arff(arff)
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 332, in
_convert_arff_data_dataframe
    pd = check_pandas_support('fetch_openml with as_frame=True')
  File "C:\Users\James\PycharmProjects\MATH541
Project\venv\lib\site-packages\sklearn\utils\__init__.py", line 1084, in
check_pandas_support
    raise ImportError(
ImportError: fetch_openml with as_frame=True requires pandas.

Process finished with exit code 1
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20210323/38dd97de/attachment.html>

From nicholdav at gmail.com  Tue Mar 23 23:18:41 2021
From: nicholdav at gmail.com (David Nicholson)
Date: Tue, 23 Mar 2021 23:18:41 -0400
Subject: [scikit-learn] running examples
In-Reply-To: <CALfXuVusBHsKdRFjgt+7cmfHDSxswE=T78FP5FTwXi7PaLpGBw@mail.gmail.com>
References: <CALfXuVusBHsKdRFjgt+7cmfHDSxswE=T78FP5FTwXi7PaLpGBw@mail.gmail.com>
Message-ID: <CAMabFbWJMuBS09p4vTMX8DOsS-yYoML8UJ56iNxCpJchwmQ74g@mail.gmail.com>

Looks like you need to install pandas for this example--`fetch_openl` is
trying to give you back a pandas DataFrame

https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_openml.html#sklearn.datasets.fetch_openml

not sure if you could just run it with as_frame = False

David Nicholson, Ph.D.
https://nicholdav.info/
https://github.com/NickleDave
Prinz lab <http://www.biology.emory.edu/research/Prinz/>, Emory University,
Atlanta, GA, USA


On Tue, Mar 23, 2021 at 10:57 PM James Bunn <leibniz01 at gmail.com> wrote:

> Hi,
>
> I am a new user trying to run the Visualization of MLP weights on MNIST
> example for neural networks.
>
> I am not able to get the example to run.  I loaded the scikitlearn and
> matplotlib packages called in the program, but still it will not work.
>
> Is there any more I need to do?
>
> My error text is below.
>
> Thank you,
>
> James
>
> "C:\Users\James\PycharmProjects\MATH541 Project\venv\Scripts\python.exe"
> C:/Users/James/Documents/MATH541/plot_mnist_filters.py
> C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\datasets\_openml.py:65:
> RuntimeWarning: Invalid cache, redownloading file
>   warn("Invalid cache, redownloading file", RuntimeWarning)
>
> =====================================
> Visualization of MLP weights on MNIST
> =====================================
>
> Sometimes looking at the learned coefficients of a neural network can
> provide
> insight into the learning behavior. For example if weights look
> unstructured,
> maybe some were not used at all, or if very large coefficients exist, maybe
> regularization was too low or the learning rate too high.
>
> This example shows how to plot some of the first layer weights in a
> MLPClassifier trained on the MNIST dataset.
>
> The input data consists of 28x28 pixel handwritten digits, leading to 784
> features in the dataset. Therefore the first layer weight matrix have the
> shape
> (784, hidden_layer_sizes[0]).  We can therefore visualize a single column
> of
> the weight matrix as a 28x28 pixel image.
>
> To make the example run faster, we use very few hidden units, and train
> only
> for a very short time. Training longer would result in weights with a much
> smoother spatial appearance. The example will throw a warning because it
> doesn't converge, in this case this is what we want because of CI's time
> constraints.
>
> Traceback (most recent call last):
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\utils\__init__.py", line 1081, in
> check_pandas_support
>     import pandas  # noqa
> ModuleNotFoundError: No module named 'pandas'
>
> The above exception was the direct cause of the following exception:
>
> Traceback (most recent call last):
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 61, in
> wrapper
>     return f(*args, **kw)
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 518, in
> _load_arff_response
>     parsed_arff = parse_arff(arff)
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 332, in
> _convert_arff_data_dataframe
>     pd = check_pandas_support('fetch_openml with as_frame=True')
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\utils\__init__.py", line 1084, in
> check_pandas_support
>     raise ImportError(
> ImportError: fetch_openml with as_frame=True requires pandas.
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\utils\__init__.py", line 1081, in
> check_pandas_support
>     import pandas  # noqa
> ModuleNotFoundError: No module named 'pandas'
>
> The above exception was the direct cause of the following exception:
>
> Traceback (most recent call last):
>   File "C:\Users\James\Documents\MATH541\plot_mnist_filters.py", line 36,
> in <module>
>     X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\utils\validation.py", line 63, in
> inner_f
>     return f(*args, **kwargs)
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 915, in
> fetch_openml
>     bunch = _download_data_to_bunch(url, return_sparse, data_home,
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 633, in
> _download_data_to_bunch
>     out = _retry_with_clean_cache(url, data_home)(
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 69, in
> wrapper
>     return f(*args, **kw)
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 518, in
> _load_arff_response
>     parsed_arff = parse_arff(arff)
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\datasets\_openml.py", line 332, in
> _convert_arff_data_dataframe
>     pd = check_pandas_support('fetch_openml with as_frame=True')
>   File "C:\Users\James\PycharmProjects\MATH541
> Project\venv\lib\site-packages\sklearn\utils\__init__.py", line 1084, in
> check_pandas_support
>     raise ImportError(
> ImportError: fetch_openml with as_frame=True requires pandas.
>
> Process finished with exit code 1
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20210323/c71a18ee/attachment-0001.html>

From alexandre.gramfort at inria.fr  Wed Mar 24 03:02:08 2021
From: alexandre.gramfort at inria.fr (Alexandre Gramfort)
Date: Wed, 24 Mar 2021 08:02:08 +0100
Subject: [scikit-learn] running examples
In-Reply-To: <CALfXuVusBHsKdRFjgt+7cmfHDSxswE=T78FP5FTwXi7PaLpGBw@mail.gmail.com>
References: <CALfXuVusBHsKdRFjgt+7cmfHDSxswE=T78FP5FTwXi7PaLpGBw@mail.gmail.com>
Message-ID: <CADeotZqcV3WvO8EHiZm1z6ANixh-KNoqAdykEfcEdj4c718q+Q@mail.gmail.com>

hi James,

you need to install pandas

pip install pandas
or
conda install pandas

should fix your pb

Alex
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20210324/cac7883b/attachment.html>

From olivier.grisel at ensta.org  Wed Mar 24 06:09:15 2021
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Wed, 24 Mar 2021 11:09:15 +0100
Subject: [scikit-learn] running examples
In-Reply-To: <CADeotZqcV3WvO8EHiZm1z6ANixh-KNoqAdykEfcEdj4c718q+Q@mail.gmail.com>
References: <CALfXuVusBHsKdRFjgt+7cmfHDSxswE=T78FP5FTwXi7PaLpGBw@mail.gmail.com>
 <CADeotZqcV3WvO8EHiZm1z6ANixh-KNoqAdykEfcEdj4c718q+Q@mail.gmail.com>
Message-ID: <CAFvE7K6gVBwgtk8aT47=xpMVLdTQyz7zs6jQ9oWnfy=9XyY=6w@mail.gmail.com>

Alternatively, you can edit the code to use fetch_openml(...,
as_frame=False) to use a numpy array instead of a pandas dataframe for
this example.

-- 
Olivier

From krallinger.martin at gmail.com  Fri Mar 26 08:22:08 2021
From: krallinger.martin at gmail.com (Martin Krallinger)
Date: Fri, 26 Mar 2021 13:22:08 +0100
Subject: [scikit-learn] CFP MESINESP2 (BioASQ / CLEF2021 shared task) on
 semantic indexing of heterogenous health content: literature,
 clinical trials and patents
In-Reply-To: <mailman.2661.1616555935.4980.scikit-learn@python.org>
References: <mailman.2661.1616555935.4980.scikit-learn@python.org>
Message-ID: <CAMx+MKEbRzuJ3eX940VC1hbE8qAzxmOP64a4t5n+EtK+y4qMZQ@mail.gmail.com>

**** CFP  MESINESP2 track:*

*Medical Semantic Indexing (BioASQ ? CLEF 2021) ****

https://temu.bsc.es/mesinesp2/

Plan TL Award for MESINESP2 winners


There is a pressing need to improve information access, retrieval,
classification, semantic annotation as well as integration across multiple
document types, in particular for health-related content such as
literature, clinical trials and medicinal patents.

This is especially true for multilingual content from heterogeneous sources
(cross-genre), where for instance many of the initially reported COVID-19
case reports were published in a variety of languages, a considerable
fraction being non-English publications.

Due to the significant practical impact of advanced semantic indexing
technologies in health, and the direct collaboration and interest in the
generated results by collaborating international and national healthcare
organizations (BIREME/WHO, ISCIII/Spain) we are organizing the MESINESP2
shared task in collaboration with the well-established BioASQ (CLEF2021)
initiative.

A variety of complementary strategies were explored so far for semantic
indexing of health-content including (extreme) multi-label classification,
multilingual X-BERT, transformers, graph matching, text similarity, string
matching/term indexing, named entity recognition or machine translation
components.

Inspired by the settings of past BioASQ tracks and our BioCreative corpora
(CHEMPROT, BC4CHEMD/CHEMDNER) included in popular benchmark datasets like
BioBERT, we propose the following three MESINESP2 subtracks:


*1.  MESINESP-L ? Scientific Literature *(sub-track 1): This track will
require automatic indexing with DeCS terms (similar to MeSH) of abstracts
using two highly used databases in Spanish (IBECS and LILACS).

*2. MESINESP-T ? Clinical trials *(sub-track 2): This track will require
automatic indexing with DeCS terms of clinical trials from REEC (Registro
Espa?ol de Estudios Cl?nicos).

*3. MESINESP-P ? Patents *(sub-track 3)*: *This track will require
automatic indexing with DeCS terms the content of Spanish patents extracted
from Google Patents.


*Key information*

1.     MESINESP2 web, info & detailed description:
https://temu.bsc.es/mesinesp2

2.     Registration for MESINESP2:
http://clef2021-labs-registration.dei.unipd.it/ and register to *Task 3 ?
Task MESINESP: Medical Semantic Indexing In Spanish *(Which is part of the
workshop ?BioASQ - Large-scale biomedical semantic indexing and question
answering?)

3.     Datasets: https://zenodo.org/record/4634129#.YFu0MZ1KiUl
<https://zenodo.org/record/4634129#.YFu0MZ1KiUl>


*Task impact*

We foresee that the systems resulting from MESINESP2 will provide directly
useful for a variety of use case scenarios beyond literature indexing,
including competitive intelligence, prior art searches, complex search
queries for systematic reviews, evidence-based medicine, decision making,
as well database curation, elaboration of clinical practice guidelines.
Moreover the document selection criteria of MESINESP2  considered
additional scenarios of future tasks on  semantic indexing of medical
records.


*Important dates*

March, 17: Train set and guidelines release

March, 17: First development set release

April, 15: Test and Background set release

April, 30: BioASQ9 Lab @CLEF 2021 Registration Deadline

April, 30: End of the evaluation period

May, 28: Submission of Participant Papers at CLEF2021

July, 2: Camera-ready paper submission

Sep 21-24: CLEF 2021 Conference


*Publications and workshop*

The MESINESP2 track results will be presented at the BioASQ workshop *allocated
at CLEF 2021* (http://clef2021.clef-initiative.eu). Participating teams
will be invited to present their systems and obtained results. Moreover,
participating teams will be invited to submit their system description
papers for publication at the *CLEF 2021 Working Notes proceedings*.


*MESINESP2 awards*

There will be awards for the top-scoring teams promoted by the Spanish Plan
for the Advancement of Language Technology (Plan TL) and the Barcelona
Supercomputing Center (BSC).


*Main Track organizers*

?        *Martin Krallinger*, Barcelona Supercomputing Center (BSC), Spain.

?        *Luis Gasc?*, Barcelona Supercomputing Center (BSC), Spain.

?        *Anastasios Nentidis*, National Center for Scientific Research
Demokritos, Greece.

?        *Elena Primo-Pe?a,* Biblioteca Nacional de Ciencias de Salud.
Instituto de Salud Carlos III, Spain.

?        *Cristina Bojo Canales, *Biblioteca Nacional de Ciencias de la
Salud. Instituto de Salud Carlos III, Spain.

?        *George Paliouras*, National Center for Scientific Research
Demokritos, Greece.

?        *Anastasia Krithara*, National Center for Scientific Research
Demokritos, Greece.

?        *Renato Murasaki, *BIREME ? Organizaci?n Panamericana de la Salud
(WHO), Brasil.

*Scientific Committee*


?            *David Camacho*, Applied Intelligence and Data Analysis
Research Group, Universidad Polit?cnica de Madrid (Spain)

?            *Oscar Corcho*, Ontology Engineering Group, Universidad
Polit?cnica de Madrid (Spain)

?            *Parminder Batia*, Amazon Health AI (USA)

?            *Irena Spasic*, School of Computer Science & Informatics,
co-Director of the Data Innovation Research Institute, Cardiff University
(UK)

?            *Jose Luis Redondo Garc?a, *Amazon Alexa, Amazon (UK)

?            *Carlos Badenes-Olmedo*, Ontology Engineering Group,
Universidad Polit?cnica de Madrid (Spain)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20210326/8b2003f4/attachment-0001.html>