From g.lemaitre58 at gmail.com  Wed Dec  1 08:31:52 2021
From: g.lemaitre58 at gmail.com (=?utf-8?Q?Guillaume_Lema=C3=AEtre?=)
Date: Wed, 1 Dec 2021 14:31:52 +0100
Subject: [scikit-learn] scikit-learn office hours on Monday Dec. 6 2021
Message-ID: <D869D0D5-C288-4E8E-A3A0-7184012AF2A9@gmail.com>

Hi all,

Some of us will be online on the scikit-learn discord next Monday at
10:00 PT / 13:00 ET / 18:00 UTC / 19:00 CET for about an hour or so.

First-time and occasional contributors are welcome to join us on
Discord using this invitation link:
https://discord.gg/YyYRXMju

The focus of these office hour sessions is to answer questions about
contributing to scikit-learn. We can also split into breakout
audio/text channels and do pair programming or live reviewing of
forgotten pull requests with screen sharing.

We can also try to help you craft minimal reproduction cases
for bug reports to get a higher likelihood of resolution (e.g.
https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).

Please note, our Code of Conduct applies:
https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md

See you soon on discord!
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20211201/b8d98aab/attachment.html>

From thomasjpfan at gmail.com  Wed Dec  1 09:22:40 2021
From: thomasjpfan at gmail.com (Thomas J. Fan)
Date: Wed, 1 Dec 2021 09:22:40 -0500
Subject: [scikit-learn] scikit-learn monthly developer meeting: Monday
 January 3rd 2022
Message-ID: <CAK3g5AZRTc-+t_dC8jqwpUW=UVVgK2q8MML76S5tYEmiDG5jaQ@mail.gmail.com>

Dear all,


The scikit-learn developer monthly meeting will take place on Monday,
January 3rd at 22:00 UTC.


- Video call link: https://meet.google.com/ews-uszu-djs

- Meeting notes / agenda: https://hackmd.io/0yokz72CTZSny8y3Re648Q

- Local times:
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2022&month=1&day=3&hour=22&min=0&sec=0&p1=1440&p2=240&p3=248&p4=195&p5=179&p6=224


The goal of this meeting is to discuss ongoing development topics for
the project. Everybody is welcome.


As usual, please follow the code of conduct of the project:

https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md


Regards,

Thomas

From norbert at preining.info  Wed Dec  8 23:57:33 2021
From: norbert at preining.info (Norbert Preining)
Date: Thu, 9 Dec 2021 13:57:33 +0900
Subject: [scikit-learn] scikit-learn 1 - pytest - multiprocessing Pool -
 hangs?
Message-ID: <YbGMvaDHWKUJyVbP@bulldog.preining.info>

Dear all,

I am trying to track down a strange behaviour in one of our (Fujitsu)
libraries that we are planning to open source. In preparation for that, I am
trying to bring it into a state where it works with scikit-learn >= 1.

However, some of our tests fail when running in parallel mode, and they
only fail when run under pytest, NOT when run directly with python.

The library code contains

	def fit(self, X, y=None):
	    ...
	    p = multiprocessing.Pool()
	    ret = _reduce(
	        p.map(....))

What happens is that with scikit-learn 1.0.1, the code hangs
forever. I also changed the code so that the pool is created not in
the fit method but in __init__ and stored on self, but
that didn't help either.

When interrupted, pytest gives:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/threading.py:312: KeyboardInterrupt
(to show a full traceback on KeyboardInterrupt use --full-trace)
================================================ 1 passed, 2 warnings in 273.84s (0:04:33) =================================================
Exception ignored in: <function Pool.__del__ at 0x7ff72f31b9d0>
Traceback (most recent call last):
  File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/pool.py", line 268, in __del__
    self._change_notifier.put(None)
  File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/queues.py", line 378, in put
    self._writer.send_bytes(obj)
  File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/connection.py", line 205, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
    self._send(header + buf)
  File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)


When running it directly with python testfile.py instead, all goes well.


I have tested the following combinations:
* scikit-learn 0.23.*, python 3.8 and python 3.9 => works
* scikit-learn 0.24.*, python 3.8 and python 3.9 => works
* scikit-learn 1.0.1,  python 3.8 and python 3.9 => fails

I don't really understand where scikit-learn comes into play here,
so I wanted to ask whether someone here has an idea.

Thanks for any suggestion


Norbert

--
PREINING Norbert                              https://www.preining.info
Fujitsu Research  +  IFMGA Guide  +  TU Wien  +  TeX Live  + Debian Dev
GPG: 0x860CDC13   fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13

From olivier.grisel at ensta.org  Thu Dec  9 04:05:35 2021
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Thu, 9 Dec 2021 10:05:35 +0100
Subject: [scikit-learn] scikit-learn 1 - pytest - multiprocessing Pool -
 hangs?
In-Reply-To: <YbGMvaDHWKUJyVbP@bulldog.preining.info>
References: <YbGMvaDHWKUJyVbP@bulldog.preining.info>
Message-ID: <CAFvE7K60HsJdgwQSMoQnm5LX=qUt_t1pmiJJfZw_SRpnDwkDng@mail.gmail.com>

Maybe you can try to use faulthandler.dump_traceback_later
https://docs.python.org/3/library/faulthandler.html#faulthandler.dump_traceback_later
to get a traceback of all the threads of the main process.
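
A minimal sketch of this suggestion, using only the standard library: arm a watchdog before the call that hangs, so that the traceback of every thread is written to stderr if the call has not returned after timeout seconds. The helper name run_with_watchdog and its default timeout are illustrative assumptions, not part of any library.

```python
import faulthandler
import sys


def run_with_watchdog(func, *args, timeout=60.0, **kwargs):
    """Call func(*args, **kwargs), dumping all thread stacks if it stalls.

    run_with_watchdog is a hypothetical helper used for illustration only.
    """
    # If the call has not returned after `timeout` seconds, write the
    # traceback of every thread to stderr; repeat=True keeps dumping every
    # `timeout` seconds and exit=False leaves the process running.
    faulthandler.dump_traceback_later(timeout, repeat=True, file=sys.stderr, exit=False)
    try:
        return func(*args, **kwargs)
    finally:
        # Disarm the watchdog once the call has returned (or raised).
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Replace the lambda with the call that freezes, e.g. the estimator's fit.
    print(run_with_watchdog(lambda: sum(range(10)), timeout=5))
```

When the hang only shows up under pytest, its faulthandler_timeout ini option should give a similar per-test dump without changing the library code.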

But the fact that you are using the default `p =
multiprocessing.Pool()` makes me think that it might be related to the
lack of fork-safety of the OpenMP runtime library of GCC (libgomp)
[1]. There are several ways to check this:

- print the output of threadpoolctl.threadpool_info() before calling
the code that freezes to confirm (or not) that the libgomp runtime has
been loaded before creating the MP Pool.
- create the multiprocessing Pool from a forkserver context instead of the
default fork context: multiprocessing.get_context("forkserver").Pool()
- alternatively, use loky.get_reusable_executor() instead of
multiprocessing.Pool() (with a slightly different API)
- alternatively, use joblib, which uses loky internally, with yet another
API.
- alternatively, recompile scikit-learn from source with clang instead
of gcc so as to link scikit-learn to llvm-openmp instead of gcc's
libgomp runtime. llvm-openmp is fork-safe.
- alternatively, install scikit-learn from conda-forge (conda install
-c conda-forge scikit-learn) as the conda-forge distribution relinks
all OpenMP compiled extensions of its packaged libraries to
llvm-openmp transparently at install time, even if they were built
with GCC (maybe we should do that for our Linux wheels).

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2014-02/msg00979.html
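
The first check and two of the workarounds listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names loaded_openmp_runtimes, forkserver_map and joblib_map are hypothetical, math.factorial stands in for the real per-chunk work, and the forkserver start method is only available on Unix. If loaded_openmp_runtimes() reports libgomp before the pool is created, the fork-safety issue in [1] is a plausible culprit.

```python
import math
import multiprocessing

from joblib import Parallel, delayed
from threadpoolctl import threadpool_info


def loaded_openmp_runtimes():
    """Return (prefix, version) pairs for the OpenMP runtimes loaded so far."""
    # threadpool_info() describes the BLAS and OpenMP libraries already loaded
    # in this process; GCC's OpenMP runtime is reported with prefix "libgomp".
    return [
        (info.get("prefix"), info.get("version"))
        for info in threadpool_info()
        if info.get("user_api") == "openmp"
    ]


def forkserver_map(func, values, processes=2):
    """Map func over values with a Pool built from a forkserver context."""
    # Workers are started from a clean server process instead of being
    # fork()ed from this one, so they do not inherit a libgomp thread pool
    # that may already be initialized here. Unix only; func must be
    # picklable (e.g. a module-level function).
    ctx = multiprocessing.get_context("forkserver")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(func, values)


def joblib_map(func, values, n_jobs=2):
    """Same map with joblib, whose default process backend is loky."""
    return Parallel(n_jobs=n_jobs)(delayed(func)(v) for v in values)


if __name__ == "__main__":
    # The guard matters: with forkserver, workers may re-import the main module.
    print(loaded_openmp_runtimes())
    print(forkserver_map(math.factorial, range(6)))
    print(joblib_map(math.factorial, range(6)))
```

Calling loaded_openmp_runtimes() right before the multiprocessing.Pool() in fit is the cheapest way to confirm the diagnosis; swapping the pool for a forkserver context (or joblib) is then the least invasive change to try.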

If that does not work or you need more help, please feel free to open an
issue with a minimal reproducer and ping me on Gitter or Discord.



-- 
Olivier

From thomasjpfan at gmail.com  Thu Dec  9 17:22:46 2021
From: thomasjpfan at gmail.com (Thomas J. Fan)
Date: Thu, 9 Dec 2021 17:22:46 -0500
Subject: [scikit-learn] scikit-learn Triage-Focused Development Meeting:
 Friday December 10 2021
Message-ID: <CAK3g5AaGibkLxesukTDvj4j8TATVbyDx-K01McSVcOkVMcoyCQ@mail.gmail.com>

Hi all,


Our triage-focused development meeting will be on Friday, December 10,
16:30 UTC.


- Discord invite: https://discord.gg/92NYvrPSgU

- Meeting notes / agenda: https://hackmd.io/C_qEdGapRm2V0kHLx8OcQw?both

- Local times:
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=12&day=10&hour=16&min=30&sec=0&p1=1440&p2=240&p3=248&p4=195&p5=179&p6=224&iv=1800


Everyone is welcome to join us in prioritizing and discussing issues or
pull requests.


Please note, our Code of Conduct applies:

https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md


See you soon on discord!


Regards,

Thomas

From norbert at preining.info  Fri Dec 10 00:16:55 2021
From: norbert at preining.info (Norbert Preining)
Date: Fri, 10 Dec 2021 14:16:55 +0900
Subject: [scikit-learn] scikit-learn 1 - pytest - multiprocessing Pool -
 hangs?
In-Reply-To: <CAFvE7K60HsJdgwQSMoQnm5LX=qUt_t1pmiJJfZw_SRpnDwkDng@mail.gmail.com>
References: <YbGMvaDHWKUJyVbP@bulldog.preining.info>
 <CAFvE7K60HsJdgwQSMoQnm5LX=qUt_t1pmiJJfZw_SRpnDwkDng@mail.gmail.com>
Message-ID: <YbLix8IFVbdAWrON@bulldog.preining.info>

Hi Olivier,

thanks a lot, I will try the various options and see what I can do. If
and when I understand more, I will report back.

Thanks again for the detailed explanation and hints, much appreciated.

Best

Norbert

On Thu, 09 Dec 2021, Olivier Grisel wrote:
> Maybe you can try to use faulthandler.dump_traceback_later
> https://docs.python.org/3/library/faulthandler.html#faulthandler.dump_traceback_later
> to get a traceback of all the threads of the main process.
[...]

--
PREINING Norbert                              https://www.preining.info
Fujitsu Research  +  IFMGA Guide  +  TU Wien  +  TeX Live  + Debian Dev
GPG: 0x860CDC13   fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13

From reshama.stat at gmail.com  Tue Dec 14 12:48:48 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Tue, 14 Dec 2021 12:48:48 -0500
Subject: [scikit-learn] [Data Umbrella] AFME (Africa & Middle East)
 scikit-learn open source sprint (scikit-learn)
In-Reply-To: <CAEOrW48pYuph=2r9x90yZzi-fAZy58JE+aXnLVU3_TPwkQGYew@mail.gmail.com>
References: <CAKPCsugqH-JBYrUH0wn+A0KXCNPur88zG0Fei-+mxoR7T+OC1w@mail.gmail.com>
 <CAKPCsuga6oTFUQtjKm_iJ49uP61HYjH1CK_YVYQ+-BsLbRBKjw@mail.gmail.com>
 <CAKPCsuh1f0C__Aw_Sh5OMWXSZBCVUFaREYeRG6OegUK7xS1VFQ@mail.gmail.com>
 <CAEOrW48pYuph=2r9x90yZzi-fAZy58JE+aXnLVU3_TPwkQGYew@mail.gmail.com>
Message-ID: <CAKPCsui=dp5jr0xLgL5+YRgeyhXQgKsv9HC3gqif7=PZyW1KLg@mail.gmail.com>

Hi Adrin,
Thanks! Can we send the below text over to NumFOCUS for their next monthly
newsletter?
If you prefer me to send the note directly to Arliss, I can do that.

===
The Data Umbrella Africa & Middle East (AFME2) *scikit-learn* online sprint
was held on October 23, 2021, and the event report
<https://blog.dataumbrella.org/data-umbrella-afme2-2021-scikit-learn-sprint-report>
is now available. 40 participants joined from 17 countries, and 57% were
returning contributors. Check out the report for informative plots.
===


Reshama Shaikh
she/her
Blog <https://reshamas.github.io> | Twitter <https://twitter.com/reshamas>
| LinkedIn <https://www.linkedin.com/in/reshamas/> | GitHub
<https://github.com/reshamas>

Data Umbrella <https://www.dataumbrella.org>
NYC PyLadies
<https://meet.meetup.com/wf/click?upn=pEEcc35imY7Cq0tG1vyTt6zEs68RbcMfjPcajNHTKtn9NmwqQbJhe15mAZ1gz2La_s50GiGgQPBz9c9AKCDbbu2LRERFOLQHDZ3rAVGAkUEIFdmeKWgLQ1JD-2FBfVxXpI86J1oyur7RYRzToaqco1fWUx-2FWPOn-2FLCyCICxwu5bjlHJvtSvVekt71L43UiQL8dMjr0HfGP-2FSeiGQFG0QQxzS-2FX5o4Q8Ch-2BHrlA5hsa9VyPXC5FvBn1cNbkmil3SgwH7HWFmXsKFJ7RYrzZR0EwWFIMarRA8-2BTgd8yXJYlfxogk-3D>


On Mon, Nov 22, 2021 at 5:36 PM Adrin <adrin.jalali at gmail.com> wrote:

> Thanks Reshama,
>
> That's a really nice report!
>
> On Mon, Nov 22, 2021 at 12:01 PM Reshama Shaikh <reshama.stat at gmail.com>
> wrote:
>
>> Hello,
>> The report from the Data Umbrella Africa & Middle East sprint is here
>> [a].
>>
>> SUMMARY
>> - 40 people joined
>> - 17 countries represented
>> - 57% were returning contributors
>>
>> There are a lot of good plots in the report. This is one of the first
>> times I've examined attrition more closely, related to gender and country.
>>
>> Thanks to everyone on the Data Umbrella and scikit-learn teams for their
>> assistance in making this happen!
>>
>> [a]:
>> https://blog.dataumbrella.org/data-umbrella-afme2-2021-scikit-learn-sprint-report
>>
>> Best,
>> Reshama
>>
>>
>> On Mon, Oct 11, 2021 at 8:00 AM Reshama Shaikh <reshama.stat at gmail.com>
>> wrote:
>>
>>> Hello,
>>> At this time, we have a few spots open for the upcoming October 23
>>> online scikit-learn sprint organized by Data Umbrella.
>>>
>>> If you reside outside of the Africa and Middle East region, you are now
>>> able to apply.
>>> https://afme2021rc.dataumbrella.org/home
>>>
>>> Note 1:  we offer a stipend of $10 USD to cover the cost of internet
>>> access, and you can indicate such on your application.
>>>
>>> Note 2:  if you need a translator, please indicate so on your
>>> application.
>>>
>>> Key Notes:
>>> a)  There is a pre-sprint event on Saturday October 16 from 5-6pm EAT.
>>> This pre-sprint event is *optional* and an opportunity to answer any
>>> questions in general and aid in setting up your virtual environment.
>>>
>>> b)  Sprint is on *Saturday, October 23 at 5pm - 9pm EAT (East Africa
>>> Time)* on our Discord server.
>>>
>>> c)  There is a post-sprint event on Saturday November 23 from 5-6pm
>>> EAT.  This post-sprint event is *optional* and an opportunity to ask the
>>> core devs questions on open pull requests.
>>>
>>> d)  There are 3-4 hours of pre-work for the sprint. Here is the
>>> checklist:  https://afme2021rc.dataumbrella.org/about/prep-work
>>>
>>> Please feel free to send any questions to me off the mailing list.
>>>
>>> Best,
>>> Reshama
>>>
>>>
>>> On Sat, Sep 25, 2021 at 5:05 PM Reshama Shaikh <reshama.stat at gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> Data Umbrella is organizing a scikit-learn sprint for this October 23,
>>>> with a focus on **Africa and the Middle East**.  This event is free.
>>>>
>>>> A sprint is a 4-hour hands-on hackathon where we work on beginner
>>>> issues in the scikit-learn GitHub repository.  Participants will be paired
>>>> with another person.  There will be core contributors available to answer
>>>> any questions.
>>>>
>>>> Event website is:  https://afme2021rc.dataumbrella.org
>>>> We encourage folks to read the website and then complete the
>>>> application.
>>>>
>>>> The event can be shared in these ways:
>>>> - Retweet:  https://twitter.com/DataUmbrella/status/1435972074842034184
>>>> - Share post on LinkedIn:
>>>> https://www.linkedin.com/feed/update/urn:li:activity:6841738994305294336/
>>>>
>>>> Please feel free to contact me if you have any questions.
>>>>
>>>> Cheers,
>>>> Reshama Shaikh
>>>> she/her
>>>>
>>>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>

From g.lemaitre58 at gmail.com  Fri Dec 17 16:53:16 2021
From: g.lemaitre58 at gmail.com (=?utf-8?Q?Guillaume_Lema=C3=AEtre?=)
Date: Fri, 17 Dec 2021 22:53:16 +0100
Subject: [scikit-learn] scikit-learn office hours on Monday Dec. 20, 2021
Message-ID: <6DD3B04D-D6D9-471A-B9CD-3B0DCACEB8F4@gmail.com>

Hi all,

Some of us will be online on the scikit-learn discord next Monday at
10:00 PT / 13:00 ET / 18:00 UTC / 19:00 CET for about an hour or so.

First-time and occasional contributors are welcome to join us on
Discord using this invitation link:
https://discord.gg/N8dGHPpq

The focus of these office hour sessions is to answer questions about
contributing to scikit-learn. We can also split into breakout
audio/text channels and do pair programming or live reviewing of
forgotten pull requests with screen sharing.

We can also try to help you craft minimal reproduction cases
for bug reports to get a higher likelihood of resolution (e.g.
https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).

Please note, our Code of Conduct applies:
https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md

See you soon on discord!
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/


From reshama.stat at gmail.com  Tue Dec 21 11:34:36 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Tue, 21 Dec 2021 11:34:36 -0500
Subject: [scikit-learn] Community Office Hours
Message-ID: <CAKPCsuhe_rTx8kvqJkKsWUPToPftiPewmJtHuQK_UQZ5k=ZPwA@mail.gmail.com>

Hello,


*scikit-learn: Community Office Hours*

Beginning January 11, 2022, the scikit-learn team will be holding bi-weekly
(every two weeks) office hours on Mondays.

There is also a link to a public calendar which can be added manually:
https://calendar.google.com/calendar/u/0/embed?src=social.scikitlearn at gmail.com&ctz=America/New_York

DATE: biweekly (every two weeks on Mondays)
TIME: 10:00 PT / 13:00 ET / 18:00 UTC / 19:00 CET
DURATION: 1 hour
WHERE: Discord (https://discord.gg/N8dGHPpq)

ABOUT
First-time, occasional and regular contributors are welcome to join us on
Discord.  The focus of these office hour sessions is to answer questions
about contributing to scikit-learn. We can also split into breakout
audio/text channels and do pair programming or live reviewing of stalled
pull requests with screen sharing. We can also try to help you craft
minimal reproduction cases for bug reports to get a higher
likelihood of resolution (a). Please note, our Code of Conduct applies (b).

(a) https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
(b) https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md

Cheers,
Reshama
---
Reshama Shaikh
she/her
Blog <https://reshamas.github.io> | Twitter <https://twitter.com/reshamas>
| LinkedIn <https://www.linkedin.com/in/reshamas/> | GitHub
<https://github.com/reshamas>

Data Umbrella <https://www.dataumbrella.org>
NYC PyLadies
<https://meet.meetup.com/wf/click?upn=pEEcc35imY7Cq0tG1vyTt6zEs68RbcMfjPcajNHTKtn9NmwqQbJhe15mAZ1gz2La_s50GiGgQPBz9c9AKCDbbu2LRERFOLQHDZ3rAVGAkUEIFdmeKWgLQ1JD-2FBfVxXpI86J1oyur7RYRzToaqco1fWUx-2FWPOn-2FLCyCICxwu5bjlHJvtSvVekt71L43UiQL8dMjr0HfGP-2FSeiGQFG0QQxzS-2FX5o4Q8Ch-2BHrlA5hsa9VyPXC5FvBn1cNbkmil3SgwH7HWFmXsKFJ7RYrzZR0EwWFIMarRA8-2BTgd8yXJYlfxogk-3D>

From g.lemaitre58 at gmail.com  Sat Dec 25 15:34:02 2021
From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=)
Date: Sat, 25 Dec 2021 21:34:02 +0100
Subject: [scikit-learn] [ANN] scikit-learn 1.0.2 is online!
Message-ID: <CACDxx9hrf0MBs0oEao9-p1qARzaCwMM5Q2VfwF8+_pZ8HdSs5Q@mail.gmail.com>

scikit-learn 1.0.2 is out on pypi.org and conda-forge!

This is a small maintenance release that fixes a couple of regressions.
Binaries and wheels are available for Python 3.10.

https://scikit-learn.org/dev/whats_new/v1.0.html#version-1-0-2

You can upgrade with pip as usual:

pip install -U scikit-learn

The conda-forge builds will be available shortly, which you can then
install using:

conda install -c conda-forge scikit-learn

Thanks again to all the contributors.
On behalf of the scikit-learn maintainer team.
-- 
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

From mh.nwafu at gmail.com  Sun Dec 26 22:16:20 2021
From: mh.nwafu at gmail.com (Haylee Miller)
Date: Mon, 27 Dec 2021 11:16:20 +0800
Subject: [scikit-learn] Fwd: There is a problem with using "r2" to calculate
 cross_val_score and GridSearchCV scores
In-Reply-To: <CAAy0z8TW297DKyhe8bfawFX2nbOorzwTmZjPeboc-1350ZD2Gg@mail.gmail.com>
References: <CAAy0z8TW297DKyhe8bfawFX2nbOorzwTmZjPeboc-1350ZD2Gg@mail.gmail.com>
Message-ID: <CAAy0z8QDOWMpF4-0cRSNZW9rBJApQwtMRi9AW1oUa_PctUvH9g@mail.gmail.com>

I don't know whether my email was sent successfully last time, so I am
sending it again. I'm sorry to disturb you.

---------- Forwarded message ---------
From: Haylee Miller <mh.nwafu at gmail.com>
Date: Fri, 24 Dec 2021 21:17
Subject: There is a problem with using "r2" to calculate cross_val_score
and GridSearchCV scores
To: <scikit-learn at python.org>


Dear sklearn developers,

First of all, thank you for developing this module; it is very useful.
However, we recently found a small problem when using cross_val_score
and GridSearchCV.
Using scoring="r2" to compute the cross_val_score and GridSearchCV
scores gives results that are inconsistent with those computed using metrics.r2_score.
[image: 5.png]
Following the principle of k-fold cross-validation, we performed a manual
3-fold cross-validation, and there was a large gap between its scores and the
results of cross_val_score.
Below are the code and results of our manual verification process.
[image: 1.png]
[image: 2.png]

[image: 3.png][image: 4.png]

Theoretically, the three values in results 1-3 should be similar to the
three values in cross_val_score 1 and cross_val_score 2.
However, only the first value in cross_val_score 1 and cross_val_score 2 is
close to results 1-3 in the figures.
Why is this so? Looking forward to your reply.
Finally, Merry Christmas!

Best wishes,
Ma Hui
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1.png
Type: image/png
Size: 13921 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20211227/85bec9cc/attachment-0005.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2.png
Type: image/png
Size: 12312 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20211227/85bec9cc/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3.png
Type: image/png
Size: 11661 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20211227/85bec9cc/attachment-0007.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 4.png
Type: image/png
Size: 21220 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20211227/85bec9cc/attachment-0008.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 5.png
Type: image/png
Size: 31637 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20211227/85bec9cc/attachment-0009.png>

From g.lemaitre58 at gmail.com  Mon Dec 27 03:31:59 2021
From: g.lemaitre58 at gmail.com (g.lemaitre58 at gmail.com)
Date: Mon, 27 Dec 2021 09:31:59 +0100
Subject: [scikit-learn] Fwd: There is a problem with using "r2" to
 calculate cross_val_score and GridSearchCV scores
In-Reply-To: <CAAy0z8QDOWMpF4-0cRSNZW9rBJApQwtMRi9AW1oUa_PctUvH9g@mail.gmail.com>
References: <CAAy0z8QDOWMpF4-0cRSNZW9rBJApQwtMRi9AW1oUa_PctUvH9g@mail.gmail.com>
Message-ID: <68943D06-8BC3-476F-B505-001931FAC7F6@gmail.com>

I am not surprised to observe these variations. First, the OOB score tends to overestimate the statistical performance of the model. Then, the last score is an evaluation without cross-validation: you trained a single model on the full x1 and tested it on x3. With cross-validation, you evaluate 3 models, each trained on a different subset of x1, so each model sees less data, and I would expect the last score to potentially be better.

The remaining variation is across the folds in the first score. This can happen when you use a splitter (e.g. KFold) that does not shuffle the data and there is some structure in the order of the data. Shuffling the data would most probably break this structure and make the folds easier to predict, removing this variation.

What is important, however, is to know whether this structure is expected to exist or not. If it is, then the data should not be shuffled and the original estimate is the one you should look at. An example of such wrong shuffling would be shuffling a time series: shuffling breaks the ordering, whereas you certainly want to split the data in a way that respects this time structure.
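
To make this concrete, here is a minimal sketch on synthetic data. make_regression and Ridge are stand-ins chosen for illustration, since the original code only exists in the scrubbed images. It assumes the usual scikit-learn behaviour that an integer cv on a regressor means an unshuffled KFold: a hand-written 3-fold loop with metrics.r2_score then reproduces cross_val_score(..., scoring="r2", cv=3) exactly, while shuffled or time-ordered splitters give different per-fold numbers.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

# Synthetic data standing in for the x1/y arrays from the question.
X, y = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=0)
model = Ridge(alpha=1.0)

# For a regressor, cv=3 means KFold(n_splits=3) WITHOUT shuffling:
# each test fold is a contiguous block of rows.
cv_scores = cross_val_score(model, X, y, scoring="r2", cv=3)

# The same computation done by hand with metrics.r2_score.
manual_scores = []
for train_idx, test_idx in KFold(n_splits=3).split(X):
    fold_model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    manual_scores.append(r2_score(y[test_idx], fold_model.predict(X[test_idx])))
manual_scores = np.array(manual_scores)

# Shuffled folds: only appropriate if the row order carries no structure.
shuffled_scores = cross_val_score(
    model, X, y, scoring="r2", cv=KFold(n_splits=3, shuffle=True, random_state=0)
)

# Ordered data (e.g. a time series): always train on the past, test on the future.
ts_scores = cross_val_score(model, X, y, scoring="r2", cv=TimeSeriesSplit(n_splits=3))

# Same splits and same metric, so the two arrays should agree.
print(np.allclose(cv_scores, manual_scores))
print(shuffled_scores, ts_scores)
```

The same reasoning applies to GridSearchCV, which scores each candidate on the folds produced by its cv argument.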

