Hi all,

https://github.com/scipy/scipy/pull/8306 adds Pythran support to scipy.signal.max_len_seq. It's optional, so one can set an env var to keep using the Cython version instead. I plan to merge that soon, now that we're at the beginning of a release cycle, so we can get some real feedback. Then if all goes well we can decide to keep it and expand its usage, and if there are showstoppers we can revert before the 1.7.x branch split.

The first time this was proposed was already 3 years ago: https://mail.python.org/pipermail/scipy-dev/2018-January/022325.html. As a reminder, Pythran is an ahead-of-time compiler: it takes pure Python code with some type comments, generates C++ from it, and that then gets compiled the regular way. Its advantages over Cython are:

- pure Python rather than a separate language, so easier to use
- generated source code ~100x shorter
- generated shared libraries ~10x smaller
- on average faster

The main concern I think is Pythran's maturity (which could be okay; getting some data will be nice) and that it has a bus factor of one (which isn't that different from Cython).

All CI currently passes, including on Windows, aarch64, and ppc64le. It's not unlikely that we'll find some other hiccup (e.g. AIX, PyPy), but so far it looks pretty good, so it'd be nice to get some real-world experience with it.

And to preempt the obvious question: no, we don't need to compare with Numba. That situation didn't change from the last time we discussed it; Numba is a heavy and fragile runtime dependency, and supporting libraries like SciPy isn't Numba's core focus. I also checked in with Stan Seibert (Numba core dev) recently, and he agreed with that assessment.

Please have a look at the PR and comment on it or here if there's something concerning.

Cheers, Ralf
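[Editor's note: for readers unfamiliar with Pythran's model, here is a minimal hypothetical sketch (not the max_len_seq code from the PR; the function name and signature are made up). The function body is plain Python/NumPy, and a `# pythran export` comment supplies the type signature that the ahead-of-time compiler uses to generate C++. Without Pythran installed, the module still runs as ordinary Python.]

```python
import numpy as np

# pythran export clip_norm(float64[], float64)
def clip_norm(x, max_norm):
    """Scale x so its Euclidean norm does not exceed max_norm.

    The comment above is all Pythran needs: running `pythran` on this
    file produces a compiled extension module. Without Pythran, this
    is just regular Python and works unchanged.
    """
    norm = np.sqrt(np.sum(x * x))
    if norm > max_norm:
        return x * (max_norm / norm)
    return x
```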
Hi,

I'm probably a bit more cautious re: "maintenance burden" but don't see an issue with experimentation and gradual adoption if the team is in favor (I haven't seen any substantial objections).

It is fair to note that "pure Python" is perhaps a little optimistic--I believe there are at least a few restrictions here, and perhaps a few more if you take mixed Python/NumPy code and expect full Pythran benefits with no modifications. We did notice in an earlier iteration of that PR that producing identical behavior for algorithms that depend on the details of random number generation under the hood in NumPy also requires some caution beyond merely decorating the code for transpiling.

Best wishes, Tyler

On Sat, 26 Dec 2020 at 14:32, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Sun, Dec 27, 2020 at 04:46:36PM -0700, Tyler Reddy wrote:
Hi,
Hi, Pythran maintainer here. Your points make sense; let me second them with a few more technical details.
I'm probably a bit more cautious re: "maintenance burden" but don't see an issue with experimentation and gradual adoption if the team is in favor (I haven't seen any substantial objections). It is perhaps fair to note that "pure Python" is perhaps a little optimistic--I believe there are at least a few restrictions here and perhaps a few more if you take mixed Python/NumPy code and expect full Pythran benefits with no modifications.

That's correct. A good example: it's sometimes more efficient to write a loop explicitly than to use the equivalent high-level NumPy operation. Fortunately that's not always the case, but there is still room for improvement there. Stated differently, even though it's pure Python, one may be tempted to optimize the code for transpiling.

We did notice in an earlier iteration of that PR that producing identical behavior for algorithms that depend on the details of random number generation under the hood in NumPy also requires some caution beyond merely decorating the code for transpiling.

Pythran strives for 100% reproducibility of observable behavior. Until now we've been using a different PRNG than NumPy's (for performance reasons, mostly). But we could use the same one; that would also solve a CI difficulty we have when testing the numpy.random and random packages.
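[Editor's note: to illustrate the loop-versus-vectorized point above with a hypothetical example (not code from the PR), the two functions below compute the same quantity. The high-level NumPy form allocates temporary arrays for `x[1:] - x[:-1]` and the absolute values, while the explicit loop gives a transpiler like Pythran a single pass it can turn into one tight C++ loop.]

```python
import numpy as np

def total_variation_numpy(x):
    # High-level form: concise, but builds two temporary arrays.
    return np.abs(x[1:] - x[:-1]).sum()

def total_variation_loop(x):
    # Loop form: no temporaries. After transpiling, this can become a
    # single C++ loop, which is sometimes the faster variant.
    total = 0.0
    for i in range(1, len(x)):
        total += abs(x[i] - x[i - 1])
    return total
```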
On Mon, Dec 28, 2020 at 8:10 AM Serge Guelton < serge.guelton@telecom-bretagne.eu> wrote:
Pythran strives for 100% reproducibility of observable behavior. Until now we've been using a different PRNG than NumPy's (for performance reasons, mostly). But we could use the same one; that would also solve a CI difficulty we have when testing the numpy.random and random packages.
Identical results would be nice, but in practice I'm not sure how high a priority that is for SciPy. It only matters for functions that take a seed or a generator instance, and we don't have all that many of those that would be candidates for a Pythran rewrite. Also, NumPy now has multiple generators, so it's not as if you can implement just one and cover all potential use cases.

Cheers, Ralf
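[Editor's note: the "multiple generators" point can be seen directly. With the same seed, NumPy's legacy RandomState interface and the newer Generator interface produce different streams, so there is no single "NumPy PRNG" for Pythran to match.]

```python
import numpy as np

# Same seed, two different NumPy interfaces: the legacy RandomState
# (MT19937-based) and the newer Generator (PCG64 by default) yield
# entirely different streams.
legacy = np.random.RandomState(12345)
modern = np.random.default_rng(12345)

a = legacy.random_sample(3)
b = modern.random(3)
# a and b differ, even though both were seeded with 12345.
```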
On Sat, Dec 26, 2020 at 3:33 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
And to preempt the obvious question: no we don't need to compare with Numba. That situation didn't change from last time we discussed it; Numba is a heavy and fragile runtime dependency, and supporting libraries like SciPy isn't Numba's core focus. I also checked in with Stan Seibert (Numba core dev) recently, and he agreed with that assessment.
Just to jump in here, I would say that supporting SciPy, specifically, isn't Numba's *current* core focus. As one of the most core PyData libraries (second only to NumPy), we agree that one needs to be very conservative about introducing new code and new ways of doing things to SciPy, and Numba's design approach makes it not a direct drop-in for Cython use cases. Pythran fits more naturally into a Cython usage pattern, which is beneficial here. Clearly Numba would need to have more robust ahead-of-time compilation support to be usable in a library like SciPy, and that is still on the back burner while we think about various issues.

However, other libraries that do numerical computing like SciPy (but do not have the constraints of SciPy) *are* Numba's focus. I just wanted to make sure there was no confusion about this. :)

As a meta comment, this PR is basically implementing a sort of dependency injection for the SciPy internals, to allow a different compiler system to be swapped in to compile a specific internal function. Where this could be generalized is very interesting, and relevant if there is a future where a number of SciPy functions could be compiled by one of two compilers. For example, if the type information embedded in a comment for Pythran here: https://github.com/scipy/scipy/pull/8306/files#diff-6e0de4105e10b6c609d5d186... were available at runtime and/or in a more readily parsable form, that would be part of opening up SciPy to more compiler tools. I'm not sure if a best practice has emerged for writing Python type annotations with NumPy types, though.

(None of these questions should hold up this PR, which I have no opinion about as I'm not a SciPy maintainer. :) )
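[Editor's note: a toy sketch of that dependency-injection idea, for illustration only. SciPy's actual selection happens at build time, and the function names below are made up; the point is simply that one public name can be bound to whichever compiled backend was chosen.]

```python
import os

def _sum_squares_cython(n):
    # Stand-in for a Cython-compiled extension function.
    return sum(i * i for i in range(n))

def _sum_squares_pythran(n):
    # Stand-in for a Pythran-compiled extension function.
    return sum(i * i for i in range(n))

# "Inject" one backend behind a single public name, mimicking how a
# flag such as SCIPY_USE_PYTHRAN selects the implementation.
if os.environ.get("SCIPY_USE_PYTHRAN", "1") == "0":
    sum_squares = _sum_squares_cython
else:
    sum_squares = _sum_squares_pythran
```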
On Mon, Dec 28, 2020 at 4:02 PM Stanley Seibert <sseibert@anaconda.com> wrote:
Just to jump in here, I would say that supporting SciPy, specifically, isn't Numba's *current* core focus. As one of the most core PyData libraries (second only to NumPy), we agree that one needs to be very conservative about introducing new code and new ways of doing things to SciPy, and Numba's design approach makes it not a direct drop-in for Cython use cases. Pythran fits more naturally into a Cython usage pattern, which is beneficial here. Clearly Numba would need to have more robust ahead-of-time compilation support to be usable in a library like SciPy, and that is still on the back burner while we think about various issues.
However, other libraries that do numerical computing like SciPy (but do not have the constraints of SciPy) *are* Numba's focus. I just wanted to make sure there was no confusion about this. :)
Thanks Stan, that's all helpful context!
As a meta comment, this PR is basically implementing a sort of dependency injection for the SciPy internals, to allow a different compiler system to be swapped in to compile a specific internal function.
If you mean the conditional compile based on a SCIPY_USE_PYTHRAN environment variable, we don't want to keep that part long-term. It's just an escape hatch during the introduction period in case of issues, and we had the Cython code already so it was easy to do. Maintaining two implementations in parallel is usually a bad idea.

Where this could be generalized is very interesting, and relevant if there
is a future where a number of SciPy functions could be compiled by one of two compilers. For example, if the type information embedded in a comment for Pythran here:
https://github.com/scipy/scipy/pull/8306/files#diff-6e0de4105e10b6c609d5d186...
were available at runtime and/or in a more readily parsable form, that would be part of opening up SciPy to more compiler tools. I'm not sure if a best practice has emerged for writing Python type annotations with NumPy types, though.
Yes, that is a great point. NumPy 1.20 will be the first release to include type annotations. Annotating ndarray properties like shape and dtype is complicated though and still a work in progress - initial support landed very recently in https://github.com/numpy/numpy/pull/17719. Also note that Transonic aims to use type annotations and then allow using Cython, Pythran and Numba as backends: https://fluiddyn.netlify.app/transonic-vision.html.

Cheers, Ralf
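[Editor's note: for context on what NumPy-aware annotations look like, here is a hypothetical sketch using `numpy.typing`, which matured after this thread (`NDArray` arrived in NumPy 1.21). The annotation can express the dtype, but expressing the shape is still an open problem.]

```python
import numpy as np
import numpy.typing as npt

def rescale(x: npt.NDArray[np.float64], factor: float) -> npt.NDArray[np.float64]:
    # The annotation captures the dtype (float64) but not the shape;
    # a compiler could in principle read it instead of a type comment.
    return x * factor
```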
participants (4)
- Ralf Gommers
- Serge Guelton
- Stanley Seibert
- Tyler Reddy