Questions on the pypy+numpy project

Hi all. Hi Fijal, you tweeted in response to my https://twitter.com/#!/ianozsvald/status/124898441087299584 the other day. I met Travis Oliphant on Friday at the Enthought Cambridge office opening. Didrik Pinte and I mentioned that we'd offered £600 each towards pypy+numpy integration. Travis had a few thoughts on the matter and this has left me in the position of not being sure of the full costs and benefits of the pypy+numpy project. The main position (held by Travis and several others - and 'this is as best as I remember it and I'm open to correction') was that porting just numpy could leave much of the scipy ecosystem separated from pypy as the numpy port wouldn't have the same (and maybe I'm getting details mixed up here) API so couldn't be compiled easily. I've bcc'd Travis and Didrik, maybe someone else can come and clear the position (and correct my inevitable errors). I use numpy and parts of scipy and haven't looked into pypy's specifics so I'm far too ignorant on the whole subject. I'd like to pose some questions: * how big is the scipy ecosystem beyond numpy? What's the rough line count for Python, C, Fortran etc that depends on numpy? * can these other libraries *easily* be compiled against a pypy+numpy port (and if not, why not?) * are there other routes for numpy work (e.g. refactoring the core numpy libs and separating the C-dependent interface away, opening the door to a side-by-side pypy interface) that might benefit both the CPython and pypy communities? I apologise for the above being rather vague, I'm hoping some of you can help clarify the pros and cons of whatever options are available. Cheers, Ian. -- Ian Ozsvald (A.I. researcher) ian@IanOzsvald.com http://IanOzsvald.com http://MorConsulting.com/ http://StrongSteam.com/ http://SocialTiesApp.com/ http://TheScreencastingHandbook.com http://FivePoundApp.com/ http://twitter.com/IanOzsvald

Hi Ian, On Sun, Oct 16, 2011 at 10:20 PM, Ian Ozsvald <ian@ianozsvald.com> wrote:
I'd like to pose some questions:

* how big is the scipy ecosystem beyond numpy? What's the rough line count for Python, C, Fortran etc. that depends on numpy?
The ecosystem is pretty big. There are at least on the order of a hundred packages that depend directly on numpy and scipy. For scipy alone, the raw count is around 150k-300k LOC (it is a bit hard to estimate because we include some swig-generated code that I have ignored here, and some code duplication to deal with distutils insanity). There is around 80k LOC of fortran alone in there.

More and more scientific code uses cython for speed or just for interfacing with C (and recently C++). Other tools have been used for similar reasons (f2py, in particular, to automatically wrap fortran and C). f2py at least is quite tightly coupled to the numpy C API. I know there is work on a pypy-friendly backend for cython, but I don't know where things stand there. I would like to see less C boilerplate code in scipy and more cython usage (which generates faster code and is much more maintainable); this can also benefit pypy, if only by making the scipy code less dependent on CPython details.

One thing I have little doubt about is that pypy needs a "story" that makes wrapping of fortran/c/c++ libraries easy, because otherwise few people in the scientific community will be interested. For better or worse, there are tens of millions of lines of code written in those languages, and a lot of them are domain specific (you will not write a decent FFT code without knowing a lot about its implementation details; the same goes for large eigenvalue problems). There need to be automatic wrapper generators. Scipy alone easily wraps a thousand functions written in fortran, if not more.

cheers, David

One thing I have little doubt about is that pypy needs a "story" that makes wrapping of fortran/c/c++ libraries easy, because otherwise few people in the scientific community will be interested. For better or worse, there are tens of millions of lines of code written in those languages, and a lot of them are domain specific (you will not write a decent FFT code without knowing a lot about its implementation details; the same goes for large eigenvalue problems). There need to be automatic wrapper generators. Scipy alone easily wraps a thousand functions written in fortran, if not more.
Yes, we're well aware of that, and we don't plan to rewrite this existing codebase. As of now you can relatively easily call C/fortran from RPython and compile it together with PyPy. While PyPy does not have a C API, arrays are still "raw memory", which means you can pass pointers to the underlying C libraries. We don't have (yet?) automatic binding generation, but that is for later.

The main thing is that we want to provide something immediately useful: a numpy which maybe does not integrate (yet) with the entire ecosystem, but is much faster on both array computations and pure python iterations, ufuncs etc. This would be very useful since you don't need to use Cython or any other such tool to provide working code, and it already caters for some group of people.

Cheers, fijal
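To make the "raw memory" point concrete, here is a minimal ctypes sketch (ordinary Python, not RPython, and not PyPy-specific) showing how a contiguous buffer's pointer can be handed straight to an existing C routine - libc's qsort in this case:

```python
import ctypes
import ctypes.util

# Load the C library (assumes a Unix-like system where find_library works).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# A contiguous buffer of doubles -- conceptually what an array's
# "raw memory" is; we pass its pointer straight into C.
data = (ctypes.c_double * 5)(3.0, 1.0, 4.0, 1.0, 5.0)

# qsort wants: int (*compar)(const void *, const void *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_double),
                           ctypes.POINTER(ctypes.c_double))

def compare(pa, pb):
    # Dereference the element pointers and return -1/0/1.
    return (pa[0] > pb[0]) - (pa[0] < pb[0])

# Plain C code sorts the buffer in place -- no CPython-style C API involved.
libc.qsort(data, len(data), ctypes.sizeof(ctypes.c_double), CMPFUNC(compare))
print(list(data))  # [1.0, 1.0, 3.0, 4.0, 5.0]
```

The same idea - a buffer whose address a C or fortran library can consume directly - is what allows interoperation with native code even without a CPython-compatible C API.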

The main thing is that we want to provide something immediately useful. That is a numpy which maybe does not integrate (yet) with the entire ecosystem, but is much faster on both array computations and pure python iterations, ufuncs etc.
from a motivational perspective this is also quite important: having numpypy running at great speeds will draw the brain capital needed to get that wrapper stuff programmed. Harald -- GHUM GmbH Harald Armin Massa Spielberger Straße 49 70435 Stuttgart 0173/9409607 Amtsgericht Stuttgart, HRB 734971 - persuadere. et programmare

On Mon, Oct 17, 2011 at 9:31 AM, Massa, Harald Armin <chef@ghum.de> wrote:
The main thing is that we want to provide something immediately useful. That is a numpy which maybe does not integrate (yet) with the entire ecosystem, but is much faster on both array computations and pure python iterations, ufuncs etc.
from a motivational perspective this is also quite important: having numpypy running at great speeds will draw the brain capital needed to get that wrapper stuff programmed.
Yes, that's our secret plan (not any more) :) Cheers, fijal

This would be very useful since you don't need to use Cython or any other things like this to provide working code and it already caters for some group of people.
Hi Fijal. This would be useful for a demo - but will it be useful for the userbase that becomes motivated to integrate Cython and SciPy? If it isn't useful to the wider community (which is the point I've made after David's email), then aren't we creating a (potentially) dead-end project rather than one that opens the doors to increased collaboration between the communities?

Perhaps I should ask a wider question: if the pypy-numpy project only supports the core features of numpy and not the API (so excluding Cython/SciPy etc. for now), what's the roadmap that lets people integrate SciPy's C/Fortran code in a maintainable way? I.e. how is the door opened for community members to introduce SciPy compatibility? Some idea of the complexity of the task would be very useful, preferably with input from people involved with CPython's numpy/scipy internals.

i.

On Mon, Oct 17, 2011 at 8:29 AM, Ian Ozsvald <ian@ianozsvald.com> wrote:
This would be very useful since you don't need to use Cython or any other things like this to provide working code and it already caters for some group of people.
Hi Fijal. This would be useful for a demo - but will it be useful for the userbase that becomes motivated to integrate Cython and SciPy?
If it isn't useful to the wider community (which is the point I've made after David's email) then aren't we creating a (potentially) dead-end project rather than one that opens the doors to increased collaboration between the communities?
Perhaps I should ask a wider question: If the pypy-numpy project only supports the core features of numpy and not the API (so excluding Cython/SciPy etc for now), what's the roadmap that lets people integrate SciPy's C/Fortran code in a maintainable way? I.e. how is the door opened to community members to introduce SciPy compatibility? Some idea of the complexity of the task would be very useful, preferably with input from people involved with CPython's numpy/scipy internals.
i.
_______________________________________________ pypy-dev mailing list pypy-dev@python.org http://mail.python.org/mailman/listinfo/pypy-dev
Let me ask the opposite question: what route do you envision that gives us both the speed we (and everyone else) desire, while not having any of these issues? That's not a question with a very good answer, I think.

We could bring CPyExt up to a point where NumPy's C code runs, but that would be slow and a royal pain in the ass. What's the other alternative? We could port all of NumPy to be pure Python, plus wrapping code only for existing C/Fortran libraries. To be honest, that sounds very swell to me - would the core-numpy people go for that? Of course not, because it would be heinously slow on CPython, which I presume is unacceptable.

So where does that leave us? Neither of the current platforms seems acceptable. What's the way forward where we're all in this together? I'm having trouble seeing that (especially if, as Travis's post indicates, backwards compatibility within NumPy means that none of the C APIs can be removed).

Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero

On 17 October 2011 13:35, Alex Gaynor <alex.gaynor@gmail.com> wrote:
On Mon, Oct 17, 2011 at 8:29 AM, Ian Ozsvald <ian@ianozsvald.com> wrote:
This would be very useful since you don't need to use Cython or any other things like this to provide working code and it already caters for some group of people.
Hi Fijal. This would be useful for a demo - but will it be useful for the userbase that becomes motivated to integrate Cython and SciPy?
If it isn't useful to the wider community (which is the point I've made after David's email) then aren't we creating a (potentially) dead-end project rather than one that opens the doors to increased collaboration between the communities?
Perhaps I should ask a wider question: If the pypy-numpy project only supports the core features of numpy and not the API (so excluding Cython/SciPy etc for now), what's the roadmap that lets people integrate SciPy's C/Fortran code in a maintainable way? I.e. how is the door opened to community members to introduce SciPy compatibility? Some idea of the complexity of the task would be very useful, preferably with input from people involved with CPython's numpy/scipy internals.
i.
Let me ask the opposite question: what route do you envision that gives us both the speed we (and everyone else) desire, while not having any of these issues? That's not a question with a very good answer, I think. We could bring CPyExt up to a point where NumPy's C code runs, but that would be slow and a royal pain in the ass. What's the other alternative? We could port all of NumPy to be pure Python, plus wrapping code only for existing C/Fortran libraries. To be honest, that sounds very swell to me - would the core-numpy people go for that? Of course not, because it would be heinously slow on CPython, which I presume is unacceptable. So where does that leave us? Neither of the current platforms seems acceptable. What's the way forward where we're all in this together? I'm having trouble seeing that (especially if, as Travis's post indicates, backwards compatibility within NumPy means that none of the C APIs can be removed).
Travis' post seems to suggest that it is the responsibility of the *pypy* dev team to do the work necessary to integrate the numpy refactor (initially sponsored by Microsoft). That refactoring (a smaller numpy core) seems like a great way forward for numpy - particularly if *it* wants to play well with multiple implementations - but it is unreasonable to expect the pypy team to pick that up!

For pypy I can't see any better approach than the way they have taken. Once people are using numpy on pypy the limitations and missing parts will become clear, and not only will the way forward be more obvious but there will be more people involved to do the work.

It seems odd to argue that extending numpy to pypy will be a net negative for the community! Sure there are some difficulties involved, just as there are difficulties with having multiple implementations in the first place, but the benefits are much greater.

All the best, Michael Foord
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

For pypy I can't see any better approach than the way they have taken. Once people are using numpy on pypy the limitations and missing parts will become clear, and not only will the way forward be more obvious but there will be more people involved to do the work.
Michael - I agree that the PyPy community shouldn't do all the legwork! I agree also that the proposed path may spur more work (and maybe that's the best goal for now).

I've gone back to the donations page: http://pypy.org/numpydonate.html to re-read the spec. What I get now (but didn't get before the discussion at Enthought Cambridge) is that "we don't plan to implement NumPy's C API" is a big deal (and not taking it on is entirely reasonable for this project!).

In my mind (and maybe in the mind of some others who use scipy?) a base pypy+numpy project would easily open the door to matplotlib and all the other scipy goodies; it looks now like that isn't the case. Hence my questions to try to understand what else might be involved.

i.

On 17 October 2011 16:42, Ian Ozsvald <ian@ianozsvald.com> wrote:
For pypy I can't see any better approach than the way they have taken. Once people are using numpy on pypy the limitations and missing parts will become clear, and not only will the way forward be more obvious but there will be more people involved to do the work.
Michael - I agree that the PyPy community shouldn't do all the legwork! I agree also that the proposed path may spur more work (and maybe that's the best goal for now).
I've gone back to the donations page: http://pypy.org/numpydonate.html to re-read the spec. What I get now (but didn't get before the discussion at Enthought Cambridge) is that "we don't plan to implement NumPy's C API" is a big deal (and not taking it on is entirely reasonable for this project!).
In my mind (and maybe in the mind of some others who use scipy?) a base pypy+numpy project would easily open the door to matplotlib and all the other scipy goodies, it looks now like that isn't the case. Hence my questions to try to understand what else might be involved.
Well, I think it definitely "opens the door" - certainly a lot more than not doing the work! You have to start somewhere. It seems like other projects (like the pypy cython backend) will help make other parts of the project easier down the line.

Back to Alex's question: how else would you *suggest* starting? Isn't a core port of the central parts the obvious way to begin?

Given the architecture of numpy it does seem that it opens up a whole bunch of questions around numpy on multiple implementations. Certainly pypy should be involved in the discussion here, but I don't think it is up to pypy to find (or implement) the answers...

All the best, Michael

Monday 17 October 2011 you wrote:
On 17 October 2011 16:42, Ian Ozsvald <ian@ianozsvald.com> wrote:
For pypy I can't see any better approach than the way they have taken. Once people are using numpy on pypy the limitations and missing parts will become clear, and not only will the way forward be more obvious but there will be more people involved to do the work.
Michael - I agree that the PyPy community shouldn't do all the legwork! I agree also that the proposed path may spur more work (and maybe that's the best goal for now).
I've gone back to the donations page: http://pypy.org/numpydonate.html to re-read the spec. What I get now (but didn't get before the discussion at Enthought Cambridge) is that "we don't plan to implement NumPy's C API" is a big deal (and not taking it on is entirely reasonable for this project!).
In my mind (and maybe in the mind of some others who use scipy?) a base pypy+numpy project would easily open the door to matplotlib and all the other scipy goodies, it looks now like that isn't the case. Hence my questions to try to understand what else might be involved.
Well, I think it definitely "opens the door" - certainly a lot more than not doing the work! You have to start somewhere.
It seems like other projects (like the pypy cython backend) will help make other parts of the project easier down the line.
Back to Alex's question, how else would you *suggest* starting? Isn't a core port of the central parts the obvious way to begin?
Given the architecture of numpy it does seem that it opens up a whole bunch of questions around numpy on multiple implementations. Certainly pypy should be involved in the discussion here, but I don't think it is up to pypy to find (or implement) the answers...
I'd just like to note that the compelling reason for PyPy to develop numpy support is popular demand. We did a survey last spring in which an overwhelming number of people asked for numpy support. This indicates that there is a large group of people who will reap benefits from using PyPy plus Numpy, without specific support for scipy packages. Some of them may want to port their favourite scipy packages to work with PyPy. If the PyPy community decided that it was more important to keep the integrity of the Numpy community, we would hold these people back in order to prevent a fragmentation. I think we have to accept that some people have needs that are better served with what PyPy can provide today, while others will have to wait for theirs to be dealt with. This is the natural succession of technologies.

From my perspective, PyPy based scientific computing makes sense. You get to write more of your code in a high level language, saving implementation time. If you agree with this, then the most sensible thing is to help make the transition as smooth as possible. Making it easy to integrate modules from FORTRAN, C++ and whatnot is part of such a task. Exactly what to do should be demand driven, and that is why PyPy doesn't have a plan or a timetable for these things. Like in all technology shifts, some of the old stuff will be easy to bring along, some will be hard and will be done anyway. The rest will fall by the wayside.

If you believe the researchers are better served writing more code in low level languages and dealing with the issues of integrating their low level stuff with Python (in order not to have to modify existing packages), then you will probably hope that PyPy is a fad that will die. In the very short term you are probably right: it will be quicker to wade across a river than to build a bridge.

Jacob

Jacob Hallén, 18.10.2011 18:41:
I'd just like to note that the compelling reason for PyPy to develop numpy support is popular demand. We did a survey last spring in which an overwhelming number of people asked for numpy support. This indicates that there is a large group of people who will reap benefits from using PyPy plus Numpy, without specific support for scipy packages.
Depends on what the question was. Many people say "NumPy", and when you ask back, you find out that they actually meant "SciPy" or at least "NumPy and parts x, y and z of its ecosystem that I commonly use, oh, and I forgot about abc as well, and ...". NumPy itself is just the most visible pile in a fairly vast landscape.
From my perspective, PyPy based scientific computing makes sense. You get to write more of your code in a high level language, saving implementation time. If you agree with this, then the most sensible thing is to help make the transition as smooth as possible. Making it easy to integrate modules from FORTRAN, C++ and whatnot is part of such a task. Exactly what to do should be demand driven, and that is why PyPy doesn't have a plan or a timetable for these things. Like in all technology shifts, some of the old stuff will be easy to bring along, some will be hard and will be done anyway. The rest will fall to the wayside.
I don't think anyone here speaks against PyPy integrating with the scientific world and all of its existing achievements in terms of code. However, *that* is the right direction: it's PyPy that needs to integrate. Integration with (C)Python is already there, for tons of tools and in manifold ways. Suggesting that people throw that away, that they restart from scratch and maintain a separate set of integration code in parallel, just to use yet another Python implementation, is asking for a huge waste of time.
If you believe the researchers are better served writing more code in low level languages and dealing with the issues of integrating their low level stuff with Python (in order to not to have to modify existing packages), then you will probably hope that PyPy is a fad that will die. In the very short term you are probably right. In the very short term it will be quicker to wade across a river rather than build a bridge.
Sorry if I'm getting you wrong, but that smells a bit too much like a "PyPy will generate the fastest code on earth, so you won't need anything else" ad, which is not supported by any facts I know of. I have yet to see PyPy compete with FFTW, just as an example.

Researchers *are* better served by integrating their own and other people's "low-level stuff" than by writing it all over again. They want to do research, not programming. And there will always be points where they resort to a low-level language, be it because of specific performance requirements or because they need to integrate it with something other than Python, be it in the form of PyPy or CPython. Remember that C is many times more ubiquitous than the entire set of Python implementations taken together. That won't change.

Stefan

On Mon, Oct 17, 2011 at 2:22 PM, Michael Foord <fuzzyman@gmail.com> wrote:
Travis' post seems to suggest that it is the responsibility of the *pypy* dev team to do the work necessary to integrate the numpy refactor (initially sponsored by Microsoft). That refactoring (smaller numpy core) seems like a great way forward for numpy - particularly if *it* wants to play well with multiple implementations, but it is unreasonable to expect the pypy team to pick that up!
I am pretty sure Travis did not intend to suggest that (I did not understand that from his wording, but maybe that's because we have had this discussion in person several times already). There are a lot of reasons to do that refactor that have nothing to do with pypy, so the idea is more: let's talk about what pypy would need to make this refactor beneficial for pypy *as well*. I (and others) have advocated using more cython inside numpy and scipy. We could share resources to do that.
It seems odd to argue that extending numpy to pypy will be a net negative for the community! Sure there are some difficulties involved, just as there are difficulties with having multiple implementations in the first place, but the benefits are much greater.
The net negative would be the community split, with numpy losing some resources to numpy on pypy. This seems like a plausible situation. Without a C numpy API, you can't have scipy or matplotlib, no scikit-learn, etc... But you could hide most of it behind cython, which has momentum in the scientific community. Then a realistic approach becomes:

- make the cython+pypy backend a reality
- ideally, make wrapping fortran with cython a reality
- convert as much as possible from the python C API to cython

People of all levels can participate. The first point in particular could help pypy beyond just the scipy community. And that's a plan where both parties would benefit from each other.

cheers, David

On Mon, Oct 17, 2011 at 1:20 PM, David Cournapeau <cournape@gmail.com>wrote:
On Mon, Oct 17, 2011 at 2:22 PM, Michael Foord <fuzzyman@gmail.com> wrote:
Travis' post seems to suggest that it is the responsibility of the *pypy* dev team to do the work necessary to integrate the numpy refactor (initially sponsored by Microsoft). That refactoring (smaller numpy core) seems like a great way forward for numpy - particularly if *it* wants to play well with multiple implementations, but it is unreasonable to expect the pypy team to pick that up!
I am pretty sure Travis did not intend to suggest that (I did not understand that from his wording, but maybe that's because we have had this discussion in person several times already).
There are a lot of reasons to do that refactor that have nothing to do with pypy, so the idea is more: let's talk about what pypy would need to make this refactor beneficial for pypy *as well*. I (and others) have advocated using more cython inside numpy and scipy. We could share resources to do that.
It seems odd to argue that extending numpy to pypy will be a net negative for the community! Sure there are some difficulties involved, just as there are difficulties with having multiple implementations in the first place, but the benefits are much greater.
The net negative would be the community split, with numpy losing some resources taken by numpy on pypy. This seems like a plausible situation.
Without a C numpy API, you can't have scipy or matplotlib, no scikit-learn, etc... But you could hide most of it behind cython, which has momentum in the scientific community. Then a realistic approach becomes:

- make the cython+pypy backend a reality
- ideally, make wrapping fortran with cython a reality
- convert as much as possible from the python C API to cython

People of all levels can participate. The first point in particular could help pypy beyond just the scipy community. And that's a plan where both parties would benefit from each other.
cheers,
David
Why can't you have scipy and friends without a C API? Presumably it's all code that either manipulates an array or calls into an existing lib to manipulate an array. Why can't you write pure python code to manipulate arrays and then call into other libs via ctypes and friends?

Alex

On Mon, Oct 17, 2011 at 6:22 PM, Alex Gaynor <alex.gaynor@gmail.com> wrote:
Why can't you have scipy and friends without a C-API? Presumably it's all code that either manipulates an array or calls into an existing lib to manipulate an array. Why can't you write pure python code to manipulate arrays and then call into other libs via ctypes and friends?
Sorry, I was not very clear: with scipy *as of today*, you can't make it work without supporting the numpy C API. What I meant by hiding is that once code uses the numpy C API and the Python C API only through cython, it becomes much easier to support both CPython and pypy, at least in principle (and the code is more maintainable). But scipy is basically pure python code + lots of pure fortran/C code + wrappers around it. Having something that automatically wraps C/fortran for pypy seems like a reasonable thing for pypy people to do, and it would narrow the gap. cheers, David

On Mon, Oct 17, 2011 at 7:20 PM, David Cournapeau <cournape@gmail.com> wrote:
On Mon, Oct 17, 2011 at 2:22 PM, Michael Foord <fuzzyman@gmail.com> wrote:
Travis' post seems to suggest that it is the responsibility of the *pypy* dev team to do the work necessary to integrate the numpy refactor (initially sponsored by Microsoft). That refactoring (smaller numpy core) seems like a great way forward for numpy - particularly if *it* wants to play well with multiple implementations, but it is unreasonable to expect the pypy team to pick that up!
I am pretty sure Travis did not intend to suggest that (I did not understand that from his wording, but maybe that's because we have had discussions in person on that topic several times already).
There are a lot of reasons to do that refactor that have nothing to do with pypy, so the idea is more: let's talk about what pypy would need to make this refactor beneficial for pypy *as well*. I (and others) have advocated using more cython inside numpy and scipy. We could share resources to do that.
I think alex's question was whether the refactoring is going to be merged upstream or not (and what's the plan). I don't think you understand our point. Reusing the current numpy implementation is not giving us much *even* if it was all Cython and no C API. It's just that we can do cool stuff with the JIT. *Right now* an operation chain like this: a, b, c = [numpy.arange(100) for i in range(3)]; a + b - c becomes ... i = 0; while i < 100: res[i] = a[i] + b[i] - c[i]; i += 1 ... without allocating intermediates. In the near future we plan to implement this using SSE so it becomes even faster. It also applies to all kinds of operations that we implemented in RPython - ufuncs, casts etc. All of them get unrolled into a single loop right now, and they can get nicely vectorized in the near future. Having numpy still implement stuff in C doesn't buy us much - we wouldn't be able to do all the cool stuff we're doing now and we wouldn't get all the speedups. That's why we don't reuse the current numpy - not because it uses the C API. Now the scenario is slightly different with FFT and other more complex algorithms. There we want to call existing C code with array pointers so we don't have to reimplement it. Now tell me - how does moving pieces of scipy or numpy to cython give us anything?
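To make the fusion concrete, here is a rough pure-Python sketch (not pypy's actual implementation, just an illustration of the idea) of the difference between eager elementwise evaluation, which allocates an intermediate per operation, and the single fused loop the JIT can emit:

```python
# Sketch only: `eager` mimics how `a + b - c` evaluates step by step,
# allocating an intermediate for the `a + b` result; `fused` is the
# single loop the operation chain gets unrolled into.

def eager(a, b, c):
    tmp = [a[i] + b[i] for i in range(len(a))]    # intermediate allocated
    return [tmp[i] - c[i] for i in range(len(a))]

def fused(a, b, c):
    # one pass over the data, no intermediate list
    res = [0] * len(a)
    i = 0
    while i < len(a):
        res[i] = a[i] + b[i] - c[i]
        i += 1
    return res

a, b, c = [list(range(100)) for _ in range(3)]
assert eager(a, b, c) == fused(a, b, c)
```

Both produce the same result; the point is only that the fused form touches each element once and allocates nothing in between.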
It seems odd to argue that extending numpy to pypy will be a net negative for the community! Sure there are some difficulties involved, just as there are difficulties with having multiple implementations in the first place, but the benefits are much greater.
The net negative would be the community split, with numpy losing some resources, drawn away by a numpy on pypy. This seems like a plausible situation.
So, you're saying that giving people the ability to run numpy code faster, provided they refrain from using scipy and matplotlib (for now), is producing the community split? How does it? My interpretation is that we want to give people powerful tools that can be used to achieve things not possible before - like not using cython but instead implementing things in pure python. I can imagine how someone might not get value from that, but how does that decrease the value for others?
Without a C numpy API, you can't have scipy or matplotlib, no scikit-learn, etc... But you could hide most of it behind cython, which has momentum in the scientific community. Then a realistic approach becomes: - make the cython+pypy backend a reality - ideally make cython able to wrap fortran - convert as much as possible from the python C API to cython
People of all levels can participate. The first point in particular could help pypy beyond the scipy community. And that's a plan where both parties would benefit from each other.
I think our priority right now is to provide a working numpy. Next point is to make it use SSE. Does that fit somehow with your plan? Cheers, fijal

On Mon, Oct 17, 2011 at 8:40 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
On Mon, Oct 17, 2011 at 7:20 PM, David Cournapeau <cournape@gmail.com> wrote:
On Mon, Oct 17, 2011 at 2:22 PM, Michael Foord <fuzzyman@gmail.com> wrote:
Travis' post seems to suggest that it is the responsibility of the *pypy* dev team to do the work necessary to integrate the numpy refactor (initially sponsored by Microsoft). That refactoring (smaller numpy core) seems like a great way forward for numpy - particularly if *it* wants to play well with multiple implementations, but it is unreasonable to expect the pypy team to pick that up!
I am pretty sure Travis did not intend to suggest that (I did not understand that from his wording, but maybe that's because we have had discussions in person on that topic several times already).
There are a lot of reasons to do that refactor that have nothing to do with pypy, so the idea is more: let's talk about what pypy would need to make this refactor beneficial for pypy *as well*. I (and others) have advocated using more cython inside numpy and scipy. We could share resources to do that.
I think alex's question was whether the refactoring is going to be merged upstream or not (and what's the plan).
I don't know if the refactoring will be merged as is, but at least I think the refactoring needs to happen, independently of pypy. There is no denying that parts of numpy's code are crufty, some stuff not clearly separated, etc...
I don't think you understand our point.
I really do. I understand that pypy is a much better platform than cpython for lazy evaluation and fast pure python ufuncs. Nobody denies that. To be even clearer: if the goal is to have some concept of an array which looks like numpy's, then yes, using numpy's code is useless.
Reusing the current numpy implementation is not giving us much *even* if it was all Cython and no C API.
This seems to be the source of the disagreement: I think reusing numpy means that you are much more likely to be able to run the existing scripts using numpy on top of pypy. So my question is whether the disagreement is about the value of that, or whether the pypy community generally thinks they can rewrite a "numpypy" which is a drop-in replacement for numpy on cpython without using the original numpy's code.
So, you're saying that giving people the ability to run numpy code faster, provided they refrain from using scipy and matplotlib (for now), is producing the community split? How does it? My interpretation is that we want to give people powerful tools that can be used to achieve things not possible before - like not using cython but instead implementing things in pure python. I can imagine how someone might not get value from that, but how does that decrease the value for others?
It is not my place to question anyone's values; we all have our different usages. But the split is obvious: you may have scientific code which works on numpy+pypy and does not on numpy+cpython, and vice versa.
I think our priority right now is to provide a working numpy. Next point is to make it use SSE. Does that fit somehow with your plan?
I guess there is an ambiguity in the exact meaning of "working numpy": something that looks like numpy with cool features from pypy, or something that can be used as a drop-in replacement for numpy (any script using numpy will work with numpy+pypy). If it is the former, then again, I would agree that there is not much point in reusing numpy's code. But then, I think calling it numpy is a bit confusing. cheers, David

On Mon, Oct 17, 2011 at 10:18 PM, David Cournapeau <cournape@gmail.com> wrote:
On Mon, Oct 17, 2011 at 8:40 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
On Mon, Oct 17, 2011 at 7:20 PM, David Cournapeau <cournape@gmail.com> wrote:
On Mon, Oct 17, 2011 at 2:22 PM, Michael Foord <fuzzyman@gmail.com> wrote:
Travis' post seems to suggest that it is the responsibility of the *pypy* dev team to do the work necessary to integrate the numpy refactor (initially sponsored by Microsoft). That refactoring (smaller numpy core) seems like a great way forward for numpy - particularly if *it* wants to play well with multiple implementations, but it is unreasonable to expect the pypy team to pick that up!
I am pretty sure Travis did not intend to suggest that (I did not understand that from his wording, but maybe that's because we have had discussions in person on that topic several times already).
There are a lot of reasons to do that refactor that have nothing to do with pypy, so the idea is more: let's talk about what pypy would need to make this refactor beneficial for pypy *as well*. I (and others) have advocated using more cython inside numpy and scipy. We could share resources to do that.
I think alex's question was whether the refactoring is going to be merged upstream or not (and what's the plan).
I don't know if the refactoring will be merged as is, but at least I think the refactoring needs to happen, independently of pypy. There is no denying that parts of numpy's code are crufty, some stuff not clearly separated, etc...
I don't think you understand our point.
I really do. I understand that pypy is a much better platform than cpython for lazy evaluation and fast pure python ufuncs. Nobody denies that. To be even clearer: if the goal is to have some concept of an array which looks like numpy's, then yes, using numpy's code is useless.
Reusing the current numpy implementation is not giving us much *even* if it was all Cython and no C API.
This seems to be the source of the disagreement: I think reusing numpy means that you are much more likely to be able to run the existing scripts using numpy on top of pypy. So my question is whether the disagreement is about the value of that, or whether the pypy community generally thinks they can rewrite a "numpypy" which is a drop-in replacement for numpy on cpython without using the original numpy's code.
Ok. Reusing numpy indeed makes it more likely that existing code will run, but we'll take care to be compatible (same as with Python as a language, actually). Reusing the CPython C API parts of numpy, however, does mean that we nullify all the good parts of pypy - that is entirely pointless from my perspective. I can't see how you can get both the JIT running nicely and reuse most of numpy. You have to sacrifice something, and I would be willing to sacrifice code reuse. Indeed you would end up with two numpy implementations, but it's not like numpy is changing that much, after all. We can provide cython or some other sort of API to integrate with the existing legacy code later, but the point stays - I can't see a plan that uses the cool parts of pypy and numpy together. This is the question of what is harder - writing a reasonable JIT or writing numpy. I would say numpy and you guys seem to say JIT. Cheers, fijal

On Tue, Oct 18, 2011 at 11:20, Maciej Fijalkowski <fijall@gmail.com> wrote:
numpy together. This is the question of what is harder - writing a reasonable JIT or writing numpy. I would say numpy and you guys seem to say JIT.
I'm confused -- I'm fairly convinced you think that a reasonable JIT is harder than writing numpy, and not the other way around? Cheers, Dirkjan

Hi, On Tue, Oct 18, 2011 at 11:34, Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
I'm confused -- I'm fairly convinced you think that a reasonable JIT is harder than writing numpy, and not the other way around?
Let me chime in --- applying the JIT to "numpypy" or to any other piece of RPython code is, if not trivial, at least very straightforward. That's what our "JIT generator" does for you. In comparison, writing numpy (in whatever way, including all the discussions here) is a much longer task. Fijal is sticking to his point, which is that if we rewrite (large parts of) numpy in RPython, we are getting JIT support for free; but if we are *only* going down the route of interfacing with existing pieces of C code, we don't get any JIT in the end, and the performance will just suck. Not to mention that from my point of view it's clear which of the two paths is best to attract newcomers to pypy. A bientôt, Armin.

On Tue, Oct 18, 2011 at 11:34 AM, Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
On Tue, Oct 18, 2011 at 11:20, Maciej Fijalkowski <fijall@gmail.com> wrote:
numpy together. This is the question of what is harder - writing a reasonable JIT or writing numpy. I would say numpy and you guys seem to say JIT.
I'm confused -- I'm fairly convinced you think that a reasonable JIT is harder than writing numpy, and not the other way around?
Yes, of course you're right :-)

On 17 October 2011 18:20, David Cournapeau <cournape@gmail.com> wrote: [snip...] On Mon, Oct 17, 2011 at 2:22 PM, Michael Foord <fuzzyman@gmail.com> wrote:
It seems odd to argue that extending numpy to pypy will be a net negative for the community! Sure there are some difficulties involved, just as there are difficulties with having multiple implementations in the first place, but the benefits are much greater.
The net negative would be the community split, with numpy losing some resources, drawn away by a numpy on pypy. This seems like a plausible situation.
Note that this is *exactly* the same "negative" that Python itself faces with multiple implementations. It has in fact been a great positive, widening the community and improving Python (and yes sometimes improving it by pointing out its problems). All the best, Michael
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

Michael Foord, 18.10.2011 11:44:
On 17 October 2011 18:20, David Cournapeau wrote: On Mon, Oct 17, 2011 at 2:22 PM, Michael Foord wrote:
It seems odd to argue that extending numpy to pypy will be a net negative for the community! Sure there are some difficulties involved, just as there are difficulties with having multiple implementations in the first place, but the benefits are much greater.
The net negative would be the community split, with numpy losing some resources, drawn away by a numpy on pypy. This seems like a plausible situation.
Note that this is *exactly* the same "negative" that Python itself faces with multiple implementations. It has in fact been a great positive, widening the community and improving Python (and yes sometimes improving it by pointing out its problems).
I think both of you are talking about two different scenarios here. One situation is where PyPy gains a NumPy compatible implementation (based on NumPy or not) and most code that runs on NumPy today can run on either CPython's NumPy or PyPy's NumPy. That may lead to the gains that you are talking about, because users can freely choose what suits their needs best, without getting into dependency hell. It may also eventually lead to changes in CPython's NumPy to adapt to requirements or improvements in PyPy's. Both could learn from each other and win. The other situation is where PyPy does its own thing and supports some NumPy code that happens to run faster than in CPython, while other code does not work at all, with the possibility to replace it in a PyPy specific way. That would mean that some people would write code for one platform that won't run on the other, and vice-versa, although it actually deals with the same kind of data. This strategy is sometimes referred to as "embrace, extend and extinguish". It would not be an improvement, not for CPython, likely not for PyPy either, and certainly not for the scientific Python community as a whole. Stefan

Hi, On Tue, Oct 18, 2011 at 14:19, Stefan Behnel <stefan_ml@behnel.de> wrote:
The other situation is where PyPy does its own thing and supports some NumPy code that happens to run faster than in CPython, while other code does not work at all, with the possibility to replace it in a PyPy specific way.
I think you are disregarding what 8 years of the PyPy project should have made obvious. Yes, some code will not run at all on PyPy at first, and that amount of code is going to be reduced over time. But what we want is to avoid a community split, so we are never, ever, going to add and advertise PyPy-only ways to write programs. A bientôt, Armin.

On Tue, Oct 18, 2011 at 1:41 PM, Armin Rigo <arigo@tunes.org> wrote:
Hi,
On Tue, Oct 18, 2011 at 14:19, Stefan Behnel <stefan_ml@behnel.de> wrote:
The other situation is where PyPy does its own thing and supports some NumPy code that happens to run faster than in CPython, while other code does not work at all, with the possibility to replace it in a PyPy specific way.
I think you are disregarding what 8 years of the PyPy project should have made obvious. Yes, some code will not run at all on PyPy at first, and that amount of code is going to be reduced over time. But what we want is to avoid a community split, so we are never, ever, going to add and advertise PyPy-only ways to write programs.
Ok. In that case, is it fair to say that you are talking about a full reimplementation of the whole scipy ecosystem, at least as much as pypy itself is a reimplementation of python? (since none of the existing ecosystem will work without the C numpy API). Sorry for being dense, I just want to make sure I am not misrepresenting your approach. David

Hi David, On Tue, Oct 18, 2011 at 18:29, David Cournapeau <cournape@gmail.com> wrote:
(...) with the possibility to replace it in a PyPy specific way.
I think you are disregarding what 8 years of the PyPy project should have made obvious. (...)
Ok. In that case, it is fair to say that you are talking about a full reimplementation of the whole scipy ecosystem, at least as much as pypy itself is a reimplementation of python ?
I think the original topic of this discussion is numpy, not scipy. The answer is that I don't know. I am sure that people will reimplement whatever module is needed, or design a generic but slower way to interface with C a la cpyext, or write a different C API, or rely on Cython versions of their libraries and have Cython support in PyPy... or more likely all of these approaches and more. The point is that right now we are focusing on numpy only, and we want to make existing pure Python numpy programs run fast --- not just run horribly slowly --- both in the case of "standard" numpy programs, and in the case of programs that do not strictly follow the mold of "take your algorithm, then shuffle it and rewrite it and possibly obfuscate it until it is expressed as matrix operations with no computation left in pure Python". This is the first step for us right now. It will take some time before we have to even consider running scipy programs. By then I imagine that either the approach works and delivers good performance --- and then people (us and others) will have to consider the next steps to build on top of that --- or it just doesn't (which looks unlikely given the good preliminary results, which is why we can ask for support via donations). We did not draw precise plans for what comes next. I think the above would already be a very useful result for some users. But to me, it looks like a strong enough pull to motivate some more people to do the next steps --- Cython, C API, rewrite of some modules, and so on, including the perfectly fine opinion "in my case pypy is not giving enough benefits for me to care". Note that this is roughly the same issues and same solution spaces as the ones that exist in any domain with PyPy, not just numpy/scipy. A bientôt, Armin.

On Tue, Oct 18, 2011 at 8:02 PM, Armin Rigo <arigo@tunes.org> wrote:
Hi David,
On Tue, Oct 18, 2011 at 18:29, David Cournapeau <cournape@gmail.com> wrote:
(...) with the possibility to replace it in a PyPy specific way.
I think you are disregarding what 8 years of the PyPy project should have made obvious. (...)
Ok. In that case, it is fair to say that you are talking about a full reimplementation of the whole scipy ecosystem, at least as much as pypy itself is a reimplementation of python ?
I think the original topic of this discussion is numpy, not scipy. The answer is that I don't know. I am sure that people will reimplement whatever module is needed, or design a generic but slower way to interface with C a la cpyext, or write a different C API, or rely on Cython versions of their libraries and have Cython support in PyPy... or more likely all of these approaches and more.
The point is that right now we are focusing on numpy only, and we want to make existing pure Python numpy programs run fast --- not just run horribly slowly --- both in the case of "standard" numpy programs, and in the case of programs that do not strictly follow the mold of "take your algorithm, then shuffle it and rewrite it and possibly obfuscate it until it is expressed as matrix operations with no computation left in pure Python".
This is the first step for us right now. It will take some time before we have to even consider running scipy programs. By then I imagine that either the approach works and delivers good performance --- and then people (us and others) will have to consider the next steps to build on top of that --- or it just doesn't (which looks unlikely given the good preliminary results, which is why we can ask for support via donations).
We did not draw precise plans for what comes next. I think the above would already be a very useful result for some users. But to me, it looks like a strong enough pull to motivate some more people to do the next steps --- Cython, C API, rewrite of some modules, and so on, including the perfectly fine opinion "in my case pypy is not giving enough benefits for me to care". Note that this is roughly the same issues and same solution spaces as the ones that exist in any domain with PyPy, not just numpy/scipy.
Thank you for the clear explanation, Armin, that makes things much clearer, at least to me. cheers, David

I think the original topic of this discussion is numpy, not scipy. The answer is that I don't know. I am sure that people will reimplement whatever module is needed, or design a generic but slower way to interface with C a la cpyext, or write a different C API, or rely on Cython versions of their libraries and have Cython support in PyPy... or more likely all of these approaches and more.
Great discussion... Any idea how "micro" the micronumpy implementation will be? Numpy includes matrix multiplication, eigenvalue decomposition, histogramming, etc. For all the people who really mean Scipy when they get excited about a Numpy implementation, their feature of choice may be included in Numpy after all.

On 10/18/2011 02:41 PM Armin Rigo wrote:
Hi,
On Tue, Oct 18, 2011 at 14:19, Stefan Behnel<stefan_ml@behnel.de> wrote:
The other situation is where PyPy does its own thing and supports some NumPy code that happens to run faster than in CPython, while other code does not work at all, with the possibility to replace it in a PyPy specific way.
I think you are disregarding what 8 years of the PyPy project should have made obvious. Yes, some code will not run at all on PyPy at first, and that amount of code is going to be reduced over time. But what we want is to avoid a community split, so we are never, ever, going to add and advertise PyPy-only ways to write programs.
A bientôt,
Armin.
Just the same, I think PyPy could be allowed to have an "import that" ;-) I think I read somewhere that PyPy's ambitions were not just to serve official Python with a fast implementation, but possibly other language development too. Is that still true? If one wanted to use PyPy's great infrastructure to implement a new language, what would be the pypythonic bootstrapping path towards a self-hosting new language? BTW, would self-hosting be seen as a disruptive forking goal? If the new language were just a tweak on Python, what would be the attitude towards starting with a source-to-source preprocessor followed by invoking pypy on the result? Just as an example (without looking at grammar problems for the moment), what if I wanted to transform just assignments, such that x:=expr meant x=expr as usual, except it produced source mods to tell pypy that subsequently it could assume that x had unchanging type (which it might be able to infer anyway in most cases, but it would also make programmer intent human-perceptible). In a similar vein, x::=expr could mean x's value (and type) would be frozen after the first evaluation. This raises the question of how I would best tell pypy this meta-information about x with legal manual edits now? Assertions? Could you see pypy accepting command-line options with information about specific variables, functions, modules, etc.? (such options could of course be collected in a file referenced from a command-line option). It is then a short step to create some linkage between the source files and the meta-data files, e.g. by file name extensions like .py and .pyc for the same file. Maybe .pym? If so, pypy could look for .pym the way it looks for .pyc at the appropriate time.
Going further, what about preprocessing with a statement decorator using e.g., @@decomodule.fun statement # with suite to mean the preprocessor at read time should import decomodule in a special environment and pass the statement (with suite) to decomodule.fun for source transformation with return of source to be substituted. BTW, during the course of preprocessing, the "special environment" for importing statement-decorating modules could persist, and state from early decorator calls could affect subsequent ones. Well, this would be Python with macros in a sense, so it would presumably be too disruptive to be considered for any pythonic Python. OTOH, it might be an interesting path for some variations to converge under one sort-of-pythonic (given the python decorator as precedent) source-transformation markup methodology. (I'm just looking for reaction to the general idea, not specific syntax problems). Regards, Bengt Richter
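For what it's worth, Bengt's x:=expr idea could start as a trivial string-level pass; here is a purely hypothetical sketch (the function name and the meta-data shape are made up, and only the single-colon form is handled):

```python
import re

# Hypothetical source-to-source pass: rewrite `x := expr` to plain
# `x = expr`, collecting the names whose type the programmer declared
# fixed as side-band meta-data (the kind of thing a `.pym` file could
# carry alongside the source).

_FROZEN = re.compile(r'^(\s*)(\w+)\s*:=\s*(.+)$')

def preprocess(source):
    out, frozen = [], []
    for line in source.splitlines():
        m = _FROZEN.match(line)
        if m:
            indent, name, expr = m.groups()
            frozen.append(name)                      # record the type-frozen name
            line = '%s%s = %s' % (indent, name, expr)
        out.append(line)
    return '\n'.join(out), frozen

code, frozen = preprocess("x := 3\ny = x * 2")
assert code == "x = 3\ny = x * 2"
assert frozen == ['x']
```

Whether such annotations would actually help pypy's type inference is a separate question, as noted above; this only shows that the preprocessing half is cheap to prototype.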

Hi Bengt, PyPy is indeed supporting multiple _existing_ languages (with SWI Prolog being the 2nd quasi-complete language right now). However, most of us are not interested in exploratory language design, say in the form of syntax tweaks to Python. You are welcome to fork PyPy's bitbucket repository and hack there, but you will have more interested answers if you move this discussion somewhere more appropriate (like the python-ideas mailing list). A bientôt, Armin.

On 10/19/2011 02:38 PM Armin Rigo wrote:
Hi Bengt,
PyPy is indeed supporting multiple _existing_ languages (with SWI Prolog being the 2nd quasi-complete language right now). However, most of us are not interested in exploratory language design, say in the form of syntax tweaks to Python.
You are welcome to fork PyPy's bitbucket repository and hack there, but you will have more interested answers if you move this discussion somewhere more appropriate (like the python-ideas mailing list).
A bientôt,
Armin. Thank you.
Regards, Bengt Richter

Let me ask the opposite question: What route do you envision that gives us both the speed we (and everyone else) desires, while not having any of these issues? That's not a question that has a very good answer I think.
Hi Alex. I don't have a proposed route. I'm (sadly) too ignorant, I'm voicing the issues that came up in conversation. Seeing as I offered to put up money but the proposed route might not achieve my aims (having a good numpy foundation running with PyPy which opens the door to scipy support) I figure that I need to ask some questions - even if only to reduce my own ignorance. This isn't to say I want to avoid donating (*not at all*) - I just want to understand what looks like a wider set of issues than I'd originally understood from the short discussion at EuroPython during the PyPy demo and call for donations. See the reply to Michael for an extra detail. i. -- Ian Ozsvald (A.I. researcher) ian@IanOzsvald.com http://IanOzsvald.com http://MorConsulting.com/ http://StrongSteam.com/ http://SocialTiesApp.com/ http://TheScreencastingHandbook.com http://FivePoundApp.com/ http://twitter.com/IanOzsvald

On Mon, Oct 17, 2011 at 5:30 PM, Ian Ozsvald <ian@ianozsvald.com> wrote:
Let me ask the opposite question: What route do you envision that gives us both the speed we (and everyone else) desires, while not having any of these issues? That's not a question that has a very good answer I think.
Hi Alex. I don't have a proposed route. I'm (sadly) too ignorant, I'm voicing the issues that came up in conversation. Seeing as I offered to put up money but the proposed route might not achieve my aims (having a good numpy foundation running with PyPy which opens the door to scipy support) I figure that I need to ask some questions - even if only to reduce my own ignorance.
This isn't to say I want to avoid donating (*not at all*) - I just want to understand what looks like a wider set of issues than I'd originally understood from the short discussion at EuroPython during the PyPy demo and call for donations.
See the reply to Michael for an extra detail.
i.
The call for donations precisely mentions the fact that scipy, matplotlib and a billion other libraries written in C/Cython won't work. It also precisely mentions that the C API of numpy won't be there. However, it'll still export raw pointers to arrays so you can call existing C/fortran code like blas. I admit this proposal does not cater for Travis's use case - having what he has now, just faster. That was not the point. The point is that we want to take numpy as a pretty good API and implement it better, to cater for people who want to use it and have it nicely integrate with Python and be fast. FFT libraries won't work out of the box, but they should be relatively simple to get running without reimplementing algorithms. PyPy is not really trying to solve all the problems of the world - it's still work to adjust current code (like scipy) to work with different APIs and we won't cater for all of this, at least not immediately. I seriously don't buy that it's a net loss for the numpy community - having numpy running faster and nicely integrated with Python is a win for a lot of people already, and that's good enough to try and see where it leads us. I'll reiterate, because it seems this is misinterpreted again and again: pypy's numpy *will* integrate in some sort of way with existing C/fortran libraries, but this way *will* be different from the current CPython C API. It's really just too hard to get both. Cheers, fijal
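The "raw pointers" route already works today with plain ctypes on a POSIX system; here is a sketch using stdlib stand-ins (array.array plays the role of an array exporting its buffer address, and libc's memcpy stands in for an external C/fortran routine such as a blas call - this is an illustration, not pypy's actual numpy API):

```python
import ctypes
from array import array

# Two arrays of C doubles; buffer_info() exposes the raw address of the
# underlying buffer, much like the pointer a pypy numpy array could export.
src = array('d', [1.0, 2.0, 3.0, 4.0])
dst = array('d', [0.0] * 4)

libc = ctypes.CDLL(None)              # symbols of the running process (libc)
libc.memcpy.restype = ctypes.c_void_p

src_ptr, n = src.buffer_info()        # (address, element count)
dst_ptr, _ = dst.buffer_info()

# memcpy(void *dst, const void *src, size_t nbytes) -- any routine
# taking a double* could be called the same way.
libc.memcpy(ctypes.c_void_p(dst_ptr),
            ctypes.c_void_p(src_ptr),
            ctypes.c_size_t(n * src.itemsize))

assert list(dst) == [1.0, 2.0, 3.0, 4.0]
```

The wrapper code stays pure Python, which is exactly the kind of code the JIT can see through, unlike an opaque C extension module.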

Maciej Fijalkowski, 17.10.2011 17:46:
- pypy's numpy *will* integrate in some sort of way with existing C/fortran libraries, but this way *will* be different than current CPython C API. It's really just too hard to get both.
Why reinvent yet another wheel when you could make Cython a common language to write extensions and wrapper code for both? Even if that requires a few feature restrictions for Cython users or adaptations to their code to keep it portable, it's still better than forcing users into a complete vendor lock-in on both sides. Stefan

On Mon, Oct 17, 2011 at 6:01 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Maciej Fijalkowski, 17.10.2011 17:46:
- pypy's numpy *will* integrate in some sort of way with existing C/fortran libraries, but this way *will* be different than current CPython C API. It's really just too hard to get both.
Why reinvent yet another wheel when you could make Cython a common language to write extensions and wrapper code for both? Even if that requires a few feature restrictions for Cython users or adaptations to their code to keep it portable, it's still better than forcing users into a complete vendor lock-in on both sides.
Yeah, agreed. We don't have a C API at all and it's unlikely we'll implement something yet-completely-different. Cython is definitely very high on the list of things to consider for "a reasonable FFI". Cheers, fijal

On Mon, Oct 17, 2011 at 12:01 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Maciej Fijalkowski, 17.10.2011 17:46:
- pypy's numpy *will* integrate in some sort of way with existing
C/fortran libraries, but this way *will* be different than current CPython C API. It's really just too hard to get both.
Why reinvent yet another wheel when you could make Cython a common language to write extensions and wrapper code for both? Even if that requires a few feature restrictions for Cython users or adaptations to their code to keep it portable, it's still better than forcing users into a complete vendor lock-in on both sides.
Stefan
_______________________________________________ pypy-dev mailing list pypy-dev@python.org http://mail.python.org/mailman/listinfo/pypy-dev
There's no fundamental objection to Cython, but there are practical ones.

a) Most of NumPy isn't Cython, so just having Cython gives us little.

b) Is the NumPy-on-Cython house in order? AFAIK part of the MS project involved rewriting parts of NumPy in Cython and modularising Cython for targets besides CPython - and this was *not* merged. For me to be convinced Cython is a good target, I'd need to believe that there's an interest in it being a common platform, and when I see work done by core developers sitting unmerged (with no timeline), I can't have faith in that.

Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero

Alex Gaynor, 17.10.2011 18:14:
On Mon, Oct 17, 2011 at 12:01 PM, Stefan Behnel wrote:
Maciej Fijalkowski, 17.10.2011 17:46: - pypy's numpy *will* integrate in some sort of way with existing
C/fortran libraries, but this way *will* be different than current CPython C API. It's really just too hard to get both.
Why reinvent yet another wheel when you could make Cython a common language to write extensions and wrapper code for both? Even if that requires a few feature restrictions for Cython users or adaptations to their code to keep it portable, it's still better than forcing users into a complete vendor lock-in on both sides.
There's no fundamental objection to Cython, but there are practical ones.
I'm very well aware of that. There are both technical and practical issues. I didn't hide the fact that the Python+ctypes backend for Cython is quite far from being ready for use, for example.
a) Most of NumPy isn't Cython, so just having Cython gives us little.
There has been a move towards a smaller core for NumPy, and we perceive substantial interest, both inside and outside the Scientific Python community, in writing new wrapper code in Cython and even in rewriting existing code in Cython to make it more maintainable. Even generated wrappers were and are being rewritten, e.g. to get rid of SWIG. Rewriting several hundred to a thousand lines of C code in Cython can often be done within a few days, depending on test coverage and code complexity, and from what we hear, this is actually being done, or at least seriously considered, in several projects. It's helped by the fact that CPython users do not have to make the switch right away, but can often migrate or add a module at a time.

I agree that simply supporting Cython is not going to magically connect huge amounts of foreign code to PyPy. It just makes it a lot easier to get closer to that goal than inventing yet another way of interfacing that is not supported by anything else.

Also note that there isn't just NumPy. A relatively large part of Sage is written in Cython, for example, especially the parts that glue the rest together, which consists of huge amounts of C, C++ and Fortran code. After all, Cython's predecessor Pyrex has been around for almost ten years now.
b) Is the NumPy on Cython house in order? AFAIK part of the MS project involved rewriting parts of NumPy in Cython and modularising Cython for targets besides CPython. And that this was *not* merged. For me to be convinced Cython is a good target, I'd need belief that there's an interest in it being a common platform, and when I see that there's work done, by core developers, which sits unmerged (with no timeline) I can't have faith in that.
I understand that objection. The Cython project is largely driven by the interests of core developers and users (now, how unexpected is that?), and none of the developers currently uses IronPython or PyPy. So, while we'd like to see Cython support other targets (and the core developers agree on that goal), there isn't really a strong incentive for us to move it in that direction. It's a bit of a chicken-and-egg problem: why support other platforms that no-one uses it for, and who'd use it on a platform that's not as well supported as CPython?

I'd personally like to get the ctypes backend merged, but it's not exactly in a state that is ready to merge soonish. There's a branch, and Romain (our GSoC student for the project) is still working on it, but obviously with much less time for it, so I'm sure he could use another helping hand. https://github.com/hardshooter/CythonCTypesBackend

The IronPython port is a different beast. It ran almost completely in stealth mode, outside of the scope of the core developers, and it's neither clear what the exact design goals were, nor what was eventually achieved, nor what state the code branch is currently in. The project itself died from a sudden lack of interest on the side of the financial supporter (MS) at some point, and it appears that there is currently no-one who can easily take it over. Sad, but really nothing to blame the Cython developers for. I'd be happy to see it revived, if there is any interest. https://bitbucket.org/cwitty/cython-for-ironpython/overview

Stefan

Hi everyone,

I guess people want to know what the current status of the ctypes backend for Cython is; you can read the latest status update here: http://mail.python.org/pipermail/pypy-dev/2011-September/008260.html

Of course I'm available for any kind of questions :)

Cheers, Romain

David Cournapeau, 17.10.2011 00:01:
On Sun, Oct 16, 2011 at 10:20 PM, Ian Ozsvald wrote:
how big is the scipy ecosystem beyond numpy? What's the rough line count for Python, C, Fortran etc that depends on numpy?
The ecosystem is pretty big. There are at least on the order of hundreds of packages that depend directly on numpy and scipy.
For scipy alone, the raw count is around 150k-300k LOC (it is a bit hard to estimate because we include some swig-generated code, which I have ignored here, and some code duplication to deal with distutils insanity). There is around 80k LOC of Fortran alone in there.
More and more scientific code uses Cython, for speed or just for interfacing with C (and recently C++). Other tools have been used for similar reasons (f2py in particular, to automatically wrap Fortran and C).
and fwrap nowadays, which also generates glue code for talking to Fortran from Cython code, through a thin C code wrapper (AFAIK).
f2py, at least, is quite tightly coupled to the numpy C API. I know there is work on a PyPy-friendly backend for Cython, but I don't know where things stand.
It's, erm, resting. The GSoC is over, the code hasn't been merged into mainline yet, lacks support for some recent Cython language features, and is not in a state that would allow building anything major with it right away. It's based on ctypes, so it suffers from the same problems as ctypes, namely API/ABI inconsistencies beyond those that "ctypes_configure" can handle. In particular, things like talking to C macros will at least require additional C glue code to be generated, which doesn't currently happen. What works is stripping the Cython-specific syntax off the code and mapping "regular" C code interactions to corresponding ctypes calls. So, some things work as is, everything else needs more work. Helping hands and funding are welcome.

That being said, I still think it's a promising approach, and it would be very interesting for PyPy to support Cython code (in one way or another). Cython certainly has a good standing in the Scientific Python community these days. If PyPy wants to enter as well, it will have to show that it can easily and efficiently interface with the huge amount of existing scientific code out there, be it C, C++, Fortran, Cython or whatever. And rewriting the code, or even just the wrappers, for Yet Another Python Implementation is not a scalable solution to that problem.
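To make the "map regular C code interactions to corresponding ctypes calls" point concrete, here is a hand-written sketch of roughly what such a translation amounts to (an illustration of the idea, not output of the actual backend): a Cython declaration like `cdef extern from "math.h": double sqrt(double)` corresponds to looking the symbol up via ctypes and attaching its signature:

```python
import ctypes
import ctypes.util

# Rough ctypes equivalent of: cdef extern from "math.h": double sqrt(double)
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # ~1.4142135623730951
```

This also shows why C macros are a problem for the approach: a macro like `isnan` or `M_PI` has no symbol in the shared library for ctypes to look up, which is exactly why additional generated C glue code would be needed.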
I would like to see less C boilerplate code in scipy, and more Cython usage (which generates faster code and is much more maintainable); this can also benefit pypy, if only by making the scipy code less dependent on CPython details.
And by making the implementation essentially Python. That way, it can be ported to other Python platforms, especially PyPy, much more easily than if you have to start by reverse engineering even the exact wrapper signatures from C code. Stefan

The ecosystem is pretty big. There are at least on the order of hundreds of packages that depend directly on numpy and scipy.
For scipy alone, the raw count is around 150k-300k LOC (it is a bit hard to estimate because we include some swig-generated code, which I have ignored here, and some code duplication to deal with distutils insanity). There is around 80k LOC of Fortran alone in there.
Hi David, thanks for the numbers. Travis has posted a long discussion: http://technicaldiscovery.blogspot.com/2011/10/thoughts-on-porting-numpy-to-... and a few other points are raised at HackerNews: http://news.ycombinator.com/item?id=3118620

Whilst I understand Fijal's point about having a fast/lightweight demo of numpy, I'm not sure what value this really brings to the project (I'll post this to Fijal's answer in a moment). If it isolates the rest of the numpy ecosystem (since it doesn't have a compatible C API), then only a fraction of people will be able to use it, and it won't open a roadmap for increased library support, surely?

As an example - I want numpy for client work. For my clients (the main one being a physics company that is replacing Fortran with Python), numpy is at the heart of their simulations. However, numpy is used with matplotlib, pyCUDA and parts of scipy. If basic tools like FFT aren't available *and compatible* (i.e. not new implementations, but running on tried, trusted and consistent C libs), then there'd be little reason to use pypy+numpy. pyCUDA could be a longer-term goal, but matplotlib would be essential.

I note that many scientists won't switch to Python 3 due to lack of library support. numpy caught up with Py3 earlier in the year and matplotlib followed recently (so I guess SciPy itself will follow). Can we look at the details of the Py3 porting process to get an idea of the complexity of the pypy-numpy + scipy project?

Ian.

As an example - I want numpy for client work. For my clients (the main one being a physics company that is replacing Fortran with Python), numpy is at the heart of their simulations. However, numpy is used with matplotlib, pyCUDA and parts of scipy. If basic tools like FFT aren't available *and compatible* (i.e. not new implementations, but running on tried, trusted and consistent C libs), then there'd be little reason to use pypy+numpy. pyCUDA could be a longer-term goal, but matplotlib would be essential.
Hi David, Fijal. I'll reply to this earlier post as the overnight discussion doesn't seem to have a good place to add this.

Someone else (I can't find a name) posted this nice summary: http://blog.streamitive.com/2011/10/17/numpy-isnt-about-fast-arrays/ which mostly echoes my position.

Does anyone have a guesstimate of the size of the active numpy user community minus the scipy/extensions community - i.e. the size of the community that might benefit from pypy-numpy (excluding those who use scipy etc., who couldn't benefit for a [long] while)? At EuroSciPy it felt as though many people used numpy+scipy (noting that it was a scipy conference). At EuroPython there were a number of talks that used numpy, but mostly they used other C or extension components (e.g. pyCUDA, Theano, visualisation tools).

i. -- Ian Ozsvald (A.I. researcher) ian@IanOzsvald.com http://IanOzsvald.com http://MorConsulting.com/ http://StrongSteam.com/ http://SocialTiesApp.com/ http://TheScreencastingHandbook.com http://FivePoundApp.com/ http://twitter.com/IanOzsvald

On Tue, Oct 18, 2011 at 11:05 AM, Ian Ozsvald <ian@ianozsvald.com> wrote:
As an example - I want numpy for client work. For my clients (the main being a physics company that is replacing Fortran with Python) numpy is at the heart of their simulations. However - numpy is used with matplotlib and pyCUDA and parts of scipy. If basic tools like FFT aren't available *and compatible* (i.e. not new implementations but running on tried, trusted and consistent C libs) then there'd be little reason to use pypy+numpy. pyCUDA could be a longer term goal but matplotlib would be essential.
Hi David, Fijal. I'll reply to this earlier post as the overnight discussion doesn't seem to have a good place to add this.
Someone else (I can't find a name) posted this nice summary: http://blog.streamitive.com/2011/10/17/numpy-isnt-about-fast-arrays/ which mostly echoes my position.
Yes, and PyPy's numpy does support dtypes IIUC, so in the end it will have all the features of numpy described in the article. It is going to be one interface for all the libraries to talk to, but it is not going to be the same as CPython's numpy. I don't think it is impossible to have an easy path for people to support both CPython numpy and PyPy numpy in the same lib (either using Cython or a simple C API). Maybe an easy first step is to make something like cpyext just for the numpy API, and then later agree on a common API for both, or make Cython generate the correct one for each interpreter. -- Leonardo Santagada

Leonardo Santagada, 18.10.2011 19:23:
PyPy's numpy does support dtypes IIUC, so in the end it will have all the features of numpy described in the article. It is going to be one interface for all the libraries to talk to, but it is not going to be the same as CPython's numpy. I don't think it is impossible to have an easy path for people to support both CPython numpy and PyPy numpy in the same lib (either using Cython or a simple C API). Maybe an easy first step is to make something like cpyext just for the numpy API, and then later agree on a common API for both, or make Cython generate the correct one for each interpreter.
Basically, all that Cython does (at least for recent versions of NumPy) is generate C-level access code through the PEP 3118 buffer API. I don't know if that (or something like it) is available in PyPy. Even if not, it may not be hard to emulate at a ctypes-like level (it requires C data types for correct access to the array fields). Stefan
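For reference, the PEP 3118 buffer API mentioned here is what `memoryview` speaks in CPython: any buffer provider (the stdlib `array` module stands in for a NumPy array in this sketch) exposes its element format, item size and shape, and consumers share the underlying memory rather than copying it. Whether PyPy exposes an equivalent is exactly the open question above.

```python
import array

a = array.array('d', [1.0, 2.0, 3.0])
m = memoryview(a)  # acquires the buffer via the PEP 3118 protocol

# The view carries the C-level metadata a consumer like Cython needs.
print(m.format, m.itemsize, m.shape)  # d 8 (3,)

# The view aliases the same memory: writes are visible on both sides.
m[0] = 10.0
print(a[0])  # 10.0
```

This zero-copy sharing of typed, shaped memory is what lets Cython-generated code index NumPy arrays at C speed without going through the interpreter.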
participants (14)
- Alex Gaynor
- Armin Rigo
- Bengt Richter
- David Cournapeau
- Dirkjan Ochtman
- Ian Ozsvald
- Jacob Biesinger
- Jacob Hallén
- Leonardo Santagada
- Maciej Fijalkowski
- Massa, Harald Armin
- Michael Foord
- Romain Guillebert
- Stefan Behnel