Proposed Roadmap Overview
Mark Wiebe and I have been discussing off and on (as well as talking with Charles) a good way forward to balance two competing desires:

* addition of new features that are needed in NumPy
* improving the code-base generally and moving towards a more maintainable NumPy

I know there are loud voices for just focusing on the second of these and avoiding the first until we have finished that. I recognize the need to improve the code base, but I will also be pushing for improvements to the feature-set and user experience in the process.

As a result, I am proposing a rough outline for releases over the next year:

* NumPy 1.7 to come out as soon as the serious bugs can be eliminated. Bryan, Francesc, Mark, and I are able to help triage some of those.
* NumPy 1.8 to come out in July, which will have as many ABI-compatible feature enhancements as we can add while improving test coverage and code cleanup. I will post to this list more details of what we plan to address with it later. Candidates for inclusion are:
    * resolving the NA/missing-data issues
    * finishing group-by
    * incorporating the start of label arrays
    * incorporating a meta-object
    * a few new dtypes (variable-length string, variable-length unicode and an enum type)
    * adding ufunc support for flexible dtypes and possibly structured arrays
    * allowing generalized ufuncs to work on more kinds of arrays besides just contiguous
    * improving the ability for NumPy to receive JIT-generated function pointers for ufuncs and other calculation opportunities
    * adding "filters" to Input and Output
    * simple computed fields for dtypes
    * accepting a Data-Type specification as a class or JSON file
    * work towards improving the dtype-addition mechanism
    * re-factoring of code so that it can compile with a C++ compiler and be minimally dependent on Python data-structures
* NumPy 2.0 to come out in January of 2013. Mark Wiebe and I will post to this list a document that explains some of its proposed features and enhancements. I won't steal his thunder for some of the things he is working on.

If there are code issues people would like to see addressed, it would be a great time to speak up and/or propose something that you would like to see.

In general, NumPy 1.8 will have new features that need to be explored so that NumPy 2.0 has enough code "experience" to be as useful as possible. I recognize that NumPy 1.8 has quite a few proposed features. These have been building up and are the big reason I've committed so many resources to NumPy. The feature-list did not just come out of my head; it is the result of talking and interacting with many NumPy users and watching the code get used (and not used) in the real world.

This will be a faster pace of development, but all of this will be in the open. If the NumPy 2.0 schedule is too aggressive, then we will have a NumPy 1.9 release to allow features to come out.

Thanks,

-Travis
On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant <travis@continuum.io> wrote:
<snip>
If there are code issues people would like to see addressed, it would be a great time to speak up and/or propose something that you would like to see.
The above list looks great. Another request that comes up occasionally on the mailing list is for the efficient computation of order statistics, the simplest case being a combined min/max function. Longish thread starts here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/ Warren
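To make the request concrete, here is a minimal Python sketch of what a combined min/max could buy (the `minmax` name and chunking strategy are hypothetical; a real implementation would live in NumPy's C core):

```python
import numpy as np

def minmax(a):
    # Hypothetical combined min/max: both reductions visit each chunk
    # while it is hot in cache, so the array is streamed from main
    # memory once instead of twice (separate a.min() then a.max()).
    a = np.asarray(a).ravel()
    if a.size == 0:
        raise ValueError("minmax of an empty array is undefined")
    lo = hi = a.flat[0]
    for chunk in np.array_split(a, max(1, a.size // 65536)):
        lo = min(lo, chunk.min())
        hi = max(hi, chunk.max())
    return lo, hi

print(minmax([3, 1, 4, 1, 5, 9, 2, 6]))  # (1, 9)
```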
On Thu, Feb 16, 2012 at 5:56 PM, Warren Weckesser <warren.weckesser@enthought.com> wrote:
On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant <travis@continuum.io> wrote:
<snip>
The above list looks great. Another request that comes up occasionally on the mailing list is for the efficient computation of order statistics, the simplest case being a combined min/max function. Longish thread starts here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/
The list looks great, but given the time table I expect at least a 1.9 and a 1.10 will be necessary to improve what "we didn't get quite right in the first place", or what not many users had time to try out.

Josef
On Thu, Feb 16, 2012 at 4:20 PM, <josef.pktd@gmail.com> wrote:
<snip>
The list looks great, but given the time table I expect at least a 1.9 and a 1.10 will be necessary to improve what "we didn't get quite right in the first place", or what not many users had time to try out.
That's my sense also. I think the long list needs to be prioritized and broken up into smaller chunks.

<snip>

Chuck
On Fri, Feb 17, 2012 at 12:24 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
<snip>
The list looks great, but given the time table I expect at least a 1.9 and a 1.10 will be necessary to improve what "we didn't get quite right in the first place", or what not many users had time to try out.
That's my sense also. I think the long list needs to be prioritized and broken up into smaller chunks.
+1 for an extra release (or two).

Looking at the list of features, which looks great by the way, I think the last release before adding a whole bunch of new features should be the LTS. Ideally 1.8 would be mostly the refactoring and the LTS, with 1.9 containing most of the new features. If not, 1.7 should probably be the LTS.

Ralf
On Fri, Feb 17, 2012 at 12:49 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
<snip>
+1 for an extra release (or two).
Looking at the list of features, which looks great by the way, I think the last release before adding a whole bunch of new features should be the LTS. Ideally 1.8 would be mostly the refactoring and the LTS, with 1.9 containing most of the new features. If not, 1.7 should probably be the LTS.
To be clear, the purpose behind an LTS release is to provide ongoing bugfixes for users to whom one of the following applies:

* Must use Python 2.4.
* Are on a platform whose C/C++ compiler will never be updated anymore.

This way, developing NumPy can be made easier by not having to keep compatibility with really old systems. Am I understanding this correctly, or am I missing some aspect of the LTS strategy?

Thanks,
Mark
On Fri, Feb 17, 2012 at 9:56 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
<snip>
To be clear, the purpose behind an LTS release is to provide ongoing bugfixes for users to whom one of the following applies:
* Must use Python 2.4.
* Are on a platform whose C/C++ compiler will never be updated anymore.
Those both apply.
This way, developing NumPy can be made easier by not having to keep compatibility with really old systems. Am I understanding this correctly, or am I missing some aspect of the LTS strategy?
The main reason is to allow starting to clean up the code, as Chuck said in his initial message: http://comments.gmane.org/gmane.comp.python.numeric.general/47765. So this would include old macros, maybe things like numarray support.
Ralf
On Thursday, February 16, 2012, Warren Weckesser wrote:
<snip>
The above list looks great. Another request that comes up occasionally on the mailing list is for the efficient computation of order statistics, the simplest case being a combined min/max function. Longish thread starts here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/
Warren
+1 on this. Also, before I forget: it looks like as of MATLAB 2011 they also have a "minmax" function, but in the Neural Network Toolbox. What it does is so constrained and different that, at the very least, a note about it should go into the "numpy for matlab users" webpage.

Ben Root
On Fri, Feb 17, 2012 at 9:09 AM, Travis Oliphant <travis@continuum.io> wrote:
* incorporating a meta-object
* a few new dtypes (variable-length string, variable-length unicode and an enum type)
* simple computed fields for dtypes
From the sound of that, I'm certainly looking forward to seeing some details. For example: do you mean Pascal (length, content) style strings, AKA struct code 'p'? Read-only dtype fields computed via a callback function?
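For readers who haven't met the 'p' code: a quick demonstration with Python's struct module, purely to illustrate what a Pascal-style (length-prefixed) string is; how a NumPy dtype version would look is exactly the open question above:

```python
import struct

# '10p' is a 10-byte Pascal-string field: 1 length byte followed by
# up to 9 content bytes, padded with zeros
packed = struct.pack('10p', b'hello')
print(repr(packed))                  # b'\x05hello\x00\x00\x00\x00'
print(struct.unpack('10p', packed))  # (b'hello',)
```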
* accepting a Data-Type specification as a class or JSON file
On that subject, I incidentally have implemented a pair of functions (freeze()/thaw()) that make de/serialization to JSON or YAML fairly simple. (Currently they leave fundamental dtypes as is; basically the only thing necessary to render the result serializable to/from JSON is representing fundamental dtypes as JSON-safe objects ... a string would probably do.)

http://paste.pocoo.org/show/552311/ (Modified slightly from code in my project here: https://gitorious.org/bits/bits/blobs/master/dtype.py)

I've tried and failed to find a bug report for dtype serialization. Should I create a new ticket for JSON deserialization? (Serialization wouldn't hurt either, since that would let us store both an array's data/shape/etc and its dtype in the same JSON document.)
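As a rough illustration of what such a round trip involves (a minimal sketch, not the freeze()/thaw() code linked above; it handles only flat structured dtypes, and nested or sub-array dtypes would need recursion):

```python
import json
import numpy as np

def dtype_to_json(dt):
    # descr gives [(name, format), ...] for structured dtypes;
    # simple dtypes round-trip via their format string
    return json.dumps(dt.descr if dt.names else dt.str)

def dtype_from_json(text):
    spec = json.loads(text)
    if isinstance(spec, str):
        return np.dtype(spec)
    # JSON has no tuples, so restore the (name, format) pairs
    return np.dtype([tuple(field) for field in spec])

dt = np.dtype([('x', '<f8'), ('label', 'S16')])
assert dtype_from_json(dtype_to_json(dt)) == dt
```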
Hi,

On 16.02.2012 23:39, Travis Oliphant wrote: [clip]
That looks like a pretty great heap of work -- it's great that you're going to tackle it! Here's one additional wishlist point:

- Add necessary hooks to the ufunc machinery, dot products, etc., so that the behavior of sparse matrices can be made nice. Sparse matrices are pretty ubiquitous in many fields, but right now it seems that there are dark corners in the interplay between dense and sparse.

This is a bit of a sticky API design problem though: what should be done to make the ufunc machinery "subclassable"? Addressing this could also resolve problems coming up with the `matrix` ndarray subclass.

Pauli
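To make the "dark corner" concrete (my sketch, not part of Pauli's message): the main subclass hook available in the NumPy of this era, `__array_wrap__`, only fires after the ufunc has already produced a dense result, so a sparse-aware subclass never gets a chance to substitute its own computation:

```python
import numpy as np

class SparseAware(np.ndarray):
    # Toy subclass: __array_wrap__ runs only *after* the ufunc loop
    # has produced a dense result, so there is no opportunity here
    # to redirect the computation to a sparse algorithm.
    def __array_wrap__(self, out_arr, context=None):
        if context is not None:
            ufunc = context[0]
            print("post-hoc wrap of", ufunc.__name__)
        return np.ndarray.__array_wrap__(self, out_arr, context)

a = np.arange(4.0).view(SparseAware)
np.add(a, 1.0)  # the dense loop runs first, then the hook fires
```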
Hi Travis,

On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant <travis@continuum.io> wrote:
<snip>
* re-factoring of code so that it can compile with a C++ compiler and be minimally dependent on Python data-structures
This is a pretty exciting list of features. What is the rationale for code being compiled as C++ ? IMO, it will be difficult to do so without preventing useful C constructs, and without removing some of the existing features (like our use of C99 complex). The subset that is both C and C++ compatible is quite constraining. cheers, David
On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau <cournape@gmail.com> wrote:
<snip>
This is a pretty exciting list of features. What is the rationale for code being compiled as C++ ? IMO, it will be difficult to do so without preventing useful C constructs, and without removing some of the existing features (like our use of C99 complex). The subset that is both C and C++ compatible is quite constraining.
I'm in favor of this myself. C++ would allow a lot of code cleanup and make it easier to provide an extensible base; I think it would be a natural fit with numpy. Of course, some C++ projects become tangled messes of inheritance, but I'd be very interested in seeing what a good C++ designer like Mark, intimately familiar with the numpy code base, could do. This opportunity might not come by again anytime soon and I think we should grab onto it. The initial step would be a release whose code would compile as both C and C++, which mostly comes down to removing C++ keywords like 'new'.

I did suggest running it by you for build issues, so please raise any you can think of. Note that MatPlotLib is in C++, so I don't think the problems are insurmountable. And choosing a set of compilers to support is something that will need to be done.

Chuck
On Fri, Feb 17, 2012 at 3:39 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
<snip>
This is a pretty exciting list of features. What is the rationale for code being compiled as C++ ? IMO, it will be difficult to do so without preventing useful C constructs, and without removing some of the existing features (like our use of C99 complex). The subset that is both C and C++ compatible is quite constraining.
I'm in favor of this myself. C++ would allow a lot of code cleanup and make it easier to provide an extensible base; I think it would be a natural fit with numpy. Of course, some C++ projects become tangled messes of inheritance, but I'd be very interested in seeing what a good C++ designer like Mark, intimately familiar with the numpy code base, could do. This opportunity might not come by again anytime soon and I think we should grab onto it. The initial step would be a release whose code would compile as both C and C++, which mostly comes down to removing C++ keywords like 'new'.
C++ will make integration with external environments much harder (calling a C++ library from a non-C++ program is very hard, especially for cross-platform projects), and I am not convinced by the extensibility argument. Making the numpy C code buildable by a C++ compiler is harder than removing keywords.
I did suggest running it by you for build issues, so please raise any you can think of. Note that MatPlotLib is in C++, so I don't think the problems are insurmountable. And choosing a set of compilers to support is something that will need to be done.
I don't know for matplotlib, but for scipy, quite a few issues were caused by our C++ extensions in scipy.sparse. But build issues are not a strong argument against C++ - I am sure those could be worked out.

regards,
David
On 2/17/12 10:27 AM, David Cournapeau wrote:

Making the numpy C code buildable by a C++ compiler is harder than removing keywords.

Just as a data point, I took the cpp branch Mark started and got numpy built and running with multiarray compiled using C++ (OSX llvm-g++ 4.2). All I really did was rename reserved keywords and add extern "C" where necessary. Although, AFAIK C99 complex support is included as an extension, so I believe you are correct that there would be more work there to get that working under more platforms.

Bryan Van de Ven
Hi Bryan,

On Fri, Feb 17, 2012 at 6:02 PM, Bryan Van de Ven <bryanv@continuum.io> wrote:
Just as a data point, I took the cpp branch Mark started and got numpy built and running with multiarray compiled using C++ (OSX llvm-g++ 4.2).
That sounds promising. So far llvm-gcc has proved to be painful. Are you by any chance using scipy too? So far no one has managed to build the numpy/scipy combo with the LLVM-based compilers, so if you were willing to have a go at fixing that it would be hugely appreciated. See http://projects.scipy.org/scipy/ticket/1500 for details. Once that's fixed, numpy can switch to using it for releases.

Ralf
On 17.02.2012, at 21:46, Ralf Gommers wrote:
[...] So far no one has managed to build the numpy/scipy combo with the LLVM-based compilers, so if you were willing to have a go at fixing that it would be hugely appreciated. See http://projects.scipy.org/scipy/ticket/1500 for details.
Once that's fixed, numpy can switch to using it for releases.
Well, I had great success with using clang and clang++ (which uses llvm) to compile both numpy and scipy on OS X 10.7.3.

Samuel
On Fri, Feb 17, 2012 at 10:27 AM, David Cournapeau <cournape@gmail.com> wrote:
<snip>
C++ will make integration with external environments much harder (calling a C++ library from a non-C++ program is very hard, especially for cross-platform projects), and I am not convinced by the extensibility argument.
The whole of NumPy could be written utilizing C++ extensively while still using exactly the same API and ABI numpy has now. C++ does not force anything about API/ABI design decisions. One good document to read about how a major open source project transitioned from C to C++ is about gcc. Their points comparing C and C++ apply to numpy quite well, and being compiler authors, they're intimately familiar with ABI and performance issues: http://gcc.gnu.org/wiki/gcc-in-cxx#The_gcc-in-cxx_branch

Making the numpy C code buildable by a C++ compiler is harder than removing keywords.

Certainly, but it's not a difficult task for someone who's familiar with both C and C++.
I did suggest running it by you for build issues, so please raise any you can think of. Note that MatPlotLib is in C++, so I don't think the problems are insurmountable. And choosing a set of compilers to support is something that will need to be done.
I don't know for matplotlib, but for scipy, quite a few issues were caused by our C++ extensions in scipy.sparse. But build issues are not a strong argument against C++ - I am sure those could be worked out.
On this topic, I'd like to ask what it would take to change the default warning levels in all the build configurations? Building with no warnings under high warning levels is a pretty standard practice as a basic mechanism for catching some classes of bugs, and it would be nice for numpy to do this. The only way this is reasonable, though, is if it's the default in the build system.

Thanks,
Mark
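By way of illustration, here is roughly what raising warning levels looks like today for a single extension done by hand (a numpy.distutils-style sketch; the extension name and gcc-flavored flags are hypothetical, not numpy's actual configuration; the point is that per-extension flags don't scale, which is why defaults belong in the build system):

```python
from numpy.distutils.core import setup, Extension

ext = Extension(
    'example',
    sources=['example.c'],
    # gcc-flavored warning flags; other compilers spell these
    # differently, which is why build-system defaults are needed
    extra_compile_args=['-Wall', '-Wextra', '-Werror'],
)

setup(name='example', version='0.1', ext_modules=[ext])
```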
On 17 Feb 2012 17:58, "Mark Wiebe" <mwwiebe@gmail.com> wrote:
<snip>
On this topic, I'd like to ask what it would take to change the default warning levels in all the build configurations? Building with no warnings under high warning levels is a pretty standard practice as a basic mechanism for catching some classes of bugs, and it would be nice for numpy to do this. The only way this is reasonable, though, is if it's the default in the build system.
Doing it for, say, just gcc is not that complicated. Generally, easy customization of compilation flags is one of the stated goals of bento :)

david
On 02/17/2012 05:39 AM, Charles R Harris wrote:
<snip>
This is a pretty exciting list of features. What is the rationale for code being compiled as C++? IMO, it will be difficult to do so without preventing useful C constructs, and without removing some of the existing features (like our use of C99 complex). The subset that is both C and C++ compatible is quite constraining.
I'm in favor of this myself. C++ would allow a lot of code cleanup and make it easier to provide an extensible base; I think it would be a natural fit with numpy. Of course, some C++ projects become tangled messes of inheritance, but I'd be very interested in seeing what a good C++ designer like Mark, intimately familiar with the numpy code base, could do. This opportunity might not come by again anytime soon, and I think we should grab onto it. The initial step would be a release whose code would compile as both C and C++, which mostly comes down to removing C++ keywords like 'new'.
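To make that initial step concrete, a hypothetical before/after (not actual numpy code): C that uses C++ keywords as identifiers compiles fine as C but is rejected by a C++ compiler, and needs only a mechanical rename.

```cpp
/* Before: valid C, but fails under a C++ compiler because 'new' is a
 * reserved keyword there.
 *
 *   int append_value(int *arr, int n, int new) {
 *       arr[n] = new;
 *       return n + 1;
 *   }
 */

/* After: a mechanical rename makes the same (hypothetical) function
 * compile as both C and C++; the real work is finding all such spots. */
int append_value(int *arr, int n, int new_value) {
    arr[n] = new_value;
    return n + 1;
}
```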
I did suggest running it by you for build issues, so please raise any you can think of. Note that MatPlotLib is in C++, so I don't think the problems are insurmountable. And choosing a set of compilers to support is something that will need to be done.
It's true that matplotlib relies heavily on C++, both via the Agg library and in its own extension code. Personally, I don't like this; I think it raises the barrier to contributing. C++ is an order of magnitude more complicated than C--harder to read, and much harder to write, unless one is a true expert. In mpl it brings reliance on the CXX library, which Mike D. has had to help maintain. And if it does increase compiler specificity, that's bad. I would much rather see development in the direction of sticking with C where direct low-level control and speed are needed, and using cython to gain higher level language benefits where appropriate. Of course, that brings in the danger of reliance on another complex tool, cython. If that danger is considered excessive, then just stick with C. Eric
On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing <efiring@hawaii.edu> wrote:
[...]
It's true that matplotlib relies heavily on C++, both via the Agg library and in its own extension code. Personally, I don't like this; I think it raises the barrier to contributing. C++ is an order of magnitude more complicated than C--harder to read, and much harder to write, unless one is a true expert. In mpl it brings reliance on the CXX library, which Mike D. has had to help maintain. And if it does increase compiler specificity, that's bad.
This gets to the recruitment issue, which is one of the most important problems I see numpy facing. I personally have contributed a lot of code to NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was the biggest negative point when I considered whether it was worth contributing to the project. I suspect there are many programmers out there who are skilled in low-level, high-performance C++, who would be willing to contribute, but don't want to code in C. I believe NumPy should be trying to find people who want to make high performance, close to the metal, libraries. This is a very different type of programmer than one who wants to program in Python, but is willing to dabble in a lower level language to make something run faster. High performance library development is one of the things the C++ developer community does very well, and that community is where we have a good chance of finding the programmers NumPy needs.
I would much rather see development in the direction of sticking with C where direct low-level control and speed are needed, and using cython to gain higher level language benefits where appropriate. Of course, that brings in the danger of reliance on another complex tool, cython. If that danger is considered excessive, then just stick with C.
There are many small benefits C++ can offer, even if numpy chooses only to use a tiny subset of the C++ language. For example, RAII can be used to reliably eliminate PyObject reference leaks. Consider a regression like this: http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html Fixing this in C would require switching all the relevant usages of NPY_MAXARGS to use a dynamic memory allocation. This brings with it the potential of easily introducing a memory leak, and is a lot of work to do. In C++, this functionality could be placed inside a class, where the deterministic construction/destruction semantics eliminate the risk of memory leaks and make the code easier to read at the same time. There are other examples like this where the C language has forced a suboptimal design choice because of how hard it would be to do it better. Cheers, Mark
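A rough sketch of the RAII idea Mark describes (my illustration, with hypothetical names, not proposed numpy code): a scoped holder releases its PyObject reference on every exit path, so early returns cannot leak.

```cpp
#include <Python.h>

// Scoped owner of a single PyObject reference (C++98 style). The
// destructor runs on every exit path, normal or early return.
class py_ref {
public:
    explicit py_ref(PyObject *obj) : obj_(obj) {} // takes ownership
    ~py_ref() { Py_XDECREF(obj_); }
    PyObject *get() const { return obj_; }
private:
    PyObject *obj_;
    py_ref(const py_ref &);            // non-copyable: exactly one owner
    py_ref &operator=(const py_ref &);
};

// Hypothetical usage: whichever return is taken, the reference from
// PyNumber_Add is decremented exactly once, with no manual Py_DECREF
// bookkeeping on each error path.
static int add_and_check(PyObject *a, PyObject *b) {
    py_ref sum(PyNumber_Add(a, b)); // new reference, or NULL on error
    if (sum.get() == NULL)
        return -1;                  // error already set; nothing leaks
    return PyObject_IsTrue(sum.get());
}
```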
Mark Wiebe wrote:
[...]
I think numpy really wants to use C++ templates to generate specific instantiations of algorithms for each dtype from a generic version, rather than the current code, which uses the C preprocessor (cpp).
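As an illustration of Neal's point (a sketch under my own assumptions, not proposed numpy code), one generic strided kernel can replace the per-dtype repetition, with the compiler producing each instantiation:

```cpp
#include <cstddef>

// One generic strided addition kernel; the compiler stamps out one
// specialized copy per element type, where the current C sources repeat
// the loop once per dtype via preprocessing.
template <typename T>
void add_strided(char *dst, const char *src1, const char *src2,
                 std::ptrdiff_t dst_stride, std::ptrdiff_t s1_stride,
                 std::ptrdiff_t s2_stride, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        *reinterpret_cast<T *>(dst) = *reinterpret_cast<const T *>(src1)
                                    + *reinterpret_cast<const T *>(src2);
        dst += dst_stride;
        src1 += s1_stride;
        src2 += s2_stride;
    }
}

// Explicit instantiations, one per supported dtype:
template void add_strided<float>(char *, const char *, const char *,
                                 std::ptrdiff_t, std::ptrdiff_t,
                                 std::ptrdiff_t, std::size_t);
template void add_strided<double>(char *, const char *, const char *,
                                  std::ptrdiff_t, std::ptrdiff_t,
                                  std::ptrdiff_t, std::size_t);
```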
On Fri, Feb 17, 2012 at 11:37 AM, Neal Becker <ndbecker2@gmail.com> wrote:
Mark Wiebe wrote: [...]
I think numpy really wants to use C++ templates to generate specific instantiations of algorithms for each dtype from a generic version, rather than the current code, which uses the C preprocessor (cpp).
One of many places. Exception handling, smart pointers, and iterators are the first things that come to my mind. Note that smart pointers also provide a nice way to do some high performance stuff, like transparent pointer swapping with memory deallocation. Chuck
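A small sketch of the pointer-swapping point (my example, not Chuck's code): two scoped buffers exchange their underlying pointers in O(1), and each allocation is still freed exactly once when its holder goes out of scope.

```cpp
#include <algorithm>
#include <cstddef>

// Minimal scoped buffer (C++98 style): owns an allocation and frees it
// deterministically in the destructor.
class buffer {
public:
    explicit buffer(std::size_t n) : data_(new double[n]), n_(n) {}
    ~buffer() { delete[] data_; }
    void swap(buffer &other) {           // O(1): swaps pointers, not data
        std::swap(data_, other.data_);
        std::swap(n_, other.n_);
    }
    std::size_t size() const { return n_; }
private:
    double *data_;
    std::size_t n_;
    buffer(const buffer &);              // non-copyable
    buffer &operator=(const buffer &);
};

int main() {
    buffer a(1000000), b(10);
    a.swap(b);  // contents exchanged with no allocation or copying
    return 0;   // both allocations freed here, exactly once each
}
```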
On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
[...]
In a similar vein, could incorporating C++ lead to a simpler low-level API for numpy? I know Mark has talked before about--in the long-term, as a dream project to scratch his own itch, and something the BDF12 doesn't necessarily agree with--implementing the great ideas in numpy as a layered C++ library. (Which would have the added benefit of making numpy more of a general array library that could be exposed to any language which can call C++ libraries.) I don't imagine that's on the table for anything near-term, but I wonder if making more of the low-level stuff C++ would make it easier for performance nuts to write their own code in C/C++ interfacing with numpy, and then expose it to python. After playing around with ufuncs at the C level for a little while last summer, I quickly realized any simplifications would be greatly appreciated. -Chris
On Fri, Feb 17, 2012 at 1:00 PM, Christopher Jordan-Squire <cjordan1@uw.edu> wrote:
[...]
In a similar vein, could incorporating C++ lead to a simpler low-level API for numpy? I know Mark has talked before about--in the long-term, as a dream project to scratch his own itch, and something the BDF12 doesn't necessarily agree with--implementing the great ideas in numpy as a layered C++ library. (Which would have the added benefit of making numpy more of a general array library that could be exposed to any language which can call C++ libraries.)
I don't imagine that's on the table for anything near-term, but I wonder if making more of the low-level stuff C++ would make it easier for performance nuts to write their own code in C/C++ interfacing with numpy, and then expose it to python. After playing around with ufuncs at the C level for a little while last summer, I quickly realized any simplifications would be greatly appreciated.
-Chris
I am also in favor of moving towards a C++ oriented library. Personally, I find C++ easier to read and understand, most likely because I learned it first. I only learned C in the context of learning C++.
Just a thought, with the upcoming revisions to the C++ standard, this does open up the possibility of some nice templating features that would make the library easier to use in native C++ programs. On a side note, does anybody use std::valarray?
Cheers! Ben Root
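For reference, std::valarray provides whole-array elementwise arithmetic much like numpy's; a minimal example (my sketch):

```cpp
#include <valarray>
#include <cstdio>

int main() {
    std::valarray<double> x(3);
    x[0] = 1.0; x[1] = 2.0; x[2] = 3.0;

    // Elementwise, numpy-style: y[i] = 2*x[i] + 1
    std::valarray<double> y = 2.0 * x + 1.0;

    std::printf("%g %g %g\n", y[0], y[1], y[2]);
    return 0;
}
```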
On Fri, Feb 17, 2012 at 12:09 PM, Benjamin Root <ben.root@ou.edu> wrote:
[...]
Just a thought, with the upcoming revisions to the C++ standard, this does open up the possibility of some nice templating features that would make the library easier to use in native C++ programs. On a side note, does anybody use std::valarray?
My impression is that std::valarray didn't really solve the problems it was intended to solve. IIRC, the valarray author himself said as much, but I don't recall where. Chuck
Charles R Harris wrote:
[...]
My impression is that std::valarray didn't really solve the problems it was intended to solve. IIRC, the valarray author himself said as much, but I don't recall where.
Chuck
A related question is whether a numpy core in C++ would be based on any existing C++ libs for HPC. There are quite a few efforts for 1 and 2 dimensions, fewer for arbitrary (or arbitrary up to some reasonable limit) dimension. Or would we be talking about purely custom C++ code for numpy? I suspect the latter. Although there are many promising C++ matrix/vector libraries (too many), I suspect it would be too difficult to preserve all numpy semantics via this route.
On Fri, Feb 17, 2012 at 11:00 AM, Christopher Jordan-Squire <cjordan1@uw.edu> wrote:
On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing <efiring@hawaii.edu> wrote:
On 02/17/2012 05:39 AM, Charles R Harris wrote:
On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau <cournape@gmail.com <mailto:cournape@gmail.com>> wrote:
Hi Travis,
On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant <travis@continuum.io <mailto:travis@continuum.io>> wrote: > Mark Wiebe and I have been discussing off and on (as well as talking with Charles) a good way forward to balance two competing desires: > > * addition of new features that are needed in NumPy > * improving the code-base generally and moving towards a more maintainable NumPy > > I know there are load voices for just focusing on the second of these and avoiding the first until we have finished that. I recognize the need to improve the code base, but I will also be pushing for improvements to the feature-set and user experience in the process. > > As a result, I am proposing a rough outline for releases over
next year: > > * NumPy 1.7 to come out as soon as the serious bugs can
be
eliminated. Bryan, Francesc, Mark, and I are able to help triage some of those. > > * NumPy 1.8 to come out in July which will have as many ABI-compatible feature enhancements as we can add while improving test coverage and code cleanup. I will post to this list more details of what we plan to address with it later. Included for possible inclusion are: > * resolving the NA/missing-data issues > * finishing group-by > * incorporating the start of label arrays > * incorporating a meta-object > * a few new dtypes (variable-length string, varialbe-length unicode and an enum type) > * adding ufunc support for flexible dtypes and possibly structured arrays > * allowing generalized ufuncs to work on more kinds of arrays besides just contiguous > * improving the ability for NumPy to receive
JIT-generated
function pointers for ufuncs and other calculation opportunities > * adding "filters" to Input and Output > * simple computed fields for dtypes > * accepting a Data-Type specification as a class or JSON file > * work towards improving the dtype-addition mechanism > * re-factoring of code so that it can compile with a C++ compiler and be minimally dependent on Python data-structures.
This is a pretty exciting list of features. What is the rationale for code being compiled as C++ ? IMO, it will be difficult to do so without preventing useful C constructs, and without removing some
of
the existing features (like our use of C99 complex). The subset
is both C and C++ compatible is quite constraining.
I'm in favor of this myself, C++ would allow a lot code cleanup and
make
it easier to provide an extensible base, I think it would be a natural fit with numpy. Of course, some C++ projects become tangled messes of inheritance, but I'd be very interested in seeing what a good C++ designer like Mark, intimately familiar with the numpy code base, could do. This opportunity might not come by again anytime soon and I think we should grab onto it. The initial step would be a release whose code
would compile in both C/C++, which mostly comes down to removing C++ keywords like 'new'.
I did suggest running it by you for build issues, so please raise any you can think of. Note that MatPlotLib is in C++, so I don't think the problems are insurmountable. And choosing a set of compilers to support is something that will need to be done.
It's true that matplotlib relies heavily on C++, both via the Agg library and in its own extension code. Personally, I don't like this; I think it raises the barrier to contributing. C++ is an order of magnitude more complicated than C--harder to read, and much harder to write, unless one is a true expert. In mpl it brings reliance on the CXX library, which Mike D. has had to help maintain. And if it does increase compiler specificity, that's bad.
This gets to the recruitment issue, which is one of the most important problems I see numpy facing. I personally have contributed a lot of code to NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was the biggest negative point when I considered whether it was worth contributing to the project. I suspect there are many programmers out there who are skilled in low-level, high-performance C++, who would be willing to contribute, but don't want to code in C.
I believe NumPy should be trying to find people who want to make high performance, close to the metal, libraries. This is a very different type of programmer than one who wants to program in Python, but is willing to dabble in a lower level language to make something run faster. High performance library development is one of the things the C++ developer community does very well, and that community is where we have a good chance of finding the programmers NumPy needs.
I would much rather see development in the direction of sticking with C where direct low-level control and speed are needed, and using cython to gain higher level language benefits where appropriate. Of course, that brings in the danger of reliance on another complex tool, cython. If that danger is considered excessive, then just stick with C.
There are many small benefits C++ can offer, even if numpy chooses only to use a tiny subset of the C++ language. For example, RAII can be used to reliably eliminate PyObject reference leaks.
Consider a regression like this: http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html
Fixing this in C would require switching all the relevant usages of NPY_MAXARGS to use a dynamic memory allocation. This brings with it the potential of easily introducing a memory leak, and is a lot of work to do. In C++, this functionality could be placed inside a class, where the deterministic construction/destruction semantics eliminate the risk of memory leaks and make the code easier to read at the same time. There are other examples like this where the C language has forced a suboptimal design choice because of how hard it would be to do it better.
Cheers, Mark
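To make the RAII argument concrete, here is a minimal sketch; the `py_ref` class and `process_operands` are invented for illustration and are not actual NumPy code. The point is that a growable, self-cleaning container can replace both the fixed `PyObject *refs[NPY_MAXARGS]` array and the manual Py_DECREF bookkeeping on every early-return path:

```cpp
#include <Python.h>
#include <vector>

// Hypothetical RAII holder for one owned PyObject reference.
class py_ref {
    PyObject *p_;
public:
    explicit py_ref(PyObject *p = nullptr) : p_(p) {}
    py_ref(py_ref &&o) noexcept : p_(o.p_) { o.p_ = nullptr; }
    py_ref(const py_ref &) = delete;             // no accidental double-frees
    py_ref &operator=(const py_ref &) = delete;
    ~py_ref() { Py_XDECREF(p_); }                // runs on every exit path
    PyObject *get() const { return p_; }
};

// Sketch of operand setup: a growable buffer instead of a fixed
// NPY_MAXARGS-sized array, with no hand-written cleanup code.
static int process_operands(PyObject **args, int n)
{
    std::vector<py_ref> owned;
    owned.reserve(n);
    for (int i = 0; i < n; ++i) {
        PyObject *seq = PySequence_Fast(args[i], "expected a sequence");
        if (seq == NULL)
            return -1;   // every reference acquired so far is DECREF'd here
        owned.emplace_back(seq);
    }
    /* ... work with owned[i].get() ... */
    return 0;            // same automatic cleanup on the success path
}
```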
In a similar vein, could incorporating C++ lead to a simpler low-level API for numpy?
This could definitely happen. One way to do it is to have a stable C API which remains fixed over many releases, and a C++ library which is allowed to change significantly at each release. This is what the LLVM project does, for example. OpenCV is an example of another project which was previously just C, but now has an extensive C++ API.
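A minimal sketch of that layering, with invented names (`npy_array_create` and friends are not a proposed NumPy API): the C++ class is free to change at every release, while C extensions only ever see an opaque handle and a fixed set of exported symbols.

```cpp
#include <cstddef>
#include <vector>

// Internal C++ layer: free to change at every release.
namespace core {
class Array {
    std::vector<double> data_;
public:
    explicit Array(std::size_t n) : data_(n, 0.0) {}
    std::size_t size() const { return data_.size(); }
};
}

// Stable C API: the only thing extension authors compile against.
extern "C" {

typedef struct npy_array_handle npy_array_handle;   // opaque to C callers

npy_array_handle *npy_array_create(std::size_t n)
{
    try {
        return reinterpret_cast<npy_array_handle *>(new core::Array(n));
    } catch (...) {
        return NULL;     // no C++ exception ever crosses the C boundary
    }
}

void npy_array_destroy(npy_array_handle *h)
{
    delete reinterpret_cast<core::Array *>(h);
}

std::size_t npy_array_size(const npy_array_handle *h)
{
    return reinterpret_cast<const core::Array *>(h)->size();
}

} // extern "C"
```

The catch-all in the constructor path already hints at the exception question raised later in the thread: under this design, nothing thrown in the C++ core is ever allowed to unwind past the extern "C" boundary.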
I know Mark has talked before about--in the long-term, as a dream project to scratch his own itch, and something the BDF12 doesn't necessarily agree with--implementing the great ideas in numpy as a layered C++ library. (Which would have the added benefit of making numpy more of a general array library that could be exposed to any language which can call C++ libraries.)
I don't imagine that's on the table for anything near-term, but I wonder if making more of the low-level stuff C++ would make it easier for performance nuts to write their own code in C/C++ interfacing with numpy, and then expose it to python. After playing around with ufuncs at the C level for a little while last summer, I quickly realized any simplifications would be greatly appreciated.
This is all possible, yes. The way this typically works is that library authors use advanced C++ techniques to get generality, performance, and usability. The library user can then write code which is very simple and written in a way which makes simple errors very difficult to make compared to using a C-like API. -Mark
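As a toy illustration of that contrast (hypothetical code, not an actual NumPy interface): with a C-like API the caller juggles void pointers and itemsizes by hand, and a mistake compiles cleanly and corrupts memory at run time; with a template, the element type is part of the signature and the compiler generates the specialized loop.

```cpp
#include <cstddef>

// C-like style: the caller must get the cast, itemsize, and length right.
typedef void (*unary_loop)(void *out, const void *in,
                           std::size_t n, std::size_t itemsize);

// C++ style: a type mismatch is a compile-time error, and each
// instantiation is specialized by the compiler.
template <typename T, typename F>
void apply_unary(T *out, const T *in, std::size_t n, F f)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = f(in[i]);
}

int main()
{
    double x[3] = {1.0, 2.0, 3.0}, y[3];
    apply_unary(y, x, 3, [](double v) { return v * v; });  // no casts, no tables
    return 0;
}
```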
-Chris
Eric
Chuck
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
![](https://secure.gravatar.com/avatar/3d3176cf99cae23d0ac119d1ea6c4d11.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 8:31 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
This is all possible, yes. The way this typically works is that library authors use advanced C++ techniques to get generality, performance, and usability. The library user can then write code which is very simple and written in a way which makes simple errors very difficult to make compared to using a C-like API.
While the longer compile times are going to annoy me, I don't have a strong opinion on using C++. One thing to keep in mind though is portability. Numpy is used on many platforms and with many compilers. Keeping things working on AIX or with a PathScale compiler for example will be a lot more difficult when using C++. Or will support for not-so-common platforms be reduced? Ralf
![](https://secure.gravatar.com/avatar/70f464a5b882a9042663f8ee29969104.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 3:38 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
While the longer compile times are going to annoy me, I don't have a strong opinion on using C++. One thing to keep in mind though is portability. Numpy is used on many platforms and with many compilers. Keeping things working on AIX or with a PathScale compiler for example will be a lot more difficult when using C++. Or will support for not-so-common platforms be reduced?
Ralf
Ralf makes a good point. During the early numpy development days I was eternally fighting with Solaris compilers. It's not really a big issue for us anymore since we have dropped Solaris support, but I'm +1 for keeping easy numpy distribution as something to consider. Chris
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
I don't think C++ has any significant advantage over C for high performance libraries. I am not convinced by the number-of-people argument either: it is not my experience that C++ is easier to maintain in an open source context, where the level of people is far from consistent. I doubt many people did not contribute to numpy because it is in C instead of C++. While this is somewhat subjective, there are reasons that C is much more common than C++ in that context. I would much rather move most parts to Cython to solve subtle ref counting issues, typically. The only way that I know of to have a stable and usable ABI is to wrap the C++ code in C. Wrapping C++ libraries in Python has always been a pain in my experience. How are templates or exceptions handled across languages? It will also be a significant issue on Windows with open source compilers. Interestingly, the API from clang exported to other languages is in C... David
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau <cournape@gmail.com> wrote:
I don't think C++ has any significant advantage over C for high performance libraries. I am not convinced by the number-of-people argument either: it is not my experience that C++ is easier to maintain in an open source context, where the level of people is far from consistent. I doubt many people did not contribute to numpy because it is in C instead of C++. While this is somewhat subjective, there are reasons that C is much more common than C++ in that context.
I think C++ offers much better tools than C for the sort of things in Numpy. The compiler will take care of lots of things that now have to be hand crafted and I wouldn't be surprised to see the code size shrink by a significant factor.
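One concrete instance of "the compiler takes care of it": numpy's build currently stamps out a nearly identical C inner loop per dtype through its .c.src code generator, and a single template could replace the whole family. A sketch (contiguous case only, for brevity; not actual NumPy code):

```cpp
#include <cstddef>

// One definition replaces the family of per-dtype loops that the .c.src
// preprocessor currently generates.
template <typename T>
void add_loop(char *out, const char *a, const char *b, std::size_t n)
{
    T *o = reinterpret_cast<T *>(out);
    const T *x = reinterpret_cast<const T *>(a);
    const T *y = reinterpret_cast<const T *>(b);
    for (std::size_t i = 0; i < n; ++i)
        o[i] = x[i] + y[i];
}

// The dtype dispatch table then just lists instantiations:
//   add_loop<float>, add_loop<double>, add_loop<npy_longlong>, ...
```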
I would much rather move most parts to Cython to solve subtle ref counting issues, typically.
Not me, I'd rather write most stuff in C/C++ than Cython; C is cleaner ;) Cython is good for the Python interface, but once past that barrier C is easier, and C++ has lots of useful things.
The only way that I know of to have a stable and usable ABI is to wrap the C++ code in C. Wrapping C++ libraries in Python has always been a pain in my experience. How are templates or exceptions handled across languages? It will also be a significant issue on Windows with open source compilers.
Interestingly, the API from clang exported to other languages is in C...
The API isn't the same as the implementation language. I wouldn't prejudge these issues, but some indication of how they would be solved might be helpful. <snip> Chuck
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Fri, Feb 17, 2012 at 4:58 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau <cournape@gmail.com> wrote:
I don't think C++ has any significant advantage over C for high performance libraries. I am not convinced by the number-of-people argument either: it is not my experience that C++ is easier to maintain in an open source context, where the level of people is far from consistent. I doubt many people did not contribute to numpy because it is in C instead of C++. While this is somewhat subjective, there are reasons that C is much more common than C++ in that context.
I think C++ offers much better tools than C for the sort of things in Numpy. The compiler will take care of lots of things that now have to be hand crafted and I wouldn't be surprised to see the code size shrink by a significant factor.
I would much rather move most parts to Cython to solve subtle ref counting issues, typically.
Not me, I'd rather write most stuff in C/C++ than Cython; C is cleaner ;) Cython is good for the Python interface, but once past that barrier C is easier, and C++ has lots of useful things.
Maybe a straw poll of the number of recent contributors to numpy who know:
C
C++
Cython
would help resolve this. I suspect using C++ would reduce the number of people who feel able to contribute, compared to:
Simplifying the C code
Rewriting in Cython
Unless there is some reason to think that neither of these approaches would work in the particular case of numpy? Best, Matthew
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 6:54 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Maybe a straw poll of the number of recent contributors to numpy who know:
C C++ Cython
would help resolve this.
I suspect using C++ would reduce the number of people who feel able to contribute, compared to:
Simplifying the C code Rewriting in Cython
Unless there is some reason to think that neither of these approaches would work in the particular case of numpy?
How about a different variation. How many people writing Python would happily give up the following:
1) lists
2) dictionaries
3) default types
4) classes
5) automatic deallocation of memory
Chuck
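For what it's worth, every item on Chuck's list has a standard-library counterpart in C++ that plain C lacks; a sketch, reading "default types" loosely as type deduction:

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Point { double x, y; };                    // 4) classes

int main()
{
    std::vector<int> lst = {1, 2, 3};             // 1) growable list
    std::map<std::string, int> d = {{"a", 1}};    // 2) dictionary
    auto scale = 2.5;                             // 3) deduced type
    auto p = std::make_shared<Point>();           // 5) deallocated automatically
    (void)lst; (void)d; (void)scale; (void)p;
    return 0;
}
```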
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Fri, Feb 17, 2012 at 6:04 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
How about a different variation. How many people writing Python would happily give up the following:
1) lists 2) dictionaries 3) default types 4) classes 5) automatic deallocation of memory
You gain some things and lose a lot of potential developers. Cython of course does give you access to classes and much of the automatic deallocation, and lists and dictionaries are fast when used from Cython, as they are in Python. @Dag, @David, @anyone - have you ever had time to look and see what could be done with Cython in the numpy core? See you, Matthew
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On 18 Feb 2012 00:58, "Charles R Harris" <charlesr.harris@gmail.com> wrote:
> On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau <cournape@gmail.com> wrote:
> > I don't think C++ has any significant advantage over C for high performance libraries. I am not convinced by the number-of-people argument either: it is not my experience that C++ is easier to maintain in an open source context, where the level of people is far from consistent. I doubt many people did not contribute to numpy because it is in C instead of C++. While this is somewhat subjective, there are reasons that C is much more common than C++ in that context.
> I think C++ offers much better tools than C for the sort of things in Numpy. The compiler will take care of lots of things that now have to be hand crafted and I wouldn't be surprised to see the code size shrink by a significant factor.
There are two arguments here: that the C code in numpy could be improved, and that C++ is the best way to do it. Nobody so far has argued against the first argument; I think there is a lot of room to improve things while staying in C.
You say that the compiler would take care of a lot of things: so far, the main thing that has been mentioned is RAII. While it is certainly a useful concept, I find it extremely difficult to use correctly in real applications. Things that are simple to do in small examples become really hard to deal with when features start to interact with each other (which is always the case in C++). Writing robust code that is exception safe with the STL requires a lot of knowledge. I don't have this knowledge. I have no doubt Mark has this knowledge. Does anyone else on this list?
> > I would much rather move most parts to Cython to solve subtle ref counting issues, typically.
> Not me, I'd rather write most stuff in C/C++ than Cython; C is cleaner ;) Cython is good for the Python interface, but once past that barrier C is easier, and C++ has lots of useful things.
> > The only way that I know of to have a stable and usable ABI is to wrap the C++ code in C. Wrapping C++ libraries in Python has always been a pain in my experience. How are templates or exceptions handled across languages? It will also be a significant issue on Windows with open source compilers.
> > Interestingly, the API from clang exported to other languages is in C...
> The API isn't the same as the implementation language. I wouldn't prejudge these issues, but some indication of how they would be solved might be helpful.
I understand that API and implementation language are not the same: you just quoted the part where I was mentioning it :) Assuming a C++ implementation with a C API, how will you deal with templates? How will you deal with exceptions? How will you deal with exceptions crossing dll/so boundaries between different compilers, which is a very common situation in our community?
David
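On the template half of David's question, one conventional answer, sketched here under the assumption of the C-API-over-C++-core design discussed earlier (all names are invented): templates are instantiated when numpy itself is compiled, so only concrete functions ever cross the ABI and no template appears in the exported interface.

```cpp
#include <cstddef>

// Internal template: exists only inside the numpy build.
template <typename T>
void square_loop(void *data, std::size_t n)
{
    T *p = static_cast<T *>(data);
    for (std::size_t i = 0; i < n; ++i)
        p[i] = p[i] * p[i];
}

// What foreign code sees: a plain function pointer, selected at run time
// by a dtype tag.
typedef void (*npy_unary_loop)(void *data, std::size_t n);

enum npy_demo_dtype { NPY_DEMO_FLOAT, NPY_DEMO_DOUBLE };

extern "C" npy_unary_loop npy_get_square_loop(int dtype)
{
    switch (dtype) {
        case NPY_DEMO_FLOAT:  return &square_loop<float>;
        case NPY_DEMO_DOUBLE: return &square_loop<double>;
        default:              return nullptr;
    }
}
```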
![](https://secure.gravatar.com/avatar/ad13088a623822caf74e635a68a55eae.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 9:29 PM, David Cournapeau <cournape@gmail.com> wrote:
I would much rather move most parts to Cython to solve subtle ref counting issues, typically.
Not me, I'd rather write most stuff in C/C++ than Cython; C is cleaner ;) Cython is good for the Python interface, but once past that barrier C is easier, and C++ has lots of useful things.
What happened with the IronPython implementation of numpy, which was being translated into Cython, as far as I understood? (Just a curious bystander; I have no idea about any of the low level stuff.) Josef
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 7:29 PM, David Cournapeau <cournape@gmail.com> wrote:
There are two arguments here: that the C code in numpy could be improved, and that C++ is the best way to do it. Nobody so far has argued against the first argument; I think there is a lot of room to improve things while staying in C.
You say that the compiler would take care of a lot of things: so far, the main thing that has been mentioned is RAII. While it is certainly a useful concept, I find it extremely difficult to use correctly in real applications. Things that are simple to do in small examples become really hard to deal with when features start to interact with each other (which is always the case in C++). Writing robust code that is exception safe with the STL requires a lot of knowledge. I don't have this knowledge. I have no doubt Mark has this knowledge. Does anyone else on this list?
I have the sense you haven't written much in C++. Exception handling is maybe one of the weakest aspects of C; that is, it basically doesn't have any. The point is, I'd rather not *have* to worry much about the C/C++ side of things, and I think once a solid foundation is in place I won't have to nearly as much. Back in the late 80's I used rather nice Fortran and C++ compilers for writing code to run in extended DOS (the DOS limit was 640 KB at that time). They were written in - wait for it - Pascal. The authors explained this seemingly odd decision by claiming that Pascal was better for bigger projects than C, and I agreed with them ;) Now you can point to Linux, which is 30 million+ lines of C, but that is rather exceptional and the barriers to entry at this point are pretty darn high. My own experience is that beginners can seldom write more than a page of C and get it right, mostly because of pointers. Now C++ has a ton of subtleties and one needs to decide up front what parts to use and what not, but once a well designed system is in place, many things become easier because a lot of housekeeping is done for you. My own concern here is that the project is bigger than Mark thinks and he might get sucked off into a sideline, but I'd sure like to see the experiment made.
I understand that API and implementation language are not the same: you just quoted the part where I was mentioning it :)
Assuming a C++ implementation with a C API, how will you deal with templates? How will you deal with exceptions? How will you deal with exceptions crossing dll/so boundaries between different compilers, which is a very common situation in our community?
None of these strike me as relevant; I mean, they are internals, not API problems, and shouldn't be visible to the user. How Mark would implement the C++ API, as opposed to the C API, I don't know, but since both would be there I don't see the problem. But really, we need more details on how these things would work. Chuck
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On 18 Feb 2012 03:53, "Charles R Harris" <charlesr.harris@gmail.com> wrote:
> None of these strike me as relevant; I mean, they are internals, not API problems, and shouldn't be visible to the user. How Mark would implement the C++ API, as opposed to the C API, I don't know, but since both would be there I don't see the problem. But really, we need more details on how these things would work.
I don't understand why you think this is not relevant? If numpy is in C++ with a C API, most users of the numpy C/C++ API will use the C API, at least at first, since most of them are in C. Changes or restrictions on how this API can be used are visible.
To be more concrete, if numpy is built by the MS compiler and an exception is thrown, you will have a lot of trouble with an extension built with gcc. I have also observed some weird things on Linux when mixing Intel and gcc. This will have significant impacts on how people will be able to use extensions.
I am a bit surprised by the claim that ABI and cross-language API are not an issue with C++: it is a widely shared issue even among C++ proponents.
David
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 9:18 PM, David Cournapeau <cournape@gmail.com> wrote:
I don't understand why you think this is not relevant ? If numpy is in c++, with a C API, most users of numpy C/C++ API will use the C API, at least at first, since most of them are in C. Changes of restrictions on how this API xan be used is visible.
To be more concrete, if numpy is built by MS compiler, and an exception is thrown, you will have a lots of trouble with an extension built with gcc.
Why would you even see an exception if it is caught before it escapes? I would expect the C API to behave just as it currently does. What am I missing?
I have also observed some weird things in linux when mixing intel and gcc. This will have significant impacts on how people will be able to use extensions.
I am a bit surprised by the claim.that abi and cross language API are not an issue with c++: it is a widely shared issue even within c++ proponents.
Chuck
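One conventional answer to the "how will you deal with templates?" question above, sketched here purely as an illustration (hypothetical code and names, not anything proposed for numpy): instantiate the C++ templates internally and expose only plain, monomorphic C entry points, so that no template ever crosses the library boundary.

```cpp
// sketch.cpp -- hypothetical example, not actual numpy code.
// A C++ template is instantiated inside the library; only plain C
// functions with fixed signatures cross the binary boundary.
#include <cstddef>

namespace impl {
// Generic kernel, visible only inside the library.
template <typename T>
void add_arrays(const T* a, const T* b, T* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
} // namespace impl

extern "C" {
// The C API sees only these monomorphic entry points.
void add_arrays_double(const double* a, const double* b,
                       double* out, std::size_t n) {
    impl::add_arrays(a, b, out, n);
}
void add_arrays_float(const float* a, const float* b,
                      float* out, std::size_t n) {
    impl::add_arrays(a, b, out, n);
}
}
```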
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
Why would you even see an exception if it is caught before it escapes? I would expect the C API to behave just as it currently does. What am I missing?
Structured exception handling in the OS. MSVC uses SEH for C++ exceptions. Memory allocation fails in gcc code. Instead of returning NULL, Windows jumps to the SEH handler set in the MSVC code... *poff* Sturla
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 9:47 PM, Sturla Molden <sturla@molden.no> wrote:
Why would you even see an exception if it is caught before it escapes? I
would expect the C API to behave just as it currently does. What am I missing?
Structured exception handling in the OS.
MSVC uses SEH for C++ exceptions.
Memory allocation fails in gcc code. Instead of returning NULL, Windows jumps to the SEH handler set in the MSVC code... *poff*
But won't a C++ wrapper catch that? Chuck
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 18 Feb 2012, at 05:56, Charles R Harris <charlesr.harris@gmail.com> wrote:
But won't a C++ wrapper catch that?
A try-catch block with MSVC will register an SEH with the operating system. GCC (g++) implements exceptions without SEH. What happens if GCC code tries to catch a std::bad_alloc? Windows intervenes and sends control to a registered SEH. So the flow of control jumps out of GCC's hands, and goes to some catch or __except block set by MSVC instead. And now the stack is FUBAR... But this can always happen when you mix MSVC and MinGW. Even pure C code can set an SEH with MSVC, so it's not a C++ issue. You cannot wrap in a way that protects you from an intervention by the operating system. It's better to stick with MS and Intel compilers on Windows. MinGW code must execute in an SEH free environment. Sturla
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 10:16 PM, Sturla Molden <sturla@molden.no> wrote:
On 18 Feb 2012, at 05:56, Charles R Harris <charlesr.harris@gmail.com> wrote:
But won't a C++ wrapper catch that?
A try-catch block with MSVC will register an SEH with the operating system. GCC (g++) implements exceptions without SEH. What happens if GCC code tries to catch a std::bad_alloc? Windows intervenes and sends control to a registered SEH. So the flow of control jumps out of GCC's hands, and goes to some catch or __except block set by MSVC instead. And now the stack is FUBAR... But this can always happen when you mix MSVC and MinGW. Even pure C code can set an SEH with MSVC, so it's not a C++ issue. You cannot wrap in a way that protects you from an intervention by the operating system. It's better to stick with MS and Intel compilers on Windows. MinGW code must execute in an SEH free environment.
Here's a link with some current comments on mingw-64: http://www.kineticsystem.org/?q=node/19. I have the impression that things are moving (slowly) towards interoperability. Chuck
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On 18 Feb 2012 04:37, "Charles R Harris" <charlesr.harris@gmail.com> wrote:
<snip>
Why would you even see an exception if it is caught before it escapes? I would expect the C API to behave just as it currently does. What am I missing?
I believe that you cannot always guarantee that no exception will go through even with a catch-all at the c++ -> c layer. I will try to find out more about it, as I cannot remember the exact details I have in mind (need to look at the customer's code). David
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 10:00 PM, David Cournapeau <cournape@gmail.com> wrote:
<snip>
I believe that you cannot always guarantee that no exception will go through even with a catch all at the c++ -> c layer. I will try to find more about it, as I cannot remember the exact details I have in mind (need to look at the customer's code).
Stackoverflow says you can catch all MSVC SEH exceptions: http://stackoverflow.com/questions/276102/catching-all-unhandled-c-exceptions Chuck
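The pattern that answer describes looks roughly like the sketch below (hypothetical code, not a proposal for numpy's actual API): every extern "C" entry point wraps its C++ body in a catch-all and converts failures into error codes. Per Sturla's SEH point above, this only guards against C++ exceptions under a single, well-behaved runtime; it cannot stop the operating system from dispatching to a foreign handler.

```cpp
// boundary.cpp -- hypothetical sketch of a "no C++ exception escapes" C API.
#include <new>        // std::bad_alloc
#include <stdexcept>  // std::invalid_argument

extern "C" int do_work(double* buf, int n) {
    try {
        if (n < 0) throw std::invalid_argument("n must be >= 0");
        double* tmp = new double[n];   // may throw std::bad_alloc
        for (int i = 0; i < n; ++i) buf[i] = tmp[i] = 0.0;
        delete[] tmp;
        return 0;                      // success
    } catch (const std::bad_alloc&) {
        return -1;                     // out of memory
    } catch (...) {
        return -2;                     // any other C++ exception
    }
}
```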
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 9:18 PM, David Cournapeau <cournape@gmail.com> wrote:
<snip>
I don't understand why you think this is not relevant? If numpy is in c++, with a C API, most users of the numpy C/C++ API will use the C API, at least at first, since most of them are in C. Changes of restrictions on how this API can be used are visible.
To be more concrete, if numpy is built by the MS compiler and an exception is thrown, you will have a lot of trouble with an extension built with gcc.
I have also observed some weird things on linux when mixing intel and gcc. This will have significant impacts on how people will be able to use extensions.
I am a bit surprised by the claim that ABI and cross-language API are not an issue with c++: it is a widely shared issue even among c++ proponents.
I found this: http://stackoverflow.com/questions/4978330/c-library-with-c-interface, which references 0mq (used by ipython) as an example of a C++ library with a C interface. It seems enums can have different sizes in C/C++, so that is something to watch. Chuck
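On the enum-size point, one hedged workaround (an illustration only, not the 0mq or numpy convention) is to keep enums out of the binary interface entirely and use fixed-width integers in the public C header:

```cpp
/* mylib_api.h -- hypothetical sketch: keep enums out of the binary interface. */
#ifndef MYLIB_API_H
#define MYLIB_API_H

#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Status codes as macros plus a fixed-width typedef, instead of an enum
   whose size the C and C++ compilers might not agree on across the ABI. */
#define MYLIB_OK         0
#define MYLIB_ERR_ALLOC  1

typedef int32_t mylib_status_t;

mylib_status_t mylib_init(void);

#ifdef __cplusplus
}
#endif

#endif /* MYLIB_API_H */
```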
![](https://secure.gravatar.com/avatar/764323a14e554c97ab74177e0bce51d4.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 04:54, Charles R Harris <charlesr.harris@gmail.com> wrote:
I found this, which references 0mq (used by ipython) as an example of a C++ library with a C interface. It seems enums can have different sizes in C/C++, so that is something to watch.
One of the ways they manage to do this is by scrupulously avoiding exceptions even in the internal, never-touches-C zone. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On 18 Feb 2012 11:25, "Robert Kern" <robert.kern@gmail.com> wrote:
On Sat, Feb 18, 2012 at 04:54, Charles R Harris <charlesr.harris@gmail.com> wrote:
I found this, which references 0mq (used by ipython) as an example of a C++ library with a C interface. It seems enums can have different sizes in C/C++, so that is something to watch.
One of the ways they manage to do this is by scrupulously avoiding exceptions even in the internal, never-touches-C zone.
I took a superficial look at the zeromq 2.x sources: it looks like they don't use much of the stl (beyond vector and some trivial usages of algorithm). I wonder if this is related? FWIW, I would be fine with using such a subset in numpy. David
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 18 Feb 2012, at 14:38, David Cournapeau <cournape@gmail.com> wrote:
I took a superficial look at zeromq 2.x sources: it looks like they don't use much of the stl (beyond vector and some trivial usages of algorithm). I wonder if this is linked ?
FWIW, I would be fine with using such a subset in numpy.
I think basing it on STL and perhaps Boost would be fine. The problem is not exposing C++ to C, it is mixing MSVC and MinGW. But that problem exists regardless of what we do. It's not a fair argument against C++. Sturla
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 18 Feb 2012, at 01:58, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau <cournape@gmail.com> wrote: I don't think c++ has any significant advantage over c for high performance libraries. I am not convinced by the number of people argument either: it is not my experience that c++ is easier to maintain in an open source context, where the level of people is far from consistent. I doubt many people did not contribute to numpy because it is in c instead of c++. While this is somewhat subjective, there are reasons that c is much more common than c++ in that context.
I think C++ offers much better tools than C for the sort of things in Numpy. The compiler will take care of lots of things that now have to be hand crafted and I wouldn't be surprised to see the code size shrink by a significant factor.
The C++11 standard is fantastic. There are automatic data types, closures, reference counting, weak references, an improved STL with datatypes that map almost 1:1 against any built-in Python type, a sane threading API, regex, etc. Even the PRNG is Mersenne Twister by standard. With C++11 it is finally possible to "write C++ (almost) like Python". On the downside, C++ takes a long time to learn, most C++ text books teach bad programming habits from the beginning to the end, and C++ becomes inherently dangerous if you write C++ like C. Many also abuse C++ as a bloatware generator. Templates can also be abused to write code that is impossible to debug. While it in theory could be better, C is a much smaller language. Personally I prefer C++ to C, but I am not convinced it will be better for NumPy. I agree about Cython. It is nice for writing a Python interface for C, but gets messy and unclean when used for anything else. It also has too much focus on adding all sorts of "new features" instead of correctness and stability. I don't trust it to generate bug-free code anymore. For wrapping C, Swig might be just as good. For C++, SIP, CXX or Boost.Python work well too. If crazy ideas are allowed, what about PyPy RPython? Or perhaps Go? Or even C# if a native compiler could be found? Sturla
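To make the "write C++ (almost) like Python" claim concrete, here is a small illustrative C++11 sketch (an assumption about style, not code from any project) using auto, a closure, and reference-counted ownership:

```cpp
// cpp11_demo.cpp -- illustrative C++11; compile with -std=c++0x / -std=c++11.
#include <algorithm>
#include <iostream>
#include <memory>
#include <vector>

int main() {
    auto xs = std::vector<double>{1.0, 2.0, 3.0};   // like a Python list
    double scale = 2.5;
    // Closure capturing 'scale', like a Python lambda.
    std::transform(xs.begin(), xs.end(), xs.begin(),
                   [scale](double x) { return scale * x; });
    // Reference-counted ownership, loosely like CPython's refcounting.
    auto shared = std::make_shared<std::vector<double>>(xs);
    for (auto x : *shared) std::cout << x << ' ';   // range-based for loop
    std::cout << '\n';
    return 0;
}
```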
![](https://secure.gravatar.com/avatar/51e47499500f6d24e238ae20fc712305.jpg?s=120&d=mm&r=g)
On 2/17/12 9:07 PM, Sturla Molden wrote:
<snip>
I agree about Cython. It is nice for writing a Python interface for C, but gets messy and unclean when used for anything else. It also has too much focus on adding all sorts of "new features" instead of correctness and stability. I don't trust it to generate bug-free code anymore.
For what it's worth, Cython supports C++ now. I'm sure there are people on this list that know much better than me the extent of this support, so I will let them chime in, but here are some docs on it: http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html If you have specific examples of new features trumping correctness and stability, I'm sure the Cython devel list would love to hear about it. They seem to be pretty concerned about stability and correctness to me, though I admit I don't follow the list extremely deeply. I don't trust any automated tool to generate bug-free code. I don't even trust myself to generate bug-free code :). Jason
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
For what it's worth, Cython supports C++ now. I'm sure there are people on this list that know much better than me the extent of this support, so I will let them chime in, but here are some docs on it:
http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html
Sure. They just keep adding features at the expense of stability. No focus or sense of direction. Focus on a small feature set, make it right, then don't add to it. That is the root of the successes of C, Python and Java. NumPy needs a stable compiler that doesn't make mistakes everywhere. You cannot trust that to Cython. Sturla
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 9:10 PM, Sturla Molden <sturla@molden.no> wrote:
For what it's worth, Cython supports C++ now. I'm sure there are people on this list that know much better than me the extent of this support, so I will let them chime in, but here are some docs on it:
http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html
Sure. They just keep adding features at the expense of stability. No focus or sense of direction. Focus on a small feature set, make it right, then don't add to it. That is the root of the successes of C, Python and Java. NumPy needs a stable compiler that doesn't make mistakes everywhere. You cannot trust that to Cython.
I'm staying out of this fight. Chuck
![](https://secure.gravatar.com/avatar/51e47499500f6d24e238ae20fc712305.jpg?s=120&d=mm&r=g)
On 2/17/12 10:10 PM, Sturla Molden wrote:
Sure. They just keep adding features at the expense of stability. No focus or sense of direction. Focus on a small feature set, make it right, then don't add to it. That is the root of the successes of C, Python and Java. NumPy needs a stable compiler that doesn't make mistakes everywhere. You cannot trust that to Cython.
Again, if you have specific examples of stability being sacrificed, I'm sure the Cython list would like to hear about it. Your statements, as-is, are raising huge FUD flags for me. Anyways, I've said enough on this, and we've seen enough problems in discussions on this list already. Many people in the numpy community know Cython well enough to judge these things for themselves. Thanks, Jason
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 18 Feb 2012, at 05:23, Jason Grout <jason-sage@creativetrax.com> wrote:
On 2/17/12 10:10 PM, Sturla Molden wrote:
Sure. They just keep adding features at the expense of stability. No focus or sense of direction. Focus on a small feature set, make it right, then don't add to it. That is the root of the successes of C, Python and Java. NumPy needs a stable compiler that doesn't make mistakes everywhere. You cannot trust that to Cython.
Again, if you have specific examples of stability being sacrificed, I'm sure the Cython list would like to hear about it. Your statements, as-is, are raising huge FUD flags for me.
Cython is still 0.16, it is still unfinished. We cannot base NumPy on an unfinished compiler. Sturla
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 8:07 PM, Sturla Molden <sturla@molden.no> wrote:
On 18 Feb 2012, at 01:58, Charles R Harris <charlesr.harris@gmail.com> wrote:
<snip>
The C++11 standard is fantastic. There are automatic data types, closures, reference counting, weak references, an improved STL with datatypes that map almost 1:1 against any built-in Python type, a sane threading API, regex, etc. Even the PRNG is Mersenne Twister by standard. With C++11 it is finally possible to "write C++ (almost) like Python". On the downside, C++ takes a long time to learn, most C++ text books
Are crap ;) Yeah, that is a downside.
<snip>
Chuck
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
Sturla Molden wrote:
On 18 Feb 2012, at 01:58, Charles R Harris <charlesr.harris@gmail.com> wrote:
<snip>
The C++11 standard is fantastic. There are automatic data types, closures, reference counting, weak references, an improved STL with datatypes that map almost 1:1 against any built-in Python type, a sane threading API, regex, etc. Even the PRNG is Mersenne Twister by standard. With C++11 it is finally possible to "write C++ (almost) like Python". On the downside, C++ takes a long time to learn, most C++ text books teach bad programming habits from the beginning to the end, and C++ becomes inherently dangerous if you write C++ like C. Many also abuse C++ as a bloatware generator. Templates can also be abused to write code that is impossible to debug. While it in theory could be better, C is a much smaller language. Personally I prefer C++ to C, but I am not convinced it will be better for NumPy.
I'm all for c++11, but if you are worried about portability, dude, you have a bit of a problem here.
I agree about Cython. It is nice for writing a Python interface for C, but gets messy and unclean when used for anything else. It also has too much focus on adding all sorts of "new features" instead of correctness and stability. I don't trust it to generate bug-free code anymore.
For wrapping C, Swig might be just as good. For C++, SIP, CXX or Boost.Pyton work well too.
If crazy ideas are allowed, what about PyPy RPython? Or perhaps Go? Or even C# if a native compiler could be found?
c# is a non-starter if you want to run on linux.
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On 17 Feb 2012 18:21, "Mark Wiebe" <mwwiebe@gmail.com> wrote:
On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing <efiring@hawaii.edu> wrote:
On 02/17/2012 05:39 AM, Charles R Harris wrote:
On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau <cournape@gmail.com <mailto:cournape@gmail.com>> wrote:
Hi Travis,
On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant <travis@continuum.io> wrote:
<snip>
This is a pretty exciting list of features. What is the rationale for code being compiled as C++? IMO, it will be difficult to do so without preventing useful C constructs, and without removing some of the existing features (like our use of C99 complex). The subset that is both C and C++ compatible is quite constraining.
I'm in favor of this myself, C++ would allow a lot of code cleanup and make it easier to provide an extensible base, I think it would be a natural fit with numpy. Of course, some C++ projects become tangled messes of inheritance, but I'd be very interested in seeing what a good C++ designer like Mark, intimately familiar with the numpy code base, could do. This opportunity might not come by again anytime soon and I think we should grab onto it. The initial step would be a release whose code would compile in both C/C++, which mostly comes down to removing C++ keywords like 'new'. I did suggest running it by you for build issues, so please raise any you can think of. Note that MatPlotLib is in C++, so I don't think the problems are insurmountable. And choosing a set of compilers to support is something that will need to be done.
It's true that matplotlib relies heavily on C++, both via the Agg library and in its own extension code. Personally, I don't like this; I think it raises the barrier to contributing. C++ is an order of magnitude more complicated than C--harder to read, and much harder to write, unless one is a true expert. In mpl it brings reliance on the CXX library, which Mike D. has had to help maintain. And if it does increase compiler specificity, that's bad.
This gets to the recruitment issue, which is one of the most important problems I see numpy facing. I personally have contributed a lot of code to NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was the biggest negative point when I considered whether it was worth contributing to the project. I suspect there are many programmers out there who are skilled in low-level, high-performance C++, who would be willing to contribute, but don't want to code in C.
This is a really important issue, because accessibility is the essential reason why I am so strongly against it. It trumps by far all my technical reservations. Maybe it is just a coincidence that you use this word, but "recruitment" is not what happens in an open community, and finding people who want to write close to the metal, high performance code is very different from making the codebase more accessible. I would argue that they are actually contradictory, but I would concede this is a slightly more subjective claim. To be used appropriately, c++ requires much more discipline than c. Doing this for a community-based project is very hard. Doing this with people who often are scientists first and programmers second is even harder. I have been contributing to numpy for quite a few years, and I have seen/been told many times that numpy's C code was hard to dive into, people did not know where to start, etc... I cannot remember a case where people said that C itself was the reason: other contributors can correct me if I am wrong, but I believe you are the first person who considered C vs C++ to be a fundamental reason. I have no reason to believe you would not be able to produce better code in c++. But I believe you are in a minority among the people I would like to see contributing to numpy. David
I believe NumPy should be trying to find people who want to make high
performance, close to the metal, libraries. This is a very different type of programmer than one who wants to program in Python, but is willing to dabble in a lower level language to make something run faster. High performance library development is one of the things the C++ developer community does very well, and that community is where we have a good chance of finding the programmers NumPy needs.
I would much rather see development in the direction of sticking with C where direct low-level control and speed are needed, and using cython to gain higher level language benefits where appropriate. Of course, that brings in the danger of reliance on another complex tool, cython. If that danger is considered excessive, then just stick with C.
There are many small benefits C++ can offer, even if numpy chooses only
to use a tiny subset of the C++ language. For example, RAII can be used to reliably eliminate PyObject reference leaks.
Consider a regression like this: http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html
Fixing this in C would require switching all the relevant usages of
NPY_MAXARGS to use a dynamic memory allocation. This brings with it the potential of easily introducing a memory leak, and is a lot of work to do. In C++, this functionality could be placed inside a class, where the deterministic construction/destruction semantics eliminate the risk of memory leaks and make the code easier to read at the same time. There are other examples like this where the C language has forced a suboptimal design choice because of how hard it would be to do it better.
Cheers, Mark
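A minimal sketch of the RAII idea Mark describes (the py_ref helper below is hypothetical, not proposed numpy code): a guard object owns one reference and releases it on every exit path, so early returns cannot leak.

```cpp
// ref_guard.cpp -- hypothetical sketch of RAII for PyObject references.
#include <Python.h>

class py_ref {
    PyObject* obj_;
public:
    explicit py_ref(PyObject* obj) : obj_(obj) {}   // takes ownership
    ~py_ref() { Py_XDECREF(obj_); }                 // runs on every exit path
    PyObject* get() const { return obj_; }
    py_ref(const py_ref&) = delete;                 // no accidental copies
    py_ref& operator=(const py_ref&) = delete;
};

// Every early return used to need a matching Py_DECREF; now none do.
int sum_is_positive(PyObject* a, PyObject* b) {
    py_ref total(PyNumber_Add(a, b));   // new reference, or NULL on error
    if (!total.get()) return -1;        // guard's Py_XDECREF(NULL) is a no-op
    int result = PyObject_IsTrue(total.get());
    return result;                       // guard releases 'total' here too
}
```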
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
To be used appropriately, c++ requires much more discipline than c. Doing this for a community-based project is very hard. Doing this with people who often are scientists first and programmers second is even harder.
This is very important. I am not sure it is doable. Bad C++ is far worse than bad C. We would have to invent strict coding style and idiom rules, and enforce them like a totalitarian government. That raises the question of whether something other than C or C++ should be used: D, C#, Fortran 2003, Go, RPython. Sturla
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 17 Feb 2012, at 18:52, Eric Firing <efiring@hawaii.edu> wrote:
It's true that matplotlib relies heavily on C++, both via the Agg library and in its own extension code. Personally, I don't like this; I think it raises the barrier to contributing. C++ is an order of magnitude more complicated than C--harder to read, and much harder to write, unless one is a true expert.
This is not true. C++ can be much easier, particularly for those who already know Python. The problem: C++ textbooks teach C++ as a superset of C. Writing C in C++ just adds the complexity of C++ on top of C, for no good reason. I can write FORTRAN in any language, but that does not mean it is a good idea. We would have to start by teaching people to write good C++. E.g., always use the STL as you would Python's built-in types where possible. Dynamic memory should be std::vector, not new or malloc. Pointers should be replaced with references. We would have to write a C++ programming tutorial that is based on Python knowledge instead of C knowledge. Sturla
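A small sketch of the style being advocated here (illustrative only, with hypothetical names): std::vector instead of a malloc/free pair, and a reference parameter instead of a raw out-pointer.

```cpp
// style_demo.cpp -- illustrative only: "Python-like" C++ style.
#include <numeric>
#include <vector>

// Dynamic memory via std::vector: freed automatically, no malloc/free pair.
// Result delivered through a reference, not a raw out-pointer.
void running_sum(const std::vector<double>& in, std::vector<double>& out) {
    out.resize(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
}
```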
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 18 Feb 2012, at 05:01, Jason Grout <jason-sage@creativetrax.com> wrote:
On 2/17/12 9:54 PM, Sturla Molden wrote:
We would have to write a C++ programming tutorial that is based on Python knowledge instead of C knowledge.
I personally would love such a thing. It's been a while since I did anything nontrivial on my own in C++.
One example: How do we code multiple return values?
In Python:
- Return a tuple.
In C:
- Use pointers (evilness)
In C++:
- Return a std::tuple, as you would in Python.
- Use references, as you would in Fortran or Pascal.
- Use pointers, as you would in C.
C++ textbooks always pick the last... I would show the first and the second method, and perhaps intentionally forget the last. Sturla
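The first two options in that list look roughly like this in code (a sketch only; std::tuple assumes C++11):

```cpp
// multiret.cpp -- sketch of the first two options above (C++11 for std::tuple).
#include <tuple>

// Option 1: return a tuple, as in Python: "return lo, hi".
std::tuple<double, double> minmax_t(double a, double b) {
    return a < b ? std::make_tuple(a, b) : std::make_tuple(b, a);
}

// Option 2: reference parameters, as in Fortran or Pascal.
void minmax_r(double a, double b, double& lo, double& hi) {
    if (a < b) { lo = a; hi = b; } else { lo = b; hi = a; }
}

int main() {
    double lo, hi;
    std::tie(lo, hi) = minmax_t(3.0, 1.0);  // unpack, like Python
    minmax_r(3.0, 1.0, lo, hi);
    return 0;
}
```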
![](https://secure.gravatar.com/avatar/8f93a1426788786f93677912ee4ec672.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden <sturla@molden.no> wrote:
<snip>
I can add my own 2 cents about cython vs. C vs. C++, based on summer coding experiences. I was an intern at Enthought, sharing an office with Mark W. (Which was a treat. I recommend you all quit your day jobs and haunt whatever office Mark is inhabiting.) I was trying to optimize some code and that led to experimenting with both cython and C. Dealing with the C internals of numpy was frustrating. Since C doesn't have templating, but numpy kinda needs it, python scripts instead go over the source and perform the templating manually. Not the most obvious thing. There were other issues in the background--including that C doesn't allow for abstraction (i.e. easy-to-read code), lots of pointer-fu is required, and the C API is lightly documented and already plenty difficult. On the flip side, cython looked pretty...but I didn't get the performance gains I wanted, and had to spend a lot of time figuring out if it was cython, needing to add types, buggy support for numpy, or actually the algorithm. The C files generated by cython were enormous and difficult to read. They really weren't meant for human consumption. As Sturla has said, regardless of the quality of the current product, it isn't stable. And even if it looks friendly there's magic going on under the hood. Magic means it's hard to diagnose and fix problems. At least one very smart person has told me they find cython most useful for wrapping C/C++ libraries and exposing them to python, which is a far cry from library writing. (Of course Wes McKinney, a cython evangelist, uses it all over his pandas library.) In comparison, there are a number of high quality, performant, open-source C++ based array libraries out there with very friendly APIs. Things like Eigen (http://eigen.tuxfamily.org/index.php?title=Main_Page) and Armadillo (http://arma.sourceforge.net/). They seem to have plenty of users and more devs than numpy. On the broader topic of recruitment...sure, cython has a lower barrier to entry than C++. But there are many, many more C++ developers and resources out there than cython resources. And it likely will stay that way for quite some time. -Chris
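For contrast with the script-generated C described above, here is a hedged sketch (not numpy's actual kernels) of how a single C++ template could replace a family of per-type C functions:

```cpp
// template_demo.cpp -- sketch: one template instead of N generated C copies.
#include <cstddef>

// In the generated-C scheme, a preprocessing script would stamp this body
// out once per dtype (float, double, int32, ...).
template <typename T>
T dot(const T* a, const T* b, std::size_t n) {
    T acc = T(0);
    for (std::size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}

// The compiler produces the per-type versions via explicit instantiation:
template float  dot<float>(const float*, const float*, std::size_t);
template double dot<double>(const double*, const double*, std::size_t);
```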
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire <cjordan1@uw.edu> wrote:
<snip>
On the flip side, cython looked pretty...but I didn't get the performance gains I wanted, and had to spend a lot of time figuring out if it was cython, needing to add types, buggy support for numpy, or actually the algorithm.
At the time, was the numpy support buggy? I personally haven't had many problems with Cython and numpy.
The C files generated by cython were enormous and difficult to read. They really weren't meant for human consumption.
Yes, it takes some practice to get used to what Cython will do, and how to optimize the output.
As Sturla has said, regardless of the quality of the current product, it isn't stable.
I've personally found it more or less rock solid. Could you say what you mean by "it isn't stable"? Best, Matthew
![](https://secure.gravatar.com/avatar/8f93a1426788786f93677912ee4ec672.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire <cjordan1@uw.edu> wrote:
<snip>
On the flip side, cython looked pretty...but I didn't get the performance gains I wanted, and had to spend a lot of time figuring out if it was cython, needing to add types, buggy support for numpy, or actually the algorithm.
At the time, was the numpy support buggy? I personally haven't had many problems with Cython and numpy.
It's not that the support WAS buggy, it's that it wasn't clear to me what was going on and where my performance bottleneck was, even after microbenchmarking with ipython (timeit and prun) and using the cython code visualization tool. Ultimately I don't think it was cython, so perhaps my comment was a bit unfair, but it was unfortunately difficult to verify that. Of course, as you say, diagnosing and resolving such issues would become easier with more cython experience.
The C files generated by cython were enormous and difficult to read. They really weren't meant for human consumption.
Yes, it takes some practice to get used to what Cython will do, and how to optimize the output.
As Sturla has said, regardless of the quality of the current product, it isn't stable.
I've personally found it more or less rock solid. Could you say what you mean by "it isn't stable"?
I just meant what Sturla said, nothing more: "Cython is still 0.16, it is still unfinished. We cannot base NumPy on an unfinished compiler." -Chris
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
I just meant what Sturla said, nothing more:
"Cython is still 0.16, it is still unfinished. We cannot base NumPy on an unfinished compiler."
Although Cython has special syntax for NumPy arrays, we are talking about the implementation of NumPy, not about using it. I would not consider Cython for this before, e.g., memoryviews have been stable for a long period. The subset of Cython we could safely use is no better than plain C. Sturla
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
Although Cython has special syntax for NumPy arrays, we are talking about the implementation of NumPy, not about using it. I would not consider Cython for this before, e.g., memoryviews have been stable for a long period. The subset of Cython we could safely use is no better than plain C.
If we want something more readable than C or C++ that looks like Python, Cython is not the only option. Another is RPython, the subset of Python used for PyPy. It can be translated to various languages, including C, Java and .NET. Since RPython is valid Python, it can also be debugged with CPython. Code translated by RPython is extremely fast (often "faster than C", given the human limitations of hand-coded C), and RPython is a stable compiler. http://doc.pypy.org/en/latest/coding-guide.html#id1 http://doc.pypy.org/en/latest/translation.html http://olliwang.com/2009/12/20/aes-implementation-in-rpython/ Sturla
![](https://secure.gravatar.com/avatar/da3a0a1942fbdc5ee9a9b8115ac5dae7.jpg?s=120&d=mm&r=g)
18.02.2012 17:24, Sturla Molden kirjoitti: [clip]
If we want something more readable than C or C++, that looks like Python, Cython is not the only option. Another is RPython, which is the subset [clip]
Except that AFAIK integrating it with CPython efficiently or providing C APIs with it is not that much fun. -- Pauli Virtanen
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi. On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire <cjordan1@uw.edu> wrote:
[clip]
I just meant what Sturla said, nothing more:
"Cython is still 0.16, it is still unfinished. We cannot base NumPy on an unfinished compiler."
Y'all mean, it has a zero at the beginning of the version number and it is still adding new features? Yes, that is correct, but it seems more reasonable to me to phrase that as 'active development' rather than 'unstable', because they take considerable care to be backwards compatible, have a large automated Cython test suite, and a major stress-tester in the Sage test suite. Best, Matthew
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
[clip]
Y'all mean, it has a zero at the beginning of the version number and it is still adding new features? Yes, that is correct, but it seems more reasonable to me to phrase that as 'active development' rather than 'unstable', because they take considerable care to be backwards compatible, have a large automated Cython test suite, and a major stress-tester in the Sage test suite.
Matthew, No one in their right mind would build a large performance library using Cython; it just isn't the right tool. For what it was designed for - wrapping existing C code or writing small and simple things close to Python - it does very well, but it was never designed for making core C/C++ libraries, and in that role it just gets in the way. Chuck
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
[clip]
Y'all mean, it has a zero at the beginning of the version number and it is still adding new features? Yes, that is correct, but it seems more reasonable to me to phrase that as 'active development' rather than 'unstable', because they take considerable care to be backwards compatible, have a large automated Cython test suite, and a major stress-tester in the Sage test suite.
Matthew,
No one in their right mind would build a large performance library using Cython; it just isn't the right tool. For what it was designed for - wrapping existing C code or writing small and simple things close to Python - it does very well, but it was never designed for making core C/C++ libraries, and in that role it just gets in the way.
I believe the proposal is to refactor the lowest levels in pure C and move some or most of the library superstructure to Cython. Best, Matthew
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
[clip]
I believe the proposal is to refactor the lowest levels in pure C and move some or most of the library superstructure to Cython.
Go for it. Chuck
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Sat, Feb 18, 2012 at 12:45 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
[clip]
I believe the proposal is to refactor the lowest levels in pure C and move some or most of the library superstructure to Cython.
Go for it.
My goal was to try and contribute to a substantive discussion of the benefits and costs of the various approaches. That does require a realistic assessment of what is being proposed. It may be that such discussion is not fruitful. But then we all lose, I think. Best, Matthew
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
[clip]
I believe the proposal is to refactor the lowest levels in pure C and move some or most of the library superstructure to Cython.
Go for it.
The proposal of moving to a core C + cython has been discussed by multiple contributors. It is certainly a valid proposal. *I* have worked on this (npymath, separate compilation), although certainly not as much as I would have wanted to. I think much can be done in that vein. Using the "shut up if you don't do it" argument is a straw man (and uncalled for).

Moving away from subjective considerations on how to do things, is there a way that one can see the pros/cons of each approach? For the C++ approach, I would really like to see which C++ is being considered. Once the choice is done, going back would be quite hard, so I can't see how we could go for it just because some people prefer it, without very clear technical arguments. Saying that C++ is more readable, or scales better, is frankly very weak and too subjective to be convincing. There are too many projects way more complex than numpy that have been done in either C or C++. David
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 2:17 PM, David Cournapeau <cournape@gmail.com> wrote:
[clip]
The proposal of moving to a core C + cython has been discussed by multiple contributors. It is certainly a valid proposal. *I* have worked on this (npymath, separate compilation), although certainly not as much as I would have wanted to. I think much can be done in that vein. Using the "shut up if you don't do it" argument is a straw man (and uncalled for).
OK, I was annoyed.
Moving away from subjective considerations on how to do things, is there a way that one can see the pros/cons of each approach? For the C++ approach, I would really like to see which C++ is being considered. Once the choice is done, going back would be quite hard, so I can't see how we could go for it just because some people prefer it, without very clear technical arguments.
Well, we already have code obfuscation (DOUBLE_your_pleasure, FLOAT_your_boat), so we might as well let the compiler handle it. Having classes, lists, and iterators would be a big plus; the current code is really a kludge trying to make C look like C++. That is not inherently bad: the original C++ ("C with classes") was a preprocessor that generated C code. I really think the best argument against C++ is portability, and that needs to be evaluated. But in many ways it supports the sort of things the Numpy C code does in a natural way. I'll let Mark expand on the virtues if he is so inclined, but C++ code offers a higher level of abstraction that is very useful and allows good reuse of properly constructed tools. The emphasis here is on "properly"; there is certainly bad C++ code out there.
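A minimal sketch of the kind of abstraction being pointed at here: wrap the byte-stride pointer arithmetic in a small iterator class once, and generic algorithms can be reused instead of macro-expanded per dtype. strided_iter and sum_strided are hypothetical names for illustration, not NumPy code:

```cpp
#include <cstddef>
#include <numeric>

// Hides the pointer-fu behind the usual iterator interface.
template <typename T>
class strided_iter {
    const char* p_;
    std::ptrdiff_t stride_;  // in bytes
public:
    strided_iter(const char* p, std::ptrdiff_t stride)
        : p_(p), stride_(stride) {}
    const T& operator*() const { return *reinterpret_cast<const T*>(p_); }
    strided_iter& operator++() { p_ += stride_; return *this; }
    bool operator==(const strided_iter& o) const { return p_ == o.p_; }
    bool operator!=(const strided_iter& o) const { return p_ != o.p_; }
};

// With the arithmetic encapsulated, an existing standard tool applies
// directly instead of one hand-expanded loop per dtype:
double sum_strided(const char* data, std::ptrdiff_t stride, std::size_t n) {
    strided_iter<double> first(data, stride);
    strided_iter<double> last(data + stride * static_cast<std::ptrdiff_t>(n),
                              stride);
    return std::accumulate(first, last, 0.0);
}
```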
Saying that C++ is more readable, or scales better, is frankly very weak and too subjective to be convincing. There are too many projects way more complex than numpy that have been done in either C or C++.
To some extent that is experience-based, and to another extent it is a question of what language people like to develop in. I myself would prefer C++. The main thing I really don't like about C++ is IO, but Boost offers some relief for that. I expect we will use small bits of Boost that can be excised without problems from the bigger library. I don't think we can count on C++11 at this point, so we would probably be conservative in our choice of features. Jim Hugunin was a keynote speaker at one of the SciPy conferences. At dinner he said that if he were to do it again he would use managed code ;) I don't propose we do that, but tools do advance. Chuck
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 1:40 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
[clip]
The proposal of moving to a core C + cython has been discussed by multiple contributors. It is certainly a valid proposal. *I* have worked on this (npymath, separate compilation), although certainly not as much as I would have wanted to. I think much can be done in that vein. Using the "shut up if you don't do it" argument is a straw man (and uncalled for).
OK, I was annoyed.
By what? Best, Matthew
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 2:51 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
[clip]
OK, I was annoyed.
By what?
Exactly. Chuck
![](https://secure.gravatar.com/avatar/764323a14e554c97ab74177e0bce51d4.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 21:51, Matthew Brett <matthew.brett@gmail.com> wrote:
[clip]
OK, I was annoyed.
By what?
Your misunderstanding of what was being discussed. The proposal being discussed is implementing the core of numpy in C++, wrapped in C to be usable as a C library that other extensions can use, and then exposed to Python in an unspecified way. Cython was raised as an alternative for this core, but as Chuck points out, it doesn't really fit. Your assertion that what was being discussed was putting the core in C and using Cython to wrap it was simply a non-sequitur. Discussion of alternatives is fine. You weren't doing that. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
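A minimal sketch of the layering Robert describes: a C++ core fronted by a C ABI that other extensions can link against. The names npcore and npcore_mean are invented for illustration; none of this is actual NumPy code:

```cpp
#include <cstddef>
#include <vector>

// Internal C++ implementation; free to use classes, templates, STL.
namespace npcore {
inline double mean(const std::vector<double>& v) {
    if (v.empty()) return 0.0;
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s / static_cast<double>(v.size());
}
}  // namespace npcore

// C-linkage shim: the only symbol other C extensions (and the Python
// binding layer) see; no C++ types cross the boundary.
extern "C" double npcore_mean(const double* data, std::size_t n) {
    return npcore::mean(std::vector<double>(data, data + n));
}
```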
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Sat, Feb 18, 2012 at 2:03 PM, Robert Kern <robert.kern@gmail.com> wrote:
[clip]
Your misunderstanding of what was being discussed. The proposal being discussed is implementing the core of numpy in C++, wrapped in C to be usable as a C library that other extensions can use, and then exposed to Python in an unspecified way. Cython was raised as an alternative for this core, but as Chuck points out, it doesn't really fit. Your assertion that what was being discussed was putting the core in C and using Cython to wrap it was simply a non-sequitur. Discussion of alternatives is fine. You weren't doing that.
You read David's email? Was he also being annoying? Best, Matthew
![](https://secure.gravatar.com/avatar/764323a14e554c97ab74177e0bce51d4.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 22:06, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Feb 18, 2012 at 2:03 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Sat, Feb 18, 2012 at 21:51, Matthew Brett <matthew.brett@gmail.com> wrote:
On Sat, Feb 18, 2012 at 1:40 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Feb 18, 2012 at 2:17 PM, David Cournapeau <cournape@gmail.com> wrote:
On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> Hi,
>
> On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris
> <charlesr.harris@gmail.com> wrote:
>> On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett <matthew.brett@gmail.com>
>> wrote:
>>> Hi.
>>>
>>> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire
>>> <cjordan1@uw.edu> wrote:
>>>> On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett
>>>> <matthew.brett@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire
>>>>> <cjordan1@uw.edu> wrote:
>>>>>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden <sturla@molden.no> wrote:
>>>>>>> On Feb 18, 2012, at 05:01, Jason Grout
>>>>>>> <jason-sage@creativetrax.com> wrote:
>>>>>>>> On 2/17/12 9:54 PM, Sturla Molden wrote:
>>>>>>>>> We would have to write a C++ programming tutorial that is based on
>>>>>>>>> Python knowledge instead of C knowledge.
>>>>>>>>
>>>>>>>> I personally would love such a thing. It's been a while since I did
>>>>>>>> anything nontrivial on my own in C++.
>>>>>>>
>>>>>>> One example: How do we code multiple return values?
>>>>>>>
>>>>>>> In Python:
>>>>>>> - Return a tuple.
>>>>>>>
>>>>>>> In C:
>>>>>>> - Use pointers (evilness)
>>>>>>>
>>>>>>> In C++:
>>>>>>> - Return a std::tuple, as you would in Python.
>>>>>>> - Use references, as you would in Fortran or Pascal.
>>>>>>> - Use pointers, as you would in C.
>>>>>>>
>>>>>>> C++ textbooks always pick the last...
>>>>>>>
>>>>>>> I would show the first and the second method, and perhaps
>>>>>>> intentionally forget the last.
>>>>>>>
>>>>>>> Sturla
>>>>>>
>>>>>> On the flip side, cython looked pretty... but I didn't get the
>>>>>> performance gains I wanted, and had to spend a lot of time figuring
>>>>>> out if it was cython, needing to add types, buggy support for numpy,
>>>>>> or actually the algorithm.
>>>>>
>>>>> At the time, was the numpy support buggy? I personally haven't had
>>>>> many problems with Cython and numpy.
>>>>
>>>> It's not that the support WAS buggy, it's that it wasn't clear to me
>>>> what was going on and where my performance bottleneck was. Even after
>>>> microbenchmarking with ipython, using timeit and prun, and using the
>>>> cython code visualization tool. Ultimately I don't think it was cython,
>>>> so perhaps my comment was a bit unfair. But it was unfortunately
>>>> difficult to verify that. Of course, as you say, diagnosing and solving
>>>> such issues would become easier with more cython experience.
>>>>
>>>>>> The C files generated by cython were enormous and difficult to read.
>>>>>> They really weren't meant for human consumption.
>>>>>
>>>>> Yes, it takes some practice to get used to what Cython will do, and
>>>>> how to optimize the output.
>>>>>
>>>>>> As Sturla has said, regardless of the quality of the current product,
>>>>>> it isn't stable.
>>>>>
>>>>> I've personally found it more or less rock solid. Could you say what
>>>>> you mean by "it isn't stable"?
>>>>
>>>> I just meant what Sturla said, nothing more:
>>>>
>>>> "Cython is still 0.16, it is still unfinished. We cannot base NumPy on
>>>> an unfinished compiler."
>>>
>>> Y'all mean, it has a zero at the beginning of the version number and it
>>> is still adding new features? Yes, that is correct, but it seems more
>>> reasonable to me to phrase that as 'active development' rather than
>>> 'unstable', because they take considerable care to be backwards
>>> compatible, have a large automated Cython test suite, and a major
>>> stress-tester in the Sage test suite.
>>
>> Matthew,
>>
>> No one in their right mind would build a large performance library using
>> Cython, it just isn't the right tool. For what it was designed for -
>> wrapping existing c code or writing small and simple things close to
>> Python - it does very well, but it was never designed for making core
>> C/C++ libraries and in that role it just gets in the way.
>
> I believe the proposal is to refactor the lowest levels in pure C and
> move some or most of the library superstructure to Cython.

Go for it.
The proposal of moving to a core C + cython has been discussed by multiple contributors. It is certainly a valid proposal. *I* have worked on this (npymath, separate compilation), although certainly not as much as I would have wanted to. I think much can be done in that vein. Using the "shut up if you don't do it" argument is a straw man (and uncalled for).
OK, I was annoyed.
By what?
Your misunderstanding of what was being discussed. The proposal being discussed is implementing the core of numpy in C++, wrapped in C to be usable as a C library that other extensions can use, and then exposed to Python in an unspecified way. Cython was raised as an alternative for this core, but as Chuck points out, it doesn't really fit. Your assertion that what was being discussed was putting the core in C and using Cython to wrap it was simply a non-sequitur. Discussion of alternatives is fine. You weren't doing that.
You read David's email? Was he also being annoying?
Not really, because he was responding on-topic to the bizarro-branch of the conversation that you spawned about the merits of moving from hand-written C extensions to a Cython-wrapped C library. Whatever annoyance his email might inspire is your fault, not his. The discussion was about whether to use C++ or Cython for the core. Chuck argued that Cython was not a suitable implementation language for the core. You responded that his objections to Cython didn't apply to what you thought was being discussed, using Cython to wrap a pure-C library. As Pauli (Wolfgang, not our Pauli) once phrased it, you were "not even wrong". It's hard to respond coherently to someone who is breaking the fundamental expectations of discourse. Even I had to stare at the thread for a few minutes to figure out where things went off the rails. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Sat, Feb 18, 2012 at 2:20 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Sat, Feb 18, 2012 at 22:06, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Feb 18, 2012 at 2:03 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Sat, Feb 18, 2012 at 21:51, Matthew Brett <matthew.brett@gmail.com> wrote:
On Sat, Feb 18, 2012 at 1:40 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Feb 18, 2012 at 2:17 PM, David Cournapeau <cournape@gmail.com> wrote:
On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris <charlesr.harris@gmail.com> wrote: [... full quote of the earlier Cython thread snipped ...]

> I believe the proposal is to refactor the lowest levels in pure C and move some or most of the library superstructure to Cython.

Go for it.
The proposal of moving to a core C + cython has been discussed by multiple contributors. It is certainly a valid proposal. *I* have worked on this (npymath, separate compilation), although certainly not as much as I would have wanted to. I think much can be done in that vein. Using the "shut up if you don't do it" argument is a straw man (and uncalled for).
OK, I was annoyed.
By what?
Your misunderstanding of what was being discussed. The proposal being discussed is implementing the core of numpy in C++, wrapped in C to be usable as a C library that other extensions can use, and then exposed to Python in an unspecified way. Cython was raised as an alternative for this core, but as Chuck points out, it doesn't really fit. Your assertion that what was being discussed was putting the core in C and using Cython to wrap it was simply a non-sequitur. Discussion of alternatives is fine. You weren't doing that.
You read David's email? Was he also being annoying?
Not really, because he was responding on-topic to the bizarro-branch of the conversation that you spawned about the merits of moving from hand-written C extensions to a Cython-wrapped C library. Whatever annoyance his email might inspire is your fault, not his. The discussion was about whether to use C++ or Cython for the core. Chuck argued that Cython was not a suitable implementation language for the core. You responded that his objections to Cython didn't apply to what you thought was being discussed, using Cython to wrap a pure-C library. As Pauli (Wolfgang, not our Pauli) once phrased it, you were "not even wrong". It's hard to respond coherently to someone who is breaking the fundamental expectations of discourse. Even I had to stare at the thread for a few minutes to figure out where things went off the rails.
I'm sorry but this seems to me to be aggressive, offensive, and unjust. The discussion was, from the beginning, mainly about the relative benefits of rewriting the core with C / Cython, or C++. I don't think anyone was proposing writing every line of the numpy core in Cython. Ergo (sorry to use the debating term), the proposal to use Cython was always to take some of the higher level code out of C and leave some of it in C. It does indeed make the debate ridiculous to oppose a proposal that no-one has made. Now I am sure it is obvious to you that the proposal to refactor the current C code into low-level C libraries, and higher level Cython wrappers, is absurd and off the table. It isn't obvious to me. I don't think I broke a fundamental rule of polite discourse to clarify that that is what I meant. Best, Matthew
![](https://secure.gravatar.com/avatar/764323a14e554c97ab74177e0bce51d4.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 22:29, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
[... earlier exchange snipped ...]
I'm sorry but this seems to me to be aggressive, offensive, and unjust.
The discussion was, from the beginning, mainly about the relative benefits of rewriting the core with C / Cython, or C++.
I don't think anyone was proposing writing every line of the numpy core in Cython. Ergo (sorry to use the debating term), the proposal to use Cython was always to take some of the higher level code out of C and leave some of it in C. It does indeed make the debate ridiculous to oppose a proposal that no-one has made.
Now I am sure it is obvious to you that the proposal to refactor the current C code into low-level C libraries, and higher level Cython wrappers, is absurd and off the table. It isn't obvious to me. I don't think I broke a fundamental rule of polite discourse to clarify that that is what I meant.
It's not off the table, but it's not what this discussion was about. The proposal is to implement the core in C++. Regardless of whether the core is separated out as an independent non-Python library or not. Some people want to use higher level language features in the core. Cython was brought up as an alternative. If they were bringing up Cython in the context of C-core+Cython-wrapper, then they were also misunderstanding what the proposal was about. The discussion is about a C++-core versus a C-core (either the current one or a refactored one). If you want to argue for a C-core over a C++-core, that's great, but talking about Cython features and stability is not relevant to that discussion. It's an entirely orthogonal issue to what is motivating the request to use C++ in the core. C-core+Cython-wrapper is still a viable alternative, but the relevant bit of that is "C-core". I would wager that after any refactoring of the core, regardless of whether it is implemented in C++ or C, we would then wrap it in Cython. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Sat, Feb 18, 2012 at 2:51 PM, Robert Kern <robert.kern@gmail.com> wrote:
[... earlier exchange snipped ...]
It's not off the table, but it's not what this discussion was about.
I beg to differ - which was why I replied the way I did. As I see it, the two proposals being discussed were:

1) a C++ rewrite of the C core
2) refactoring the current C core into C / Cython

I think you can see from David's reply that that was also his understanding. Of course you could use Cython to interface to the 'core' in C or the 'core' in C++, but the difference would be that some of the stuff in C++ for option 1) would be in Cython in option 2). Now you might be saying that you believe the discussion was only ever about whether the non-Cython bits would be in C or C++. That would indeed make sense of your lack of interest in discussion of Cython. I think you'd be hard pressed to claim it was only me discussing Cython, though. Chuck was pointing out that it would be completely ridiculous to try to implement the entire core in Cython. Yes, it would. As no-one has proposed that, it seems to me only reasonable to point out what I meant, in the interests of productive discourse. Best, Matthew
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Well, we already have code obfuscation (DOUBLE_your_pleasure, FLOAT_your_boat), so we might as well let the compiler handle it.
Yes, those are not great, but on the other hand, they are not that fundamental an issue IMO. Iterators as we have them in NumPy are something that is clearly limited by C. Writing the neighborhood iterator is the only case where I really felt that C++ *could* be a significant improvement. I say *could* because writing iterators in C++ is hard, and they will be much harder to read (I find both boost and STL - e.g. stlport - iterators to be close to write-only code). But there is the question of how you can make C++-based iterators available in C. I would be interested in a simple example of how this could be done, ignoring all the other issues (portability, exceptions, etc.). The STL is also potentially compelling, but that's where we get into my "beware of the dragons" area of C++. Portability loss, compilation-time increase, and warts are significant there. scipy.sparse.sparsetools has been a source of issues out of all proportion to its share of the scipy code (we *do* have some hard-won experience with C++-related issues).
Jim Hugunin was a keynote speaker at one of the scipy conventions. At dinner he said that if he was to do it again he would use managed code ;) I don't propose we do that, but tools do advance.
In an ideal world, we would have a better language than C++ that can be spit out as C for portability. I have looked for a way to do this for as long as I have been contributing to NumPy (I have looked at ooc, D, and coccinelle at various stages). I believe the best way is actually in the vein of FFTW: write the hard part in a very high-level language (OCaml) and spit out C. This is better than C++ in many ways - it is also clearly not realistic :) David
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 18.02.2012 23:24, David Cournapeau wrote:
Iterators as we have them in NumPy are something that is clearly limited by C.
Computers tend to have more than one CPU now. Iterators are inherently bad, whether they are written in C or C++. The NumPy core should be written with objects that scale across multiple processors. Remember, the original Numeric was written at a time when desktop computers only had one processor.
In an ideal world, we would have a better language than C++ that can be spit out as C for portability.
What about a statically typed Python? (That is, not Cython.) We just need to make the compiler :-) Sturla
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 10:50 PM, Sturla Molden <sturla@molden.no> wrote:
> In an ideal world, we would have a better language than C++ that can be spit out as C for portability.
What about a statically typed Python? (That is, not Cython.) We just need to make the compiler :-)
There are better languages than C++ that have most of the technical benefits stated in this discussion (rust and D being the most "obvious" ones), but whose usage is unrealistic today for various reasons: knowledge, availability on "esoteric" platforms, etc… A new language is completely ridiculous. David
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 19.02.2012 00:09, David Cournapeau wrote:
reasons: knowledge, availability on "esoteric" platforms, etc… A new language is completely ridiculous.
Yes, that is why I argued against Cython as well. Personally I prefer C++ to C, but only if it is written in a readable way. And if the purpose is to write C in C++, then it's brain dead. Sturla
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 19.02.2012 00:09, David Cournapeau wrote:
There are better languages than C++ that have most of the technical benefits stated in this discussion (rust and D being the most "obvious" ones),
What about Java? (Compile with GCJ for CPython.) Or just write everything in Cython, even the core? Sturla
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 11:09 PM, David Cournapeau <cournape@gmail.com> wrote:
On Sat, Feb 18, 2012 at 10:50 PM, Sturla Molden <sturla@molden.no> wrote:
> In an ideal world, we would have a better language than C++ that can be spit out as C for portability.
What about a statically typed Python? (That is, not Cython.) We just need to make the compiler :-)
There are better languages than C++ that have most of the technical benefits stated in this discussion (rust and D being the most "obvious" ones), but whose usage is unrealistic today for various reasons: knowledge, availability on "esoteric" platforms, etc… A new language is completely ridiculous.
Off-topic: rust is an obvious one? That makes my day, Graydon is an old friend and collaborator :-). But FYI, it wouldn't be relevant anyway; its emphasis on concurrency means that it can easily call C, but you can't really call it from C -- it needs to "own" the overall runtime. And I failed to convince him to add numerical-array-relevant features like operator overloading to make it more convenient for numerical programmers attracted by the concurrency support :-(. There are some very small values of "new language" that might be relevant alternatives, like -- if templates are the big draw for C++, then making the existing code generators suck less might do just as well, while avoiding the build system and portability hassles of C++. *shrug* -- Nathaniel
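To make the code-generator alternative concrete, here is a toy Python sketch of the kind of template expansion involved; the template text and every name in it are illustrative only, not NumPy's actual .src generator syntax:

```python
# Toy stand-in for build-time code generation: expand one C loop
# template into a dtype-specialized function per entry in the table.
TEMPLATE = """
static void {name}_add(char *a, char *b, char *out, npy_intp n)
{{
    npy_intp i;
    for (i = 0; i < n; i++) {{
        (({ctype} *)out)[i] = (({ctype} *)a)[i] + (({ctype} *)b)[i];
    }}
}}
"""

SPECIALIZATIONS = [("FLOAT", "npy_float"), ("DOUBLE", "npy_double"),
                   ("LONG", "npy_long")]

def generate_loops():
    # Emit one specialized C function per (name, ctype) pair.
    return "".join(TEMPLATE.format(name=name, ctype=ctype)
                   for name, ctype in SPECIALIZATIONS)

if __name__ == "__main__":
    print(generate_loops())
```

Templates in C++ would move this expansion into the compiler; a better generator keeps it in a language contributors already know.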
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 19.02.2012 00:09, David Cournapeau wrote:
There are better languages than C++ that have most of the technical benefits stated in this discussion (rust and D being the most "obvious" ones), but whose usage is unrealistic today for various reasons: knowledge, availability on "esoteric" platforms, etc… A new language is completely ridiculous.
There are still other options than C or C++ that are worth considering. One would be to write NumPy in Python. E.g. we could use LLVM as a JIT-compiler and produce the performance critical code we need on the fly. Sturla
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 20.02.2012 17:42, Sturla Molden wrote:
There are still other options than C or C++ that are worth considering. One would be to write NumPy in Python. E.g. we could use LLVM as a JIT-compiler and produce the performance critical code we need on the fly.
LLVM and its C/C++ frontend Clang are BSD licensed. It compiles faster than GCC and often produces better machine code. They can therefore be used inside an array library. It would give a faster NumPy, and we could keep most of it in Python. Sturla
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Mon, Feb 20, 2012 at 9:55 AM, Sturla Molden <sturla@molden.no> wrote:
On 20.02.2012 17:42, Sturla Molden wrote:
There are still other options than C or C++ that are worth considering. One would be to write NumPy in Python. E.g. we could use LLVM as a JIT-compiler and produce the performance critical code we need on the fly.
LLVM and its C/C++ frontend Clang are BSD licensed. It compiles faster than GCC and often produces better machine code. They can therefore be used inside an array library. It would give a faster NumPy, and we could keep most of it in Python.
Would that work for Ruby also? One of the advantages of C++ is that the code doesn't need to be refactored to start with, just modified step by step going into the future. I think PyPy is close to what you are talking about. Chuck
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 20.02.2012 18:14, Charles R Harris wrote:
Would that work for Ruby also? One of the advantages of C++ is that the code doesn't need to be refactored to start with, just modified step by step going into the future. I think PyPy is close to what you are talking about.
If we plan to support more languages than Python, it might be better to use C++ (sorry). But that does not mean LLVM cannot be used. Either one can generate C or C++, or just use the assembly language (which is very simple and readable too: http://llvm.org/docs/LangRef.html). We have exact knowledge about an ndarray at runtime:

- dtype
- dimensions
- strides
- whether the array is contiguous or not

This can be JIT-compiled into specialized looping code by LLVM. These kernels can then be stored in a database and reused. If it matters, LLVM is embeddable in C++. Sturla
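As a rough sketch of that store-and-reuse idea -- with plain Python closures standing in for LLVM-emitted machine code, and every name here hypothetical:

```python
import numpy as np

# Cache of specialized kernels, keyed by the runtime description of the
# operands. An LLVM-based version would JIT a loop per signature instead
# of building a Python closure.
_kernel_cache = {}

def _signature(*arrays):
    return tuple((a.dtype.str, a.ndim, a.flags['C_CONTIGUOUS'])
                 for a in arrays)

def get_add_kernel(a, b):
    key = _signature(a, b)
    kernel = _kernel_cache.get(key)
    if kernel is None:
        if all(contig for _, _, contig in key):
            # Contiguous operands: the JIT would emit one flat,
            # unit-stride loop; raveled views stand in for that here.
            def kernel(x, y):
                out = np.empty_like(x)
                np.add(x.ravel(), y.ravel(), out=out.ravel())
                return out
        else:
            # Generic strided fallback.
            def kernel(x, y):
                return np.add(x, y)
        _kernel_cache[key] = kernel   # "stored in a database"
    return kernel                     # ... "and reused"

a = np.arange(6.0).reshape(2, 3)
b = np.ones((2, 3))
add = get_add_kernel(a, b)  # built once per signature, then cached
print(add(a, b))
```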
![](https://secure.gravatar.com/avatar/723b49f8d57b46f753cc4097459cbcdb.jpg?s=120&d=mm&r=g)
On 02/20/2012 08:55 AM, Sturla Molden wrote:
On 20.02.2012 17:42, Sturla Molden wrote:
There are still other options than C or C++ that are worth considering. One would be to write NumPy in Python. E.g. we could use LLVM as a JIT-compiler and produce the performance critical code we need on the fly.
LLVM and its C/C++ frontend Clang are BSD licensed. It compiles faster than GCC and often produces better machine code. They can therefore be used inside an array library. It would give a faster NumPy, and we could keep most of it in Python.
I think it is moot to focus on improving NumPy performance as long as in practice all NumPy operations are memory bound due to the need to take a trip through system memory for almost any operation. C/C++ is simply "good enough". JIT is when you're chasing a 2x improvement or so, but today NumPy can be 10-20x slower than a Cython loop.

You need at least a slightly different Python API to get anywhere, so numexpr/Theano is the right place to work on an implementation of this idea. Of course it would be nice if numexpr/Theano offered something as convenient as

```python
with lazy:
    arr = A + B + C  # with all of these NumPy arrays
# compute upon exiting...
```

Dag
![](https://secure.gravatar.com/avatar/38153b4768acea6b89aed9f19a0a5243.jpg?s=120&d=mm&r=g)
On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote:
You need at least a slightly different Python API to get anywhere, so numexpr/Theano is the right place to work on an implementation of this idea. Of course it would be nice if numexpr/Theano offered something as convenient as
```python
with lazy:
    arr = A + B + C  # with all of these NumPy arrays
# compute upon exiting…
```
Hmm, that would be cute indeed. Do you have an idea on how the code in the with context could be passed to the Python AST compiler (à la numexpr.evaluate("A + B + C"))? -- Francesc Alted
![](https://secure.gravatar.com/avatar/92d72050d08af89051c2852213f0d676.jpg?s=120&d=mm&r=g)
Francesc Alted writes:
On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote:
You need at least a slightly different Python API to get anywhere, so numexpr/Theano is the right place to work on an implementation of this idea. Of course it would be nice if numexpr/Theano offered something as convenient as
```python
with lazy:
    arr = A + B + C  # with all of these NumPy arrays
# compute upon exiting…
```
Hmm, that would be cute indeed. Do you have an idea on how the code in the with context could be passed to the Python AST compiler (à la numexpr.evaluate("A + B + C"))?
Well, I started writing some experiments to "almost transparently" translate regular ndarray operations to numexpr strings (or others) using only python code. The concept is very simple:

```python
# you only need the first one to start building the AST
a = lazy(np.arange(16))
b = np.arange(16)
res = a + b + 3
print evaluate(res)
# the actual evaluation can be delayed to something like __repr__ or __str__
print repr(res)
print res
# you could also delay evaluation until someone uses res to create a new array
```

My target was to use this to also generate optimized GPU kernels in-flight using pycuda, but I think some other relatively recent project already performed something similar (w.r.t. generating cuda kernels out of python expressions). The supporting code for numexpr was something like:

```python
import numexpr
import numpy as np

def build_arg_expr(arg, args):
    if isinstance(arg, Expr):
        # recursively build the expression
        arg_expr, arg_args = arg.build_expr()
        args.update(arg_args)
        return arg_expr
    else:
        # unique argument identifier
        arg_id = "arg_%d" % id(arg)
        args[arg_id] = arg
        return arg_id

# generic expression builder
class Expr:
    def evaluate(self):
        expr, args = self.build_expr()
        return numexpr.evaluate(expr, local_dict=args, global_dict={})

    def __repr__(self):
        return self.evaluate().__repr__()

    def __str__(self):
        return self.evaluate().__str__()

    def __add__(self, other):
        return ExprAdd(self, other)

# expression builder for adds
class ExprAdd(Expr):
    def __init__(self, arg1, arg2):
        self.arg1 = arg1
        self.arg2 = arg2

    def build_expr(self):
        args = {}
        expr1 = build_arg_expr(self.arg1, args)
        expr2 = build_arg_expr(self.arg2, args)
        return "(" + expr1 + ") + (" + expr2 + ")", args

# ndarray-like class to generate expression builders
class LazyNdArray(np.ndarray):
    def __add__(self, other):
        return ExprAdd(self, other)

# build a LazyNdArray
def lazy(arg):
    return arg.view(LazyNdArray)

# evaluate with numexpr an arbitrary expression builder
def evaluate(arg):
    return arg.evaluate()
```

The thing here is to always return to the user something that looks like an ndarray. As you can see the whole thing is not very complex, but some less funny code had to be written meanwhile for work and I just dropped this :)

Lluis

-- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
![](https://secure.gravatar.com/avatar/92d72050d08af89051c2852213f0d676.jpg?s=120&d=mm&r=g)
Lluís writes:
Francesc Alted writes:
On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote:
You need at least a slightly different Python API to get anywhere, so numexpr/Theano is the right place to work on an implementation of this idea. Of course it would be nice if numexpr/Theano offered something as convenient as
```python
with lazy:
    arr = A + B + C  # with all of these NumPy arrays
# compute upon exiting…
```
Hmm, that would be cute indeed. Do you have an idea on how the code in the with context could be passed to the Python AST compiler (à la numexpr.evaluate("A + B + C"))?
Well, I started writing some experiments to "almost transparently" translate regular ndarray operations to numexpr strings (or others) using only python code. [...] My target was to use this to also generate optimized GPU kernels in-flight using pycuda, but I think some other relatively recent project already performed something similar (w.r.t. generating cuda kernels out of python expressions).
Aaahhh, I just had a quick look at Theano and it seems it's the project I was referring to. Good job! :) Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
![](https://secure.gravatar.com/avatar/149501b405a32e9843020da45e9b2c3b.jpg?s=120&d=mm&r=g)
Looks like Dag forked the discussion of lazy evaluation to a new thread ([Numpy-discussion] ndarray and lazy evaluation). There are actually several projects inspired by this sort of design: off the top of my head I can think of Theano, copperhead, numexpr, arguably sympy, and some non-public code by Nicolas Pinto. So I think the strengths of the approach in principle are established... the big question is how to make this approach easy to use in all the settings where it could be useful. I don't think any of these projects has gotten that totally right. -JB On Mon, Feb 20, 2012 at 2:41 PM, Lluís <xscript@gmx.net> wrote:
Lluís writes:
Francesc Alted writes:
On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote:
You need at least a slightly different Python API to get anywhere, so numexpr/Theano is the right place to work on an implementation of this idea. Of course it would be nice if numexpr/Theano offered something as convenient as
```python
with lazy:
    arr = A + B + C  # with all of these NumPy arrays
# compute upon exiting…
```
Hmm, that would be cute indeed. Do you have an idea on how the code in the with context could be passed to the Python AST compiler (à la numexpr.evaluate("A + B + C"))?
Well, I started writing some experiments to "almost transparently" translate regular ndarray operations to numexpr strings (or others) using only python code. [...] My target was to use this to also generate optimized GPU kernels in-flight using pycuda, but I think some other relatively recent project already performed something similar (w.r.t. generating cuda kernels out of python expressions).
Aaahhh, I just had a quick look at Theano and it seems it's the project I was referring to.
Good job! :)
Lluis
-- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
![](https://secure.gravatar.com/avatar/8f93a1426788786f93677912ee4ec672.jpg?s=120&d=mm&r=g)
On Mon, Feb 20, 2012 at 9:18 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 02/20/2012 08:55 AM, Sturla Molden wrote:
On 20.02.2012 17:42, Sturla Molden wrote:
There are still other options than C or C++ that are worth considering. One would be to write NumPy in Python. E.g. we could use LLVM as a JIT-compiler and produce the performance critical code we need on the fly.
LLVM and its C/C++ frontend Clang are BSD licensed. It compiles faster than GCC and often produces better machine code. They can therefore be used inside an array library. It would give a faster NumPy, and we could keep most of it in Python.
I think it is moot to focus on improving NumPy performance as long as in practice all NumPy operations are memory bound due to the need to take a trip through system memory for almost any operation. C/C++ is simply "good enough". JIT is when you're chasing a 2x improvement or so, but today NumPy can be 10-20x slower than a Cython loop.
I don't follow this. Could you expand a bit more? (Specifically, I wasn't aware that numpy could be 10-20x slower than a cython loop, if we're talking about the base numpy library--so core operations. I'm also not totally sure why a JIT is a 2x improvement or so vs. cython. Not that I disagree on either of these points, I'd just like a bit more detail.) Thanks, Chris
You need at least a slightly different Python API to get anywhere, so numexpr/Theano is the right place to work on an implementation of this idea. Of course it would be nice if numexpr/Theano offered something as convenient as
```python
with lazy:
    arr = A + B + C  # with all of these NumPy arrays
# compute upon exiting...
```
Dag
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 20.02.2012 18:34, Christopher Jordan-Squire wrote:
I don't follow this. Could you expand a bit more? (Specifically, I wasn't aware that numpy could be 10-20x slower than a cython loop, if we're talking about the base numpy library--so core operations. I'm also not totally sure why a JIT is a 2x improvement or so vs. cython. Not that I disagree on either of these points, I'd just like a bit more detail.)
Dag Sverre is right about this. NumPy is memory bound, Cython loops are (usually) CPU bound. If you write

```python
x[:] = a + b + c  # numpy arrays
```

then this happens (excluding reference counting):

- allocate temporary array
- loop over a and b, add to temporary
- allocate 2nd temporary array
- loop over 1st temporary array and c, add to 2nd
- deallocate 1st temporary array
- loop over 2nd temporary array, assign to x
- deallocate 2nd temporary array

Since memory access is slow, memory allocation and deallocation are slow, and computation is fast, this will be perhaps 10 times slower than what we could do with a loop in Cython:

```python
for i in range(n):
    x[i] = a[i] + b[i] + c[i]
```

I.e. we get rid of the temporary arrays and the multiple loops. All the temporaries here are put in registers. It is streaming data into the CPU that is slow, not computing! There have actually been experiments with streaming data in compressed form and decompressing on the fly, since data access still dominates the runtime (even if you do a lot of computing per element). Sturla
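As a minimal, indicative illustration of Sturla's point -- numexpr (whose evaluate call is quoted elsewhere in this thread) does its work in cache-sized blocks, avoiding the full-size temporaries; the exact speedup depends on array size and hardware:

```python
import numpy as np
import numexpr

n = 10 * 1000 * 1000
a, b, c = (np.random.rand(n) for _ in range(3))

x1 = a + b + c                      # allocates two full-size temporaries
x2 = numexpr.evaluate("a + b + c")  # evaluates blockwise, no big temporaries

assert np.allclose(x1, x2)
```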
![](https://secure.gravatar.com/avatar/723b49f8d57b46f753cc4097459cbcdb.jpg?s=120&d=mm&r=g)
On 02/20/2012 09:34 AM, Christopher Jordan-Squire wrote:
On Mon, Feb 20, 2012 at 9:18 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 02/20/2012 08:55 AM, Sturla Molden wrote:
On 20.02.2012 17:42, Sturla Molden wrote:
There are still other options than C or C++ that are worth considering. One would be to write NumPy in Python. E.g. we could use LLVM as a JIT-compiler and produce the performance critical code we need on the fly.
LLVM and its C/C++ frontend Clang are BSD licensed. It compiles faster than GCC and often produces better machine code. They can therefore be used inside an array library. It would give a faster NumPy, and we could keep most of it in Python.
I think it is moot to focus on improving NumPy performance as long as in practice all NumPy operations are memory bound due to the need to take a trip through system memory for almost any operation. C/C++ is simply "good enough". JIT is when you're chasing a 2x improvement or so, but today NumPy can be 10-20x slower than a Cython loop.
I don't follow this. Could you expand a bit more? (Specifically, I wasn't aware that numpy could be 10-20x slower than a cython loop, if we're talking about the base numpy library--so core operations. I'm
The problem with NumPy is the temporaries needed -- if you want to compute A + B + np.sqrt(D) and the arrays are larger than cache size (a couple of megabytes), then each of those operations will first transfer the data in and out over the memory bus. I.e. first you compute an element of sqrt(D), then the result of that is put in system memory, then later the same number is read back in order to add it to an element in B, and so on. The compute-to-bandwidth ratio of modern CPUs is between 30:1 and 60:1... so in extreme cases it's cheaper to do 60 additions than to transfer a single number from system memory. It is much faster to only transfer an element (or small block) from each of A, B, and D to CPU cache, then do the entire expression, then transfer the result back. This is easy to code in Cython/Fortran/C and impossible with NumPy/Python. This is why numexpr/Theano exists. You can make the slowdown over Cython/Fortran/C almost arbitrarily large by adding terms to the expression above, so of course the actual slowdown depends on your use case.
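A sketch of that blocking strategy in plain NumPy -- a hypothetical helper for this one expression, purely illustrative, since a real implementation would generate the fused loop in C/Cython/Fortran:

```python
import numpy as np

def blocked_sum_sqrt(A, B, D, out, blocksize=8192):
    # Evaluate out = A + B + sqrt(D) one cache-sized block at a time:
    # each block of A, B, D makes a single trip over the memory bus
    # instead of one full pass over system memory per operation.
    for start in range(0, out.shape[0], blocksize):
        s = slice(start, start + blocksize)
        np.sqrt(D[s], out=out[s])  # out-block = sqrt(D-block)
        out[s] += B[s]
        out[s] += A[s]
    return out

A, B, D = (np.random.rand(10 ** 7) for _ in range(3))
res = blocked_sum_sqrt(A, B, D, np.empty_like(A))
assert np.allclose(res, A + B + np.sqrt(D))
```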
also not totally sure why a JIT is a 2x improvement or so vs. cython. Not that I disagree on either of these points, I'd just like a bit more detail.)
I meant that the JIT may be a 2x improvement over the current NumPy C code. There's some logic when iterating arrays that could perhaps be specialized away depending on the actual array layout at runtime. But I'm thinking that probably a JIT wouldn't help all that much, so it's probably 1x -- the 2x was just to be very conservative w.r.t. the argument I was making, as I don't know the NumPy C sources well enough. Dag
![](https://secure.gravatar.com/avatar/38153b4768acea6b89aed9f19a0a5243.jpg?s=120&d=mm&r=g)
On Feb 20, 2012, at 7:08 PM, Dag Sverre Seljebotn wrote:
On 02/20/2012 09:34 AM, Christopher Jordan-Squire wrote:
On Mon, Feb 20, 2012 at 9:18 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 02/20/2012 08:55 AM, Sturla Molden wrote:
On 20.02.2012 17:42, Sturla Molden wrote:
There are still other options than C or C++ that are worth considering. One would be to write NumPy in Python. E.g. we could use LLVM as a JIT-compiler and produce the performance critical code we need on the fly.
LLVM and its C/C++ frontend Clang are BSD licensed. It compiles faster than GCC and often produces better machine code. They can therefore be used inside an array library. It would give a faster NumPy, and we could keep most of it in Python.
I think it is moot to focus on improving NumPy performance as long as in practice all NumPy operations are memory bound due to the need to take a trip through system memory for almost any operation. C/C++ is simply "good enough". JIT is when you're chasing a 2x improvement or so, but today NumPy can be 10-20x slower than a Cython loop.
I don't follow this. Could you expand a bit more? (Specifically, I wasn't aware that numpy could be 10-20x slower than a cython loop, if we're talking about the base numpy library--so core operations. I'm
The problem with NumPy is the temporaries needed -- if you want to compute
A + B + np.sqrt(D)
then, if the arrays are larger than cache size (a couple of megabytes), each of those operations will first transfer the data in and out over the memory bus. I.e. first you compute an element of sqrt(D), then the result of that is put in system memory, then later the same number is read back in order to add it to an element in B, and so on.
The compute-to-bandwidth ratio of modern CPUs is between 30:1 and 60:1... so in extreme cases it's cheaper to do 60 additions than to transfer a single number from system memory.
It is much faster to only transfer an element (or small block) from each of A, B, and D to CPU cache, then do the entire expression, then transfer the result back. This is easy to code in Cython/Fortran/C and impossible with NumPy/Python.
This is why numexpr/Theano exists.
Well, I can't speak for Theano (it is quite a bit more general than numexpr, and more geared towards using GPUs, right?), but this was certainly the issue that made David Cooke create numexpr. A more in-depth explanation of this problem can be seen in: http://www.euroscipy.org/talk/1657 which includes some graphical explanations. -- Francesc Alted
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 20.02.2012 18:18, Dag Sverre Seljebotn wrote:
I think it is moot to focus on improving NumPy performance as long as in practice all NumPy operations are memory bound due to the need to take a trip through system memory for almost any operation. C/C++ is simply "good enough". JIT is when you're chasing a 2x improvement or so, but today NumPy can be 10-20x slower than a Cython loop.
You need at least a slightly different Python API to get anywhere, so numexpr/Theano is the right place to work on an implementation of this idea. Of course it would be nice if numexpr/Theano offered something as convenient as
```python
with lazy:
    arr = A + B + C  # with all of these NumPy arrays
# compute upon exiting...
```
Lazy evaluation is nice. But I was thinking more about how to avoid C++ in the NumPy core, so that more than 2 or 3 programmers could contribute. I.e. my point was not that loops in LLVM would be much faster than C++ (that is beside the point), but that the code could be written in Python instead of C++. But if the idea is to support other languages as well (which I somehow forgot), then this approach certainly becomes less useful. (OTOH, lazy evaluation is certainly easier to achieve with JIT compilation. But that will have to wait until NumPy 5.0 perhaps...) Sturla
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
Interesting you bring this up. I actually have a working prototype of using Python to emit LLVM. I will be showing it at the HPC tutorial that I am giving at PyCon, and I will be making this available after PyCon to a wider audience as open source.

It uses llvm-py (modified to work with LLVM 3.0) and code I wrote to do the translation from Python byte-code to LLVM. This LLVM can then be "JIT"ed. I have several applications that I would like to use this for. It would be possible to write "more of NumPy" using this approach. Initially, it makes it *very* easy to create a machine-code ufunc from Python code. There are other use-cases of having loops written in Python and plugged in to a calculation, filtering, or indexing framework that this system will be useful for.

There is still a need for a core data-type object, a core array object, and a core calculation object. Maybe some day these cores can be shrunk to a smaller subset and more of something along the lines of LLVM generation from Python can be used. But there is a lot of work to do before that is possible. A lot of the currently pre-compiled loops can be done on the fly instead using this approach, though, and there are several things I'm working on in that direction.

This is not PyPy. It certainly uses the same ideas that they are using, but instead it fits into the CPython run-time and doesn't require changing the whole ecosystem. If you are interested in this work let me know. I think I'm going to call the project numpy-llvm, or fast-py, or something like that. It is available on github and will be open source (but it's still under active development).

Here is an example of the code to create a ufunc using the system (this is like vectorize, but it creates machine code and by-passes the interpreter and so is 100x faster):

```python
from math import sin, pi

def sinc(x):
    if x == 0:
        return 1.0
    else:
        return sin(x * pi) / (pi * x)

from translate import Translate
t = Translate(sinc)
t.translate()
print t.mod

res = t.make_ufunc('sinc')
```

-Travis

On Feb 20, 2012, at 10:55 AM, Sturla Molden wrote:
On 20.02.2012 17:42, Sturla Molden wrote:
There are still other options than C or C++ that are worth considering. One would be to write NumPy in Python. E.g. we could use LLVM as a JIT-compiler and produce the performance critical code we need on the fly.
LLVM and its C/C++ frontend Clang are BSD licensed. It compiles faster than GCC and often produces better machine code. They can therefore be used inside an array library. It would give a faster NumPy, and we could keep most of it in Python.
Sturla
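For a feel of the baseline such a translator is up against, np.vectorize builds the same kind of element-wise wrapper but calls back into the interpreter for every element; this sketch uses only plain NumPy (Travis's Translate/make_ufunc above belong to his prototype and are not reproduced here):

```python
import numpy as np
from math import sin, pi

def sinc(x):
    if x == 0:
        return 1.0
    else:
        return sin(x * pi) / (pi * x)

# Interpreter-bound "ufunc": one Python call per element. This per-call
# overhead is what a bytecode-to-machine-code translator removes.
sinc_v = np.vectorize(sinc)
print(sinc_v(np.linspace(-2.0, 2.0, 5)))
```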
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Tue, Feb 21, 2012 at 4:04 AM, Travis Oliphant <travis@continuum.io> wrote:
It uses llvm-py (modified to work with LLVM 3.0) and code I wrote to do the translation from Python byte-code to LLVM. This LLVM can then be "JIT"ed. I have several applications that I would like to use this for. It would be possible to write "more of NumPy" using this approach. Initially, it makes it *very* easy to create a machine-code ufunc from Python code. There are other use-cases of having loops written in Python and plugged in to a calculation, filtering, or indexing framework that this system will be useful for.
Very neat! It's interesting that you decided to use Python bytecode as your source representation. I'm curious what your strategy is for overcoming all the challenges that have plagued previous attempts to efficiently compile "real Python"? (Unladen Swallow, PyPy, etc.) Just support some subset of the language that's easy to handle and do type inference over? Or do you plan to continue using Python as your input language? I guess the conventional wisdom would be that there's a lot of potential for using LLVM to generate efficient specialized loops for numpy on the fly (cf. llvm-pipe for a similar and successful project), but that the key would be to use a more specialized representation than Python bytecode -- one that left out hard/irrelevant parts of the language, that had richer type information, that didn't change around for different Python releases, etc. -- Nathaniel
![](https://secure.gravatar.com/avatar/4f71fba1231b38e31020ed481bebccf9.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 5:09 PM, David Cournapeau <cournape@gmail.com> wrote:
There are better languages than C++ that have most of the technical benefits stated in this discussion (rust and D being the most "obvious" ones), but whose usage is unrealistic today for various reasons: knowledge, availability on "esoteric" platforms, etc… A new language is completely ridiculous.
I just saw this for the first time today: Linus Torvalds on C++ (http://harmful.cat-v.org/software/c++/linus). The post is from 2007 so many of you may have seen it, but I thought it was entertaining enough and on-topic enough with this thread that I'd share it in case you haven't.

The point he makes:

> In other words, the only way to do good, efficient, and system-level and portable C++ ends up to limit yourself to all the things that are basically available in C

was interesting to me because the best C++ library I have ever worked with (agg) imports *nothing* except standard C libs (no standard template library). In fact, the only includes external to itself are math.h, stdlib.h, stdio.h, and string.h.

To shoehorn Jamie Zawinski's famous regex quote (http://regex.info/blog/2006-09-15/247): "Some people, when confronted with a problem, think “I know, I'll use boost.” Now they have two problems."

Here is the Linus post:

From: Linus Torvalds <torvalds <at> linux-foundation.org>
Subject: Re: [RFC] Convert builin-mailinfo.c to use The Better String Library.
Newsgroups: gmane.comp.version-control.git
Date: 2007-09-06 17:50:28 GMT

On Wed, 5 Sep 2007, Dmitry Kakurin wrote:
> When I first looked at Git source code two things struck me as odd:
> 1. Pure C as opposed to C++. No idea why. Please don't talk about
> portability, it's BS.

*YOU* are full of bullshit.

C++ is a horrible language. It's made more horrible by the fact that a lot of substandard programmers use it, to the point where it's much much easier to generate total and utter crap with it. Quite frankly, even if the choice of C were to do *nothing* but keep the C++ programmers out, that in itself would be a huge reason to use C.

In other words: the choice of C is the only sane choice. I know Miles Bader jokingly said "to piss you off", but it's actually true. I've come to the conclusion that any programmer that would prefer the project to be in C++ over C is likely a programmer that I really *would* prefer to piss off, so that he doesn't come and screw up any project I'm involved with.

C++ leads to really really bad design choices. You invariably start using the "nice" library features of the language like STL and Boost and other total and utter crap, that may "help" you program, but causes:

- infinite amounts of pain when they don't work (and anybody who tells me that STL and especially Boost are stable and portable is just so full of BS that it's not even funny)

- inefficient abstracted programming models where two years down the road you notice that some abstraction wasn't very efficient, but now all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app.

In other words, the only way to do good, efficient, and system-level and portable C++ ends up to limit yourself to all the things that are basically available in C. And limiting your project to C means that people don't screw that up, and also means that you get a lot of programmers that do actually understand low-level issues and don't screw things up with any idiotic "object model" crap.

So I'm sorry, but for something like git, where efficiency was a primary objective, the "advantages" of C++ is just a huge mistake. The fact that we also piss off people who cannot see that is just a big additional advantage.

If you want a VCS that is written in C++, go play with Monotone. Really. They use a "real database". They use "nice object-oriented libraries". They use "nice C++ abstractions". And quite frankly, as a result of all these design decisions that sound so appealing to some CS people, the end result is a horrible and unmaintainable mess.

But I'm sure you'd like it more than git.

Linus
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Tue, Feb 28, 2012 at 12:05 PM, John Hunter <jdh2358@gmail.com> wrote:
[... John's message, including the full Linus post, snipped ...]
Yeah, Linus doesn't like C++. No doubt that is in part because of the attempt to rewrite Linux in C++ back in the early 90's and the resulting compiler and portability problems. Linus also writes C like it was his native tongue, he likes to work close to the metal, and he'd probably prefer it over Python for most problems ;) Things have improved in the compiler department, and I think C++ really wasn't much of an improvement over C until templates and the STL came along. The boost smart pointers are also really nice. OTOH, it is really easy to write awful C++ because of the way inheritance and the other features were over-hyped and the 'everything and the kitchen sink' way it developed. Like any tool, familiarity and skill are essential to good results, but unlike some tools, one also needs to forgo some of the features to keep it under control. It's not a hammer, it is a three inch wide Swiss Army Knife. Chuck
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
Charles R Harris wrote:
<snip>
Much of Linus's complaints have to do with the use of c++ in the _kernel_. These objections are quite different for an _application_. For example, there are issues with the need for support libraries for exception handling. Not an issue for an application.
![](https://secure.gravatar.com/avatar/4f71fba1231b38e31020ed481bebccf9.jpg?s=120&d=mm&r=g)
On Wed, Feb 29, 2012 at 1:20 PM, Neal Becker <ndbecker2@gmail.com> wrote:
Much of Linus's complaints have to do with the use of c++ in the _kernel_. These objections are quite different for an _application_. For example, there are issues with the need for support libraries for exception handling. Not an issue for an application.
Actually, the thread was on the git mailing list, and many of his complaints were addressing the appropriateness of C++ for git development.
![](https://secure.gravatar.com/avatar/723b49f8d57b46f753cc4097459cbcdb.jpg?s=120&d=mm&r=g)
On 02/28/2012 11:05 AM, John Hunter wrote:
<snip>
To shoehorn Jamie Zawinski's famous regex quote (http://regex.info/blog/2006-09-15/247): "Some people, when confronted with a problem, think “I know, I'll use boost.” Now they have two problems."
In the same vein, this one neatly sums up all the bad sides of C++. (I don't really want to enter the language discussion, but it is a nice list of the cons, and perhaps it can save discussion time because people don't have to enumerate those reasons again on this mailing list?) http://yosefk.com/c++fqa/defective.html Dag
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Tue, Feb 28, 2012 at 2:34 PM, Dag Sverre Seljebotn < d.s.seljebotn@astro.uio.no> wrote:
<snip>
In the same vein, this one neatly sums up all the bad sides of C++.
(I don't really want to enter the language discussion. But this list is a nice list of the cons, and perhaps that can save discussion time because people don't have to enumerate those reasons again on this list?)
Heh, I was hoping for something good, but that was kinda unfair. OK, so C++ isn't Java or C# or Python, no garbage collection or introspection or whatever, but so what. Destructors are called as the exception unwinds up the call stack, etc. That list is sort of the opposite end of the critical spectrum from Linus (C++ does too much) and is more like a complaint that C++ doesn't walk the dog. Can't satisfy everyone ;) <snip> Chuck.
![](https://secure.gravatar.com/avatar/d3080644fc929ef915d956474505a3e0.jpg?s=120&d=mm&r=g)
In article <CAGY4rcXxL8poS5ZCWA4thCG0dhKyEsoEPJSO4Z05SZ_PqjvO2Q@mail.gmail.com>, David Cournapeau <cournape@gmail.com> wrote:
On Sat, Feb 18, 2012 at 10:50 PM, Sturla Molden <sturla@molden.no> wrote:
In an ideal world, we would have a better language than C++ that can be spit out as C for portability.
What about a statically typed Python? (That is, not Cython.) We just need to make the compiler :-)
There are better languages than C++ that have most of the technical benefits stated in this discussion (rust and D being the most "obvious" ones), but whose usage is unrealistic today for various reasons: knowledge, availability on "esoteric" platforms, etc… A new language is completely ridiculous.
I just want to say that C++ has come a long way. I used to hate it, but it has matured, and using some basic features of boost (especially shared_ptr) can turn it into a really nice language. The next version will be even better, but one can write nice C++ today. shared_ptr allows objects that easily manage their own memory (basic automatic garbage collection). Generic programming seems like a really good fit to numpy's array types. I am part of a large project that codes in C++ and Python and we find it works very well for us. I can't imagine working in C anymore and doing without exception handling and namespaces. So I'm sorry to hear that C++ is not being considered for a numpy rewrite. -- Russell
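A minimal sketch of the shared_ptr idiom Russell is describing, an object that frees itself when its last owner lets go, with no explicit delete, might look like the following. This is illustrative only; Buffer is a made-up stand-in, not NumPy code.

```
// Sketch: boost::shared_ptr gives reference-counted ownership, the
// "basic automatic garbage collection" Russell mentions.
#include <boost/shared_ptr.hpp>
#include <cstddef>
#include <cstdio>

struct Buffer {
    double* data;
    explicit Buffer(std::size_t n) : data(new double[n]) { std::printf("alloc\n"); }
    ~Buffer() { delete[] data; std::printf("free\n"); }
};

int main() {
    boost::shared_ptr<Buffer> a(new Buffer(100));
    boost::shared_ptr<Buffer> b = a;  // a "view" shares ownership; refcount is 2
    a.reset();                        // buffer stays alive: b still owns it
    return 0;                         // "free" runs exactly once, when b dies
}
```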
![](https://secure.gravatar.com/avatar/e47adafb8e2c93d0eb0196d19e46ed19.jpg?s=120&d=mm&r=g)
On 2/28/12 4:09 PM, Russell E. Owen wrote:
I can't imagine working in C anymore and doing without exception handling and namespaces. So I'm sorry to hear that C++ is not being considered for a numpy rewrite. -- Russell
AFAIK C++ is still being considered for numpy in the future, and I think it is safe to say that a concrete implementation will be put forward for consideration at some point.
Just my own $0.02 regarding this issue: I am in favor of using C++ for numpy, I think it could confer various benefits. However, I am also in favor of explicitly deciding and documenting what subset of C++ features are acceptable for use within the numpy codebase. Bryan Van de Ven
![](https://secure.gravatar.com/avatar/95198572b00e5fbcd97fb5315215bf7a.jpg?s=120&d=mm&r=g)
On Tue, Feb 28, 2012 at 4:49 PM, Bryan Van de Ven <bryanv@continuum.io> wrote:
Just my own $0.02 regarding this issue: I am in favor of using C++ for numpy, I think it could confer various benefits. However, I am also in favor of explicitly deciding and documenting what subset of C++ features are acceptable for use within the numpy codebase.
I would *love* to see us adopt the NEP/PEP process for decisions as complex as this one. The PEP process serves the Python community very well, and I think it's an excellent balance of minimal overhead and maximum benefit for organizing the process of making complex/controversial decisions. PEP/NEPs serve a number of important purposes:
- they encourage the proponent of the idea to organize the initial presentation in a concrete, easy to follow way that can be used for decision making.
- they serve as a stable reference of the key points in a discussion, in contrast to the meandering that is normal of a mailing list thread.
- they can be updated and evolve as the discussion happens, incorporating the distilled ideas that result.
- if important new points are brought up in the discussion, the community can ensure that they are added to the NEP.
- once a decision is reached, the NEP is updated with the rationale for the decision. Whether it's acceptance or rejection, this ensures that in the future, others can come back to this document to see the reasons, avoiding repetitive discussions.
- the NEP can serve as documentation for a specific feature; we see this often in Python, where the standard docs refer to PEPs for details.
- over time, these documents build a history of the key decisions in the design of a project, in a way that is much easier to read and reason about than a random splatter of long mailing list threads.
I was offline when the long discussions on process happened a few weeks ago, and it's not my intent to dig into every point brought up there. I'm only proposing that we adopt the NEP process for complex decisions, of which the C++ shift is certainly one.
In the end, I think the NEP process will actually *help* the discussion process. It helps keep the key points on focus even as the discussion may drift in the mailing list, which means ultimately everyone wastes less energy.
I obviously can't force anyone to do this, but for what it's worth, I know that at least for IPython, I've had this in mind for a while. We haven't had any majorly contentious decisions that really need it yet, but for example I have in mind a redesign and extension of the magic system that I intend to write-up pep-style. While I suspect nobody would yell if I just went ahead and implemented it on a pull request, there are enough moving parts and new ideas that I want to gather feedback in an organized manner before proceeding with implementation. And I don't find that idea to be a burden, I actually do think it will make the whole thing go more smoothly even for me.
Just a thought... f
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
We already use the NEP process for such decisions. This discussion came simply from the *idea* of writing such a NEP. Nothing has been decided. Only opinions have been shared that might influence the NEP. This is all pretty premature, though --- migration to C++ features on a trial branch is some months away were it to happen. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 28, 2012, at 9:51 PM, Fernando Perez <fperez.net@gmail.com> wrote:
<snip>
![](https://secure.gravatar.com/avatar/95198572b00e5fbcd97fb5315215bf7a.jpg?s=120&d=mm&r=g)
On Tue, Feb 28, 2012 at 10:46 PM, Travis Oliphant <travis@continuum.io> wrote:
We already use the NEP process for such decisions. This discussion came simply from the *idea* of writing such a NEP.
Nothing has been decided. Only opinions have been shared that might influence the NEP. This is all pretty premature, though --- migration to C++ features on a trial branch is some months away were it to happen.
Sure, I know we do have neps, they live in the main numpy repo (which btw, I think they should be moved to a standalone repo to make their management independent of the core code, but that's an easy and minor point we can ignore for now). I was just thinking that this discussion is precisely the kind of thing that would be well served by being organized in a nep, before even jumping into implementation.
A nep can precisely help organize a discussion where there's enough to think about and make decisions *before* effort has gone into implementing anything. It's important not to forget that once someone goes far enough down the road of implementing something, this adds pressure to turn the implementation into a fait accompli, simply out of not wanting to throw work away.
For a decision as binary as 'rewrite the core in C++ or not', it would seem to me that organizing the problem in a NEP *before* starting to implement something in a trial branch would be precisely the way to go, and that it would actually make the decision process and discussion easier and more productive. Cheers, f
![](https://secure.gravatar.com/avatar/72902e7adf1c8f5b524c04a15cc3c6a5.jpg?s=120&d=mm&r=g)
On Tue, Feb 28, 2012 at 11:03 PM, Fernando Perez <fperez.net@gmail.com>wrote:
<snip>
The development approach I really like is to start with a relatively rough NEP, then cycle through feedback, updating the NEP, and implementation. Organizing one's thoughts to describe them in a design document can often clarify things that are confusing when just looking at code. Feedback from the community, both developers and users, can help expose where your assumptions are and often lead to insights from subjects you didn't even know about. Implementation puts those ideas through a cold, hard reality check, and can provide a hands-on experience for later rounds of feedback.
This iterative process is most important to emphasize: the design document and the code must both evolve together. Stamping a NEP as "final" before getting into code is just as bad as jumping into code without writing a preliminary design.
For the decision about adopting C++, a NEP proposing how we would go about doing it, which evolves as the community gains experience with the idea, will be very helpful. I would emphasize that the adoption of C++ does not require a rewrite. The patch required to make NumPy build with a C++ compiler is very small, and individual features of C++ can be adopted slowly, in a piecemeal fashion. What I'm advocating for is this kind of gradual evolution, and my starting point for writing a NEP would be the email I wrote here: http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060778.html
Github actually has a bug where the RST table of contents is stripped, which makes reading longer NEPs right in the repository uncomfortable. Maybe alternatives to a git repository for NEPs should be considered. I reported the bug to github, but they told me that was just how they did things. Cheers, Mark
![](https://secure.gravatar.com/avatar/95198572b00e5fbcd97fb5315215bf7a.jpg?s=120&d=mm&r=g)
On Tue, Feb 28, 2012 at 11:28 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
The development approach I really like is to start with a relatively rough NEP, then cycle through feedback, updating the NEP, and implementation. Organizing one's thoughts to describe them in a design document can often clarify things that are confusing when just looking at code. Feedback from the community, both developers and users, can help expose where your assumptions are and often lead to insights from subjects you didn't even know about. Implementation puts those ideas through a cold, hard reality check, and can provide a hands-on experience for later rounds of feedback.
This iterative process is most important to emphasize, the design document and the code must both evolve together. Stamping a NEP as "final" before getting into code is just as bad as jumping into code without writing a preliminary design.
Certainly! We're in complete agreement here. I didn't mean to suggest (though perhaps I phrased it poorly) that the nep discussion and implementation phases should be fully disjoint, since I do believe that implementation and discussion can and should inform each other.
Github actually has a bug that the RST table of contents is stripped, and this makes reading longer NEPS right in the repository uncomfortable. Maybe alternatives to a git repository for NEPs should be considered. I reported the bug to github, but they told me that was just how they did things.
That's easy to solve, and can be done with a minimum of work in a way that will make the nep-handling process far easier:
- split the neps into their own repo, and make that a repo targeted for building a website, like we do with the ipython docs for example.
- have a 'nep repo manager' who merges PRs from nep authors quickly. In practice, nep authors could even be given write access to the repo while they work on their own nep, I think we can trust people not to mess around outside their directory.
- the nep repo is source-only, and we have a nep-web repo where the *built* neps are displayed using the gh-pages mechanism.
With this, we achieve something like what python uses, with a separate and nicely formatted web version of the neps for easy reading, but in addition with the fluidity of the github workflow for source management. We already have all the pieces for this, so it would be a very easy job for someone to make it happen (~2 hours at most, would be my guess). Cheers, f
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Wed, Feb 29, 2012 at 1:46 AM, Travis Oliphant <travis@continuum.io> wrote:
We already use the NEP process for such decisions. This discussion came simply from the *idea* of writing such a NEP.
Nothing has been decided. Only opinions have been shared that might influence the NEP. This is all pretty premature, though --- migration to C++ features on a trial branch is some months away were it to happen.
Fernando can correct me if I'm wrong, but I think he was asking a governance question. That is: would you (as BDF$N) consider the following guideline: "As a condition for accepting significant changes to Numpy, for each significant change, there will be a NEP. The NEP shall follow the same model as the Python PEPs - that is - there will be a summary of the changes, the issues arising, the for / against opinions and alternatives offered. There will usually be a draft implementation. The NEP will contain the resolution of the discussion as it relates to the code" For example, the masked array NEP, although very substantial, contains little discussion of the controversy arising, or the intended resolution of the controversy: https://github.com/numpy/numpy/blob/3f685a1a990f7b6e5149c80b52436fb4207e49f5... I mean, although it is useful, it is not in the form of a PEP, as Fernando has described it. Would you accept extending the guidelines to the NEP format? Best, Matthew
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
I would like to hear the opinions of others on that point, but yes, I think that is an appropriate procedure. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 29, 2012, at 10:54 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
<snip>
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 3:24 PM, David Cournapeau <cournape@gmail.com>wrote:
On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Well, we already have code obfuscation (DOUBLE_your_pleasure, FLOAT_your_boat), so we might as well let the compiler handle it.
Yes, those are not great, but on the other hand, it is not that fundamental an issue IMO.
"Name mangling" is what I meant. But C++ does exactly the same thing, just more systematically. It's not whether it's great, it's whether the compiler or the programmer does the boring stuff. <snip> Chuck
![](https://secure.gravatar.com/avatar/72902e7adf1c8f5b524c04a15cc3c6a5.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 4:24 PM, David Cournapeau <cournape@gmail.com>wrote:
<snip>
Iterators as we have them in NumPy are something that is clearly limited by C. Writing the neighborhood iterator is the only case where I really felt that C++ *could* be a significant improvement. I use *could* because writing iterators in C++ is hard, and they will be much harder to read (I find both boost and STL - e.g. stlport -- iterators to be close to write-only code). But there is the question of how you can make C++-based iterators available in C. I would be interested in a simple example of how this could be done, ignoring all the other issues (portability, exception, etc…).
The STL is also potentially compelling, but that's where we go into my "beware of the dragons" area of C++. Portability loss, compilation time increase and warts are significant there. scipy.sparse.sparsetools has been a source of issues that was quite high compared to its proportion of scipy's code (we *do* have some hard-won experience on C++-related issues).
These standard library issues were definitely valid 10 years ago, but all the major C++ compilers have great C++98 support now. Is there a specific target platform/compiler combination you're thinking of where we can do tests on this? I don't believe the compile times are as bad as many people suspect, can you give some simple examples of things we might do in NumPy you expect to compile slower in C++ vs C? -Mark
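As one hedged answer to David's question above about making C++-based iterators available in C: the usual shape is to keep the iterator a C++ class internally and export it through an opaque handle plus extern "C" functions. A toy sketch, with every name hypothetical and exceptions ignored as David allowed:

```
// Toy sketch: a C++ iterator exposed to C through an opaque handle.
// C callers see only the declarations inside extern "C".
extern "C" {
    typedef struct npyit npyit;                  /* opaque to C */
    npyit* npyit_new(double* data, long n);
    int    npyit_next(npyit* it, double** out);  /* returns 0 when done */
    void   npyit_free(npyit* it);
}

// C++ side: the class could be arbitrarily rich; only the C shim is fixed.
struct npyit {
    double* cur;
    double* end;
};

npyit* npyit_new(double* data, long n) {
    npyit* it = new npyit;   // real code would catch std::bad_alloc here
    it->cur = data;
    it->end = data + n;
    return it;
}

int npyit_next(npyit* it, double** out) {
    if (it->cur == it->end) return 0;
    *out = it->cur++;
    return 1;
}

void npyit_free(npyit* it) { delete it; }
```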
![](https://secure.gravatar.com/avatar/af6c39d6943bd4b0e1fde23161e7bb8c.jpg?s=120&d=mm&r=g)
On Feb 19, 2012 12:09 AM, "Mark Wiebe" <mwwiebe@gmail.com> wrote:
These standard library issues were definitely valid 10 years ago, but all the major C++ compilers have great C++98 support now. Is there a specific target platform/compiler combination you're thinking of where we can do tests on this? I don't believe the compile times are as bad as many people suspect, can you give some simple examples of things we might do in NumPy you expect to compile slower in C++ vs C?
The concern may be more that this will be an issue once we start templating (scipy.sparse as an example). Compiling templates requires a lot of memory (more than with the current Heath Robinson solution). Stéfan
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
<snip>
These standard library issues were definitely valid 10 years ago, but all the major C++ compilers have great C++98 support now.
STL varies significantly between platforms, and I believe that is still the case today. Do you know the status of the STL on Blue Gene, or on small devices? We unfortunately cannot restrict ourselves to one well known implementation (e.g. STLPort).
Is there a specific target platform/compiler combination you're thinking of where we can do tests on this? I don't believe the compile times are as bad as many people suspect, can you give some simple examples of things we might do in NumPy you expect to compile slower in C++ vs C?
Switching from gcc to g++ on the same codebase should not change compilation times much. We should test, but that's not what worries me. What worries me is when we start using C++-specific code, STL and co. Today, scipy.sparse.sparsetools takes half of the build time of the whole scipy, and it does not even use fancy features. It also takes gigabytes of RAM when building in parallel. David
![](https://secure.gravatar.com/avatar/72902e7adf1c8f5b524c04a15cc3c6a5.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 3:16 AM, David Cournapeau <cournape@gmail.com>wrote:
<snip>
Is there anyone who uses a blue gene or small device which needs up-to-date numpy support, that I could talk to directly? We really need a list of supported platforms on the numpy wiki we can refer to when discussing this stuff, it all seems very nebulous to me.
Is there a specific target platform/compiler combination you're thinking of where we can do tests on this? I don't believe the compile times are as bad as many people suspect, can you give some simple examples of things we might do in NumPy you expect to compile slower in C++ vs C?
Switching from gcc to g++ on the same codebase should not change much compilation times. We should test, but that's not what worries me. What worries me is when we start using C++ specific code, STL and co. Today, scipy.sparse.sparsetools takes half of the build time of the whole scipy, and it does not even use fancy features. It also takes Gb of ram when building in parallel.
Particular styles of using templates can cause this, yes. To properly do this kind of advanced C++ library work, it's important to think about the big-O notation behavior of your template instantiations, not just the big-O notation of run-time. C++ templates have a turing-complete language (which is said to be quite similar to haskell, but spelled vastly different) running at compile time in them. This is what gives template meta-programming in C++ great power, but since templates weren't designed for this style of programming originally, template meta-programming is not very easy. Cheers, Mark
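For readers who haven't met the compile-time language Mark is referring to, a tiny illustrative instance follows (not NumPy code):

```
// The compiler evaluates this recursion while instantiating templates,
// so Factorial<10>::value is a constant before any code runs.
template <unsigned N>
struct Factorial {
    static const unsigned long value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0> {   // explicit specialization ends the recursion
    static const unsigned long value = 1;
};

// Each distinct N costs one instantiation: this is the compile-time
// "big-O" budget Mark says template library authors have to watch.
int main() {
    return Factorial<10>::value == 3628800UL ? 0 : 1;
}
```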
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 9:28 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
Is there anyone who uses a blue gene or small device which needs up-to-date numpy support, that I could talk to directly? We really need a list of supported platforms on the numpy wiki we can refer to when discussing this stuff, it all seems very nebulous to me.
They may not need an up to date numpy version now, but if stopping support for them is a requirement for C++, it must be kept in mind. I actually suspect Travis has more details on the big iron side of things. On the small side of things: http://projects.scipy.org/numpy/ticket/1969 This may not seem very useful - but that's part of what an open source project is all about in my mind.
Particular styles of using templates can cause this, yes. To properly do this kind of advanced C++ library work, it's important to think about the big-O notation behavior of your template instantiations, not just the big-O notation of run-time. C++ templates have a turing-complete language (which is said to be quite similar to haskell, but spelled vastly different) running at compile time in them. This is what gives template meta-programming in C++ great power, but since templates weren't designed for this style of programming originally, template meta-programming is not very easy.
scipy.sparse.sparsetools is quite straightforward in its usage of templates (it would be great if you could suggest improvements, BTW, e.g. scipy/sparse/sparsetools/csr.h), and does not by itself use any template meta-programming. I like that numpy can be built in a few seconds (at least without optimization), and consider this to be a useful feature. cheers, David
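For readers who haven't opened csr.h, the style David is describing is roughly the following: a simplified paraphrase for illustration, not the actual scipy source.

```
// Sketch in the style of scipy/sparse/sparsetools/csr.h: one function
// template per kernel, instantiated for each (index, value) type pair.
template <class I, class T>
void csr_matvec(const I n_row,
                const I Ap[],   // row pointers
                const I Aj[],   // column indices
                const T Ax[],   // nonzero values
                const T Xx[],   // input vector
                T Yx[])         // output vector, accumulated into
{
    for (I i = 0; i < n_row; i++) {
        T sum = Yx[i];
        for (I jj = Ap[i]; jj < Ap[i + 1]; jj++) {
            sum += Ax[jj] * Xx[Aj[jj]];
        }
        Yx[i] = sum;
    }
}

// Plain templates like this are readable; the build cost comes from
// instantiating every kernel for every supported dtype combination.
```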
![](https://secure.gravatar.com/avatar/72902e7adf1c8f5b524c04a15cc3c6a5.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 4:03 AM, David Cournapeau <cournape@gmail.com>wrote:
On Sun, Feb 19, 2012 at 9:28 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
<snip>
I took a look, and I think the reason this is so slow to compile and uses so much memory is visible as follows:
[sparsetools]$ wc *.cxx | sort -n
4039 13276 116263 csgraph_wrap.cxx
6464 21385 189537 dia_wrap.cxx
14002 45406 412262 coo_wrap.cxx
32385 102534 963688 csc_wrap.cxx
42997 140896 1313797 bsr_wrap.cxx
50041 161127 1501400 csr_wrap.cxx
149928 484624 4496947 total
That's almost 4.5MB of code, in 6 files. C/C++ compilers are not optimized to compile this sort of thing fast, they are focused on more "human-style" coding with smaller individual files. Looking at some of these SWIG-generated files, the way they dispatch based on the input Python types is bloated as well. Probably the main question I would ask is, does scipy really need sparse matrix variants for all of int8, uint8, int16, uint16, etc? Trimming away some of these might be reasonable, and would be a start to improve compile times. The reason for the slowness is not C++ templates in this example. Cheers, Mark
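One speculative way to act on that trimming, sketched under the assumption of a csr_matvec-style template like the one shown earlier: pay the instantiation cost once, in a single translation unit, so dropping a rarely used dtype becomes a one-line deletion. The file and function names here are illustrative, not scipy's actual layout:

```
// csr_instantiations.cpp (hypothetical): the only TU that instantiates
// the heavy kernels, one explicit instantiation per exposed combination.
#include "csr.h"

template void csr_matvec<int, float>(
    const int, const int[], const int[],
    const float[], const float[], float[]);

template void csr_matvec<int, double>(
    const int, const int[], const int[],
    const double[], const double[], double[]);

// ... one line per (index, value) pair actually exposed to Python.
// Other translation units declare the kernels without instantiating them
// (C++11 adds `extern template` to state this explicitly).
```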
![](https://secure.gravatar.com/avatar/3d3176cf99cae23d0ac119d1ea6c4d11.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 10:28 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
<snip>
Is there anyone who uses a blue gene or small device which needs up-to-date numpy support, that I could talk to directly? We really need a list of supported platforms on the numpy wiki we can refer to when discussing this stuff, it all seems very nebulous to me.
The list of officially supported platforms, where supported means we test and release binaries if appropriate, is short: Windows, Linux, OS X. There are many platforms which are "supported" in the form of feedback on the mailing list or Trac. This explanation is written down somewhere, not sure where right now. The best way to get an overview of those is to look at the distutils code for various compilers, and at npy_cpu.h and similar. We're not talking about expanding the number of officially supported platforms here, but not breaking those unofficially supported ones (too badly). It's possible we break those once in a while, which becomes apparent only when we get a patch a few lines long that fixes it. What should be avoided is that those few-line patches have to turn into very large patches. The most practical way to deal with this is probably to take two or three non-standard platforms/compilers, set up a buildbot on them, and when things break ensure that fixing it is not too hard. From recent history, I'd suggest AIX, an ARM device and a PathScale compiler. But the limitation is probably finding someone willing to run a buildbot.
Ralf
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 19.02.2012 10:28, Mark Wiebe wrote:
Particular styles of using templates can cause this, yes. To properly do this kind of advanced C++ library work, it's important to think about the big-O notation behavior of your template instantiations, not just the big-O notation of run-time. C++ templates have a turing-complete language (which is said to be quite similar to haskell, but spelled vastly different) running at compile time in them. This is what gives template meta-programming in C++ great power, but since templates weren't designed for this style of programming originally, template meta-programming is not very easy.
The problem with metaprogramming is that we are manually doing the work that belongs to the compiler. Blitz++ was supposed to be a library that "thought like a compiler". But then compilers just got better. Today, it is no longer possible for a numerical library programmer to outsmart an optimizing C++ compiler. All metaprogramming can do today is produce error messages no one can understand. And the resulting code will often be slower because the compiler has fewer opportunities to do its work. Sturla
![](https://secure.gravatar.com/avatar/d621fb031bd283f73762c424440766fc.jpg?s=120&d=mm&r=g)
On 19.02.2012 10:28, Mark Wiebe wrote:
Particular styles of using templates can cause this, yes. To properly do this kind of advanced C++ library work, it's important to think about the big-O behavior of your template instantiations, not just the big-O behavior of the run-time code. C++ templates contain a Turing-complete language (said to be quite similar to Haskell, but spelled vastly differently) that runs at compile time. This is what gives template meta-programming in C++ great power, but since templates weren't designed for this style of programming originally, template meta-programming is not very easy.
The problem with metaprogramming is that we are doing manually the work that belongs to the compiler. Blitz++ was supposed to be a library that "thought like a compiler". But then compilers just got better. Today, it is no longer possible for a numerical library programmer to outsmart an optimizing C++ compiler. All metaprogramming can do today is produce error messages no one can understand. And the resulting code will often be slower because the compiler has fewer opportunities to do its work.
Sturla "Today, it is no longer possible for a numerical library programmer to outsmart an optimizing C++ compiler." I'm no sure. If you want to be able to write A=B+C+D; with decent
On 02/19/2012 04:48 PM, Sturla Molden wrote: performances, I think you have to use a lib based on expression templates. It would be great if C++ compilers could automatically optimize out spurious copies into temporaries. However, I don't think the compilers are smart enough to do so...not yet. Xavier
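For concreteness, a minimal sketch of the expression-template technique Xavier refers to -- illustrative only, far simpler than what Blitz++ or Eigen actually do. The operator overloads build A = B + C + D as a compile-time expression tree and evaluate it in a single loop on assignment, with no temporary arrays:

```cpp
#include <cstddef>
#include <vector>

// CRTP base: lets operator+ match only our expression types.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy elementwise sum: nothing is computed until operator[] is called.
template <typename L, typename R>
struct Sum : Expr<Sum<L, R> > {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

struct Array : Expr<Array> {
    std::vector<double> data;
    explicit Array(std::size_t n) : data(n) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment walks the expression tree once per element, so
    // A = B + C + D allocates no temporary arrays. (The Sum nodes
    // hold references that stay valid for the full expression.)
    template <typename E>
    Array& operator=(const Expr<E>& e) {
        for (std::size_t i = 0; i < size(); ++i) data[i] = e.self()[i];
        return *this;
    }
};
// Usage: A = B + C + D builds the type Sum<Sum<Array, Array>, Array>
// at compile time and evaluates it in one pass on assignment.
```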
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 4:13 PM, xavier.gnata@gmail.com <xavier.gnata@gmail.com> wrote:
I'm not sure. If you want to be able to write A=B+C+D; with decent performance, I think you have to use a library based on expression templates. It would be great if C++ compilers could automatically optimize out spurious copies into temporaries. However, I don't think the compilers are smart enough to do so... not yet.
But isn't this all irrelevant to numpy? Numpy is basically a large collection of bare inner loops, plus a bunch of dynamic dispatch machinery to make sure that the right one gets called at the right time. Since these are exposed directly to Python, there's really no way for the compiler to optimize out spurious copies or anything like that -- even a very smart fortran-esque static compiler can't optimize complex expressions like A=B+C+D if they simply aren't present at compile time. And I guess even less-fancy C compilers will still be able to optimize simple ufunc loops pretty well. IIUC the important thing for numpy speed is the code that works out at runtime whether this particular array would benefit from a column-based or row-based strategy, chooses the right buffer sizes, etc., which isn't really something a compiler can help with. -- Nathaniel
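To make "bare inner loops plus dispatch machinery" concrete, here is a schematic imitation of a ufunc-style strided inner loop (not NumPy's actual source):

```cpp
#include <cstddef>

// Schematic of a ufunc-style strided inner loop. Operands may be
// non-contiguous, so each pointer advances by its own byte stride.
static void add_double_loop(char* out, const char* in1, const char* in2,
                            std::ptrdiff_t n, std::ptrdiff_t s_out,
                            std::ptrdiff_t s_in1, std::ptrdiff_t s_in2)
{
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        *reinterpret_cast<double*>(out) =
            *reinterpret_cast<const double*>(in1) +
            *reinterpret_cast<const double*>(in2);
        out += s_out;
        in1 += s_in1;
        in2 += s_in2;
    }
}
// A runtime dispatch layer selects a loop like this per dtype and
// memory layout; the compiler never sees B + C + D as one expression,
// which is Nathaniel's point.
```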
![](https://secure.gravatar.com/avatar/60e03bd1fd9f2dbc750e0899b9e7e71d.jpg?s=120&d=mm&r=g)
2012/2/19 Sturla Molden <sturla@molden.no>
On 19.02.2012 10:28, Mark Wiebe wrote:
Particular styles of using templates can cause this, yes. To properly do this kind of advanced C++ library work, it's important to think about the big-O behavior of your template instantiations, not just the big-O behavior of the run-time code. C++ templates contain a Turing-complete language (said to be quite similar to Haskell, but spelled vastly differently) that runs at compile time. This is what gives template meta-programming in C++ great power, but since templates weren't designed for this style of programming originally, template meta-programming is not very easy.
The problem with metaprogramming is that we are doing manually the work that belongs to the compiler. Blitz++ was supposed to be a library that "thought like a compiler". But then compilers just got better. Today, it is no longer possible for a numerical library programmer to outsmart an optimizing C++ compiler. All metaprogramming can do today is produce error messages no one can understand. And the resulting code will often be slower because the compiler has fewer opportunities to do its work.
As I've said, the compiler is pretty much stupid. It cannot do what Blitz++ did, or what Eigen is currently doing, mainly because of the difference in the base languages (C vs. C++). -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau <cournape@gmail.com> wrote:
On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
Is there a specific target platform/compiler combination you're thinking of where we can do tests on this? I don't believe the compile times are as bad as many people suspect, can you give some simple examples of things we might do in NumPy you expect to compile slower in C++ vs C?
Switching from gcc to g++ on the same codebase should not change compilation times much. We should test, but that's not what worries me. What worries me is when we start using C++-specific code, the STL and co. Today, scipy.sparse.sparsetools takes half of the build time of the whole scipy, and it does not even use fancy features. It also takes gigabytes of RAM when building in parallel.
I like C++ but it definitely does have issues with compilation times. IIRC the main problem is very simple: STL and friends (e.g. Boost) are huge libraries, and because they use templates, the entire source code is in the header files. That means that as soon as you #include a few standard C++ headers, your innocent little source file has suddenly become hundreds of thousands of lines long, and it just takes the compiler a while to churn through megabytes of source code, no matter what it is. (Effectively you recompile some significant fraction of STL from scratch on every file, and then throw it away.) Precompiled headers can help some, but require complex and highly non-portable build-system support. (E.g., gcc's precompiled header constraints are here: http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one per source file, etc.) To demonstrate: a trivial hello-world in C using <stdio.h>, versus a trivial version in C++ using <iostream>. On my laptop (gcc 4.5.2), compiling each program 100 times in a loop requires: C: 2.28 CPU seconds C compiled with C++ compiler: 4.61 CPU seconds C++: 17.66 CPU seconds Slowdown for using g++ instead of gcc: 2.0x Slowdown for using C++ standard library: 3.8x Total C++ penalty: 7.8x Lines of code compiled in each case: $ gcc -E hello.c | wc 855 2039 16934 $ g++ -E hello.cc | wc 18569 40994 437954 (I.e., the C++ hello world is almost half a megabyte.) Of course we won't be using <iostream>, but <vector>, <unordered_map> etc. all have the same basic character. -- Nathaniel (Test files attached, times were from: time sh -c 'for i in $(seq 100); do gcc hello.c -o hello-c; done' cp hello.c c-hello.cc time sh -c 'for i in $(seq 100); do g++ c-hello.cc -o c-hello-cc; done' time sh -c 'for i in $(seq 100); do g++ hello.cc -o hello-cc; done' and then summing the resulting user and system times.)
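(The attachments did not survive in the archive; the C++ test program was presumably something close to the following, the C variant being the same idea with <stdio.h> and printf:)

```cpp
// hello.cc -- presumably the shape of the benchmarked C++ program.
#include <iostream>

int main() {
    std::cout << "Hello, world!" << std::endl;
    return 0;
}
```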
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
Nathaniel Smith wrote:
On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau <cournape@gmail.com> wrote:
On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
Is there a specific target platform/compiler combination you're thinking of where we can do tests on this? I don't believe the compile times are as bad as many people suspect, can you give some simple examples of things we might do in NumPy you expect to compile slower in C++ vs C?
Switching from gcc to g++ on the same codebase should not change compilation times much. We should test, but that's not what worries me. What worries me is when we start using C++-specific code, the STL and co. Today, scipy.sparse.sparsetools takes half of the build time of the whole scipy, and it does not even use fancy features. It also takes gigabytes of RAM when building in parallel.
I like C++ but it definitely does have issues with compilation times.
IIRC the main problem is very simple: STL and friends (e.g. Boost) are huge libraries, and because they use templates, the entire source code is in the header files. That means that as soon as you #include a few standard C++ headers, your innocent little source file has suddenly become hundreds of thousands of lines long, and it just takes the compiler a while to churn through megabytes of source code, no matter what it is. (Effectively you recompile some significant fraction of STL from scratch on every file, and then throw it away.)
Precompiled headers can help some, but require complex and highly non-portable build-system support. (E.g., gcc's precompiled header constraints are here: http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one per source file, etc.)
To demonstrate: a trivial hello-world in C using <stdio.h>, versus a trivial version in C++ using <iostream>.
On my laptop (gcc 4.5.2), compiling each program 100 times in a loop requires: C: 2.28 CPU seconds C compiled with C++ compiler: 4.61 CPU seconds C++: 17.66 CPU seconds Slowdown for using g++ instead of gcc: 2.0x Slowdown for using C++ standard library: 3.8x Total C++ penalty: 7.8x
Lines of code compiled in each case: $ gcc -E hello.c | wc 855 2039 16934 $ g++ -E hello.cc | wc 18569 40994 437954 (I.e., the C++ hello world is almost half a megabyte.)
Of course we won't be using <iostream>, but <vector>, <unordered_map> etc. all have the same basic character.
-- Nathaniel
(Test files attached, times were from: time sh -c 'for i in $(seq 100); do gcc hello.c -o hello-c; done' cp hello.c c-hello.cc time sh -c 'for i in $(seq 100); do g++ c-hello.cc -o c-hello-cc; done' time sh -c 'for i in $(seq 100); do g++ hello.cc -o hello-cc; done' and then summing the resulting user and system times.)
On Fedora Linux I use ccache, which is completely transparent and makes a huge difference in build times.
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 1:42 PM, Neal Becker <ndbecker2@gmail.com> wrote:
On Fedora Linux I use ccache, which is completely transparent and makes a huge difference in build times.
ccache is fabulous (and it's fabulous for C too), but it only helps when 'make' has screwed up and decided to rebuild some file that didn't really need rebuilding, or when doing a clean build (which is more or less the same thing, if you think about it). -- Nathaniel
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 4:42 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Sun, Feb 19, 2012 at 1:42 PM, Neal Becker <ndbecker2@gmail.com> wrote:
On Fedora Linux I use ccache, which is completely transparent and makes a huge difference in build times.
ccache is fabulous (and it's fabulous for C too), but it only helps when 'make' has screwed up and decided to rebuild some file that didn't really need rebuilding, or when doing a clean build (which is more or less the same thing, if you think about it).
For Numpy, there are also other things going on. My clean builds finish in about 30 seconds using one CPU; not-so-clean builds take longer. Chuck
![](https://secure.gravatar.com/avatar/72902e7adf1c8f5b524c04a15cc3c6a5.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 5:25 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
Is there a specific target platform/compiler combination you're thinking of where we can do tests on this? I don't believe the compile times are as bad as many people suspect, can you give some simple examples of things we might do in NumPy you expect to compile slower in C++ vs C?
On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau <cournape@gmail.com> wrote:
Switching from gcc to g++ on the same codebase should not change compilation times much. We should test, but that's not what worries me. What worries me is when we start using C++-specific code, the STL and co. Today, scipy.sparse.sparsetools takes half of the build time of the whole scipy, and it does not even use fancy features. It also takes gigabytes of RAM when building in parallel.
I like C++ but it definitely does have issues with compilation times.
IIRC the main problem is very simple: STL and friends (e.g. Boost) are huge libraries, and because they use templates, the entire source code is in the header files. That means that as soon as you #include a few standard C++ headers, your innocent little source file has suddenly become hundreds of thousands of lines long, and it just takes the compiler a while to churn through megabytes of source code, no matter what it is. (Effectively you recompile some significant fraction of STL from scratch on every file, and then throw it away.)
Precompiled headers can help some, but require complex and highly non-portable build-system support. (E.g., gcc's precompiled header constraints are here: http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one per source file, etc.)
This doesn't look too bad, I think it would be worth setting these up in NumPy. The complexity you see is because it's pretty close to the only way that precompiled headers could be set up.
To demonstrate: a trivial hello-world in C using <stdio.h>, versus a trivial version in C++ using <iostream>.
On my laptop (gcc 4.5.2), compiling each program 100 times in a loop requires: C: 2.28 CPU seconds C compiled with C++ compiler: 4.61 CPU seconds C++: 17.66 CPU seconds Slowdown for using g++ instead of gcc: 2.0x Slowdown for using C++ standard library: 3.8x Total C++ penalty: 7.8x
Lines of code compiled in each case: $ gcc -E hello.c | wc 855 2039 16934 $ g++ -E hello.cc | wc 18569 40994 437954 (I.e., the C++ hello world is almost half a megabyte.)
Of course we won't be using <iostream>, but <vector>, <unordered_map> etc. all have the same basic character.
Thanks for doing the benchmark. It is a bit artificial, however, and when I tried these trivial examples with -O0 and -O2, the difference (in gcc 4.7) in the C++ compile time was about 4%. In NumPy as it presently stands in C, the difference between -O0 and -O2 is very significant, and any comparisons need to take this kind of thing into account. When I said I thought the compile-time differences would be smaller than many people expect, I was thinking about how this optimization phase, which is shared between C and C++, often dominates the compile times. Cheers, Mark
-- Nathaniel
(Test files attached, times were from: time sh -c 'for i in $(seq 100); do gcc hello.c -o hello-c; done' cp hello.c c-hello.cc time sh -c 'for i in $(seq 100); do g++ c-hello.cc -o c-hello-cc; done' time sh -c 'for i in $(seq 100); do g++ hello.cc -o hello-cc; done' and then summing the resulting user and system times.)
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 7:13 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
On Sun, Feb 19, 2012 at 5:25 AM, Nathaniel Smith <njs@pobox.com> wrote:
Precompiled headers can help some, but require complex and highly non-portable build-system support. (E.g., gcc's precompiled header constraints are here: http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one per source file, etc.)
This doesn't look too bad, I think it would be worth setting these up in NumPy. The complexity you see is because it's pretty close to the only way that precompiled headers could be set up.
Sure, so long as you know what headers every file needs. (Or more likely, figure out a more-or-less complete set of all the headers you might ever need, and then -include that into every file.)
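A minimal sketch of that scheme with gcc; the consolidated header npy_pch.hpp, its contents, and the file names are hypothetical:

```cpp
// npy_pch.hpp -- hypothetical consolidated header to precompile once.
// It pulls in every standard header the project is expected to use.
#include <vector>
#include <unordered_map>
#include <algorithm>
#include <string>

// Build steps with gcc (the one-PCH-per-source-file constraint applies):
//   g++ -x c++-header npy_pch.hpp -o npy_pch.hpp.gch   # precompile once
//   g++ -include npy_pch.hpp -c somefile.cc            # reuse everywhere
// gcc picks up npy_pch.hpp.gch automatically when the header is the
// first include, subject to the constraints in the gcc docs above.
```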
To demonstrate: a trivial hello-world in C using <stdio.h>, versus a trivial version in C++ using <iostream>.
On my laptop (gcc 4.5.2), compiling each program 100 times in a loop requires: C: 2.28 CPU seconds C compiled with C++ compiler: 4.61 CPU seconds C++: 17.66 CPU seconds Slowdown for using g++ instead of gcc: 2.0x Slowdown for using C++ standard library: 3.8x Total C++ penalty: 7.8x
Lines of code compiled in each case: $ gcc -E hello.c | wc 855 2039 16934 $ g++ -E hello.cc | wc 18569 40994 437954 (I.e., the C++ hello world is almost half a megabyte.)
Of course we won't be using <iostream>, but <vector>, <unordered_map> etc. all have the same basic character.
Thanks for doing the benchmark. It is a bit artificial, however, and when I tried these trivial examples with -O0 and -O2, the difference (in gcc 4.7) in the C++ compile time was about 4%. In NumPy as it presently stands in C, the difference between -O0 and -O2 is very significant, and any comparisons need to take this kind of thing into account. When I said I thought the compile-time differences would be smaller than many people expect, I was thinking about how this optimization phase, which is shared between C and C++, often dominates the compile times.
Sure -- but the effective increased code-size for STL-using C++ affects the optimizer too; it's effectively re-optimizing all the used parts of STL again for each source file. (Presumably in this benchmark that half megabyte of extra code is mostly unused, and therefore getting thrown out before the optimizer does any work on it -- but that doesn't happen if you're actually using the library!) Maybe things have gotten better in the last year or two, I dunno; if you run a better benchmark I'll listen. But there's an order-of-magnitude difference in compile times between most real-world C projects and most real-world C++ projects. It might not be a deal-breaker and it might not apply for subset of C++ you're planning to use, but AFAICT that's the facts. -- Nathaniel
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 20.02.2012 00:39, Nathaniel Smith wrote:
But there's an order-of-magnitude difference in compile times between most real-world C projects and most real-world C++ projects. It might not be a deal-breaker and it might not apply for subset of C++ you're planning to use, but AFAICT that's the facts.
This is mainly a complaint about the build process. Maybe make or distutils are broken, I don't know. But with a sane build tool (e.g. MS Visual Studio or Eclipse) this is not a problem. You just recompile the file you are working with, not the rest (unless you do a clean build). Sturla
![](https://secure.gravatar.com/avatar/af6c39d6943bd4b0e1fde23161e7bb8c.jpg?s=120&d=mm&r=g)
On Feb 19, 2012 4:14 PM, "Sturla Molden" <sturla@molden.no> wrote:
On 20.02.2012 00:39, Nathaniel Smith wrote:
But there's an order-of-magnitude difference in compile times between most real-world C projects and most real-world C++ projects. It might not be a deal-breaker and it might not apply for subset of C++ you're planning to use, but AFAICT that's the facts.
This is mainly a complaint about the build-process.
This has nothing to do with the build process. More complex languages take longer to compile. The benchmark shown is also entirely independent of build system. Stéfan
![](https://secure.gravatar.com/avatar/d98bd91ed2cea43056594b7ce5369b17.jpg?s=120&d=mm&r=g)
In the language wars, I have one question. Why is Fortran not being considered? Fortran already implements many of the features that we want in NumPy:
- slicing and similar operations, at least some of the fancy indexing kind
- element-wise array operations and function calls
- array bounds-checking and other debugging aid (with debugging flags)
- arrays that mentally map very well onto numpy arrays. To me, this spells +1 to ease of contribution, over some abstract C/C++ template
- in newer standards it has some nontrivial mathematical functions: gamma, bessel, etc. that numpy lacks right now
- compilers that are good at optimizing for floating-point performance, because that's what Fortran is all about
- not Fortran as such, but BLAS and LAPACK are easily accessed by Fortran
- possibly other numerical libraries that can be helpful
- Fortran has, in its newer standards, thought of C interoperability. We could still keep bits of the code in C (or even C++?) if we'd like to, or perhaps f2py/Cython could do the wrapping.
- some programmers know Fortran better than C++. Fortran is at least used by many science guys, like me.
Until someone comes along with actual numbers or at least anecdotal evidence, I don't think the "more programmers know X than Y" argument is too interesting. Personally I've learned both, and Fortran is much more accessible than C++ (to me) if you're used to the "work with (numpy) arrays" mentality.
As far as I can understand, implementing element-wise operations, slicing, and a host of other NumPy features is in some sense pointless - the Fortran compiler authors have already done it for us. Of course some nice wrapping will be needed in C, Cython, f2py, or similar. Since my understanding is limited, I'd be interested in being proved wrong, though :) Paul
![](https://secure.gravatar.com/avatar/da3a0a1942fbdc5ee9a9b8115ac5dae7.jpg?s=120&d=mm&r=g)
On 20.02.2012 08:35, Paul Anton Letnes wrote:
In the language wars, I have one question. Why is Fortran not being considered?
Fortran is OK for simple numerical algorithms, but starts to suck heavily if you need to do any string handling, I/O, complicated logic, or data structures. Most of the work in the NumPy implementation is not actually in numerics, but in figuring out the correct operation to dispatch the computations to. So, this is one reason why Fortran is not considered. -- Pauli Virtanen
![](https://secure.gravatar.com/avatar/af6c39d6943bd4b0e1fde23161e7bb8c.jpg?s=120&d=mm&r=g)
On Mon, Feb 20, 2012 at 1:54 AM, Pauli Virtanen <pav@iki.fi> wrote:
On 20.02.2012 08:35, Paul Anton Letnes wrote:
In the language wars, I have one question. Why is Fortran not being considered?
Fortran is OK for simple numerical algorithms, but starts to suck heavily if you need to do any string handling, I/O, complicated logic, or data structures.
Out of curiosity, is this still true for the latest Fortran versions? I guess there the problem may be compiler support over various platforms. Stéfan
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Mon, Feb 20, 2012 at 2:54 AM, Pauli Virtanen <pav@iki.fi> wrote:
On 20.02.2012 08:35, Paul Anton Letnes wrote:
In the language wars, I have one question. Why is Fortran not being considered?
Fortran is OK for simple numerical algorithms, but starts to suck heavily if you need to do any string handling, I/O, complicated logic, or data structures.
Most of the work in the NumPy implementation is not actually in numerics, but in figuring out the correct operation to dispatch the computations to. So, this is one reason why Fortran is not considered.
There also used to be a problem with unsigned types not being available. I don't know if that is still the case. Chuck
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 20.02.2012 12:43, Charles R Harris wrote:
There also used to be a problem with unsigned types not being available. I don't know if that is still the case.
Fortran -- like Python and Java -- does not have built-in unsigned integer types. It is never really a problem though. One can e.g. use a longer integer or keep them in an array of bytes. (Fortran 2003 is OOP so it is possible to define one if needed. Not saying it is a good idea.) Sturla
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 20.02.2012 10:54, Pauli Virtanen wrote:
Fortran is OK for simple numerical algorithms, but starts to suck heavily if you need to do any string handling, I/O, complicated logic, or data structures
For string handling, C is actually worse than Fortran. In Fortran a string can be sliced like in Python. It is not as nice as Python, but far better than C. Fortran's built-in I/O syntax is archaic, but the ISO C bindings in Fortran 2003 mean one can use other means of I/O (POSIX, the Win API, C stdio) in a portable way. Sturla
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 20.02.2012 08:35, Paul Anton Letnes wrote:
In the language wars, I have one question. Why is Fortran not being considered? Fortran already implements many of the features that we want in NumPy:
Yes ... but it does not make Fortran a systems programming language. Making NumPy is different from using it.
- slicing and similar operations, at least some of the fancy indexing kind - element-wise array operations and function calls - array bounds-checking and other debugging aid (with debugging flags)
That is nice for numerical computing, but not really needed to make NumPy.
- arrays that mentally map very well onto numpy arrays. To me, this spells +1 to ease of contribution, over some abstract C/C++ template
Mentally perhaps, but not binary. NumPy needs uniformly strided memory on the binary level. Fortran just gives this at the mental level. E.g. there is nothing that dictates a Fortran pointer has to be a view, the compiler is free to employ copy-in copy-out. In Fortran, a function call can invalidate a pointer. One would therefore have to store the array in an array of integer*1, and use the intrinsic function transfer() to parse the contents into NumPy dtypes.
- in newer standards it has some nontrivial mathematical functions: gamma, bessel, etc. that numpy lacks right now
That belongs to SciPy.
- compilers that are good at optimizing for floating-point performance, because that's what Fortran is all about
Insanely good, but not when we start to do the (binary, not mentally) strided access that NumPy needs. (Not that C compilers would be any better.)
- not Fortran as such, but BLAS and LAPACK are easily accessed by Fortran - possibly other numerical libraries that can be helpful - Fortran has, in its newer standards, thought of C interoperability. We could still keep bits of the code in C (or even C++?) if we'd like to, or perhaps f2py/Cython could do the wrapping.
Not f2py, as it depends on NumPy.
- some programmers know Fortran better than C++. Fortran is at least used by many science guys, like me.
That is a valid argument. Fortran is also much easier to read and debug. Sturla
![](https://secure.gravatar.com/avatar/d98bd91ed2cea43056594b7ce5369b17.jpg?s=120&d=mm&r=g)
On 20. feb. 2012, at 16:29, Sturla Molden wrote:
On 20.02.2012 08:35, Paul Anton Letnes wrote:
In the language wars, I have one question. Why is Fortran not being considered? Fortran already implements many of the features that we want in NumPy:
Yes ... but it does not make Fortran a systems programming language. Making NumPy is different from using it.
- slicing and similar operations, at least some of the fancy indexing kind - element-wise array operations and function calls - array bounds-checking and other debugging aid (with debugging flags)
That is nice for numerical computing, but not really needed to make NumPy.
- arrays that mentally map very well onto numpy arrays. To me, this spells +1 to ease of contribution, over some abstract C/C++ template
Mentally perhaps, but not binary. NumPy needs uniformly strided memory on the binary level. Fortran just gives this at the mental level. E.g. there is nothing that dictates a Fortran pointer has to be a view, the compiler is free to employ copy-in copy-out. In Fortran, a function call can invalidate a pointer. One would therefore have to store the array in an array of integer*1, and use the intrinsic function transfer() to parse the contents into NumPy dtypes.
- in newer standards it has some nontrivial mathematical functions: gamma, bessel, etc. that numpy lacks right now
That belongs to SciPy.
I don't see exactly why. Why should numpy have exponential but not gamma functions? The division seems kinda arbitrary. Not that I am arguing violently for bessel functions in numpy.
- compilers that are good at optimizing for floating-point performance, because that's what Fortran is all about
Insanely good, but not when we start to do the (binary, not mentally) strided access that NumPy needs. (Not that C compilers would be any better.)
- not Fortran as such, but BLAS and LAPACK are easily accessed by Fortran - possibly other numerical libraries that can be helpful - Fortran has, in its newer standards, thought of C interoperability. We could still keep bits of the code in C (or even C++?) if we'd like to, or perhaps f2py/Cython could do the wrapping.
Not f2py, as it depends on NumPy.
- some programmers know Fortran better than C++. Fortran is at least used by many science guys, like me.
That is a valid argument. Fortran is also much easier to read and debug.
Sturla
Thanks for an excellent answer, Sturla - very informative indeed. Paul.
![](https://secure.gravatar.com/avatar/764323a14e554c97ab74177e0bce51d4.jpg?s=120&d=mm&r=g)
On Mon, Feb 20, 2012 at 19:55, Paul Anton Letnes <paul.anton.letnes@gmail.com> wrote:
On 20. feb. 2012, at 16:29, Sturla Molden wrote:
- in newer standards it has some nontrivial mathematical functions: gamma, bessel, etc. that numpy lacks right now
That belongs to SciPy.
I don't see exactly why. Why should numpy have exponential but not gamma functions? The division seems kinda arbitrary. Not that I am arguing violently for bessel functions in numpy.
The semi-arbitrary dividing line that we have settled on is C99. If a special function is in the C99 standard, we'll accept an implementation for it in numpy. Part (well, most) of the rationale is just to have a clear dividing line even if it's fairly arbitrary. The other part is that if a decidedly non-mathematically-focused standard like C99 includes a special function in its standard library, then odds are good that it's something that is widely used enough as a building block for other things. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 20.02.2012 08:35, Paul Anton Letnes wrote:
As far as I can understand, implementing element-wise operations, slicing, and a host of other NumPy features is in some sense pointless - the Fortran compiler authors have already done it for us.
Only if you know the array dimensions in advance. Sturla
![](https://secure.gravatar.com/avatar/60e03bd1fd9f2dbc750e0899b9e7e71d.jpg?s=120&d=mm&r=g)
2012/2/19 Nathaniel Smith <njs@pobox.com>
On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
Is there a specific target platform/compiler combination you're thinking of where we can do tests on this? I don't believe the compile times are as bad as many people suspect, can you give some simple examples of things we might do in NumPy you expect to compile slower in C++ vs C?
On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau <cournape@gmail.com> wrote:
Switching from gcc to g++ on the same codebase should not change compilation times much. We should test, but that's not what worries me. What worries me is when we start using C++-specific code, the STL and co. Today, scipy.sparse.sparsetools takes half of the build time of the whole scipy, and it does not even use fancy features. It also takes gigabytes of RAM when building in parallel.
I like C++ but it definitely does have issues with compilation times.
IIRC the main problem is very simple: STL and friends (e.g. Boost) are huge libraries, and because they use templates, the entire source code is in the header files. That means that as soon as you #include a few standard C++ headers, your innocent little source file has suddenly become hundreds of thousands of lines long, and it just takes the compiler a while to churn through megabytes of source code, no matter what it is. (Effectively you recompile some significant fraction of STL from scratch on every file, and then throw it away.)
In fact Boost tries to be clean about this. Until a few GCC minor releases ago, their headers were a mess: when you included something, a lot of additional code was brought in, and the compile time exploded. But this is no longer the case. If we restrict the core to a few includes, even with templates, it should not take long to compile. -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher
![](https://secure.gravatar.com/avatar/09939f25b639512a537ce2c90f77f958.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 2:45 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett <matthew.brett@gmail.com>wrote:
Hi,
On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi.
On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire <cjordan1@uw.edu> wrote:
On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire <cjordan1@uw.edu> wrote: > On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden <sturla@molden.no> > wrote: >> >> >> On 18 Feb 2012 at 05:01, Jason Grout >> <jason-sage@creativetrax.com> wrote: >> >>> On 2/17/12 9:54 PM, Sturla Molden wrote: >>>> We would have to write a C++ programming tutorial that is based on Python knowledge instead of C knowledge. >>> >>> I personally would love such a thing. It's been a while since I did >>> anything nontrivial on my own in C++. >>> >> >> One example: How do we code multiple return values? >> >> In Python: >> - Return a tuple. >> >> In C: >> - Use pointers (evilness) >> >> In C++: >> - Return a std::tuple, as you would in Python. >> - Use references, as you would in Fortran or Pascal. >> - Use pointers, as you would in C. >> >> C++ textbooks always pick the last... >> >> I would show the first and the second method, and perhaps >> intentionally forget the last. >> >> Sturla >>
> On the flip side, cython looked pretty...but I didn't get the > performance gains I wanted, and had to spend a lot of time figuring > out if it was cython, needing to add types, buggy support for numpy, > or actually the algorithm.
At the time, was the numpy support buggy? I personally haven't had many problems with Cython and numpy.
It's not that the support WAS buggy, it's that it wasn't clear to me what was going on and where my performance bottleneck was. Even after microbenchmarking with ipython, using timeit and prun, and using the cython code visualization tool. Ultimately I don't think it was cython, so perhaps my comment was a bit unfair. But it was unfortunately difficult to verify that. Of course, as you say, diagnosing and solving such issues would become easier to resolve with more cython experience.
> The C files generated by cython were > enormous and difficult to read. They really weren't meant for human > consumption.
Yes, it takes some practice to get used to what Cython will do, and how to optimize the output.
> As Sturla has said, regardless of the quality of the > current product, it isn't stable.
I've personally found it more or less rock solid. Could you say what you mean by "it isn't stable"?
I just meant what Sturla said, nothing more:
"Cython is still 0.16, it is still unfinished. We cannot base NumPy on an unfinished compiler."
Y'all mean, it has a zero at the beginning of the version number and it is still adding new features? Yes, that is correct, but it seems more reasonable to me to phrase that as 'active development' rather than 'unstable', because they take considerable care to be backwards compatible, have a large automated Cython test suite, and a major stress-tester in the Sage test suite.
Matthew,
No one in their right mind would build a large performance library using Cython; it just isn't the right tool. For what it was designed for - wrapping existing C code or writing small and simple things close to Python - it does very well, but it was never designed for making core C/C++ libraries, and in that role it just gets in the way.
I believe the proposal is to refactor the lowest levels in pure C and move the some or most of the library superstructure to Cython.
Go for it.
Chuck
Just a couple of quick questions: 1.) What is the status of the refactoring that was done for IronPython a couple of years ago? The last I heard, the branches diverged too much for merging the work back into numpy. Are there lessons that can be learned from that experience that can be applied to whatever happens next? 2.) My personal preference is an incremental refactor over to C++ using the STL, however, I have to be realistic. First, the exception issue is problematic (unsolvable? I don't know). Second, one of Numpy/Scipy's greatest strengths is the relative ease it has in interfacing with BLAS, ATLAS, mkl and other optimizations. Will this still be possible from a C++ (or anything else) core? Third, I am only familiar with the STL on gcc. Are there any subtle differences in implementations of the STL in MSVC or any other compilers? Pointers are hard to mess up in cross-platform ways. 3.) Will memory-mapped arrays still be possible after the refactor? I am not familiar with the implementation, but I am a big netcdf/hdf user and mem-mapped arrays are important to me. 4.) Wouldn't depending on Cython create a circular dependency? Can you build Cython without numpy-devel? (I never tried. I have only used packaged Cython.) Also, because Cython generates code to compile, is there a possibility of producing different ABIs depending upon the combinations of numpy and cython versions (even if unintentional)? How difficult will it be for distro maintainers to package numpy and its extensions? How difficult will it be for users of Macs and Windows who may try combining different versions? Honest questions, because I have never had more than a cursory exposure to Cython. Ben Root
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 18.02.2012 22:25, Benjamin Root wrote:
2.) My personal preference is an incremental refactor over to C++ using STL, however, I have to be realistic. First, the exception issue is problematic (unsolvable? I don't know). Second, one of Numpy/Scipy's greatest strengths is the relative ease it has in interfacing with BLAS, ATLAS, mkl and other optimizations. Will this still be possible from a C++ (or anything else) core?
Yes.
Third, I am only familiar with the STL on gcc. Are there any subtle differences in implementations of the STL in MSVC or any other compilers? Pointers are hard to mess up in cross-platform ways.
NumPy should stay with the standard, whether C or C++, and not be written for one particular compiler. Writing code that depends on a set of known bugs in one implementation is why IE6 almost broke the internet.
3.) Will memory-mapped arrays still be possible after the refactor? I am not familiar with the implementation, but I am a big netcdf/hdf user and mem-mapped arrays are important to me.
Yes, that depends on the operating system, not the programming language. Sturla
![](https://secure.gravatar.com/avatar/723b49f8d57b46f753cc4097459cbcdb.jpg?s=120&d=mm&r=g)
On 02/18/2012 12:35 PM, Charles R Harris wrote:
On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett <matthew.brett@gmail.com <mailto:matthew.brett@gmail.com>> wrote:
Hi.
On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire <cjordan1@uw.edu <mailto:cjordan1@uw.edu>> wrote: > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett <matthew.brett@gmail.com <mailto:matthew.brett@gmail.com>> wrote: >> Hi, >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire >> <cjordan1@uw.edu <mailto:cjordan1@uw.edu>> wrote: >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden <sturla@molden.no <mailto:sturla@molden.no>> wrote: >>>> >>>> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout <jason-sage@creativetrax.com <mailto:jason-sage@creativetrax.com>>: >>>> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: >>>>>> We would have to write a C++ programming tutorial that is based on Pyton knowledge instead of C knowledge. >>>>> >>>>> I personally would love such a thing. It's been a while since I did >>>>> anything nontrivial on my own in C++. >>>>> >>>> >>>> One example: How do we code multiple return values? >>>> >>>> In Python: >>>> - Return a tuple. >>>> >>>> In C: >>>> - Use pointers (evilness) >>>> >>>> In C++: >>>> - Return a std::tuple, as you would in Python. >>>> - Use references, as you would in Fortran or Pascal. >>>> - Use pointers, as you would in C. >>>> >>>> C++ textbooks always pick the last... >>>> >>>> I would show the first and the second method, and perhaps intentionally forget the last. >>>> >>>> Sturla >>>> >> >>> On the flip side, cython looked pretty...but I didn't get the >>> performance gains I wanted, and had to spend a lot of time figuring >>> out if it was cython, needing to add types, buggy support for numpy, >>> or actually the algorithm. >> >> At the time, was the numpy support buggy? I personally haven't had >> many problems with Cython and numpy. >> > > It's not that the support WAS buggy, it's that it wasn't clear to me > what was going on and where my performance bottleneck was. Even after > microbenchmarking with ipython, using timeit and prun, and using the > cython code visualization tool. Ultimately I don't think it was > cython, so perhaps my comment was a bit unfair. But it was > unfortunately difficult to verify that. Of course, as you say, > diagnosing and solving such issues would become easier to resolve with > more cython experience. > >>> The C files generated by cython were >>> enormous and difficult to read. They really weren't meant for human >>> consumption. >> >> Yes, it takes some practice to get used to what Cython will do, and >> how to optimize the output. >> >>> As Sturla has said, regardless of the quality of the >>> current product, it isn't stable. >> >> I've personally found it more or less rock solid. Could you say what >> you mean by "it isn't stable"? >> > > I just meant what Sturla said, nothing more: > > "Cython is still 0.16, it is still unfinished. We cannot base NumPy on > an unfinished compiler."
Y'all mean, it has a zero at the beginning of the version number and it is still adding new features? Yes, that is correct, but it seems more reasonable to me to phrase that as 'active development' rather than 'unstable', because they take considerable care to be backwards compatible, have a large automated Cython test suite, and a major stress-tester in the Sage test suite.
Matthew,
No one in their right mind would build a large performance library using Cython; it just isn't the right tool. For what it was designed for - wrapping existing C code or writing small and simple things close to Python - it does very well, but it was never designed for making core C/C++ libraries, and in that role it just gets in the way.
+1. Even I, who have contributed to Cython, realize this; last autumn I implemented a library by writing it in C and wrapping it in Cython. Dag
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, again (sorry), On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire <cjordan1@uw.edu> wrote:
On the broader topic of recruitment...sure, cython has a lower barrier to entry than C++. But there are many, many more C++ developers and resources out there than cython resources. And it likely will stay that way for quite some time.
On the other hand, in the current development community around numpy, and among the subscribers to this mailing list, I suspect there is more Cython experience than C++ experience. Of course it might be that so-far undiscovered C++ developers are drawn to a C++ rewrite of Numpy. But is that really likely? I can see a C++ developer being drawn to a C++ performance library they would use in their C++ applications, but it's harder for me to imagine a C++ programmer being drawn to a Python library because the internals are C++. Best, Matthew
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On 18 Feb 2012 06:18, "Christopher Jordan-Squire" <cjordan1@uw.edu> wrote:
On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden <sturla@molden.no> wrote:
On 18 Feb 2012 at 05:01, Jason Grout <jason-sage@creativetrax.com> wrote:
On 2/17/12 9:54 PM, Sturla Molden wrote:
We would have to write a C++ programming tutorial that is based on Python knowledge instead of C knowledge.
I personally would love such a thing. It's been a while since I did anything nontrivial on my own in C++.
One example: How do we code multiple return values?
In Python: - Return a tuple.
In C: - Use pointers (evilness)
In C++: - Return a std::tuple, as you would in Python. - Use references, as you would in Fortran or Pascal. - Use pointers, as you would in C.
C++ textbooks always pick the last...
I would show the first and the second method, and perhaps intentionally forget the last.
Sturla
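A minimal illustration of the first two styles Sturla lists, assuming C++11 for std::tuple/std::tie; the function names are made up for the example:

```cpp
#include <tuple>

// Style 1: return a std::tuple, as in Python.
std::tuple<double, double> minmax_tuple(const double* x, int n) {
    double lo = x[0], hi = x[0];
    for (int i = 1; i < n; ++i) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    return std::make_tuple(lo, hi);
}

// Style 2: output parameters by reference, as in Fortran or Pascal.
void minmax_ref(const double* x, int n, double& lo, double& hi) {
    lo = hi = x[0];
    for (int i = 1; i < n; ++i) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
}

// Callers unpack the tuple much like a Python tuple:
//   double lo, hi;
//   std::tie(lo, hi) = minmax_tuple(x, n);
//   minmax_ref(x, n, lo, hi);  // the reference style
```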
I can add my own 2 cents about cython vs. C vs. C++, based on summer coding experiences.
I was an intern at Enthought, sharing an office with Mark W. (Which was a treat. I recommend you all quit your day jobs and haunt whatever office Mark is inhabiting.) I was trying to optimize some code and that lead to experimenting with both cython and C.
Dealing with the C internals of numpy was frustrating. Since C doesn't have templating but numpy kind of needs it, Python scripts instead go over the source and manually perform the templating. Not the most obvious thing. There were other issues in the background--including that C doesn't allow for abstraction (i.e. easy-to-read code), lots of pointer-fu is required, and the C API is lightly documented and already plenty difficult.
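To make the contrast concrete: NumPy's C sources are written once as templates in .c.src files and a Python script expands them per dtype, whereas a C++ function template lets the compiler do that expansion. A minimal sketch (illustrative; the function name is hypothetical, and this is not NumPy's actual code):

```cpp
#include <cstddef>

// Written once; the compiler stamps out a copy per element type,
// which is the work the .c.src preprocessing scripts do for the C code.
template <typename T>
void add_loop(T* out, const T* a, const T* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Explicit instantiations: each line generates the same specialized
// function the script-based templating would have emitted.
template void add_loop<float>(float*, const float*, const float*, std::size_t);
template void add_loop<double>(double*, const double*, const double*, std::size_t);
```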
Please understand that the argument is not to maintain the status quo. Lack of API documentation and internals that need significant work are certainly issues. I fail to see how writing in C++ will solve the documentation issues. On the abstraction side of things, let's agree to disagree. Plenty of complex projects are written in both languages, which makes this a mostly subjective matter.
On the flip side, cython looked pretty...but I didn't get the performance gains I wanted, and had to spend a lot of time figuring out if it was cython, needing to add types, buggy support for numpy, or actually the algorithm. The C files generated by cython were enormous and difficult to read. They really weren't meant for human consumption. As Sturla has said, regardless of the quality of the current product, it isn't stable.
Sturla represents only himself on this issue. Cython is widely held as a successful and very useful tool. Many more projects in the scipy community use Cython compared to C++. And even if it looks friendly
there's magic going on under the hood. Magic means it's hard to diagnose and fix problems. At least one very smart person has told me they find cython most useful for wrapping C/C++ libraries and exposing them to python, which is a far cry from library writing. (Of course Wes McKinney, a cython evangelist, uses it all over his pandas library.)
I am not very smart, but this is certainly close to what I had in mind as well :) As you know, the lack of clear abstraction between the C core and the C-Python wrapping is one of the major issues in numpy. Cython is certainly one of the most capable tools out there to avoid tedious reference-counting bug chasing.
In comparison, there are a number of high quality, performant, open-source C++ based array libraries out there with very friendly API's. Things like eigen (http://eigen.tuxfamily.org/index.php?title=Main_Page) and Armadillo (http://arma.sourceforge.net/). They seem to have plenty of users and more devs than
eigen is a typical example of code I hope numpy will never be close to. This is again quite subjective, but it also shows that we have quite different ideas on what maintainable/readable code means. Which is of course quite alright. But it means a choice needs to be made. If a majority of people find eigen more readable than a well-written C library, then I don't think anyone can reasonably argue against going to C++.
On the broader topic of recruitment...sure, cython has a lower barrier to entry than C++. But there are many, many more C++ developers and resources out there than cython resources. And it likely will stay that way for quite some time.
I may not have explained it very well: my whole point is that we don't recruit people, where I understand recruit as hiring full-time, professional programmers. We need more people who can casually spend a few hours - typically grad students, scientists with an itch. There is no doubt that more professional programmers know C++ compared to C. But a community project like numpy has different requirements than a "professional" project. David
-Chris
![](https://secure.gravatar.com/avatar/7e9e53dbe9781722d56e308c32387078.jpg?s=120&d=mm&r=g)
On 02/17/2012 09:55 PM, David Cournapeau wrote:
I may not have explained it very well: my whole point is that we don't recruit people, where I understand recruit as hiring full-time, professional programmers. We need more people who can casually spend a few hours - typically grad students, scientists with an itch. There is no doubt that more professional programmers know C++ compared to C. But a community project like numpy has different requirements than a "professional" project.
My sense from the thread so far is that the C++ push is part of the new vision, in which numpy will make the transition to a more "professional" level, with paid developers, and there will no longer be the expectation that "grad students, scientists with an itch" will dive into the innermost guts of the code. The guts will be more like Qt or AGG or 0MQ--solid, documented libraries that just work (I think--I don't really know that much about these examples), so we can take them for granted and worry about other things instead. If that can be accomplished, it is certainly more than fine with me; and if the best way to accomplish that is with C++, so be it. Eric
![](https://secure.gravatar.com/avatar/8f93a1426788786f93677912ee4ec672.jpg?s=120&d=mm&r=g)
On Fri, Feb 17, 2012 at 11:55 PM, David Cournapeau <cournape@gmail.com> wrote:
On 18 Feb 2012 06:18, "Christopher Jordan-Squire" <cjordan1@uw.edu> wrote:
On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden <sturla@molden.no> wrote:
On 18 Feb 2012 at 05:01, Jason Grout <jason-sage@creativetrax.com> wrote:
On 2/17/12 9:54 PM, Sturla Molden wrote:
We would have to write a C++ programming tutorial that is based on Python knowledge instead of C knowledge.
I personally would love such a thing. It's been a while since I did anything nontrivial on my own in C++.
One example: How do we code multiple return values?
In Python: - Return a tuple.
In C: - Use pointers (evilness)
In C++: - Return a std::tuple, as you would in Python. - Use references, as you would in Fortran or Pascal. - Use pointers, as you would in C.
C++ textbooks always pick the last...
I would show the first and the second method, and perhaps intentionally forget the last.
Sturla
I can add my own 2 cents about cython vs. C vs. C++, based on summer coding experiences.
I was an intern at Enthought, sharing an office with Mark W. (Which was a treat. I recommend you all quit your day jobs and haunt whatever office Mark is inhabiting.) I was trying to optimize some code and that lead to experimenting with both cython and C.
Dealing with the C internals of numpy was frustrating. Since C doesn't have templating but numpy kind of needs it, Python scripts instead go over the source and manually perform the templating. Not the most obvious thing. There were other issues in the background--including that C doesn't allow for abstraction (i.e. easy-to-read code), lots of pointer-fu is required, and the C API is lightly documented and already plenty difficult.
Please understand that the argument is not to maintain the status quo.
Lack of API documentation and internals that need significant work are certainly issues. I fail to see how writing in C++ will solve the documentation issues.
On the abstraction side of things, let's agree to disagree. Plenty of complex projects are written in both languages, which makes this a mostly subjective matter.
On the flip side, cython looked pretty...but I didn't get the performance gains I wanted, and had to spend a lot of time figuring out if it was cython, needing to add types, buggy support for numpy, or actually the algorithm. The C files generated by cython were enormous and difficult to read. They really weren't meant for human consumption. As Sturla has said, regardless of the quality of the current product, it isn't stable.
Sturla represents only himself on this issue. Cython is widely held as a successful and very useful tool. Many more projects in the scipy community use Cython compared to C++.
And even if it looks friendly
there's magic going on under the hood. Magic means it's hard to diagnose and fix problems. At least one very smart person has told me they find cython most useful for wrapping C/C++ libraries and exposing them to python, which is a far cry from library writing. (Of course Wes McKinney, a cython evangelist, uses it all over his pandas library.)
I am not very smart, but this is certainly close to what I had in mind as well :) As you know, the lack of clear abstraction between the C core and the C-Python wrapping is one of the major issues in numpy. Cython is certainly one of the most capable tools out there to avoid tedious reference-counting bug chasing.
In comparison, there are a number of high quality, performant, open-source C++ based array libraries out there with very friendly API's. Things like eigen (http://eigen.tuxfamily.org/index.php?title=Main_Page) and Armadillo (http://arma.sourceforge.net/). They seem to have plenty of users and more devs than
eigen is a typical example of code I hope numpy will never be close to. This is again quite subjective, but it also shows that we have quite different ideas on what maintainable/readable code means. Which is of course quite alright. But it means a choice needs to be made. If a majority of people find eigen more readable than a well-written C library, then I don't think anyone can reasonably argue against going to C++.
Fair point, obviously. I haven't dug into eigen's internals much. I just like their performance benchmarks and API. <joke> Also their cute owl mascot, but I suppose that's not a meaningful standard for future coding practices. </joke>
On the broader topic of recruitment...sure, cython has a lower barrier to entry than C++. But there are many, many more C++ developers and resources out there than cython resources. And it likely will stay that way for quite some time.
I may not have explained it very well: my whole point is that we don't recruit people, where I understand recruit as hiring full-time, professional programmers. We need more people who can casually spend a few hours - typically grad students, scientists with an itch. There is no doubt that more professional programmers know C++ compared to C. But a community project like numpy has different requirements than a "professional" project.
I'm not sure you really mean casually spend a few *hours*, but I get your point. It's important for people to be able to add onto it incrementally as an off-hours hobby. But for itches to scratch, is numpy the realistic place for scientists and grad students to go? As opposed to one of the extension packages, like scipy, sklearn, etc.? If anywhere is going to be more akin to a "professional" project, code-style wise, it seems like the numpy core is the place to do it. -Chris
David
-Chris
![](https://secure.gravatar.com/avatar/18d7c4503b713c388142c34a10e26082.jpg?s=120&d=mm&r=g)
On 18/02/12 04:54, Sturla Molden wrote:
This is not true. C++ can be much easier, particularly for those who already know Python. The problem: C++ textbooks teach C++ as a subset of C. Writing C in C++ just adds the complexity of C++ on top of C, for no good reason. I can write FORTRAN in any language, it does not mean it is a good idea. We would have to start by teaching people to write good C++. E.g., always use the STL like Python built-in types if possible. Dynamic memory should be std::vector, not new or malloc. Pointers should be replaced with references. We would have to write a C++ programming tutorial that is based on Python knowledge instead of C knowledge.
Hello Sturla, unrelated to the NumPy rewrite debate, can you please suggest some resources you think can be used to learn how to program C++ "the proper way"? Thank you. Cheers, -- Daniele
![](https://secure.gravatar.com/avatar/60e03bd1fd9f2dbc750e0899b9e7e71d.jpg?s=120&d=mm&r=g)
2012/2/20 Daniele Nicolodi <daniele@grinta.net>
On 18/02/12 04:54, Sturla Molden wrote:
This is not true. C++ can be much easier, particularly for those who already know Python. The problem: C++ textbooks teach C++ as a subset of C. Writing C in C++ just adds the complexity of C++ on top of C, for no good reason. I can write FORTRAN in any language; it does not mean it is a good idea. We would have to start by teaching people to write good C++. E.g., always use the STL like Python built-in types if possible. Dynamic memory should be std::vector, not new or malloc. Pointers should be replaced with references. We would have to write a C++ programming tutorial that is based on Python knowledge instead of C knowledge.
Hello Sturla,
unrelated to the NumPy rewrite debate, can you please suggest some resources you think can be used to learn how to program C++ "the proper way"?
One of the best books may be "Accelerated C++", or Stroustrup's new book (not "The C++ Programming Language"). Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 20.02.2012 20:14, Daniele Nicolodi wrote:
Hello Sturla, unrelated to the NumPy rewrite debate, can you please suggest some resources you think can be used to learn how to program C++ "the proper way"? Thank you. Cheers,
This is totally OT on this list, however ... Scott Meyers' books have been mentioned. Also look at some literature on the STL (e.g., Josuttis). Getting the Boost library is essential as well. The Qt library has many examples of beautiful C++.

But the most important part, in my opinion, is to put the "C with classes" mentality away. Look at C++ as compiled Python or Java. The STL (the standard C++ library) has classes that do the same as the types we use in Python --- there are parallels to tuple, dict, set, list, deque, etc. The STL is actually richer than Python. Just use them the way we use Python. With C++11 (the latest standard), even for loops can be like Python. There are lambdas and closures, to be used as in Python, and there is an 'auto' keyword for type inference; you don't have to declare the type of a variable, the compiler will figure it out.

Don't use new[] just because you can, when there is std::vector that behaves like a Python list. If you need to allocate a resource, wrap it in a class. Allocate from the constructor and deallocate from the destructor. That way an exception cannot cause a resource leak, and the clean-up code will be called automatically when the object falls off the stack. If you need to control the lifetime of an object, make an inner block with curly brackets, and declare it at the top of the block. Don't call new and delete to control where you want it to be allocated and deallocated. Nothing goes on the heap unless the STL puts it there. Always put objects on the stack; never allocate to a pointer with new. Always use references, and forget about pointers. This has to do with putting the "C with classes" mentality away. Always implement a copy constructor so the classes work with the STL.

    std::vector<double> x(n);                              // ok
    void foobar(std::vector<double>& x);                   // ok

    double* x = new double[n];                             // bad
    std::vector<double>* x = new std::vector<double>(n);   // bad
    void foobar(std::vector<double>* x);                   // bad

If you get any textbook on Windows programming from Microsoft Press, you have an excellent resource on what not to do: verbose function and field names, Hungarian notation, factories instead of constructors, etc. If you find yourself using macros or template magic to avoid the overhead of a virtual function (MFC, ATL, wxWidgets, FOX) at the expense of readability, you are probably doing something you shouldn't. COM is probably the worst example I know of; just compare the beautiful OpenGL to Direct3D. VTK is another example of what I consider ugly C++. But that's just my opinion. Sturla
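To make the resource-wrapping advice above concrete, here is a minimal sketch of the RAII idiom it describes (the `File` class and its members are illustrative, not an existing API; it wraps the standard C `fopen`/`fclose`):

```cpp
#include <cstdio>
#include <stdexcept>

// RAII: acquire the resource in the constructor, release it in the
// destructor. Clean-up runs automatically, even if an exception unwinds.
class File {
public:
    explicit File(const char *path) : fp_(std::fopen(path, "rb")) {
        if (!fp_) throw std::runtime_error("cannot open file");
    }
    ~File() { std::fclose(fp_); }
    std::FILE *get() const { return fp_; }
private:
    std::FILE *fp_;
    File(const File&);             // non-copyable: owns a raw resource
    File& operator=(const File&);  // (C++03 style; C++11 would use = delete)
};

void use(const char *path) {
    File f(path);   // on the stack, no new/delete anywhere
    // ... work with f.get(); an exception here still closes the file ...
}                   // file closed here, when f falls off the stack
```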
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 20.02.2012 21:12, Sturla Molden wrote:
If you need to control the lifetime of an object, make an inner block with curly brackets, and declare it at the top of the block. Don't call new and delete to control where you want it to be allocated and deallocated. Nothing goes on the heap unless the STL puts it there.
Here is an example:

    // bad
    Foo *bar = new Foo();
    <suite>
    delete bar;

    // ok
    {
        Foo bar;
        <suite>
    }

Remember that C++ does not allow a "finally" clause in exception handling. You cannot do this:

    try {
        Foo *bar = new Foo();
        <suite>
    } finally {      // syntax error
        delete bar;
    }

So...

    Foo *bar = new Foo();
    try {
        <suite>
    } catch (...) {
    }
    // might not get here: possible resource leak
    delete bar;

Which is why we should always do this:

    {
        Foo bar;
        <suite>
    }

This is perhaps the most common source of errors in C++ code. If we use C++ in the NumPy core, we need strict discipline against these types of obscure errors. Sturla
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
It's great advice to avoid using new and instead rely on scope and classes such as std::vector. I just want to point out that sometimes objects must outlive scope. For those cases, std::shared_ptr can be helpful.
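For illustration, a minimal sketch of that escape hatch (std::shared_ptr is C++11; in 2012 the same class was also available via std::tr1 or Boost):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// The vector lives on the heap but is reference-counted: it is freed
// when the last shared_ptr to it disappears, even though it outlives
// the scope that created it.
std::shared_ptr<std::vector<double> > make_data(std::size_t n) {
    return std::make_shared<std::vector<double> >(n);
}

int main() {
    std::shared_ptr<std::vector<double> > data = make_data(1000);
    std::shared_ptr<std::vector<double> > alias = data;  // refcount == 2
    data.reset();   // refcount == 1; still alive through 'alias'
    return 0;
}                   // refcount == 0; vector deallocated here
```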
![](https://secure.gravatar.com/avatar/3d3176cf99cae23d0ac119d1ea6c4d11.jpg?s=120&d=mm&r=g)
On Thu, Feb 16, 2012 at 11:39 PM, Travis Oliphant <travis@continuum.io>wrote:
Mark Wiebe and I have been discussing off and on (as well as talking with Charles) a good way forward to balance two competing desires:
* addition of new features that are needed in NumPy
* improving the code-base generally and moving towards a more maintainable NumPy
I know there are loud voices for just focusing on the second of these and avoiding the first until we have finished that. I recognize the need to improve the code base, but I will also be pushing for improvements to the feature-set and user experience in the process.
As a result, I am proposing a rough outline for releases over the next year:
* NumPy 1.7 to come out as soon as the serious bugs can be eliminated. Bryan, Francesc, Mark, and I are able to help triage some of those.
* NumPy 1.8 to come out in July, which will have as many ABI-compatible feature enhancements as we can add while improving test coverage and code cleanup. I will post more details of what we plan to address with it to this list later. Candidates for inclusion:
  * resolving the NA/missing-data issues
  * finishing group-by
  * incorporating the start of label arrays
  * incorporating a meta-object
  * a few new dtypes (variable-length string, variable-length unicode, and an enum type)
  * adding ufunc support for flexible dtypes and possibly structured arrays
  * allowing generalized ufuncs to work on more kinds of arrays besides just contiguous
  * improving the ability for NumPy to receive JIT-generated function pointers for ufuncs and other calculation opportunities
  * adding "filters" to Input and Output
  * simple computed fields for dtypes
  * accepting a Data-Type specification as a class or JSON file
  * work towards improving the dtype-addition mechanism
For some of these things it's not entirely (or at all, what's a meta-object?) clear to me what they mean or how they would work. How do you plan to go about working on these features? One NEP per feature? Ralf
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
Yes. Basically, one NEP per feature. Some of them might be merged. The NEP will be an outline and overview, and then fleshed out as the code is developed in a branch. Some of the NEPs will be more detailed than others at first, of course. I just wanted to provide a preview of the kind of things I see needed in the code. The details will emerge in the coming weeks and months. Thanks, Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 18, 2012, at 3:46 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Thu, Feb 16, 2012 at 11:39 PM, Travis Oliphant <travis@continuum.io> wrote: Mark Wiebe and I have been discussing off and on (as well as talking with Charles) a good way forward to balance two competing desires:
* addition of new features that are needed in NumPy
* improving the code-base generally and moving towards a more maintainable NumPy
I know there are loud voices for just focusing on the second of these and avoiding the first until we have finished that. I recognize the need to improve the code base, but I will also be pushing for improvements to the feature-set and user experience in the process.
As a result, I am proposing a rough outline for releases over the next year:
* NumPy 1.7 to come out as soon as the serious bugs can be eliminated. Bryan, Francesc, Mark, and I are able to help triage some of those.
* NumPy 1.8 to come out in July, which will have as many ABI-compatible feature enhancements as we can add while improving test coverage and code cleanup. I will post more details of what we plan to address with it to this list later. Candidates for inclusion:
  * resolving the NA/missing-data issues
  * finishing group-by
  * incorporating the start of label arrays
  * incorporating a meta-object
  * a few new dtypes (variable-length string, variable-length unicode, and an enum type)
  * adding ufunc support for flexible dtypes and possibly structured arrays
  * allowing generalized ufuncs to work on more kinds of arrays besides just contiguous
  * improving the ability for NumPy to receive JIT-generated function pointers for ufuncs and other calculation opportunities
  * adding "filters" to Input and Output
  * simple computed fields for dtypes
  * accepting a Data-Type specification as a class or JSON file
  * work towards improving the dtype-addition mechanism
For some of these things it's not entirely (or at all, what's a meta-object?) clear to me what they mean or how they would work. How do you plan to go about working on these features? One NEP per feature?
Ralf
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
* NumPy 1.8 to come out in July, which will have as many ABI-compatible feature enhancements as we can add while improving test coverage and code cleanup. I will post more details of what we plan to address with it to this list later. Candidates for inclusion:
  * resolving the NA/missing-data issues
  * finishing group-by
  * incorporating the start of label arrays
  * incorporating a meta-object
  * a few new dtypes (variable-length string, variable-length unicode, and an enum type)
  * adding ufunc support for flexible dtypes and possibly structured arrays
  * allowing generalized ufuncs to work on more kinds of arrays besides just contiguous
  * improving the ability for NumPy to receive JIT-generated function pointers for ufuncs and other calculation opportunities
  * adding "filters" to Input and Output
  * simple computed fields for dtypes
  * accepting a Data-Type specification as a class or JSON file
  * work towards improving the dtype-addition mechanism
For some of these things it's not entirely (or at all, what's a meta-object?) clear to me what they mean or how they would work. How do you plan to go about working on these features? One NEP per feature?
I thought I responded to this already, but it might have been from a different mail server....

Yes, these will each be discussed in due course as they are developed. I just wanted to get an outline started. More detail will come out on each feature as development proceeds. There is a larger list of features that we will be suggesting and discussing in the months ahead as NumPy 2.0 development is proposed and discussed. But this list includes things that are fairly straightforward to implement in the current data-model and calculation infrastructure.

There is a lot of criticism of the C-code, which is welcome. I wrote *a lot* of that code --- inspired by and following patterns laid out by other people. I am always interested in specific improvement ideas and/or proposals, as are most people. I especially appreciate targeted, constructive comments and not just general FUD.

There has been some criticism of the C-API documentation. After I gave away the content of my book, Guide to NumPy, 3 years ago, Joe Harrington and others adapted it to the web. The C-API portion was documented in my book (see page 211 and following at http://www.tramy.us/numpybook.pdf). This material is now available online as well (where it has received updates and improvements): http://docs.scipy.org/doc/numpy/reference/c-api.array.html

There are under-documented sections of the code --- usually these are in areas where adoption has driven demand for an understanding of those features (adding new dtypes and array scalars, for example). In addition, there are always improvements to be made to the way something is said and described (and there are different ways people like to be taught).

The C/C++ discussion is just getting started. Everyone should keep in mind that this is not something that is going to happen quickly. This will be a point of discussion throughout the year. I'm not a huge supporter of C++, but C++11 does look like it's made some nice progress, and as I think about making a core set of NumPy into a library that can be called by multiple languages (and even multiple implementations of Python), tempered C++ seems like it might be an appropriate way to go. Cython could be useful for Python interfaces to that core and for extension modules on top, but Cython is *not* a solution for the core of NumPy. It was entertained as we did the IronPython work, but we realized it would have taken too long. I'm actually quite glad that we didn't go that direction, now. Cython is a nice project, and I think it will play a role in the stack that emerges, but I am more interested in an eventual NumPy core that does not rely on the Python C-API.

Another thing that I would like to see happen for NumPy 1.8 is the use of bento by default for the build --- and encouraging down-stream projects to use it as well. We should deprecate as much of numpy.distutils as possible, in my mind. What happens during the build is pretty hard to understand, partly because distutils never really supported building complex extension modules --- that community is still pretty hostile to the needs of extension writers with a real build problem on their hands. We have gotten by with numpy.distutils, but it has not been the easiest thing to adapt.

-Travis
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Sat, Feb 18, 2012 at 1:57 PM, Travis Oliphant <travis@continuum.io> wrote:
The C/C++ discussion is just getting started. Everyone should keep in mind that this is not something that is going to happen quickly. This will be a point of discussion throughout the year. I'm not a huge supporter of C++, but C++11 does look like it's made some nice progress, and as I think about making a core set of NumPy into a library that can be called by multiple languages (and even multiple implementations of Python), tempered C++ seems like it might be an appropriate way to go.
Could you say more about this? Do you have any idea when the decision about C++ is likely to be made? At what point does it make most sense to make the argument for or against? Can you suggest a good way for us to be able to make more substantial arguments either way? Can you say a little more about your impression of the previous Cython refactor and why it was not successful? Thanks a lot, Matthew
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
On Feb 18, 2012, at 4:03 PM, Matthew Brett wrote:
Hi,
On Sat, Feb 18, 2012 at 1:57 PM, Travis Oliphant <travis@continuum.io> wrote:
The C/C++ discussion is just getting started. Everyone should keep in mind that this is not something that is going to happen quickly. This will be a point of discussion throughout the year. I'm not a huge supporter of C++, but C++11 does look like it's made some nice progress, and as I think about making a core set of NumPy into a library that can be called by multiple languages (and even multiple implementations of Python), tempered C++ seems like it might be an appropriate way to go.
Could you say more about this? Do you have any idea when the decision about C++ is likely to be made? At what point does it make most sense to make the argument for or against? Can you suggest a good way for us to be able to make more substantial arguments either way?
I think early arguments against are always appropriate --- if you believe they have a chance of swaying Mark or Chuck, who are the strongest supporters of C++ at this point.

I will be quite nervous about going crazy with C++. It was suggested that I use C++ 7 years ago when I wrote NumPy. I didn't go that route then, largely because of compiler issues, ABI concerns, and the fact that I knew C better than C++, so I felt it would have taken me longer to do something in C++. I made the right decision for me. If you think my C code is horrible, you would have been completely offended by whatever C++ I might have written at the time. But I basically agree with Chuck that there is a lot of C code in NumPy, and template-based code, that is really trying to be C++ spelled differently.

The decision will not be made until NumPy 2.0 work is farther along. The most likely outcome is that Mark will develop something quite nice in C++, which he is already toying with, and we will either choose to use it in NumPy to build 2.0 on --- or not. I'm interested in sponsoring Mark and working as closely as I can with him and Chuck to see what emerges.

I'm reading very carefully any arguments against using C++ because I've actually pushed back on Mark pretty hard as we've discussed these things over the past months. I am nervous about corner use-cases that will be unpleasant for some groups and some platforms. But that vague nervousness is not enough to discount the clear benefits. I'm curious about the state of C++ compilers for Blue Gene and other big-iron machines as well. My impression is that most of them use g++, which has pretty good support for C++. David and others raised some important concerns (mixing multiple compilers seems like the biggest issue --- it already is...).

If someone out there seriously opposes judicious and careful use of C++ and can show a clear reason why it would be harmful --- feel free to speak up at any time. We are leaning that way, with Mark out in front of us leading the charge.
Can you say a little more about your impression of the previous Cython refactor and why it was not successful?
Sure. This list actually deserves a long writeup about that.

First, there wasn't a "Cython refactor" of NumPy. There was a Cython refactor of SciPy. I'm not sure of its current status. I'm still very supportive of that sort of thing. I don't know if Cython ever solved the "raising an exception in a Fortran-called callback" issue. I used setjmp and longjmp in several places in SciPy originally, in order to enable exceptions raised in a Python callback that is wrapped in a C function pointer and handed to a Fortran routine that asks for a function pointer.

What happened in NumPy was that the code was re-factored to become a library. I don't think much NumPy code actually ended up in Cython (the random-number generators have been in Cython from the beginning).

The biggest problem with merging the code was that Mark Wiebe got active at about that same time :-) He ended up changing several things in the code-base that made it difficult to merge in the changes. Some of the bug fixes, memory-leak patches, and tests did get into the code-base, but the essential creation of the NumPy library did not make it. There was some very good work done that I hope we can still take advantage of.

Another factor: the decision to make an extra layer of indirection makes small arrays that much slower. I agree with Mark that in a core library we need to go the other way, with small arrays being completely allocated in the data structure itself (reducing the number of pointer dereferences).

So, Cython did not play a major role on the NumPy side of things. It played a very nice role on the SciPy side of things. -Travis
Thanks a lot,
Matthew
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 18.02.2012 23:54, Travis Oliphant wrote:
Another factor: the decision to make an extra layer of indirection makes small arrays that much slower. I agree with Mark that in a core library we need to go the other way, with small arrays being completely allocated in the data structure itself (reducing the number of pointer dereferences).
I am not sure there is much overhead to

    double *const data = (double *)PyArray_DATA(array);

If C code calls PyArray_DATA(array) more often than needed, the fix is not to store the data inside the struct, but rather to fix the real problem. For example, the Cython syntax for NumPy arrays will, under the hood, unbox the ndarray struct into local variables; that gives the fastest data access. The NumPy core could, e.g., have macros that take care of the unboxing.

But for the purpose of cache use, it could be smart to make sure the data buffer is allocated directly after the PyObject struct (or at least in its vicinity), so it will be loaded into cache along with the PyObject --- that is, prefetched before dereferencing PyArray_DATA(array). But with respect to placement we must keep in mind that the PyObject can be subclassed. Putting e.g. 4 kB of static buffer space inside the PyArrayObject struct would bloat every ndarray. Sturla
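As a concrete illustration of the unboxing point, a hedged sketch against the NumPy C-API (assumes a 1-d array of doubles; `sum1d` is a hypothetical helper, not NumPy code):

```cpp
#include <numpy/arrayobject.h>

// Hoist the data pointer, length, and stride into locals once, before
// the hot loop, instead of going through the PyArrayObject each pass.
static double sum1d(PyArrayObject *array)
{
    const char *data = (const char *)PyArray_DATA(array);
    const npy_intp n = PyArray_DIM(array, 0);
    const npy_intp stride = PyArray_STRIDE(array, 0);  // in bytes

    double acc = 0.0;
    for (npy_intp i = 0; i < n; ++i)
        acc += *(const double *)(data + i * stride);
    return acc;
}
```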
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 10:54 PM, Travis Oliphant <travis@continuum.io> wrote:
I'm reading very carefully any arguments against using C++ because I've actually pushed back on Mark pretty hard as we've discussed these things over the past months. I am nervous about corner use-cases that will be unpleasant for some groups and some platforms. But that vague nervousness is not enough to discount the clear benefits. I'm curious about the state of C++ compilers for Blue Gene and other big-iron machines as well. My impression is that most of them use g++, which has pretty good support for C++. David and others raised some important concerns (mixing multiple compilers seems like the biggest issue --- it already is...). If someone out there seriously opposes judicious and careful use of C++ and can show a clear reason why it would be harmful --- feel free to speak up at any time. We are leaning that way, with Mark out in front of us leading the charge.
I don't oppose it, but I admit I'm not really clear on what the supposed advantages would be. Everyone seems to agree that

-- Only a carefully-chosen subset of C++ features should be used
-- But this subset would be pretty useful

I wonder if anyone is actually thinking of the same subset :-).

Chuck mentioned iterators as one advantage. I don't understand, since iterators aren't even a C++ feature, they're just objects with "next" and "dereference" operators. The only difference between these is spelling:

    for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... }

    for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i); my_iter_next(&i)) { ... }

So I assume he's thinking about something more, but the discussion has been too high-level for me to figure out what.

Using C++ templates to generate ufunc loops is an obvious application, but again, in the simple examples I'm thinking of (e.g., the stuff in numpy/core/src/umath/loops.c.src), this pretty much comes down to whether we want to spell the function names like "SHORT_add" or "add<short>", and write the code like "((T *)x)[0] + ((T *)y)[0]" or "((@TYPE@ *)x)[0] + ((@TYPE@ *)y)[0]". Maybe there are other places where we'd get some advantage from the compiler knowing what was going on, like if we're doing type-based dispatch to overloaded functions, but I don't know if that'd be useful for the templates we actually use.

RAII is pretty awesome, and RAII smart-pointers might help a lot with getting reference-counting right. OTOH, you really only need RAII if you're using exceptions; otherwise, the goto-failure pattern usually works pretty well, esp. if used systematically.

Do we know that the Python memory allocator plays well with the C++ allocation interfaces on all relevant systems? (Potentially you have to know for every pointer whether it was allocated by new, new[], malloc, or PyMem_Malloc, because they all have different deallocation functions. This is already an issue for malloc versus PyMem_Malloc, but C++ makes it worse.)

Again, it really doesn't matter to me personally which approach is chosen. But getting more concrete might be useful... -- Nathaniel
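For what it's worth, the template spelling of such a loop would look roughly like this (a sketch, not actual NumPy code; the signature loosely mirrors the 1-d strided loops the .c.src generator emits):

```cpp
#include <cstddef>

// One function template replaces the whole @TYPE@_add family that the
// .c.src code generator expands into SHORT_add, INT_add, and so on.
template <typename T>
void add_loop(char *in1, char *in2, char *out, std::size_t n,
              std::ptrdiff_t is1, std::ptrdiff_t is2, std::ptrdiff_t os)
{
    for (std::size_t i = 0; i < n; ++i) {
        *(T *)out = *(T *)in1 + *(T *)in2;
        in1 += is1;
        in2 += is2;
        out += os;
    }
}

// "SHORT_add" becomes "add_loop<short>"; the body is written only once.
template void add_loop<short>(char *, char *, char *, std::size_t,
                              std::ptrdiff_t, std::ptrdiff_t, std::ptrdiff_t);
template void add_loop<double>(char *, char *, char *, std::size_t,
                               std::ptrdiff_t, std::ptrdiff_t, std::ptrdiff_t);
```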
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 5:12 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Feb 18, 2012 at 10:54 PM, Travis Oliphant <travis@continuum.io> wrote:
I'm reading very carefully any arguments against using C++ because I've actually pushed back on Mark pretty hard as we've discussed these things over the past months. I am nervous about corner use-cases that will be unpleasant for some groups and some platforms. But that vague nervousness is not enough to discount the clear benefits. I'm curious about the state of C++ compilers for Blue Gene and other big-iron machines as well. My impression is that most of them use g++, which has pretty good support for C++. David and others raised some important concerns (mixing multiple compilers seems like the biggest issue --- it already is...). If someone out there seriously opposes judicious and careful use of C++ and can show a clear reason why it would be harmful --- feel free to speak up at any time. We are leaning that way, with Mark out in front of us leading the charge.
I don't oppose it, but I admit I'm not really clear on what the supposed advantages would be. Everyone seems to agree that

-- Only a carefully-chosen subset of C++ features should be used
-- But this subset would be pretty useful

I wonder if anyone is actually thinking of the same subset :-).
Chuck mentioned iterators as one advantage. I don't understand, since iterators aren't even a C++ feature, they're just objects with "next" and "dereference" operators. The only difference between these is spelling:

    for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... }

    for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i); my_iter_next(&i)) { ... }

So I assume he's thinking about something more, but the discussion has been too high-level for me to figure out what.
They are classes, data with methods in one cute little bundle.
Using C++ templates to generate ufunc loops is an obvious application, but again, in the simple examples I'm thinking of (e.g., the stuff in numpy/core/src/umath/loops.c.src), this pretty much comes down to whether we want to spell the function names like "SHORT_add" or "add<short>", and write the code like "((T *)x)[0] + ((T *)y)[0]" or "((@TYPE@ *)x)[0] + ((@TYPE@ *)y)[0]". Maybe there are other places where we'd get some advantage from the compiler knowing what was going on, like if we're doing type-based dispatch to overloaded functions, but I don't know if that'd be useful for the templates we actually use.
RAII is pretty awesome, and RAII smart-pointers might help a lot with getting reference-counting right. OTOH, you really only need RAII if you're using exceptions; otherwise, the goto-failure pattern usually works pretty well, esp. if used systematically.
That's more like having destructors. Let the compiler do it; part of useful code abstraction is to hide those sorts of sordid details.
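For readers who haven't seen it, the goto-failure pattern mentioned above looks roughly like this (a C-style sketch, also valid C++; `compute` is a hypothetical function):

```cpp
#include <cstddef>
#include <cstdlib>

int compute(std::size_t n)
{
    int status = -1;
    double *buf1 = NULL;
    double *buf2 = NULL;

    buf1 = (double *)std::malloc(n * sizeof(double));
    if (buf1 == NULL) goto fail;
    buf2 = (double *)std::malloc(n * sizeof(double));
    if (buf2 == NULL) goto fail;

    /* ... do the work ... */
    status = 0;

fail:
    /* single exit point: all clean-up happens here; free(NULL) is a no-op */
    std::free(buf2);
    std::free(buf1);
    return status;
}
```

RAII gets the same guarantee from destructors instead of by convention.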
Do we know that the Python memory allocator plays well with the C++ allocation interfaces on all relevant systems? (Potentially you have to know for every pointer whether it was allocated by new, new[], malloc, or PyMem_Malloc, because they all have different deallocation functions. This is already an issue for malloc versus PyMem_Malloc, but C++ makes it worse.)
I think the low-level library will ignore the Python memory allocator, but there is a template parameter for allocators that makes them selectable.
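For instance, a minimal C++11-style allocator routed through Python's PyMem_* functions might look like the following (a hypothetical sketch of how the allocation policy stays a compile-time choice, not code from Mark's library):

```cpp
#include <Python.h>
#include <cstddef>
#include <new>
#include <vector>

template <typename T>
struct PyMemAllocator {
    typedef T value_type;

    PyMemAllocator() {}
    template <typename U> PyMemAllocator(const PyMemAllocator<U>&) {}

    T *allocate(std::size_t n) {
        void *p = PyMem_Malloc(n * sizeof(T));
        if (p == NULL) throw std::bad_alloc();
        return static_cast<T *>(p);
    }
    void deallocate(T *p, std::size_t) { PyMem_Free(p); }
};

template <typename T, typename U>
bool operator==(const PyMemAllocator<T>&, const PyMemAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const PyMemAllocator<T>&, const PyMemAllocator<U>&) { return false; }

// Whether a container uses PyMem or plain new is a template parameter:
typedef std::vector<double, PyMemAllocator<double> > pymem_vector;
```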
Again, it really doesn't matter to me personally which approach is chosen. But getting more concrete might be useful...
Agreed. I think much will be clarified once there is some actual code to look at. Chuck
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 19.02.2012 01:12, Nathaniel Smith wrote:
I don't oppose it, but I admit I'm not really clear on what the supposed advantages would be. Everyone seems to agree that -- Only a carefully-chosen subset of C++ features should be used -- But this subset would be pretty useful I wonder if anyone is actually thinking of the same subset :-).
Probably not; everybody has their own favourite subset.
Chuck mentioned iterators as one advantage. I don't understand, since iterators aren't even a C++ feature, they're just objects with "next" and "dereference" operators. The only difference between these is spelling:

    for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... }

    for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i); my_iter_next(&i)) { ... }

So I assume he's thinking about something more, but the discussion has been too high-level for me to figure out what.
C++11 has this option:

    for (auto& item : container) {
        // iterate over the container object,
        // get a reference to each item
        //
        // "container" can be an STL class or
        // a C-style array with known size
    }

Which does this:

    for item in container:
        pass
Using C++ templates to generate ufunc loops is an obvious application, but again, in the simple examples
Template metaprogramming? Don't even think about it. It is brain dead to try to outsmart the compiler. Sturla
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
Sturla Molden wrote:
On 19.02.2012 01:12, Nathaniel Smith wrote:
I don't oppose it, but I admit I'm not really clear on what the supposed advantages would be. Everyone seems to agree that

-- Only a carefully-chosen subset of C++ features should be used
-- But this subset would be pretty useful

I wonder if anyone is actually thinking of the same subset :-).
Probably not; everybody has their own favourite subset.
Chuck mentioned iterators as one advantage. I don't understand, since iterators aren't even a C++ feature, they're just objects with "next" and "dereference" operators. The only difference between these is spelling:

    for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... }

    for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i); my_iter_next(&i)) { ... }

So I assume he's thinking about something more, but the discussion has been too high-level for me to figure out what.
I find the range interface (e.g., Boost.Range) far more useful than the raw iterator interface. I always write all my algorithms using this abstraction.
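For example, the two styles side by side (assuming Boost is available; Boost.Range forwards to the standard algorithms underneath):

```cpp
#include <algorithm>
#include <vector>
#include <boost/range/algorithm/sort.hpp>

int main() {
    std::vector<int> v;
    v.push_back(3);
    v.push_back(1);
    v.push_back(2);

    std::sort(v.begin(), v.end());  // raw iterator interface
    boost::sort(v);                 // range interface: pass the container itself
    return 0;
}
```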
![](https://secure.gravatar.com/avatar/60e03bd1fd9f2dbc750e0899b9e7e71d.jpg?s=120&d=mm&r=g)
C++11 has this option:
    for (auto& item : container) {
        // iterate over the container object,
        // get a reference to each item
        //
        // "container" can be an STL class or
        // a C-style array with known size
    }

Which does this:

    for item in container:
        pass
It is even better than using the macro way because the compiler knows everything is constant (start and end), so it can do better things.
Using C++ templates to generate ufunc loops is an obvious application, but again, in the simple examples
Template metaprogramming?
Don't even think about it. It is brain dead to try to outsmart the compiler.
It is really easy to outsmart the compiler. Really. I use metaprogramming for loop creation to optimize cache behavior and communication in parallel environments, and there is no way the compiler would have done things as efficiently (and there is still a lot of leeway to enhance my code). -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher
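As a toy illustration of that kind of metaprogramming (a sketch, not Matthieu's actual code): recursion on an integer template parameter unrolls an inner loop completely at compile time, so the block size becomes a constant the optimizer can schedule around.

```cpp
// y[i] += a * x[i] for a block of N elements, unrolled at compile time.
template <int N>
struct Unroll {
    static void axpy(const double *x, double *y, double a) {
        y[N - 1] += a * x[N - 1];
        Unroll<N - 1>::axpy(x, y, a);  // expanded recursively by the compiler
    }
};

template <>
struct Unroll<0> {
    static void axpy(const double *, double *, double) {}  // recursion ends
};

// A blocked loop can then use the compile-time block size:
void axpy(const double *x, double *y, double a, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4)
        Unroll<4>::axpy(x + i, y + i, a);  // no runtime loop counter inside
    for (; i < n; ++i)
        y[i] += a * x[i];                  // remainder
}
```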
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Sat, Feb 18, 2012 at 2:54 PM, Travis Oliphant <travis@continuum.io> wrote:
On Feb 18, 2012, at 4:03 PM, Matthew Brett wrote:
Hi,
On Sat, Feb 18, 2012 at 1:57 PM, Travis Oliphant <travis@continuum.io> wrote:
The C/C++ discussion is just getting started. Everyone should keep in mind that this is not something that is going to happen quickly. This will be a point of discussion throughout the year. I'm not a huge supporter of C++, but C++11 does look like it's made some nice progress, and as I think about making a core set of NumPy into a library that can be called by multiple languages (and even multiple implementations of Python), tempered C++ seems like it might be an appropriate way to go.
Could you say more about this? Do you have any idea when the decision about C++ is likely to be made? At what point does it make most sense to make the argument for or against? Can you suggest a good way for us to be able to make more substantial arguments either way?
I think early arguments against are always appropriate --- if you believe they have a chance of swaying Mark or Chuck, who are the strongest supporters of C++ at this point. I will be quite nervous about going crazy with C++. It was suggested that I use C++ 7 years ago when I wrote NumPy. I didn't go that route then, largely because of compiler issues, ABI concerns, and the fact that I knew C better than C++, so I felt it would have taken me longer to do something in C++. I made the right decision for me. If you think my C code is horrible, you would have been completely offended by whatever C++ I might have written at the time.
But I basically agree with Chuck that there is a lot of C-code in NumPy and template-based-code that is really trying to be C++ spelled differently.
The decision will not be made until NumPy 2.0 work is farther along. The most likely outcome is that Mark will develop something quite nice in C++, which he is already toying with, and we will either choose to use it in NumPy to build 2.0 on --- or not. I'm interested in sponsoring Mark and working as closely as I can with him and Chuck to see what emerges.
Would it be fair to say, then, that you are expecting the discussion about C++ to mainly arise after Mark has written the code? I can see that it will be easier to be specific at that point, but there must be a serious risk that it will be too late to seriously consider an alternative approach.
Can you say a little more about your impression of the previous Cython refactor and why it was not successful?
Sure. This list actually deserves a long writeup about that. First, there wasn't a "Cython refactor" of NumPy. There was a Cython refactor of SciPy. I'm not sure of its current status. I'm still very supportive of that sort of thing.
I think I missed that - is it on git somewhere?
I don't know if Cython ever solved the "raising an exception in a Fortran-called call-back" issue. I used setjmp and longjmp in several places in SciPy originally in order to enable exceptions raised in a Python-callback that is wrapped in a C-function pointer and being handed to a Fortran-routine that asks for a function-pointer.
What happened in NumPy was that the code was re-factored to become a library. I don't think much NumPy code actually ended up in Cython (the random-number generators have been in Cython from the beginning).
The biggest problem with merging the code was that Mark Wiebe got active at about that same time :-) He ended up changing several things in the code-base that made it difficult to merge-in the changes. Some of the bug-fixes and memory-leak patches, and tests did get into the code-base, but the essential creation of the NumPy library did not make it. There was some very good work done that I hope we can still take advantage of.
Another factor: the decision to make an extra layer of indirection makes small arrays that much slower. I agree with Mark that in a core library we need to go the other way, with small arrays being completely allocated in the data structure itself (reducing the number of pointer dereferences).
Does that imply there was a review of the refactor at some point to do things like benchmarking? Are there any sources to get started trying to understand the nature of the Numpy refactor and where it ran into trouble? Was it just the small arrays?
So, Cython did not play a major role on the NumPy side of things. It played a very nice role on the SciPy side of things.
I guess Cython was attractive because the desire was to make a stand-alone library? If that is still the goal, presumably that excludes Cython from serious consideration? What are the primary advantages of making the standalone library? Are there any serious disbenefits? Thanks a lot for the reply, Matthew
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 5:18 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
I guess Cython was attractive because the desire was to make a
Sorry - that should read "I guess Cython was _not_ attractive ... "
stand-alone library? If that is still the goal, presumably that excludes Cython from serious consideration? What are the primary advantages of making the standalone library? Are there any serious disbenefits?
Best, Matthew
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
The decision will not be made until NumPy 2.0 work is farther along. The most likely outcome is that Mark will develop something quite nice in C++, which he is already toying with, and we will either choose to use it in NumPy to build 2.0 on --- or not. I'm interested in sponsoring Mark and working as closely as I can with him and Chuck to see what emerges.
Would it be fair to say, then, that you are expecting the discussion about C++ to mainly arise after Mark has written the code? I can see that it will be easier to be specific at that point, but there must be a serious risk that it will be too late to seriously consider an alternative approach.
We will need to see examples of what Mark is talking about and clarify some of the compiler issues. Certainly there is some risk that once code is written, it will be tempting to just use it. Other approaches are certainly worth exploring in the meantime, but C++ has some strong arguments for it.
Can you say a little more about your impression of the previous Cython refactor and why it was not successful?
Sure. This list actually deserves a long writeup about that. First, there wasn't a "Cython refactor" of NumPy. There was a Cython refactor of SciPy. I'm not sure of its current status. I'm still very supportive of that sort of thing.
I think I missed that - is it on git somewhere?
I thought so, but I can't find it either. We should ask Jason McCampbell of Enthought where the code is located. Here are the distributed eggs: http://www.enthought.com/repo/.iron/ -Travis
Another factor: the decision to make an extra layer of indirection makes small arrays that much slower. I agree with Mark that in a core library we need to go the other way, with small arrays being completely allocated in the data structure itself (reducing the number of pointer dereferences).
Does that imply there was a review of the refactor at some point to do things like benchmarking? Are there any sources to get started trying to understand the nature of the Numpy refactor and where it ran into trouble? Was it just the small arrays?
The main trouble was just the pace of development of NumPy and the divergence of the trees, so that the re-factor branch did not keep up. Its changes were quite extensive, and so were some of Mark's. That created the difficulty in merging them together. Mark's review of the re-factor was that small-array support was going to get worse. I'm not sure we ever did any benchmarking in that direction.
So, Cython did not play a major role on the NumPy side of things. It played a very nice role on the SciPy side of things.
I guess Cython was attractive because the desire was to make a stand-alone library? If that is still the goal, presumably that excludes Cython from serious consideration? What are the primary advantages of making the standalone library? Are there any serious disbenefits?
From my perspective, having a standalone core NumPy is still a goal. The primary advantages of having a NumPy library (call it NumLib for the sake of argument) are:

1) Ability for projects like PyPy, IronPython, and Jython to use it more easily
2) Ability for Ruby, Perl, Node.JS, and other new languages to use the code for their technical computing projects
3) Increasing the number of users who can help make it more solid
4) Being able to build the user base (and corresponding performance improvements, with eyeballs from Intel, NVidia, AMD, Microsoft, Google, etc. looking at the code)

The disadvantages I can think of:

1) More users also means we might risk "lowest-common-denominator" problems --- i.e., trying to be too much to too many may make it not useful for anyone. Also, more users means more people with opinions that might be difficult to reconcile.
2) The work of doing the re-write is not small: probably at least 6 person-months
3) Not being able to rely on Python objects (dictionaries, lists, and tuples are currently used in the code-base quite a bit --- though the re-factor did show some examples of how to remove this usage)
4) Handling of "Object" arrays requires some re-design

I'm sure there are other factors that could be added to both lists. -Travis
Thanks a lot for the reply,
Matthew
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Sat, Feb 18, 2012 at 8:38 PM, Travis Oliphant <travis@continuum.io> wrote:
We will need to see examples of what Mark is talking about and clarify some of the compiler issues. Certainly there is some risk that once code is written, it will be tempting to just use it. Other approaches are certainly worth exploring in the meantime, but C++ has some strong arguments for it.
The worry as I understand it is that a C++ rewrite might make the numpy core effectively a read-only project for anyone but Mark. Do you have any feeling for whether that is likely?
I thought so, but I can't find it either. We should ask Jason McCampbell of Enthought where the code is located. Here are the distributed eggs: http://www.enthought.com/repo/.iron/
Should I email him? Happy to do that.
From my perspective having a standalone core NumPy is still a goal. The primary advantages of having a NumPy library (call it NumLib for the sake of argument) are
1) Ability for projects like PyPy, IronPython, and Jython to use it more easily
2) Ability for Ruby, Perl, Node.JS, and other new languages to use the code for their technical computing projects
3) Increasing the number of users who can help make it more solid
4) Being able to build the user base (and corresponding performance improvements, with eyeballs from Intel, NVidia, AMD, Microsoft, Google, etc. looking at the code)
The disadvantages I can think of:

1) More users also means we might risk "lowest-common-denominator" problems --- i.e., trying to be too much to too many may make it not useful for anyone. Also, more users means more people with opinions that might be difficult to reconcile.
2) The work of doing the re-write is not small: probably at least 6 person-months
3) Not being able to rely on Python objects (dictionaries, lists, and tuples are currently used in the code-base quite a bit --- though the re-factor did show some examples of how to remove this usage)
4) Handling of "Object" arrays requires some re-design
How would numpylib compare to libraries like eigen? How likely do you think it would be that unrelated projects would use numpylib rather than eigen or other numerical libraries? Do you think the choice of C++ rather than C will influence whether other projects will take it up? See you, Matthew
![](https://secure.gravatar.com/avatar/09939f25b639512a537ce2c90f77f958.jpg?s=120&d=mm&r=g)
On Saturday, February 18, 2012, Matthew Brett wrote:
Hi,
On Sat, Feb 18, 2012 at 8:38 PM, Travis Oliphant <travis@continuum.io> wrote:
We will need to see examples of what Mark is talking about and clarify some of the compiler issues. Certainly there is some risk that once code is written, it will be tempting to just use it. Other approaches are certainly worth exploring in the meantime, but C++ has some strong arguments for it.
The worry as I understand it is that a C++ rewrite might make the numpy core effectively a read-only project for anyone but Mark. Do you have any feeling for whether that is likely?
Dude, have you seen the .c files in numpy/core? They are already read-only for pretty much everybody but Mark. All kidding aside, is your concern that when Mark starts this that no one will be able to contribute until he is done? I can tell you right now that won't be the case as I will be trying to flesh out issues with datetime64 with him. Ben Root
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Sat, Feb 18, 2012 at 9:47 PM, Benjamin Root <ben.root@ou.edu> wrote:
On Saturday, February 18, 2012, Matthew Brett wrote:
Hi,
On Sat, Feb 18, 2012 at 8:38 PM, Travis Oliphant <travis@continuum.io> wrote:
We will need to see examples of what Mark is talking about and clarify some of the compiler issues. Certainly there is some risk that once code is written, it will be tempting to just use it. Other approaches are certainly worth exploring in the meantime, but C++ has some strong arguments for it.
The worry as I understand it is that a C++ rewrite might make the numpy core effectively a read-only project for anyone but Mark. Do you have any feeling for whether that is likely?
Dude, have you seen the .c files in numpy/core? They are already read-only for pretty much everybody but Mark.
I think the question is whether refactoring in C would be preferable to refactoring in C++.
All kidding aside, is your concern that when Mark starts this that no one will be able to contribute until he is done? I can tell you right now that won't be the case as I will be trying to flesh out issues with datetime64 with him.
No - can I refer you back to the emails from David in particular about the difficulties of sharing development in C++? I can find the links - but do you remember the ones I'm referring to? See you, Matthew
![](https://secure.gravatar.com/avatar/3d3176cf99cae23d0ac119d1ea6c4d11.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 6:47 AM, Benjamin Root <ben.root@ou.edu> wrote:
All kidding aside, is your concern that when Mark starts this that no one will be able to contribute until he is done? I can tell you right now that won't be the case as I will be trying to flesh out issues with datetime64 with him.
If you're interested in that, you may be interested in https://github.com/numpy/numpy/pull/156. It's about datetime behavior and compile issues, which are the main reason we can't have a 1.7 release right now. Ralf
![](https://secure.gravatar.com/avatar/350310d8ac018681a9b7ae5dc3faf2a2.jpg?s=120&d=mm&r=g)
On 2012-02-19, at 12:47 AM, Benjamin Root wrote:
Dude, have you seen the .c files in numpy/core? They are already read-only for pretty much everybody but Mark.
I've managed to patch several of them without incident, and I do not do a lot of programming in C. It could be simpler, but it's not really a big deal to navigate once you've spent some time reading it. I think the comments about the developer audience NumPy will attract are important. There may be lots of C++ developers out there, but the intersection of (truly competent in C++) and (likely to involve oneself in NumPy development) may well be quite small. David
![](https://secure.gravatar.com/avatar/5c9fb379c4e97b58960d74dcbfc5dee5.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 05:44:27AM -0500, David Warde-Farley wrote:
I think the comments about the developer audience NumPy will attract are important. There may be lots of C++ developers out there, but the intersection of (truly competent in C++) and (likely to involve oneself in NumPy development) may well be quite small.
That's a very valid concern. It is reminiscent of a possible cause of our lack of contributors to Mayavi: contributing to Mayavi requires knowing VTK. One of the major benefits of Mayavi is that it makes it easy to use the power of VTK without understanding it well. The intersection of the people interested in using Mayavi and able to contribute to it is almost empty. This is striking to me, because I know a lot of people who know VTK well. Most of them couldn't care less about Mayavi: they are happy coding directly in VTK in C++. This is also a reason why I don't code UIs any more: I simply cannot find the resources to maintain them in proportion to the number of users that they garner. A sad statement. Gael
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
I, like Travis, have my worries about C++. But if those actually doing the work (and particularly the subsequent support) feel it is the best language for implementation, I can live with that.

I particularly like the incremental and conservative approach to introducing C++ that was proposed by Mark. What I would like to stress is that all along that process, extensive testing is performed (preferably with some build-bot process) to ensure that whatever C++ features are being introduced are fully portable and don't present intractable distribution issues. Whatever we do, we don't want to go far down that road only to find out that there is no good solution in that regard on certain platforms.

We are particularly sensitive to this issue since we distribute our software, and anything that makes installation of numpy problematic is a very serious issue for us. It has to be an easy install on all common platforms. That is one thing C allowed, despite all its flaws: near-universal installation advantages over any other language available. If the appropriate subset of C++ can achieve that, great. But it has to be proved continuously as it is incrementally adopted. (I'm not much persuaded by comments like "my experience has shown it not to be a problem".)

Is there any disagreement with this?

It's less clear to me what to do about more unusual platforms. It seems to me that some sort of testing against those that may prove important in the future (e.g., GPUs?) will be needed, but how to do this is not clear to me.

Perry
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
Hi Perry, On Wed, Feb 22, 2012 at 6:44 AM, Perry Greenfield <perry@stsci.edu> wrote:
I, like Travis, have my worries about C++. But if those actually doing the work (and particularly the subsequent support) feel it is the best language for implementation, I can live with that.
I particularly like the incremental and conservative approach to introducing C++ that was proposed by Mark. What I would like to stress is that all along that process, extensive testing is performed (preferably with some build-bot process) to ensure that whatever C++ features are being introduced are fully portable and don't present intractable distribution issues. Whatever we do, we don't want to go far down that road only to find out that there is no good solution in that regard on certain platforms.
We are particularly sensitive to this issue since we distribute our software, and anything that makes installation of numpy problematic is a very serious issue for us. It has to be an easy install on all common platforms. That is one thing C, despite all its flaws, allowed: near-universal installation advantages over any other language available. If the appropriate subset of C++ can achieve that, great. But it has to be proved continuously as it is incrementally adopted. (I'm not much persuaded by comments like "my experience has shown it not to be a problem".)
Is there any disagreement with this?
It's less clear to me what to do about more unusual platforms. It seems to me that some sort of testing against those that may prove important in the future (e.g., GPUs?) will be needed, but how to do this is not clear to me.
Your group has been one of the best for testing numpy. What systems do you support at this time? Chuck
![](https://secure.gravatar.com/avatar/60e03bd1fd9f2dbc750e0899b9e7e71d.jpg?s=120&d=mm&r=g)
2012/2/19 Matthew Brett <matthew.brett@gmail.com>
Hi,
On Sat, Feb 18, 2012 at 8:38 PM, Travis Oliphant <travis@continuum.io> wrote:
We will need to see examples of what Mark is talking about and clarify some of the compiler issues. Certainly there is some risk that once code is written that it will be tempting to just use it. Other approaches are certainly worth exploring in the mean-time, but C++ has some strong arguments for it.
The worry as I understand it is that a C++ rewrite might make the numpy core effectively a read-only project for anyone but Mark. Do you have any feeling for whether that is likely?
Some of us are C developers, others are C++ developers. It will depend on the background of each of us.
How would numpylib compare to libraries like eigen? How likely do you think it would be that unrelated projects would use numpylib rather than eigen or other numerical libraries? Do you think the choice of C++ rather than C will influence whether other projects will take it up?
I guess that the C++ port may open a door to changing the back-end, and perhaps using Eigen or ArBB. As those guys (ArBB) wanted to provide a Python interface compatible with Numpy to their VM, it may be interesting to be able to change back-ends (although ArBB is limited to one platform and two OSes). -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher
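To make that back-end idea concrete, here is a minimal C++ sketch of the kind of abstract interface that would let a core library dispatch element-wise work to interchangeable back-ends such as Eigen or ArBB. Every name here is invented for illustration; nothing like this exists in NumPy.

```cpp
// Hypothetical sketch of a swappable computation back-end.
#include <cstddef>
#include <cstdio>

class ComputeBackend {
public:
    virtual ~ComputeBackend() {}
    // Element-wise add of two contiguous double buffers.
    virtual void add(const double* a, const double* b,
                     double* out, std::size_t n) = 0;
};

// Reference implementation: a plain C-style loop.  An Eigen- or
// ArBB-backed implementation would subclass ComputeBackend the same
// way, and the core would hold a ComputeBackend* chosen at runtime.
class NaiveBackend : public ComputeBackend {
public:
    virtual void add(const double* a, const double* b,
                     double* out, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            out[i] = a[i] + b[i];
        }
    }
};

int main() {
    double a[2] = {1.0, 2.0}, b[2] = {3.0, 4.0}, out[2];
    NaiveBackend naive;
    ComputeBackend* backend = &naive;  // swap point for other back-ends
    backend->add(a, b, out, 2);
    std::printf("%g %g\n", out[0], out[1]);  // prints: 4 6
    return 0;
}
```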
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Sat, Feb 18, 2012 at 9:38 PM, Travis Oliphant <travis@continuum.io> wrote:
The decision will not be made until NumPy 2.0 work is farther along. The most likely outcome is that Mark will develop something quite nice in C++ which he is already toying with, and we will either choose to use it in NumPy to build 2.0 on --- or not. I'm interested in sponsoring Mark and working as closely as I can with him and Chuck to see what emerges.
Would it be fair to say, then, that you are expecting the discussion about C++ will mainly arise after Mark has written the code? I can see that it will be easier to be specific at that point, but there must be a serious risk that it will be too late to seriously consider an alternative approach.
We will need to see examples of what Mark is talking about and clarify some of the compiler issues. Certainly there is some risk that once code is written that it will be tempting to just use it. Other approaches are certainly worth exploring in the mean-time, but C++ has some strong arguments for it.
Can you say a little more about your impression of the previous Cython refactor and why it was not successful?
Sure. This list actually deserves a long writeup about that. First, there wasn't a "Cython-refactor" of NumPy. There was a Cython-refactor of SciPy. I'm not sure of its current status. I'm still very supportive of that sort of thing.
I think I missed that - is it on git somewhere?
I thought so, but I can't find it either. We should ask Jason McCampbell of Enthought where the code is located. Here are the distributed eggs: http://www.enthought.com/repo/.iron/
Refactor is with the other numpy repos, here: https://github.com/numpy/numpy-refactor. Chuck
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Sat, Feb 18, 2012 at 10:09 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Feb 18, 2012 at 9:38 PM, Travis Oliphant <travis@continuum.io> wrote:
Sure. This list actually deserves a long writeup about that. First, there wasn't a "Cython-refactor" of NumPy. There was a Cython-refactor of SciPy. I'm not sure of its current status. I'm still very supportive of that sort of thing.
I think I missed that - is it on git somewhere?
I thought so, but I can't find it either. We should ask Jason McCampbell of Enthought where the code is located. Here are the distributed eggs: http://www.enthought.com/repo/.iron/
Refactor is with the other numpy repos here.
I think Travis is referring to the _scipy_ refactor here. I can't see that with the numpy repos, or with the scipy repos, but I may have missed it. See you, Matthew
![](https://secure.gravatar.com/avatar/da3a0a1942fbdc5ee9a9b8115ac5dae7.jpg?s=120&d=mm&r=g)
19.02.2012 05:38, Travis Oliphant wrote: [clip]
Sure. This list actually deserves a long writeup about that. First, there wasn't a "Cython-refactor" of NumPy. There was a Cython-refactor of SciPy. I'm not sure of its current status. I'm still very supportive of that sort of thing.
I think I missed that - is it on git somewhere?
I thought so, but I can't find it either. We should ask Jason McCampbell of Enthought where the code is located. Here are the distributed eggs: http://www.enthought.com/repo/.iron/
They're here:

https://github.com/dagss/private-scipy-refactor https://github.com/jasonmccampbell/scipy-refactor

The main problem with merging this was the experimental status of FWrap, and the fact that the wrappers it generates are big compared to f2py and required manual editing of the generated code. So, there were maintainability concerns with the Fortran pieces. These could probably be solved, however, and I wouldn't be opposed to e.g. cleaning up the generated code and using manually crafted Cython. Cherry picking the Cython replacements for all the modules wrapped in C probably should be done in any case.

The parts of Scipy affected by the refactoring have not changed significantly, so there are no significant problems in re-raising the issue of merging the work back.

Pauli
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Sun, Feb 19, 2012 at 7:35 AM, Pauli Virtanen <pav@iki.fi> wrote:
19.02.2012 05:38, Travis Oliphant wrote: [clip]
Sure. This list actually deserves a long writeup about that. First, there wasn't a "Cython-refactor" of NumPy. There was a Cython-refactor of SciPy. I'm not sure of its current status. I'm still very supportive of that sort of thing.
I think I missed that - is it on git somewhere?
I thought so, but I can't find it either. We should ask Jason McCampbell of Enthought where the code is located. Here are the distributed eggs: http://www.enthought.com/repo/.iron/
They're here:
https://github.com/dagss/private-scipy-refactor https://github.com/jasonmccampbell/scipy-refactor
The main problem with merging this was the experimental status of FWrap, and the fact that the wrappers it generates are big compared to f2py and required manual editing of the generated code. So, there were maintainability concerns with the Fortran pieces.
These could probably be solved, however, and I wouldn't be opposed to e.g. cleaning up the generated code and using manually crafted Cython. Cherry picking the Cython replacements for all the modules wrapped in C probably should be done in any case.
The parts of Scipy affected by the refactoring have not changed significantly, so there are no significant problems in re-raising the issue of merging the work back.
Thanks for making a new thread. Who knows this work best? Who do you think should join the discussion to plan the work? I might have some time for this - maybe a sprint would be in order. Best, Matthew
![](https://secure.gravatar.com/avatar/72902e7adf1c8f5b524c04a15cc3c6a5.jpg?s=120&d=mm&r=g)
On Sun, Feb 19, 2012 at 7:35 AM, Pauli Virtanen <pav@iki.fi> wrote:
19.02.2012 05:38, Travis Oliphant wrote: [clip]
Sure. This list actually deserves a long writeup about that. First, there wasn't a "Cython-refactor" of NumPy. There was a Cython-refactor of SciPy. I'm not sure of its current status. I'm still very supportive of that sort of thing.
I think I missed that - is it on git somewhere?
I thought so, but I can't find it either. We should ask Jason McCampbell of Enthought where the code is located. Here are the distributed eggs: http://www.enthought.com/repo/.iron/
They're here:
https://github.com/dagss/private-scipy-refactor https://github.com/jasonmccampbell/scipy-refactor
The main problem with merging this was the experimental status of FWrap, and the fact that the wrappers it generates are big compared to f2py and required manual editing of the generated code. So, there were maintainability concerns with the Fortran pieces.
These could probably be solved, however, and I wouldn't be opposed to e.g. cleaning up the generated code and using manually crafted Cython. Cherry picking the Cython replacements for all the modules wrapped in C probably should be done in any case.
The parts of Scipy affected by the refactoring have not changed significantly, so there are no significant problems in re-raising the issue of merging the work back.
From the numpy roadmap discussion, the sparsetools code might be a good candidate for Cythonization. The 4.5MB of code SWIG is generating is mostly parameter-checking boilerplate, and if Cython lives up to its reputation, it will be able to easily make this smaller and compile a lot faster. It looks like neither of those two branches switched this code to Cython, unfortunately. -Mark
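For a sense of where that generated bloat comes from: sparsetools routines are C++ templates, and SWIG emits a full argument-checking wrapper for each (routine, dtype) pair. A single hand-written dispatch, whether in Cython or plain C++, keeps the boilerplate in one place. The routine and type list below are invented for illustration; this is a sketch of the pattern, not of the actual sparsetools API.

```cpp
// Hypothetical sketch: one checked entry point dispatching over dtypes,
// instead of one generated wrapper per (routine, dtype) pair.
#include <cstddef>

// A sparsetools-style templated kernel (made up for illustration).
template <typename T>
void csr_scale(T* data, std::size_t nnz, T factor) {
    for (std::size_t i = 0; i < nnz; ++i) {
        data[i] *= factor;
    }
}

enum DType { DTYPE_FLOAT32, DTYPE_FLOAT64 };

// The argument checking lives here once, not in generated code.
bool csr_scale_dispatch(DType dtype, void* data, std::size_t nnz,
                        double factor) {
    if (data == NULL) {
        return false;
    }
    switch (dtype) {
        case DTYPE_FLOAT32:
            csr_scale(static_cast<float*>(data), nnz,
                      static_cast<float>(factor));
            return true;
        case DTYPE_FLOAT64:
            csr_scale(static_cast<double*>(data), nnz, factor);
            return true;
        default:
            return false;  // unsupported dtype
    }
}

int main() {
    float data[3] = {1.0f, 2.0f, 3.0f};
    return csr_scale_dispatch(DTYPE_FLOAT32, data, 3, 2.0) ? 0 : 1;
}
```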
![](https://secure.gravatar.com/avatar/60e03bd1fd9f2dbc750e0899b9e7e71d.jpg?s=120&d=mm&r=g)
Would it be fair to say, then, that you are expecting the discussion about C++ will mainly arise after Mark has written the code? I can see that it will be easier to be specific at that point, but there must be a serious risk that it will be too late to seriously consider an alternative approach.
We will need to see examples of what Mark is talking about and clarify some of the compiler issues. Certainly there is some risk that once code is written that it will be tempting to just use it. Other approaches are certainly worth exploring in the mean-time, but C++ has some strong arguments for it.
Compilers for C++98 are now stable enough (except on Bluegene; see the Boost distribution with xlc++). C++ helps a lot to enhance robustness.
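To give one concrete example of that robustness argument: RAII ties resource cleanup to scope, so it runs on every exit path instead of relying on hand-maintained chains of free() calls. A minimal sketch, with an invented class for illustration:

```cpp
// Minimal RAII sketch: the destructor releases the buffer on every
// return path, where C code would need an explicit free() per path.
#include <cstddef>
#include <cstdlib>

class ScopedBuffer {
public:
    explicit ScopedBuffer(std::size_t n)
        : data_(static_cast<double*>(std::malloc(n * sizeof(double)))) {}
    ~ScopedBuffer() { std::free(data_); }  // free(NULL) is a no-op
    double* get() { return data_; }
private:
    double* data_;
    // Copying disabled, C++98 style: declared but never defined.
    ScopedBuffer(const ScopedBuffer&);
    ScopedBuffer& operator=(const ScopedBuffer&);
};

int compute() {
    ScopedBuffer buf(1024);
    if (buf.get() == NULL) {
        return -1;  // early return: destructor still runs
    }
    buf.get()[0] = 42.0;
    return 0;       // normal return: destructor frees the buffer
}

int main() {
    return compute();
}
```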
From my perspective, having a standalone core NumPy is still a goal. The primary advantages of having a NumPy library (call it NumLib for the sake of argument) are:
1) Ability for projects like PyPy, IronPython, and Jython to use it more easily.
2) Ability for Ruby, Perl, Node.JS, and other new languages to use the code for their technical computing projects.
3) Increasing the number of users who can help make it more solid.
4) Being able to build the user-base (and corresponding performance, with eyeballs from Intel, NVidia, AMD, Microsoft, Google, etc. looking at the code).
The disadvantages I can think of:
1) More users also means we might risk "lowest-common-denominator" problems --- i.e. trying to be too much to too many may make it not useful for anyone. Also, more users means more people with opinions that might be difficult to reconcile.
2) The work of doing the re-write is not small: probably at least 6 person-months.
3) Not being able to rely on Python objects (dictionaries, lists, and tuples are currently used in the code-base quite a bit --- though the re-factor did show some examples of how to remove this usage; see the sketch after this message).
4) Handling of "Object" arrays requires some re-design.
I'm sure there are other factors that could be added to both lists.
-Travis
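As an illustration of what point 3 above involves, here is a hypothetical sketch of a Python-free core descriptor: shape and strides held in plain C++ storage rather than Python tuples, so the same core could be reused from PyPy, Ruby, or Node.JS. All names are invented for illustration.

```cpp
// Hypothetical sketch of an array descriptor with no Python dependency.
#include <cstddef>
#include <cstdio>
#include <vector>

struct ArrayDescriptor {
    void* data;                           // raw buffer, owned elsewhere
    std::size_t itemsize;                 // bytes per element
    std::vector<std::size_t> shape;       // replaces a Python tuple
    std::vector<std::ptrdiff_t> strides;  // byte strides per axis

    std::size_t size() const {            // total number of elements
        std::size_t n = 1;
        for (std::size_t i = 0; i < shape.size(); ++i) {
            n *= shape[i];
        }
        return n;
    }
};

int main() {
    ArrayDescriptor d;
    d.data = NULL;                        // metadata-only example
    d.itemsize = 8;                       // double
    d.shape.push_back(3);
    d.shape.push_back(4);
    d.strides.push_back(4 * 8);           // C-contiguous 3x4 doubles
    d.strides.push_back(8);
    std::printf("elements = %lu\n",
                static_cast<unsigned long>(d.size()));  // prints: 12
    return 0;
}
```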
Thanks a lot for the reply,
Matthew
-- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher
![](https://secure.gravatar.com/avatar/3cbc4b4806d66979d51f865e319a2cbc.jpg?s=120&d=mm&r=g)
Sure. This list actually deserves a long writeup about that. First, there wasn't a "Cython-refactor" of NumPy. There was a Cython-refactor of SciPy. I'm not sure of its current status. I'm still very supportive of that sort of thing.
I think I missed that - is it on git somewhere?
I thought so, but I can't find it either. We should ask Jason McCampbell of Enthought where the code is located. Here are the distributed eggs: http://www.enthought.com/repo/.iron/
-Travis
Hi Travis and everyone, just cleaning up email and saw this question. The trees had been in my personal GitHub account prior to Enthought switching over. I forked them now and the paths are: https://github.com/enthought/numpy-refactor https://github.com/enthought/scipy-refactor The numpy code is on the 'refactor' branch. The master branch is dated but consistent (correct commit IDs) with the master NumPy repository on GitHub, so the refactor branch should be able to be pushed to the main numpy account if desired. The scipy code was cloned from the subversion repository and so would either need to be moved back to svn or sync'd with any git migration. Jason
participants (37)
- Benjamin Root
- Bryan Van de Ven
- Charles R Harris
- Christopher Hanley
- Christopher Jordan-Squire
- Dag Sverre Seljebotn
- Daniele Nicolodi
- David Cournapeau
- David Gowers (kampu)
- David Warde-Farley
- Eric Firing
- Fernando Perez
- Francesc Alted
- Gael Varoquaux
- James Bergstra
- Jason Grout
- Jason McCampbell
- John Hunter
- josef.pktd@gmail.com
- Lluís
- Mark Wiebe
- Matthew Brett
- Matthieu Brucher
- Nathaniel Smith
- Neal Becker
- Paul Anton Letnes
- Pauli Virtanen
- Perry Greenfield
- Ralf Gommers
- Robert Kern
- Russell E. Owen
- Samuel John
- Sturla Molden
- Stéfan van der Walt
- Travis Oliphant
- Warren Weckesser
- xavier.gnata@gmail.com