
Hey all,

I made a branch called with_maskna and then merged Nathaniel's PR which removes the mask_na support from master. I then applied a patch to fix the boolean indexing problem reported by Ralf.

I then created a NumPy 1.7.x maintenance branch from which the release of NumPy 1.7 will be made. Ondrej Certik and I will be managing the release of NumPy 1.7. Ondrej is the author of SymPy and has agreed to help get NumPy 1.7 out the door. Thanks, Ondrej, for being willing to help in this way.

In principle, only bug fixes should be pushed to the NumPy 1.7 branch at this point. The target is to make a release of NumPy 1.7.x by July 9th. The schedule we will work toward is:

RC1 -- June 25
RC2 -- July 5
Release -- July 13

NumPy 1.7 is a significant release with several changes, many of which are documented in the release notes. Several new code paths were added which can have a subtle impact on code. As we make the release candidates, it will be very helpful to receive as much feedback as possible on how any changes affect your code. We will work on the release notes over the coming weeks so that they have as much information as possible.

After NumPy 1.7, there is a NumPy 1.8 planned for later this year.

Best regards,
-Travis

Hi,

Glad to see that 1.7 is coming soon! On 21/06/2012 12:11, Travis Oliphant wrote:
NumPy 1.7 is a significant release and has several changes many of which are documented in the release notes. I browsed the sources on GitHub and ended up here: https://github.com/numpy/numpy/tree/maintenance/1.7.x/doc/release
I didn't find release notes for 1.7, but there is a file for 2.0 whose content suggests it applies to 1.7: https://github.com/numpy/numpy/blob/maintenance/1.7.x/doc/release/2.0.0-note... Is that indeed the file you mentioned? Best, Pierre

On Thu, Jun 21, 2012 at 4:11 AM, Travis Oliphant <travis@continuum.io> wrote:
Hey all,
I made a branch called with_maskna and then merged Nathaniel's PR which removes the mask_na support from master. I then applied a patch to fix the boolean indexing problem reported by Ralf.
I then created a NumPy 1.7.x maintenance branch from which the release of NumPy 1.7 will be made. Ondrej Certik and I will be managing the release of NumPy 1.7. Ondrej is the author of SymPy and has agreed to help get NumPy 1.7 out the door. Thanks, Ondrej for being willing to help in this way.
In principle, only bug fixes should be pushed to the NumPy 1.7 branch at this point. The target is to make a release of NumPy 1.7.x by July 9th. The schedule we will work for is:
RC1 -- June 25
RC2 -- July 5
Release -- July 13
NumPy 1.7 is a significant release and has several changes many of which are documented in the release notes. Several new code paths were added which can have a subtle impact on code. As we make the release candidates, it will be very helpful to receive as much feedback as possible on how any changes affect your code. We will work on the release notes over the coming weeks so that they have as much information as possible.
After NumPy 1.7, there is a NumPy 1.8 planned for later this year.
Hmm, I was going to add the type specific sorts for object and structured types. Chuck

On Thu, Jun 21, 2012 at 7:07 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Thu, Jun 21, 2012 at 4:11 AM, Travis Oliphant <travis@continuum.io> wrote:
Hey all,
I made a branch called with_maskna and then merged Nathaniel's PR which removes the mask_na support from master. I then applied a patch to fix the boolean indexing problem reported by Ralf.
I then created a NumPy 1.7.x maintenance branch from which the release of NumPy 1.7 will be made. Ondrej Certik and I will be managing the release of NumPy 1.7. Ondrej is the author of SymPy and has agreed to help get NumPy 1.7 out the door. Thanks, Ondrej for being willing to help in this way.
In principle, only bug fixes should be pushed to the NumPy 1.7 branch at this point. The target is to make a release of NumPy 1.7.x by July 9th. The schedule we will work for is:
RC1 -- June 25
RC2 -- July 5
Release -- July 13
NumPy 1.7 is a significant release and has several changes many of which are documented in the release notes. Several new code paths were added which can have a subtle impact on code. As we make the release candidates, it will be very helpful to receive as much feedback as possible on how any changes affect your code. We will work on the release notes over the coming weeks so that they have as much information as possible.
After NumPy 1.7, there is a NumPy 1.8 planned for later this year.
Hmm, I was going to add the type specific sorts for object and structured types.
Also, there is some additional cleanup that needs to be done for macros. Probably it would have been helpful to schedule the branch for a week or two in the future so we could all get the little odds and ends fixed up first. Chuck

I thought it was clear we were doing a 1.7 release before SciPy. It seems pretty urgent that we get something out sooner rather than later. I know there is never enough time to do all the things we want to do.

There is time before the first release candidate to make changes on the 1.7.x branch. If you want to make the changes on master and just indicate the pull requests, Ondrej can make sure they are added to the 1.7.x branch by Monday. We can also delay the first release candidate by a few days to next Wednesday and then bump everything three days if that will help. There will be a follow-on 1.8 release before the end of the year --- so there is time to make changes for that release as well. The next release will not take a year to get out, so we shouldn't feel pressured to get *everything* into this release.

Speaking of code changes...

What are the cleanups for macros that need to be done? I was looking at the code and noticed that where before I could do PyArray_NDIM(obj), Mark's code now does PyArray_NDIM((PyArrayObject *)obj). Is that intentional? That's not as nice to type. Is that assuming that PyArray_NDIM will become a function and need a specific object type for its argument (and everything else cast)? That's one clear disadvantage of inline functions versus macros in my mind: no automatic polymorphism.

I don't think type safety is a big win for macros like these. We need to be more judicious about which macros are scheduled for function inlining. Some just don't benefit from the type-safety implications as much as others do, and you end up requiring everyone to change their code downstream for no real reason. These sorts of changes really feel to me like unnecessary spelling changes that require work from extension writers who now have to modify their code with no real gain. There seems to be a lot of that going on in the code base, and I'm not really convinced that it's useful for end users.
I'm going to be a lot more resistant to that sort of change in the code base when I see it. One particularly glaring example to my lens on the world: I think it would have been better to define new macros which require semicolons than to change the macros that don't require semicolons so that they now do:

NPY_BEGIN_THREADS_DEF
NPY_BEGIN_THREADS
NPY_ALLOW_C_API
NPY_ALLOW_C_API_DEF
NPY_DISABLE_C_API

That feels like a gratuitous style change that will force users of those macros to re-write their code. Sure, it's a simple change, but it's a simple change that doesn't do anything for you as an end user. I think I'm going to back this change out, in fact. I can't see requiring people to change their C code like this without a clear benefit to them. I'm quite sure there is code out there that uses these documented APIs (without the semicolon). If we want to define new macros that require semicolons, then we can do that, but we can't get rid of the old ones --- especially in a 1.x release.

Our policy should not be to allow gratuitous style changes just because we think something is prettier another way. The NumPy code base has come from multiple sources and reflects several styles. It also follows an older style of C programming (one that is quite common in the Python code base). It can be changed, but those changes shouldn't be painful for a library user without some specific gain for them that the change allows.

There are significant users of NumPy out there still on 1.4. Even the policy of deprecation that has been discussed will not help people trying to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple times. The easier we can make this process for users the better.
I remain convinced that it's better and am much more comfortable with making a release that requires a re-compile (that will succeed without further code changes --- because of backward compatibility efforts) than to have supposed ABI compatibility with subtle semantic changes and required C-code changes when you do happen to re-compile. Thanks, -Travis On Jun 21, 2012, at 8:10 AM, Charles R Harris wrote:
On Thu, Jun 21, 2012 at 7:07 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Thu, Jun 21, 2012 at 4:11 AM, Travis Oliphant <travis@continuum.io> wrote: Hey all,
I made a branch called with_maskna and then merged Nathaniel's PR which removes the mask_na support from master. I then applied a patch to fix the boolean indexing problem reported by Ralf.
I then created a NumPy 1.7.x maintenance branch from which the release of NumPy 1.7 will be made. Ondrej Certik and I will be managing the release of NumPy 1.7. Ondrej is the author of SymPy and has agreed to help get NumPy 1.7 out the door. Thanks, Ondrej for being willing to help in this way.
In principle, only bug fixes should be pushed to the NumPy 1.7 branch at this point. The target is to make a release of NumPy 1.7.x by July 9th. The schedule we will work for is:
RC1 -- June 25
RC2 -- July 5
Release -- July 13
NumPy 1.7 is a significant release and has several changes many of which are documented in the release notes. Several new code paths were added which can have a subtle impact on code. As we make the release candidates, it will be very helpful to receive as much feedback as possible on how any changes affect your code. We will work on the release notes over the coming weeks so that they have as much information as possible.
After NumPy 1.7, there is a NumPy 1.8 planned for later this year.
Hmm, I was going to add the type specific sorts for object and structured types.
Also, there is some additional cleanup that needs to be done for macros. Probably it would have been helpful to schedule the branch for a week or two in the future so we could all get the little odds and ends fixed up first.
Chuck
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Thu, Jun 21, 2012 at 9:25 AM, Travis Oliphant <travis@continuum.io> wrote:
I thought it was clear we were doing a 1.7 release before SciPy. It seems pretty urgent that we get something out sooner than later. I know there is never enough time to do all the things we want to do.
The usual practice is to announce a schedule first.
There is time before the first Release candidate to make changes on the 1.7.x branch. If you want to make the changes on master, and just indicate the Pull requests, Ondrej can make sure they are added to the 1.7.x. branch by Monday. We can also delay the first Release Candidate by a few days to next Wednesday and then bump everything 3 days if that will help. There will be a follow-on 1.8 release before the end of the year --- so there is time to make changes for that release as well. The next release will not take a year to get out, so we shouldn't feel pressured to get *everything* in this release.
What are we going to do for 1.8?
Speaking of code changes...
What are the cleanups for macros that need to be done? I was looking at the code and noticed that where before I could do PyArray_NDIM(obj), Mark's code now does PyArray_NDIM((PyArrayObject *)obj). Is that intentional?
Yes, the functions will give warnings otherwise.
That's not as nice to type.
So? The point is to have correctness, not ease of typing.
Is that assuming that PyArray_NDIM will become a function and need a specific object type for its argument (and everything else cast....). That's one clear disadvantage of inline functions versus macros in my mind: no automatic polymorphism.
That's a disadvantage of Python. The virtue of inline functions is precisely type checking.
I don't think type safety is a big win for macros like these. We need to be more judicious about which macros are scheduled for function inlining. Some just don't benefit from the type-safety implications as much as others do, and you end up requiring everyone to change their code downstream for no real reason.
These sorts of changes really feel to me like unnecessary spelling changes that require work from extension writers who now have to modify their code with no real gain. There seems to be a lot of that going on in the code base and I'm not really convinced that it's useful for end-users.
Good style and type checking are useful. Numpy needs more of both.
I'm going to be a lot more resistant to that sort of change in the code base when I see it.
Numpy is a team effort. There are people out there who write better code than you do, you should learn from them.
One particularly glaring example to my lens on the world: I think it would have been better to define new macros which require semicolons than to change the macros that don't require semicolons so that they now do:
NPY_BEGIN_THREADS_DEF
NPY_BEGIN_THREADS
NPY_ALLOW_C_API
NPY_ALLOW_C_API_DEF
NPY_DISABLE_C_API
That feels like a gratuitous style change that will force users of those macros to re-write their code.
It doesn't seem to be much of a problem.
Sure, it's a simple change, but it's a simple change that doesn't do anything for you as an end user. I think I'm going to back this change out, in fact. I can't see requiring people to change their C code like this without a clear benefit to them. I'm quite sure there is code out there that uses these documented APIs (without the semicolon). If we want to define new macros that require semicolons, then we can do that, but we can't get rid of the old ones --- especially in a 1.x release.
Our policy should not be to allow gratuitous style changes just because we think something is prettier another way. The NumPy code base has come from multiple sources and reflects several styles. It also follows an older style of C-programming (that is quite common in the Python code base). It can be changed, but those changes shouldn't be painful for a library user without some specific gain for them that the change allows.
You use that word 'gratuitous' a lot; I don't think it means what you think it means. For instance, the new polynomial coefficient order wasn't gratuitous: it was doing things in a way many found more intuitive and that generalized better to different polynomial bases. People have different ideas; that doesn't make them gratuitous.
There are significant users of NumPy out there still on 1.4. Even the policy of deprecation that has been discussed will not help people trying to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple times. The easier we can make this process for users the better. I remain convinced that it's better and am much more comfortable with making a release that requires a re-compile (that will succeed without further code changes --- because of backward compatibility efforts) than to have supposed ABI compatibility with subtle semantic changes and required C-code changes when you do happen to re-compile.
Cleanups need to be made bit by bit. I don't think we have done anything that will cause undue trouble. Chuck

On Thu, Jun 21, 2012 at 7:20 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Thu, Jun 21, 2012 at 9:25 AM, Travis Oliphant <travis@continuum.io> wrote:
One particularly glaring example to my lens on the world: I think it would have been better to define new macros which require semicolons than to change the macros that don't require semicolons so that they now do:
NPY_BEGIN_THREADS_DEF
NPY_BEGIN_THREADS
NPY_ALLOW_C_API
NPY_ALLOW_C_API_DEF
NPY_DISABLE_C_API
That feels like a gratuitous style change that will force users of those macros to re-write their code.
It doesn't seem to be much of a problem.
Well, we did do a SciPy maintenance release for this.... Overall I agree with you, Chuck, that cleanups are needed. But if there's too much impact on users from this particular change -- which would be nice to see confirmed by pointing to actual code instead of just asserting it -- then I don't see the harm in undoing it.
Sure, it's a simple change, but it's a simple change that doesn't do anything for you as an end user. I think I'm going to back this change out, in fact. I can't see requiring people to change their C code like this without a clear benefit to them. I'm quite sure there is code out there that uses these documented APIs (without the semicolon). If we want to define new macros that require semicolons, then we can do that, but we can't get rid of the old ones --- especially in a 1.x release.
Our policy should not be to allow gratuitous style changes just because we think something is prettier another way. The NumPy code base has come from multiple sources and reflects several styles. It also follows an older style of C-programming (that is quite common in the Python code base). It can be changed, but those changes shouldn't be painful for a library user without some specific gain for them that the change allows.
You use that word 'gratuitous' a lot; I don't think it means what you think it means. For instance, the new polynomial coefficient order wasn't gratuitous: it was doing things in a way many found more intuitive and that generalized better to different polynomial bases. People have different ideas; that doesn't make them gratuitous.
There are significant users of NumPy out there still on 1.4. Even the policy of deprecation that has been discussed will not help people trying to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple times. The easier we can make this process for users the better. I remain convinced that it's better and am much more comfortable with making a release that requires a re-compile (that will succeed without further code changes --- because of backward compatibility efforts) than to have supposed ABI compatibility with subtle semantic changes and required C-code changes when you do happen to re-compile.
Best to have neither a re-compile nor ABI incompatibility. That said, I'd prefer the former over the latter any day of the week if I'd have to choose. Ralf
Cleanups need to be made bit by bit. I don't think we have done anything that will cause undue trouble.
Chuck

The usual practice is to announce a schedule first.
I just did announce the schedule.
There is time before the first Release candidate to make changes on the 1.7.x branch. If you want to make the changes on master, and just indicate the Pull requests, Ondrej can make sure they are added to the 1.7.x. branch by Monday. We can also delay the first Release Candidate by a few days to next Wednesday and then bump everything 3 days if that will help. There will be a follow-on 1.8 release before the end of the year --- so there is time to make changes for that release as well. The next release will not take a year to get out, so we shouldn't feel pressured to get *everything* in this release.
What are we going to do for 1.8?
Let's get 1.7 out the door first.
Yes, the functions will give warnings otherwise.
I think this needs to be revisited. I don't think these changes are necessary for *every* use of macros, and they can cost people downstream a lot of effort without concrete benefit.
That's not as nice to type.
So? The point is to have correctness, not ease of typing.
I'm not sure if a pun was intended there or not. C is not a safe and fully-typed system. That is one of its weaknesses according to many. But, I would submit that not being forced to give everything a "type" (and recognizing the tradeoffs that implies) is also one reason it gets used.
Is that assuming that PyArray_NDIM will become a function and need a specific object type for its argument (and everything else cast....). That's one clear disadvantage of inline functions versus macros in my mind: no automatic polymorphism.
That's a disadvantage of Python. The virtue of inline functions is precisely type checking.
Right, but we need to be more conscientious about this. Not every use of macros should be replaced by inline function calls and the requisite *forced* type-checking. Type-checking is not *universally* a virtue --- if it were, nobody would use Python.
I don't think type safety is a big win for macros like these. We need to be more judicious about which macros are scheduled for function inlining. Some just don't benefit from the type-safety implications as much as others do, and you end up requiring everyone to change their code downstream for no real reason.
These sorts of changes really feel to me like unnecessary spelling changes that require work from extension writers who now have to modify their code with no real gain. There seems to be a lot of that going on in the code base and I'm not really convinced that it's useful for end-users.
Good style and type checking are useful. Numpy needs more of both.
You can assert it, but that doesn't make it so. "Good style" depends on what you are trying to accomplish and on your point of view. NumPy's style is not the product of one person; it's been adapted from multiple styles and inherits quite a bit from Python's style. I don't make any claims for it other than it allowed me to write it with the time and experience I had 7 years ago. We obviously disagree about this point. I'm sorry about that. I'm pretty flexible usually --- that's probably one of your big criticisms of my "style". But one of the things I feel quite strongly about is how hard we make it for NumPy users to upgrade. There are two specific things I disagree with pretty strongly:

1) Changing defined macros that should work the same on PyArrayObjects or PyObjects to now *require* types --- if we want to introduce new macros that require types, then we can. As long as the old usage just produces warnings but still compiles, then I suppose I could find this acceptable.

2) Changing macros to require semicolons when they were previously not needed. I'm going to be very hard-nosed about this one.
I'm going to be a lot more resistant to that sort of change in the code base when I see it.
Numpy is a team effort. There are people out there who write better code than you do, you should learn from them.
Exactly! It's a team effort. I'm part of that team as well, and while I don't always have strong opinions about things, when I do, I'm going to voice them. I learned long ago that there are people who write better code than me. There are people who write better code than you. That is not the question here at all. The question is about not requiring a *re-write* of code in order for people to get their extensions to compile using NumPy headers. We should not be making people change their code to get their extensions to compile against NumPy 1.X.
One particularly glaring example to my lens on the world: I think it would have been better to define new macros which require semicolons than to change the macros that don't require semicolons so that they now do:
NPY_BEGIN_THREADS_DEF
NPY_BEGIN_THREADS
NPY_ALLOW_C_API
NPY_ALLOW_C_API_DEF
NPY_DISABLE_C_API
That feels like a gratuitous style change that will force users of those macros to re-write their code.
It doesn't seem to be much of a problem.
Unfortunately, I don't trust your judgment on that. My experience and understanding tells a much different story. I'm sorry if you disagree with me.
Sure, it's a simple change, but it's a simple change that doesn't do anything for you as an end user. I think I'm going to back this change out, in fact. I can't see requiring people to change their C code like this without a clear benefit to them. I'm quite sure there is code out there that uses these documented APIs (without the semicolon). If we want to define new macros that require semicolons, then we can do that, but we can't get rid of the old ones --- especially in a 1.x release.
Our policy should not be to allow gratuitous style changes just because we think something is prettier another way. The NumPy code base has come from multiple sources and reflects several styles. It also follows an older style of C-programming (that is quite common in the Python code base). It can be changed, but those changes shouldn't be painful for a library user without some specific gain for them that the change allows.
You use that word 'gratuitous' a lot; I don't think it means what you think it means. For instance, the new polynomial coefficient order wasn't gratuitous: it was doing things in a way many found more intuitive and that generalized better to different polynomial bases. People have different ideas; that doesn't make them gratuitous.
That's a slightly different issue. At least you created a new object and API which is a *little* better. My complaint about the choice there is that now there *must* be two interfaces, and added confusion as people will have to figure out which assumption is being used. I don't really care about the coefficient order --- really, I don't. Either one is fine in my mind. I recognize the reasons. The problem is *changing* it without a *really* good reason. Now we have to have two different APIs. I would have much preferred to have poly1d disappear and just use your much nicer polynomial classes. Now it can't, and we are faced with a user story that is either difficult for someone transitioning from MATLAB or a "why did you do that?" puzzled look from a new user as to why we support both coefficient orders. Of course, that could be our story --- hey, we support all kinds of orders; it doesn't really matter, you just have to tell us what you mean when passing in an unadorned array of coefficients. But this is a different issue.

I'm using the word 'gratuitous' to mean that it is "uncalled for and lacks a good reason". There need to be much better reasons given for code changes that require someone to re-write working code than "it's better style" or even "it will help new programmers avoid errors". Let's write another interface that new programmers can use that fits the world the way you see it; don't change what's already working just because you don't like it or wish a different choice had been made.
There are significant users of NumPy out there still on 1.4. Even the policy of deprecation that has been discussed will not help people trying to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple times. The easier we can make this process for users the better. I remain convinced that it's better and am much more comfortable with making a release that requires a re-compile (that will succeed without further code changes --- because of backward compatibility efforts) than to have supposed ABI compatibility with subtle semantic changes and required C-code changes when you do happen to re-compile.
Cleanups need to be made bit by bit. I don't think we have done anything that will cause undue trouble.
I disagree substantially on the impact of these changes. You can disagree about my awareness of NumPy users, but I think I understand a large number of them and why NumPy has been successful in getting users. I agree that we have been unsuccessful at getting serious developers, and I'm convinced by you and Mark as to why that is. But we can't sacrifice users for the sake of getting developers who will spend their free time trying to get around the organic pile that NumPy is at this point.

Because of this viewpoint, I think some adaptation and cleanup is needed right now, so that significant users of NumPy can upgrade based on the changes that have occurred without running into annoying errors (even simple changes can be a pain in the neck to fix). I do agree changes can be made. I realize you've worked hard to keep the code base in a state that you find more adequate. I think you go overboard on that front, but I acknowledge that there are people who appreciate this.

I do feel very strongly that we should not require users to re-write working C code in order to use a new minor version of NumPy, regardless of how the code "looks" or how much "better" it is according to some idealized standard. The macro changes are borderline (at least I believe code will still compile --- just raise warnings --- but I need to be sure about this). The changes that require semicolons are not acceptable at all.

Look, Charles, I believe we can continue to work productively together and that our differences can be a strength to the community. I hope you feel the same way. I will continue to respect and listen to your perspective --- especially when I disagree with it.

-Travis

On Fri, Jun 22, 2012 at 2:42 PM, Travis Oliphant <travis@continuum.io> wrote:
The usual practice is to announce a schedule first.
I just did announce the schedule.
What has been done in the past is that an intent to fork is announced some two weeks in advance so that people can weigh in on what needs to be done before the fork. The immediate fork was a bit hasty. Likewise, when I suggested going to GitHub issue tracking, I opened a discussion on needed tags, but voilà, there it was, with an incomplete set and no discussion. That too seemed hasty.
There is time before the first Release candidate to make changes on the 1.7.x branch. If you want to make the changes on master, and just indicate the Pull requests, Ondrej can make sure they are added to the 1.7.x. branch by Monday. We can also delay the first Release Candidate by a few days to next Wednesday and then bump everything 3 days if that will help. There will be a follow-on 1.8 release before the end of the year --- so there is time to make changes for that release as well. The next release will not take a year to get out, so we shouldn't feel pressured to get *everything* in this release.
What are we going to do for 1.8?
Let's get 1.7 out the door first.
Mark proposed a schedule for the next several releases, I'd like to know if we are going to follow it.
Yes, the functions will give warnings otherwise.
I think this needs to be revisited. I don't think these changes are necessary for *every* use of macros, and they can cost people downstream a lot of effort without concrete benefit.
The idea is to slowly move towards hiding the innards of the array type. This has been under discussion since 1.3 came out. It is certainly the case that not all macros need to go away.
That's not as nice to type.
So? The point is to have correctness, not ease of typing.
I'm not sure if a pun was intended there or not. C is not a safe and fully-typed system. That is one of its weaknesses according to many. But, I would submit that not being forced to give everything a "type" (and recognizing the tradeoffs that implies) is also one reason it gets used.
C was famous for bugs due to the lack of function prototypes. This was fixed with ANSI C (C89), and the stricter typing was a great help.
Is that assuming that PyArray_NDIM will become a function and need a specific object type for its argument (and everything else cast....). That's one clear disadvantage of inline functions versus macros in my mind: no automatic polymorphism.
That's a disadvantage of Python. The virtue of inline functions is precisely type checking.
Right, but we need to be more conscientious about this. Not every use of macros should be replaced by inline function calls and the requisite *forced* type-checking. Type-checking is not *universally* a virtue --- if it were, nobody would use Python.
I don't think type safety is a big win for macros like these. We need to be more judicious about which macros are scheduled for function inlining. Some just don't benefit from the type-safety implications as much as others do, and you end up requiring everyone to change their code downstream for no real reason.
These sorts of changes really feel to me like unnecessary spelling changes that require work from extension writers who now have to modify their code with no real gain. There seems to be a lot of that going on in the code base and I'm not really convinced that it's useful for end-users.
Good style and type checking are useful. Numpy needs more of both.
You can assert it, but it doesn't make it so. "Good style" depends on what you are trying to accomplish and on your point of view. NumPy's style is not the product of one person, it's been adapted from multiple styles and inherits quite a bit from Python's style. I don't make any claims for it other than it allowed me to write it with the time and experience I had 7 years ago. We obviously disagree about this point. I'm sorry about that. I'm pretty flexible usually --- that's probably one of your big criticisms of my "style".
Curiously, my criticism would be more that you are inflexible, slow to change old habits.
But, one of the things I feel quite strongly about is how hard we make it for NumPy users to upgrade. There are two specific things I disagree with pretty strongly:
1) Changing defined macros that should work the same on PyArrayObjects or PyObjects to now *require* types --- if we want to introduce new macros that require types then we can --- as long as it just provides warnings but still compiles then I suppose I could find this acceptable.
2) Changing MACROS to require semicolons when they were previously not needed. I'm going to be very hard-nosed about this one.
I'm going to be a lot more resistant to that sort of change in the code base when I see it.
Numpy is a team effort. There are people out there who write better code than you do, you should learn from them.
Exactly! It's a team effort. I'm part of that team as well, and while I don't always have strong opinions about things, when I do, I'm going to voice them.
I've learned long ago there are people that write better code than me. There are people that write better code than you.
Of course. Writing code is not my profession, and even if it were, there are people out there who would be immeasurably better. I have tried to improve my style over the years by reading books and browsing code by people who are better than me. I also recognize common bad habits naive coders tend to pick up when they start out, not least because I have at one time or another had many of the same bad habits. That is not the question here at all. The question here is not requiring a *re-write* of code in order to get their extensions to compile using NumPy headers. We should not be making people change their code to get their extensions to compile in NumPy 1.X.
I think a bit of rewrite here and there along the way is more palatable than a big change coming in as one big lump, especially if the changes are done with a long term goal in mind. We are working towards a Numpy 2, but we can't just go off for a year or two and write it, we have to get there step by step. And that requires a plan.
One particularly glaring example to my lens on the world: I think it would have been better to define new macros which require semicolons than changing the macros that don't require semicolons to now require semicolons:
NPY_BEGIN_THREADS_DEF
NPY_BEGIN_THREADS
NPY_ALLOW_C_API
NPY_ALLOW_C_API_DEF
NPY_DISABLE_C_API
That feels like a gratuitous style change that will force users of those macros to re-write their code.
It doesn't seem to be much of a problem.
Unfortunately, I don't trust your judgment on that. My experience and understanding tells a much different story. I'm sorry if you disagree with me.
I'm sorry I made you sorry ;) The problem here is that you don't come forth with specifics. People tell you things, but you don't say who or what their specific problem was. Part of working with a team is keeping folks informed, it isn't that useful to appeal to authority. I watch the list, which is admittedly a small window into the community, and I haven't seen show stoppers. Bugs, sure, but that isn't the same thing.
Sure, it's a simple change, but it's a simple change that doesn't do anything for you as an end user. I think I'm going to back this change out, in fact. I can't see requiring people to change their C-code like this without a clear benefit to them. I'm quite sure there is code out there that uses these documented APIs (without the semicolon). If we want to define new macros that require semicolons, then we do that, but we can't get rid of the old ones --- especially in a 1.x release.
Our policy should not be to allow gratuitous style changes just because we think something is prettier another way. The NumPy code base has come from multiple sources and reflects several styles. It also follows an older style of C-programming (that is quite common in the Python code base). It can be changed, but those changes shouldn't be painful for a library user without some specific gain for them that the change allows.
You use that word 'gratuitous' a lot, I don't think it means what you think it means. For instance, the new polynomial coefficient order wasn't gratuitous, it was doing things in a way many found more intuitive and generalized better to different polynomial bases. People have different ideas, that doesn't make them gratuitous.
That's a slightly different issue. At least you created a new object and api which is a *little* better. My complaint about the choice there is now there *must* be two interfaces and added confusion as people will have to figure out which assumption is being used. I don't really care about the coefficient order --- really I don't. Either one is fine in my mind. I recognize the reasons. The problem is *changing* it without a *really* good reason. Now, we have to have two different APIs. I would much preferred to have poly1d disappear and just use your much nicer polynomial classes. Now, it can't and we are faced with a user-story that is either difficult for someone transitioning from MATLAB
Most folks aren't going to transition from MATLAB or IDL. Engineers tend to stick with the tools they learned in school; they aren't interested in the tool itself as long as they can get their job done. And getting the job done is what they are paid for. That said, I doubt they would have much problem making the adjustment if they were inclined to switch tools.

or a "why did you do that?" puzzled look from a new user as to why we support both coefficient orders. Of course, that could be our story --- hey, we support all kinds of orders, it doesn't really matter, you just have to tell us what you mean when passing in an unadorned array of coefficients. But, this is a different issue.
I'm using the word 'gratuitous' to mean that it is "uncalled for and lacks a good reason". There needs to be much better reasons given for code changes that require someone to re-write working code than "it's better style" or even "it will help new programmers avoid errors". Let's write another interface that new programmers can use that fits the world the way you see it, don't change what's already working just because you don't like it or wish a different choice had been made.
Well, and that was exactly what you meant when you called the coefficient order 'gratuitous' in your first post to me about it. The problem was that you didn't understand why I made the change until I explained it, but rather made the charge sans explanation. It might be that some of the other things you call gratuitous are less so than you think. These are hasty judgements, I think.
There are significant users of NumPy out there still on 1.4. Even the policy of deprecation that has been discussed will not help people trying to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple times. The easier we can make this process for users the better. I remain convinced that it's better and am much more comfortable with making a release that requires a re-compile (that will succeed without further code changes --- because of backward compatibility efforts) than to have supposed ABI compatibility with subtle semantic changes and required C-code changes when you do happen to re-compile.
Cleanups need to be made bit by bit. I don't think we have done anything that will cause undue trouble.
I disagree substantially on the impact of these changes. You can disagree about my awareness of NumPy users, but I think I understand a large number of them and why NumPy has been successful in getting users. I agree that we have been unsuccessful at getting serious developers and I'm convinced by you and Mark as to why that is. But, we can't sacrifice users for the sake of getting developers who will spend their free time trying to get around the organic pile that NumPy is at this point.
Because of this viewpoint, I think there is some adaptation and cleanup needed right now, so that significant users of NumPy can upgrade based on the changes that have occurred without causing them annoying errors (even simple changes can be a pain in the neck to fix).
I do agree changes can be made. I realize you've worked hard to keep the code-base in a state that you find more adequate. I think you go overboard on that front, but I acknowledge that there are people that appreciate this. I do feel very strongly that we should not require users to have to re-write working C-code in order to use a new minor version number in NumPy, regardless of how the code "looks" or how much "better" it is according to some idealized standard.
The macro changes are border-line (at least I believe code will still compile --- just raise warnings, but I need to be sure about this). The changes that require semi-colons are not acceptable at all.
I was tempted to back them out myself, but I don't think the upshot will be earth shaking.
Look Charles, I believe we can continue to work productively together and our differences can be a strength to the community. I hope you feel the same way. I will continue to respect and listen to your perspective --- especially when I disagree with it.
Sounds like a threat to me. Who are you to judge? If you are going to be the dictator, let's put that out there and make it official. Chuck.

On 06/23/2012 05:14 AM, Charles R Harris wrote:
On Fri, Jun 22, 2012 at 2:42 PM, Travis Oliphant <travis@continuum.io> wrote:
The usual practice is to announce a schedule first.
I just did announce the schedule.
What has been done in the past is that an intent to fork is announced some two weeks in advance so that people can weigh in on what needs to be done before the fork. The immediate fork was a bit hasty. Likewise, when I suggested going to the github issue tracking, I opened a discussion on needed tags, but voila, there it was with an incomplete set and no discussion. That too seemed hasty.
To me you sound like you expect that people just need to change, say,

    PyArray_SHAPE(obj)

to

    PyArray_SHAPE((PyArrayObject*)obj)

But that's not the reality. The reality is that most users of the NumPy C API are required to do:

    #if WHATEVERNUMPYVERSIONDEFINE > 0x...
    PyArray_SHAPE(obj)
    #else
    PyArray_SHAPE((PyArrayObject*)obj)
    #endif

or, perhaps, PyArray_SHAPE(CAST_IF_NEW_NUMPY obj). Or perhaps write a shim wrapper to insulate themselves from the NumPy API. At least if you want to compile cleanly against the last ~3 versions of NumPy without warnings -- which any good developer wishes (unless there are *features* in newer versions that make a hard dependency on the newest version logical). Thus, cleaning up the NumPy API makes users' code much more ugly and difficult to read. "Gradual changes along the way" means there will be lots of different #if tests like that, which is at least harder to remember and work with than a single #if test for 1.x vs 2.x.

Dag

On 06/23/2012 09:32 AM, Dag Sverre Seljebotn wrote:
To me you sound like you expect that people just need to change, say,
PyArray_SHAPE(obj)
to
PyArray_SHAPE((PyArrayObject*)obj)
But that's not the reality. The reality is that most users of the NumPy C API are required to do:
    #if WHATEVERNUMPYVERSIONDEFINE > 0x...
    PyArray_SHAPE(obj)
    #else
    PyArray_SHAPE((PyArrayObject*)obj)
    #endif
or, perhaps, PyArray_SHAPE(CAST_IF_NEW_NUMPY obj).
Whoops. Terribly sorry, bad example -- I guess fixes to the users' code would make it work with any NumPy version. And I guess an extra semicolon never hurts for the macros either? So by now I wish I could retract that post. Realized it five seconds too late :-) Dag
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Sat, Jun 23, 2012 at 5:14 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
What has been done in the past is that an intent to fork is announced some two weeks in advance so that people can weigh in on what needs to be done before the fork. The immediate fork was a bit hasty. Likewise, when I suggested going to the github issue tracking, I opened a discussion on needed tags, but voila, there it was with an incomplete set and no discussion. That too seemed hasty.
I don't have a particular dog in this fight, but it seems like neither creating the fork nor turning on issues are worth disagreeing too much about. There's going to be a 1.7 fork sometime soon, and whether it gets created now or after discussion seems mostly academic. Even if there were changes that needed to go into both branches, git makes that straightforward. Likewise github issues. Turning them on has minimal cost, especially given that pull requests already go through github, and gives another route for bug reporting and a way to experiment with issues to inform the discussion.
[...] Most folks aren't going to transition from MATLAB or IDL. Engineers tend to stick with the tools they learned in school, they aren't interested in the tool itself as long as they can get their job done. And getting the job done is what they are paid for. That said, I doubt they would have much problem making the adjustment if they were inclined to switch tools. [...]
My own experience is the opposite. Most programmers/engineers I've worked with are happy to transition away from Matlab, but part of why they're willing to is that it's not that difficult to retarget Matlab knowledge onto numpy/scipy/matplotlib knowledge. Making that transition as easy as possible (as I think matplotlib does particularly well) is a good goal. I agree that getting the job done is what they're paid for, but python/numpy/scipy/matplotlib allow them to get that job done much faster and more easily. Ray Jones

On Sat, Jun 23, 2012 at 3:23 AM, Thouis (Ray) Jones <thouis@gmail.com> wrote:
My own experience is the opposite. Most programmers/engineers I've worked with are happy to transition away from Matlab, but part of why they're willing to is that it's not that difficult to retarget Matlab knowledge onto numpy/scipy/matplotlib knowledge. Making that transition as easy as possible (as I think matplotlib does particularly well) is a good goal. I agree that getting the job done is what they're paid for, but python/numpy/scipy/matplotlib allow them to get that job done much faster and more easily.
Haven't seen that myself. When engineers' time is paid for out of contracts and the work is on a schedule, they generally don't have the time to chase after new things unless the payoff is substantial. Matlab is also widely used for rapid prototyping of control systems with the models then translated to run on the actual hardware. Not to mention that device makers often provide Simulink models of their hardware. That sort of thing is not available in Python. I agree that there are many places that Python could work better, but the old rule of thumb is that the effective savings need to be on the order of a factor of ten to drive a new technology takeover of a widespread existing technology. Of course, that doesn't hold for new markets. Chuck

On Sat, Jun 23, 2012 at 3:23 AM, Thouis (Ray) Jones <thouis@gmail.com> wrote:
I don't have a particular dog in this fight, but it seems like neither creating the fork nor turning on issues are worth disagreeing too much about. There's going to be a 1.7 fork sometime soon, and whether it gets created now or after discussion seems mostly academic. Even if there were changes that needed to go into both branches, git makes that straightforward. Likewise github issues. Turning them on has minimal cost, especially given that pull requests already go through github, and gives another route for bug reporting and a way to experiment with issues to inform the discussion.
From my point of view, the haste seems to be driven by SciPy2012. And why the rush after we have wasted three months running in circles for lack of a decision, with Mark and Nathaniel sent off to write a report that had no impact on the final outcome. The github thing also ended the thread and now someone has to clean up the result. It also appears that that work is being done by request rather than by a volunteer; that has subtle implications in the long run. Things have been happening by fits and starts, with issues picked up and then dropped half done. That isn't a good way to move forward. Chuck

On Jun 23, 2012, at 7:12 AM, Charles R Harris wrote:
On Sat, Jun 23, 2012 at 3:23 AM, Thouis (Ray) Jones <thouis@gmail.com> wrote: On Sat, Jun 23, 2012 at 5:14 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
From my point of view, the haste seems to be driven by SciPy2012. And why the rush after we have wasted three months running in circles for lack of a decision, with Mark and Nathaniel sent off to write a report that had no impact on the final outcome? The github thing also ended the thread and now someone has to clean up the result. It also appears that that work is being done by request rather than by a volunteer; that has subtle implications in the long run.
The report had a tremendous impact on the final outcome --- especially because the outcome is not *final*. I think the report helped clarify exactly what the differences were between Mark's and Nathaniel's viewpoints and absolutely impacted the outcome for 1.7. I don't agree with your interpretation of events. I'm not sure what is meant by "request rather than volunteer", but I think it has something to do with your perspective on how NumPy should be developed.
Things have been happening by fits and starts, with issues picked up and then dropped half done. That isn't a good way to move forward.
That's the problem with volunteer labor. It's at the whim of the time people have available. The only time it's different is when people have resources to make it different. Issues are picked up when people have the time to pick them up. It turns out that good people are hard to find and it takes time to get them engaged. NumFOCUS is actively raising money to fund technology fellowships in order to provide full-time support to both mentors and students. The hope is that good people who want to continue to help the NumPy project will be found and supported. Best, -Travis
Chuck
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Jun 23, 2012, at 4:23 AM, Thouis (Ray) Jones wrote:
I don't have a particular dog in this fight, but it seems like neither creating the fork nor turning on issues are worth disagreeing to much about. There's going to be a 1.7 fork sometime soon, and whether it gets created now or after discussion seems mostly academic. Even if there were changes that needed to go into both branches, git makes that straightforward. Likewise github issues. Turning them on has minimal cost, especially given that pull requests already go through github, and gives another route for bug reporting and a way to experiment with issues to inform the discussion.
Yes, this is exactly my perspective. Let's use the power of github and avoid discussions that don't need to happen and have more of them that do. -Travis

What has been done in the past is that an intent to fork is announced some two weeks in advance so that people can weigh in on what needs to be done before the fork. The immediate fork was a bit hasty. Likewise, when I suggested going to the github issue tracking, I opened a discussion on needed tags, but voila, there it was with an incomplete set and no discussion. That too seemed hasty.
My style is just different. I like to do things and then if discussion requires an alteration, we alter. It is just a different style. The labels can be altered, they are not set in stone. I prefer to have something to talk about and a starting point to alter from --- especially on potential bike-shedding discussions. There are several people that can make changes to the labels. If we have difficulty agreeing then we can go from that point.
There is time before the first release candidate to make changes on the 1.7.x branch. If you want to make the changes on master and just indicate the pull requests, Ondrej can make sure they are added to the 1.7.x branch by Monday. We can also delay the first release candidate by a few days to next Wednesday and then bump everything 3 days if that will help. There will be a follow-on 1.8 release before the end of the year --- so there is time to make changes for that release as well. The next release will not take a year to get out, so we shouldn't feel pressured to get *everything* in this release.
What are we going to do for 1.8?
Let's get 1.7 out the door first.
Mark proposed a schedule for the next several releases, I'd like to know if we are going to follow it.
We should discuss it again. I don't recall the specifics and I believe it was just a proposal. I do not recall much feedback on it.
Yes, the functions will give warnings otherwise.
I think this needs to be revisited. I don't think these changes are necessary for *every* use of macros. It can cause a lot of effort for people downstream without concrete benefit.
The idea is to slowly move towards hiding the innards of the array type. This has been under discussion since 1.3 came out. It is certainly the case that not all macros need to go away.
I know it's been under discussion, but it looks like a lot of changes were made just last year (and I am just starting to understand the implications of all those changes). I think there are many NumPy users that will be in the same position over the coming years. This is a bit more than just hiding the array innards. The array innards have been "hidden" by using the macros since NumPy 1.0. There was a specific intent to create macros for all array access and encourage use of those macros --- precisely so that the array object could change. The requirement of ABI compatibility was not pre-envisioned in NumPy 1.0. Neither was NumPy 1.0 trying to provide type-safety in all cases. I don't recall a discussion on the value of having macros that can be interpreted at least as both PyObject * and PyArrayObject *. Perhaps this is possible, and I just need to be educated. But, my opinion is that it's not always useful to require type-casting, especially between those two.
That's not as nice to type.
So? The point is to have correctness, not ease of typing.
I'm not sure if a pun was intended there or not. C is not a safe and fully-typed system. That is one of its weaknesses according to many. But, I would submit that not being forced to give everything a "type" (and recognizing the tradeoffs that implies) is also one reason it gets used.
C was famous for bugs due to the lack of function prototypes. This was fixed with C99 and the stricter typing was a great help.
Bugs are not "due to lack of function prototypes". Bugs are due to mistakes that programmers make (and I know all about mistakes programmers make). Function prototypes can help detect some kinds of mistakes which is helpful. But, this doesn't help the question of how to transition a weakly-typed program or whether or not that is even a useful exercise.
Is that assuming that PyArray_NDIM will become a function and need a specific object type for its argument (and everything else cast....). That's one clear disadvantage of inline functions versus macros in my mind: no automatic polymorphism.
That's a disadvantage of Python. The virtue of inline functions is precisely type checking.
Right, but we need to be more conscientious about this. Not every use of macros should be replaced by inline function calls and the requisite *forced* type-checking. Type-checking is not *universally* a virtue --- if it were, nobody would use Python.
I don't think type safety is a big win for macros like these. We need to be more judicious about which macros are scheduled for function inlining. Some just don't benefit from the type-safety implications as much as others do, and you end up requiring everyone to change their code downstream for no real reason.
These sorts of changes really feel to me like unnecessary spelling changes that require work from extension writers who now have to modify their code with no real gain. There seems to be a lot of that going on in the code base and I'm not really convinced that it's useful for end-users.
Good style and type checking are useful. Numpy needs more of both.
You can assert it, but it doesn't make it so. "Good style" depends on what you are trying to accomplish and on your point of view. NumPy's style is not the product of one person, it's been adapted from multiple styles and inherits quite a bit from Python's style. I don't make any claims for it other than it allowed me to write it with the time and experience I had 7 years ago. We obviously disagree about this point. I'm sorry about that. I'm pretty flexible usually --- that's probably one of your big criticisms of my "style".
Curiously, my criticism would be more that you are inflexible, slow to change old habits.
I don't mind changing old habits at all. In fact, I don't think you know me very well if that's your take. You have a very narrow window into my activity and behavior. Of course habits are always hard to change (that's why they call them habits). Mostly, I need to be convinced of the value of changing old patterns --- just like everyone else (including existing NumPy users). On the type-question, I'm just not convinced that the most pressing matter in NumPy and SciPy is to re-write existing code to be more strictly typed. I'm quite open to other view points on that --- as long as backward compatibility is preserved, or a clear upgrade story is provided to existing users.
But, one of the things I feel quite strongly about is how hard we make it for NumPy users to upgrade. There are two specific things I disagree with pretty strongly:
1) Changing defined macros that should work the same on PyArrayObjects or PyObjects to now *require* types --- if we want to introduce new macros that require types then we can --- as long as it just provides warnings but still compiles then I suppose I could find this acceptable.
2) Changing MACROS to require semicolons when they were previously not needed. I'm going to be very hard-nosed about this one.
I'm going to be a lot more resistant to that sort of change in the code base when I see it.
Numpy is a team effort. There are people out there who write better code than you do, you should learn from them.
Exactly! It's a team effort. I'm part of that team as well, and while I don't always have strong opinions about things. When I do, I'm going to voice it.
I've learned long ago there are people that write better code than me. There are people that write better code than you.
Of course. Writing code is not my profession, and even if it were, there are people out there who would be immeasurably better. I have tried to improve my style over the years by reading books and browsing code by people who are better than me. I also recognize common bad habits naive coders tend to pick up when they start out, not least because I have at one time or another had many of the same bad habits.
We are really quite alike here. I have done and continue to do exactly the same thing. My priorities are just different. I don't believe it is universally useful to alter patterns in existing code. I have typically adapted my style to the code I'm working with. Numeric had a style which I adapted to. Python has a style which I adapted to. I think that people reading code and seeing multiple styles will find the code harder to read. Such changes of style take work, and quite often the transition is not worth the effort. I have not been, nor will I be, in opposition to changes that improve things (under any developer's notion of "improvement"). The big exception to that is when it seems to me that the changes will make it more difficult for existing users to use their code.

I know you are trying to make it easier for NumPy developers as you understand it. I really admire you for doing what you feel strongly about. I think we are both in our way trying to encourage more NumPy developers (you by making the code easier to get into, and me by trying to figure out acceptable ways to fund them). I just think that we must recognize the users out there who have written to the existing NumPy interface. Any change that requires effort from users should be met with skepticism. We can write new interfaces and encourage new users to use those new interfaces. We can even re-write NumPy internals to use those interfaces. But, we can't just change documented interfaces (and we should even be careful about undocumented but implied interfaces -- I agree that this gets difficult to really adhere to, but we can and should try at least for heavily used code-paths). One thing I'm deeply aware of is the limited audience of this list compared to the user base of NumPy and the inertia of old NumPy releases. Discussions on this list are just not visible to the wider user base.
My recent activity and interest is in protecting that user-base from the difficulties that recent changes will create for people upgrading from 1.5. My failing last year was to encourage and pay for (through Enthought) Mark's full-time activity on this list, but not have the time to provide enough guidance to him about my understanding of the implications of his changes, or to think hard enough about those implications in the time available.
That is not the question here at all. The question here is not requiring a *re-write* of code in order to get their extensions to compile using NumPy headers. We should not be making people change their code to get their extensions to compile in NumPy 1.X
I think a bit of rewrite here and there along the way is more palatable than a big change coming in as one big lump, especially if the changes are done with a long term goal in mind. We are working towards a Numpy 2, but we can't just go off for a year or two and write it, we have to get there step by step. And that requires a plan.
We see things a little differently on that front, I think. A bit of re-write here and there for down-stream users is exactly the wrong approach in my view. I think it depends on the user. For one who is tracking every NumPy release and has time to make any and all changes needed, I think you are right --- that approach will work for them. However, there are people out there who are using NumPy in ways (either significantly or only indirectly) where having to change *any* code from one release to another will make them seriously annoyed and we will start losing users.
One particularly glaring example to my lens on the world: I think it would have been better to define new macros which require semicolons than changing the macros that don't require semicolons to now require semicolons:
NPY_BEGIN_THREADS_DEF
NPY_BEGIN_THREADS
NPY_ALLOW_C_API
NPY_ALLOW_C_API_DEF
NPY_DISABLE_C_API
That feels like a gratuitous style change that will force users of those macros to re-write their code.
It doesn't seem to be much of a problem.
Unfortunately, I don't trust your judgment on that. My experience and understanding tells a much different story. I'm sorry if you disagree with me.
I'm sorry I made you sorry ;) The problem here is that you don't come forth with specifics. People tell you things, but you don't say who or what their specific problem was. Part of working with a team is keeping folks informed, it isn't that useful to appeal to authority. I watch the list, which is admittedly a small window into the community, and I haven't seen show stoppers. Bugs, sure, but that isn't the same thing.
I came up with a very specific thing. I'm not sure what you are talking about. If you are talking about discussions with people off list, then I can't speak for them unless they have allowed me to. I encourage them to speak up here as often as they can. Yes, you will have to trust that a little bit of concern might just be an iceberg waiting to sink the ship.
Sure, it's a simple change, but it's a simple change that doesn't do anything for you as an end user. I think I'm going to back this change out, in fact. I can't see requiring people to change their C-code like this without a clear benefit to them. I'm quite sure there is code out there that uses these documented APIs (without the semicolon). If we want to define new macros that require semicolons, then we can do that, but we can't get rid of the old ones --- especially in a 1.x release.
Our policy should not be to allow gratuitous style changes just because we think something is prettier another way. The NumPy code base has come from multiple sources and reflects several styles. It also follows an older style of C-programming (that is quite common in the Python code base). It can be changed, but those changes shouldn't be painful for a library user without some specific gain for them that the change allows.
You use that word 'gratuitous' a lot, I don't think it means what you think it means. For instance, the new polynomial coefficient order wasn't gratuitous, it was doing things in a way many found more intuitive and generalized better to different polynomial basis. People have different ideas, that doesn't make them gratuitous.
That's a slightly different issue. At least you created a new object and api which is a *little* better. My complaint about the choice there is now there *must* be two interfaces and added confusion as people will have to figure out which assumption is being used. I don't really care about the coefficient order --- really I don't. Either one is fine in my mind. I recognize the reasons. The problem is *changing* it without a *really* good reason. Now, we have to have two different APIs. I would much preferred to have poly1d disappear and just use your much nicer polynomial classes. Now, it can't and we are faced with a user-story that is either difficult for someone transitioning from MATLAB
Most folks aren't going to transition from MATLAB or IDL. Engineers tend to stick with the tools they learned in school, they aren't interested in the tool itself as long as they can get their job done. And getting the job done is what they are paid for. That said, I doubt they would have much problem making the adjustment if they were inclined to switch tools.
I don't share your pessimism. You really think that "most folks aren't going to transition". It's happening now. It's been happening for several years.
or a "why did you do that?" puzzled look from a new user as to why we support both coefficient orders. Of course, that could be our story --- hey we support all kinds of orders, it doesn't really matter, you just have to tell us what you mean when passing in an unadorned array of coefficients. But, this is a different issue.
I'm using the word 'gratuitous' to mean that it is "uncalled for and lacks a good reason". There needs to be much better reasons given for code changes that require someone to re-write working code than "it's better style" or even "it will help new programmers avoid errors". Let's write another interface that new programmers can use that fits the world the way you see it, don't change what's already working just because you don't like it or wish a different choice had been made.
Well, and that was exactly what you meant when you called the coefficient order 'gratuitous' in your first post to me about it. The problem was that you didn't understand why I made the change until I explained it, but rather made the charge sans explanation. It might be that some of the other things you call gratuitous are less so than you think. These are hasty judgements I think.
I'm sure we all have our share of hasty judgments to go around. Even after your explanation, I still disagree with it. But, I appreciate the reminder to give you the benefit of the doubt when I encounter something that makes me raise my eyebrows. I hope you will do the same.
There are significant users of NumPy out there still on 1.4. Even the policy of deprecation that has been discussed will not help people trying to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple times. The easier we can make this process for users the better. I remain convinced that it's better and am much more comfortable with making a release that requires a re-compile (that will succeed without further code changes --- because of backward compatibility efforts) than to have supposed ABI compatibility with subtle semantic changes and required C-code changes when you do happen to re-compile.
Cleanups need to be made bit by bit. I don't think we have done anything that will cause undue trouble.
I disagree substantially on the impact of these changes. You can disagree about my awareness of NumPy users, but I think I understand a large number of them and why NumPy has been successful in getting users. I agree that we have been unsuccessful at getting serious developers and I'm convinced by you and Mark as to why that is. But, we can't sacrifice users for the sake of getting developers who will spend their free time trying to get around the organic pile that NumPy is at this point.
Because of this viewpoint, I think some adaptation and cleanup is needed right now, so that significant users of NumPy can upgrade based on the changes that have occurred without hitting annoying errors (even simple changes can be a pain in the neck to fix).
I do agree changes can be made. I realize you've worked hard to keep the code-base in a state that you find more adequate. I think you go overboard on that front, but I acknowledge that there are people that appreciate this. I do feel very strongly that we should not require users to have to re-write working C-code in order to use a new minor version number in NumPy, regardless of how the code "looks" or how much "better" it is according to some idealized standard.
The macro changes are border-line (at least I believe code will still compile --- just raise warnings, but I need to be sure about this). The changes that require semi-colons are not acceptable at all.
I was tempted to back them out myself, but I don't think the upshot will be earth shaking.
I think it's important that code using NumPy headers that compiled with 1.5 will compile with 1.7.
Look Charles, I believe we can continue to work productively together and our differences can be a strength to the community. I hope you feel the same way. I will continue to respect and listen to your perspective --- especially when I disagree with it.
Sounds like a threat to me. Who are you to judge? If you are going to be the dictator, let's put that out there and make it official.
Wow, Charles! I think you should re-read what I wrote. It was not a threat at all. It was an appeal to work more closely together, and a commitment on my end to listen to your point of view and try to sift from any of my own opposition the chaff from the wheat. I am just not thinking in those terms at all. I do not think it is appropriate to talk about a dictator in this context. I have no control over what you do, and you have no control over what I do. We can only work cooperatively or independently for the benefit of NumPy. Perhaps there are things I've said and done that really bother you, or have offended you. I'm sorry for anything I've said that might have grated on you personally. I do appreciate your voice, ability, perspective, and skill. I suspect there are others in the NumPy community that feel the same way. Best regards, -Travis

On Sun, Jun 24, 2012 at 10:09 PM, Travis Oliphant <travis@continuum.io>wrote:
Bugs are not "due to lack of function prototypes". Bugs are due to mistakes that programmers make (and I know all about mistakes programmers make). Function prototypes can help detect some kinds of mistakes which is helpful. But, this doesn't help the question of how to transition a weakly-typed program or whether or not that is even a useful exercise.
Oh, come on. Writing correct C code used to be a guru exercise. A friend of mine, a Putnam fellow, was the Weitek guru for drivers. To say bugs are programmer mistakes is information free, the question is how to minimize programmer mistakes.
Is that assuming that PyArray_NDIM will become a function and need a specific object type for its argument (and everything else cast....). That's one clear disadvantage of inline functions versus macros in my mind: no automatic polymorphism.
That's a disadvantage of Python. The virtue of inline functions is precisely type checking.
Right, but we need to be more conscientious about this. Not every use of Macros should be replaced by inline function calls and the requisite *forced* type-checking. type-chekcing is not *universally* a virtue --- if it were, nobody would use Python.
I don't think type safety is a big win for macros like these. We need to be more judicious about which macros are scheduled for function inlining. Some just don't benefit from the type-safety implications as much as others do, and you end up requiring everyone to change their code downstream for no real reason.
These sorts of changes really feel to me like unnecessary spelling changes that require work from extension writers who now have to modify their code with no real gain. There seems to be a lot of that going on in the code base and I'm not really convinced that it's useful for end-users.
Good style and type checking are useful. Numpy needs more of both.
You can assert it, but it doesn't make it so. "Good style" depends on what you are trying to accomplish and on your point of view. NumPy's style is not the product of one person, it's been adapted from multiple styles and inherits quite a bit from Python's style. I don't make any claims for it other than it allowed me to write it with the time and experience I had 7 years ago. We obviously disagree about this point. I'm sorry about that. I'm pretty flexible usually --- that's probably one of your big criticisms of my "style".
Curiously, my criticism would be more that you are inflexible, slow to change old habits.
I don't mind changing old habits at all. In fact, I don't think you know me very well if that's your take. You have a very narrow window into my activity and behavior. Of course habits are always hard to change (that's why they call them habits). Mostly, I need to be convinced of the value of changing old patterns --- just like everyone else (including existing NumPy users). On the type-question, I'm just not convinced that the most pressing matter in NumPy and SciPy is to re-write existing code to be more strictly typed. I'm quite open to other view points on that --- as long as backward compatibility is preserved, or a clear upgrade story is provided to existing users.
But, one of the things I feel quite strongly about is how hard we make it for NumPy users to upgrade. There are two specific things I disagree with pretty strongly:
1) Changing defined macros that should work the same on PyArrayObjects or PyObjects to now *require* types --- if we want to introduce new macros that require types than we can --- as long as it just provides warnings but still compiles then I suppose I could find this acceptable.
2) Changing MACROS to require semicolons when they were previously not needed. I'm going to be very hard-nosed about this one.
I'm going to be a lot more resistant to that sort of change in the code base when I see it.
Numpy is a team effort. There are people out there who write better code than you do, you should learn from them.
Exactly! It's a team effort. I'm part of that team as well, and while I don't always have strong opinions about things, when I do, I'm going to voice them.
I've learned long ago there are people that write better code than me. There are people that write better code than you.
Of course. Writing code is not my profession, and even if it were, there are people out there who would be immeasurably better. I have tried to improve my style over the years by reading books and browsing code by people who are better than me. I also recognize common bad habits naive coders tend to pick up when they start out, not least because I have at one time or another had many of the same bad habits.
We are really quite alike here. I have done and continue to do exactly the same thing. My priorities are just different. I don't believe it is universally useful to alter patterns in existing code. I have typically adapted my style to the code I'm working with. Numeric had a style which I adapted to. Python has a style which I adapted to. I think that people reading code and seeing multiple styles will find the code harder to read. Such changes of style take work, and quite often the transition is not worth the effort. I have not been, and will not be, opposed to changes that improve things (under any developer's notion of "improvement"). The big exception to that is when it seems to me that the changes will make it more difficult for existing users to use their code.
I know you are trying to make it easier for NumPy developers as you understand it. I really admire you for doing what you feel strongly about. I think we are both in our way trying to encourage more NumPy developers (you by making the code easier to get in to) and me by trying to figure out acceptable ways to fund them.
I just think that we must recognize the users out there who have written to the existing NumPy interface. Any change that requires effort from users should be met with skepticism. We can write new interfaces and encourage new users to use those new interfaces. We can even re-write NumPy internals to use those interfaces. But, we can't just change documented interfaces (and we should even be careful about undocumented but implied interfaces --- I agree that this gets difficult to really adhere to, but we can and should try, at least for heavily used code paths). One thing I'm deeply aware of is the limited audience of this list compared to the user base of NumPy and the inertia of old NumPy releases. Discussions on this list are just not visible to the wider user base. My recent activity and interest are in protecting that user base from the difficulties that recent changes are going to cause for people upgrading from 1.5.
My failing last year was to encourage and pay for (through Enthought) Mark's full-time activity on this list, but not to have the time to provide enough guidance to him about my understanding of the implications of his changes, or to think hard enough about those implications to understand them in time.
I thought Mark's activities actually declined once he entered the Enthought black hole. To be more specific, Mark did things that interested Enthought. I'd like to know what Mark himself would have liked to do. When an original thinker with impressive skills comes along it is worth letting them have a fair amount of freedom to move things, it's the only way to avoid stagnation.
That is not the question here at all. The question here is about not requiring a *re-write* of code in order for users to get their extensions to compile using NumPy headers. We should not be making people change their code to get their extensions to compile in NumPy 1.X.
I think a bit of rewrite here and there along the way is more palatable than a big change coming in as one big lump, especially if the changes are done with a long term goal in mind. We are working towards a Numpy 2, but we can't just go off for a year or two and write it, we have to get there step by step. And that requires a plan.
We see things a little differently on that front, I think. A bit of re-write here and there for down-stream users is exactly the wrong approach in my view. I think it depends on the user. For one who is tracking every NumPy release and has time to make any and all changes needed, I think you are right --- that approach will work for them. However, there are people out there who are using NumPy in ways (either significantly or only indirectly) where having to change *any* code from one release to another will make them seriously annoyed and we will start losing users.
Remember the lessons of 2.0, and of Python 3.0 for that matter.
One particularly glaring example, through my lens on the world: I think it would have been better to define new macros which require semicolons than to change the macros that don't require semicolons so that they now do:
NPY_BEGIN_THREADS_DEF
NPY_BEGIN_THREADS
NPY_ALLOW_C_API
NPY_ALLOW_C_API_DEF
NPY_DISABLE_C_API
That feels like a gratuitous style change that will force users of those macros to re-write their code.
It doesn't seem to be much of a problem.
Unfortunately, I don't trust your judgment on that. My experience and understanding tells a much different story. I'm sorry if you disagree with me.
I'm sorry I made you sorry ;) The problem here is that you don't come forth with specifics. People tell you things, but you don't say who or what their specific problem was. Part of working with a team is keeping folks informed, it isn't that useful to appeal to authority. I watch the list, which is admittedly a small window into the community, and I haven't seen show stoppers. Bugs, sure, but that isn't the same thing.
I came up with a very specific thing. I'm not sure what you are talking about. If you are talking about discussions with people off list, then I can't speak for them unless they have allowed me to. I encourage them to speak up here as often as they can. Yes, you will have to trust that a little bit of concern might just be an iceberg waiting to sink the ship.
Sure, it's a simple change, but it's a simple change that doesn't do anything for you as an end user. I think I'm going to back this change out, in fact. I can't see requiring people to change their C-code like this without a clear benefit to them. I'm quite sure there is code out there that uses these documented APIs (without the semicolon). If we want to define new macros that require semicolons, then we can do that, but we can't get rid of the old ones --- especially in a 1.x release.
Our policy should not be to allow gratuitous style changes just because we think something is prettier another way. The NumPy code base has come from multiple sources and reflects several styles. It also follows an older style of C-programming (that is quite common in the Python code base). It can be changed, but those changes shouldn't be painful for a library user without some specific gain for them that the change allows.
You use that word 'gratuitous' a lot, I don't think it means what you think it means. For instance, the new polynomial coefficient order wasn't gratuitous, it was doing things in a way many found more intuitive and that generalized better to different polynomial bases. People have different ideas, that doesn't make them gratuitous.
That's a slightly different issue. At least you created a new object and API which is a *little* better. My complaint about the choice there is that now there *must* be two interfaces, and added confusion as people will have to figure out which assumption is being used. I don't really care about the coefficient order --- really I don't. Either one is fine in my mind. I recognize the reasons. The problem is *changing* it without a *really* good reason. Now, we have to have two different APIs. I would have much preferred to have poly1d disappear and just use your much nicer polynomial classes. Now, it can't, and we are faced with a user story that is either difficult for someone transitioning from MATLAB
Most folks aren't going to transition from MATLAB or IDL. Engineers tend to stick with the tools they learned in school, they aren't interested in the tool itself as long as they can get their job done. And getting the job done is what they are paid for. That said, I doubt they would have much problem making the adjustment if they were inclined to switch tools.
I don't share your pessimism. You really think that "most folks aren't going to transition". It's happening now. It's been happening for several years.
I still haven't seen it. Once upon a time code for optical design was a new thing and many folks wrote their own, myself for one. These days they reach for Code V or Zemax. When they make the schematics they use something like Solidworks. When it comes time for thermal analysis they run the Solidworks design into another commercial program. When it comes time to manufacture the parts, another package takes the Solidworks data and produces NC instructions to drive the tools. The thing is, there is a whole ecosystem built around a few standard design tools. Similar considerations hold in civil engineering, architecture, and many other areas.

Another example would be Linux on the desktop. That never really took off, Microsoft is still the dominant presence there. Where Linux succeeded was in embedded devices and smart phones, markets that hadn't yet developed a large ecosystem and where pennies count.

Now to Matlab, suppose you want to analyse thermal effects on an orbiting satellite. Do you sit down and start writing new code in Python or do you buy a package for Matlab that deals with orbital calculations and knows all about shading and illumination? Suppose further that you have a few weeks to pull it off and have used the Matlab tools in the past. Matlab wins in this situation, Python isn't even a consideration.

There are certainly places for Python out there. HPC is one, because last I looked Matlab licenses were still based around the number of cpu cores, so there are significant cost savings. Research that needs innovative software is another area where Python has an advantage. First, because in research it is expected that time will be spent exploring new things, and second because it is easier to write Python than Matlab scripts and there are more tools available at no cost. On the other hand, if you need sophisticated mathematics, Mathematica is the easy way to go.
Engineering is a big area, and only a small part of it offers opportunity for Python to make inroads.
or a "why did you do that?" puzzled look from a new user as to why we support both coefficient orders. Of course, that could be our story --- hey, we support all kinds of orders, it doesn't really matter, you just have to tell us what you mean when passing in an unadorned array of coefficients. But, this is a different issue.
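[For readers following along, the two coefficient conventions under discussion look like this side by side, using NumPy's poly1d and Polynomial classes:]

```python
import numpy as np
from numpy.polynomial import Polynomial

coef = [1, 2, 3]

# Older interface: highest degree first, so [1, 2, 3] means x**2 + 2*x + 3.
p_old = np.poly1d(coef)

# Newer interface: lowest degree first, so [1, 2, 3] means 1 + 2*x + 3*x**2.
p_new = Polynomial(coef)

print(p_old(2))   # x**2 + 2*x + 3 at x=2  ->  11
print(p_new(2))   # 1 + 2*x + 3*x**2 at x=2  ->  17
```

The same unadorned list of coefficients thus denotes two different polynomials depending on which class interprets it, which is the source of the "which assumption is being used?" confusion.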
I'm using the word 'gratuitous' to mean that it is "uncalled for and lacks a good reason". There needs to be much better reasons given for code changes that require someone to re-write working code than "it's better style" or even "it will help new programmers avoid errors". Let's write another interface that new programmers can use that fits the world the way you see it, don't change what's already working just because you don't like it or wish a different choice had been made.
Well, and that was exactly what you meant when you called the coefficient order 'gratuitous' in your first post to me about it. The problem was that you didn't understand why I made the change until I explained it, but rather made the charge sans explanation. It might be that some of the other things you call gratuitous are less so than you think. These are hasty judgements, I think.
I'm sure we all have our share of hasty judgments to go around. Even after your explanation, I still disagree with it. But, I appreciate the reminder to give you the benefit of the doubt when I encounter something that makes me raise my eyebrows. I hope you will do the same.
There are significant users of NumPy out there still on 1.4. Even the policy of deprecation that has been discussed will not help people trying to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple times. The easier we can make this process for users the better. I remain convinced that it's better and am much more comfortable with making a release that requires a re-compile (that will succeed without further code changes --- because of backward compatibility efforts) than to have supposed ABI compatibility with subtle semantic changes and required C-code changes when you do happen to re-compile.
Cleanups need to be made bit by bit. I don't think we have done anything that will cause undue trouble.
I disagree substantially on the impact of these changes. You can disagree about my awareness of NumPy users, but I think I understand a large number of them and why NumPy has been successful in getting users. I agree that we have been unsuccessful at getting serious developers and I'm convinced by you and Mark as to why that is. But, we can't sacrifice users for the sake of getting developers who will spend their free time trying to get around the organic pile that NumPy is at this point.
Because of this viewpoint, I think there is some adaptation and cleanup needed right now, so that significant users of NumPy can upgrade in the face of the changes that have occurred without being hit with annoying errors (even simple changes can be a pain in the neck to fix).
I do agree changes can be made. I realize you've worked hard to keep the code-base in a state that you find more adequate. I think you go overboard on that front, but I acknowledge that there are people that appreciate this. I do feel very strongly that we should not require users to have to re-write working C-code in order to use a new minor version number in NumPy, regardless of how the code "looks" or how much "better" it is according to some idealized standard.
The macro changes are borderline (at least I believe code will still compile --- it will just raise warnings, but I need to be sure about this). The changes that require semicolons are not acceptable at all.
I was tempted to back them out myself, but I don't think the upshot will be earth shaking.
I think it's important that code using NumPy headers that compiled with 1.5 will compile with 1.7.
Look Charles, I believe we can continue to work productively together and our differences can be a strength to the community. I hope you feel the same way. I will continue to respect and listen to your perspective --- especially when I disagree with it.
Sounds like a threat to me. Who are you to judge? If you are going to be the dictator, let's put that out there and make it official.
Wow, Charles! I think you should re-read what I wrote. It was not a threat at all. It was an appeal to work more closely together, and a commitment on my end to listen to your point of view and to try to separate the wheat from the chaff in any opposition of my own.
I am just not thinking in those terms at all. I do not think it is appropriate to talk about a dictator in this context. I have no control over what you do, and you have no control over what I do. We can only work cooperatively or independently for the benefit of NumPy.
Perhaps there are things I've said and done that really bother you, or have offended you. I'm sorry for anything I've said that might have grated on you personally. I do appreciate your voice, ability, perspective, and skill. I suspect there are others in the NumPy community that feel the same way.
Chuck

C was famous for bugs due to the lack of function prototypes. This was fixed with ANSI C, and the stricter typing was a great help.
Bugs are not "due to lack of function prototypes". Bugs are due to mistakes that programmers make (and I know all about mistakes programmers make). Function prototypes can help detect some kinds of mistakes which is helpful. But, this doesn't help the question of how to transition a weakly-typed program or whether or not that is even a useful exercise.
Oh, come on. Writing correct C code used to be a guru exercise. A friend of mine, a Putnam fellow, was the Weitek guru for drivers. To say bugs are programmer mistakes is information free, the question is how to minimize programmer mistakes.
Bugs *are* programmer mistakes. Let's put responsibility where it lies. Of course, writing languages that help programmers make fewer mistakes (or catch them earlier when they do) is a good thing. I'm certainly not arguing against that. But, I reiterate that just because a better way to write new code under some metric is discovered or understood does not mean that all current code should be re-written in that style. That's the only comment I'm making. Also, you mention the lessons from Python 2 and Python 3, but I'm not sure we would agree on what those lessons actually were, so I wouldn't rely on that as a way of getting your point across if it matters. Best, -Travis

On Mon, Jun 25, 2012 at 1:41 PM, Travis Oliphant <travis@continuum.io>wrote:
C was famous for bugs due to the lack of function prototypes. This was fixed with ANSI C, and the stricter typing was a great help.
Bugs are not "due to lack of function prototypes". Bugs are due to mistakes that programmers make (and I know all about mistakes programmers make). Function prototypes can help detect some kinds of mistakes which is helpful. But, this doesn't help the question of how to transition a weakly-typed program or whether or not that is even a useful exercise.
Oh, come on. Writing correct C code used to be a guru exercise. A friend of mine, a Putnam fellow, was the Weitek guru for drivers. To say bugs are programmer mistakes is information free, the question is how to minimize programmer mistakes.
Bugs *are* programmer mistakes. Let's put responsibility where it lies. Of course, writing languages that help programmers make fewer mistakes (or catch them earlier when they do) is a good thing. I'm certainly not arguing against that.
But, I reiterate that just because a better way to write new code under some metric is discovered or understood does not mean that all current code should be re-written to use that style. That's the only comment I'm making.
Also, you mention the lessons from Python 2 and Python 3, but I'm not sure we would agree on what those lessons actually were, so I wouldn't rely on that as a way of getting your point across if it matters.
Best,
-Travis
At the risk of starting a language flame war, my take on Charles's comment about the lessons of Python 3.0 is its success in getting packages transitioned smoothly (still an on-going process), versus what happened with Perl 6. Perl 6 was a major change that happened all at once and no-one adopted it for the longest time. Meanwhile, Python incremented itself from the 2.x series to the 3.x series in a very nice manner, with a well-thought-out plan that was visible to all. At least, that is my understanding and perception. Take it with as much salt as you (or your doctor) desires. Ben Root

On Jun 25, 2012, at 12:20 PM, Charles R Harris wrote:
Most folks aren't going to transition from MATLAB or IDL. Engineers tend to stick with the tools they learned in school, they aren't interested in the tool itself as long as they can get their job done. And getting the job done is what they are paid for. That said, I doubt they would have much problem making the adjustment if they were inclined to switch tools.
I don't share your pessimism. You really think that "most folks aren't going to transition". It's happening now. It's been happening for several years.
I still haven't seen it. Once upon a time code for optical design was a new thing and many folks wrote their own, myself for one. These days they reach for Code V or Zemax. When they make the schematics they use something like Solidworks. When it comes time for thermal analysis they run the Solidworks design into another commercial program. When it comes time to manufacture the parts, another package takes the Solidworks data and produces NC instructions to drive the tools. The thing is, there is a whole ecosystem built around a few standard design tools. Similar considerations hold in civil engineering, architecture, and many other areas.
Another example would be Linux on the desktop. That never really took off, Microsoft is still the dominant presence there. Where Linux succeeded was in embedded devices and smart phones, markets that hadn't yet developed a large ecosystem and where pennies count.
Now to Matlab, suppose you want to analyse thermal effects on an orbiting satellite. Do you sit down and start writing new code in Python or do you buy a package for Matlab that deals with orbital calculations and knows all about shading and illumination? Suppose further that you have a few weeks to pull it off and have used the Matlab tools in the past. Matlab wins in this situation, Python isn't even a consideration.
There are certainly places for Python out there. HPC is one, because last I looked Matlab licenses were still based around the number of cpu cores, so there are significant cost savings. Research that needs innovative software is another area where Python has an advantage. First, because in research it is expected that time will be spent exploring new things, and second because it is easier to write Python than Matlab scripts and there are more tools available at no cost. On the other hand, if you need sophisticated mathematics, Mathematica is the easy way to go.
Engineering is a big area, and only a small part of it offers opportunity for Python to make inroads.
It's hard to generalize that much here. There are some areas in which what you say is true, particularly if whole industries rely on libraries that have had much time invested in developing them, and from which it is particularly difficult to break away. But there are plenty of other areas where it isn't that hard. I'd characterize the process a bit differently. I would agree that it is pretty hard to get someone who has been using matlab or IDL for many years to transition. That doesn't happen very often (if it does, it's because all the other people they work with are using a different tool and they are forced to). I think we are targeting the younger people; those that do not have a lot of experience tied up in matlab or IDL. For example, IDL is very well established in astronomy, and we've seen few make that switch if they already have been using IDL for a while. But we are seeing many more younger astronomers choose Python over IDL these days. Perry

On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield <perry@stsci.edu> wrote:
On Jun 25, 2012, at 12:20 PM, Charles R Harris wrote:
Most folks aren't going to transition from MATLAB or IDL. Engineers tend to stick with the tools they learned in school, they aren't interested in the tool itself as long as they can get their job done. And getting the job done is what they are paid for. That said, I doubt they would have much problem making the adjustment if they were inclined to switch tools.
I don't share your pessimism. You really think that "most folks aren't going to transition". It's happening now. It's been happening for several years.
I still haven't seen it. Once upon a time code for optical design was a new thing and many folks wrote their own, myself for one. These days they reach for Code V or Zemax. When they make the schematics they use something like Solidworks. When it comes time for thermal analysis they run the Solidworks design into another commercial program. When it comes time to manufacture the parts, another package takes the Solidworks data and produces NC instructions to drive the tools. The thing is, there is a whole ecosystem built around a few standard design tools. Similar considerations hold in civil engineering, architecture, and many other areas.
Another example would be Linux on the desktop. That never really took off, Microsoft is still the dominant presence there. Where Linux succeeded was in embedded devices and smart phones, markets that hadn't yet developed a large ecosystem and where pennies count.
Now to Matlab, suppose you want to analyse thermal effects on an orbiting satellite. Do you sit down and start writing new code in Python or do you buy a package for Matlab that deals with orbital calculations and knows all about shading and illumination? Suppose further that you have a few weeks to pull it off and have used the Matlab tools in the past. Matlab wins in this situation, Python isn't even a consideration.
There are certainly places for Python out there. HPC is one, because last I looked Matlab licenses were still based around the number of cpu cores, so there are significant cost savings. Research that needs innovative software is another area where Python has an advantage. First, because in research it is expected that time will be spent exploring new things, and second because it is easier to write Python than Matlab scripts and there are more tools available at no cost. On the other hand, if you need sophisticated mathematics, Mathematica is the easy way to go.
Engineering is a big area, and only a small part of it offers opportunity for Python to make inroads.
It's hard to generalize that much here. There are some areas in which what you say is true, particularly if whole industries rely on libraries that have had much time invested in developing them, and from which it is particularly difficult to break away. But there are plenty of other areas where it isn't that hard.
I'd characterize the process a bit differently. I would agree that it is pretty hard to get someone who has been using matlab or IDL for many years to transition. That doesn't happen very often (if it does, it's because all the other people they work with are using a different tool and they are forced to). I think we are targeting the younger people; those that do not have a lot of experience tied up in matlab or IDL. For example, IDL is very well established in astronomy, and we've seen few make that switch if they already have been using IDL for a while. But we are seeing many more younger astronomers choose Python over IDL these days.
I didn't bring up the Astronomy experience, but I think that is a special case because it is a fairly small area and to some extent you had the advantage of a supported center, STScI, maintaining some software. There are also a lot of amateurs who can appreciate the low costs and simplicity of Python. The software that engineers use tends to be set early, in college or in their first jobs. I suspect that these days professional astronomers spend a number of years in graduate school where they have time to experiment a bit. That is a nice luxury to have. Chuck

On Jun 25, 2012, at 3:25 PM, Charles R Harris wrote:
On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield <perry@stsci.edu> wrote:
It's hard to generalize that much here. There are some areas in which what you say is true, particularly if whole industries rely on libraries that have had much time invested in developing them, and from which it is particularly difficult to break away. But there are plenty of other areas where it isn't that hard.
I'd characterize the process a bit differently. I would agree that it is pretty hard to get someone who has been using matlab or IDL for many years to transition. That doesn't happen very often (if it does, it's because all the other people they work with are using a different tool and they are forced to). I think we are targeting the younger people; those that do not have a lot of experience tied up in matlab or IDL. For example, IDL is very well established in astronomy, and we've seen few make that switch if they already have been using IDL for a while. But we are seeing many more younger astronomers choose Python over IDL these days.
I didn't bring up the Astronomy experience, but I think that is a special case because it is a fairly small area and to some extent you had the advantage of a supported center, STScI, maintaining some software. There are also a lot of amateurs who can appreciate the low costs and simplicity of Python.
The software that engineers use tends to be set early, in college or in their first jobs. I suspect that these days professional astronomers spend a number of years in graduate school where they have time to experiment a bit. That is a nice luxury to have.
Sure. But it's not unusual for an invasive technology (that's us) to take root in certain niches before spreading more widely. Another way of looking at such things is: is what we are seeking to replace that much worse? If the gains are marginal, then it is very hard to displace. But if there are significant advantages, eventually they will win through. I tend to think Python and the scientific stack does offer the potential for great advantages over IDL or matlab. But that doesn't make it easy. Perry

On Mon, Jun 25, 2012 at 4:21 PM, Perry Greenfield <perry@stsci.edu> wrote:
On Jun 25, 2012, at 3:25 PM, Charles R Harris wrote:
On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield <perry@stsci.edu> wrote:
It's hard to generalize that much here. There are some areas in which what you say is true, particularly if whole industries rely on libraries that have had much time invested in developing them, and from which it is particularly difficult to break away. But there are plenty of other areas where it isn't that hard.
I'd characterize the process a bit differently. I would agree that it is pretty hard to get someone who has been using matlab or IDL for many years to transition. That doesn't happen very often (if it does, it's because all the other people they work with are using a different tool and they are forced to). I think we are targeting the younger people; those that do not have a lot of experience tied up in matlab or IDL. For example, IDL is very well established in astronomy, and we've seen few make that switch if they already have been using IDL for a while. But we are seeing many more younger astronomers choose Python over IDL these days.
I didn't bring up the Astronomy experience, but I think that is a special case because it is a fairly small area and to some extent you had the advantage of a supported center, STScI, maintaining some software. There are also a lot of amateurs who can appreciate the low costs and simplicity of Python.
The software that engineers use tends to be set early, in college or in their first jobs. I suspect that these days professional astronomers spend a number of years in graduate school where they have time to experiment a bit. That is a nice luxury to have.
Sure. But it's not unusual for an invasive technology (that's us) to take root in certain niches before spreading more widely.
Another way of looking at such things is: is what we are seeking to replace that much worse? If the gains are marginal, then it is very hard to displace. But if there are significant advantages, eventually they will win through. I tend to think Python and the scientific stack does offer the potential for great advantages over IDL or matlab. But that doesn't make it easy.
I didn't say we couldn't make inroads. The original proposition was that we needed a polynomial class compatible with Matlab. I didn't think compatibility with Matlab mattered so much in that case because not many people switch, as you have agreed is the case, and those who start fresh, or are the adventurous sort, can adapt without a problem. In other words, IMHO, it wasn't a pressing issue and could be decided on the merits of the interface, which I thought of in terms of series approximation. In particular, it wasn't a 'gratuitous' choice as I had good reasons to do things the way I did. Chuck

You are still missing the point that there was already a choice that was made in the previous class --- made in Numeric actually. You made a change to that. It is the change that is 'gratuitous'. The pain and unnecessary overhead of having two competing standards is the problem --- not whether one is 'right' or not. That is a different discussion entirely.

-- Travis Oliphant (on a mobile) 512-826-7480

On Jun 25, 2012, at 7:01 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Mon, Jun 25, 2012 at 4:21 PM, Perry Greenfield <perry@stsci.edu> wrote:
On Jun 25, 2012, at 3:25 PM, Charles R Harris wrote:
On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield <perry@stsci.edu> wrote:
It's hard to generalize that much here. There are some areas in which what you say is true, particularly if whole industries rely on libraries that have much time involved in developing them, and for which it is particularly difficult to break away. But there are plenty of other areas where it isn't that hard.
I'd characterize the process a bit differently. I would agree that it is pretty hard to get someone who has been using matlab or IDL for many years to transition. That doesn't happen very often (if it does, it's because all the other people they work with are using a different tool and they are forced to). I think we are targeting the younger people; those that do not have a lot of experience tied up in matlab or IDL. For example, IDL is very well established in astronomy, and we've seen few make that switch if they already have been using IDL for a while. But we are seeing many more younger astronomers choose Python over IDL these days.
I didn't bring up the Astronomy experience, but I think that is a special case because it is a fairly small area and to some extent you had the advantage of a supported center, STScI, maintaining some software. There are also a lot of amateurs who can appreciate the low cost and simplicity of Python.
The software that engineers use tends to be set early, in college or in their first jobs. I suspect that these days professional astronomers spend a number of years in graduate school where they have time to experiment a bit. That is a nice luxury to have.
Sure. But it's not unusual for an invasive technology (that's us) to take root in certain niches before spreading more widely.
Another way of looking at such things is: is what we are seeking to replace that much worse? If the gains are marginal, then it is very hard to displace. But if there are significant advantages, eventually they will win through. I tend to think Python and the scientific stack does offer the potential for great advantages over IDL or matlab. But that doesn't make it easy.
I didn't say we couldn't make inroads. The original proposition was that we needed a polynomial class compatible with Matlab. I didn't think compatibility with Matlab mattered so much in that case because not many people switch, as you have agreed is the case, and those who start fresh, or are the adventurous sort, can adapt without a problem. In other words, IMHO, it wasn't a pressing issue and could be decided on the merits of the interface, which I thought of in terms of series approximation. In particular, it wasn't a 'gratuitous' choice as I had good reasons to do things the way I did.
Chuck
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Mon, Jun 25, 2012 at 5:10 PM, Travis Oliphant <travis@continuum.io> wrote:
You are still missing the point that there was already a choice that was made in the previous class --- made in Numeric actually.
You made a change to that. It is the change that is 'gratuitous'.
As someone who played a role in that change (by talking with Chuck about it, I didn't do the actual hard work), I'd like to pitch in. I think it's unfair to use the word gratuitous here, which is defined as:

Adjective: Uncalled for; lacking good reason; unwarranted.

It is true that the change happened to not consider enough the reasons that existed for the previous state of affairs, but it's *not* true that there were no good reasons for it. Calling something gratuitous is fairly derogatory, as it implies that it was done without any thinking whatsoever, and that was most certainly not the case here. It was a deliberate and considered change for what were *thought* to be good reasons.

It's possible that, had there been feedback from you at the time, those reasons would have been appreciated as not being sufficient to make the change, or that a different solution would have been arrived at. But to say that there were no good reasons is unfair to those who did spend the time thinking about the problem, and who thought the reasons they had found were indeed good ones.

That particular issue was simply one of the best examples of what happens in a project when there are not enough eyes to provide feedback on its evolution: even with the best intentions, the few doing the work may make changes that might not have gone through with more input from others. But the alternative was to paralyze numpy completely, which I think would have been a worse outcome.

I know that this particular issue grates you quite a bit, but I urge you to be fair in your appreciation of how it came to be: through the work of well-intentioned and thoughtful (but not omniscient) people when you weren't participating actively in numpy development.

Cheers, f

On Jun 25, 2012, at 7:21 PM, Fernando Perez wrote:
On Mon, Jun 25, 2012 at 5:10 PM, Travis Oliphant <travis@continuum.io> wrote:
You are still missing the point that there was already a choice that was made in the previous class --- made in Numeric actually.
You made a change to that. It is the change that is 'gratuitous'.
As someone who played a role in that change (by talking with Chuck about it, I didn't do the actual hard work), I'd like to pitch in.
I think it's unfair to use the word gratuitous here, which is defined as:
Adjective:
Uncalled for; lacking good reason; unwarranted.
I appreciate your perspective, but I still think it's fair to use that word. I think it's been interpreted more broadly than I intended and in a different color than I intended. My use of the word is closer to "uncalled for" and "unwarranted" than an isolated "lacking good reason". I know very well that anything done in NumPy has a "good reason" because the people who participate in NumPy development are very bright and capable.

For context, consider that for many years, the word "gratuitous" has been used in a non-derogatory way in the Python ecosystem to describe changes to semantics and syntax that don't have benefits significant enough to offset the pain it will cause to existing users. That's why I used the word. I am not trying to be derogatory. I am trying to be clear that we need to respect existing users of NumPy more than we have done from 1.5 to 1.7 in the enthusiasm to make changes.

I will repeat my very simple argument: I think this particular change was uncalled for because now we will have 2 different conventions in NumPy for polynomial coefficient order. I understand it's a different API so code won't break, which is a good thing --- but it will be a wart for NumPy as we explain for years to come about the different conventions in the same code-base. Working on the NumPy code base implies respecting the conventions that are already in place --- not just disregarding them and doing whatever we want.

I'm not really sure why I have to argue the existing users' point of view so much recently. I would hope that all of us would have the perspective that the people who have adopted NumPy deserve to be treated with respect. The changes that grate on me are the ones that seem to take lightly existing users of NumPy.
It is true that the change happened to not consider enough the reasons that existed for the previous state of affairs, but it's *not* true that there were no good reasons for it.
Of course not. I've tried to make the point very clearly that I understand the good reasons for it. I do understand them. 12-13 years ago, when the decisions were being made about current conventions, I would likely have been persuaded by them.
But to say that there were no good reasons is unfair to those who did spend the time thinking about the problem, and who thought the reasons they had found were indeed good ones.
I did not ever say there were no good reasons. Please do not add energy to the idea that I'm disregarding the reasoning of those who thought about this. I'm not. I said the changes were uncalled for and unwarranted. I stand by that assessment. I do not mean any disrespect to the people who made the changes.

In that context, it's also useful to recognize how unfair it is to existing users to change conventions and ignore the work they have put into understanding and using what is there. It's also useful to consider the unfairness of ignoring the thinking and work that went into the existing conventions and APIs.
I know that this particular issue grates you quite a bit, but I urge you to be fair in your appreciation of how it came to be: through the work of well-intentioned and thoughtful (but not omniscient) people when you weren't participating actively in numpy development.
I'm trying very hard to be fair --- especially to changes like this. What grates me are changes that affect our user base in a negative way --- specifically by causing code that used to work to no longer work, or by altering real conventions. This kind of change is just not acceptable if we can avoid it. I'm really trying to understand why others do not feel so strongly about this, but I'm not persuaded by what I've heard so far.

Please note that I'm not trying to assign blame. I recognize the part that my failings and inadequacies have played in this (I continue to be willing to listen to others' assessments of those inadequacies and failings and do my best to learn from them). I'm just trying to create a different context for future discussions about these sorts of things. I love the changes that add features and capability for our users.

I have not called for ripping these particular changes out, even though I would be much, much happier if there were a single convention for polynomial-coefficient order in NumPy. In fact, most of my messages have included references to how to incorporate such changes with as little impact as possible --- finding some way to reconcile things, perhaps by focusing attention away from such things (adding a keyword to the poly1d class, perhaps, to allow it to be called in reverse order).

Best, -Travis
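[For readers following along, the two coefficient conventions being debated can be seen side by side. A minimal sketch using the two public APIs that ship with NumPy:]

```python
import numpy as np
from numpy.polynomial import Polynomial

# Old convention (Numeric / poly1d / Matlab style): coefficients are
# given highest-degree first, so [1, 2, 3] means x**2 + 2*x + 3.
p = np.poly1d([1, 2, 3])

# New convention (numpy.polynomial): coefficients are given
# lowest-degree first, so [1, 2, 3] means 1 + 2*x + 3*x**2.
q = Polynomial([1, 2, 3])

# Same input list, different polynomials:
print(p(2))  # 1*4 + 2*2 + 3 = 11
print(q(2))  # 1 + 2*2 + 3*4 = 17
```

The same list of numbers thus denotes two different polynomials depending on which class it is handed to, which is the "two competing standards" wart being discussed.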

On Mon, Jun 25, 2012 at 6:39 PM, Travis Oliphant <travis@continuum.io> wrote:
On Jun 25, 2012, at 7:21 PM, Fernando Perez wrote:
For context, consider that for many years, the word "gratuitous" has been used in a non-derogatory way in the Python ecosystem to describe changes to semantics and syntax that don't have benefits significant enough to offset the pain it will cause to existing users. That's why I used the word. I am not trying to be derogatory. I am trying to be clear that we need to respect existing users of NumPy more than we have done from 1.5 to 1.7 in the enthusiasm to make changes.
For reference, here's the (long) thread where this came to be: http://mail.scipy.org/pipermail/scipy-dev/2009-October/012958.html It's worth noting that at the time, the discussion was for an addition to *scipy*, not to numpy. I don't know when things were moved over to numpy.
Working on the NumPy code base implies respecting the conventions that are already in place --- not just disregarding them and doing whatever we want. I'm not really sure why I have to argue the existing users point of view so much recently. I would hope that all of us would have the perspective that the people who have adopted NumPy deserve to be treated with respect. The changes that grate on me are the ones that seem to take lightly existing users of NumPy.
I certainly appreciate the need to not break user habits/code, as we struggle with the very same issue in IPython all the time. And obviously at this point numpy is 'core infrastructure' enough that breaking backwards compatibility in any way should be very strongly discouraged (things were probably a bit different back in 2009).
I know that this particular issue grates you quite a bit, but I urge you to be fair in your appreciation of how it came to be: through the work of well-intentioned and thoughtful (but not omniscient) people when you weren't participating actively in numpy development.
I'm trying very hard to be fair --- especially to changes like this. What grates me are changes that affect our user base in a negative way --- specifically by causing code that used to work to no longer work or create alterations to real conventions. This kind of change is just not acceptable if we can avoid it. I'm really trying to understand why others do not feel so strongly about this, but I'm not persuaded by what I've heard so far.
I just want to note that I'm not advocating for *any* backwards-compatibility breakage in numpy at this point... I was just providing context for a discussion that happened back in 2009, and in the scipy list. I certainly feel pretty strongly at this point about the importance of preserving working code *today*, given the role of numpy at the 'root node' of the scipy ecosystem tree and the size of said tree. Best, f

On Jun 25, 2012, at 9:38 PM, Fernando Perez wrote:
For reference, here's the (long) thread where this came to be:
http://mail.scipy.org/pipermail/scipy-dev/2009-October/012958.html
It's worth noting that at the time, the discussion was for an addition to *scipy*, not to numpy. I don't know when things were moved over to numpy.
Yes, it's also worth noting the discussion took place on the SciPy list. The fact that NumPy decisions were made on the SciPy mailing list is not a pattern we should repeat. While the two communities have overlap, they are not the same. It is important to remind ourselves of this (especially those of us who feel at home in both).

From that thread, I wish that the ideas of Anne and David had been listened to instead of just dismissed out of hand, as was done. Anne suggested putting the polynomial class in SciPy (where there would have been less consternation about the coefficient order change --- although many seem to really want to ignore the entire Controls and LTI-system communities where the other convention is common). David suggested allowing both orders to be specified. That is still a good idea in my view.

Thanks for doing the research to bring the thread up again.
I just want to note that I'm not advocating for *any* backwards-compatibility breakage in numpy at this point... I was just providing context for a discussion that happened back in 2009, and in the scipy list. I certainly feel pretty strongly at this point about the importance of preserving working code *today*, given the role of numpy at the 'root node' of the scipy ecosystem tree and the size of said tree.
Thank you for re-iterating that position. The polynomial order question is moot at this point. It's not going to change. We just need to also keep maintaining poly1d's interface. -Travis
Best,
f
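[As a sketch of what David's suggestion --- allowing both orders to be specified --- might look like in practice. This is a hypothetical helper written for illustration; `make_poly` and its `order` keyword are not an existing NumPy API:]

```python
from numpy.polynomial import Polynomial

def make_poly(coeffs, order="low"):
    """Hypothetical helper: build a Polynomial from coefficients
    given in either ordering.

    order="low"  : coeffs[0] is the constant term (numpy.polynomial style)
    order="high" : coeffs[0] is the leading term (poly1d / Matlab style)
    """
    if order == "high":
        coeffs = list(coeffs)[::-1]  # reverse to lowest-degree-first
    elif order != "low":
        raise ValueError("order must be 'low' or 'high'")
    return Polynomial(coeffs)

# Both spellings describe x**2 + 2*x + 3:
a = make_poly([3, 2, 1], order="low")
b = make_poly([1, 2, 3], order="high")
```

A keyword like this would let users from either camp state their intent explicitly instead of remembering which class uses which convention.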

On Mon, Jun 25, 2012 at 7:38 PM, Fernando Perez <fperez.net@gmail.com> wrote:
I just want to note that I'm not advocating for *any* backwards-compatibility breakage in numpy at this point... I was just providing context for a discussion that happened back in 2009, and in the scipy list. I certainly feel pretty strongly at this point about the importance of preserving working code *today*, given the role of numpy at the 'root node' of the scipy ecosystem tree and the size of said tree.
I think that everybody strongly agrees that backward incompatible changes should not be made. Sometimes it can be more subtle; see for example this numpy bug report in Debian:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=589835

and read the dozens of emails that it generated, e.g. http://lists.debian.org/debian-python/2010/07/msg00048.html, and so on. I've been hit by this problem too, that's why I remember it --- suddenly many packages that depend on NumPy stopped working in a subtle way and I had to spend hours figuring out what went wrong, and that the problem was not in h5py, but actually that NumPy had changed its ABI; more precisely, the problem is described here (some new members were added to a C data structure):

http://lists.debian.org/debian-python/2010/07/msg00045.html

I am sure that this ABI change had to be done and there were good reasons for it, and this particular change probably couldn't even have been avoided. But nevertheless it has caused headaches for a lot of people downstream. I just looked into the release notes for NumPy 1.4.0 and didn't find this change nor how to fix it in there. I am just posting this as a particular, concrete, real-life example of consequences for the end users.

My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users," and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. But sometimes I guess mistakes are made anyway. What can be done to avoid similar issues like the one with the polynomial order in the future?

Ondrej

On Mon, Jun 25, 2012 at 11:10 PM, Ondřej Čertík <ondrej.certik@gmail.com> wrote:
I think that everybody strongly agrees that backward incompatible changes should not be made.
Sometimes it can be more subtle, see for example this numpy bug report in Debian:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=589835
and read the dozens of emails that it generated, e.g. http://lists.debian.org/debian-python/2010/07/msg00048.html, and so on. I've been hit by this problem too, that's why I remember it -- suddenly many packages that depend on NumPy stopped working in a subtle way and I had to spend hours figuring out what went wrong and that the problem is not in h5py, but actually that NumPy has changed its ABI, or more precisely the problem is described here (some new members were added to a C data structure): http://lists.debian.org/debian-python/2010/07/msg00045.html I am sure that this ABI change had to be done and there were good reasons for it and this particular change probably even couldn't have been avoided. But nevertheless it has caused headaches to a lot of people downstream. I just looked into the release notes for NumPy 1.4.0 and didn't find this change nor how to fix it in there. I am just posting this as a particular, concrete, real life example of consequences for the end users.
My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users." and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well.
That's not the case that Travis has in mind. This was an ABI break, not an API break. It took quite some time and a 1.4.1 release to recover from it. Although there were some indications of the ABI break before the 1.4.0 release, it was only found out after the release (as a byproduct of datetime). Many packages on Windows were never available for 1.4.0 because not many package developers wanted to recompile for 1.4.0 (like h5py). Josef
But sometimes I guess mistakes are made anyway. What can be done to avoid similar issues like with the polynomial order in the future?
Ondrej

I think that everybody strongly agrees that backward incompatible changes should not be made.
Sometimes it can be more subtle, see for example this numpy bug report in Debian:
There are a lot of subtleties and different users and different expectations. It does make it difficult to know the best course of action. I appreciate the perspective of as many people as possible --- especially those who have managed code bases with a large number of users.

What should have happened in this case, in my mind, is that NumPy 1.4.0 should have been 1.5.0 and advertised that there was a break in the ABI and that all extensions would have to be re-built against the new version. This would have been some pain for one class of users (primarily package maintainers) and no pain for another class. There was no API breakage. We just needed to communicate clearly. Because we guessed wrongly that the changes made did not change the ABI, we did not communicate clearly during the release. This was a mistake. I was a large part of that mistake. I also understand the impact that the unsolved packaging problem in the Python community has created (at least for non-academic users and HPC users).

Some take this example as "you can't change the ABI." That's not quite my perspective, for what it's worth. I don't think you should have a habit of changing the ABI (because it does create some hassle for downstream users), but especially today, when there are many pre-packaged distributions of Python, occasional changes that require a re-compile of downstream dependencies do not constitute the kind of breakage I'm talking about.

The kind of breakage I'm talking about is the kind that causes code that used to work to stop working --- either because it won't compile against the new headers or because the behavior of operations changes in subtle ways. Both kinds of changes have happened between 1.5.x and 1.7.x. Some believe these changes are inconsequential. I hope they are right.
I don't believe we have enough data to make that call, and there is some evidence I am aware of from people in other organizations that there are changes that will make upgrading difficult for people --- much more difficult than an ABI breakage would have been.

You can change things. You just have to be cautious and more careful. It's definitely more painful. Changes that will require *any* work by a user of NumPy other than a re-compile of their code should only be on major version numbers, preferably have a backward-compatible header to use, and a document that describes all the changes that must be made to move the code forward.

I'm not trying to throw stones. My glass house and my own sins would not justify such behavior. I apologize if it has come off that way at any time.

Best, -Travis
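[To make concrete why adding members to a C struct is an ABI break even when no API changes, here is a simplified illustration using ctypes. The field names are made up for the example; they are not NumPy's actual struct members, though the 1.4.0 issue was of exactly this shape (new members inserted into a public C structure):]

```python
import ctypes

# Layout an extension module was compiled against ("old" ABI).
class OldDescr(ctypes.Structure):
    _fields_ = [("kind", ctypes.c_char),
                ("elsize", ctypes.c_int)]

# Same struct after a new member is inserted in the middle ("new" ABI).
class NewDescr(ctypes.Structure):
    _fields_ = [("kind", ctypes.c_char),
                ("metadata", ctypes.c_void_p),  # newly inserted field
                ("elsize", ctypes.c_int)]

# The offset of 'elsize' moves, so a binary compiled against the old
# layout reads the wrong bytes from a new-layout struct. Nothing in the
# source API changed, which is why this fails silently -- as garbage
# values or segfaults -- instead of as a compile error.
print("old elsize offset:", OldDescr.elsize.offset)
print("new elsize offset:", NewDescr.elsize.offset)
```

This is also why appending new members at the end of a struct (rather than inserting them) is the less disruptive option: existing offsets stay put, and only code that allocates the struct itself needs a rebuild.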

Travis, apologies in advance if the tone of this message is too strong - please take it as a sign of how frustrating I find the discussion on this point. On Tue, Jun 26, 2012 at 5:33 AM, Travis Oliphant <travis@continuum.io> wrote: ...
What should have happened in this case, in my mind, is that NumPy 1.4.0 should have been 1.5.0 and advertised that there was a break in the ABI and that all extensions would have to be re-built against the new version. This would have been some pain for one class of users (primarily package maintainers) and no pain for another class.
Please please stop asserting this. It's plain wrong. It has been explained to you multiple times by multiple people how bad the consequences of breaking the ABI are. It leads to random segfaults when existing installers are not updated or when users pick the wrong installer by accident (which undoubtedly some will). It also leads to a large increase in the number of installers that maintainers for every single package that depends on numpy will have to build. Including for releases they've already made in the past.

The assertion that users nowadays mainly use bundles like EPD or package managers is also extremely pointless. Last week NumPy had over 7000 downloads on SF alone; the cumulative total stands at almost 1.7 million. If even 0.1% of those downloads are of the wrong binary, that's 7 users *every week* with a very serious problem.

API breakage is also bad, and I'm not going to argue here about which kind of breakage is worse. What I will point out though is that we now have datetime merged back in while keeping ABI compatibility, thanks to Mark's efforts. That shows it's hardly ever really necessary to break the ABI.

Finally, it has been agreed several times on this list to not break the ABI for minor releases, period. Let's please stick to that decision.

Best regards, Ralf

On Tue, Jun 26, 2012 at 10:25 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Tue, Jun 26, 2012 at 5:33 AM, Travis Oliphant <travis@continuum.io> wrote: ...
What should have happened in this case, in my mind, is that NumPy 1.4.0 should have been 1.5.0 and advertised that there was a break in the ABI and that all extensions would have to be re-built against the new version. This would have been some pain for one class of users (primarily package maintainers) and no pain for another class.
Please please stop asserting this. It's plain wrong. It has been explained to you multiple times by multiple people how bad the consequences of breaking the ABI are. It leads to random segfaults when existing installers are not updated or when users pick the wrong installer by accident (which undoubtedly some will). It also leads to a large increase in the number of installers that maintainers for every single package that depends on numpy will have to build. Including for releases they've already made in the past.
An additional perspective on the issue of ABI breakage: even for those of us who live in a distro-managed universe (ubuntu in my case), the moment numpy breaks ABI means that it becomes *much* harder to use the new numpy because I'd have to start recompiling all binary dependencies, some of which are not pleasant to start rebuilding (such as VTK for mayavi). So that means I'm much less likely to use an ABI-incompatible numpy for everyday work, and therefore less likely to find bugs, report them, etc. I typically run dev versions of numpy, scipy and matplotlib all the time, except when numpy breaks ABI, which means I have to 'pin' numpy to the system one and only update the others.

Now, obviously that doesn't mean that ABI can never be broken, but it's just another data point for you as you evaluate the cost of ABI breakage. It is significant even for those who operate under the benefit of managed packages, because numpy is effectively the root node of the dependency tree for virtually all scientific python packages.

I hope this is useful as additional data on the issue.

Cheers,
f

I do understand the issues around ABI breakage. I just want to speak up for the people who are affected by API breakage who are not as vocal on this list. I believe we should have similar frustration and concern at talk of API breakage as there is about talk of ABI breakage.

-Travis

On Jun 27, 2012, at 12:59 AM, Fernando Perez wrote:
On Tue, Jun 26, 2012 at 10:25 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Tue, Jun 26, 2012 at 5:33 AM, Travis Oliphant <travis@continuum.io> wrote: ...
What should have happened in this case, in my mind, is that NumPy 1.4.0 should have been 1.5.0 and advertised that there was a break in the ABI and that all extensions would have to be re-built against the new version. This would have been some pain for one class of users (primarily package maintainers) and no pain for another class.
Please please stop asserting this. It's plain wrong. It has been explained to you multiple times by multiple people how bad the consequences of breaking the ABI are. It leads to random segfaults when existing installers are not updated or when users pick the wrong installer by accident (which undoubtedly some will). It also leads to a large increase in the number of installers that maintainers for every single package that depends on numpy will have to build. Including for releases they've already made in the past.
An additional perspective on the issue of ABI breakage: even for those of us who live in a distro-managed universe (ubuntu in my case), the moment numpy breaks ABI means that it becomes *much* harder to use the new numpy because I'd have to start recompiling all binary dependencies, some of which are not pleasant to start rebuilding (such as VTK for mayavi). So that means I'm much less likely to use an ABI-incompatible numpy for everyday work, and therefore less likely to find bugs, report them, etc. I typically run dev versions of numpy, scipy and matplotlib all the time, except when numpy breaks ABI, which means I have to 'pin' numpy to the system one and only update the others.
Now, obviously that doesn't mean that ABI can never be broken, but it's just another data point for you as you evaluate the cost of ABI breakage. It is significant even for those who operate under the benefit of managed packages, because numpy is effectively the root node of the dependency tree for virtually all scientific python packages.
I hope this is useful as additional data on the issue.
Cheers,
f _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Tue, Jun 26, 2012 at 11:02 PM, Travis Oliphant <travis@continuum.io> wrote:
I just want to speak up for the people who are affected by API breakage who are not as vocal on this list.
Certainly! And indeed I bet you that's a community underrepresented here: those of us who are on this list are likely to be up to speed on what's happening with the API and can therefore adjust to changes quickly, simply because we know they have occurred. Random J. User who gets an upstream update and all of a sudden finds previously working code to break is unlikely to be active here and will be very, very unhappy.

If anything, the lesson is: for a project that's so deep in the dependency tree as numpy is, A{P,B}I stability is a paramount concern, with a cost that gets higher the more successful the project is. This means APIs should evolve only in backwards-compatible ways when at all possible, with backwards-compatibility being broken only in:

- clearly designated points that are agreed upon by as many as possible
- with clear explanations of how old codes need to be adapted to the new interface to continue working
- if at all possible with advance warnings, and even better, a system for 'future' loading.

Python in fact has the __future__ imports that help quite a bit for people to start adapting their codes. How about creating a numpy.future module where new, non-backward-compatible APIs could go? That would give the adventurous a way to play with new features (hence getting them better tested) as well as an easier path for gradual migration to the new features by everyone.

This may have already been discussed before, forgive me if I'm repeating well-known material.

Cheers,
f
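As a toy sketch of Fernando's numpy.future idea: NumPy has no such module, and everything below (the FutureNamespace class, the new_sum stand-in) is purely hypothetical, but an opt-in namespace could emit a FutureWarning whenever an experimental, not-yet-stable API is touched:

```python
import warnings

class FutureNamespace:
    """Toy opt-in namespace for experimental APIs.

    Hypothetical sketch only -- NumPy has no ``numpy.future`` module;
    this just illustrates the suggestion from the thread.
    """

    def __init__(self):
        self._experimental = {}

    def register(self, name, func):
        self._experimental[name] = func

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e. for the
        # registered experimental names.
        try:
            func = self._experimental[name]
        except KeyError:
            raise AttributeError(name)
        warnings.warn(
            "%s is experimental and may change without the usual "
            "deprecation cycle" % name, FutureWarning, stacklevel=2)
        return func

future = FutureNamespace()
future.register("new_sum", lambda xs: sum(xs))  # stand-in for a new API

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = future.new_sum([1, 2, 3])

print(result)  # 6, with one FutureWarning recorded in `caught`
```

The point of the warning is the same as __future__ imports: adventurous users exercise the new API early and knowingly, while everyone else keeps the stable surface untouched.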

On Jun 27, 2012, at 1:18 AM, Fernando Perez wrote:
On Tue, Jun 26, 2012 at 11:02 PM, Travis Oliphant <travis@continuum.io> wrote:
I just want to speak up for the people who are affected by API breakage who are not as vocal on this list.
Certainly! And indeed I bet you that's a community underrepresented here: those of us who are on this list are likely to be up to speed on what's happening with the API and can therefore adjust to changes quickly, simply because we know they have occurred. Random J. User who gets an upstream update and all of a sudden finds previously working code to break is unlikely to be active here and will be very, very unhappy.
If anything, the lesson is: for a project that's so deep in the dependency tree as numpy is, A{P,B}I stability is a paramount concern, with a cost that gets higher the more successful the project is. This means APIs should evolve only in backwards-compatible ways when at all possible, with backwards-compatibility being broken only in:
- clearly designated points that are agreed upon by as many as possible
- with clear explanations of how old codes need to be adapted to the new interface to continue working
- if at all possible with advance warnings, and even better, a system for 'future' loading.
This is a good reminder. I agree with your views here. I've not been able to communicate very well my attitudes on this, and I've been saddened at how eager some seem to pick apart my words to find problems with them. My discussion about the ABI and API breakage should not be taken as an assertion that I don't recognize that ABI breakage is bad and has consequences. I'm a little surprised that people assume I haven't been listening or paying attention or something. But, I recognize that I don't always communicate clearly enough. I do understand the consequences of ABI breakage. I also understand the pain involved. I have no plans to break the ABI.

There is a certain group who is affected by ABI breakage and another group *more* affected by API breakage. It feels like this list is particularly populated with people who feel pain by ABI breakage whereas the people who feel pain with API breakage are not as vocal, don't track this list, etc. But, their stories are just as compelling to me. I understand the pain they feel as well when the NumPy API breaks. It's just as important that we take them into consideration. That's my only point.

Right now, though, arguing over the relative importance of ABI or API breakage is moot. I was simply pointing out my perspective that I think a single ABI breakage in 1.5.0 would have been better than the API and use-case breakages that have been reported (I know these are only very weakly correlated so it's just an analogy). If you disagree with me, that's fine. Just understand that any frustration you feel about the thought of ABI breakage is the same as the frustration I feel about changes that cause working code to break for people.

I also understand that it's not quite the same thing because the phrase "changes that cause working code to break" is too strong. Some code that "works" has "work-arounds and hacks" and assumptions about APIs. In other words, it is possible that some NumPy-dependent code out there works "accidentally".
Of course, what is a "hack" or an "accidental" usage is not at all clear. I can't define it. It takes judgment to make a decision. This judgment requires an awareness of the "intention of the original" code, how big the user-base is of the group that is making the "hack". How difficult it is to remedy the situation, etc. These are hard problems. I don't claim to understand how to solve all of them. I don't claim that I won't make serious mistakes. All I can do is offer my experience, my awareness of the code history (including the code history of Numeric and Numarray), and my interactions with many downstream users. We need good judgment from as many NumPy developers as possible. That judgment must be colored with empathy for as many users of NumPy as possible. Best, -Travis
Python in fact has the __future__ imports that help quite a bit for people to start adapting their codes. How about creating a numpy.future module where new, non-backward-compatible APIs could go? That would give the adventurous a way to play with new features (hence getting them better tested) as well as an easier path for gradual migration to the new features by everyone.
This may have already been discussed before, forgive me if I'm repeating well-known material.
This is a
Cheers,
f

On Thu, Jun 28, 2012 at 5:50 AM, Travis Oliphant <travis@continuum.io> wrote:
Python in fact has the __future__ imports that help quite a bit for people to start adapting their codes. How about creating a numpy.future module where new, non-backward-compatible APIs could go? That would give the adventurous a way to play with new features (hence getting them better tested) as well as an easier path for gradual migration to the new features by everyone.
This may have already been discussed before, forgive me if I'm repeating well-known material.
This is a
Did you mean to finish a sentence here and hit 'send' earlier than planned? :) Cheers, f

On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík <ondrej.certik@gmail.com> wrote:
My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users." and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well.
I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC).

The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but they matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically difficult because nobody will upgrade to a new numpy with a different API just because it is cleaner, but without a cleaner API, it will be difficult to implement quite a few improvements. The situation is not that different from Python 3, which has seen poor adoption, and only now starts having interesting features of its own.

As for more concrete actions: I believe Wes McKinney has a comprehensive suite with multiple versions of numpy/pandas; I can't seem to find where that was mentioned, though. This would be a good starting point to check ABI matters (say pandas, mpl, scipy on top of multiple numpy).

David

On Mon, Jun 25, 2012 at 8:35 PM, David Cournapeau <cournape@gmail.com> wrote:
On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík <ondrej.certik@gmail.com> wrote:
My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users." and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well.
I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC).
The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but they matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically difficult because nobody will upgrade to a new numpy with a different API just because it is cleaner, but without a cleaner API, it will be difficult to implement quite a few improvements. The situation is not that different from Python 3, which has seen poor adoption, and only now starts having interesting features of its own.
As for more concrete actions: I believe Wes McKinney has a comprehensive suite with multiple versions of numpy/pandas, I can't seem to find where that was mentioned, though. This would be a good starting point to check ABI matters (say pandas, mpl, scipy on top of multiple numpy).
I will try to check as many packages as I can to see what actual problems arise. I have created an issue for it:

https://github.com/numpy/numpy/issues/319

Feel free to add more packages that you feel are important. I will try to check at least the ones that are in the issue, and more if I have time. I will close the issue once the upgrade path is clearly documented in the release notes for everything that breaks.

Ondrej
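A minimal sketch of the kind of harness such a downstream check might use. The package list here is a hypothetical stand-in (trivial stdlib imports so the sketch is self-contained); a real run would first install the candidate numpy into a clean environment and rebuild each downstream package against it before running its import/test command:

```python
import subprocess
import sys

# Hypothetical smoke tests; in practice these would be the real
# import/test commands for scipy, matplotlib, pandas, h5py, ...
SMOKE_TESTS = {
    "stdlib-math": [sys.executable, "-c", "import math"],
    "stdlib-json": [sys.executable, "-c", "import json"],
}

def run_matrix(tests):
    """Run each smoke test in a subprocess and report pass/fail.

    An ABI break typically shows up here as a segfault or ImportError
    in the subprocess, i.e. a nonzero return code.
    """
    results = {}
    for name, cmd in tests.items():
        proc = subprocess.run(cmd, capture_output=True)
        results[name] = (proc.returncode == 0)
    return results

results = run_matrix(SMOKE_TESTS)
print(results)
```

Running each check in a subprocess matters: a hard crash in a binary extension then fails only that entry instead of taking down the harness.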

On Tue, Jun 26, 2012 at 4:42 AM, Ondřej Čertík <ondrej.certik@gmail.com> wrote:
On Mon, Jun 25, 2012 at 8:35 PM, David Cournapeau <cournape@gmail.com> wrote:
On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík <ondrej.certik@gmail.com> wrote:
My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users." and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well.
I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC).
The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but they matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically difficult because nobody will upgrade to a new numpy with a different API just because it is cleaner, but without a cleaner API, it will be difficult to implement quite a few improvements. The situation is not that different from Python 3, which has seen poor adoption, and only now starts having interesting features of its own.
As for more concrete actions: I believe Wes McKinney has a comprehensive suite with multiple versions of numpy/pandas, I can't seem to find where that was mentioned, though. This would be a good starting point to check ABI matters (say pandas, mpl, scipy on top of multiple numpy).
I will try to check as many packages as I can to see what actual problems arise. I have created an issue for it:
https://github.com/numpy/numpy/issues/319
Feel free to add more packages that you feel are important. I will try to check at least the ones that are in the issue, and more if I have time. I will close the issue once the upgrade path is clearly documented in the release notes for everything that breaks.
I believe the basis can be 1.4.1, against which we build different packages and then test each new version. There are also tools to check ABI compatibility (e.g. http://ispras.linuxbase.org/index.php/ABI_compliance_checker), but I have never used them. Being able to tell when a version of numpy breaks ABI would already be a good improvement.

David

On Jun 25, 2012, at 10:35 PM, David Cournapeau wrote:
On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík <ondrej.certik@gmail.com> wrote:
My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users." and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well.
I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC).
In the present climate, I'm going to have to provide additional context to a comment like this. This is not an accurate enough characterization of events. I was trying to get date-time changes in, for sure. I generally like feature additions to NumPy. (Robert Kern was also involved with that effort and it was funded by an active user of NumPy.) I was concerned that the changes would break the ABI. In fact, I expected them to --- I was not against such changes, even though it was a change in previously discussed policy. We just needed to advertise them widely. Other voices prevailed, however, and someone else believed the changes would not break ABI compatibility. Unfortunately, I did not have much time to look into the matter as I was working full time on other things.

If I had had my way we would have released NumPy 1.5 at the time and widely advertised the ABI breakage (and moved at the same time to a design that would have made it easier to upgrade without breaking the ABI). I do not believe it would have been that big of a deal as long as we communicated correctly about the release.

I still don't think it's correct to be overly concerned with one hand about ABI breakage, in a world where packages can just be re-compiled against the new version in a matter of minutes, and with the other hand make changes to the code base that change existing code behavior. I think the fact that the latter has occurred is evidence that we have to sacrifice one of them. And ABI compatibility is the preferred one to sacrifice by a long stretch in my view.

-Travis

On Tue, Jun 26, 2012 at 5:17 AM, Travis Oliphant <travis@continuum.io> wrote:
On Jun 25, 2012, at 10:35 PM, David Cournapeau wrote:
On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík <ondrej.certik@gmail.com> wrote:
My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users." and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well.
I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC).
In the present climate, I'm going to have to provide additional context to a comment like this. This is not an accurate enough characterization of events. I was trying to get date-time changes in, for sure. I generally like feature additions to NumPy. (Robert Kern was also involved with that effort and it was funded by an active user of NumPy.) I was concerned that the changes would break the ABI.
I did not mean to go back over old history, sorry. My main point was to highlight ABI vs API issues. Numpy needs to decide whether it attempts to keep the ABI or not. We already had this discussion 2 years ago (for the issue mentioned by Ondrej), and the decision was not made. The arguments and their value did not really change. The issue is thus that a decision needs to be made over that disagreement in one way or the other.

David

In the present climate, I'm going to have to provide additional context to a comment like this. This is not an accurate enough characterization of events. I was trying to get date-time changes in, for sure. I generally like feature additions to NumPy. (Robert Kern was also involved with that effort and it was funded by an active user of NumPy.) I was concerned that the changes would break the ABI.
I did not mean to go back over old history, sorry. My main point was to highlight ABI vs API issues. Numpy needs to decide whether it attempts to keep the ABI or not. We already had this discussion 2 years ago (for the issue mentioned by Ondrej), and the decision was not made. The arguments and their value did not really change. The issue is thus that a decision needs to be made over that disagreement in one way or the other.
Thank you for clarifying and for being willing to look to the future. I agree a decision needs to be made. I think we will need to break the ABI. At this point, I don't know of any pressing features that would require it short of NumPy 2.0. -Travis
David

On Mon, Jun 25, 2012 at 9:48 PM, Travis Oliphant <travis@continuum.io> wrote:
I agree a decision needs to be made. I think we will need to break the ABI. At this point, I don't know of any pressing features that would require it short of NumPy 2.0.
Sorry, I don't quite know how to parse the above, do you mean: 1. We will need to break ABI in the upcoming 1.7 release or 2. We will need to be more willing to accept ABI breakages in .Y releases (in X.Y convention) Just curious... Cheers, f

On Jun 26, 2012, at 12:09 AM, Fernando Perez wrote:
On Mon, Jun 25, 2012 at 9:48 PM, Travis Oliphant <travis@continuum.io> wrote:
I agree a decision needs to be made. I think we will need to break the ABI. At this point, I don't know of any pressing features that would require it short of NumPy 2.0.
Sorry, I don't quite know how to parse the above, do you mean:
1. We will need to break ABI in the upcoming 1.7 release
or
2. We will need to be more willing to accept ABI breakages in .Y releases (in X.Y convention)
Eventually we will need to break the ABI. We might as well wait until 2.0 at this point. -Travis

On 06/26/2012 05:35 AM, David Cournapeau wrote:
On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík<ondrej.certik@gmail.com> wrote:
My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users." and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well.
I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC).
The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically
But I think it is *possible* to get to a situation where the ABI isn't broken without changing the API. I have posted such a proposal. If one uses the kind of C-level duck typing I describe in the link below, one would do

typedef PyObject PyArrayObject;
typedef struct { ... } NumPyArray; /* used to be PyArrayObject */

Thus, an ABI-hiding PyArray_SHAPE function could take either a PyArrayObject* or a PyObject*, since they would be the same.

http://thread.gmane.org/gmane.comp.python.numeric.general/49997

(The technical parts are a bit out of date; Robert Bradshaw and I are in the 4th iteration of that concept for use within Cython; we are now hovering around perfect-hashing lookup tables that have 1ns branch-miss-free lookups and use ~20us for construction/initialization.)

Dag

On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 06/26/2012 05:35 AM, David Cournapeau wrote:
On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík<ondrej.certik@gmail.com> wrote:
My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users." and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well.
I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC).
The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically
But I think it is *possible* to get to a situation where ABI isn't broken without changing API. I have posted such a proposal. If one uses the kind of C-level duck typing I describe in the link below, one would do
typedef PyObject PyArrayObject;
typedef struct { ... } NumPyArray; /* used to be PyArrayObject */
Maybe we're just in violent agreement, but whatever ends up being used would require changing the *current* C API, right? If one wants to allow for changes in our structures more freely, we have to hide them from the headers, which means breaking the code that depends on the structure binary layout. Any code that accesses those directly will need to be changed.

There is the particular issue of iterators, which seem quite difficult to make "ABI-safe" without losing significant performance.

cheers,

David

On 06/26/2012 11:58 AM, David Cournapeau wrote:
On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 06/26/2012 05:35 AM, David Cournapeau wrote:
On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík<ondrej.certik@gmail.com> wrote:
My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users." and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well.
I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC).
The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically
But I think it is *possible* to get to a situation where ABI isn't broken without changing API. I have posted such a proposal. If one uses the kind of C-level duck typing I describe in the link below, one would do
typedef PyObject PyArrayObject;
typedef struct { ... } NumPyArray; /* used to be PyArrayObject */
Maybe we're just in violent agreement, but whatever ends up being used would require to change the *current* C API, right ? If one wants to
Accessing arr->dims[i] directly would need to change. But that's been discouraged for a long time. By "API" I meant access through the macros.

One of the changes under discussion here is to change PyArray_SHAPE from a macro that accepts both PyObject* and PyArrayObject* to a function that only accepts PyArrayObject* (hence breakage). I'm saying that under my proposal, assuming I or somebody else can find the time to implement it, you can both make it a function and have it accept both PyObject* and PyArrayObject* (since they are the same), undoing the breakage while still allowing the ABI to be hidden. (It doesn't give you full flexibility in ABI; it does require that you somewhere have an "npy_intp dims[nd]" with the same lifetime as your object, etc., but I don't consider that a big disadvantage.)
allow for changes in our structures more freely, we have to hide them from the headers, which means breaking the code that depends on the structure binary layout. Any code that access those directly will need to be changed.
There is the particular issue of iterators, which seem quite difficult to make "ABI-safe" without losing significant performance.
I don't agree (for some meanings of "ABI-safe"). You can export the data (dataptr/shape/strides) through the ABI, then the iterator uses these in whatever way it wishes consumer-side. Sort of like PEP 3118 without the performance degradation. The only sane way IMO of doing iteration is building it into the consumer anyway. I didn't think about whether API breakage would be needed for iterators though, that may be the case, I just didn't look at it yet. Dag

On Mon, Jun 25, 2012 at 9:10 PM, Ondřej Čertík <ondrej.certik@gmail.com>wrote:
On Mon, Jun 25, 2012 at 6:39 PM, Travis Oliphant <travis@continuum.io> wrote:
On Jun 25, 2012, at 7:21 PM, Fernando Perez wrote:
For context, consider that for many years, the word "gratuitous" has been used in a non-derogatory way in the Python ecosystem to describe changes to semantics and syntax that don't have benefits significant enough to offset the pain it will cause to existing users. That's why I used the word.

I am not trying to be derogatory. I am trying to be clear that we need to respect existing users of NumPy more than we have done from 1.5 to 1.7 in the enthusiasm to make changes.

On Mon, Jun 25, 2012 at 7:38 PM, Fernando Perez <fperez.net@gmail.com> wrote:

For reference, here's the (long) thread where this came to be:

http://mail.scipy.org/pipermail/scipy-dev/2009-October/012958.html

It's worth noting that at the time, the discussion was for an addition to *scipy*, not to numpy. I don't know when things were moved over to numpy.

Working on the NumPy code base implies respecting the conventions that are already in place --- not just disregarding them and doing whatever we want. I'm not really sure why I have to argue the existing users point of view so much recently. I would hope that all of us would have the perspective that the people who have adopted NumPy deserve to be treated with respect. The changes that grate on me are the ones that seem to take lightly existing users of NumPy.
I certainly appreciate the need to not break user habits/code, as we struggle with the very same issue in IPython all the time. And obviously at this point numpy is 'core infrastructure' enough that breaking backwards compatibility in any way should be very strongly discouraged (things were probably a bit different back in 2009).
I know that this particular issue grates you quite a bit, but I urge you to be fair in your appreciation of how it came to be: through the work of well-intentioned and thoughtful (but not omniscient) people when you weren't participating actively in numpy development.
I'm trying very hard to be fair --- especially to changes like this. What grates me are changes that affect our user base in a negative way --- specifically by causing code that used to work to no longer work or create alterations to real conventions. This kind of change is just not acceptable if we can avoid it. I'm really trying to understand why others do not feel so strongly about this, but I'm not persuaded by what I've heard so far.
I just want to note that I'm not advocating for *any* backwards-compatibility breakage in numpy at this point... I was just providing context for a discussion that happened back in 2009, and in the scipy list. I certainly feel pretty strongly at this point about the importance of preserving working code *today*, given the role of numpy at the 'root node' of the scipy ecosystem tree and the size of said tree.
I think that everybody strongly agrees that backward incompatible changes should not be made.
Sometimes it can be more subtle, see for example this numpy bug report in Debian:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=589835
and read the dozens of emails that it generated, e.g. http://lists.debian.org/debian-python/2010/07/msg00048.html, and so on. I've been hit by this problem too, that's why I remember it -- suddenly many packages that depend on NumPy stopped working in a subtle way and I had to spend hours figuring out what went wrong and that the problem is not in h5py, but actually that NumPy has changed its ABI, or more precisely the problem is described here (some new members were added to a C data structure): http://lists.debian.org/debian-python/2010/07/msg00045.html I am sure that this ABI change had to be done and there were good reasons for it and this particular change probably even couldn't have been avoided. But nevertheless it has caused headaches to a lot of people downstream. I just looked into the release notes for NumPy 1.4.0 and didn't find this change nor how to fix it in there. I am just posting this as a particular, concrete, real-life example of consequences for the end users.
Let us note that that problem was due to Travis convincing David to include the Datetime work in the release against David's own best judgement. The result was a delay of several months until Ralf could get up to speed and get 1.4.1 out. Let us also note that poly1d is actually not the same as Matlab poly1d.
My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users." and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well.
But sometimes I guess mistakes are made anyway. What can be done to avoid similar issues like with the polynomial order in the future?
Chuck

On Jun 26, 2012, at 9:00 AM, Charles R Harris wrote:
Let us note that that problem was due to Travis convincing David to include the Datetime work in the release against David's own best judgement. The result was a delay of several months until Ralf could get up to speed and get 1.4.1 out. Let us also note that poly1d is actually not the same as Matlab poly1d.
This is not accurate, Charles. Please stop trying to dredge up old history you don't know the full story about and are trying to create an alternate reality about. It doesn't help anything and is quite poisonous to this mailing list. You have a narrative about the past that seems very different from mine --- and you apparently blame me personally for all that is wrong with NumPy. This is not a helpful perspective and it just alienates us further and is a very polarizing perspective. This is not good for the community nor for our ability to work productively together. I hope that it is not a permanent reality and you will find a way to see things in a different light. -Travis
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Tue, Jun 26, 2012 at 8:52 AM, Travis Oliphant <travis@continuum.io> wrote:
Let us note that that problem was due to Travis convincing David to include the Datetime work in the release against David's own best judgement. The result was a delay of several months until Ralf could get up to speed and get 1.4.1 out. Let us also note that poly1d is actually not the same as Matlab poly1d.
This is not accurate, Charles. Please stop trying to dredge up old history you don't know the full story about and are trying to create an alternate reality about. It doesn't help anything and is quite poisonous to this mailing list.
I didn't start the discussion of 1.4, nor did I raise the issue at the time as I didn't think it would be productive. We moved forward. But in any case, I asked David at the time why the datetime stuff got included. I'd welcome your version if you care to offer it. That would be more useful than accusing me of creating an alternative reality and would clear the air.
You have a narrative about the past that seems very different from mine --- and you apparently blame me personally for all that is wrong with NumPy.
You started this blame game. You could have simply said, "here is how we will move forward."
This is not a helpful perspective and it just alienates us further and is a very polarizing perspective. This is not good for the community nor for our ability to work productively together.
Calling this and that 'gratuitous' is already damaging to the community. Them's fightin' words. If you didn't want a fight you could have simply pointed out a path forward.
I hope that it is not a permanent reality and you will find a way to see things in a different light.
I see things as I see them.
Chuck

Let us note that that problem was due to Travis convincing David to include the Datetime work in the release against David's own best judgement. The result was a delay of several months until Ralf could get up to speed and get 1.4.1 out. Let us also note that poly1d is actually not the same as Matlab poly1d.
This is not accurate, Charles. Please stop trying to dredge up old history you don't know the full story about and are trying to create an alternate reality about. It doesn't help anything and is quite poisonous to this mailing list.
I didn't start the discussion of 1.4, nor did I raise the issue at the time as I didn't think it would be productive. We moved forward. But in any case, I asked David at the time why the datetime stuff got included. I'd welcome your version if you care to offer it. That would be more useful than accusing me of creating an alternative reality and would clear the air.
The datetime stuff got included because it is a very useful and important feature for multiple users. It still needed work, but it was in a state where it could be tried. It did require breaking ABI compatibility in the state it was in. My approach was to break ABI compatibility and move forward (there were other things we could do at the time that are still needed in the code base that will break ABI compatibility in the future). David didn't want to break ABI compatibility and so tried to satisfy two competing desires in a way that did not ultimately work. These things happen. We all get to share responsibility for the outcome.
You have a narrative about the past that seems very different from mine --- and you apparently blame me personally for all that is wrong with NumPy.
You started this blame game. You could have simply said, "here is how we will move forward."
I'm sorry you feel that way. My intent was not to assign blame --- but of course mailing lists can be notoriously hard to actually communicate intent. My intent was to provide context for why I think we should move forward in a particular way.
This is not a helpful perspective and it just alienates us further and is a very polarizing perspective. This is not good for the community nor for our ability to work productively together.
Calling this and that 'gratuitous' is already damaging to the community. Them's fightin' words. If you didn't want a fight you could have simply pointed out a path forward.
They were not intended as "fighting words". I used the term in a very specific way as used by the Python developers themselves in describing their hope in moving from Python 2 to Python 3. Clearly your semantic environment interpreted them differently. As I have emphasized, I did not mean to disrespect you or anyone else by using that term.
From where I sit, however, it seems you are anxious for a fight and so interpret everything I say in the worst possible light. If that is really the case, then this is a very bad state of affairs. We can't really communicate at that point. It will be impossible to agree on anything, and the whole idea of finding consensus just won't work. That's what I'm concerned about, fundamentally. You don't seem to be willing to give me the benefit of the doubt at all.
Just like anyone who has created something, I feel a sense of "ownership" of NumPy. It might be helpful to recognize that I also feel that way about SciPy. In the case of SciPy, however, I have handed that project off to Ralf, Pauli, Warren, Josef, and others who are able to spend the time on it that it deserves. That internal mental decision to formally "hand off" SciPy did not come, though, until the end of last year and the first of this year. Perhaps it should have come sooner, but SciPy took a lot of time from me during a lot of formative years and I've always had very high hopes for it. It's hard to let that go.
I am not ready to formally "hand off" my involvement with NumPy at all --- especially not now that I understand so much better what NumPy should and can be and how it's being used. Of course, I recognize that it's a team effort. I can't help but feel that you wish I would just "hand off" things to someone else and get out of Dodge. I understand that NumPy would not be what it is today without your contributions, those of David, Mark, Robert, Pauli and so many other people, but I'm not going anywhere at least for the foreseeable future.
I've respected that "team effort" perspective from the beginning and remain respectful of it. I recognize that you must feel some sense of "ownership" of NumPy as well. I suspect there are several others that feel the same way. Right now, though, we need to work as hard as we can to reconcile our different perspectives so that we can do our very best to serve and respect the time of the users who have adopted NumPy. -Travis

On Tue, Jun 26, 2012 at 5:24 PM, Travis Oliphant <travis@continuum.io> wrote:
The datetime stuff got included because it is a very useful and important feature for multiple users. It still needed work, but it was in a state where it could be tried. It did require breaking ABI compatibility in the state it was in. My approach was to break ABI compatibility and move forward (there were other things we could do at the time that are still needed in the code base that will break ABI compatibility in the future). David didn't want to break ABI compatibility and so tried to satisfy two competing desires in a way that did not ultimately work. These things happen. We all get to share responsibility for the outcome.
I think Chuck alludes to the fact that I was rather reserved about merging datetime before *anyone* knew about breaking the ABI. I don't feel responsible for this issue (except I maybe should have pushed more strongly about datetime being included), but I am also not interested in making a big deal out of it, certainly not two years after the fact. I am merely pointing this out so that you realize that you may both have a different view that could be seen as valid depending on what you are willing to highlight.
I suggest that Chuck and you take this off-list,
David

On Tue, Jun 26, 2012 at 12:48 PM, David Cournapeau <cournape@gmail.com> wrote:
I think Chuck alludes to the fact that I was rather reserved about merging datetime before *anyone* knew about breaking the ABI. I don't feel responsible for this issue (except I maybe should have pushed more strongly about datetime being included), but I am also not interested in making a big deal out of it, certainly not two years after the fact. I am merely pointing this out so that you realize that you may both have a different view that could be seen as valid depending on what you are willing to highlight.
I suggest that Chuck and you take this off-list,
David
Or, we could raise funds for NumFOCUS by selling tickets for a brawl between the two at SciPy2012... I kid, I kid! Ben Root

I suggest that Chuck and you take this off-list,
Agreed! -Travis
David

On Tue, Jun 26, 2012 at 10:48 AM, David Cournapeau <cournape@gmail.com> wrote:
I think Chuck alludes to the fact that I was rather reserved about merging datetime before *anyone* knew about breaking the ABI.
Exactly.
I don't feel responsible for this issue (except I maybe should have pushed more strongly about datetime being included),
I think you left out a 'not'. I don't mean to imply that you were in any way to blame. And you have been pretty adamant about not allowing late merges of large bits of code since then. It falls in the lessons learned category.
but I am also not interested in making a big deal out of it, certainly not two years after the fact. I am merely pointing this out so that you realize that you may both have a different view that could be seen as valid depending on what you are willing to highlight.
I suggest that Chuck and you take this off-list,
I don't think there is much more to say, although I would suggest Travis be more careful about criticising previous work, a la 'gratuitous', 'not listening', etc. We got 1.3, 1.4, 1.5, and 1.6 out without any help from him, and I think we did a pretty damn good job of working with the community and improving the code in the process.
Chuck

I don't think there is much more to say, although I would suggest Travis be more careful about criticising previous work, a la 'gratuitous', 'not listening', etc. We got 1.3, 1.4, 1.5, and 1.6 out without any help from him, and I think we did a pretty damn good job of working with the community and improving the code in the process.
Wow! Again, your attitude surprises me and I can't just let a public comment like that go unaddressed. Not *any* help from me. Is that really the way you view it? Amazing! No wonder people new to the project lose sight of where it came from if that's the kind of dialogue and spin you spread.
So, you are going to disregard anything I've done during that time: the personal time spent on bug fixes and code enhancements, the active discussions with people, the work on datetime, the contribution of resources, the growing of the community, the teaching, the talking, the actively trying to figure out just how to improve not only the state of the code but also how it gets written, the documentation improvements (from my early donation of my book). Just because you are not aware personally of something or I don't comment on this list, it doesn't mean I'm not active. I was not as active as I wanted to be sometimes (I do have other responsibilities), but this kind of statement is pretty hurtful as well as being completely inaccurate.
"The community" is not just people that post to this list and a few users of SciPy that you know about. "The community" is much larger than that, and I've been working with them too --- all along, even when I wasn't actively making releases. I would suggest that you be more careful about accusing who is and who isn't "helping" with things.
-Travis

On 06/26/2012 09:51 PM, Travis Oliphant wrote:
Wow! Again, your attitude surprises me and I can't just let a public comment like that go unaddressed. Not *any* help from me. Is that really
I hereby call you out, per your comment earlier :-)
Something the Sage project does very well is meeting often in person (granted, that's a lot easier to pull off for academics than for people who have real work to do). In my experience, getting to know somebody better in person does wonders for email clarity -- one needs to know how somebody else's mind works to be able to read their emails well, and that's better picked up in person. Cython's had one workshop, and it did improve discussion climate. (And I don't think a SciPy conference, brawl or not, is a good replacement for an honest NumPy developer workshop at a nice cottage *by invitation only*; there's too much stuff going on, and too little undivided attention to one another.)
Dag

On 6/26/12 3:06 PM, Dag Sverre Seljebotn wrote:
Something the Sage project does very well is meeting often in person
Another thing we have that has improved the mailing list climate is a "sage-flame" list [1] that serves as a venting release valve for anyone to post *anything* at all. There have been multiple occasions where we called on people to move their discussion to sage-flame, and overall it's worked very nicely. Having a public forum to argue things out seems to help, and my guess is that most of us may peek at it every now and then for kicks and giggles. Thanks, Jason [1] https://groups.google.com/forum/?fromgroups#!forum/sage-flame

On Tue, Jun 26, 2012 at 10:11 PM, Jason Grout <jason-sage@creativetrax.com> wrote:
On 6/26/12 3:06 PM, Dag Sverre Seljebotn wrote:
Something the Sage project does very well is meeting often in person
Another thing we have that has improved the mailing list climate is a "sage-flame" list [1]
+1 ! Speaking as someone trying to get started in contributing to numpy, I find this discussion extremely off-putting. It's childish, meaningless, and spiteful, and I think it's doing more harm than any possible good that could come out of continuing it.

On Jun 26, 2012, at 3:27 PM, Thouis (Ray) Jones wrote:
On Tue, Jun 26, 2012 at 10:11 PM, Jason Grout <jason-sage@creativetrax.com> wrote:
On 6/26/12 3:06 PM, Dag Sverre Seljebotn wrote:
Something the Sage project does very well is meeting often in person
Another thing we have that has improved the mailing list climate is a "sage-flame" list [1]
+1 !
Speaking as someone trying to get started in contributing to numpy, I find this discussion extremely off-putting. It's childish, meaningless, and spiteful, and I think it's doing more harm than any possible good that could come out of continuing it.
Thank you for the reminder. I was already called out for not stopping. Thanks, Dag. A flame-list might indeed be a good idea at this point if there is further need for "clearing the air" -Travis
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On 6/26/12 3:31 PM, Travis Oliphant wrote:
Thank you for the reminder. I was already called out for not stopping. Thanks, Dag. A flame-list might indeed be a good idea at this point if there is further need for "clearing the air"
Also, having it set up before it is needed is part of the solution. Setting it up in the heat of the moment can just further inflame feelings. You put a pressure valve in at the start, instead of waiting for a hole to blow in the side :). Sort of like all the governance discussions about setting up a decision procedure before having to face a huge decision.... Jason

On Tue, Jun 26, 2012 at 3:27 PM, Thouis (Ray) Jones <thouis@gmail.com> wrote:
+1 !
Speaking as someone trying to get started in contributing to numpy, I find this discussion extremely off-putting. It's childish, meaningless, and spiteful, and I think it's doing more harm than any possible good that could come out of continuing it.
Hey Thouis, Just chiming in to encourage you not to get discouraged. There is a large, mostly silent majority who feel just the same way you do, it's just that they are silent precisely because they want to write good code and contribute and not participate in long, unproductive email threads that border on flame wars. You've made helpful comments here already advising people to take this offlist. After that there is nothing much to do but roll up your sleeves, make some pull requests, and engage in a worthwhile discussion about work. There are lots of people here who will engage you on that.

On 26 June 2012 22:39, John Hunter wrote:
On Tue, Jun 26, 2012 at 3:27 PM, Thouis (Ray) Jones <thouis@gmail.com> wrote:
+1 !
Speaking as someone trying to get started in contributing to numpy, I find this discussion extremely off-putting. It's childish, meaningless, and spiteful, and I think it's doing more harm than any possible good that could come out of continuing it.
Hey Thouis,
Just chiming in to encourage you not to get discouraged. There is a large, mostly silent majority who feel just the same way you do, it's just that they are silent precisely because they want to write good code and contribute and not participate in long, unproductive email threads that border on flame wars. You've made helpful comments here already advising people to take this offlist. After that there is nothing much to do but roll up your sleeves, make some pull requests, and engage in a worthwhile discussion about work. There are lots of people here who will engage you on that.
+1 from a pretty much silent user. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/

On Tuesday, June 26, 2012, Thouis (Ray) Jones wrote:
On Tue, Jun 26, 2012 at 10:11 PM, Jason Grout <jason-sage@creativetrax.com <javascript:;>> wrote:
On 6/26/12 3:06 PM, Dag Sverre Seljebotn wrote:
Something the Sage project does very well is meeting often in person
Another thing we have that has improved the mailing list climate is a "sage-flame" list [1]
+1 !
Speaking as someone trying to get started in contributing to numpy, I find this discussion extremely off-putting. It's childish, meaningless, and spiteful, and I think it's doing more harm than any possible good that could come out of continuing it.
And if you still feel dissuaded from contributing here, you are always welcome over at the matplotlib lists. </poaching> Cheers! Ben Root

On Tue, Jun 26, 2012 at 1:51 PM, Travis Oliphant <travis@continuum.io> wrote:
Exactly.
I don't feel responsible for this issue (except I maybe should have pushed more strongly about datetime being included),
I think you left out a 'not'. I don't mean to imply that you were in any way to blame. And you have been pretty adamant about not allowing late merges of large bits of code since then. It falls in the lessons learned category.
but I am also not
interested in making a big deal out of it, certainly not two years after the fact. I am merely pointing this out so that you realize that you may both have a different view that could be seen as valid depending on what you are willing to highlight.
I suggest that Chuck and you take this off-list,
I don't think there is much more to say, although I would suggest Travis be more careful about criticising previous work, ala 'gratuitous', 'not listening', etc. We got 1.3, 1.4, 1.5, and 1.6 out without any help from him, and I think we did a pretty damn good job of working with the community and improving the code in the process.
Wow! Again, your attitude surprises me and I can't just let a public comment like that go unaddressed. Not *any* help from me. Is that really the way you view it? Amazing! No wonder people new to the project lose sight of where it came from if that's the kind of dialogue and spin you spread.
So, you are going to disregard anything I've done during that time. The personal time spent on bug fixes and code enhancements, the active discussions with people, the work on datetime, the contribution of resources, the growing of the community, the teaching, the talking, the actively trying to figure out just how to improve not only the state of the code but also how it gets written, the documentation improvements (from my early donation of my book). Just because you are not aware personally of something or I don't comment on this list, it doesn't mean I'm not active. I was not as active as I wanted to be sometimes (I do have other responsibilities), but this kind of statement is pretty hurtful as well as being completely inaccurate.
"The community" is not just people that post to this list and a few users of SciPy that you know about. "The community" is much larger than that, and I've been working with them too --- all along, even when I wasn't actively making releases. I would suggest that you be more careful about accusing who is and who isn't "helping" with things.
I haven't been spinning. OTOH

charris@f16 [numpy.git (master)]$ git log v1.2.1..v1.3.0 | grep -i oliphant | wc -l
23
charris@f16 [numpy.git (master)]$ git log v1.2.1..v1.3.0 | grep -i harris | wc -l
151
charris@f16 [numpy.git (master)]$ git log v1.2.1..v1.3.0 | grep -i cournapeau | wc -l
554

Chuck

On Tue, Jun 26, 2012 at 5:33 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Calling this and that 'gratuitous' is already damaging to the community. Them's fightin' words. If you didn't want a fight you could have simply pointed out a path forward.
I disagree. If a change is gratuitous, and someone calls it out for being so, it's not a reason to get offended. Even if someone calls a change stupid, one should take a large step back before taking offense, just because they were responsible for the change. Defend the change, give the reasons it's not gratuitous/stupid/ugly/whatever, but keep calm and carry on. This sort of back-and-forth sniping should be taken off list.

On Jun 26, 2012, at 11:29 AM, Thouis (Ray) Jones wrote:
On Tue, Jun 26, 2012 at 5:33 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Calling this and that 'gratuitous' is already damaging to the community. Them's fightin' words. If you didn't want a fight you could have simply pointed out a path forward.
I disagree. If a change is gratuitous, and someone calls it out for being so, it's not a reason to get offended. Even if someone calls a change stupid, one should take a large step back before taking offense, just because they were responsible for the change. Defend the change, give the reasons it's not gratuitous/stupid/ugly/whatever, but keep calm and carry on.
This sort of back-and-forth sniping should be taken off list.
I agree. I will try to refrain from this. Please call me out if I slip up and react to something posted. -Travis

On Mon, Jun 25, 2012 at 8:10 PM, Travis Oliphant <travis@continuum.io> wrote:
You are still missing the point that there was already a choice that was made in the previous class --- made in Numeric actually.
You made a change to that. It is the change that is 'gratuitous'. The pain and unnecessary overhead of having two competing standards is the problem --- not whether one is 'right' or not. That is a different discussion entirely.
I remember there was a discussion about the order of the coefficients on the mailing list, and all were in favor of the new order, IIRC. I cannot find the thread. I know I was. At least I'm switching pretty much to the new polynomial classes, and don't really care about the inherited choice before that any more. So, I'm pretty much in favor of updating, if new choices are more convenient and more familiar to new users. Josef
-- Travis Oliphant (on a mobile) 512-826-7480
On Jun 25, 2012, at 7:01 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Mon, Jun 25, 2012 at 4:21 PM, Perry Greenfield <perry@stsci.edu> wrote:
On Jun 25, 2012, at 3:25 PM, Charles R Harris wrote:
On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield <perry@stsci.edu> wrote:
It's hard to generalize that much here. There are some areas in which what you say is true, particularly if whole industries rely on libraries that have had much time invested in developing them, and for which it is particularly difficult to break away. But there are plenty of other areas where it isn't that hard.
I'd characterize the process a bit differently. I would agree that it is pretty hard to get someone who has been using matlab or IDL for many years to transition. That doesn't happen very often (if it does, it's because all the other people they work with are using a different tool and they are forced to). I think we are targeting the younger people; those that do not have a lot of experience tied up in matlab or IDL. For example, IDL is very well established in astronomy, and we've seen few make that switch if they already have been using IDL for a while. But we are seeing many more younger astronomers choose Python over IDL these days.
I didn't bring up the Astronomy experience, but I think that is a special case because it is a fairly small area and to some extent you had the advantage of a supported center, STSci, maintaining some software. There are also a lot of amateurs who can appreciate the low costs and simplicity of Python.
The software that engineers use tends to be set early, in college or in their first jobs. I suspect that these days professional astronomers spend a number of years in graduate school where they have time to experiment a bit. That is a nice luxury to have.
Sure. But it's not unusual for an invasive technology (that's us) to take root in certain niches before spreading more widely.
Another way of looking at such things is: is what we are seeking to replace that much worse? If the gains are marginal, then it is very hard to displace. But if there are significant advantages, eventually they will win through. I tend to think Python and the scientific stack does offer the potential for great advantages over IDL or matlab. But that doesn't make it easy.
I didn't say we couldn't make inroads. The original proposition was that we needed a polynomial class compatible with Matlab. I didn't think compatibility with Matlab mattered so much in that case because not many people switch, as you have agreed is the case, and those who start fresh, or are the adventurous sort, can adapt without a problem. In other words, IMHO, it wasn't a pressing issue and could be decided on the merits of the interface, which I thought of in terms of series approximation. In particular, it wasn't a 'gratuitous' choice as I had good reasons to do things the way I did.
Chuck

On Mon, Jun 25, 2012 at 8:25 PM, <josef.pktd@gmail.com> wrote:
On Mon, Jun 25, 2012 at 8:10 PM, Travis Oliphant <travis@continuum.io> wrote:
You are still missing the point that there was already a choice that was made in the previous class --- made in Numeric actually.
You made a change to that. It is the change that is 'gratuitous'. The pain and unnecessary overhead of having two competing standards is the problem --- not whether one is 'right' or not. That is a different discussion entirely.
I remember there was a discussion about the order of the coefficients on the mailing list and all in favor of the new order, IIRC. I cannot find the thread. I know I was.
At least I'm switching pretty much to the new polynomial classes, and don't really care about the inherited choice before that any more.
So, I'm pretty much in favor of updating, if new choices are more convenient and more familiar to new users.
just to add a bit more information, given the existence of both poly's nobody had to rewrite flipping order in scipy.signal.residuez

    b, a = map(asarray, (b, a))
    gain = a[0]
    brev, arev = b[::-1], a[::-1]
    krev, brev = polydiv(brev, arev)
    if krev == []:
        k = []
    else:
        k = krev[::-1]
    b = brev[::-1]

while my arma_process class can start at the same time with

    def __init__(self, ar, ma, nobs=None):
        self.ar = np.asarray(ar)
        self.ma = np.asarray(ma)
        self.arpoly = np.polynomial.Polynomial(self.ar)
        self.mapoly = np.polynomial.Polynomial(self.ma)

As a downstream user of numpy and observer of the mailing list for a few years, I think the gradual improvements have gone down pretty well. At least I haven't seen any major complaints on the mailing list. For me, the big problem was numpy 1.4.0, where several packages were not available because of binary compatibility; NaN's didn't concern me much; the current incomplete transition to the new MinGW and gcc is a bit of a problem. Purely as an observer, my impression was also that the internal numpy C source cleanup, started by David C., I guess, didn't cause any big problems that would have created lots of complaints on the numpy mailing list.

Josef
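[Editor's note: to make the coefficient-order difference under discussion concrete, here is a minimal illustration, not from the original message. The legacy np.poly1d class (inherited from Numeric/MATLAB) stores coefficients highest degree first, while the newer np.polynomial.Polynomial stores them lowest degree first.]

```python
import numpy as np

# Same polynomial, x**2 + 2*x + 3, expressed in both conventions.
p_old = np.poly1d([1, 2, 3])                 # highest-degree coefficient first
p_new = np.polynomial.Polynomial([3, 2, 1])  # lowest-degree coefficient first

# Both evaluate identically: 2**2 + 2*2 + 3 = 11
print(p_old(2.0), p_new(2.0))
```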
Josef

On Jun 25, 2012, at 7:53 PM, josef.pktd@gmail.com wrote:
On Mon, Jun 25, 2012 at 8:25 PM, <josef.pktd@gmail.com> wrote:
On Mon, Jun 25, 2012 at 8:10 PM, Travis Oliphant <travis@continuum.io> wrote:
You are still missing the point that there was already a choice that was made in the previous class --- made in Numeric actually.
You made a change to that. It is the change that is 'gratuitous'. The pain and unnecessary overhead of having two competing standards is the problem --- not whether one is 'right' or not. That is a different discussion entirely.
I remember there was a discussion about the order of the coefficients on the mailing list and all in favor of the new order, IIRC. I cannot find the thread. I know I was.
At least I'm switching pretty much to the new polynomial classes, and don't really care about the inherited choice before that any more.
So, I'm pretty much in favor of updating, if new choices are more convenient and more familiar to new users.
just to add a bit more information, given the existence of both poly's
nobody had to rewrite flipping order in scipy.signal.residuez

    b, a = map(asarray, (b, a))
    gain = a[0]
    brev, arev = b[::-1], a[::-1]
    krev, brev = polydiv(brev, arev)
    if krev == []:
        k = []
    else:
        k = krev[::-1]
    b = brev[::-1]
while my arma_process class can start at the same time with

    def __init__(self, ar, ma, nobs=None):
        self.ar = np.asarray(ar)
        self.ma = np.asarray(ma)
        self.arpoly = np.polynomial.Polynomial(self.ar)
        self.mapoly = np.polynomial.Polynomial(self.ma)
That's a nice argument for a different convention, really it is. It's not enough for changing a convention that already exists. Now, the polynomial object could store coefficients in this order, but allow construction with the coefficients in the standard convention order. That would have been a fine compromise from my perspective.
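[Editor's note: the compromise described here, keeping the new storage order but accepting the legacy construction order, could be sketched roughly as follows. This is a hypothetical illustration; `CompatPolynomial` and `from_legacy` are invented names, not an API that exists in NumPy.]

```python
import numpy as np

class CompatPolynomial(np.polynomial.Polynomial):
    """Hypothetical sketch: coefficients are stored low-to-high (the new
    convention), but the class can also be built from the legacy
    high-to-low (poly1d/MATLAB) order."""

    @classmethod
    def from_legacy(cls, coef_high_to_low):
        # Reverse the legacy ordering into the new internal ordering.
        return cls(np.asarray(coef_high_to_low)[::-1])

# x**2 + 2*x + 3 given in legacy order, evaluated at x = 2
p = CompatPolynomial.from_legacy([1, 2, 3])
print(p(2.0))
```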
As a downstream user of numpy and observer of the mailing list for a few years, I think the gradual improvements have gone down pretty well. At least I haven't seen any major complaints on the mailing list.
You are an *active* user of NumPy. Your perspective is valuable, but it is one of many perspectives in the user community. What is missing in this discussion is the 100's of thousands of users of NumPy who never comment on this mailing list and won't. There are many that have not moved from 1.5.1 yet. I hope your optimism is correct about how difficult it will be to upgrade for them. As long as I hold any influence at all on the NumPy project, I will argue and fight on behalf of those users to the best that I can understand their perspective.
For me, the big problem was numpy 1.4.0, where several packages were not available because of binary compatibility; NaN's didn't concern me much; the current incomplete transition to the new MinGW and gcc is a bit of a problem.
It is *much*, *much* easier to create binaries of downstream packages than to re-write APIs. I still think we would be better off to remove the promise of ABI compatibility in every .X release (perhaps we hold ABI compatibility for 2 releases). However, we should preserve API compatibility for every release.
Purely as an observer, my impression was also that the internal numpy c source cleanup, started by David C., I guess, didn't cause any big problems that would have created lots of complaints on the numpy mailing list.
David C spent a lot of time ensuring his changes did not alter the compiling experience or the run-time experience of users of NumPy. This was greatly appreciated. Lack of complaints on the mailing list is not the metric we should be using. Most users will never comment on this list --- especially given how hard we've made it for people to feel like they will be listened to. We have to think about the implications of our changes on existing users. -Travis
Josef

On Mon, Jun 25, 2012 at 9:50 PM, Travis Oliphant <travis@continuum.io> wrote:
On Jun 25, 2012, at 7:53 PM, josef.pktd@gmail.com wrote:
On Mon, Jun 25, 2012 at 8:25 PM, <josef.pktd@gmail.com> wrote:
On Mon, Jun 25, 2012 at 8:10 PM, Travis Oliphant <travis@continuum.io> wrote:
You are still missing the point that there was already a choice that was made in the previous class --- made in Numeric actually.
You made a change to that. It is the change that is 'gratuitous'. The pain and unnecessary overhead of having two competing standards is the problem --- not whether one is 'right' or not. That is a different discussion entirely.
I remember there was a discussion about the order of the coefficients on the mailing list and all in favor of the new order, IIRC. I cannot find the thread. I know I was.
At least I'm switching pretty much to the new polynomial classes, and don't really care about the inherited choice before that any more.
So, I'm pretty much in favor of updating, if new choices are more convenient and more familiar to new users.
just to add a bit more information, given the existence of both poly's
nobody had to rewrite flipping order in scipy.signal.residuez

    b, a = map(asarray, (b, a))
    gain = a[0]
    brev, arev = b[::-1], a[::-1]
    krev, brev = polydiv(brev, arev)
    if krev == []:
        k = []
    else:
        k = krev[::-1]
    b = brev[::-1]
while my arma_process class can start at the same time with

    def __init__(self, ar, ma, nobs=None):
        self.ar = np.asarray(ar)
        self.ma = np.asarray(ma)
        self.arpoly = np.polynomial.Polynomial(self.ar)
        self.mapoly = np.polynomial.Polynomial(self.ma)
That's a nice argument for a different convention, really it is. It's not enough for changing a convention that already exists. Now, the polynomial object could store coefficients in this order, but allow construction with the coefficients in the standard convention order. That would have been a fine compromise from my perspective.
I'm much happier with the current solution. As long as I stick with the np.polynomial classes, I don't have to *think* about coefficient order. With a hybrid I would always have to worry about whether this animal is facing front or back. I wouldn't mind if the old order is eventually deprecated and dropped. (Another example: the NIST polynomials follow the new order, 2nd section http://jpktd.blogspot.ca/2012/03/numerical-accuracy-in-linear-least.html no [::-1] in the second version.)
As a downstream user of numpy and observer of the mailing list for a few years, I think the gradual improvements have gone down pretty well. At least I haven't seen any major complaints on the mailing list.
You are an *active* user of NumPy. Your perspective is valuable, but it is one of many perspectives in the user community. What is missing in this discussion is the 100's of thousands of users of NumPy who never comment on this mailing list and won't. There are many that have not moved from 1.5.1 yet. I hope your optimism is correct about how difficult it will be to upgrade for them. As long as I hold any influence at all on the NumPy project, I will argue and fight on behalf of those users to the best that I can understand their perspective.
oops, my working version
np.__version__ '1.5.1'
I'm testing and maintaining statsmodels compatibility from numpy 1.4.1 and scipy 0.7.2 to the current released versions (with a compat directory). statsmodels dropped numpy 1.3 support, because I didn't want to give up using numpy.polynomial. Most of the 100,000s of numpy users that never show up on the mailing list won't worry much about most changes, because package managers and binary builders and developers of application packages take care of most of it. When I use matplotlib, I don't care whether it uses masked arrays, or other array types internally (and rely on Benjamin and others to represent matplotlib usage/users). Wes is recommending users to use the pandas API to insulate them from changes in numpy's datetimes.
For me, the big problem was numpy 1.4.0, where several packages were not available because of binary compatibility; NaN's didn't concern me much; the current incomplete transition to the new MinGW and gcc is a bit of a problem.
It is *much*, *much* easier to create binaries of downstream packages than to re-write APIs. I still think we would be better off to remove the promise of ABI compatibility in every .X release (perhaps we hold ABI compatibility for 2 releases). However, we should preserve API compatibility for every release.
freeze the API wherever it got by "historical accident"?
Purely as an observer, my impression was also that the internal numpy c source cleanup, started by David C., I guess, didn't cause any big problems that would have created lots of complaints on the numpy mailing list.
David C spent a lot of time ensuring his changes did not alter the compiling experience or the run-time experience of users of NumPy. This was greatly appreciated. Lack of complaints on the mailing list is not the metric we should be using. Most users will never comment on this list --- especially given how hard we've made it for people to feel like they will be listened to.
I think for some things, questions and complaints on the mailing list or stackoverflow are a very good metric. My appreciation of David's work is reflected in the fact that installation issues on Windows have disappeared from the mailing list. I just easy_installed numpy into a virtualenv without any problems at all (it just worked), which was the last issue on Windows that I know of (last seen on stackoverflow). easy_installing scipy into a virtualenv almost worked (needed some help).
We have to think about the implications of our changes on existing users.
Yes, Josef
-Travis
Josef
Josef
-- Travis Oliphant (on a mobile) 512-826-7480
On Jun 25, 2012, at 7:01 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Mon, Jun 25, 2012 at 4:21 PM, Perry Greenfield <perry@stsci.edu> wrote:
On Jun 25, 2012, at 3:25 PM, Charles R Harris wrote:
On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield <perry@stsci.edu> wrote:
It's hard to generalize that much here. There are some areas in which what you say is true, particularly if whole industries rely on libraries that have much time invested in developing them, and for which it is particularly difficult to break away. But there are plenty of other areas where it isn't that hard.
I'd characterize the process a bit differently. I would agree that it is pretty hard to get someone who has been using matlab or IDL for many years to transition. That doesn't happen very often (if it does, it's because all the other people they work with are using a different tool and they are forced to). I think we are targeting the younger people; those that do not have a lot of experience tied up in matlab or IDL. For example, IDL is very well established in astronomy, and we've seen few make that switch if they already have been using IDL for a while. But we are seeing many more younger astronomers choose Python over IDL these days.
I didn't bring up the Astronomy experience, but I think that is a special case because it is a fairly small area and to some extent you had the advantage of a supported center, STSci, maintaining some software. There are also a lot of amateurs who can appreciate the low costs and simplicity of Python.
The software that engineers use tends to be set early, in college or in their first jobs. I suspect that these days professional astronomers spend a number of years in graduate school where they have time to experiment a bit. That is a nice luxury to have.
Sure. But it's not unusual for an invasive technology (that's us) to take root in certain niches before spreading more widely.
Another way of looking at such things is: is what we are seeking to replace that much worse? If the gains are marginal, then it is very hard to displace. But if there are significant advantages, eventually they will win through. I tend to think Python and the scientific stack does offer the potential for great advantages over IDL or matlab. But that doesn't make it easy.
I didn't say we couldn't make inroads. The original proposition was that we needed a polynomial class compatible with Matlab. I didn't think compatibility with Matlab mattered so much in that case because not many people switch, as you have agreed is the case, and those who start fresh, or are the adventurous sort, can adapt without a problem. In other words, IMHO, it wasn't a pressing issue and could be decided on the merits of the interface, which I thought of in terms of series approximation. In particular, it wasn't a 'gratuitous' choice as I had good reasons to do things the way I did.
Chuck
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

That's a nice argument for a different convention, really it is. It's not enough for changing a convention that already exists. Now, the polynomial object could store coefficients in this order, but allow construction with the coefficients in the standard convention order. That would have been a fine compromise from my perspective.
I'm much happier with the current solution. As long as I stick with the np.polynomial classes, I don't have to *think* about coefficient order. With a hybrid I would always have to worry about whether this animal is facing front or back.
I don't think you would have to worry about it at all. It would just be an interface you personally wouldn't ever call. In other words, you just provide the option for someone else to specify their *input* arrays in reverse order. You could keep them stored in this "natural order" just as they are now.
I wouldn't mind if the old order is eventually deprecated and dropped.
(Another example: NIST polynomials follow the new order; see the second section of http://jpktd.blogspot.ca/2012/03/numerical-accuracy-in-linear-least.html — no [::-1] in the second version.)
Thanks for providing the additional references. I do recognize that the convention is in use elsewhere.
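The coefficient-order question in this subthread can be made concrete with a plain-Python sketch. The helper names here are hypothetical; the two orderings correspond to `np.poly1d` (highest degree first) and the `np.polynomial` classes (lowest degree first):

```python
# Two conventions for storing polynomial coefficients, evaluated
# with Horner's method so the ordering difference is explicit.

def eval_desc(coeffs, x):
    """Highest degree first -- the np.poly1d-style convention."""
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc

def eval_asc(coeffs, x):
    """Lowest degree first -- the np.polynomial-style convention."""
    return eval_desc(coeffs[::-1], x)

# The same list means two different polynomials:
# [1, 2, 3] is x**2 + 2*x + 3 in the old convention...
print(eval_desc([1, 2, 3], 2))  # 11
# ...but 1 + 2*x + 3*x**2 in the new one.
print(eval_asc([1, 2, 3], 2))   # 17
```

This is why mixing the two conventions in one class was the concern above: the data alone doesn't tell you which polynomial it represents.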
It is *much*, *much* easier to create binaries of downstream packages than to re-write APIs. I still think we would be better off to remove the promise of ABI compatibility in every .X release (perhaps we hold ABI compatibility for 2 releases). However, we should preserve API compatibility for every release.
freeze the API wherever it got by "historical accident"?
Not quite. You can add new and different APIs; you just can't change old ones. You also have to be careful about changes that break the implied but not specified code contract of current users. Even the strategy of deprecating APIs needs to be used judiciously and only occasionally. We can deprecate APIs, but we can't remove them for several releases --- say 4 or 5.

You are correct: I'm concerned about users that have built additional packages on top of NumPy. Some of these we know about; many of them we don't --- they are in internal systems. Many users are shielded from NumPy changes by other APIs, and this is an avenue of exploration that can and will continue. We aren't there yet, though, and I don't think the "plans for NumPy change" have previously considered enough the impact on users of NumPy.

Thank you for voicing your comments and perspective. -Travis
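The deprecate-but-keep policy described above can be sketched in a few lines; `old_api` and `new_api` are hypothetical names for illustration, not actual NumPy functions:

```python
import warnings

def new_api(x):
    """The replacement API that callers should migrate to."""
    return x * 2

def old_api(x):
    """Hypothetical legacy function.

    It keeps working for several releases so downstream packages
    have time to migrate, but emits a DeprecationWarning.
    """
    warnings.warn("old_api is deprecated; use new_api instead",
                  DeprecationWarning, stacklevel=2)
    return new_api(x)
```

Callers see the warning (if their warning filters show it) but their code keeps running, which is the "contract" being discussed: deprecation signals intent without breaking existing users.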

On Thu, Jun 21, 2012 at 5:25 PM, Travis Oliphant <travis@continuum.io>wrote:
I thought it was clear we were doing a 1.7 release before SciPy. It seems pretty urgent that we get something out sooner than later. I know there is never enough time to do all the things we want to do.
There is time before the first release candidate to make changes on the 1.7.x branch. If you want to make the changes on master, and just indicate the pull requests, Ondrej can make sure they are added to the 1.7.x branch by Monday. We can also delay the first release candidate by a few days to next Wednesday and then bump everything 3 days if that will help. There will be a follow-on 1.8 release before the end of the year --- so there is time to make changes for that release as well. The next release will not take a year to get out, so we shouldn't feel pressured to get *everything* in this release.
What about http://projects.scipy.org/numpy/ticket/2108? Someone needs to at least answer the question of how much of datetime is unusable on Windows with the current code. If that's not a lot then perhaps this is not a blocker, but we did consider it one until now..... Of the other tickets (http://projects.scipy.org/numpy/report/3) it would also be good to get an assessment of which ones are critical. Perhaps none of them are and the branch is in good shape for a release, but some of those segfaults would be nice to have fixed. Debian multi-arch support too, as discussed on this list recently. Ralf
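For anyone triaging the ticket, a minimal smoke test of the kind of datetime64 behavior in question might look like the following. This is a hedged sketch using generic datetime64 operations, not the ticket's actual failing case:

```python
import numpy as np

# Basic datetime64 construction, comparison, and arithmetic --
# the sort of operations that would fail outright if a compiler
# toolchain miscompiled the datetime support.
d = np.datetime64('2012-06-21')
assert d < np.datetime64('2012-07-13')

# Day-resolution arithmetic with timedelta64.
tomorrow = d + np.timedelta64(1, 'D')
print(tomorrow)  # 2012-06-22
```

If even these basics fail on a MinGW-built NumPy, the ticket is a blocker; if only edge cases fail, the release decision is a judgment call.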

On Thu, Jun 21, 2012 at 2:49 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Thu, Jun 21, 2012 at 5:25 PM, Travis Oliphant <travis@continuum.io> wrote:
I thought it was clear we were doing a 1.7 release before SciPy. It seems pretty urgent that we get something out sooner than later. I know there is never enough time to do all the things we want to do.
There is time before the first release candidate to make changes on the 1.7.x branch. If you want to make the changes on master, and just indicate the pull requests, Ondrej can make sure they are added to the 1.7.x branch by Monday. We can also delay the first release candidate by a few days to next Wednesday and then bump everything 3 days if that will help. There will be a follow-on 1.8 release before the end of the year --- so there is time to make changes for that release as well. The next release will not take a year to get out, so we shouldn't feel pressured to get *everything* in this release.
What about http://projects.scipy.org/numpy/ticket/2108? Someone needs to at least answer the question of how much of datetime is unusable on Windows with the current code. If that's not a lot then perhaps this is not a blocker, but we did consider it one until now.....
pandas has become a heavy consumer of datetime64 recently, and we haven't had any issues using VS2003 and VS2008, but haven't tested heavily against NumPy compiled with mingw outside of the version shipped in Enthought Python Distribution (the test suite passes fine, last time I checked).
Of the other tickets (http://projects.scipy.org/numpy/report/3) it would also be good to get an assessment of which ones are critical. Perhaps none of them are and the branch is in good shape for a release, but some of those segfaults would be nice to have fixed. Debian multi-arch support too, as discussed on this list recently.
Ralf

On Thu, Jun 21, 2012 at 9:31 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
On Thu, Jun 21, 2012 at 2:49 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Thu, Jun 21, 2012 at 5:25 PM, Travis Oliphant <travis@continuum.io> wrote:
I thought it was clear we were doing a 1.7 release before SciPy. It seems pretty urgent that we get something out sooner than later. I know there is never enough time to do all the things we want to do.
There is time before the first release candidate to make changes on the 1.7.x branch. If you want to make the changes on master, and just indicate the pull requests, Ondrej can make sure they are added to the 1.7.x branch by Monday. We can also delay the first release candidate by a few days to next Wednesday and then bump everything 3 days if that will help. There will be a follow-on 1.8 release before the end of the year --- so there is time to make changes for that release as well. The next release will not take a year to get out, so we shouldn't feel pressured to get *everything* in this release.
What about http://projects.scipy.org/numpy/ticket/2108? Someone needs to at least answer the question of how much of datetime is unusable on Windows with the current code. If that's not a lot then perhaps this is not a blocker, but we did consider it one until now.....
pandas has become a heavy consumer of datetime64 recently, and we haven't had any issues using VS2003 and VS2008, but haven't tested heavily against NumPy compiled with mingw outside of the version shipped in Enthought Python Distribution (the test suite passes fine, last time I checked).
Thanks Wes. It's indeed a MinGW-specific issue. EPD ships MinGW 4.5.2, which should work, but producing binary installers with it has issues that aren't yet resolved AFAIK. David C. last reported on that a few months ago that he didn't see an easy solution. All releases until now have been done with MinGW 3.4.5, which has a datetime problem. So we still need a confirmation about whether the current issues with 3.4.5 are acceptable, or we need a fix or another way of creating binaries. Ralf
Of the other tickets (http://projects.scipy.org/numpy/report/3) it would also be good to get an assessment of which ones are critical. Perhaps none of them are and the branch is in good shape for a release, but some of those segfaults would be nice to have fixed. Debian multi-arch support too, as discussed on this list recently.
Ralf

On Thu, Jun 21, 2012 at 3:11 AM, Travis Oliphant <travis@continuum.io> wrote:
Hey all,
I made a branch called with_maskna and then merged Nathaniel's PR which removes the mask_na support from master. I then applied a patch to fix the boolean indexing problem reported by Ralf.
I then created a NumPy 1.7.x maintenance branch from which the release of NumPy 1.7 will be made. Ondrej Certik and I will be managing the release of NumPy 1.7. Ondrej is the author of SymPy and has agreed to help get NumPy 1.7 out the door. Thanks, Ondrej for being willing to help in this way.
In principle only bug-fixes should be pushed to the NumPy 1.7 branch at this point. The target is to make a release of NumPy 1.7.x by July 9th. The schedule we will work for is:
RC1 -- June 25
RC2 -- July 5
Release -- July 13
I worked on the release notes: https://github.com/numpy/numpy/pull/318 Please let me know if you think that I forgot some important feature or if you have any suggestions for improvement. If it looks pretty good, then I will start testing NumPy against packages (https://github.com/numpy/numpy/issues/319). Ondrej
participants (16)
- Andrea Gavana
- Benjamin Root
- Charles R Harris
- Dag Sverre Seljebotn
- David Cournapeau
- Fernando Perez
- Jason Grout
- John Hunter
- josef.pktd@gmail.com
- Ondřej Čertík
- Perry Greenfield
- Pierre Haessig
- Ralf Gommers
- Thouis (Ray) Jones
- Travis Oliphant
- Wes McKinney