Hi All,

There are several problems with numpy master that need to be fixed before a release can be considered:

1. Datetime on Windows with mingw.
2. Bus error on SPARC, ticket #2076.
3. NA and real/complex views of complex arrays.

Number 1 has proved particularly difficult; any help or suggestions for that would be much appreciated. The current work has been going on in pull request 214 <https://github.com/numpy/numpy/pull/214>. This isn't to say that there aren't a ton of other things that need fixing, or that we can skip out on the current stack of pull requests, but I think it is impossible to consider a release while those three problems are outstanding.

Chuck
On Sat, Mar 24, 2012 at 10:13 PM, Charles R Harris < charlesr.harris@gmail.com> wrote:
There's one more ticket that hasn't been looked at AFAIK and that has been keeping the buildbots (except the Linux one) on red: http://projects.scipy.org/numpy/ticket/1755 (floating point errors). The other tickets have been looked at and either have PRs already or should be fixable with not too much effort. Ralf
On Sun, Mar 25, 2012 at 10:33 AM, Ralf Gommers <ralf.gommers@googlemail.com>wrote:
I don't know what to do about that one. There may be some compiler flags that would help.

Chuck
On Sat, Mar 24, 2012 at 10:13 PM, Charles R Harris < charlesr.harris@gmail.com> wrote:
Why do you consider (2) a blocker? Not saying it's not important, but there are eight other open tickets with segfaults. Some are more esoteric than others, but I don't see why for example #1713 and #1808 are less important than this one. #1522 provides a patch that fixes a segfault, by the way; it could use a review. Ralf
On Sun, Mar 25, 2012 at 3:14 PM, Ralf Gommers <ralf.gommers@googlemail.com>wrote:
I wasn't aware of the other segfaults; I'd like to get them all fixed... The list was meant to elicit additions. I don't know where the missed floating point errors come from, but they are somewhat dependent on the compiler doing the right thing and on hardware support. I'd welcome any insight into why we get them on SPARC (underflow) and Windows (overflow). The Windows buildbot doesn't seem to be updating correctly, since it is still missing the combinations method that is now part of the test module. Chuck
On Mon, Mar 26, 2012 at 1:27 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Eight of them: search for "segmentation fault" or "bus error" in Trac. I would hope these have a high priority to get fixed but, unless they're backwards compatibility issues, I don't consider them blockers, for the simple reason that then we'd never be able to ship any release.
The windows buildbot doesn't seem to be updating correctly since it is still missing the combinations method that is now part of the test module.
Yeah, none of them are updating, it's a pain. We'll hopefully soon be able to switch to a shiny new one. Ralf
There is an issue with the NumPy 1.7 release that we all need to understand. Doesn't including the missing-data attributes in the NumPy structure in a released version of NumPy basically commit to including those attributes in NumPy 1.8? I'm not comfortable with that; is everyone else? One possibility is to move those attributes to a C-level sub-class of NumPy.

I have heard from a few people that they are not excited by the growth of the NumPy data-structure by the 3 pointers needed to hold the masked-array storage. This is especially true when there is talk of potentially adding additional attributes to the NumPy array (for labels and other meta-information). If you are willing to let us know how you feel about this, please speak up.

Mark Wiebe will be in Austin for about 3 months. He and I will be hashing some of this out in the first week or two. We will present any proposal and ask questions on this list before acting. We will be using some phone calls and face-to-face communications to increase the bandwidth and speed of the conversations (not to exclude anyone). If you would like to be part of the in-person discussions let me know -- or just make your views known here --- they will be taken seriously.

The goal is consensus for any major change in NumPy. If we can't get consensus, then we vote on this list and use a super-majority. If we can't get a super-majority, then except in rare circumstances we can't move forward. Heavy users of NumPy get higher voting privileges.

My perspective is that we don't have consensus for including the current additions to the NumPy data-structure in a long-term release.

Best,

-Travis

On Mar 25, 2012, at 6:27 PM, Charles R Harris wrote:
On Tue, Apr 17, 2012 at 12:06 AM, Travis Oliphant <travis@continuum.io>wrote:
We clearly labeled NA as experimental, so some changes are to be expected. But not complete removal; so yes, if we release them, they should stay in some form.
I'm not comfortable with that, is everyone else? One possibility is to move those attributes to a C-level sub-class of NumPy.
That's the first time I've heard this. Until now, we have talked a lot about adding bitmasks and API changes, not about complete removal. My assumption was that the experimental label was enough. From Nathaniel's reaction I gathered the same. It looks like too many conversations on this topic are happening off-list. Ralf
On Mon, Apr 16, 2012 at 3:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
My impression was that Travis was just suggesting that as an option here for discussion, not presenting it as something discussed elsewhere. I read Travis' email precisely as restarting the discussion for consideration of the issues in full public view (+ calls/skype open to anyone interested for bandwidth purposes), so in this case I don't think there's any background off-list to worry about. At least that's how I read it... Cheers, f
No off-list discussions material to this point have been happening. I am basically stating my view for the first time. I have delayed because I realize it is not a pleasant view, and I was hoping I could end up resolving it favorably. But it needs to be discussed before 1.7 is released. -- Travis Oliphant (on a mobile) 512-826-7480 On Apr 16, 2012, at 5:27 PM, Fernando Perez <fperez.net@gmail.com> wrote:
On Tue, Apr 17, 2012 at 12:27 AM, Fernando Perez <fperez.net@gmail.com>wrote:
From "I have heard from a few people that they are not excited ...." I deduce it was discussed to some extent.
I read Travis' email precisely as restarting the discussion for consideration of the issues in full public view
It wasn't restarting anything; it's completely the opposite of the part that I thought we did reach consensus on (*not* backing out changes). I stated as much when first discussing a 1.7.0 in December, http://thread.gmane.org/gmane.comp.python.numeric.general/47022/focus=47027, with no one disagreeing. It's perfectly fine to reconsider any previous decisions/discussions, of course. However, I now draw the conclusion that it's best to wait for this issue to be resolved before considering a new release. I had been working on closing tickets and cleaning up loose ends for 1.7.0, and pinging others to do the same. I guess I'll stop doing that for now, until the renewed NA debate has been settled. If there are bug fixes that are important (like the Debian segfaults with Python debug builds), we can do a 1.6.2 release. Ralf
The comments I have heard have been from people who haven't wanted to make them on this list. I wish they would, but I understand that not everyone wants to be drawn into a long discussion. They have not been discussions. My bias is to just move forward with what is there. After a week or two of discussion, I expect that we will resolve this one way or another. The result may be to just move forward as previously planned. However, that might not be the best way forward either. These are significant changes and they do impact users. We need to understand those implications and take very seriously any concerns from users. There is time to look at this carefully. We need to take the time. I am really posting so that the discussions Mark and I have this week (I haven't seen Mark since PyCon) can be productive, with as many other people participating as possible. -- Travis Oliphant (on a mobile) 512-826-7480 On Apr 16, 2012, at 6:01 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Mon, Apr 16, 2012 at 5:17 PM, Travis Oliphant <travis@continuum.io>wrote:
I would suggest that you and Mark have a good talk first, then report here with some specifics that you think need discussion, along with specifics from the unnamed sources. The somewhat vague "some say" doesn't help much, and in the absence of specifics the discussion is likely to proceed along the same old lines, if it happens at all. Meanwhile there is a disturbance in the force that makes us all uneasy. Chuck
Ralf, I wouldn't change your plans just yet for NumPy 1.7. With Mark available full time for the next few weeks, I think we will be able to make rapid progress on whatever is decided -- in fact, if people are available to help but just need resources, let me know off list. I just want to make sure that the process for making significant changes to NumPy does not disenfranchise any voice. Like bug-reports and feature-requests, complaints are food to a project, just as usage is oxygen. In my view, we should take any concern that is raised from the perspective that NumPy is "guilty until proven innocent." This takes some intentional effort. I have found that, because of how much work it takes to design and implement software, my natural perspective is to be defensive, but I have always appreciated the outcome when all view-points are considered seriously and addressed respectfully. Best regards, -Travis On Apr 16, 2012, at 6:01 PM, Ralf Gommers wrote:
Hi, On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant <travis@continuum.io> wrote:
I guess there are two questions here 1) Will something like the current version of masked arrays have a long term future in numpy, regardless of eventual API? Most likely answer - yes? 2) Will likely changes to the masked array API make any difference to the number of extra pointers? Likely answer no? Is that right? I have the impression that the masked array API discussion still has not come out fully into the unforgiving light of discussion day, but if the answer to 2) is No, then I suppose the API discussion is not relevant to the 3 pointers change. See y'all, Matthew
On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote:
I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C).
The answer to this is very likely no on the Python side. But on the C side there could be some differences (i.e. whether masked arrays are a sub-class of the ndarray or not).
You are correct that the API discussion is separate from this one. Overall, I was surprised at how fervently people would oppose ABI changes. As has been pointed out, NumPy, and Numeric before it, were not really designed to prevent having to recompile when changes were made. I'm still not sure that a better overall solution isn't to promote better availability of downstream binary packages rather than worry excessively about ABI changes in NumPy. But that is the current climate.

In that climate, my concern is that we haven't finalized the API but are rapidly cementing the *structure* of NumPy arrays into a modified form that has real downstream implications. Two other people I have talked to share this concern (nobody who has posted on this list before, but heavy users of NumPy). I may have missed the threads where it was discussed, but have these structure changes and their implications been fully discussed? Is there anyone else who is concerned about adding 3 more pointers (12 or 24 bytes) to the NumPy structure? As Chuck points out, 3 more pointers is not necessarily that big of a deal if you are talking about a large array (though for small arrays it could matter). But I personally know of half-written NEPs that propose to add more pointers to the NumPy array:

* to allow meta-information to be attached to a NumPy array
* to allow labels to be attached to a NumPy array (a la data-array)
* to allow multiple chunks for an array

Are people O.K. with 5 or 6 more pointers on every NumPy array? We could also think about adding just one more pointer to a new "enhanced" structure that contains multiple enhancements to the NumPy array. But this whole line of discussion sounds a lot like a true sub-class of the NumPy array at the C-level. It has the benefit that only people who use the features of the sub-class have to worry about the extra space. Mark and I will talk about this long and hard.
Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution. If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for. Best regards, -Travis
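To make the size question above concrete, here is a toy sketch using ctypes. The field names and layout are purely illustrative (this is not the real PyArrayObject struct); the point is only that appending three mask pointers grows every array header by exactly 3 pointer widths:

```python
import ctypes

# A toy stand-in for an array-object header (illustrative fields only,
# NOT the actual PyArrayObject layout).
class BaseArrayStruct(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("data", ctypes.c_void_p),
        ("nd", ctypes.c_int),
        ("dimensions", ctypes.c_void_p),
        ("strides", ctypes.c_void_p),
    ]

# The same layout with three hypothetical mask pointers appended.
class MaskedArrayStruct(ctypes.Structure):
    _fields_ = BaseArrayStruct._fields_ + [
        ("maskna_dtype", ctypes.c_void_p),
        ("maskna_data", ctypes.c_void_p),
        ("maskna_strides", ctypes.c_void_p),
    ]

# Per-array overhead: 12 bytes on 32-bit platforms, 24 on 64-bit.
overhead = ctypes.sizeof(MaskedArrayStruct) - ctypes.sizeof(BaseArrayStruct)
print(overhead)
```

For one large array this is noise; for code holding millions of small arrays, the overhead is paid on every single object, which is the small-array concern raised above.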
Hi, On Mon, Apr 16, 2012 at 7:46 PM, Travis Oliphant <travis@continuum.io> wrote:
I'd love to hear that argument fleshed out in more detail - do you have time?
The objectors object to any binary ABI change, but not specifically three pointers rather than two or one? Is their point then about ABI breakage? Because that seems like a different point again. Or is it possible that they are in fact worried about the masked array API?
Mark and I will talk about this long and hard. Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution. If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for.
I started writing something about this but I guess you'd know what I'd write, so I only humbly ask that you consider whether it might be doing real damage to allow substantial discussion that is not documented or argued out in public. See you, Matthew
I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C).
I'd love to hear that argument fleshed out in more detail - do you have time?
My proposal here is to basically take the current github NumPy data-structure and make this a sub-type (in C) of the NumPy 1.6 data-structure, which would be unchanged in NumPy 1.7. This would not require removing code, but would require another PyTypeObject and associated structures. I expect Mark could do this work in 2-4 weeks. We also have other developers who could help in order to get the sub-type into NumPy 1.7. What kind of details would you like to see? In this way, the masked-array approach to missing data could be pursued by those who prefer that approach without affecting any other users of numpy arrays (and the numpy.ma sub-class could be deprecated). I would also like to add missing-data dtypes (ideally before NumPy 1.7, but it is not a requirement for release). I just think we need more data and uses, and this would provide a way to get that without forcing a decision one way or another.
Adding pointers is not really an ABI change (but removing them after they were there would be...) It's really just the addition of data to the NumPy array structure that they aren't going to use. Most of the time it would not be a real problem (the number of use-cases where you have a lot of small NumPy arrays is small), but when it is a problem it is very annoying.
Is their point then about ABI breakage? Because that seems like a different point again.
Yes, it's not that.
Or is it possible that they are in fact worried about the masked array API?
I don't think most people whose opinion would be helpful are really tuned in to the discussion at this point. I think they just want us to come up with an answer and then move forward. But, they will judge us based on the solution we come up with.
It will be documented and argued in public. We are just going to have one off-list conversation to try and speed up the process. You make a valid point, and I appreciate the perspective. Please speak up again after hearing the report if something is not clear. I don't want this to even have the appearance of a "back-room" deal. Mark and I will have conversations about NumPy while he is in Austin. There are many other active stake-holders whose opinions and views are essential for major changes. Mark and I are working on other things besides just NumPy and all NumPy changes will be discussed on list and require consensus or super-majority for NumPy itself to change. I'm not sure if that helps. Is there more we can do? Thanks, -Travis
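The C-level sub-type idea rests on a standard CPython layout trick: the sub-type's struct embeds the base struct as its *first* member, so a pointer to the sub-type is also a valid pointer to the base. A toy ctypes sketch (field names invented for illustration, not the real NumPy structs):

```python
import ctypes

# Minimal stand-in for the base array header.
class Base(ctypes.Structure):
    _fields_ = [("nd", ctypes.c_int), ("data", ctypes.c_void_p)]

# Sub-type: the base struct comes first, extras are appended after it.
# Only objects of this type pay for the extra pointer.
class Masked(ctypes.Structure):
    _fields_ = [("base", Base), ("maskna_data", ctypes.c_void_p)]

m = Masked()
m.base.nd = 2

# Reinterpret a Masked* as a Base*: the leading fields line up exactly,
# so code that only knows about Base still works unchanged.
b = ctypes.cast(ctypes.pointer(m), ctypes.POINTER(Base)).contents
print(b.nd)  # 2
```

This is why the sub-type route avoids both an ABI break for existing consumers and the per-array cost for users who never touch masks.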
Hi, On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant <travis@continuum.io> wrote:
I was dimly thinking of the same questions that Chuck had - about how subclassing would relate to the ufunc changes.
I just think we need more data and uses and this would provide a way to get that without making a forced decision one way or another.
Is the proposal that this would be an alternative API to numpy.ma? Is numpy.ma not itself satisfactory as a test of these uses, because of performance or some other reason?
As you might have heard me say before, my concern is that it has not been easy to have good discussions on this list. I think the problem has been that it has not been clear what the culture was, and how decisions got made, and that has led to some uncomfortable and unhelpful discussions. My plea would be for you as BDF$N to strongly encourage on-list discussions and discourage off-list discussions as far as possible, and to help us make the difficult public effort to bash out the arguments to clarity and consensus. I know that's a big ask. See you, Matthew
On Apr 16, 2012, at 11:59 PM, Matthew Brett wrote:
Basically, there are two sets of changes as far as I understand right now: 1) the ufunc infrastructure understands masked arrays, and 2) ndarray grew attributes to represent masked arrays. I am proposing that we keep 1) but change 2), so that only certain kinds of NumPy arrays actually have the extra function pointers (effectively a sub-type). In essence, what I'm proposing is that the NumPy 1.6 PyArrayObject become a base object, with the other members of the C-structure not even present unless the Masked flag is set. Such changes would not require ripping code out --- just altering the presentation a bit. Yet they could have large long-term implications that we should explore before they get fixed. Whether masked arrays should be a formal sub-class is actually an unrelated question, and I generally lean in the direction of not encouraging sub-classes of the ndarray. The big questions are: does this object work in the calculation infrastructure? Can I add an array to a masked array? Does it have a sum method? I think it could be argued that a masked array does have an "is a" relationship with an array. It can also be argued that it is better for it to have a "has a" relationship with an array and be its own object. Either way, this object could still have its first part be binary compatible with a NumPy array, and that is what I'm really suggesting. -Travis
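The "has a" alternative can be sketched in a few lines of pure Python, with plain lists standing in for ndarrays (ToyMasked is an invented name for illustration, not a proposed API). The object owns an array and a mask rather than inheriting from the array type, yet still answers the questions above: it has a sum method and you can add a plain array to it:

```python
# Toy "has a" masked container: holds data plus a mask, is not a
# subclass of the data type.
class ToyMasked:
    def __init__(self, data, mask):
        self.data = list(data)
        self.mask = list(mask)   # True = element is masked out

    def sum(self):
        # Skip masked elements, roughly skipNA=True semantics.
        return sum(d for d, m in zip(self.data, self.mask) if not m)

    def __add__(self, other):
        # Adding a plain sequence: the result keeps this object's mask.
        return ToyMasked([a + b for a, b in zip(self.data, other)],
                         self.mask)

m = ToyMasked([1, 2, 3], [False, True, False])
print(m.sum())                    # 4 (the masked 2 is skipped)
print((m + [10, 10, 10]).sum())   # 24 (11 + 13)
```

Under "is a", the same behaviors would instead come from subclassing the array type; either design can present a first part that is binary compatible at the C level, which is the point being made above.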
On Tue, Apr 17, 2012 at 6:44 AM, Travis Oliphant <travis@continuum.io> wrote:
It sounds like the main implementation issue here is that this masked array class needs some way to coordinate with the ufunc infrastructure to efficiently and reliably handle the mask in calculations. The core ufunc code now knows how to handle masks, and this functionality is needed for where= and NA-dtypes, so obviously it's staying, independent of what we decide to do with masked arrays. So the question is just, how do we get the masked array and the ufuncs talking to each other so they can do the right thing. Perhaps we should focus, then, on how to create a better hooking mechanism for ufuncs? Something along these lines? http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html If done in a solid enough way, this would also solve other problems, e.g. we could make ufuncs work reliably on sparse matrices, which seems to trip people up on scipy-user every month or two. Of course, it's very tricky to get right :-( As far as the masked array API: I'm still not convinced we know how we want these things to behave. The masked arrays in master currently implement MISSING semantics, but AFAICT everyone who wants MISSING semantics prefers NA-dtypes or even plain old NaN's over a masked implementation. And some of the current implementation's biggest backers, like Chuck, have argued that they should switch to skipNA=True, which is more of an IGNORED-style semantic. OTOH, there's still disagreement over how IGNORED-style semantics should even work (I'm thinking of that discussion about commutativity). The best existing model is numpy.ma -- but the numpy.ma API is quite different from the NEP, in more ways than just the default setting for skipNA. numpy.ma uses the opposite convention for mask values, it has additional concepts like the fillvalue, hardmask versus softmask, and then there's the whole way the NEP uses views to manage the mask.
And I don't know which of these numpy.ma features are useful, which are extraneous, and which are currently useful but will become extraneous once the users who really wanted something more like NA-dtypes switch to those. So we all agree that masked arrays can be useful, and that numpy.ma has problems. But straightforwardly porting numpy.ma to C doesn't seem like it would help much, and neither does simply declaring that numpy.ma has been deprecated in favour of a new NEP-like API. So, I dunno. It seems like it might make the most sense to: 1) take the mask fields out of the core ndarray (while leaving the rest of Mark's infrastructure, as per above) 2) make sure we have the hooks needed so that numpy.ma, and NEP-like APIs, and whatever other experiments people want to try, can all integrate well with ufuncs, and make any other extensions that are generally useful and required so that they can work well 3) once we've experimented, move the winner into the core. Or whatever else makes sense to do once we understand what we're trying to accomplish. -- Nathaniel
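One shape such a hooking mechanism could take is a dispatcher that defers to the operand when it knows how to apply the function itself. This is a hypothetical sketch: `__handle_ufunc__` and `apply_ufunc` are invented names, not an actual NumPy protocol, and ToySparse is a stand-in for a real sparse type. It shows why a hook is what would let sparse matrices work with ufuncs:

```python
import math

def apply_ufunc(func, arg):
    """Apply a unary 'ufunc', letting the operand hook the call."""
    handler = getattr(arg, "__handle_ufunc__", None)
    if handler is not None:
        return handler(func)      # the operand knows best
    return [func(x) for x in arg]  # fallback: dense elementwise loop

class ToySparse:
    """Toy sparse vector: stores only the nonzero entries."""
    def __init__(self, length, nonzero):
        self.length = length
        self.nonzero = dict(nonzero)  # index -> value

    def __handle_ufunc__(self, func):
        # For any func with func(0) == 0, touching only stored
        # entries is both correct and fast.
        return ToySparse(self.length,
                         {i: func(v) for i, v in self.nonzero.items()})

v = ToySparse(1000, {3: 4.0, 17: 9.0})
w = apply_ufunc(math.sqrt, v)      # only 2 calls, not 1000
print(w.nonzero)                   # {3: 2.0, 17: 3.0}
```

A masked array type could implement the same hook to apply the mask during the loop, which is exactly the coordination problem described above.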
On Tue, Apr 17, 2012 at 5:59 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi Matthew, As you know, I agree with everything you just said :-). So in the interest of transparency, I should add: I have been in touch with Travis some off-list, and the main topic has been how to proceed in a way that lets us achieve public consensus. -- Nathaniel
Hi, On Tue, Apr 17, 2012 at 7:24 AM, Nathaniel Smith <njs@pobox.com> wrote:
I'm glad to hear that discussion is happening, but please do have it on list. If it's off list it is easy for people to feel they are being bypassed, and that the public discussion is not important. So, yes, you might get a better outcome for this specific case, but a worse outcome in the long term, because the list will start to feel that it's for signing off or voting rather than discussion, and that - I feel sure - would lead to worse decisions. The other issue is that there's a reason you are having the discussion off-list - which is that it was getting difficult on-list. But - again, a personal view - that really has to be addressed directly, by setting out the rules of engagement and modeling the kind of discussion we want to have. Cheers, Matthew
On 04/17/2012 08:40 AM, Matthew Brett wrote:
...when possible without paralysis.
I think you are over-stating the case a bit. Taking what you say literally, one might conclude that numpy people should never meet and chat, or phone each other up and chat. But such small conversations are an important extension and facilitator of individual thinking. Major decisions do need to get hashed out publicly, but mailing list discussions are only one part of the thinking and decision process. Eric
On Tue, Apr 17, 2012 at 11:40 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
I'm afraid I have to disagree: you seem to be proposing an absolute, 'zero-tolerance'-style policy against any off-list discussion. The only thing ZT policies achieve is to remove common sense and human judgement from a process, invariably causing more harm than they do good, no matter how well intentioned. There are perfectly reasonable cases where a quick phone call may be a more effective and sensible way to work than an on-list discussion.

The question isn't whether someone, somewhere, had an off-list discussion or not; it's whether *the main decision making process* is being handled transparently or not. I trust that Nathaniel and Travis had a sensible reason to speak off-list; as long as it appears clear that the *decisions about numpy* are being made via public discussion with room for all necessary input and confrontation of disparate viewpoints, I don't care what they talk about in private.

In IPython, I am constantly fielding private emails that I very often refer to the list because they make more sense there, but I also have off-list discussions when I consider that to be the right thing to do. And I certainly hope nobody ever asks me to *never* have an off-list discussion. I try very hard to ensure that the direction of the project is very transparent, with redundant points (people) of access to critical resources and a good vetting of key decisions with public input (e.g. our first IPEP at https://github.com/ipython/ipython/issues/1611). If I am failing at that, I hope people will call me out *on that point*, but not on whether I ever pick up the phone or email to talk about IPython off-list.

Let's try to trust for one minute that the actual decisions will be made here with solid debate and project-wide input, and seek change only if we have evidence that this isn't happening (not evidence of a meta-problem that isn't a problem here). Best, f
Hi, On Tue, Apr 17, 2012 at 12:04 PM, Fernando Perez <fperez.net@gmail.com> wrote:
Right - but that would be an absurd overstatement of what I said. There's no point in addressing something I didn't say and no sensible person would think. Indeed, it makes the discussion harder. It's just exhausting to have to keep stating the obvious. Of course discussions happen off-list. Of course sometimes that has to happen. Of course that can be a better and quicker way of having discussions. However, in this case the meta-problem that is a real problem is that we've shown ourselves that we are not currently good at having discussions on list. There are clearly reasons for that, and also clearly, they can be addressed. The particular point I am making is neither silly nor extreme nor vapid. It is simply that, in order to make discussion work better on the list, it is in my view better to make an explicit effort to make the discussions - explicit. Yours in Bay Area opposition, Matthew
On Tue, Apr 17, 2012 at 12:10 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Well, in that case neither Eric Firing nor I are 'sensible persons', since that's how we both understood what you said (Eric's email appeared to me as a more concise/better phrased version of the same points I was making). You said: """ I'm glad to hear that discussion is happening, but please do have it on list. If it's off list it easy for people to feel they are being bypassed, and that the public discussion is not important. """ I don't think it's an 'absurd overstatement' to interpret that as "don't have discussions off-list", but hey, it may just be me :)
meta-problem that is a real problem is that we've shown ourselves that we are not currently good at having discussions on list.
Oh, I know that did happen in the past regarding this very topic (the big NA mess last summer); what I meant was to try and trust that *this time around* things might be already moving in a better direction, which it seems to me they are. It seems to me that everyone is genuinely trying to tackle the discussion/consensus questions head-on right on the list, and that's why I proposed waiting to see if there were really any problems before asking Nathaniel not to have any discussion off-list (esp. since we have no evidence that what they talked about had any impact on any decisions bypassing the open forum). Best, f
On Tue, Apr 17, 2012 at 12:32 PM, Fernando Perez <fperez.net@gmail.com> wrote:
The absurd over-statement is the following: " I'm afraid I have to disagree: you seem to be proposing an absolute, 'zero-tolerance'-style policy against any off-list discussion. "
The question - which seems to me to be sensible rational and important - is how to get better at on-list discussion, and whether taking this particular discussion mainly off-list is good or bad in that respect. See you, Matthew
On Mon, Apr 16, 2012 at 10:40:53PM -0500, Travis Oliphant wrote:
The objectors object to any binary ABI change, but not specifically three pointers rather than two or one?
I think that something the numpy community must be very careful about is ABI breakage. At the scale of a large and heavy institution, it is very costly. In my mind, this is the argument that should guide the discussion: is going one way or the other (removing NA or not) likely to lead us into ABI breakage? My 2 cents, Gaël
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 8:46 PM, Travis Oliphant <travis@continuum.io>wrote:
I think making numpy.ma a subclass of ndarray has caused all sorts of trouble. It doesn't satisfy 'is a', rather it tries to use inheritance from ndarray for implementation of various parts. The upshot is that almost everything has to be overridden, so it didn't buy much.
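For readers unfamiliar with the problem Chuck describes, here is a minimal sketch (a hypothetical `InfoArray`, not numpy.ma itself) of why inheriting from ndarray is implementation inheritance: ndarray creates new instances behind the subclass's back through views and slices, so carrying even one extra attribute forces you to hook `__array_finalize__` -- numpy.ma has to do this for the mask and then still override nearly everything else.

```python
import numpy as np

class InfoArray(np.ndarray):
    """Hypothetical minimal ndarray subclass carrying one extra attribute."""

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Called on explicit construction, view casting, and
        # new-from-template (e.g. slicing); without this hook the
        # attribute would silently vanish on every slice.
        if obj is None:
            return
        self.info = getattr(obj, "info", None)

a = InfoArray(np.arange(4), info="units: m")
b = a[1:]                     # slicing makes a new object behind our back
assert b.info == "units: m"   # survives only because of __array_finalize__
```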
Yes, this whole thing could get out of hand with too many extras. One of the things you could discuss with Mark is how to deal with this, or limit the modifications. At some point the ndarray class could become cumbersome, complicated, and difficult to maintain. We need to be careful that it doesn't go that way. I'd like to keep it as simple as possible; the question is what is fundamental. The main long term advantage of having masks part of the base is the possibility of adapted loops in ufuncs, which would give the advantage of speed. But that is just how it looks from where I stand, no doubt others have different priorities.

But, this whole line of discussion sounds a lot like a true sub-class of the NumPy array at the C-level.
Chuck
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
On Apr 16, 2012, at 11:01 PM, Charles R Harris wrote:
This is a valid point. One could create a new object that is binary compatible with the NumPy Array but not really a sub-class but provides the array interface. We could keep Mark's modifications to the array interface as well so that it can communicate a mask. -Travis
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 10:38 PM, Travis Oliphant <travis@continuum.io>wrote:
Another place inheritance causes problems is PyUnicodeArrType inheriting from PyUnicodeType. There the difficulty is that the unicode itemsize/encoding may not match between the types. IIRC, it isn't recommended that derived classes change the itemsize. Numpy also has the different byte orderings... The Python types are sort of like virtual classes, so in some sense they are designed for inheritance. We could maybe set up some sort of parallel numpy type system with empty slots and such but we would need to decide what those slots are ahead of time. And if we got really serious, ABI backwards compatibility would break big time. <snip> Chuck
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 7:46 PM, Travis Oliphant <travis@continuum.io> wrote:
As Chuck points out, 3 more pointers is not necessarily that big of a deal if you are talking about a large array (though for small arrays it could matter).
yup -- for the most part, numpy arrays are best for working with large data sets, in which case a little bit bigger core object doesn't matter. But there are many times that we do want to work with small arrays (particularly ones that are pulled out of a larger array -- iterating over an array of (x,y) points or the like). However, numpy overhead is already pretty heavy for such use, so it may not matter. I recall discussion a couple times in the past of having some special-case numpy arrays for the simple, small cases -- perhaps 1-d or 2-d C-contiguous only, for instance. That might be a better way to address the small-array performance issue, and free us of concerns about minor growth to the core ndarray object. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/95198572b00e5fbcd97fb5315215bf7a.jpg?s=120&d=mm&r=g)
On Fri, Apr 20, 2012 at 9:49 AM, Chris Barker <chris.barker@noaa.gov> wrote:
+1 on that: I once wrote such code in pyrex (years ago) and it worked extremely well for me. No fancy features, very small footprint and highly optimized codepaths that gave me excellent performance. Cheers, f
![](https://secure.gravatar.com/avatar/723b49f8d57b46f753cc4097459cbcdb.jpg?s=120&d=mm&r=g)
Fernando Perez <fperez.net@gmail.com> wrote:
I don't think you gain that much by using a different type though? Those optimized code paths could be plugged into NumPy as well. I always assumed that it would be possible to optimize NumPy, just that nobody invested time in it. Starting from scratch you gain that you don't have to work with and understand NumPy's codebase, but I honestly think that's a small price to pay for compatibility. Dag
-- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
![](https://secure.gravatar.com/avatar/95198572b00e5fbcd97fb5315215bf7a.jpg?s=120&d=mm&r=g)
On Fri, Apr 20, 2012 at 11:27 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
I don't think you gain that much by using a different type though? Those optimized code paths could be plugged into NumPy as well.
Could be: this was years ago, and the bottleneck for me was in the constructor and in basic arithmetic. I had to make millions of these vectors and I needed to do basic arithmetic, but they were always 1-d and had one to 6 entries only. So writing a very static constructor with very low overhead did make a huge difference in that project. Also, when I wrote this code numpy didn't exist, I was using Numeric. Perhaps the same results could be obtained in numpy itself with judicious coding, I don't know. But in that project, ~600 lines of really easy pyrex code (it would be cython today) made a *huge* performance difference for me. Cheers, f
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Fri, Apr 20, 2012 at 11:39 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
Oh, right. I was thinking "small" as in "fits in L2 cache", not small as in a few dozen entries.
or even two or three entries. I often use a (2,) or (3,) numpy array to represent an (x,y) point (usually pulled out from a Nx2 array). I like it 'cause I can do array math, etc.; it makes the code cleaner, but it's actually faster to use tuples and do the indexing by hand :-( but yes, having something built-in, or at least very compatible with numpy would be best. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
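Chris's observation is easy to reproduce; here is a rough, machine-dependent timing sketch (the numbers and the helper names are illustrative, not from the original post):

```python
import timeit
import numpy as np

# For a 2-element point, plain tuples usually beat ndarrays: the
# per-call ndarray overhead dominates the tiny amount of arithmetic.
pt_t = (1.0, 2.0)
pt_a = np.array([1.0, 2.0])
off = np.array([3.0, 4.0])

t_tuple = timeit.timeit(lambda: (pt_t[0] + 3.0, pt_t[1] + 4.0), number=50000)
t_array = timeit.timeit(lambda: pt_a + off, number=50000)
print(t_tuple, t_array)  # the tuple version is typically several times faster
```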
![](https://secure.gravatar.com/avatar/1967de10a7e68cd5c06ef2ba06e2301a.jpg?s=120&d=mm&r=g)
On Fri, Apr 20, 2012 at 11:45 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Another example of a small array use-case: I've been using numpy for my research in multi-target tracking, which involves something like a bunch of entangled hidden Markov models. I represent target states with small 2d arrays (e.g. 2x2, 4x4, ...) and observations with small 1d arrays (1 or 2 elements). It may be possible to combine a bunch of these small arrays into a single larger array and use indexing to extract views, but it is much cleaner and more intuitive to use separate, small arrays. It's also convenient to use numpy arrays rather than a custom class because I use the linear algebra functionality as well as integration with other libraries (e.g. matplotlib). I also work with approximate probabilistic inference in graphical models (belief propagation, etc.), which is another area where it can be nice to work with many small arrays. In any case, I just wanted to chime in with my small bit of evidence for people wanting to use numpy for work with small arrays, even if they are currently pretty slow. If there were a special version of a numpy array that would be faster for cases like this, I would definitely make use of it. Drew
![](https://secure.gravatar.com/avatar/d98bd91ed2cea43056594b7ce5369b17.jpg?s=120&d=mm&r=g)
On 21. apr. 2012, at 00:16, Drew Frank wrote:
Although performance hasn't been a killer for me, I've been using numpy arrays (or matrices) for Mueller matrices [0] and Stokes vectors [1]. These describe the polarization of light and are always 4x1 vectors or 4x4 matrices. It would be nice if my code ran in one night instead of one week, although this is still tolerable in my case. Again, just an example of how small-vector/matrix performance can be important in certain use cases. Paul [0] https://en.wikipedia.org/wiki/Mueller_calculus [1] https://en.wikipedia.org/wiki/Stokes_vector
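For concreteness, here is a standard Mueller-calculus computation (a textbook example, not Paul's actual code): an ideal horizontal linear polarizer applied to unpolarized light, all in the 4x4/4x1 regime he describes.

```python
import numpy as np

# Ideal horizontal linear polarizer as a Mueller matrix.
M_hpol = 0.5 * np.array([[1.0, 1.0, 0.0, 0.0],
                         [1.0, 1.0, 0.0, 0.0],
                         [0.0, 0.0, 0.0, 0.0],
                         [0.0, 0.0, 0.0, 0.0]])
S_unpolarized = np.array([1.0, 0.0, 0.0, 0.0])  # intensity 1, no polarization

S_out = M_hpol @ S_unpolarized
# Half the intensity passes, and the output is fully horizontally polarized.
assert np.allclose(S_out, [0.5, 0.5, 0.0, 0.0])
```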
![](https://secure.gravatar.com/avatar/2ed170f366fc7841334af33ece2f9914.jpg?s=120&d=mm&r=g)
I have never found mailing lists good places for discussion and consensus. I think the format itself does not lend itself to involvement, carefully considered (or the ability to change) positions, or voting, since all of it can be so easily lost within all of the quoting, the back and forth, people walking away, etc. And you also want involvement from people who don't have x hours to craft a finely worded, politically correct, and detailed response. I am not advocating this particular system, but something like http://meta.programmers.stackexchange.com/ would be a better platform for building to a decision when there are many choices to be made.

Now about ma, NA, missing... I am just an engineer working in water resources and I had lots of difficulty reading the NEP (so sleeeeepy) so I will be the first to admit that I probably have something wrong. Just for reference (since I missed it the first time around) Nathaniel posted this page at https://github.com/njsmith/numpy/wiki/NA-discussion-status

I think that I could adapt to everything that is discussed in the NEP, but I do have some comments about things that puzzled me. I don't need answers, but if I am puzzled maybe others are also.

First - 'maskna=True'? Tested on development version of numpy...

>>> a = np.arange(10, maskna = True)
>>> a[:2] = np.NA
>>> a
array([NA, NA, 2, 3, 4, 5, 6, 7, 8, 9])

Why do I have to specify 'maskna = True'? If NA and ndarray are intended to be combined in some way, then I don't think that I need this. During development, I understand, but the NEP shouldn't have it. Heck, even if you keep NA and ndarrays separate, when someone tries to set an ndarray element with np.NA, instead of a ValueError convert to an NA array. I say that very casually as if I know how to do it. I do have a proof, but the margin is too small to include it. :-)

I am torn about 'skipna=True'.
I think I understand the desire for explicit behavior, but I suspect that every operation that I would use an NA array for would require 'skipna=True'. Actually, I don't use that many reducing functions, so maybe not a big deal. Regardless of the skipna setting, a related idea that could be useful for reducing functions is to set an 'includesna' attribute on the returned scalar value.

The view() didn't work as described in the NEP, where np.NA isn't propagated back to the original array. This could be because the NEP references a 'missingdata' work-in-progress branch and I don't know what has been merged. I can force the NEP-described behavior if I set 'd.flags.ownmaskna=True'.

>>> d = a.view()
>>> d
array([NA, NA, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d[0] = 4
>>> a
array([4, NA, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d
array([4, NA, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d[6] = np.NA
>>> d
array([4, NA, 2, 3, 4, 5, NA, 7, 8, 9])
>>> a
array([4, NA, 2, 3, 4, 5, NA, 7, 8, 9])

In the NEP 'Accessing a Boolean Mask' section there is a comment about... actually I don't understand this section at all. Especially about a boolean byte-level mask? Why would it need to be a byte-level mask in order to be viewed? The logic also of mask = True or False can be easily handled by using a better name for the flag. 'mask = True' means that the value is masked (missing), whereas 'exposed = True' means the value is not masked (not missing).

The biggest question mark to me is that 'a[0] = np.NA' is destructive and (using numpy.ma) 'a.mask[0] = True' is not. Is that a big deal? I am trying to think back on all of my 'ma' code and try to remember if I masked, then unmasked values, and I don't recall any time that I did that. Of course my use cases are constrained to what I have done in the past. It feels like a bad idea, for the sake of saving the memory for the mask bits.

Now, the amazing thing is that, understanding so little and doing even less of the work, I get to vote. Sounds like America!
I would really like to see NA in the wild, and I think that I can adapt my ma code to it, so +1. If it has to wait until 1.8, +1. If it has to wait until 1.9, +1. Kindest regards, Tim
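Since the maskna feature only ever existed in the development branch, Tim's session can't be reproduced on a released NumPy; a rough numpy.ma analogue (an approximation, not the NEP semantics) highlights two of the contrasts he raises:

```python
import numpy as np
import numpy.ma as ma

# numpy.ma analogue of Tim's session.
a = ma.masked_array(np.arange(10))
a[:2] = ma.masked            # roughly what 'a[:2] = np.NA' does in the NEP

assert a.count() == 8        # two values are now missing
# Contrast with the NEP: ma reductions skip masked values by default,
# while the NEP would require an explicit skipna=True.
assert a.sum() == 44         # 2 + 3 + ... + 9
# And unlike 'a[0] = np.NA', masking in ma is non-destructive: the
# underlying data is still there behind the mask.
assert list(a.data[:2]) == [0, 1]
```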
![](https://secure.gravatar.com/avatar/082c34b6c1544c177cb7a66609360089.jpg?s=120&d=mm&r=g)
Hi, I just discovered that the NA mask will modify the base ndarray object, so I read about it to find the consequences for our C code. Up to now I have fully read: http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html and partially read: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst https://github.com/njsmith/numpy/wiki/NA-discussion-status

In those documents, I see a problem with legacy code that will receive an NA-masked array as input. If I missed something, tell me. All our C functions check their input arrays with PyArray_Check and PyArray_ISALIGNED. If the NA mask is set inside the ndarray C object, our C functions don't know about it and will treat those inputs as not masked. So the user will get unexpected results: the output will be an ndarray without a mask, but the code will have used the masked values. This will also happen with all other C code that uses ndarray.

In our case, all the input checking is done in the same place, so adding a check with "PyArray_HasNASupport(PyArrayObject* obj)" to raise an error will be easy for us. But I don't think this is the case for most other C code. So I would prefer a separate object, to protect users from code that has not been updated to reject NA-masked inputs.

An alternative would be to have PyArray_Check return False for NA-masked arrays, but I don't like that, as it breaks the semantics that it checks for the class. A last option I see would be to make the NPY_ARRAY_BEHAVED flag also check that the array is not an NA-masked array. I suppose many C codebases do this check, but it is not a bullet-proof check, as not all code (such as ours) uses it.

Also, I don't mind the added pointers to the structure, as we use big arrays. thanks Frédéric
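A Python-level analogue of the C-level hazard Frédéric describes (using numpy.ma as a stand-in for an NA-masked array, since it too passes the base-class check):

```python
import numpy as np
import numpy.ma as ma

# A masked array passes the base-class check, just as an NA-masked
# ndarray would pass PyArray_Check in C extensions...
m = ma.masked_array([1, 2, 3], mask=[False, True, False])
assert isinstance(m, np.ndarray)          # legacy check says "plain array"

# ...and naive conversion then silently uses the data under the mask,
# which is exactly the "unexpected results" scenario above.
plain = np.asarray(m)
assert not isinstance(plain, ma.MaskedArray)
assert plain[1] == 2                      # the masked value leaks through
```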
![](https://secure.gravatar.com/avatar/3d3176cf99cae23d0ac119d1ea6c4d11.jpg?s=120&d=mm&r=g)
On Sat, Mar 24, 2012 at 10:13 PM, Charles R Harris < charlesr.harris@gmail.com> wrote:
We've closed a number of open issues and merged some PRs, but haven't made much progress on the issues above. Especially for the NA issues I'm not sure what's going on. Is anyone working on this at the moment? If so, can he/she give an update of things to change/fix and an estimate of how long that will take? Thanks, Ralf
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 10:09 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
There's been some ongoing behind-the-scenes discussion of the overall NA problem, but I wouldn't try to give an estimate on the outcome. My personal opinion is that given you already added the note to the docs that masked arrays are in a kind of experimental prototype state for this release, some small inconsistencies in their behaviour shouldn't be a release blocker. The release notes already have a whole list of stuff that's unsupported in the presence of masks ("Fancy indexing...UFunc.accumulate, UFunc.reduceat...where=...ndarray.argmax, ndarray.argmin..."), I'm not sure why .real and .imag are blockers and they aren't :-). Maybe just make a note of them on that list? (Unless of course Chuck fixes them before the other blockers are finished, as per his email that just arrived.) -- Nathaniel
![](https://secure.gravatar.com/avatar/3d3176cf99cae23d0ac119d1ea6c4d11.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 11:29 PM, Nathaniel Smith <njs@pobox.com> wrote:
Good point. If you look at the open tickets for 1.7.0 (http://projects.scipy.org/numpy/report/3) with a view on getting a release out soon, you could do the following:

#2066 : close as fixed.
#2078 : regression, should fix.
#1578 : important to fix, but not a regression. Include only if fixed on time.
#1755 : mark as knownfail.
#2025 : document as not working as expected yet.
#2047 : fix or postpone. Pearu indicated this will take him a few hours.
#2076 : one of many. not a blocker, postpone.
#2101 : need to do this. shouldn't cost much time.
#2108 : status unclear. likely a blocker.

Can someone who knows about datetime give some feedback on #2108? If that's not a blocker, a release within a couple of weeks can be considered. Although not fixing #1578 is questionable, and we need to revisit the LTS release debate... Ralf
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Sun, Mar 25, 2012 at 3:14 PM, Ralf Gommers <ralf.gommers@googlemail.com>wrote:
I wasn't aware of the other segfaults, I'd like to get them all fixed... The list was meant to elicit additions. I don't know where the missed floating point errors come from, but they are somewhat dependent on the compiler doing the right thing and hardware support. I'd welcome any insight into why we get them on SPARC (underflow) and Windows (overflow). The windows buildbot doesn't seem to be updating correctly since it is still missing the combinations method that is now part of the test module. Chuck
![](https://secure.gravatar.com/avatar/3d3176cf99cae23d0ac119d1ea6c4d11.jpg?s=120&d=mm&r=g)
On Mon, Mar 26, 2012 at 1:27 AM, Charles R Harris <charlesr.harris@gmail.com
wrote:
8. Search for "segmentation fault" or "bus error" in Trac. I would hope these have a high priority to get fixed but, unless they're backwards compatibility issues, I don't consider them blockers. For the simple reason that then we'd never be able to ship any release.
The windows buildbot doesn't seem to be updating correctly since it is still missing the combinations method that is now part of the test module.
Yeah, none of them are updating, it's a pain. We'll hopefully soon be able to switch to a shiny new one. Ralf
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
There is an issue with the NumPy 1.7 release that we all need to understand. Doesn't including the missing-data attributes in the NumPy structure in a released version of NumPy basically commit to including those attributes in NumPy 1.8? I'm not comfortable with that, is everyone else? One possibility is to move those attributes to a C-level sub-class of NumPy.

I have heard from a few people that they are not excited by the growth of the NumPy data-structure by the 3 pointers needed to hold the masked-array storage. This is especially true when there is talk to potentially add additional attributes to the NumPy array (for labels and other meta-information). If you are willing to let us know how you feel about this, please speak up.

Mark Wiebe will be in Austin for about 3 months. He and I will be hashing some of this out in the first week or two. We will present any proposal and ask questions to this list before acting. We will be using some phone calls and face-to-face communications to increase the bandwidth and speed of the conversations (not to exclude anyone). If you would like to be part of the in-person discussions let me know -- or just make your views known here -- they will be taken seriously.

The goal is consensus for any major change in NumPy. If we can't get consensus, then we vote on this list and use a super-majority. If we can't get a super-majority, then except in rare circumstances we can't move forward. Heavy users of NumPy get higher voting privileges.

My perspective is that we don't have consensus for including the current additional attributes on the NumPy data-structure in a long-term release.

Best, -Travis

On Mar 25, 2012, at 6:27 PM, Charles R Harris wrote:
![](https://secure.gravatar.com/avatar/3d3176cf99cae23d0ac119d1ea6c4d11.jpg?s=120&d=mm&r=g)
On Tue, Apr 17, 2012 at 12:06 AM, Travis Oliphant <travis@continuum.io>wrote:
We clearly labeled NA as experimental, so some changes are to be expected. But not complete removal - so yes, if we release them they should stay in some form.
I'm not comfortable with that, is everyone else? One possibility is to move those attributes to a C-level sub-class of NumPy.
That's the first time I've heard this. Until now, we have talked a lot about adding bitmasks and API changes, not about complete removal. My assumption was that the experimental label was enough. From Nathaniel's reaction I gathered the same. It looks like too many conversations on this topic are happening off-list. Ralf
![](https://secure.gravatar.com/avatar/95198572b00e5fbcd97fb5315215bf7a.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 3:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
My impression was that Travis was just suggesting that as an option here for discussion, not presenting it as something discussed elsewhere. I read Travis' email precisely as restarting the discussion for consideration of the issues in full public view (+ calls/skype open to anyone interested for bandwidth purposes), so in this case I don't think there's any background off-list to worry about. At least that's how I read it... Cheers, f
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
No off-list discussions material to this point have been happening. I am basically stating my view for the first time. I have delayed because I realize it is not a pleasant view and I was hoping I could end up resolving it favorably. But, it needs to be discussed before 1.7 is released. -- Travis Oliphant (on a mobile) 512-826-7480 On Apr 16, 2012, at 5:27 PM, Fernando Perez <fperez.net@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/3d3176cf99cae23d0ac119d1ea6c4d11.jpg?s=120&d=mm&r=g)
On Tue, Apr 17, 2012 at 12:27 AM, Fernando Perez <fperez.net@gmail.com>wrote:
From "I have heard from a few people that they are not excited ...." I deduce it was discussed to some extent.
I read Travis' email precisely as restarting the
discussion for consideration of the issues in full public view
It wasn't restarting anything; it's completely opposite to the part that I thought we did reach consensus on (*not* backing out changes). I stated as much when first discussing a 1.7.0 in December, http://thread.gmane.org/gmane.comp.python.numeric.general/47022/focus=47027, with no one disagreeing. It's perfectly fine to reconsider any previous decisions/discussions of course. However, I do now draw the conclusion that it's best to wait for this issue to be resolved before considering a new release. I had been working on closing tickets and cleaning up loose ends for 1.7.0, and pinging others to do the same. I guess I'll stop doing that for now, until the renewed NA debate has been settled. If there are bug fixes that are important (like the Debian segfaults with Python debug builds), we can do a 1.6.2 release. Ralf
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
The comments I have heard have been from people who haven't wanted to make them on this list. I wish they would, but I understand that not everyone wants to be drawn into a long discussion. They have not been discussions. My bias is to just move forward with what is there.

After a week or two of discussion, I expect that we will resolve this one way or another. The result may be to just move forward as previously planned. However, that might not be the best move forward either. These are significant changes and they do impact users. We need to understand those implications and take very seriously any concerns from users. There is time to look at this carefully. We need to take the time.

I am really posting so that the discussions Mark and I have this week (I haven't seen Mark since PyCon) can be productive with as many other people participating as possible.

-- Travis Oliphant (on a mobile) 512-826-7480

On Apr 16, 2012, at 6:01 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 5:17 PM, Travis Oliphant <travis@continuum.io>wrote:
I would suggest the you and Mark have a good talk first, then report here with some specifics that you think need discussion, along with specifics from the unnamed sources. The somewhat vague "some say" doesn't help much and in the absence of specifics the discussion is likely to proceed along the same old lines if it happens at all. Meanwhile there is a disturbance in the force that makes us all uneasy. Chuck
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
Ralf, I wouldn't change your plans just yet for NumPy 1.7. With Mark available full time for the next few weeks, I think we will be able to make rapid progress on whatever is decided -- in fact, if people are available to help but just need resources, let me know off list.

I just want to make sure that the process for making significant changes to NumPy does not disenfranchise any voice. Like bug-reports and feature-requests, complaints are food to a project, just like usage is oxygen. In my view, we should take any concern that is raised from the perspective that NumPy is "guilty until proven innocent." This takes some intentional effort. I have found that because of how much work it takes to design and implement software, my natural perspective is to be defensive, but I have always appreciated the outcome when all view-points are considered seriously and addressed respectfully.

Best regards, -Travis

On Apr 16, 2012, at 6:01 PM, Ralf Gommers wrote:
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant <travis@continuum.io> wrote:
I guess there are two questions here:

1) Will something like the current version of masked arrays have a long term future in numpy, regardless of eventual API? Most likely answer - yes?

2) Will likely changes to the masked array API make any difference to the number of extra pointers? Likely answer - no?

Is that right? I have the impression that the masked array API discussion still has not come out fully into the unforgiving light of discussion day, but if the answer to 2) is no, then I suppose the API discussion is not relevant to the 3 pointers change. See y'all, Matthew
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote:
I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C).
The answer to this is very likely no on the Python side. But, on the C side, there could be some differences (i.e. are masked arrays a sub-class of the ndarray or not).
You are correct that the API discussion is separate from this one. Overall, I was surprised at how fervently people would oppose ABI changes. As has been pointed out, NumPy and Numeric before it were not really designed to prevent having to recompile when changes were made. I'm still not sure that a better overall solution is not to promote better availability of downstream binary packages than excessively worry about ABI changes in NumPy. But, that is the current climate.

In that climate, my concern is that we haven't finalized the API but are rapidly cementing the *structure* of NumPy arrays into a modified form that has real downstream implications. Two other people I have talked to share this concern (nobody who has posted on this list before but who are heavy users of NumPy). I may have missed the threads where it was discussed, but have these structure changes and their implications been fully discussed? Is there anyone else who is concerned about adding 3 more pointers (12 bytes or 24 bytes) to the NumPy structure?

As Chuck points out, 3 more pointers is not necessarily that big of a deal if you are talking about a large array (though for small arrays it could matter). But, I personally know of half-written NEPs that propose to add more pointers to the NumPy array:

* to allow meta-information to be attached to a NumPy array
* to allow labels to be attached to a NumPy array (ala data-array)
* to allow multiple chunks for an array.

Are people O.K. with 5 or 6 more pointers on every NumPy array? We could also think about adding just one more pointer to a new "enhanced" structure that contains multiple enhancements to the NumPy array. But, this whole line of discussion sounds a lot like a true sub-class of the NumPy array at the C-level. It has the benefit that only people that use the features of the sub-class have to worry about using the extra space.

Mark and I will talk about this long and hard.
Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution. If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for. Best regards, -Travis
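Travis's small-array caveat can be made concrete with a quick, version-dependent measurement (illustrative only; exact overheads vary by NumPy version and platform):

```python
import sys
import numpy as np

# Back-of-the-envelope check of why extra struct pointers only matter
# for small arrays.
small = np.zeros(3)          # 24 bytes of payload
big = np.zeros(1_000_000)    # ~8 MB of payload

small_overhead = sys.getsizeof(small) - small.nbytes
big_overhead = sys.getsizeof(big) - big.nbytes

# The fixed per-object cost already dwarfs a 3-element payload, so 3
# extra pointers (24 bytes on 64-bit) are a noticeable relative cost
# there; for the big array they are lost in the noise.
assert small_overhead > 24
assert big_overhead < big.nbytes / 1000
```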
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Mon, Apr 16, 2012 at 7:46 PM, Travis Oliphant <travis@continuum.io> wrote:
I'd love to hear that argument fleshed out in more detail - do you have time?
The objectors object to any binary ABI change, but not specifically three pointers rather than two or one? Is their point then about ABI breakage? Because that seems like a different point again. Or is it possible that they are in fact worried about the masked array API?
Mark and I will talk about this long and hard. Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution. If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for.
I started writing something about this but I guess you'd know what I'd write, so I only humbly ask that you consider whether it might be doing real damage to allow substantial discussion that is not documented or argued out in public. See you, Matthew
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C).
I'd love to hear that argument fleshed out in more detail - do you have time?
My proposal here is to basically take the current github NumPy data-structure and make this a sub-type (in C) of the NumPy 1.6 data-structure which is unchanged in NumPy 1.7. This would not require removing code but would require another PyTypeObject and associated structures. I expect Mark could do this work in 2-4 weeks. We also have other developers who could help in order to get the sub-type in NumPy 1.7. What kind of details would you like to see? In this way, the masked-array approach to missing data could be pursued by those who prefer that approach without affecting any other users of numpy arrays (and the numpy.ma sub-class could be deprecated). I would also like to add missing-data dtypes (ideally before NumPy 1.7, but it is not a requirement of release). I just think we need more data and uses and this would provide a way to get that without making a forced decision one way or another.
Adding pointers is not really an ABI change (but removing them after they were there would be...) It's really just the addition of data to the NumPy array structure that they aren't going to use. Most of the time it would not be a real problem (the number of use-cases where you have a lot of small NumPy arrays is small), but when it is a problem it is very annoying.
Is their point then about ABI breakage? Because that seems like a different point again.
Yes, it's not that.
Or is it possible that they are in fact worried about the masked array API?
I don't think most people whose opinion would be helpful are really tuned in to the discussion at this point. I think they just want us to come up with an answer and then move forward. But, they will judge us based on the solution we come up with.
It will be documented and argued in public. We are just going to have one off-list conversation to try and speed up the process. You make a valid point, and I appreciate the perspective. Please speak up again after hearing the report if something is not clear. I don't want this to even have the appearance of a "back-room" deal. Mark and I will have conversations about NumPy while he is in Austin. There are many other active stake-holders whose opinions and views are essential for major changes. Mark and I are working on other things besides just NumPy and all NumPy changes will be discussed on list and require consensus or super-majority for NumPy itself to change. I'm not sure if that helps. Is there more we can do? Thanks, -Travis
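Travis's earlier parenthetical, that appending members is ABI-benign while removing them later would not be, can be illustrated with a small ctypes sketch (field names invented for illustration): code compiled against the old layout keeps finding the old fields at the same offsets.

```python
import ctypes

class OldLayout(ctypes.Structure):
    _fields_ = [("data", ctypes.c_void_p), ("nd", ctypes.c_int)]

class NewLayout(ctypes.Structure):
    # Same fields, plus one pointer appended at the end.
    _fields_ = [("data", ctypes.c_void_p), ("nd", ctypes.c_int),
                ("mask", ctypes.c_void_p)]

# Pre-existing fields keep their offsets, so code compiled against
# OldLayout still reads the right memory through a NewLayout object...
print(NewLayout.data.offset == OldLayout.data.offset)  # True
print(NewLayout.nd.offset == OldLayout.nd.offset)      # True
# ...but removing or reordering fields later would shift these offsets,
# which is exactly the ABI break warned about above. The struct does get
# bigger, which matters only to code that allocates or embeds it directly.
print(ctypes.sizeof(NewLayout) > ctypes.sizeof(OldLayout))  # True
```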
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant <travis@continuum.io> wrote:
I was dimly thinking of the same questions that Chuck had - about how subclassing would relate to the ufunc changes.
I just think we need more data and uses and this would provide a way to get that without making a forced decision one way or another.
Is the proposal that this would be an alternative API to numpy.ma? Is numpy.ma not itself satisfactory as a test of these uses, because of performance or some other reason?
As you might have heard me say before, my concern is that it has not been easy to have good discussions on this list. I think the problem has been that it has not been clear what the culture is or how decisions get made, and that has led to some uncomfortable and unhelpful discussions. My plea would be for you as BDF$N to strongly encourage on-list discussions and discourage off-list discussions as far as possible, and to help us make the difficult public effort to bash out the arguments to clarity and consensus. I know that's a big ask. See you, Matthew
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
On Apr 16, 2012, at 11:59 PM, Matthew Brett wrote:
Basically, there are two sets of changes as far as I understand right now: 1) the ufunc infrastructure understands masked arrays, and 2) ndarray grew attributes to represent masked arrays. I am proposing that we keep 1) but change 2) so that only certain kinds of NumPy arrays actually have the extra function pointers (effectively a sub-type). In essence, what I'm proposing is that the NumPy 1.6 PyArrayObject become a base object, with the other members of the C structure not even present unless the Masked flag is set. Such changes would not require ripping code out --- just altering the presentation a bit. Yet they could have large long-term implications that we should explore before they get fixed. Whether masked arrays should be a formal sub-class is actually an unrelated question, and I generally lean in the direction of not encouraging sub-classes of the ndarray. The big questions are: does this object work in the calculation infrastructure? Can I add an array to a masked array? Does it have a sum method? It could be argued that a masked array does have an "is a" relationship with an array. It can also be argued that it is better for it to have a "has a" relationship with an array and be its own object. Either way, this object could still have its first part be binary-compatible with a NumPy array, and that is what I'm really suggesting. -Travis
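A pure-Python analogue (names invented for illustration, not the proposed C code) of the layout Travis describes: the base object carries only the 1.6-era members, and the mask member exists only on the sub-type, so plain arrays pay no extra space.

```python
# Conceptual sketch: PlainArray stands in for the unchanged 1.6 struct,
# MaskedArray for the sub-type that appends mask storage. All names are
# hypothetical; __slots__ mimics a fixed C struct layout.

class PlainArray:
    __slots__ = ("data", "shape", "strides")  # the "1.6 struct"

    def __init__(self, data, shape, strides):
        self.data, self.shape, self.strides = data, shape, strides

class MaskedArray(PlainArray):
    __slots__ = ("mask",)  # extra member exists only on the sub-type

    def __init__(self, data, shape, strides, mask):
        super().__init__(data, shape, strides)
        self.mask = mask

a = PlainArray([1, 2, 3], (3,), (1,))
m = MaskedArray([1, 2, 3], (3,), (1,), [True, False, True])
print(hasattr(a, "mask"))         # False: plain arrays carry no mask slot
print(isinstance(m, PlainArray))  # True: binary-compatible first part
```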
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Tue, Apr 17, 2012 at 6:44 AM, Travis Oliphant <travis@continuum.io> wrote:
It sounds like the main implementation issue here is that this masked array class needs some way to coordinate with the ufunc infrastructure to efficiently and reliably handle the mask in calculations. The core ufunc code now knows how to handle masks, and this functionality is needed for where= and NA-dtypes, so obviously it's staying, independent of what we decide to do with masked arrays. So the question is just: how do we get the masked array and the ufuncs talking to each other so they can do the right thing?

Perhaps we should focus, then, on how to create a better hooking mechanism for ufuncs? Something along these lines? http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html If done in a solid enough way, this would also solve other problems, e.g. we could make ufuncs work reliably on sparse matrices, which seems to trip people up on scipy-user every month or two. Of course, it's very tricky to get right :-(

As far as the masked array API goes: I'm still not convinced we know how we want these things to behave. The masked arrays in master currently implement MISSING semantics, but AFAICT everyone who wants MISSING semantics prefers NA-dtypes or even plain old NaNs over a masked implementation. And some of the current implementation's biggest backers, like Chuck, have argued that they should switch to skipNA=True, which is more of an IGNORED-style semantic. OTOH, there's still disagreement over how IGNORED-style semantics should even work (I'm thinking of that discussion about commutativity). The best existing model is numpy.ma -- but the numpy.ma API is quite different from the NEP, in more ways than just the default setting for skipNA: numpy.ma uses the opposite convention for mask values, it has additional concepts like the fillvalue and hardmask versus softmask, and then there's the whole way the NEP uses views to manage the mask.
And I don't know which of these numpy.ma features are useful, which are extraneous, and which are currently useful but will become extraneous once the users who really wanted something more like NA-dtypes switch to those. So we all agree that masked arrays can be useful, and that numpy.ma has problems. But straightforwardly porting numpy.ma to C doesn't seem like it would help much, and neither does simply declaring that numpy.ma has been deprecated in favour of a new NEP-like API. So, I dunno. It seems like it might make the most sense to:

1) take the mask fields out of the core ndarray (while leaving the rest of Mark's infrastructure, as per above);

2) make sure we have the hooks needed so that numpy.ma, NEP-like APIs, and whatever other experiments people want to try can all integrate well with ufuncs, and make any other extensions that are generally useful and required so that they can work well;

3) once we've experimented, move the winner into the core. Or whatever else makes sense to do once we understand what we're trying to accomplish. -- Nathaniel
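To make the MISSING-versus-IGNORED distinction concrete, here is a toy pure-Python sketch (not the NumPy implementation). It follows the numpy.ma convention in which mask=True means "masked out", which is the opposite of the NEP's convention:

```python
# Toy illustration of the two reduction semantics under debate:
# with MISSING (propagating) semantics a masked value poisons the
# result, while skipna=True (IGNORED-style) simply drops masked
# elements from the reduction.
NA = object()  # stand-in for np.NA

def masked_sum(values, mask, skipna=False):
    if not skipna and any(mask):
        return NA  # MISSING semantics: NA propagates to the result
    return sum(v for v, m in zip(values, mask) if not m)

vals = [1, 2, 3, 4]
mask = [False, True, False, False]  # True means "masked out" (ma convention)
print(masked_sum(vals, mask) is NA)         # True: NA propagates
print(masked_sum(vals, mask, skipna=True))  # 8: the masked 2 is skipped
```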
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Tue, Apr 17, 2012 at 5:59 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi Matthew, As you know, I agree with everything you just said :-). So in interest of transparency, I should add: I have been in touch with Travis some off-list, and the main topic has been how to proceed in a way that let's us achieve public consensus. -- Nathaniel
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Tue, Apr 17, 2012 at 7:24 AM, Nathaniel Smith <njs@pobox.com> wrote:
I'm glad to hear that discussion is happening, but please do have it on list. If it's off list it easy for people to feel they are being bypassed, and that the public discussion is not important. So, yes, you might get a better outcome for this specific case, but a worse outcome in the long term, because the list will start to feel that it's for signing off or voting rather than discussion, and that - I feel sure - would lead to worse decisions. The other issue is that there's a reason you are having the discussion off-list - which is that it was getting difficult on-list. But - again - a personal view - that really has to be addressed directly by setting out the rules of engagement and modeling the kind of discussion we want to have. Cheers, Matthew
![](https://secure.gravatar.com/avatar/7e9e53dbe9781722d56e308c32387078.jpg?s=120&d=mm&r=g)
On 04/17/2012 08:40 AM, Matthew Brett wrote:
...when possible without paralysis.
I think you are over-stating the case a bit. Taking what you say literally, one might conclude that numpy people should never meet and chat, or phone each other up and chat. But such small conversations are an important extension and facilitator of individual thinking. Major decisions do need to get hashed out publicly, but mailing list discussions are only one part of the thinking and decision process. Eric
![](https://secure.gravatar.com/avatar/95198572b00e5fbcd97fb5315215bf7a.jpg?s=120&d=mm&r=g)
On Tue, Apr 17, 2012 at 11:40 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
I'm afraid I have to disagree: you seem to be proposing an absolute, 'zero-tolerance'-style policy against any off-list discussion. The only thing ZT policies achieve is to remove common sense and human judgement from a process, invariably causing more harm than they do good, no matter how well intentioned. There are perfectly reasonable cases where a quick phone call may be a more effective and sensible way to work than an on-list discussion.

The question isn't whether someone, somewhere, had an off-list discussion or not; it's whether *the main decision-making process* is being handled transparently or not. I trust that Nathaniel and Travis had a sensible reason to speak off-list; as long as it appears clear that the *decisions about numpy* are being made via public discussion with room for all necessary input and confrontation of disparate viewpoints, I don't care what they talk about in private.

In IPython, I am constantly fielding private emails that I very often refer to the list because they make more sense there, but I also have off-list discussions when I consider that to be the right thing to do. And I certainly hope nobody ever asks me to *never* have an off-list discussion. I try very hard to ensure that the direction of the project is very transparent, with redundant points (people) of access to critical resources and a good vetting of key decisions with public input (e.g. our first IPEP at https://github.com/ipython/ipython/issues/1611). If I am failing at that, I hope people will call me out *on that point*, but not on whether I ever pick up the phone or email to talk about IPython off-list.

Let's try to trust for one minute that the actual decisions will be made here with solid debate and project-wide input, and seek change only if we have evidence that this isn't happening (not evidence of a meta-problem that isn't a problem here). Best, f
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
Hi, On Tue, Apr 17, 2012 at 12:04 PM, Fernando Perez <fperez.net@gmail.com> wrote:
Right - but that would be an absurd overstatement of what I said. There's no point in addressing something I didn't say and that no sensible person would think. Indeed, it makes the discussion harder. It's just exhausting to have to keep stating the obvious. Of course discussions happen off-list. Of course sometimes that has to happen. Of course that can be a better and quicker way of having discussions. However, in this case the
meta-problem that is a real problem is that we've shown ourselves that we are not currently good at having discussions on list. There are clearly reasons for that, and also clearly, they can be addressed. The particular point I am making is neither silly nor extreme nor vapid. It is simply that, in order to make discussion work better on the list, it is in my view better to make an explicit effort to make the discussions - explicit. Yours in Bay Area opposition, Matthew
![](https://secure.gravatar.com/avatar/95198572b00e5fbcd97fb5315215bf7a.jpg?s=120&d=mm&r=g)
On Tue, Apr 17, 2012 at 12:10 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Well, in that case neither Eric Firing nor I are 'sensible persons', since that's how we both understood what you said (Eric's email appeared to me as a more concise/better phrased version of the same points I was making). You said: """ I'm glad to hear that discussion is happening, but please do have it on list. If it's off list it easy for people to feel they are being bypassed, and that the public discussion is not important. """ I don't think it's an 'absurd overstatement' to interpret that as "don't have discussions off-list", but hey, it may just be me :)
meta-problem that is a real problem is that we've shown ourselves that we are not currently good at having discussions on list.
Oh, I know that did happen in the past regarding this very topic (the big NA mess last summer); what I meant was to try and trust that *this time around* things might be already moving in a better direction, which it seems to me they are. It seems to me that everyone is genuinely trying to tackle the discussion/consensus questions head-on right on the list, and that's why I proposed waiting to see if there were really any problems before asking Nathaniel not to have any discussion off-list (esp. since we have no evidence that what they talked about had any impact on any decisions bypassing the open forum). Best, f
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
On Tue, Apr 17, 2012 at 12:32 PM, Fernando Perez <fperez.net@gmail.com> wrote:
The absurd over-statement is the following: " I'm afraid I have to disagree: you seem to be proposing an absolute, 'zero-tolerance'-style policy against any off-list discussion. "
The question - which seems to me to be sensible, rational, and important - is how to get better at on-list discussion, and whether taking this particular discussion mainly off-list is good or bad in that respect. See you, Matthew
![](https://secure.gravatar.com/avatar/5c9fb379c4e97b58960d74dcbfc5dee5.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 10:40:53PM -0500, Travis Oliphant wrote:
The objectors object to any binary ABI change, but not specifically three pointers rather than two or one?
I think that something the numpy community must be very careful about is ABI breakage. At the scale of a large and heavy institution, it is very costly. In my mind, this is the argument that should guide the discussion: is going one way or the other (removing NA or not) likely to lead us into ABI breakage? My 2 cents, Gaël
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 8:46 PM, Travis Oliphant <travis@continuum.io>wrote:
I think making numpy.ma a subclass of ndarray has caused all sorts of trouble. It doesn't satisfy 'is a', rather it tries to use inheritance from ndarray for implementation of various parts. The upshot is that almost everything has to be overridden, so it didn't buy much.
Yes, this whole thing could get out of hand with too many extras. One of the things you could discuss with Mark is how to deal with this, or limit the modifications. At some point the ndarray class could become cumbersome, complicated, and difficult to maintain. We need to be careful that it doesn't go that way. I'd like to keep it as simple as possible, the question is what is fundamental. The main long term advantage of having masks part of the base is the possibility of adapted loops in ufuncs, which would give the advantage of speed. But that is just how it looks from where I stand, no doubt others have different priorities. But, this whole line of discussion sounds a lot like a true sub-class of
Chuck
![](https://secure.gravatar.com/avatar/6c8561779fff34c62074c614d19980fc.jpg?s=120&d=mm&r=g)
On Apr 16, 2012, at 11:01 PM, Charles R Harris wrote:
This is a valid point. One could create a new object that is binary compatible with the NumPy Array but not really a sub-class but provides the array interface. We could keep Mark's modifications to the array interface as well so that it can communicate a mask. -Travis
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 10:38 PM, Travis Oliphant <travis@continuum.io>wrote:
Another place inheritance causes problems is PyUnicodeArrType inheriting from PyUnicodeType. There the difficulty is that the unicode itemsize/encoding may not match between the types. IIRC, it isn't recommended that derived classes change the itemsize. Numpy also has the different byte orderings... The Python types are sort of like virtual classes, so in some sense they are designed for inheritance. We could maybe set up some sort of parallel numpy type system with empty slots and such but we would need to decide what those slots are ahead of time. And if we got really serious, ABI backwards compatibility would break big time. <snip> Chuck
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 7:46 PM, Travis Oliphant <travis@continuum.io> wrote:
As Chuck points out, 3 more pointers is not necessarily that big of a deal if you are talking about a large array (though for small arrays it could matter).
yup -- for the most part, numpy arrays are best for working with large data sets, in which case a slightly bigger core object doesn't matter. But there are many times that we do want to work with small arrays (particularly ones that are pulled out of a larger array -- iterating over an array of (x,y) points or the like). However, numpy overhead is already pretty heavy for such use, so it may not matter. I recall discussion a couple of times in the past of having some special-case numpy arrays for the simple, small cases -- perhaps 1-d or 2-d C-contiguous only, for instance. That might be a better way to address the small-array performance issue, and free us of concerns about minor growth to the core ndarray object. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/95198572b00e5fbcd97fb5315215bf7a.jpg?s=120&d=mm&r=g)
On Fri, Apr 20, 2012 at 9:49 AM, Chris Barker <chris.barker@noaa.gov> wrote:
+1 on that: I once wrote such code in pyrex (years ago) and it worked extremely well for me. No fancy features, very small footprint and highly optimized codepaths that gave me excellent performance. Cheers, f
![](https://secure.gravatar.com/avatar/723b49f8d57b46f753cc4097459cbcdb.jpg?s=120&d=mm&r=g)
Fernando Perez <fperez.net@gmail.com> wrote:
I don't think you gain that much by using a different type though? Those optimized code paths could be plugged into NumPy as well. I always assumed that it would be possible to optimize NumPy, just that nobody invested time in it. Starting from scratch you gain that you don't have to work with and understand NumPy's codebase, but I honestly think that's a small price to pay for compatibility. Dag
-- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
![](https://secure.gravatar.com/avatar/95198572b00e5fbcd97fb5315215bf7a.jpg?s=120&d=mm&r=g)
On Fri, Apr 20, 2012 at 11:27 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
I don't think you gain that much by using a different type though? Those optimized code paths could be plugged into NumPy as well.
Could be: this was years ago, and the bottleneck for me was in the constructor and in basic arithmetic. I had to make millions of these vectors and I needed to do basic arithmetic, but they were always 1-d and had only one to six entries. So writing a very static constructor with very low overhead did make a huge difference in that project. Also, when I wrote this code numpy didn't exist; I was using Numeric. Perhaps the same results could be obtained in numpy itself with judicious coding, I don't know. But in that project, ~600 lines of really easy pyrex code (it would be cython today) made a *huge* performance difference for me. Cheers, f
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Fri, Apr 20, 2012 at 11:39 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
Oh, right. I was thinking "small" as in "fits in L2 cache", not small as in a few dozen entries.
or even two or three entries. I often use a (2,) or (3,) numpy array to represent an (x,y) point (usually pulled out of an Nx2 array). I like it 'cause I can do array math, etc.; it makes the code cleaner, but it's actually faster to use tuples and do the indexing by hand :-( but yes, having something built-in, or at least very compatible with numpy, would be best. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/1967de10a7e68cd5c06ef2ba06e2301a.jpg?s=120&d=mm&r=g)
On Fri, Apr 20, 2012 at 11:45 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Another example of a small array use-case: I've been using numpy for my research in multi-target tracking, which involves something like a bunch of entangled hidden markov models. I represent target states with small 2d arrays (e.g. 2x2, 4x4, ..) and observations with small 1d arrays (1 or 2 elements). It may be possible to combine a bunch of these small arrays into a single larger array and use indexing to extract views, but it is much cleaner and more intuitive to use separate, small arrays. It's also convenient to use numpy arrays rather than a custom class because I use the linear algebra functionality as well as integration with other libraries (e.g. matplotlib). I also work with approximate probabilistic inference in graphical models (belief propagation, etc), which is another area where it can be nice to work with many small arrays. In any case, I just wanted to chime in with my small bit of evidence for people wanting to use numpy for work with small arrays, even if they are currently pretty slow. If there were a special version of a numpy array that would be faster for cases like this, I would definitely make use of it. Drew
![](https://secure.gravatar.com/avatar/d98bd91ed2cea43056594b7ce5369b17.jpg?s=120&d=mm&r=g)
On 21. apr. 2012, at 00:16, Drew Frank wrote:
Although performance hasn't been a killer for me, I've been using numpy arrays (or matrices) for Mueller matrices [0] and Stokes vectors [1]. These describe the polarization of light and are always 4x1 vectors or 4x4 matrices. It would be nice if my code ran in one night instead of one week, although this is still tolerable in my case. Again, just an example of how small-vector/matrix performance can be important in certain use cases. Paul [0] https://en.wikipedia.org/wiki/Mueller_calculus [1] https://en.wikipedia.org/wiki/Stokes_vector
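For readers unfamiliar with the use case, here is a minimal plain-Python version of the arithmetic Paul describes: a 4x4 Mueller matrix applied to a 4-element Stokes vector. The matrix used is the standard Mueller matrix for an ideal horizontal linear polarizer.

```python
def mat_vec(M, v):
    """4x4 matrix times a length-4 vector."""
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

# Standard Mueller matrix for an ideal horizontal linear polarizer.
polarizer = [[0.5, 0.5, 0.0, 0.0],
             [0.5, 0.5, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0]]

# Stokes vector for unpolarized light of unit intensity.
unpolarized = [1.0, 0.0, 0.0, 0.0]

out = mat_vec(polarizer, unpolarized)
print(out)  # [0.5, 0.5, 0.0, 0.0]: half the intensity, fully polarized
```

In code like Paul's, this tiny product runs millions of times, which is exactly the small-array regime where per-call numpy overhead dominates.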
![](https://secure.gravatar.com/avatar/2ed170f366fc7841334af33ece2f9914.jpg?s=120&d=mm&r=g)
I have never found mailing lists good places for discussion and consensus. I think the format itself does not lend itself to involvement, carefully considered positions (or the ability to change them), or voting, since all of it can be so easily lost within all of the quoting, the back and forth, people walking away, etc. And you also want involvement from people who don't have x hours to craft a finely worded, politically correct, and detailed response. I am not advocating this particular system, but something like http://meta.programmers.stackexchange.com/ would be a better platform for building to a decision when there are many choices to be made.

Now about ma, NA, missing... I am just an engineer working in water resources and I had lots of difficulty reading the NEP (so sleeeeepy), so I will be the first to admit that I probably have something wrong. Just for reference (since I missed it the first time around), Nathaniel posted this page at https://github.com/njsmith/numpy/wiki/NA-discussion-status

I think that I could adapt to everything that is discussed in the NEP, but I do have some comments about things that puzzled me. I don't need answers, but if I am puzzled maybe others are also.

First - 'maskna=True'? Tested on the development version of numpy:

>>> a = np.arange(10, maskna=True)
>>> a[:2] = np.NA
>>> a
array([NA, NA, 2, 3, 4, 5, 6, 7, 8, 9])

Why do I have to specify 'maskna=True'? If NA and ndarray are intended to be combined in some way, then I don't think that I need this. During development, I understand, but the NEP shouldn't have it. Heck, even if you keep NA and ndarrays separate, when someone tries to set an ndarray element with np.NA, instead of a ValueError convert to an NA array. I say that very casually as if I know how to do it. I do have a proof, but the margin is too small to include it. :-)

I am torn about 'skipna=True'.
I think I understand the desire for explicit behavior, but I suspect that every operation I would use an NA array for would require 'skipna=True'. Actually, I don't use that many reducing functions, so maybe not a big deal. Regardless of the skipna setting, a related idea that could be useful for reducing functions is to set an 'includesna' attribute on the returned scalar value.

The view() didn't work as described in the NEP, where np.NA isn't propagated back to the original array. This could be because the NEP references a 'missingdata' work-in-progress branch and I don't know what has been merged. I can force the NEP-described behavior if I set 'd.flags.ownmaskna=True'.

>>> d = a.view()
>>> d
array([NA, NA, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d[0] = 4
>>> a
array([4, NA, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d
array([4, NA, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d[6] = np.NA
>>> d
array([4, NA, 2, 3, 4, 5, NA, 7, 8, 9])
>>> a
array([4, NA, 2, 3, 4, 5, NA, 7, 8, 9])

In the NEP's 'Accessing a Boolean Mask' section there is a comment about... actually, I don't understand this section at all, especially the part about a boolean byte-level mask. Why would it need to be a byte-level mask in order to be viewed? Also, the logic of mask = True or False could easily be handled by using a better name for the flag: 'mask = True' means that the value is masked (missing), whereas 'exposed = True' would mean that the value is not masked (not missing).

The biggest question mark to me is that 'a[0] = np.NA' is destructive while (using numpy.ma) 'a.mask[0] = True' is not. Is that a big deal? I am trying to think back on all of my 'ma' code and remember whether I ever masked, then unmasked, values, and I don't recall any time that I did. Of course my use cases are constrained to what I have done in the past. It feels like a bad idea, for the sake of saving the memory for the mask bits.

Now, the amazing thing is that understanding so little, and doing even less of the work, I get to vote. Sounds like America!
I would really like to see NA in the wild, and I think that I can adapt my ma code to it, so +1. If it has to wait until 1.8, +1. If it has to wait until 1.9, +1. Kindest regards, Tim
![](https://secure.gravatar.com/avatar/082c34b6c1544c177cb7a66609360089.jpg?s=120&d=mm&r=g)
Hi, I just discovered that the NA mask will modify the base ndarray object, so I read about it to find the consequences for our C code. Up to now I have fully read http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html and partially read https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst and https://github.com/njsmith/numpy/wiki/NA-discussion-status

In those documents, I see a problem with legacy code that receives an NA-masked array as input. If I missed something, tell me. All our C functions check their input arrays with PyArray_Check and PyArray_ISALIGNED. If the NA mask is set inside the ndarray C object, our C functions, which don't know about it, will treat those inputs as not masked, so the user will get unexpected results: the output will be an ndarray without a mask, but the code will have used the masked values. This will also happen with all other C code that uses ndarray.

In our case, all the input checking is done in the same place, so adding a check with PyArray_HasNASupport(PyArrayObject* obj) to raise an error will be easy for us. But I don't think this is the case for most other C code. So I would prefer a separate object, to protect users from code that has not been updated to reject NA-masked inputs.

An alternative would be to have PyArray_Check return False for NA-masked arrays, but I don't like that, as it breaks the semantics that it checks for the class. A last option I see would be to make the NPY_ARRAY_BEHAVED flag also check that the array is not an NA-masked array. I suppose much C code does this check, but it is not bulletproof, as not all code uses it (ours, for example, does not).

Also, I don't mind the added pointers in the structure, as we use big arrays. thanks Frédéric
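Frédéric's scenario can be mimicked in a few lines of Python (a toy model with invented names, not real NumPy C code): a type check alone lets masked data flow into a routine that knows nothing about masks, silently including values that were supposed to be NA.

```python
# Toy model of the legacy-code hazard: the mask lives on the same
# object, so an isinstance check (the analogue of PyArray_Check)
# passes and the mask is silently ignored.

class Array:
    def __init__(self, data):
        self.data = list(data)

class NAMaskedArray(Array):  # mask stored on the same object
    def __init__(self, data, mask):
        super().__init__(data)
        self.mask = list(mask)  # True = value is NA

def legacy_sum(arr):
    # "Legacy" check: is it an Array? Passes for the masked subtype,
    # so the masked value 99 is included in the result.
    assert isinstance(arr, Array)
    return sum(arr.data)

a = NAMaskedArray([1, 2, 99], [False, False, True])
print(legacy_sum(a))  # 102, even though 99 was supposed to be NA
```

A separate type, as Frédéric suggests, would make the isinstance check fail instead of silently producing a wrong answer.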
![](https://secure.gravatar.com/avatar/3d3176cf99cae23d0ac119d1ea6c4d11.jpg?s=120&d=mm&r=g)
On Sat, Mar 24, 2012 at 10:13 PM, Charles R Harris < charlesr.harris@gmail.com> wrote:
We've closed a number of open issues and merged some PRs, but haven't made much progress on the issues above. Especially for the NA issues I'm not sure what's going on. Is anyone working on this at the moment? If so, can he/she give an update of things to change/fix and an estimate of how long that will take? Thanks, Ralf
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 10:09 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
There's been some ongoing behind-the-scenes discussion of the overall NA problem, but I wouldn't try to give an estimate on the outcome. My personal opinion is that given you already added the note to the docs that masked arrays are in a kind of experimental prototype state for this release, some small inconsistencies in their behaviour shouldn't be a release blocker. The release notes already have a whole list of stuff that's unsupported in the presence of masks ("Fancy indexing...UFunc.accumulate, UFunc.reduceat...where=...ndarray.argmax, ndarray.argmin..."), I'm not sure why .real and .imag are blockers and they aren't :-). Maybe just make a note of them on that list? (Unless of course Chuck fixes them before the other blockers are finished, as per his email that just arrived.) -- Nathaniel
![](https://secure.gravatar.com/avatar/3d3176cf99cae23d0ac119d1ea6c4d11.jpg?s=120&d=mm&r=g)
On Mon, Apr 16, 2012 at 11:29 PM, Nathaniel Smith <njs@pobox.com> wrote:
Good point. If you look at the open tickets for 1.7.0 (http://projects.scipy.org/numpy/report/3) with a view on getting a release out soon, you could do the following:

- #2066: close as fixed.
- #2078: regression, should fix.
- #1578: important to fix, but not a regression. Include only if fixed on time.
- #1755: mark as knownfail.
- #2025: document as not working as expected yet.
- #2047: fix or postpone. Pearu indicated this will take him a few hours.
- #2076: one of many, not a blocker. Postpone.
- #2101: need to do this. Shouldn't cost much time.
- #2108: status unclear. Likely a blocker.

Can someone who knows about datetime give some feedback on #2108? If that's not a blocker, a release within a couple of weeks can be considered. Although not fixing #1578 is questionable, and we need to revisit the LTS release debate... Ralf
participants (14)
-
Charles R Harris
-
Chris Barker
-
Dag Sverre Seljebotn
-
Drew Frank
-
Eric Firing
-
Fernando Perez
-
Frédéric Bastien
-
Gael Varoquaux
-
Matthew Brett
-
Nathaniel Smith
-
Paul Anton Letnes
-
Ralf Gommers
-
Tim Cera
-
Travis Oliphant