On Wed, Jun 12, 2013 at 7:43 PM, Eric Firing <efiring@hawaii.edu> wrote:
On 2013/06/12 2:10 AM, Nathaniel Smith wrote:
Personally I think that overloading np.empty is horribly ugly, will continue confusing newbies and everyone else indefinitely, and I'm 100% convinced that we'll regret implementing such a warty interface for something that should be so idiomatic. (Unfortunately I got busy and didn't actually say this in the previous thread though.) So I think we should just merge the PR as is. The only downside is the np.ma inconsistency, but, np.ma is already inconsistent (cf. masked_array.fill versus masked_array.filled!), somewhat deprecated,
"somewhat deprecated"? Really? Since when? By whom? Replaced by what?
Sorry, not trying to start a fight, just trying to summarize the situation. As far as I can tell:

Despite heroic efforts on the part of its authors, numpy.ma has a number of weird quirks (masked data can still trigger invalid value errors), misfeatures (hard versus soft masks), and just plain old pain points (ongoing issues with whether any given operation will respect or preserve the mask).

It's been in deep maintenance mode for some time; we merge the occasional bug fix that people send in, and that's it. (To be fair, numpy as a whole is fairly slow-moving, but numpy.ma still gets much less attention.)

Even if there were active maintainers, no-one really has any idea how to fix any of the problems above; they're not so much bugs as intrinsic limitations of the design.

Therefore, my impression is that a majority (not all, but a majority) of numpy developers strongly recommend against the use of numpy.ma in new projects.

I could be wrong! And I know there's nothing to really replace it. I'd like to fix that. But I think "semi-deprecated" is not an unfair shorthand for the above.

(I'll even admit that I'd *like* to actually deprecate it. But what I mean by that is, I don't think it's possible to fix it to the point where it's actually a solid/clean/robust library, so I'd like to reach a point where everyone who's currently using it is happier switching to something else and is happy to sign off on deprecating it.)
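For readers who have not run into the hard versus soft mask distinction mentioned above, here is a minimal sketch. It assumes only the documented `hard_mask` argument of `np.ma.array`; the variable names are illustrative and not from the thread.

```python
import numpy as np

# Soft mask (the default): assigning to a masked element unmasks it.
soft = np.ma.array([1, 2, 3], mask=[False, True, False])
soft[1] = 10

# Hard mask: the same assignment leaves the element masked.
hard = np.ma.array([1, 2, 3], mask=[False, True, False], hard_mask=True)
hard[1] = 10

print(soft.mask.tolist())  # the middle element is no longer masked
print(hard.mask.tolist())  # the middle element is still masked
```

The same assignment statement has different effects depending on a flag set at construction time, which is the kind of surprise being described.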
and AFAICT there are far more people who will benefit from a clean np.filled idiom than who actually use np.ma (and in particular its fill-value functionality). So there would be two
I think there are more np.ma users than you realize. Everyone who uses matplotlib is using np.ma at least implicitly, if not explicitly. Many of the matplotlib examples put np.ma to good use. np.ma.filled is an essential long-standing part of the np.ma API. I don't see any good rationale for generating a conflict with it, when an adequate non-conflicting alternative ('np.initialized', maybe others) exists.
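For context on the existing name, a short sketch of what np.ma.filled does, in both its function and method spellings. The sample array and the fill value -1.0 are made up for illustration.

```python
import numpy as np

m = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# Function form and method form of the same operation: both return a
# plain ndarray with the masked entries replaced by the given value.
a = np.ma.filled(m, -1.0)
b = m.filled(-1.0)

print(a)
print(type(a).__name__)
```

Note that this "filled" takes an existing masked array and substitutes values for its masked entries, whereas the proposed np.filled would construct a new array of a given shape. The overlap is in the name rather than the behavior.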
I'm aware of that. If I didn't care about the opinions of numpy.ma users, I wouldn't go starting long and annoying mailing list threads about features that are only problematic because of their effect on numpy.ma :-). But, IMHO, given the issues with numpy.ma, our #1 priority ought to be making numpy proper as clean and beautiful as possible; my position that started this thread is basically just that we shouldn't make numpy proper worse just for numpy.ma's sake. That's the tail wagging the dog.

And this 'conflict' seems a bit overstated given that (1) np.ma.filled already has multiple names (and 3/4 of the uses in matplotlib use the method version, not the function version), and (2) even if we give it a non-conflicting name, np.ma's lack of maintenance means that it'd probably be years before someone got around to actually adding a parallel function to np.ma. [Unless this thread spurs someone into submitting one just to prove me wrong ;-).]

But anyway, that was when the comparison was between np.filled() and np.empty(..., fill_value=...). Of the new things on the table:

- I agree with Tom that 'np.values(...)' is so generic as to be unguessable. np.fromvalues() was also suggested, but this is even worse, because it suggests that it's analogous to np.from{buffer,file,function,regex,...}. But the analogous fromvalues() function already has a name: np.array.

- np.filled_with and np.initialized are both gratuitously cumbersome. (It's the gratuitous that bothers me more than the cumbersome. No-one enjoys using APIs that feel like they're annoying for no good reason.)

- np.full is... huh. It's quirky, and compared to np.filled it's more confusing (all arrays are full of *something*, but not all have been filled with a particular value) and it's less consistent with things like 'sorted'. But at least it's short, simple, and -- once you see it -- memorable. And at least it isn't immediately obvious when looking at it that it's a fallback choice because all the good names were taken.

I could probably live with np.full.

-n
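To make the comparison concrete, here is a rough sketch of the spellings under discussion. It assumes np.full(shape, fill_value) is available, which is the case in NumPy 1.8 and later; np.filled was only a proposed name and is not called here. The shape and value are arbitrary.

```python
import numpy as np

shape = (2, 3)

# Two-step spelling: allocate uninitialized memory, then fill in place.
a = np.empty(shape)
a.fill(7.0)

# One-step spelling, assuming np.full exists (NumPy 1.8 or later).
b = np.full(shape, 7.0)

# np.array builds an array *from* an explicit sequence of values, which is
# the sense in which a "fromvalues" function already has a name.
c = np.array([[7.0, 7.0, 7.0], [7.0, 7.0, 7.0]])

print(np.array_equal(a, b) and np.array_equal(b, c))
```

All three produce the same 2x3 array of sevens; the disagreement in the thread is only about what the one-step constructor should be called.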