Hi,
I was playing around with einsum (which is really cool, by the way!),
and I noticed something strange (or unexpected, at least)
performance-wise. Here is an example:
Let x and y be two NxM arrays, and W be an NxN array. I want to compute
f = x_{ik} W_{ij} y_{jk}
(summing over all repeated indices, so f is a scalar; I hope the notation
is clear enough)
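In case the index notation is ambiguous, here is a naive explicit-loop
version that pins down exactly what I mean (obviously not how I'd
compute it in practice):
#========================================================
import numpy as np

def f_reference(x, W, y):
    # Explicit-loop version of f = x_{ik} W_{ij} y_{jk},
    # summed over all of i, j, k.
    n, m = x.shape
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(m):
                total += x[i, k] * W[i, j] * y[j, k]
    return total
#========================================================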
What surprises me is that it's much faster to compute the two products
separately, as in ((xW)y) or (x(Wy)), rather than passing the
three-operand expression directly to einsum. I did a quick benchmark
with the following script:
#========================================================
import numpy as np

def f1(x, W, y):
    # Single three-operand einsum call.
    return np.einsum('ik,ij,jk', x, W, y)

def f2(x, W, y):
    # Same contraction done pairwise: first x with W,
    # then the result with y.
    return np.einsum('jk,jk', np.einsum('ik,ij->jk', x, W), y)

n = 30
m = 150
x = np.random.random(size=(n, m))
y = np.random.random(size=(n, m))
W = np.random.random(size=(n, n))

setup = """\
from __main__ import f1,f2,x,y,W
"""

if __name__ == '__main__':
    import timeit
    min_f1_time = min(timeit.repeat(stmt='f1(x,W,y)', setup=setup,
                                    repeat=10, number=10000))
    min_f2_time = min(timeit.repeat(stmt='f2(x,W,y)', setup=setup,
                                    repeat=10, number=10000))
    print('f1: {time}'.format(time=min_f1_time))
    print('f2: {time}'.format(time=min_f2_time))
#============================================================
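(By the way, the two versions do agree numerically; a quick check like
#========================================================
# Sanity check: both formulations compute the same scalar.
assert np.allclose(f1(x, W, y), f2(x, W, y))
#========================================================
confirms they compute the same thing.)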
On a particular trial on my Intel 64-bit machine running Linux and
numpy 1.6.1, I get the following:
f1: 2.86719584465
f2: 0.891730070114
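For what it's worth, the same scalar can also be written as a plain
matrix product followed by an elementwise multiply-and-sum (if I've
kept my transposes straight), which gives yet another baseline to
compare against:
#========================================================
def f3(x, W, y):
    # f = sum_{jk} (x^T W)_{kj} y_{jk}: one matrix product
    # (which goes through BLAS), then an elementwise multiply
    # and a full sum.
    return np.sum(x.T.dot(W) * y.T)
#========================================================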
Of course, I know nothing of einsum's internals, and there's probably a
good reason for this. Still, it's quite counterintuitive to me; I
just thought I'd mention it because a quick search didn't bring up
anything on this topic, and AFAIK einsum's docs/examples don't cover the
case with more than two operands.
Anyway: I hope I've been clear enough, and that I'm not rehashing an
already-debated topic.
Thanks in advance for any clarification,
Eugenio