From russell at keith-magee.com Sun Mar 4 03:05:57 2018 From: russell at keith-magee.com (Russell Keith-Magee) Date: Sun, 4 Mar 2018 16:05:57 +0800 Subject: [Numpy-discussion] Request for review: PR #10689 Message-ID: <385EFD5F-EFD8-44BC-B11F-C60AE00EFE4B@keith-magee.com> Hi all, I've just submitted PR #10689, making some small changes to allow NumPy to be compiled on iOS. https://github.com/numpy/numpy/pull/10689 The changes are described in detail on the PR description. For the most part, they're changes to differentiate between building *on* macOS, and building *for* macOS. Yours, Russ Magee %-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From marko.asplund at gmail.com Tue Mar 6 04:39:31 2018 From: marko.asplund at gmail.com (Marko Asplund) Date: Tue, 6 Mar 2018 11:39:31 +0200 Subject: [Numpy-discussion] numpy.random.randn Message-ID: I've some neural network code in NumPy that I'd like to compare with a Scala based implementation. My problem is currently random initialization of the neural net parameters. I'd like to be able to get the same results from both implementations when using the same random seed. One approach I've thought of would be to use the NumPy random generator also with the Scala implementation, but unfortunately the linear algebra library I'm using doesn't provide an equivalent for this. Could someone give pointers to implementing numpy.random.randn? Or alternatively, is there an equivalent random generator for Scala or Java? marko -------------- next part -------------- An HTML attachment was scrubbed... URL: From cimrman3 at ntc.zcu.cz Tue Mar 6 06:06:20 2018 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Tue, 6 Mar 2018 12:06:20 +0100 Subject: [Numpy-discussion] ANN: SfePy 2018.1 Message-ID: <07f7c4f6-abae-a7f4-5e87-e29dbab2e296@ntc.zcu.cz> I am pleased to announce release 2018.1 of SfePy. Description ----------- SfePy (simple finite elements in Python) is software for solving systems of coupled partial differential equations by the finite element method or by isogeometric analysis (limited support). It is distributed under the new BSD license. Home page: http://sfepy.org Mailing list: https://mail.python.org/mm3/mailman3/lists/sfepy.python.org/ Git (source) repository, issue tracker: https://github.com/sfepy/sfepy Highlights of this release -------------------------- - major update of time-stepping solvers and solver handling - Newmark and Bathe elastodynamics solvers - interface to MUMPS linear solver - new examples: - iron plate impact problem (elastodynamics) - incompressible Mooney-Rivlin material model (hyperelasticity) as a script For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1 (rather long and technical). Cheers, Robert Cimrman --- Contributors to this release in alphabetical order: Robert Cimrman Jan Heczko Jan Kopacka Vladimir Lukes From robert.kern at gmail.com Tue Mar 6 15:52:14 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 6 Mar 2018 12:52:14 -0800 Subject: [Numpy-discussion] numpy.random.randn In-Reply-To: References: Message-ID: On Tue, Mar 6, 2018 at 1:39 AM, Marko Asplund wrote: > > I've some neural network code in NumPy that I'd like to compare with a Scala based implementation. > My problem is currently random initialization of the neural net parameters. > I'd like to be able to get the same results from both implementations when using the same random seed.
> > One approach I've thought of would be to use the NumPy random generator also with the Scala implementation, but unfortunately the linear algebra library I'm using doesn't provide an equivalent for this. > > Could someone give pointers to implementing numpy.random.randn? > Or alternatively, is there an equivalent random generator for Scala or Java? I would just recommend using one of the codebases to initialize the network, save the network out to disk, and load up the initialized network in each of the different codebases for training. That way you are sure that they are both starting from the same exact network parameters. Even if you do rewrite a precisely equivalent np.random.randn() for Scala/Java, you ought to write the code to serialize the initialized network anyways so that you can test that the two initialization routines are equivalent. But if you're going to do that, you might as well take my recommended approach. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From marko.asplund at gmail.com Wed Mar 7 16:10:38 2018 From: marko.asplund at gmail.com (Marko Asplund) Date: Wed, 7 Mar 2018 23:10:38 +0200 Subject: [Numpy-discussion] numpy.random.randn In-Reply-To: References: Message-ID: On Tue, 6 Mar 2018 12:52:14, Robert Kern wrote: > I would just recommend using one of the codebases to initialize the > network, save the network out to disk, and load up the initialized network > in each of the different codebases for training. That way you are sure that > they are both starting from the same exact network parameters. > > Even if you do rewrite a precisely equivalent np.random.randn() for > Scala/Java, you ought to write the code to serialize the initialized > network anyways so that you can test that the two initialization routines > are equivalent. But if you're going to do that, you might as well take my > recommended approach. Thanks for the suggestion! I decided to use the approach you proposed. Still, I'm puzzled by an issue that seems to be related to random initialization. I've three different NN implementations, 2 in Scala and one in NumPy. When using the exact same initialization parameters I get the same cost after each training iteration from each implementation. So, based on this I'd infer that the implementations work equivalently. However, the results look very different when using random initialization. With respect to exact cost this is of course expected, but what I find troublesome is that after N training iterations the cost starts approaching zero with the NumPy code (most of the time), whereas with the Scala based implementations cost fails to converge (most of the time). With NumPy I'm simply using the following random initialization code: np.random.randn(n_h, n_x) * 0.01 I'm trying to emulate the same behaviour in my Scala code by sampling from a Gaussian distribution with mean = 0 and std dev = 1. Any ideas? Marko -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Mar 7 16:14:36 2018 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 7 Mar 2018 13:14:36 -0800 Subject: [Numpy-discussion] numpy.random.randn In-Reply-To: References: Message-ID: On Wed, Mar 7, 2018 at 1:10 PM, Marko Asplund wrote: > > However, the results look very different when using random initialization.
> With respect to exact cost this is of course expected, but what I find troublesome > is that after N training iterations the cost starts approaching zero with the NumPy > code (most of the time), whereas with the Scala based implementations cost fails > to converge (most of the time). > > With NumPy I'm simply using the following random initialization code: > > np.random.randn(n_h, n_x) * 0.01 > > I'm trying to emulate the same behaviour in my Scala code by sampling from a > Gaussian distribution with mean = 0 and std dev = 1. `np.random.randn(n_h, n_x) * 0.01` gives a Gaussian distribution of mean=0 and stdev=0.01 -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 8 03:25:00 2018 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 8 Mar 2018 00:25:00 -0800 Subject: [Numpy-discussion] New NEP: merging multiarray and umath Message-ID: Hi all, Well, this is something that we've discussed for a while and I think generally has consensus already, but I figured I'd write it down anyway to make sure. There's a rendered version here: https://github.com/njsmith/numpy/blob/nep-0015-merge-multiarray-umath/doc/neps/nep-0015-merge-multiarray-umath.rst ----- ============================ Merging multiarray and umath ============================ :Author: Nathaniel J. Smith :Status: Draft :Type: Standards Track :Created: 2018-02-22 Abstract -------- Let's merge ``numpy.core.multiarray`` and ``numpy.core.umath`` into a single extension module, and deprecate ``np.set_numeric_ops``. Background ---------- Currently, numpy's core C code is split between two separate extension modules. ``numpy.core.multiarray`` is built from ``numpy/core/src/multiarray/*.c``, and contains the core array functionality (in particular, the ``ndarray`` object). ``numpy.core.umath`` is built from ``numpy/core/src/umath/*.c``, and contains the ufunc machinery. These two modules each expose their own separate C API, accessed via ``import_multiarray()`` and ``import_umath()`` respectively. The idea is that they're supposed to be independent modules, with ``multiarray`` as a lower-level layer and ``umath`` built on top. In practice this has turned out to be problematic. First, the layering isn't perfect: when you write ``ndarray + ndarray``, this invokes ``ndarray.__add__``, which then calls the ufunc ``np.add``. This means that ``ndarray`` needs to know about ufuncs -- so instead of a clean layering, we have a circular dependency. To solve this, ``multiarray`` exports a somewhat terrifying function called ``set_numeric_ops``. The bootstrap procedure each time you ``import numpy`` is: 1. ``multiarray`` and its ``ndarray`` object are loaded, but arithmetic operations on ndarrays are broken. 2. ``umath`` is loaded. 3. ``set_numeric_ops`` is used to monkeypatch all the methods like ``ndarray.__add__`` with objects from ``umath``. In addition, ``set_numeric_ops`` is exposed as a public API, ``np.set_numeric_ops``. Furthermore, even when this layering does work, it ends up distorting the shape of our public ABI. In recent years, the most common reason for adding new functions to ``multiarray``\'s "public" ABI is not that they really need to be public or that we expect other projects to use them, but rather just that we need to call them from ``umath``. This is extremely unfortunate, because it makes our public ABI unnecessarily large, and since we can never remove things from it, this creates an ongoing maintenance burden.
The way C works, you can have internal API that's visible to everything inside the same extension module, or you can have a public API that everyone can use; you can't have an API that's visible to multiple extension modules inside numpy, but not to external users. We've also increasingly been putting utility code into ``numpy/core/src/private/``, which now contains a bunch of files which are ``#include``\d twice, once into ``multiarray`` and once into ``umath``. This is pretty gross, and is purely a workaround for these being separate C extensions. Proposed changes ---------------- This NEP proposes three changes: 1. We should start building ``numpy/core/src/multiarray/*.c`` and ``numpy/core/src/umath/*.c`` together into a single extension module. 2. Instead of ``set_numeric_ops``, we should use some new, private API to set up ``ndarray.__add__`` and friends. 3. We should deprecate, and eventually remove, ``np.set_numeric_ops``. Non-proposed changes -------------------- We don't necessarily propose to throw away the distinction between multiarray/ and umath/ in terms of our source code organization: internal organization is useful! We just want to build them together into a single extension module. Of course, this does open the door for potential future refactorings, which we can then evaluate based on their merits as they come up. It also doesn't propose that we break the public C ABI. We should continue to provide ``import_multiarray()`` and ``import_umath()`` functions ? it's just that now both ABIs will ultimately be loaded from the same C library. Due to how ``import_multiarray()`` and ``import_umath()`` are written, we'll also still need to have modules called ``numpy.core.multiarray`` and ``numpy.core.umath``, and they'll need to continue to export ``_ARRAY_API`` and ``_UFUNC_API`` objects ? but we can make one or both of these modules be tiny shims that simply re-export the magic API object from where-ever it's actually defined. (See ``numpy/core/code_generators/generate_{numpy,ufunc}_api.py`` for details of how these imports work.) Backward compatibility ---------------------- The only compatibility break is the deprecation of ``np.set_numeric_ops``. Alternatives ------------ n/a Discussion ---------- TBD Copyright --------- This document has been placed in the public domain. -- Nathaniel J. Smith -- https://vorpus.org From wieser.eric+numpy at gmail.com Thu Mar 8 03:47:46 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 08 Mar 2018 08:47:46 +0000 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: References: Message-ID: This means that ndarray needs to know about ufuncs ? so instead of a clean layering, we have a circular dependency. Perhaps we should split ndarray into a base_ndarray class with no arithmetic support (*add*, sum, etc), and then provide an ndarray subclass from umath instead (either the separate extension, or just a different set of files) ? On Thu, 8 Mar 2018 at 08:25 Nathaniel Smith wrote: > Hi all, > > Well, this is something that we've discussed for a while and I think > generally has consensus already, but I figured I'd write it down > anyway to make sure. > > There's a rendered version here: > > https://github.com/njsmith/numpy/blob/nep-0015-merge-multiarray-umath/doc/neps/nep-0015-merge-multiarray-umath.rst > > ----- > > ============================ > Merging multiarray and umath > ============================ > > :Author: Nathaniel J. 
Smith > :Status: Draft > :Type: Standards Track > :Created: 2018-02-22 > > > Abstract > -------- > > Let's merge ``numpy.core.multiarray`` and ``numpy.core.umath`` into a > single extension module, and deprecate ``np.set_numeric_ops``. > > > Background > ---------- > > Currently, numpy's core C code is split between two separate extension > modules. > > ``numpy.core.multiarray`` is built from > ``numpy/core/src/multiarray/*.c``, and contains the core array > functionality (in particular, the ``ndarray`` object). > > ``numpy.core.umath`` is built from ``numpy/core/src/umath/*.c``, and > contains the ufunc machinery. > > These two modules each expose their own separate C API, accessed via > ``import_multiarray()`` and ``import_umath()`` respectively. The idea > is that they're supposed to be independent modules, with > ``multiarray`` as a lower-level layer with ``umath`` built on top. In > practice this has turned out to be problematic. > > First, the layering isn't perfect: when you write ``ndarray + > ndarray``, this invokes ``ndarray.__add__``, which then calls the > ufunc ``np.add``. This means that ``ndarray`` needs to know about > ufuncs ? so instead of a clean layering, we have a circular > dependency. To solve this, ``multiarray`` exports a somewhat > terrifying function called ``set_numeric_ops``. The bootstrap > procedure each time you ``import numpy`` is: > > 1. ``multiarray`` and its ``ndarray`` object are loaded, but > arithmetic operations on ndarrays are broken. > > 2. ``umath`` is loaded. > > 3. ``set_numeric_ops`` is used to monkeypatch all the methods like > ``ndarray.__add__`` with objects from ``umath``. > > In addition, ``set_numeric_ops`` is exposed as a public API, > ``np.set_numeric_ops``. > > Furthermore, even when this layering does work, it ends up distorting > the shape of our public ABI. In recent years, the most common reason > for adding new functions to ``multiarray``\'s "public" ABI is not that > they really need to be public or that we expect other projects to use > them, but rather just that we need to call them from ``umath``. This > is extremely unfortunate, because it makes our public ABI > unnecessarily large, and since we can never remove things from it then > this creates an ongoing maintenance burden. The way C works, you can > have internal API that's visible to everything inside the same > extension module, or you can have a public API that everyone can use; > you can't have an API that's visible to multiple extension modules > inside numpy, but not to external users. > > We've also increasingly been putting utility code into > ``numpy/core/src/private/``, which now contains a bunch of files which > are ``#include``\d twice, once into ``multiarray`` and once into > ``umath``. This is pretty gross, and is purely a workaround for these > being separate C extensions. > > > Proposed changes > ---------------- > > This NEP proposes three changes: > > 1. We should start building ``numpy/core/src/multiarray/*.c`` and > ``numpy/core/src/umath/*.c`` together into a single extension > module. > > 2. Instead of ``set_numeric_ops``, we should use some new, private API > to set up ``ndarray.__add__`` and friends. > > 3. We should deprecate, and eventually remove, ``np.set_numeric_ops``. > > > Non-proposed changes > -------------------- > > We don't necessarily propose to throw away the distinction between > multiarray/ and umath/ in terms of our source code organization: > internal organization is useful! 
We just want to build them together > into a single extension module. Of course, this does open the door for > potential future refactorings, which we can then evaluate based on > their merits as they come up. > > It also doesn't propose that we break the public C ABI. We should > continue to provide ``import_multiarray()`` and ``import_umath()`` > functions ? it's just that now both ABIs will ultimately be loaded > from the same C library. Due to how ``import_multiarray()`` and > ``import_umath()`` are written, we'll also still need to have modules > called ``numpy.core.multiarray`` and ``numpy.core.umath``, and they'll > need to continue to export ``_ARRAY_API`` and ``_UFUNC_API`` objects ? > but we can make one or both of these modules be tiny shims that simply > re-export the magic API object from where-ever it's actually defined. > (See ``numpy/core/code_generators/generate_{numpy,ufunc}_api.py`` for > details of how these imports work.) > > > Backward compatibility > ---------------------- > > The only compatibility break is the deprecation of ``np.set_numeric_ops``. > > > Alternatives > ------------ > > n/a > > > Discussion > ---------- > > TBD > > > Copyright > --------- > > This document has been placed in the public domain. > > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 8 03:57:38 2018 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 8 Mar 2018 00:57:38 -0800 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: References: Message-ID: On Thu, Mar 8, 2018 at 12:47 AM, Eric Wieser wrote: > This means that ndarray needs to know about ufuncs ? so instead of a clean > layering, we have a circular dependency. > > Perhaps we should split ndarray into a base_ndarray class with no arithmetic > support (add, sum, etc), and then provide an ndarray subclass from umath > instead (either the separate extension, or just a different set of files) This just seems like adding more complexity because we can, though? -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Thu Mar 8 04:33:56 2018 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 8 Mar 2018 01:33:56 -0800 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray Message-ID: Hi all, Here's a more substantive NEP: trying to define how to define a standard way for functions to say that they can accept any "duck array". Biggest open question for me: the name "asabstractarray" kinda sucks (for reasons described in the NEP), and I'd love to have something better. Any ideas? Rendered version: https://github.com/njsmith/numpy/blob/nep-16-abstract-array/doc/neps/nep-0016-abstract-array.rst -n ---- ==================================================== An abstract base class for identifying "duck arrays" ==================================================== :Author: Nathaniel J. Smith :Status: Draft :Type: Standards Track :Created: 2018-03-06 Abstract -------- We propose to add an abstract base class ``AbstractArray`` so that third-party classes can declare their ability to "quack like" an ``ndarray``, and an ``asabstractarray`` function that performs similarly to ``asarray`` except that it passes through ``AbstractArray`` instances unchanged. 
Detailed description -------------------- Many functions, in NumPy and in third-party packages, start with some code like:: def myfunc(a, b): a = np.asarray(a) b = np.asarray(b) ... This ensures that ``a`` and ``b`` are ``np.ndarray`` objects, so ``myfunc`` can carry on assuming that they'll act like ndarrays both semantically (at the Python level), and also in terms of how they're stored in memory (at the C level). But many of these functions only work with arrays at the Python level, which means that they don't actually need ``ndarray`` objects *per se*: they could work just as well with any Python object that "quacks like" an ndarray, such as sparse arrays, dask's lazy arrays, or xarray's labeled arrays. However, currently, there's no way for these libraries to express that their objects can quack like an ndarray, and there's no way for functions like ``myfunc`` to express that they'd be happy with anything that quacks like an ndarray. The purpose of this NEP is to provide those two features. Sometimes people suggest using ``np.asanyarray`` for this purpose, but unfortunately its semantics are exactly backwards: it guarantees that the object it returns uses the same memory layout as an ``ndarray``, but tells you nothing at all about its semantics, which makes it essentially impossible to use safely in practice. Indeed, the two ``ndarray`` subclasses distributed with NumPy ? ``np.matrix`` and ``np.ma.masked_array`` ? do have incompatible semantics, and if they were passed to a function like ``myfunc`` that doesn't check for them as a special-case, then it may silently return incorrect results. Declaring that an object can quack like an array ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There are two basic approaches we could use for checking whether an object quacks like an array. We could check for a special attribute on the class:: def quacks_like_array(obj): return bool(getattr(type(obj), "__quacks_like_array__", False)) Or, we could define an `abstract base class (ABC) `__:: def quacks_like_array(obj): return isinstance(obj, AbstractArray) If you look at how ABCs work, this is essentially equivalent to keeping a global set of types that have been declared to implement the ``AbstractArray`` interface, and then checking it for membership. Between these, the ABC approach seems to have a number of advantages: * It's Python's standard, "one obvious way" of doing this. * ABCs can be introspected (e.g. ``help(np.AbstractArray)`` does something useful). * ABCs can provide useful mixin methods. * ABCs integrate with other features like mypy type-checking, ``functools.singledispatch``, etc. One obvious thing to check is whether this choice affects speed. Using the attached benchmark script on a CPython 3.7 prerelease (revision c4d77a661138d, self-compiled, no PGO), on a Thinkpad T450s running Linux, we find:: np.asarray(ndarray_obj) 330 ns np.asarray([]) 1400 ns Attribute check, success 80 ns Attribute check, failure 80 ns ABC, success via subclass 340 ns ABC, success via register() 700 ns ABC, failure 370 ns Notes: * The first two lines are included to put the other lines in context. * This used 3.7 because both ``getattr`` and ABCs are receiving substantial optimizations in this release, and it's more representative of the long-term future of Python. (Failed ``getattr`` doesn't necessarily construct an exception object anymore, and ABCs were reimplemented in C.) * The "success" lines refer to cases where ``quacks_like_array`` would return True. 
The "failure" lines are cases where it would return False. * The first measurement for ABCs is subclasses defined like:: class MyArray(AbstractArray): ... The second is for subclasses defined like:: class MyArray: ... AbstractArray.register(MyArray) I don't know why there's such a large difference between these. In practice, either way we'd only do the full test after first checking for well-known types like ``ndarray``, ``list``, etc. `This is how NumPy currently checks for other double-underscore attributes `__ and the same idea applies here to either approach. So these numbers won't affect the common case, just the case where we actually have an ``AbstractArray``, or else another third-party object that will end up going through ``__array__`` or ``__array_interface__`` or end up as an object array. So in summary, using an ABC will be slightly slower than using an attribute, but this doesn't affect the most common paths, and the magnitude of slowdown is fairly small (~250 ns on an operation that already takes longer than that). Furthermore, we can potentially optimize this further (e.g. by keeping a tiny LRU cache of types that are known to be AbstractArray subclasses, on the assumption that most code will only use one or two of these types at a time), and it's very unclear that this even matters ? if the speed of ``asarray`` no-op pass-throughs were a bottleneck that showed up in profiles, then probably we would have made them faster already! (It would be trivial to fast-path this, but we don't.) Given the semantic and usability advantages of ABCs, this seems like an acceptable trade-off. .. CPython 3.6 (from Debian):: Attribute check, success 110 ns Attribute check, failure 370 ns ABC, success via subclass 690 ns ABC, success via register() 690 ns ABC, failure 1220 ns Specification of ``asabstractarray`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Given ``AbstractArray``, the definition of ``asabstractarray`` is simple:: def asabstractarray(a, dtype=None): if isinstance(a, AbstractArray): if dtype is not None and dtype != a.dtype: return a.astype(dtype) return a return asarray(a, dtype=dtype) Things to note: * ``asarray`` also accepts an ``order=`` argument, but we don't include that here because it's about details of memory representation, and the whole point of this function is that you use it to declare that you don't care about details of memory representation. * Using the ``astype`` method allows the ``a`` object to decide how to implement casting for its particular type. * For strict compatibility with ``asarray``, we skip calling ``astype`` when the dtype is already correct. Compare:: >>> a = np.arange(10) # astype() always returns a view: >>> a.astype(a.dtype) is a False # asarray() returns the original object if possible: >>> np.asarray(a, dtype=a.dtype) is a True What exactly are you promising if you inherit from ``AbstractArray``? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This will presumably be refined over time. The ideal of course is that your class should be indistinguishable from a real ``ndarray``, but nothing enforces that except the expectations of users. In practice, declaring that your class implements the ``AbstractArray`` interface simply means that it will start passing through ``asabstractarray``, and so by subclassing it you're saying that if some code works for ``ndarray``\s but breaks for your class, then you're willing to accept bug reports on that. 
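For illustration, the opt-in mechanics would look something like this (a self-contained sketch using a stand-in ABC, since ``AbstractArray`` and ``asabstractarray`` do not exist yet; the simplified ``asabstractarray`` here just restates the definition given above)::

    from abc import ABC
    import numpy as np

    class AbstractArray(ABC):            # stand-in for the proposed np.AbstractArray
        pass

    def asabstractarray(a, dtype=None):  # simplified restatement of the spec above
        if isinstance(a, AbstractArray):
            if dtype is not None and dtype != a.dtype:
                return a.astype(dtype)
            return a
        return np.asarray(a, dtype=dtype)

    class MyDuckArray(AbstractArray):    # third parties opt in by subclassing...
        def __init__(self, data):
            self._data = np.asarray(data)
        @property
        def dtype(self):
            return self._data.dtype
        def astype(self, dtype):
            return MyDuckArray(self._data.astype(dtype))

    # ...or by registering an existing class: AbstractArray.register(SomeClass)

    a = MyDuckArray([1, 2, 3])
    assert asabstractarray(a) is a                          # passes through unchanged
    assert isinstance(asabstractarray([1, 2]), np.ndarray)  # everything else is coerced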
To start with, we should declare ``__array_ufunc__`` to be an abstract method, and add the ``NDArrayOperatorsMixin`` methods as mixin methods. Declaring ``astype`` as an ``@abstractmethod`` probably makes sense as well, since it's used by ``asabstractarray``. We might also want to go ahead and add some basic attributes like ``ndim``, ``shape``, ``dtype``. Adding new abstract methods will be a bit tricky, because ABCs enforce these at subclass time; therefore, simply adding a new `@abstractmethod` will be a backwards compatibility break. If this becomes a problem then we can use some hacks to implement an `@upcoming_abstractmethod` decorator that only issues a warning if the method is missing, and treat it like a regular deprecation cycle. (In this case, the thing we'd be deprecating is "support for abstract arrays that are missing feature X".) Naming ~~~~~~ The name of the ABC doesn't matter too much, because it will only be referenced rarely and in relatively specialized situations. The name of the function matters a lot, because most existing instances of ``asarray`` should be replaced by this, and in the future it's what everyone should be reaching for by default unless they have a specific reason to use ``asarray`` instead. This suggests that its name really should be *shorter* and *more memorable* than ``asarray``... which is difficult. I've used ``asabstractarray`` in this draft, but I'm not really happy with it, because it's too long and people are unlikely to start using it by habit without endless exhortations. One option would be to actually change ``asarray``\'s semantics so that *it* passes through ``AbstractArray`` objects unchanged. But I'm worried that there may be a lot of code out there that calls ``asarray`` and then passes the result into some C function that doesn't do any further type checking (because it knows that its caller has already used ``asarray``). If we allow ``asarray`` to return ``AbstractArray`` objects, and then someone calls one of these C wrappers and passes it an ``AbstractArray`` object like a sparse array, then they'll get a segfault. Right now, in the same situation, ``asarray`` will instead invoke the object's ``__array__`` method, or use the buffer interface to make a view, or pass through an array with object dtype, or raise an error, or similar. Probably none of these outcomes are actually desirable in most cases, so maybe making it a segfault instead would be OK? But it's dangerous given that we don't know how common such code is. OTOH, if we were starting from scratch then this would probably be the ideal solution. We can't use ``asanyarray`` or ``array``, since those are already taken. Any other ideas? ``np.cast``, ``np.coerce``? Implementation -------------- 1. Rename ``NDArrayOperatorsMixin`` to ``AbstractArray`` (leaving behind an alias for backwards compatibility) and make it an ABC. 2. Add ``asabstractarray`` (or whatever we end up calling it), and probably a C API equivalent. 3. Begin migrating NumPy internal functions to using ``asabstractarray`` where appropriate. Backward compatibility ---------------------- This is purely a new feature, so there are no compatibility issues. (Unless we decide to change the semantics of ``asarray`` itself.) Rejected alternatives --------------------- One suggestion that has come up is to define multiple abstract classes for different subsets of the array interface.
Nothing in this proposal stops either NumPy or third-parties from doing this in the future, but it's very difficult to guess ahead of time which subsets would be useful. Also, "the full ndarray interface" is something that existing libraries are written to expect (because they work with actual ndarrays) and test (because they test with actual ndarrays), so it's by far the easiest place to start. Links to discussion ------------------- TBD Appendix: Benchmark script -------------------------- .. literal-include:: nep-0016-benchmark.py Copyright --------- This document has been placed in the public domain. -n -- Nathaniel J. Smith -- https://vorpus.org From gregor.thalhammer at gmail.com Thu Mar 8 04:52:15 2018 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Thu, 8 Mar 2018 10:52:15 +0100 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: References: Message-ID: <23471BD4-A81B-4B9C-AECC-D161C3643B81@gmail.com> Hi, long time ago I wrote a wrapper to to use optimised and parallelized math functions from Intels vector math library geggo/uvml: Provide vectorized math function (MKL) for numpy I found it useful to inject (some of) the fast methods into numpy via np.set_num_ops(), to gain more performance without changing my programs. While this original project is outdated, I can imagine that a centralised way to swap the implementation of math functions is useful. Therefor I suggest to keep np.set_num_ops(), but admittedly I do not understand all the technical implications of the proposed change. best Gregor -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Thu Mar 8 10:06:23 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 8 Mar 2018 10:06:23 -0500 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: Message-ID: Hi Nathaniel, Overall, hugely in favour! For detailed comments, it would be good to have a link to a PR; could you put that up? A larger comment: you state that you think `np.asanyarray` is a mistake since `np.matrix` and `np.ma.MaskedArray` would pass through and that those do not strictly mimic `NDArray`. Here, I agree with `matrix` (but since we're deprecating it, let's remove that from the discussion), but I do not see how your proposed interface would not let `MaskedArray` pass through, nor really that one would necessarily want that. I think it may be good to distinguish two separate cases: 1. Everything has exactly the same meaning as for `ndarray` but the data is stored differently (i.e., only `view` does not work). One can thus expect that for `output = function(inputs)`, at the end all `duck_output == ndarray_output`. 2. Everything is implemented but operations may give different output (depending on masks for masked arrays, units for quantities, etc.), so generally `duck_output != ndarray_output`. Which one of these are you aiming at? By including `NDArrayOperatorsMixin`, it would seem option (2), but perhaps not? Is there a case for both separately? Smaller general comment: at least in the NEP I would not worry about deprecating `NDArrayOperatorsMixin` - this may well be handy in itself (for things that implement `__array_ufunc__` but do not have shape, etc. (I have been doing some work on creating ufunc chains that would use this -- but they definitely are not array-like). Similarly, I think there is room for an `NDArrayShapeMixin` which might help with `concatenate` and friends. 
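For concreteness, the existing mixin is typically used along these lines (a minimal runnable sketch; the wrapper class is purely illustrative and not part of any proposal):

    import numpy as np
    from numpy.lib.mixins import NDArrayOperatorsMixin

    class Wrapped(NDArrayOperatorsMixin):
        # Toy wrapper: all arithmetic operators are supplied by the mixin,
        # which routes them through __array_ufunc__. Note there is no shape.
        def __init__(self, data):
            self.data = np.asarray(data)

        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
            # unwrap Wrapped inputs, let the ufunc do the work, re-wrap the result
            unwrapped = [x.data if isinstance(x, Wrapped) else x for x in inputs]
            return Wrapped(getattr(ufunc, method)(*unwrapped, **kwargs))

    w = Wrapped([1.0, 2.0, 3.0])
    print((w + 1).data)    # the mixin turns "+" into np.add, which dispatches here
    print(np.exp(w).data)  # ufuncs called directly take the same path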
Finally, on the name: `asarray` and `asanyarray` are just shims over `array`, so one option would be to add an argument in `array` (or broaden the scope of `subok`). As an explicit suggestion, one could introduce a `duck` or `abstract` argument to `array` which is used in `asarray` and `asanyarray` as well (corresponding to options 1 and 2), and eventually default to something sensible (I would think `False` for `asarray` and `True` for `asanyarray`). All the best, Marten From charlesr.harris at gmail.com Thu Mar 8 11:20:08 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 8 Mar 2018 09:20:08 -0700 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: <23471BD4-A81B-4B9C-AECC-D161C3643B81@gmail.com> References: <23471BD4-A81B-4B9C-AECC-D161C3643B81@gmail.com> Message-ID: On Thu, Mar 8, 2018 at 2:52 AM, Gregor Thalhammer < gregor.thalhammer at gmail.com> wrote: > > Hi, > > long time ago I wrote a wrapper to to use optimised and parallelized math > functions from Intels vector math library > geggo/uvml: Provide vectorized math function (MKL) for numpy > > > I found it useful to inject (some of) the fast methods into numpy via > np.set_num_ops(), to gain more performance without changing my programs. > I think that was much of the original motivation for `set_num_ops` back in the Numeric days, where there was little commonality among platforms and getting hold of optimized libraries was very much an individual thing. The former cblas module, now merged with multiarray, was present for the same reasons. > > While this original project is outdated, I can imagine that a centralised > way to swap the implementation of math functions is useful. Therefor I > suggest to keep np.set_num_ops(), but admittedly I do not understand all > the technical implications of the proposed change. > I suppose we could set it up to detect and use an external library during compilation. The CBLAS implementations currently do that and should pick up the MKL version when available. Where are the MKL functions you used presented? That is an admittedly lower level interface, however. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Thu Mar 8 11:27:40 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 8 Mar 2018 11:27:40 -0500 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: <23471BD4-A81B-4B9C-AECC-D161C3643B81@gmail.com> References: <23471BD4-A81B-4B9C-AECC-D161C3643B81@gmail.com> Message-ID: On Thu, Mar 8, 2018 at 4:52 AM, Gregor Thalhammer wrote: > > Hi, > > long time ago I wrote a wrapper to to use optimised and parallelized math > functions from Intels vector math library > geggo/uvml: Provide vectorized math function (MKL) for numpy > > I found it useful to inject (some of) the fast methods into numpy via > np.set_num_ops(), to gain more performance without changing my programs. > > While this original project is outdated, I can imagine that a centralised > way to swap the implementation of math functions is useful. Therefor I > suggest to keep np.set_num_ops(), but admittedly I do not understand all the > technical implications of the proposed change. There may still be a case for being able to swap out the functions that do the actual work, i.e., the parts of the ufuncs that are called once any conversion to ndarray has been done. 
-- Marten From charlesr.harris at gmail.com Thu Mar 8 11:30:22 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 8 Mar 2018 09:30:22 -0700 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: References: <23471BD4-A81B-4B9C-AECC-D161C3643B81@gmail.com> Message-ID: On Thu, Mar 8, 2018 at 9:20 AM, Charles R Harris wrote: > > > On Thu, Mar 8, 2018 at 2:52 AM, Gregor Thalhammer < > gregor.thalhammer at gmail.com> wrote: > >> >> Hi, >> >> long time ago I wrote a wrapper to to use optimised and parallelized math >> functions from Intels vector math library >> geggo/uvml: Provide vectorized math function (MKL) for numpy >> >> >> I found it useful to inject (some of) the fast methods into numpy via >> np.set_num_ops(), to gain more performance without changing my programs. >> > > I think that was much of the original motivation for `set_num_ops` back in > the Numeric days, where there was little commonality among platforms and > getting hold of optimized libraries was very much an individual thing. The > former cblas module, now merged with multiarray, was present for the same > reasons. > > >> >> While this original project is outdated, I can imagine that a centralised >> way to swap the implementation of math functions is useful. Therefor I >> suggest to keep np.set_num_ops(), but admittedly I do not understand all >> the technical implications of the proposed change. >> > > I suppose we could set it up to detect and use an external library during > compilation. The CBLAS implementations currently do that and should pick up > the MKL version when available. Where are the MKL functions you used > presented? That is an admittedly lower level interface, however. > > Note that Intel is also working to support NumPy and intends to use the Intel optimizations as part of that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Thu Mar 8 11:34:34 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 8 Mar 2018 11:34:34 -0500 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: References: Message-ID: I think part of the problem is that ufuncs actually have two parts: a generic interface, which turns all its arguments into ndarray (or calls `__array_ufunc__`) and an ndarray-specific implementation of the given function (partially, just the iterator, partially the inner loop). The latter could logically be moved to `ndarray.__array_ufunc__` (and thus to `multiarray`). In that case, `umath` would hardly depend on `multiarray` any more. But perhaps this is a bit besides the point: building the two at the same time would go a long way to making it easier to do a move like the above. -- Marten From shoyer at gmail.com Thu Mar 8 13:56:19 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 08 Mar 2018 18:56:19 +0000 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: Message-ID: Hi Nathaniel, Thanks for starting the discussion! Like Marten says, I think it would be useful to more clearly define what it means to be an abstract array. ndarray has lots of methods/properties that expose internal implementation (e.g., view, strides) that presumably we don't want to require as part of this interfaces. On the other hand, dtype and shape are almost assuredly part of this interface. 
To help guide the discussion, it would be good to identify concrete examples of types that should and should not satisfy this interface, e.g., Marten's case 1: works exactly like ndarray, but stores data differently: parallel arrays (e.g., dask.array), sparse arrays (e.g., https://github.com/pydata/sparse), hypothetical non-strided arrays (e.g., always C ordered). Marten's case 2: same methods as ndarray, but gives different results: np.ma.MaskedArray, arrays with units (quantities), maybe labeled arrays like xarray.DataArray I don't think we have a hope of making a single base class for case 2 work with everything in NumPy, but we can define interfaces with different levels of functionality. Because there is such a gradation of "duck array" types, I agree with Marten that we should not deprecate NDArrayOperatorsMixin. It's useful for types like xarray.Dataset that define __array_ufunc__ but cannot satisfy the full abstract array interface. Finally for the name, what about `asduckarray`? Thought perhaps that could be a source of confusion, and given the gradation of duck array like types. Cheers, Stephan On Thu, Mar 8, 2018 at 7:07 AM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > Hi Nathaniel, > > Overall, hugely in favour! For detailed comments, it would be good to > have a link to a PR; could you put that up? > > A larger comment: you state that you think `np.asanyarray` is a > mistake since `np.matrix` and `np.ma.MaskedArray` would pass through > and that those do not strictly mimic `NDArray`. Here, I agree with > `matrix` (but since we're deprecating it, let's remove that from the > discussion), but I do not see how your proposed interface would not > let `MaskedArray` pass through, nor really that one would necessarily > want that. > > I think it may be good to distinguish two separate cases: > 1. Everything has exactly the same meaning as for `ndarray` but the > data is stored differently (i.e., only `view` does not work). One can > thus expect that for `output = function(inputs)`, at the end all > `duck_output == ndarray_output`. > 2. Everything is implemented but operations may give different output > (depending on masks for masked arrays, units for quantities, etc.), so > generally `duck_output != ndarray_output`. > > Which one of these are you aiming at? By including > `NDArrayOperatorsMixin`, it would seem option (2), but perhaps not? Is > there a case for both separately? > > Smaller general comment: at least in the NEP I would not worry about > deprecating `NDArrayOperatorsMixin` - this may well be handy in itself > (for things that implement `__array_ufunc__` but do not have shape, > etc. (I have been doing some work on creating ufunc chains that would > use this -- but they definitely are not array-like). Similarly, I > think there is room for an `NDArrayShapeMixin` which might help with > `concatenate` and friends. > > Finally, on the name: `asarray` and `asanyarray` are just shims over > `array`, so one option would be to add an argument in `array` (or > broaden the scope of `subok`). > > As an explicit suggestion, one could introduce a `duck` or `abstract` > argument to `array` which is used in `asarray` and `asanyarray` as > well (corresponding to options 1 and 2), and eventually default to > something sensible (I would think `False` for `asarray` and `True` for > `asanyarray`). 
> > All the best, > > Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marko.asplund at gmail.com Thu Mar 8 15:44:39 2018 From: marko.asplund at gmail.com (Marko Asplund) Date: Thu, 8 Mar 2018 22:44:39 +0200 Subject: [Numpy-discussion] numpy.random.randn Message-ID: On Wed, 7 Mar 2018 13:14:36, Robert Kern wrote: > > With NumPy I'm simply using the following random initialization code: > > > > np.random.randn(n_h, n_x) * 0.01 > > > > I'm trying to emulate the same behaviour in my Scala code by sampling > from a > > Gaussian distribution with mean = 0 and std dev = 1. > `np.random.randn(n_h, n_x) * 0.01` gives a Gaussian distribution of mean=0 > and stdev=0.01 Sorry for being a bit inaccurate. My Scala code actually mirrors the NumPy based random initialization, so I sample with Gaussian of mean = 0 and std dev = 1, then multiply with 0.01. Despite the extra step the result should be the same as with the NumPy code above. Is there anything else that could be different with the random initialization methods? marko -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Mar 8 18:55:10 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 8 Mar 2018 16:55:10 -0700 Subject: [Numpy-discussion] NumPy 1.14.2 release Message-ID: Hi All, I'm looking to make a NumPy release soonish, possibly at the beginning of next week. The only change planned is a fix for the printing problem that the astropy folks reported. The fix for that problem is also in master, so if you test against master you should be able to check if the fix works for you. If you have experienced any other problems with 1.14.1, please report them, and also mention them here so that they don't fall through the cracks. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 8 20:06:52 2018 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 8 Mar 2018 17:06:52 -0800 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: <23471BD4-A81B-4B9C-AECC-D161C3643B81@gmail.com> References: <23471BD4-A81B-4B9C-AECC-D161C3643B81@gmail.com> Message-ID: On Thu, Mar 8, 2018 at 1:52 AM, Gregor Thalhammer wrote: > > Hi, > > long time ago I wrote a wrapper to to use optimised and parallelized math > functions from Intels vector math library > geggo/uvml: Provide vectorized math function (MKL) for numpy > > I found it useful to inject (some of) the fast methods into numpy via > np.set_num_ops(), to gain more performance without changing my programs. > > While this original project is outdated, I can imagine that a centralised > way to swap the implementation of math functions is useful. Therefor I > suggest to keep np.set_num_ops(), but admittedly I do not understand all the > technical implications of the proposed change. The main part of the proposal is to merge the two libraries; the question of whether to deprecate set_numeric_ops is a bit separate. There's no technical obstacle to keeping it, except the usual issue of having more cruft to maintain :-). It's usually true that any monkeypatching interface will be useful to someone under some circumstances, but we usually don't consider this a good enough reason on its own to add and maintain these kinds of interfaces.
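For concreteness, the kind of swap being discussed looks roughly like this (a sketch only; my_fast_add is a hypothetical drop-in replacement with the same calling convention as np.add):

    import numpy as np

    def my_fast_add(a, b, *args, **kwargs):
        # hypothetical optimized implementation; must behave exactly like np.add
        return np.add(a, b, *args, **kwargs)

    old_ops = np.set_numeric_ops(add=my_fast_add)  # ndarray.__add__ now calls my_fast_add
    # ... run the workload that benefits from the faster routine ...
    np.set_numeric_ops(**old_ops)                  # restore the original ufuncs

Note the patch is global: it changes the behaviour of every ndarray in the process.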
And an unfortunate side-effect of these kinds of hacky interfaces is that they can end up removing the pressure to solve problems properly. In this case, better solutions would include: - Adding support for accelerated vector math libraries to NumPy directly (e.g. MKL, yeppp) - Overriding the inner loops inside ufuncs like numpy.add that np.ndarray.__add__ ultimately calls. This would speed up all addition (whether or not it uses Python + syntax), would be a more general solution (e.g. you could monkeypatch np.exp to use MKL's fast vectorized exp), would let you skip reimplementing all the tricky shared bits of the ufunc logic, etc. Conceptually it's not even very hacky, because we allow you add new loops to existing ufuncs; making it possible to replace existing loops wouldn't be a big stretch. (In fact it's possible that we already allow this; I haven't checked.) So I still lean towards deprecating set_numeric_ops. It's not the most crucial part of the proposal though; if it turns out to be too controversial then I'll take it out. -n -- Nathaniel J. Smith -- https://vorpus.org From jni.soma at gmail.com Thu Mar 8 20:51:56 2018 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Fri, 09 Mar 2018 12:51:56 +1100 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: Message-ID: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> On Fri, Mar 9, 2018, at 5:56 AM, Stephan Hoyer wrote: > Marten's case 1: works exactly like ndarray, but stores data > differently: parallel arrays (e.g., dask.array), sparse arrays (e.g., > https://github.com/pydata/sparse), hypothetical non-strided arrays > (e.g., always C ordered). Two other "hypotheticals" that would fit nicely in this space: - the Open Connectome folks (https://neurodata.io) proposed linearising indices using space-filling curves, which minimizes cache misses (or IO reads) for giant volumes. I believe they implemented this but can't find it currently.- the N5 format for chunked arrays on disk: https://github.com/saalfeldlab/n5 > Finally for the name, what about `asduckarray`? Thought perhaps that > could be a source of confusion, and given the gradation of duck array > like types. I suggest that the name should *not* use programmer lingo, so neither "abstract" nor "duck" should be in there. My humble proposal is "arraylike". (I know that this term has included things like "list-of- list" before but only in text, not code, as far as I know.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 8 23:22:29 2018 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 8 Mar 2018 20:22:29 -0800 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) Message-ID: On Thu, Mar 8, 2018 at 7:06 AM, Marten van Kerkwijk wrote: > Hi Nathaniel, > > Overall, hugely in favour! For detailed comments, it would be good to > have a link to a PR; could you put that up? Well, there's a PR here: https://github.com/numpy/numpy/pull/10706 But, this raises a question :-). (One which also came up here: https://github.com/numpy/numpy/pull/10704#issuecomment-371684170) There are sensible two workflows we could use (or at least, two that I can think of): 1. We merge updates to the NEPs as we go, so that whatever's in the repo is the current draft. Anyone can go to the NEP webpage at http://numpy.org/neps (WIP, see #10702) to see the latest version of all NEPs, whether accepted, rejected, or in progress. 
Discussion happens on the mailing list, and line-by-line feedback can be done by quote-replying and commenting on individual lines. From time to time, the NEP author takes all the accumulated feedback, updates the document, and makes a new post to the list to let people know about the updated version. This is how python-dev handles PEPs. 2. We use Github itself to manage the review. The repo only contains "accepted" NEPs; draft NEPs are represented by open PRs, and rejected NEPs are represented by PRs that were closed-without-merging. Discussion uses Github's commenting/review tools, and happens in the PR itself. This is roughly how Rust handles their RFC process, for example: https://github.com/rust-lang/rfcs Trying to do some hybrid version of these seems like it would be pretty painful, so we should pick one. Given that historically we've tried to use the mailing list for substantive features/planning discussions, and that our NEP process has been much closer to workflow 1 than workflow 2 (e.g., there are already a bunch of old NEPs already in the repo that are effectively rejected/withdrawn), I think we should maybe continue that way, and keep discussions here? So my suggestion is discussion should happen on the list, and NEP updates should be merged promptly, or just self-merged. Sound good? -n -- Nathaniel J. Smith -- https://vorpus.org From shoyer at gmail.com Fri Mar 9 00:45:35 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 09 Mar 2018 05:45:35 +0000 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> References: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> Message-ID: On Thu, Mar 8, 2018 at 5:54 PM Juan Nunez-Iglesias wrote: > On Fri, Mar 9, 2018, at 5:56 AM, Stephan Hoyer wrote: > > Marten's case 1: works exactly like ndarray, but stores data differently: > parallel arrays (e.g., dask.array), sparse arrays (e.g., > https://github.com/pydata/sparse), hypothetical non-strided arrays (e.g., > always C ordered). > > > Two other "hypotheticals" that would fit nicely in this space: > - the Open Connectome folks (https://neurodata.io) proposed linearising > indices using space-filling curves, which minimizes cache misses (or IO > reads) for giant volumes. I believe they implemented this but can't find it > currently. > - the N5 format for chunked arrays on disk: > https://github.com/saalfeldlab/n5 > I think these fall into another important category of duck arrays. "Indexable" arrays the serve as storage, but that don't support computation. These sorts of arrays typically support operations like indexing and define handful of array-like properties (e.g., dtype and shape), but not arithmetic, reductions or reshaping. This means you can't quite use them as a drop-in replacement for NumPy arrays in all cases, but that's OK. In contrast, both dask.array and sparse do aspire to do fill out nearly the full numpy.ndarray API. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Mar 9 01:26:46 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 8 Mar 2018 22:26:46 -0800 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: References: Message-ID: On Thu, Mar 8, 2018 at 8:22 PM, Nathaniel Smith wrote: > On Thu, Mar 8, 2018 at 7:06 AM, Marten van Kerkwijk > wrote: > > Hi Nathaniel, > > > > Overall, hugely in favour! 
For detailed comments, it would be good to > > have a link to a PR; could you put that up? > > Well, there's a PR here: https://github.com/numpy/numpy/pull/10706 > > But, this raises a question :-). (One which also came up here: > https://github.com/numpy/numpy/pull/10704#issuecomment-371684170) > > There are sensible two workflows we could use (or at least, two that I > can think of): > > 1. We merge updates to the NEPs as we go, so that whatever's in the > repo is the current draft. Anyone can go to the NEP webpage at > http://numpy.org/neps (WIP, see #10702) to see the latest version of > all NEPs, whether accepted, rejected, or in progress. Discussion > happens on the mailing list, and line-by-line feedback can be done by > quote-replying and commenting on individual lines. From time to time, > the NEP author takes all the accumulated feedback, updates the > document, and makes a new post to the list to let people know about > the updated version. > > This is how python-dev handles PEPs. > > 2. We use Github itself to manage the review. The repo only contains > "accepted" NEPs; draft NEPs are represented by open PRs, and rejected > NEPs are represented by PRs that were closed-without-merging. > Discussion uses Github's commenting/review tools, and happens in the > PR itself. > > This is roughly how Rust handles their RFC process, for example: > https://github.com/rust-lang/rfcs > > Trying to do some hybrid version of these seems like it would be > pretty painful, so we should pick one. > > Given that historically we've tried to use the mailing list for > substantive features/planning discussions, and that our NEP process > has been much closer to workflow 1 than workflow 2 (e.g., there are > already a bunch of old NEPs already in the repo that are effectively > rejected/withdrawn), I think we should maybe continue that way, and > keep discussions here? > > So my suggestion is discussion should happen on the list, and NEP > updates should be merged promptly, or just self-merged. Sound good? Agreed that overall (1) is better than (2), rejected NEPs should be visible. However there's no need for super-quick self-merge, and I think it would be counter-productive. Instead, just send a PR, leave it open for some discussion, and update for detailed comments (as well as long in-depth discussions that only a couple of people care about) in the Github UI and major ones on the list. Once it's stabilized a bit, then merge with status "Draft" and update once in a while. I think this is also much more in like with what python-dev does, I have seen substantial discussion on Github and have not seen quick self-merges. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Fri Mar 9 02:21:01 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Thu, 8 Mar 2018 23:21:01 -0800 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> References: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> Message-ID: Not that I?m against different ?levels? of ndarray granularity, but I just don?t want it to introduce complexity for the end-user. For example, it would be unreasonable to expect the end-user to check for all parts of the interface that they need support for separately. 
Keeping this in view; different levels only make sense if and only if they are strict sub/supersets of each other, so the user can just check for the highest level of compatibility they require, but even then they would need to learn about the different ?levels". PS, thanks for putting this together! I was thinking of doing it this weekend but you beat me to it and covered aspects I wouldn?t have thought of. The name ?asarraylike? appeals to me, as does a ?custom=? kwarg for asanyarray. Sent from Astro for Mac On Mar 9, 2018 at 02:51, Juan Nunez-Iglesias wrote: On Fri, Mar 9, 2018, at 5:56 AM, Stephan Hoyer wrote: Marten's case 1: works exactly like ndarray, but stores data differently: parallel arrays (e.g., dask.array), sparse arrays (e.g., https://github.com/pydata/sparse), hypothetical non-strided arrays (e.g., always C ordered). Two other "hypotheticals" that would fit nicely in this space: - the Open Connectome folks (https://neurodata.io) proposed linearising indices using space-filling curves, which minimizes cache misses (or IO reads) for giant volumes. I believe they implemented this but can't find it currently. - the N5 format for chunked arrays on disk: https://github.com/saalfeldlab/n5 Finally for the name, what about `asduckarray`? Thought perhaps that could be a source of confusion, and given the gradation of duck array like types. I suggest that the name should *not* use programmer lingo, so neither "abstract" nor "duck" should be in there. My humble proposal is "arraylike". (I know that this term has included things like "list-of-list" before but only in text, not code, as far as I know.) _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Fri Mar 9 02:23:37 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Thu, 8 Mar 2018 23:23:37 -0800 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: References: Message-ID: <20180309072337.slrjkru657dpbmuo@carbo> On Thu, 08 Mar 2018 20:22:29 -0800, Nathaniel Smith wrote: > 1. We merge updates to the NEPs as we go, so that whatever's in the > repo is the current draft. Anyone can go to the NEP webpage at > http://numpy.org/neps (WIP, see #10702) to see the latest version of > all NEPs, whether accepted, rejected, or in progress. If we go this route, it may also be useful to give some more guidance on how complete we expect a first draft of a NEP to be before it is submitted as a PR. We currently only have: """ The NEP champion (a.k.a. Author) should first attempt to ascertain whether the idea is suitable for a NEP. Posting to the numpy-discussion mailing list is the best way to go about doing this. Following a discussion on the mailing list, the proposal should be submitted as a draft NEP via a GitHub pull request to the doc/neps directory [...] """ Best regards St?fan From matti.picus at gmail.com Fri Mar 9 02:49:27 2018 From: matti.picus at gmail.com (Matti Picus) Date: Fri, 9 Mar 2018 09:49:27 +0200 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From njs at pobox.com Fri Mar 9 03:00:47 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Mar 2018 00:00:47 -0800 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: References: Message-ID: On Thu, Mar 8, 2018 at 10:26 PM, Ralf Gommers wrote: > > > On Thu, Mar 8, 2018 at 8:22 PM, Nathaniel Smith wrote: >> >> On Thu, Mar 8, 2018 at 7:06 AM, Marten van Kerkwijk >> wrote: >> > Hi Nathaniel, >> > >> > Overall, hugely in favour! For detailed comments, it would be good to >> > have a link to a PR; could you put that up? >> >> Well, there's a PR here: https://github.com/numpy/numpy/pull/10706 >> >> But, this raises a question :-). (One which also came up here: >> https://github.com/numpy/numpy/pull/10704#issuecomment-371684170) >> >> There are sensible two workflows we could use (or at least, two that I >> can think of): >> >> 1. We merge updates to the NEPs as we go, so that whatever's in the >> repo is the current draft. Anyone can go to the NEP webpage at >> http://numpy.org/neps (WIP, see #10702) to see the latest version of >> all NEPs, whether accepted, rejected, or in progress. Discussion >> happens on the mailing list, and line-by-line feedback can be done by >> quote-replying and commenting on individual lines. From time to time, >> the NEP author takes all the accumulated feedback, updates the >> document, and makes a new post to the list to let people know about >> the updated version. >> >> This is how python-dev handles PEPs. >> >> 2. We use Github itself to manage the review. The repo only contains >> "accepted" NEPs; draft NEPs are represented by open PRs, and rejected >> NEPs are represented by PRs that were closed-without-merging. >> Discussion uses Github's commenting/review tools, and happens in the >> PR itself. >> >> This is roughly how Rust handles their RFC process, for example: >> https://github.com/rust-lang/rfcs >> >> Trying to do some hybrid version of these seems like it would be >> pretty painful, so we should pick one. >> >> Given that historically we've tried to use the mailing list for >> substantive features/planning discussions, and that our NEP process >> has been much closer to workflow 1 than workflow 2 (e.g., there are >> already a bunch of old NEPs already in the repo that are effectively >> rejected/withdrawn), I think we should maybe continue that way, and >> keep discussions here? >> >> So my suggestion is discussion should happen on the list, and NEP >> updates should be merged promptly, or just self-merged. Sound good? > > > Agreed that overall (1) is better than (2), rejected NEPs should be visible. > However there's no need for super-quick self-merge, and I think it would be > counter-productive. > > Instead, just send a PR, leave it open for some discussion, and update for > detailed comments (as well as long in-depth discussions that only a couple > of people care about) in the Github UI and major ones on the list. Once it's > stabilized a bit, then merge with status "Draft" and update once in a while. > I think this is also much more in like with what python-dev does, I have > seen substantial discussion on Github and have not seen quick self-merges. Not sure what you mean about python-dev. Are you looking at the peps repository? https://github.com/python/peps >From a quick skim, it looks like of the last 37 commits, only 8 came in through PRs and the other 29 were pushed directly by committers without any review. 
3 of the 8 PRs were self-merged immediately after submission, and of the remaining 5 PRs, 4 of them were from external contributors who didn't have commit rights, and the 1 other was a fix to the repo README, rather than an actual PEP change. I don't think I've ever seen any kind of substantive discussion in that repo -- any discussion is mostly restricted to helping new contributors with procedural stuff, maybe formatting issues or fixes to the PEP tooling. Anyway, just because python-dev does it that way doesn't mean that we have to too. But if we split discussions between GH and the mailing list, then we're definitely going to end up discussing substantive issues there (how do we know which discussions only a couple of people care about?), and trying to juggle that seems confusing to me, plus makes it harder to track down what happened later, after we've had multiple PRs each with their own comments... -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Fri Mar 9 04:29:17 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Mar 2018 01:29:17 -0800 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: Message-ID: On Thu, Mar 8, 2018 at 7:06 AM, Marten van Kerkwijk wrote: > A larger comment: you state that you think `np.asanyarray` is a > mistake since `np.matrix` and `np.ma.MaskedArray` would pass through > and that those do not strictly mimic `NDArray`. Here, I agree with > `matrix` (but since we're deprecating it, let's remove that from the > discussion), but I do not see how your proposed interface would not > let `MaskedArray` pass through, nor really that one would necessarily > want that. We can discuss whether MaskedArray should be an AbstractArray. Conceptually it probably should be; I think that was a goal of the MaskedArray authors (even if they wouldn't have put it that way). In practice there are a lot of funny quirks in MaskedArray, so I'd want to look more carefully in case there are weird incompatibilities that would cause problems. Note that we can figure this out after the NEP is finished, too. I wonder if the matplotlib folks have any thoughts on this? I know they're one of the more prominent libraries that tries to handle both regular and masked arrays, so maybe they could comment on how often they run > I think it may be good to distinguish two separate cases: > 1. Everything has exactly the same meaning as for `ndarray` but the > data is stored differently (i.e., only `view` does not work). One can > thus expect that for `output = function(inputs)`, at the end all > `duck_output == ndarray_output`. > 2. Everything is implemented but operations may give different output > (depending on masks for masked arrays, units for quantities, etc.), so > generally `duck_output != ndarray_output`. > > Which one of these are you aiming at? By including > `NDArrayOperatorsMixin`, it would seem option (2), but perhaps not? Is > there a case for both separately? Well, (1) is much easier to design around, because it's well-defined :-). And I'm not sure that there's a principled difference between regular arrays and masked arrays/quantity arrays; these *could* be ndarray objects with special dtypes and extra methods, neither of which would disqualify you from being a "case 1" array. (I guess one issue is that because MaskedArray ignores the mask by default, you could get weird results from things like mean calculations: np.sum(masked_arr) / np.prod(masked_arr.shape) does not give the right result. 
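A small example of the mismatch meant here (a sketch, with arbitrary values):

```
import numpy as np

masked_arr = np.ma.masked_array([1.0, 2.0, 3.0, 4.0],
                                mask=[False, False, True, True])

masked_arr.mean()                               # 1.5: averages the unmasked entries
np.sum(masked_arr) / np.prod(masked_arr.shape)  # 0.75: the sum skips masked values,
                                                # but we still divide by 4
```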
This isn't an issue for quantities, though, or for an R-style NA that propagated by default.) > Smaller general comment: at least in the NEP I would not worry about > deprecating `NDArrayOperatorsMixin` - this may well be handy in itself > (for things that implement `__array_ufunc__` but do not have shape, > etc. (I have been doing some work on creating ufunc chains that would > use this -- but they definitely are not array-like). Similarly, I > think there is room for an `NDArrayShapeMixin` which might help with > `concatenate` and friends. Fair enough. > Finally, on the name: `asarray` and `asanyarray` are just shims over > `array`, so one option would be to add an argument in `array` (or > broaden the scope of `subok`). We definitely don't want to broaden the scope of 'subok', because one of the goals here is to have something that projects like sklearn can use, and they won't use subok :-). (In particular, np.matrix is definitely not a duck array of any kind.) And supporting array() is tricky, because then you have to figure out what to do with the copy=, order=, subok=, ndmin= arguments. copy= in particular is tricky given that we don't know the object's type! I guess we could call obj.copy() or something... but for this first iteration it seemed simplest to make a new function that just has the most important stuff for writing generic functions that accept duck arrays. What we could do is, in addition to adding some kind of asabstractarray() function, *also* make it so asanyarray() starts accepting abstract/duck arrays, on the theory that anyone who's willing to put up with asanyarrays()'s weak guarantees won't notice if we weaken them a bit more. Honestly though I'd rather just not touch asanyarray at all, and maybe even deprecate it someday. -n -- Nathaniel J. Smith -- https://vorpus.org From cmkleffner at gmail.com Fri Mar 9 04:46:11 2018 From: cmkleffner at gmail.com (Carl Kleffner) Date: Fri, 9 Mar 2018 10:46:11 +0100 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: References: <23471BD4-A81B-4B9C-AECC-D161C3643B81@gmail.com> Message-ID: 2018-03-09 2:06 GMT+01:00 Nathaniel Smith : > On Thu, Mar 8, 2018 at 1:52 AM, Gregor Thalhammer > wrote: > > > > Hi, > > > > long time ago I wrote a wrapper to to use optimised and parallelized math > > functions from Intels vector math library > > geggo/uvml: Provide vectorized math function (MKL) for numpy > > > > I found it useful to inject (some of) the fast methods into numpy via > > np.set_num_ops(), to gain more performance without changing my programs. > > > > While this original project is outdated, I can imagine that a centralised > > way to swap the implementation of math functions is useful. Therefor I > > suggest to keep np.set_num_ops(), but admittedly I do not understand all > the > > technical implications of the proposed change. > > The main part of the proposal is to merge the two libraries; the > question of whether to deprecate set_numeric_ops is a bit separate. > There's no technical obstacle to keeping it, except the usual issue of > having more cruft to maintain :-). > > It's usually true that any monkeypatching interface will be useful to > someone under some circumstances, but we usually don't consider this a > good enough reason on its own to add and maintain these kinds of > interfaces. And an unfortunate side-effect of these kinds of hacky > interfaces is that they can end up removing the pressure to solve > problems properly. 
In this case, better solutions would include: > > - Adding support for accelerated vector math libraries to NumPy > directly (e.g. MKL, yeppp) > > I just want to bring the Sleef library for vectorized math (C99) into the discussion. Recently a new version with a stabilized API has been provided by its authors. The library is now well documented http://sleef.org and available under the permissive boost license. A runtime CPU dispatcher is used for the different SIMD variants (SSE2, AVX, AVX2, FMA ...) However, I never understand how a vectorized math library can be easily used with numpy arrays in all manners (strided arrays i.e.). > - Overriding the inner loops inside ufuncs like numpy.add that > np.ndarray.__add__ ultimately calls. This would speed up all addition > (whether or not it uses Python + syntax), would be a more general > solution (e.g. you could monkeypatch np.exp to use MKL's fast > vectorized exp), would let you skip reimplementing all the tricky > shared bits of the ufunc logic, etc. Conceptually it's not even very > hacky, because we allow you add new loops to existing ufuncs; making > it possible to replace existing loops wouldn't be a big stretch. (In > fact it's possible that we already allow this; I haven't checked.) > > So I still lean towards deprecating set_numeric_ops. It's not the most > crucial part of the proposal though; if it turns out to be too > controversial then I'll take it out. > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Mar 9 06:04:53 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 9 Mar 2018 11:04:53 +0000 Subject: [Numpy-discussion] Endian dtype specifier without using character codes? Message-ID: Hi, We (over at https://github.com/nipy/nibabel) often want to do stuff like this: ``` dtype_type = 'i' size = 8 endianness = '<' dtype = np.dtype('{}{}{}'.format(endianness, dtype_type, size)) ``` I see that """ Use of the character codes, however, is discouraged. """ https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.scalars.html What is the recommended way of specifying endianness in my dtype, if I am not using the character codes? Do I have to use something like: ``` np.dtype('int64').newbyteorder(endianness) ``` ? Cheers, Matthew From jtaylor.debian at googlemail.com Fri Mar 9 06:33:21 2018 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 9 Mar 2018 12:33:21 +0100 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: References: <23471BD4-A81B-4B9C-AECC-D161C3643B81@gmail.com> Message-ID: <81930c51-ac3c-77e9-74c0-ccf12691096a@googlemail.com> On 08.03.2018 17:20, Charles R Harris wrote: > > > On Thu, Mar 8, 2018 at 2:52 AM, Gregor Thalhammer > > wrote: > > > Hi, > > long time ago I wrote a wrapper to to use optimised and parallelized > math functions from Intels vector math library? > geggo/uvml: Provide vectorized math function (MKL) for numpy > > > I found it useful to inject (some of) the fast methods into numpy > via np.set_num_ops(), to gain more performance without changing my > programs. 
> > > I think that was much of the original motivation for `set_num_ops` back > in the Numeric days, where there was little commonality among platforms > and getting hold of optimized libraries was very much an individual > thing. The former cblas module, now merged with multiarray, was present > for the same reasons. > ?? > > > While this original project is outdated, I can imagine that a > centralised way to swap the implementation of math functions is > useful. Therefor I suggest to keep np.set_num_ops(), but admittedly > I do not understand all the technical implications of the proposed > change. > > > I suppose we could set it up to detect and use an external library > during compilation. The CBLAS implementations currently do that and > should pick up the MKL version when available. Where are the MKL > functions you used presented? That is an admittedly lower level > interface, however. > > Chuck As the functions of the different libraries have vastly different accuracies you want to be able to exchange numeric ops at runtime or at least during load time (like our cblas) and not limit yourself one compile time defined set of functions. Keeping set_numeric_ops would be preferable to me. Though I am not clear on why the two things are connected? Why can't we keep set_numeric_ops and merge multiarray and umath into one shared object? From sebastian at sipsolutions.net Fri Mar 9 05:51:21 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 09 Mar 2018 11:51:21 +0100 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: Message-ID: <1520592681.19004.11.camel@sipsolutions.net> On Thu, 2018-03-08 at 18:56 +0000, Stephan Hoyer wrote: > Hi Nathaniel, > > Thanks for starting the discussion! > > Like Marten says, I think it would be useful to more clearly define > what it means to be an abstract array. ndarray has lots of > methods/properties that expose internal implementation (e.g., view, > strides) that presumably we don't want to require as part of this > interfaces. On the other hand, dtype and shape are almost assuredly > part of this interface. > > To help guide the discussion, it would be good to identify concrete > examples of types that should and should not satisfy this interface, > e.g., > Marten's case 1: works exactly like ndarray, but stores data > differently: parallel arrays (e.g., dask.array), sparse arrays (e.g., > https://github.com/pydata/sparse), hypothetical non-strided arrays > (e.g., always C ordered). > Marten's case 2: same methods as ndarray, but gives different > results: np.ma.MaskedArray, arrays with units (quantities), maybe > labeled arrays like xarray.DataArray > > I don't think we have a hope of making a single base class for case 2 > work with everything in NumPy, but we can define interfaces with > different levels of functionality. True, but I guess the aim is not to care at all about how things are implemented (so only 2)? I agree that we can aim to be as close as possible, but should not expect to reach it. My personal opinion: 1. To do this, we should start it "experimentally" 2. We need something like a reference implementation. First, because it allows testing whether a function e.g. in numpy is actually abstract- safe and second because it will be the only way to find out what our minimal abstract interface actually is (assuming we have started 3). 3. Go ahead with putting it into numpy functions and see how much you need to make them work. 
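To make point 2 concrete, such a reference implementation could start out as minimal as the sketch below (all names are hypothetical, and details like out= handling are glossed over):

```
import numpy as np

class MinimalDuckArray:
    # just enough interface to probe which numpy functions are duck-safe
    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        unwrapped = [x._data if isinstance(x, MinimalDuckArray) else x
                     for x in inputs]
        return MinimalDuckArray(getattr(ufunc, method)(*unwrapped, **kwargs))
```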
In the end, my guess is, everything that works for MaskedArrays and xarray is a pretty safe bet. I disagree with the statement that we do not need to define the minimal reference. In practice we do as soon as we use it for numpy functions. - Sebastian > > Because there is such a gradation of "duck array" types, I agree with > Marten that we should not deprecate NDArrayOperatorsMixin. It's > useful for types like xarray.Dataset that define __array_ufunc__ but > cannot satisfy the full abstract array interface. > > Finally for the name, what about `asduckarray`? Thought perhaps that > could be a source of confusion, and given the gradation of duck array > like types. > > Cheers, > Stephan > > On Thu, Mar 8, 2018 at 7:07 AM Marten van Kerkwijk mail.com> wrote: > > Hi Nathaniel, > > > > Overall, hugely in favour! For detailed comments, it would be good > > to > > have a link to a PR; could you put that up? > > > > A larger comment: you state that you think `np.asanyarray` is a > > mistake since `np.matrix` and `np.ma.MaskedArray` would pass > > through > > and that those do not strictly mimic `NDArray`. Here, I agree with > > `matrix` (but since we're deprecating it, let's remove that from > > the > > discussion), but I do not see how your proposed interface would not > > let `MaskedArray` pass through, nor really that one would > > necessarily > > want that. > > > > I think it may be good to distinguish two separate cases: > > 1. Everything has exactly the same meaning as for `ndarray` but the > > data is stored differently (i.e., only `view` does not work). One > > can > > thus expect that for `output = function(inputs)`, at the end all > > `duck_output == ndarray_output`. > > 2. Everything is implemented but operations may give different > > output > > (depending on masks for masked arrays, units for quantities, etc.), > > so > > generally `duck_output != ndarray_output`. > > > > Which one of these are you aiming at? By including > > `NDArrayOperatorsMixin`, it would seem option (2), but perhaps not? > > Is > > there a case for both separately? > > > > Smaller general comment: at least in the NEP I would not worry > > about > > deprecating `NDArrayOperatorsMixin` - this may well be handy in > > itself > > (for things that implement `__array_ufunc__` but do not have shape, > > etc. (I have been doing some work on creating ufunc chains that > > would > > use this -- but they definitely are not array-like). Similarly, I > > think there is room for an `NDArrayShapeMixin` which might help > > with > > `concatenate` and friends. > > > > Finally, on the name: `asarray` and `asanyarray` are just shims > > over > > `array`, so one option would be to add an argument in `array` (or > > broaden the scope of `subok`). > > > > As an explicit suggestion, one could introduce a `duck` or > > `abstract` > > argument to `array` which is used in `asarray` and `asanyarray` as > > well (corresponding to options 1 and 2), and eventually default to > > something sensible (I would think `False` for `asarray` and `True` > > for > > `asanyarray`). > > > > All the best, > > > > Marten > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Fri Mar 9 10:23:19 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 9 Mar 2018 08:23:19 -0700 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: References: Message-ID: On Thu, Mar 8, 2018 at 11:26 PM, Ralf Gommers wrote: > > > On Thu, Mar 8, 2018 at 8:22 PM, Nathaniel Smith wrote: > >> On Thu, Mar 8, 2018 at 7:06 AM, Marten van Kerkwijk >> wrote: >> > Hi Nathaniel, >> > >> > Overall, hugely in favour! For detailed comments, it would be good to >> > have a link to a PR; could you put that up? >> >> Well, there's a PR here: https://github.com/numpy/numpy/pull/10706 >> >> But, this raises a question :-). (One which also came up here: >> https://github.com/numpy/numpy/pull/10704#issuecomment-371684170) >> >> There are sensible two workflows we could use (or at least, two that I >> can think of): >> >> 1. We merge updates to the NEPs as we go, so that whatever's in the >> repo is the current draft. Anyone can go to the NEP webpage at >> http://numpy.org/neps (WIP, see #10702) to see the latest version of >> all NEPs, whether accepted, rejected, or in progress. Discussion >> happens on the mailing list, and line-by-line feedback can be done by >> quote-replying and commenting on individual lines. From time to time, >> the NEP author takes all the accumulated feedback, updates the >> document, and makes a new post to the list to let people know about >> the updated version. >> >> This is how python-dev handles PEPs. >> >> 2. We use Github itself to manage the review. The repo only contains >> "accepted" NEPs; draft NEPs are represented by open PRs, and rejected >> NEPs are represented by PRs that were closed-without-merging. >> Discussion uses Github's commenting/review tools, and happens in the >> PR itself. >> >> This is roughly how Rust handles their RFC process, for example: >> https://github.com/rust-lang/rfcs >> >> Trying to do some hybrid version of these seems like it would be >> pretty painful, so we should pick one. >> >> Given that historically we've tried to use the mailing list for >> substantive features/planning discussions, and that our NEP process >> has been much closer to workflow 1 than workflow 2 (e.g., there are >> already a bunch of old NEPs already in the repo that are effectively >> rejected/withdrawn), I think we should maybe continue that way, and >> keep discussions here? >> >> So my suggestion is discussion should happen on the list, and NEP >> updates should be merged promptly, or just self-merged. Sound good? > > > Agreed that overall (1) is better than (2), rejected NEPs should be > visible. However there's no need for super-quick self-merge, and I think it > would be counter-productive. > > Instead, just send a PR, leave it open for some discussion, and update for > detailed comments (as well as long in-depth discussions that only a couple > of people care about) in the Github UI and major ones on the list. Once > it's stabilized a bit, then merge with status "Draft" and update once in a > while. I think this is also much more in like with what python-dev does, I > have seen substantial discussion on Github and have not seen quick > self-merges. > > I have a slight preference for managing the discussion on Github. 
Note that I added a `component: NEP` label and that discussion can take place on merged/closed PRs, the index could also contain links to proposed NEP PRs. If we just left PR open until acceptance/rejection the label would allow the proposed NEPs to be easily found, especially if we include the NEP number in the title, something like `NEP-10111: ` . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Mar 9 11:58:55 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 09 Mar 2018 16:58:55 +0000 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: References: Message-ID: I also have a slight preference for managing the discussion on GitHub, which is a bit more fully featured than email for long discussion (e.g., it supports code formatting and editing comments). But I'm really OK either way as long as discussion is kept in one place. We could still stipulate that NEPs are advertised on the mailing list: first, to announce them, and second, before merging them marked as accepted. We could even still merge rejected/abandoned NEPs as long as they are clearly marked. On Fri, Mar 9, 2018 at 7:24 AM Charles R Harris wrote: > On Thu, Mar 8, 2018 at 11:26 PM, Ralf Gommers > wrote: > >> >> >> On Thu, Mar 8, 2018 at 8:22 PM, Nathaniel Smith wrote: >> >>> On Thu, Mar 8, 2018 at 7:06 AM, Marten van Kerkwijk >>> wrote: >>> > Hi Nathaniel, >>> > >>> > Overall, hugely in favour! For detailed comments, it would be good to >>> > have a link to a PR; could you put that up? >>> >>> Well, there's a PR here: https://github.com/numpy/numpy/pull/10706 >>> >>> But, this raises a question :-). (One which also came up here: >>> https://github.com/numpy/numpy/pull/10704#issuecomment-371684170) >>> >>> There are sensible two workflows we could use (or at least, two that I >>> can think of): >>> >>> 1. We merge updates to the NEPs as we go, so that whatever's in the >>> repo is the current draft. Anyone can go to the NEP webpage at >>> http://numpy.org/neps (WIP, see #10702) to see the latest version of >>> all NEPs, whether accepted, rejected, or in progress. Discussion >>> happens on the mailing list, and line-by-line feedback can be done by >>> quote-replying and commenting on individual lines. From time to time, >>> the NEP author takes all the accumulated feedback, updates the >>> document, and makes a new post to the list to let people know about >>> the updated version. >>> >>> This is how python-dev handles PEPs. >>> >>> 2. We use Github itself to manage the review. The repo only contains >>> "accepted" NEPs; draft NEPs are represented by open PRs, and rejected >>> NEPs are represented by PRs that were closed-without-merging. >>> Discussion uses Github's commenting/review tools, and happens in the >>> PR itself. >>> >>> This is roughly how Rust handles their RFC process, for example: >>> https://github.com/rust-lang/rfcs >>> >>> Trying to do some hybrid version of these seems like it would be >>> pretty painful, so we should pick one. >>> >>> Given that historically we've tried to use the mailing list for >>> substantive features/planning discussions, and that our NEP process >>> has been much closer to workflow 1 than workflow 2 (e.g., there are >>> already a bunch of old NEPs already in the repo that are effectively >>> rejected/withdrawn), I think we should maybe continue that way, and >>> keep discussions here? 
>>> >>> So my suggestion is discussion should happen on the list, and NEP >>> updates should be merged promptly, or just self-merged. Sound good? >> >> >> Agreed that overall (1) is better than (2), rejected NEPs should be >> visible. However there's no need for super-quick self-merge, and I think it >> would be counter-productive. >> >> Instead, just send a PR, leave it open for some discussion, and update >> for detailed comments (as well as long in-depth discussions that only a >> couple of people care about) in the Github UI and major ones on the list. >> Once it's stabilized a bit, then merge with status "Draft" and update once >> in a while. I think this is also much more in like with what python-dev >> does, I have seen substantial discussion on Github and have not seen quick >> self-merges. >> >> > I have a slight preference for managing the discussion on Github. Note > that I added a `component: NEP` label and that discussion can take place on > merged/closed PRs, the index could also contain links to proposed NEP PRs. > If we just left PR open until acceptance/rejection the label would allow > the proposed NEPs to be easily found, especially if we include the NEP > number in the title, something like `NEP-10111: ` . > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Mar 9 12:00:43 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 09 Mar 2018 17:00:43 +0000 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) References: Message-ID: I'll note that we basically used GitHub for revising __array_ufunc__ NEP, and I think that worked out better for everyone involved. The discussion was a little too specialized and high volume to be well handled on the mailing list. On Fri, Mar 9, 2018 at 8:58 AM Stephan Hoyer wrote: > I also have a slight preference for managing the discussion on GitHub, > which is a bit more fully featured than email for long discussion (e.g., it > supports code formatting and editing comments). But I'm really OK either > way as long as discussion is kept in one place. > > We could still stipulate that NEPs are advertised on the mailing list: > first, to announce them, and second, before merging them marked as > accepted. We could even still merge rejected/abandoned NEPs as long as they > are clearly marked. > > On Fri, Mar 9, 2018 at 7:24 AM Charles R Harris > wrote: > >> On Thu, Mar 8, 2018 at 11:26 PM, Ralf Gommers >> wrote: >> >>> >>> >>> On Thu, Mar 8, 2018 at 8:22 PM, Nathaniel Smith wrote: >>> >>>> On Thu, Mar 8, 2018 at 7:06 AM, Marten van Kerkwijk >>>> wrote: >>>> > Hi Nathaniel, >>>> > >>>> > Overall, hugely in favour! For detailed comments, it would be good to >>>> > have a link to a PR; could you put that up? >>>> >>>> Well, there's a PR here: https://github.com/numpy/numpy/pull/10706 >>>> >>>> But, this raises a question :-). (One which also came up here: >>>> https://github.com/numpy/numpy/pull/10704#issuecomment-371684170) >>>> >>>> There are sensible two workflows we could use (or at least, two that I >>>> can think of): >>>> >>>> 1. We merge updates to the NEPs as we go, so that whatever's in the >>>> repo is the current draft. 
Anyone can go to the NEP webpage at >>>> http://numpy.org/neps (WIP, see #10702) to see the latest version of >>>> all NEPs, whether accepted, rejected, or in progress. Discussion >>>> happens on the mailing list, and line-by-line feedback can be done by >>>> quote-replying and commenting on individual lines. From time to time, >>>> the NEP author takes all the accumulated feedback, updates the >>>> document, and makes a new post to the list to let people know about >>>> the updated version. >>>> >>>> This is how python-dev handles PEPs. >>>> >>>> 2. We use Github itself to manage the review. The repo only contains >>>> "accepted" NEPs; draft NEPs are represented by open PRs, and rejected >>>> NEPs are represented by PRs that were closed-without-merging. >>>> Discussion uses Github's commenting/review tools, and happens in the >>>> PR itself. >>>> >>>> This is roughly how Rust handles their RFC process, for example: >>>> https://github.com/rust-lang/rfcs >>>> >>>> Trying to do some hybrid version of these seems like it would be >>>> pretty painful, so we should pick one. >>>> >>>> Given that historically we've tried to use the mailing list for >>>> substantive features/planning discussions, and that our NEP process >>>> has been much closer to workflow 1 than workflow 2 (e.g., there are >>>> already a bunch of old NEPs already in the repo that are effectively >>>> rejected/withdrawn), I think we should maybe continue that way, and >>>> keep discussions here? >>>> >>>> So my suggestion is discussion should happen on the list, and NEP >>>> updates should be merged promptly, or just self-merged. Sound good? >>> >>> >>> Agreed that overall (1) is better than (2), rejected NEPs should be >>> visible. However there's no need for super-quick self-merge, and I think it >>> would be counter-productive. >>> >>> Instead, just send a PR, leave it open for some discussion, and update >>> for detailed comments (as well as long in-depth discussions that only a >>> couple of people care about) in the Github UI and major ones on the list. >>> Once it's stabilized a bit, then merge with status "Draft" and update once >>> in a while. I think this is also much more in like with what python-dev >>> does, I have seen substantial discussion on Github and have not seen quick >>> self-merges. >>> >>> >> I have a slight preference for managing the discussion on Github. Note >> that I added a `component: NEP` label and that discussion can take place on >> merged/closed PRs, the index could also contain links to proposed NEP PRs. >> If we just left PR open until acceptance/rejection the label would allow >> the proposed NEPs to be easily found, especially if we include the NEP >> number in the title, something like `NEP-10111: ` . >> >> Chuck >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kirit.thadaka at gmail.com Fri Mar 9 13:41:56 2018 From: kirit.thadaka at gmail.com (Kirit Thadaka) Date: Sat, 10 Mar 2018 00:11:56 +0530 Subject: [Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram Message-ID: Hi! I've created a PR to add a function called "histogram_bin_edges" which will allow a user to calculate the bins used by the histogram for some data without requiring the entire histogram to be calculated. 
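A usage sketch, assuming the new function accepts the same data and bins arguments as np.histogram (the exact signature is whatever the PR settles on):

```
import numpy as np

sample_a = np.random.normal(0.0, 1.0, 1000)
sample_b = np.random.normal(0.5, 1.0, 1000)

# compute one set of edges from the pooled data, then reuse it everywhere
edges = np.histogram_bin_edges(np.concatenate([sample_a, sample_b]), bins='auto')
counts_a, _ = np.histogram(sample_a, bins=edges)
counts_b, _ = np.histogram(sample_b, bins=edges)
```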
https://github.com/numpy/numpy/pull/10591#issuecomment-371863472 This function allows one set of bins to be computed, and reused across multiple histograms which gives more easily comparable results than using separate bins for each histogram. Please let me know if you have any suggestions on how to improve this PR. Thanks! - Kirit -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmay31 at gmail.com Fri Mar 9 14:36:47 2018 From: rmay31 at gmail.com (Ryan May) Date: Fri, 9 Mar 2018 12:36:47 -0700 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: Message-ID: On Fri, Mar 9, 2018 at 2:29 AM, Nathaniel Smith wrote: > On Thu, Mar 8, 2018 at 7:06 AM, Marten van Kerkwijk > wrote: > > A larger comment: you state that you think `np.asanyarray` is a > > mistake since `np.matrix` and `np.ma.MaskedArray` would pass through > > and that those do not strictly mimic `NDArray`. Here, I agree with > > `matrix` (but since we're deprecating it, let's remove that from the > > discussion), but I do not see how your proposed interface would not > > let `MaskedArray` pass through, nor really that one would necessarily > > want that. > > We can discuss whether MaskedArray should be an AbstractArray. > Conceptually it probably should be; I think that was a goal of the > MaskedArray authors (even if they wouldn't have put it that way). In > practice there are a lot of funny quirks in MaskedArray, so I'd want > to look more carefully in case there are weird incompatibilities that > would cause problems. Note that we can figure this out after the NEP > is finished, too. > > I wonder if the matplotlib folks have any thoughts on this? I know > they're one of the more prominent libraries that tries to handle both > regular and masked arrays, so maybe they could comment on how often > they run There's a lot of places in matplotlib where this could simplify our checks, though probably more from a standpoint of "does this thing we've been given need converting?" There are also a lot of places where matplotlib needs to know if we have actually been given a MaskedArray so that we can handle it specially. Ryan -- Ryan May -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Mar 9 14:38:55 2018 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 9 Mar 2018 11:38:55 -0800 Subject: [Numpy-discussion] numpy.random.randn In-Reply-To: References: Message-ID: On Thu, Mar 8, 2018 at 12:44 PM, Marko Asplund wrote: > > On Wed, 7 Mar 2018 13:14:36, Robert Kern wrote: > > > > With NumPy I'm simply using the following random initilization code: > > > > > > np.random.randn(n_h, n_x) * 0.01 > > > > > > I'm trying to emulate the same behaviour in my Scala code by sampling > > from a > > > Gaussian distribution with mean = 0 and std dev = 1. > > > `np.random.randn(n_h, n_x) * 0.01` gives a Gaussian distribution of mean=0 > > and stdev=0.01 > > Sorry for being a bit inaccurate. > My Scala code actually mirrors the NumPy based random initialization, so I sample with Gaussian of mean = 0 and std dev = 1, then multiply with 0.01. Have you verified this? I.e. save out the Scala-initialized network and load it up with numpy to check the mean and std dev? How about if you run the numpy NN training with the Scala-initialized network? Does that also diverge? -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stefanv at berkeley.edu Fri Mar 9 14:51:40 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 9 Mar 2018 11:51:40 -0800 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: References: Message-ID: <20180309195140.ga465g7bbv6byuqh@carbo> On Fri, 09 Mar 2018 17:00:43 +0000, Stephan Hoyer wrote: > I'll note that we basically used GitHub for revising __array_ufunc__ NEP, > and I think that worked out better for everyone involved. The discussion > was a little too specialized and high volume to be well handled on the > mailing list. A disadvantage of GitHub PR comments is that they do not track sub-threads of conversation, so you cannot "reply to" a previous concern directly. PRs also mix inline comments (that become much less visible after rebases and updates) and "story line" comments. These two "modes" of commenting, substantive discussion around ideas, v.s. concerns about specific phrasing, usage of words, typos, content of code snippets, etc., may require different approaches. It would be quite easy to redirect the prior to the mailing list and the latter to the GitHub PR. I'm also not too keen on repeated PR creation and merging (it splits up the PR discussion even further). Why not simply hold off until the PEP is ready, and view the documents on GitHub? The rendering there is just as good. +1 also on merging rejected PEPs, once they are fully developed. St?fan From rmay31 at gmail.com Fri Mar 9 14:42:05 2018 From: rmay31 at gmail.com (Ryan May) Date: Fri, 9 Mar 2018 12:42:05 -0700 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> Message-ID: On Fri, Mar 9, 2018 at 12:21 AM, Hameer Abbasi wrote: > Not that I?m against different ?levels? of ndarray granularity, but I just > don?t want it to introduce complexity for the end-user. For example, it > would be unreasonable to expect the end-user to check for all parts of the > interface that they need support for separately. > I wouldn't necessarily want all of the granularity exposed in something like "asarraylike"--that should be kept really simple. But I think there's value in numpy providing multiple ABCs for portions of the interface (and one big one that combines them all). That way, people who want the finer-grained checking (say for a more limited array-like) can use a common, shared, existing ABC, rather than having everyone re-invent it. Ryan -- Ryan May -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Fri Mar 9 16:55:31 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 9 Mar 2018 16:55:31 -0500 Subject: [Numpy-discussion] NumPy 1.14.2 release In-Reply-To: References: Message-ID: Hi Chuck, Astropy tests indeed all pass again against master, without the work-arounds for 1.14.1. 
Thanks, of course also to Allan for the fix, Marten From m.h.vankerkwijk at gmail.com Fri Mar 9 17:10:33 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 9 Mar 2018 17:10:33 -0500 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: <20180309195140.ga465g7bbv6byuqh@carbo> References: <20180309195140.ga465g7bbv6byuqh@carbo> Message-ID: Hi Nathaniel, astropy is an example of a project that does essentially all discussion of its "Astropy Proposals for Enhancement" on github. I actually like the numpy approach of sending anything to the mailing list that deserves community input (which includes NEP by their very nature). I don't think it has to be either/or, though; maybe the preferred approach is in fact a combination, where the draft is send to the mailing list, initial general comments are incorporated, and then discussion moves to github when one is past the "general interest" stage. When exactly this happens will be somewhat subjective, but probably is not important to nail down anyway. All the best, Marten p.s. I think the __array_ufunc__ discussion indeed showed that github can work, but only once the general ideas are agreed upon - the initial discussion become hopeless to follow (though I'm not sure a mailing list discussion would have been any better). From m.h.vankerkwijk at gmail.com Fri Mar 9 17:49:21 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 9 Mar 2018 17:49:21 -0500 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> Message-ID: We may be getting a bit distracted by the naming -- though I'll throw out `asarraymimic` as another non-programmer-lingo option that doesn't reuse `arraylike` and might describe what the duck array is attempting to do more closely. But more to the point: I think in essence, we're trying to create a function that does the equivalent of: ``` def ...(arraylike, ...) if isinstance(arraylike, NDAbstractArray): return arraylike else: return np.array(arraylike, ...) ``` Given that one possibly might want to check for partial compatibility, maybe the new or old function should just expose what compatibility is desired, via something like: ``` input = np.as...(input, ..., mimicok='shape|operator|...') ``` Where one could have `mimicok=True` to indicate the highest level (maybe not including being viewable?), `False` to not allow any mimics. This might even work for np.array itself: - dtype - any mimic must provide `astype` (which can error if not possible; this could be the ABC default) - copy - can't one just use `copy.copy`? I think this defaults to `__copy__`. - order - can be passed to `astype` as well; up to code to error if not possible. - subok - meaningless - ndmin - requirement of mimicok='shape' would be to provide a shape attribute and reshape method. -- Marten From stefanv at berkeley.edu Fri Mar 9 18:26:38 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 9 Mar 2018 15:26:38 -0800 Subject: [Numpy-discussion] NEP sprint: 21 and 22 March Message-ID: <20180309232638.vumxg3z4dzfaz3yo@carbo> Hi everyone, As you may have noticed, there's been quite a bit of movement recently around NumPy Enhancement Proposals---on setting specifications, building infrastructure, as well as writing new proposals. To further support this work, we will be hosting an informal NEP sprint at Berkeley on 21 and 22 March. 
Our aim is to bring core contributors and interested community members together to discuss proposal ideas, write up new NEPs, and polish existing ones. Some potential topics of discussion are: - Duck arrays - Array concatenation - Random number generator seed versioning - User defined dtypes - Deprecation pathways for `np.matrix` - What to do about nditer? All community members are welcome to attend. If you are a core contributor, we may be able to fund some travel costs as well; please let me know. Best regards St?fan From njs at pobox.com Fri Mar 9 18:32:18 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Mar 2018 15:32:18 -0800 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> Message-ID: On Thu, Mar 8, 2018 at 9:45 PM, Stephan Hoyer wrote: > On Thu, Mar 8, 2018 at 5:54 PM Juan Nunez-Iglesias > wrote: >> >> On Fri, Mar 9, 2018, at 5:56 AM, Stephan Hoyer wrote: >> >> Marten's case 1: works exactly like ndarray, but stores data differently: >> parallel arrays (e.g., dask.array), sparse arrays (e.g., >> https://github.com/pydata/sparse), hypothetical non-strided arrays (e.g., >> always C ordered). >> >> >> Two other "hypotheticals" that would fit nicely in this space: >> - the Open Connectome folks (https://neurodata.io) proposed linearising >> indices using space-filling curves, which minimizes cache misses (or IO >> reads) for giant volumes. I believe they implemented this but can't find it >> currently. >> - the N5 format for chunked arrays on disk: >> https://github.com/saalfeldlab/n5 > > > I think these fall into another important category of duck arrays. > "Indexable" arrays the serve as storage, but that don't support computation. > These sorts of arrays typically support operations like indexing and define > handful of array-like properties (e.g., dtype and shape), but not > arithmetic, reductions or reshaping. > > This means you can't quite use them as a drop-in replacement for NumPy > arrays in all cases, but that's OK. In contrast, both dask.array and sparse > do aspire to do fill out nearly the full numpy.ndarray API. I'm not sure if these particular formats fall into that category or not (isn't the point of the space-filling curves to support cache-efficient computation?). But I suppose you're also thinking of things like h5py.Dataset? My impression is that these are mostly handled pretty well already by defining __array__ and/or providing array operations that implicitly convert to ndarray -- do you agree? This does raise an interesting point: maybe we'll eventually want an __abstract_array__ method that asabstractarray tries calling if defined, so e.g. if your object isn't itself an array but can be efficiently converted into a *sparse* array, you have a way to declare that? I think this is something to file under "worry about later, after we have the basic infrastructure", but it's not something I'd thought of before so mentioning here. -n -- Nathaniel J. 
Smith -- https://vorpus.org From njs at pobox.com Fri Mar 9 19:40:11 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Mar 2018 16:40:11 -0800 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: <20180309195140.ga465g7bbv6byuqh@carbo> References: <20180309195140.ga465g7bbv6byuqh@carbo> Message-ID: On Fri, Mar 9, 2018 at 11:51 AM, Stefan van der Walt wrote: > On Fri, 09 Mar 2018 17:00:43 +0000, Stephan Hoyer wrote: >> I'll note that we basically used GitHub for revising __array_ufunc__ NEP, >> and I think that worked out better for everyone involved. The discussion >> was a little too specialized and high volume to be well handled on the >> mailing list. > > A disadvantage of GitHub PR comments is that they do not track > sub-threads of conversation, so you cannot "reply to" a previous concern > directly. Yeah, I actually find email much easier for this kind of complex high-volume discussion. Even if lots of people don't use traditional threaded mail clients anymore [1], archives are still threaded, and the tools that make line-by-line responses easy and the ability to split off conversations are both really helpful. (E.g., the way I split this thread off from the original one :-).) The __array_ufunc__ discussion was almost impenetrable on GH, I think. I admit though that some of this is probably just that I'm more used to the email-based discussion workflow. Honestly none of these tools are particularly amazing, and the __array_ufunc__ conversation would have been difficult and inaccessible to outsiders no matter what medium we used. It's much more important that we just pick something and use it consistently than that pick the Most Optimal Solution. [1] Meaning this, not gmail's threads: https://en.wikipedia.org/wiki/Conversation_threading#/media/File:Nntp.jpg > PRs also mix inline comments (that become much less visible after > rebases and updates) and "story line" comments. These two "modes" of > commenting, substantive discussion around ideas, v.s. concerns about > specific phrasing, usage of words, typos, content of code snippets, > etc., may require different approaches. It would be quite easy to > redirect the prior to the mailing list and the latter to the GitHub PR. I don't think we should worry about this. Fiddly detail comments are, by definition, not super important, and generally make up a tiny volume of the discussion around a proposal. Also in practice reviewers are no good at splitting up substantive comments from fiddly details: the review workflow is that you read through and as thoughts occur you write them down, so even if you start out thinking "okay, I'm only going to comment on typos", then half-way through some paragraph sparks a thought and suddenly you're writing something substantive (and I'm as guilty of this as anyone, maybe more so...). Asking people to classify their comments and then chiding them for putting them in the wrong place etc. isn't a good use of time. Let's just pick one place for everything and stick with it. > I'm also not too keen on repeated PR creation and merging (it splits up > the PR discussion even further). Why not simply hold off until the PEP > is ready, and view the documents on GitHub? The rendering there is just > as good. Well, if we aren't using PRs for discussion then multiple PRs are fine :-). 
And merging changes quickly is helpful because it makes the rendered NEPs page a single one-stop-shop to see all the latest NEPs, no matter what their current status. If we do use PRs for discussion, then I agree that we should try to keep the PR open until the NEP is "done", to minimize the splitting of discussion. This does create a bit of extra friction because it turns out that "is this done?" is not something you can really ever answer for certain :-). Even after PEPs are accepted they usually end up getting some further tweaks once people start implementing them. Sometimes PEPs get abandoned in "Draft" state without ever being accepted/rejected, and sometimes a PEP that had been abandoned for years gets picked up and finished. You can see this in the Rust RFC guidelines too [2]; they specifically address the issue of post-merge changes, and it sounds like their solution is that if a substantive issue is discovered in an accepted RFC, then you have to create a new "fixup" RFC, which then gets its own PR for discussion. I guess if this were our process then __array_ufunc__ would have ended up with ~3 NEPs :-). This is all doable -- every approach has trade-offs. But we should pick one, so we can adapt to those trade-offs. [2] https://github.com/rust-lang/rfcs#the-rfc-life-cycle -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Fri Mar 9 20:10:17 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Mar 2018 17:10:17 -0800 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: <81930c51-ac3c-77e9-74c0-ccf12691096a@googlemail.com> References: <23471BD4-A81B-4B9C-AECC-D161C3643B81@gmail.com> <81930c51-ac3c-77e9-74c0-ccf12691096a@googlemail.com> Message-ID: On Fri, Mar 9, 2018 at 3:33 AM, Julian Taylor wrote: > As the functions of the different libraries have vastly different > accuracies you want to be able to exchange numeric ops at runtime or at > least during load time (like our cblas) and not limit yourself one > compile time defined set of functions. > Keeping set_numeric_ops would be preferable to me. > > Though I am not clear on why the two things are connected? > Why can't we keep set_numeric_ops and merge multiarray and umath into > one shared object? I think I addressed both of these topics here? https://mail.python.org/pipermail/numpy-discussion/2018-March/077777.html Looking again now, I see that we actually *do* have an explicit API for monkeypatching ufuncs: https://docs.scipy.org/doc/numpy/reference/c-api.ufunc.html#c.PyUFunc_ReplaceLoopBySignature So this seems to be a strictly more general/powerful/useful version of set_numeric_ops... I added some discussion to the NEP: https://github.com/numpy/numpy/pull/10704/commits/4c4716ee0b3bc51d5be9baa891d60473f480d1f2 -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Fri Mar 9 20:45:20 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Mar 2018 17:45:20 -0800 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> References: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> Message-ID: On Thu, Mar 8, 2018 at 5:51 PM, Juan Nunez-Iglesias wrote: >> Finally for the name, what about `asduckarray`? Thought perhaps that could >> be a source of confusion, and given the gradation of duck array like types. > > I suggest that the name should *not* use programmer lingo, so neither > "abstract" nor "duck" should be in there. 
My humble proposal is "arraylike". > (I know that this term has included things like "list-of-list" before but > only in text, not code, as far as I know.) I agree with your point about avoiding programmer lingo. My first draft actually used 'asduckarray', but that's like an in-joke; it works fine for us, but it's not really something I want teachers to have to explain on day 1... Array-like is problematic too though, because we still need a way to say "thing that can be coerced to an array", which is what array-like has been used to mean historically. And with the new type hints stuff, it is actually becoming code. E.g. what should the type hints here be: asabstractarray(a: X) -> Y Right now "X" is "ArrayLike", but if we make "Y" be "ArrayLike" then we'll need to come up with some other name for "X" :-). Maybe we can call duck arrays "py arrays", since the idea is that they implement the standard Python array API (but not necessarily the C-level array API)? np.PyArray, np.aspyarray()? -n -- Nathaniel J. Smith -- https://vorpus.org From ralf.gommers at gmail.com Sat Mar 10 00:24:52 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 9 Mar 2018 21:24:52 -0800 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: References: Message-ID: On Fri, Mar 9, 2018 at 12:00 AM, Nathaniel Smith wrote: > On Thu, Mar 8, 2018 at 10:26 PM, Ralf Gommers > wrote: > > > > > > On Thu, Mar 8, 2018 at 8:22 PM, Nathaniel Smith wrote: > >> > >> On Thu, Mar 8, 2018 at 7:06 AM, Marten van Kerkwijk > >> wrote: > >> > Hi Nathaniel, > >> > > >> > Overall, hugely in favour! For detailed comments, it would be good to > >> > have a link to a PR; could you put that up? > >> > >> Well, there's a PR here: https://github.com/numpy/numpy/pull/10706 > >> > >> But, this raises a question :-). (One which also came up here: > >> https://github.com/numpy/numpy/pull/10704#issuecomment-371684170) > >> > >> There are sensible two workflows we could use (or at least, two that I > >> can think of): > >> > >> 1. We merge updates to the NEPs as we go, so that whatever's in the > >> repo is the current draft. Anyone can go to the NEP webpage at > >> http://numpy.org/neps (WIP, see #10702) to see the latest version of > >> all NEPs, whether accepted, rejected, or in progress. Discussion > >> happens on the mailing list, and line-by-line feedback can be done by > >> quote-replying and commenting on individual lines. From time to time, > >> the NEP author takes all the accumulated feedback, updates the > >> document, and makes a new post to the list to let people know about > >> the updated version. > >> > >> This is how python-dev handles PEPs. > >> > >> 2. We use Github itself to manage the review. The repo only contains > >> "accepted" NEPs; draft NEPs are represented by open PRs, and rejected > >> NEPs are represented by PRs that were closed-without-merging. > >> Discussion uses Github's commenting/review tools, and happens in the > >> PR itself. > >> > >> This is roughly how Rust handles their RFC process, for example: > >> https://github.com/rust-lang/rfcs > >> > >> Trying to do some hybrid version of these seems like it would be > >> pretty painful, so we should pick one. 
> >> > >> Given that historically we've tried to use the mailing list for > >> substantive features/planning discussions, and that our NEP process > >> has been much closer to workflow 1 than workflow 2 (e.g., there are > >> already a bunch of old NEPs already in the repo that are effectively > >> rejected/withdrawn), I think we should maybe continue that way, and > >> keep discussions here? > >> > >> So my suggestion is discussion should happen on the list, and NEP > >> updates should be merged promptly, or just self-merged. Sound good? > > > > > > Agreed that overall (1) is better than (2), rejected NEPs should be > visible. > > However there's no need for super-quick self-merge, and I think it would > be > > counter-productive. > > > > Instead, just send a PR, leave it open for some discussion, and update > for > > detailed comments (as well as long in-depth discussions that only a > couple > > of people care about) in the Github UI and major ones on the list. Once > it's > > stabilized a bit, then merge with status "Draft" and update once in a > while. > > I think this is also much more in like with what python-dev does, I have > > seen substantial discussion on Github and have not seen quick > self-merges. > > Not sure what you mean about python-dev. Are you looking at the peps > repository? https://github.com/python/peps I was mostly thinking about packaging PEPs that are now also there, but were separate. Stuff like https://github.com/pypa/interoperability-peps/pull/54. There seems to be significantly more comments on packaging things than on other PEPs. > > > From a quick skim, it looks like of the last 37 commits, only 8 came > in through PRs and the other 29 were pushed directly by committers > without any review. 3 of the 8 PRs were self-merged immediately after > submission, and of the remaining 5 PRs, 4 of them were from external > contributors who didn't have commit rights, and the 1 other was a fix > to the repo README, rather than an actual PEP change. I don't think > I've ever seen any kind of substantive discussion in that repo -- any > discussion is mostly restricted to helping new contributors with > procedural stuff, maybe formatting issues or fixes to the PEP tooling. > > Anyway, just because python-dev does it that way doesn't mean that we > have to too. > > But if we split discussions between GH and the mailing list, then > we're definitely going to end up discussing substantive issues there > (how do we know which discussions only a couple of people care > about?), and trying to juggle that seems confusing to me, plus makes > it harder to track down what happened later, after we've had multiple > PRs each with their own comments... > It's not imho, because it's what we already do on this list. Github is a superior review interface over mailing list, so my vote goes to using that interface, while keeping this list in the loop on critical stuff and decisions about to be made. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrocklin at gmail.com Sat Mar 10 07:27:04 2018 From: mrocklin at gmail.com (Matthew Rocklin) Date: Sat, 10 Mar 2018 07:27:04 -0500 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> Message-ID: I'm very glad to see this discussion. I think that coming up with a single definition of array-like may be difficult, and that we might end up wanting to embrace duck typing instead. 
It seems to me that different array-like classes will implement different mixtures of features. It may be difficult to pin down a single definition that includes anything except for the most basic attributes (shape and dtype?). Consider two extreme cases of restrictive functionality: 1. LinearOperators (support dot in a numpy-like way) 2. Storage objects like h5py (support getitem in a numpy-like way) I can imagine authors of both groups saying that they should qualify as array-like because downstream projects that consume them should not convert them to numpy arrays in important contexts. The name "duck arrays" that we sometimes use doesn't necessarily mean "quack like an ndarray" but might actually mean a number of different things in different contexts. Making a single class or predicate for duck arrays may not be as effective as we want. Instead, it might be that we need a number of different protocols like `__array_mat_vec__` or `__array_slice__` that downstream projects can check instead. I can imagine cases where I want to check only "can I use this thing to multiply against arrays" or "can I get numpy arrays out of this thing with numpy slicing" rather than "is this thing array-like" because I may genuinely not care about most of the functionality in a blessed definition of "array-like". On Fri, Mar 9, 2018 at 8:45 PM, Nathaniel Smith wrote: > On Thu, Mar 8, 2018 at 5:51 PM, Juan Nunez-Iglesias > wrote: > >> Finally for the name, what about `asduckarray`? Thought perhaps that > could > >> be a source of confusion, and given the gradation of duck array like > types. > > > > I suggest that the name should *not* use programmer lingo, so neither > > "abstract" nor "duck" should be in there. My humble proposal is > "arraylike". > > (I know that this term has included things like "list-of-list" before but > > only in text, not code, as far as I know.) > > I agree with your point about avoiding programmer lingo. My first > draft actually used 'asduckarray', but that's like an in-joke; it > works fine for us, but it's not really something I want teachers to > have to explain on day 1... > > Array-like is problematic too though, because we still need a way to > say "thing that can be coerced to an array", which is what array-like > has been used to mean historically. And with the new type hints stuff, > it is actually becoming code. E.g. what should the type hints here be: > > asabstractarray(a: X) -> Y > > Right now "X" is "ArrayLike", but if we make "Y" be "ArrayLike" then > we'll need to come up with some other name for "X" :-). > > Maybe we can call duck arrays "py arrays", since the idea is that they > implement the standard Python array API (but not necessarily the > C-level array API)? np.PyArray, np.aspyarray()? > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Sat Mar 10 17:39:40 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Sat, 10 Mar 2018 23:39:40 +0100 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> Message-ID: On Sat, Mar 10, 2018 at 1:27 PM, Matthew Rocklin wrote: > I'm very glad to see this discussion. > me too, but.... 
> I think that coming up with a single definition of array-like may be > difficult, and that we might end up wanting to embrace duck typing instead. > exactly -- I think there is a clear line between "uses the numpy memory layout" and the Python API. But the python API is pretty darn big, and many "array_ish" classes implement only partvof it, and may even implement some parts a bit differently. So really hard to have "one" definition, except "Python API exactly like a ndarray" -- and I'm wondering how useful that is. It seems to me that different array-like classes will implement different > mixtures of features. It may be difficult to pin down a single definition > that includes anything except for the most basic attributes (shape and > dtype?). > or a minimum set -- but again, how useful?? > Storage objects like h5py (support getitem in a numpy-like way) > Exactly -- though I don't know about h5py, but netCDF4 variables supoprt a useful subst of ndarray, but do "fancy indexing" differently -- so are they ndarray_ish? -- sorry to coin yet another term :-) > I can imagine authors of both groups saying that they should qualify as > array-like because downstream projects that consume them should not convert > them to numpy arrays in important contexts. > indeed. My solution so far is to define my own duck types "asarraylike" that checks for the actual methods I need: https://github.com/NOAA-ORR-ERD/gridded/blob/master/gridded/utilities.py which has: must_have = ['dtype', 'shape', 'ndim', '__len__', '__getitem__', ' __getattribute__'] def isarraylike(obj): """ tests if obj acts enough like an array to be used in gridded. This should catch netCDF4 variables and numpy arrays, at least, etc. Note: these won't check if the attributes required actually work right. """ for attr in must_have: if not hasattr(obj, attr): return False return True def asarraylike(obj): """ If it satisfies the requirements of pyugrid the object is returned as is. If not, then numpy's array() will be called on it. :param obj: The object to check if it's like an array """ return obj if isarraylike(obj) else np.array(obj) It's possible that we could come up with semi-standard "groupings" of attributes to produce "levels" of compatibility, or maybe not levels, but independentgroupings, so you could specify which groupings you need in this instance. > The name "duck arrays" that we sometimes use doesn't necessarily mean > "quack like an ndarray" but might actually mean a number of different > things in different contexts. Making a single class or predicate for duck > arrays may not be as effective as we want. Instead, it might be that we > need a number of different protocols like `__array_mat_vec__` or `__array_slice__` > that downstream projects can check instead. I can imagine cases where I > want to check only "can I use this thing to multiply against arrays" or > "can I get numpy arrays out of this thing with numpy slicing" rather than > "is this thing array-like" because I may genuinely not care about most of > the functionality in a blessed definition of "array-like". > exactly. but maybe we won't know until we try. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
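To make the "groupings of attributes" idea above concrete, here is a minimal sketch of what opt-in compatibility groups could look like. The group names and their contents are purely illustrative -- nothing like this is an agreed NumPy API -- but it shows how a consumer could ask only for the capabilities it actually needs:

import numpy as np

# Hypothetical attribute "groupings"; names and contents are illustrative only.
ATTRIBUTE_GROUPS = {
    "shape": ("shape", "ndim", "dtype"),
    "indexing": ("__getitem__", "__len__"),
    "arithmetic": ("__add__", "__mul__", "__array_ufunc__"),
}

def supports(obj, groups):
    """Return True if obj exposes every attribute in the requested groups."""
    return all(hasattr(obj, name)
               for group in groups
               for name in ATTRIBUTE_GROUPS[group])

def as_arraylike(obj, groups=("shape", "indexing")):
    """Pass obj through if it quacks enough, otherwise coerce to ndarray."""
    return obj if supports(obj, groups) else np.asarray(obj)

print(supports(np.arange(4), ["shape", "indexing", "arithmetic"]))  # True
print(as_arraylike([[1, 2], [3, 4]]).shape)  # (2, 2): the list was coerced

Something like an h5py dataset or netCDF4 variable would likely pass the "shape" and "indexing" groups but not "arithmetic", which is exactly the distinction drawn above.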
URL: From m.h.vankerkwijk at gmail.com Sat Mar 10 19:13:50 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sat, 10 Mar 2018 19:13:50 -0500 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> Message-ID: ?I think we don't have to make it sounds like there are *that* many types of compatibility: really there is just array organisation (indexing/reshaping) and array arithmetic. These correspond roughly to ShapedLikeNDArray in astropy and NDArrayOperatorMixin in numpy (missing so far is concatenation). The advantage of the ABC classes is that they can supply missing methods (say, size, isscalar, __len__, and ndim given shape; __iter__ given __getitem__, ravel, squeeze, flatten given reshape; etc.). -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregor.thalhammer at gmail.com Sun Mar 11 15:52:46 2018 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Sun, 11 Mar 2018 20:52:46 +0100 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: References: <23471BD4-A81B-4B9C-AECC-D161C3643B81@gmail.com> Message-ID: > Am 09.03.2018 um 02:06 schrieb Nathaniel Smith : > > On Thu, Mar 8, 2018 at 1:52 AM, Gregor Thalhammer > > wrote: >> >> Hi, >> >> long time ago I wrote a wrapper to to use optimised and parallelized math >> functions from Intels vector math library >> geggo/uvml: Provide vectorized math function (MKL) for numpy >> >> I found it useful to inject (some of) the fast methods into numpy via >> np.set_num_ops(), to gain more performance without changing my programs. >> >> While this original project is outdated, I can imagine that a centralised >> way to swap the implementation of math functions is useful. Therefor I >> suggest to keep np.set_num_ops(), but admittedly I do not understand all the >> technical implications of the proposed change. > > The main part of the proposal is to merge the two libraries; the > question of whether to deprecate set_numeric_ops is a bit separate. > There's no technical obstacle to keeping it, except the usual issue of > having more cruft to maintain :-). > > It's usually true that any monkeypatching interface will be useful to > someone under some circumstances, but we usually don't consider this a > good enough reason on its own to add and maintain these kinds of > interfaces. And an unfortunate side-effect of these kinds of hacky > interfaces is that they can end up removing the pressure to solve > problems properly. In this case, better solutions would include: > > - Adding support for accelerated vector math libraries to NumPy > directly (e.g. MKL, yeppp) > > - Overriding the inner loops inside ufuncs like numpy.add that > np.ndarray.__add__ ultimately calls. This would speed up all addition > (whether or not it uses Python + syntax), would be a more general > solution (e.g. you could monkeypatch np.exp to use MKL's fast > vectorized exp), would let you skip reimplementing all the tricky > shared bits of the ufunc logic, etc. Conceptually it's not even very > hacky, because we allow you add new loops to existing ufuncs; making > it possible to replace existing loops wouldn't be a big stretch. (In > fact it's possible that we already allow this; I haven't checked.) > > So I still lean towards deprecating set_numeric_ops. 
It's not the most > crucial part of the proposal though; if it turns out to be too > controversial then I'll take it out. Dear Nathaniel, since you referred to your reply in your latest post in this thread I comment here. First, I agree that set_numeric_ops() is not very important for replacing numpy math functions with faster implementations, mostly because this covers only the basic operations (+, *, boolean operations), which are fast anyhow, only pow can be accelerated by a substantial factor. I also agree that adding support for optimised math function libraries directly to numpy might be a better solution than patching numpy. But in the past there have been a couple of proposals to add fast vectorised math functions directly to numpy, e.g. for a GSoC project. There have always been long discussions about maintainability, testing, vendor lock-in, free versus non-free software ? all attempts failed. Only the Intel accelerated Python distribution claims that it boosted performance for transcendental functions, but I do not know how they achieved this and if this could be integrated in the official numpy. Therefor I think there is some need for an ?official? way to swap numpy math functions at the user (Python) level at runtime. As Julian commented, you want this flexibility because of speed and accuracy trade-offs. Just replacing the inner loop might be an alternative way, but I am not sure. Many optimised vector math libraries require contiguous arrays, so they don?t fulfil the expectations numpy has for an inner loop. So you would need to allocate memory, copy, and free memory for each call to the inner loop. I image this gives quite some overhead you could avoid by a completely custom ufunc. On the other hand, setting up a ufunc from inner loop functions is easy, you can reuse all the numpy machinery. I disagree with you that you have to reimplement the whole ufunc machinery if you swap math functions at the ufunc level. Stupid question: how to get the first argument of int PyUFunc_ReplaceLoopBySignature(PyUFuncObject * ufunc, e.g. for np.add ? So, please consider this when refactoring/redesigning the ufunc module. Gregor > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Mon Mar 12 12:05:15 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Mon, 12 Mar 2018 12:05:15 -0400 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: References: Message-ID: Hi Nathanial, I looked through the revised text at https://github.com/numpy/numpy/pull/10704 and think it covers things well; any improvements on the organisation I can think of would seem to start with doing the merge anyway (e.g., I quite like Eric Wieser's suggested base ndarray class; the additional bits that implement operators might quite easily become useful for duck arrays). One request: can it be part of the NEP to aim to document the organisation of the whole more clearly? For me at least, one of the big hurdles to trying to contribute to the C code has been the absence of a mental picture of how it all hangs together. 
All the best, Marten From charlesr.harris at gmail.com Mon Mar 12 14:25:42 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 12 Mar 2018 12:25:42 -0600 Subject: [Numpy-discussion] NumPy 1.14.2 released Message-ID: Hi All, I am pleased to announce the release of NumPy 1.14.2. This is a bugfix release for some bugs reported following the 1.14.1 release. The major problems dealt with are as follows. - Residual bugs in the new array printing functionality. - Regression resulting in a relocation problem with shared library. - Improved PyPy compatibility. This release supports Python 2.7 and 3.4 - 3.6. Wheels for the release are available on PyPI. Source tarballs, zipfiles, release notes, and the changelog are available on github . The Python 3.6 wheels available from PIP are built with Python 3.6.2 and should be compatible with all previous versions of Python 3.6. The source releases were cythonized with Cython 0.26.1, which is known to *not* support the upcoming Python 3.7 release. People who wish to run Python 3.7 should check out the NumPy repo and try building with the, as yet, unreleased master branch of Cython. Contributors ============ A total of 4 people contributed to this release. People with a "+" by their names contributed a patch for the first time. * Allan Haldane * Charles Harris * Eric Wieser * Pauli Virtanen Pull requests merged ==================== A total of 5 pull requests were merged for this release. * `#10674 `__: BUG: Further back-compat fix for subclassed array repr * `#10725 `__: BUG: dragon4 fractional output mode adds too many trailing zeros * `#10726 `__: BUG: Fix f2py generated code to work on PyPy * `#10727 `__: BUG: Fix missing NPY_VISIBILITY_HIDDEN on npy_longdouble_to_PyLong * `#10729 `__: DOC: Create 1.14.2 notes and changelog. Cheers, Charles Harris -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Mar 12 14:44:31 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 12 Mar 2018 12:44:31 -0600 Subject: [Numpy-discussion] NumPy 1.15 release schedule Message-ID: Hi All, I'm thinking of branching NumPy in the middle/end of April. That is quicker than usual, but there don't seem to be any major changes proposed for the near future, we have merged a reasonable number of PRs, and a Python 3.7 compatible release of Cython looks to be forthcoming. An early release will also give us time for possibly two following releases before we drop Python 2.7 support. With that schedule, I also propose to drop Python 3.4 support in NumPy 1.16. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Mar 12 15:01:40 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 12 Mar 2018 13:01:40 -0600 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: References: Message-ID: On Thu, Mar 8, 2018 at 1:25 AM, Nathaniel Smith wrote: > Hi all, > > Well, this is something that we've discussed for a while and I think > generally has consensus already, but I figured I'd write it down > anyway to make sure. > > There's a rendered version here: > https://github.com/njsmith/numpy/blob/nep-0015-merge- > multiarray-umath/doc/neps/nep-0015-merge-multiarray-umath.rst > > ----- > > ============================ > Merging multiarray and umath > ============================ > > :Author: Nathaniel J. 
Smith > :Status: Draft > :Type: Standards Track > :Created: 2018-02-22 > > > Abstract > -------- > > Let's merge ``numpy.core.multiarray`` and ``numpy.core.umath`` into a > single extension module, and deprecate ``np.set_numeric_ops``. > > > Background > ---------- > > Currently, numpy's core C code is split between two separate extension > modules. > > ``numpy.core.multiarray`` is built from > ``numpy/core/src/multiarray/*.c``, and contains the core array > functionality (in particular, the ``ndarray`` object). > > ``numpy.core.umath`` is built from ``numpy/core/src/umath/*.c``, and > contains the ufunc machinery. > > These two modules each expose their own separate C API, accessed via > ``import_multiarray()`` and ``import_umath()`` respectively. The idea > is that they're supposed to be independent modules, with > ``multiarray`` as a lower-level layer with ``umath`` built on top. In > practice this has turned out to be problematic. > > First, the layering isn't perfect: when you write ``ndarray + > ndarray``, this invokes ``ndarray.__add__``, which then calls the > ufunc ``np.add``. This means that ``ndarray`` needs to know about > ufuncs ? so instead of a clean layering, we have a circular > dependency. To solve this, ``multiarray`` exports a somewhat > terrifying function called ``set_numeric_ops``. The bootstrap > procedure each time you ``import numpy`` is: > > 1. ``multiarray`` and its ``ndarray`` object are loaded, but > arithmetic operations on ndarrays are broken. > > 2. ``umath`` is loaded. > > 3. ``set_numeric_ops`` is used to monkeypatch all the methods like > ``ndarray.__add__`` with objects from ``umath``. > > In addition, ``set_numeric_ops`` is exposed as a public API, > ``np.set_numeric_ops``. > > Furthermore, even when this layering does work, it ends up distorting > the shape of our public ABI. In recent years, the most common reason > for adding new functions to ``multiarray``\'s "public" ABI is not that > they really need to be public or that we expect other projects to use > them, but rather just that we need to call them from ``umath``. This > is extremely unfortunate, because it makes our public ABI > unnecessarily large, and since we can never remove things from it then > this creates an ongoing maintenance burden. The way C works, you can > have internal API that's visible to everything inside the same > extension module, or you can have a public API that everyone can use; > you can't have an API that's visible to multiple extension modules > inside numpy, but not to external users. > > We've also increasingly been putting utility code into > ``numpy/core/src/private/``, which now contains a bunch of files which > are ``#include``\d twice, once into ``multiarray`` and once into > ``umath``. This is pretty gross, and is purely a workaround for these > being separate C extensions. > > > Proposed changes > ---------------- > > This NEP proposes three changes: > > 1. We should start building ``numpy/core/src/multiarray/*.c`` and > ``numpy/core/src/umath/*.c`` together into a single extension > module. > > 2. Instead of ``set_numeric_ops``, we should use some new, private API > to set up ``ndarray.__add__`` and friends. > > 3. We should deprecate, and eventually remove, ``np.set_numeric_ops``. > > > Non-proposed changes > -------------------- > > We don't necessarily propose to throw away the distinction between > multiarray/ and umath/ in terms of our source code organization: > internal organization is useful! 
We just want to build them together > into a single extension module. Of course, this does open the door for > potential future refactorings, which we can then evaluate based on > their merits as they come up. > > It also doesn't propose that we break the public C ABI. We should > continue to provide ``import_multiarray()`` and ``import_umath()`` > functions ? it's just that now both ABIs will ultimately be loaded > from the same C library. Due to how ``import_multiarray()`` and > ``import_umath()`` are written, we'll also still need to have modules > called ``numpy.core.multiarray`` and ``numpy.core.umath``, and they'll > need to continue to export ``_ARRAY_API`` and ``_UFUNC_API`` objects ? > but we can make one or both of these modules be tiny shims that simply > re-export the magic API object from where-ever it's actually defined. > (See ``numpy/core/code_generators/generate_{numpy,ufunc}_api.py`` for > details of how these imports work.) > > > Backward compatibility > ---------------------- > > The only compatibility break is the deprecation of ``np.set_numeric_ops``. > > > Alternatives > ------------ > > n/a > > > Discussion > ---------- > > TBD > > > Copyright > --------- > > This document has been placed in the public domain. > If we accept this NEP, I'd like to get it done soon, preferably and the next few months, so that it is finished before we drop Python 2.7 support. That will make maintenance of the NumPy long term support release through 2019 easier. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Mar 12 15:25:20 2018 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 12 Mar 2018 12:25:20 -0700 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: References: Message-ID: On Mar 12, 2018 12:02, "Charles R Harris" wrote: If we accept this NEP, I'd like to get it done soon, preferably and the next few months, so that it is finished before we drop Python 2.7 support. That will make maintenance of the NumPy long term support release through 2019 easier. The reason you're seeing this spurt of activity on NEPs and NEP infrastructure from people at Berkeley is that we're preparing for the upcoming arrival of full time devs on the numpy grant. (More announcements there soon.) So if it's accepted then I don't think there will be any problem getting it implemented by then. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Mar 12 15:40:42 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 12 Mar 2018 13:40:42 -0600 Subject: [Numpy-discussion] New NEP: merging multiarray and umath In-Reply-To: References: Message-ID: On Mon, Mar 12, 2018 at 1:25 PM, Nathaniel Smith wrote: > On Mar 12, 2018 12:02, "Charles R Harris" > wrote: > > > If we accept this NEP, I'd like to get it done soon, preferably and the > next few months, so that it is finished before we drop Python 2.7 support. > That will make maintenance of the NumPy long term support release through > 2019 easier. > > > The reason you're seeing this spurt of activity on NEPs and NEP > infrastructure from people at Berkeley is that we're preparing for the > upcoming arrival of full time devs on the numpy grant. (More announcements > there soon.) So if it's accepted then I don't think there will be any > problem getting it implemented by then. > Depends on background. Even the best developers need some time to come up to speed on a new project ... 
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From tcaswell at gmail.com Mon Mar 12 17:45:11 2018 From: tcaswell at gmail.com (Thomas Caswell) Date: Mon, 12 Mar 2018 21:45:11 +0000 Subject: [Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram In-Reply-To: References: Message-ID: As commented in the OP, this would be very useful for Matplotlib. Tom On Fri, Mar 9, 2018 at 1:42 PM Kirit Thadaka wrote: > Hi! > > I've created a PR to add a function called "histogram_bin_edges" which > will allow a user to calculate the bins used by the histogram for some data > without requiring the entire histogram to be calculated. > > https://github.com/numpy/numpy/pull/10591#issuecomment-371863472 > > This function allows one set of bins to be computed, and reused across > multiple histograms which gives more easily comparable results than using > separate bins for each histogram. > > Please let me know if you have any suggestions on how to improve this PR. > > Thanks! > > - > Kirit > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Mon Mar 12 19:08:45 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 12 Mar 2018 23:08:45 +0000 Subject: [Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram In-Reply-To: References: Message-ID: As likely one of the primary users, Tom - does the function name seem reasonable? Eric On Mon, Mar 12, 2018, 21:45 Thomas Caswell wrote: > As commented in the OP, this would be very useful for Matplotlib. > > Tom > > On Fri, Mar 9, 2018 at 1:42 PM Kirit Thadaka > wrote: > >> Hi! >> >> I've created a PR to add a function called "histogram_bin_edges" which >> will allow a user to calculate the bins used by the histogram for some data >> without requiring the entire histogram to be calculated. >> >> https://github.com/numpy/numpy/pull/10591#issuecomment-371863472 >> >> This function allows one set of bins to be computed, and reused across >> multiple histograms which gives more easily comparable results than using >> separate bins for each histogram. >> >> Please let me know if you have any suggestions on how to improve this PR. >> >> Thanks! >> >> - >> Kirit >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Mar 12 22:58:09 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 12 Mar 2018 22:58:09 -0400 Subject: [Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram In-Reply-To: References: Message-ID: On Mon, Mar 12, 2018 at 7:08 PM, Eric Wieser wrote: > As likely one of the primary users, Tom - does the function name seem > reasonable? > > Eric > > > On Mon, Mar 12, 2018, 21:45 Thomas Caswell wrote: >> >> As commented in the OP, this would be very useful for Matplotlib. 
>> >> Tom >> >> On Fri, Mar 9, 2018 at 1:42 PM Kirit Thadaka >> wrote: >>> >>> Hi! >>> >>> I've created a PR to add a function called "histogram_bin_edges" which >>> will allow a user to calculate the bins used by the histogram for some data >>> without requiring the entire histogram to be calculated. >>> >>> https://github.com/numpy/numpy/pull/10591#issuecomment-371863472 >>> >>> This function allows one set of bins to be computed, and reused across >>> multiple histograms which gives more easily comparable results than using >>> separate bins for each histogram. Given that the bin selection are data driven, transferring them across datasets might not be so useful. (Aside I usually pick the bin_edges returned by the first histogram to use for any follow-up histograms, or pick something on a common range.) >>> >>> Please let me know if you have any suggestions on how to improve this PR. >>> >>> Thanks! as a bystander: LGTM and I think it's a good idea Josef >>> >>> - >>> Kirit >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From wieser.eric+numpy at gmail.com Mon Mar 12 23:20:17 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 12 Mar 2018 20:20:17 -0700 Subject: [Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram In-Reply-To: References: Message-ID: > Given that the bin selection are data driven, transferring them across datasets might not be so useful. The main application would be to compute bins across the union of all datasets. This is already possibly by using `np.histogram` and discarding the first result, but that's super wasteful. From josef.pktd at gmail.com Mon Mar 12 23:34:41 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 12 Mar 2018 23:34:41 -0400 Subject: [Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram In-Reply-To: References: Message-ID: On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser wrote: >> Given that the bin selection are data driven, transferring them across datasets might not be so useful. > > The main application would be to compute bins across the union of all > datasets. This is already possibly by using `np.histogram` and > discarding the first result, but that's super wasteful. assuming "union" means a combined dataset. If you stack datasets, then the number of observations will not be correct for individual datasets. In that case an additional keyword like nobs, or whatever name would be appropriate for numpy, would be useful, e.g. use the average number of observations across datasets. Auxiliary statistic like std could then be computed on the total dataset (if that makes sense, which would not be the case if the variance across datasets is larger than the variance within datasets. 
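For concreteness, the workaround under discussion (compute the edges once from the pooled data, then reuse them) looks roughly like this -- a sketch only, and the caveat above about the pooled sample size driving the "auto" rules still applies:

import numpy as np

rng = np.random.RandomState(0)
datasets = [rng.normal(0, 1, 1000), rng.normal(0.5, 2, 300)]

# Run np.histogram on the pooled data purely for its edges (the counts
# are thrown away), then bin every dataset on that common grid.
_, edges = np.histogram(np.concatenate(datasets), bins="auto")
counts = [np.histogram(d, bins=edges)[0] for d in datasets]

# The proposed histogram_bin_edges would replace the wasteful first call.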
Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From marko.asplund at gmail.com Wed Mar 14 02:32:20 2018 From: marko.asplund at gmail.com (Marko Asplund) Date: Wed, 14 Mar 2018 08:32:20 +0200 Subject: [Numpy-discussion] numpy.random.randn Message-ID: On Fri, 9 Mar 2018 11:38:55, Robert Kern wrote: > > Sorry for being a bit inaccurate. > > My Scala code actually mirrors the NumPy based random initialization, so > > I sample with Gaussian of mean = 0 and std dev = 1, then multiply with 0.01. > > Have you verified this? I.e. save out the Scala-initialized network and > load it up with numpy to check the mean and std dev? How about if you run > the numpy NN training with the Scala-initialized network? Does that also > diverge? I did what you suggested and it turned out my NumPy NN code was behaving exactly as the Scala code when using Scala-initialized network. After digging deeper into this I managed to find and fix a bug in how I was doing the random initilization and it's working correctly now. Thanks a lot for your help! Marko -------------- next part -------------- An HTML attachment was scrubbed... URL: From jkkulick at amazon.de Wed Mar 14 04:05:42 2018 From: jkkulick at amazon.de (Kulick, Johannes) Date: Wed, 14 Mar 2018 08:05:42 +0000 Subject: [Numpy-discussion] ENH: softmax Message-ID: <968D67CA-5A81-48EB-87CF-B03091A933C2@amazon.com> Hi, I regularly need the softmax function (https://en.wikipedia.org/wiki/Softmax_function) for my code. I have a quite efficient pure python implementation (credits to Nolan Conaway). I think it would be a valuable enhancement of the ndarray class. But since it is kind of a specialty function I wanted to ask you if you would consider it to be part of the numpy core (alongside ndarray.max and ndarray.argmax) or rather in scipy (e.g. scipy.stats seems also an appropriate place). Best Johannes Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Wed Mar 14 04:22:14 2018 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 14 Mar 2018 04:22:14 -0400 Subject: [Numpy-discussion] ENH: softmax In-Reply-To: <968D67CA-5A81-48EB-87CF-B03091A933C2@amazon.com> References: <968D67CA-5A81-48EB-87CF-B03091A933C2@amazon.com> Message-ID: On Wed, Mar 14, 2018 at 4:05 AM, Kulick, Johannes wrote: > Hi, > > > > I regularly need the softmax function (https://en.wikipedia.org/ > wiki/Softmax_function) for my code. I have a quite efficient pure python > implementation (credits to Nolan Conaway). I think it would be a valuable > enhancement of the ndarray class. But since it is kind of a specialty > function I wanted to ask you if you would consider it to be part of the > numpy core (alongside ndarray.max and ndarray.argmax) or rather in scipy > (e.g. scipy.stats seems also an appropriate place). > > Johannes, If the numpy devs aren't interested in adding it to numpy, I'm pretty sure we can get it in scipy. I've had adding it (or at least proposing that it be added) to scipy on my to-do list for quite a while now. 
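For reference, the usual numerically stable formulation is only a few lines of plain NumPy -- a sketch of the standard recipe, not of whatever API eventually lands in scipy:

import numpy as np

def softmax(x, axis=-1):
    # Subtracting the running maximum keeps exp() from overflowing; the
    # result is unchanged because softmax is invariant to constant shifts.
    x = np.asarray(x, dtype=float)
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

print(softmax([1.0, 2.0, 3.0]))     # [0.09003057 0.24472847 0.66524096]
print(softmax([[1000.0, 1000.0]]))  # [[0.5 0.5]] -- no overflow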
Warren > > > Best > > Johannes > > > > Amazon Development Center Germany GmbH > Berlin - Dresden - Aachen > main office: Krausenstr. 38, 10117 Berlin > Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger > Ust-ID: DE289237879 > Eingetragen am Amtsgericht Charlottenburg HRB 149173 B > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Mar 14 04:41:25 2018 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 14 Mar 2018 17:41:25 +0900 Subject: [Numpy-discussion] ENH: softmax In-Reply-To: References: <968D67CA-5A81-48EB-87CF-B03091A933C2@amazon.com> Message-ID: On Wed, Mar 14, 2018 at 5:22 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > > On Wed, Mar 14, 2018 at 4:05 AM, Kulick, Johannes wrote: >> >> Hi, >> >> I regularly need the softmax function ( https://en.wikipedia.org/wiki/Softmax_function) for my code. I have a quite efficient pure python implementation (credits to Nolan Conaway). I think it would be a valuable enhancement of the ndarray class. But since it is kind of a specialty function I wanted to ask you if you would consider it to be part of the numpy core (alongside ndarray.max and ndarray.argmax) or rather in scipy (e.g. scipy.stats seems also an appropriate place). > > Johannes, > > If the numpy devs aren't interested in adding it to numpy, I'm pretty sure we can get it in scipy. I've had adding it (or at least proposing that it be added) to scipy on my to-do list for quite a while now. +1 for scipy.special. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Wed Mar 14 09:27:21 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Wed, 14 Mar 2018 09:27:21 -0400 Subject: [Numpy-discussion] ENH: softmax In-Reply-To: References: <968D67CA-5A81-48EB-87CF-B03091A933C2@amazon.com> Message-ID: I think this indeed makes most sense for scipy. I possible, write it as a `gufunc`, so duck arrays can override with `__array_ufunc__` if necessary. -- Marten From ralf.gommers at gmail.com Wed Mar 14 09:37:46 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 14 Mar 2018 06:37:46 -0700 Subject: [Numpy-discussion] ENH: softmax In-Reply-To: References: <968D67CA-5A81-48EB-87CF-B03091A933C2@amazon.com> Message-ID: On Wed, Mar 14, 2018 at 1:41 AM, Robert Kern wrote: > On Wed, Mar 14, 2018 at 5:22 PM, Warren Weckesser < > warren.weckesser at gmail.com> wrote: > > > > On Wed, Mar 14, 2018 at 4:05 AM, Kulick, Johannes > wrote: > >> > >> Hi, > >> > >> I regularly need the softmax function (https://en.wikipedia.org/ > wiki/Softmax_function) for my code. I have a quite efficient pure python > implementation (credits to Nolan Conaway). I think it would be a valuable > enhancement of the ndarray class. But since it is kind of a specialty > function I wanted to ask you if you would consider it to be part of the > numpy core (alongside ndarray.max and ndarray.argmax) or rather in scipy > (e.g. scipy.stats seems also an appropriate place). > > > > Johannes, > > > > If the numpy devs aren't interested in adding it to numpy, I'm pretty > sure we can get it in scipy. I've had adding it (or at least proposing > that it be added) to scipy on my to-do list for quite a while now. > > +1 for scipy.special. 
> scipy.special sounds right to me too Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Wed Mar 14 09:44:49 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Wed, 14 Mar 2018 06:44:49 -0700 Subject: [Numpy-discussion] ENH: softmax In-Reply-To: References: <968D67CA-5A81-48EB-87CF-B03091A933C2@amazon.com> Message-ID: I possible, write it as a `gufunc`, so duck arrays can override with `__array_ufunc__` if necessary. -- Marten Softmax is a very simple combination of elementary `ufunc`s with two inputs, the weight vector `w` and the data `x`. Writing it as a `gufunc` would be going overboard, IMO. Writing it as a combination of `ufunc`s and avoiding Numpy-specific stuff should be good enough. -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Wed Mar 14 14:01:18 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Wed, 14 Mar 2018 14:01:18 -0400 Subject: [Numpy-discussion] ENH: softmax In-Reply-To: References: <968D67CA-5A81-48EB-87CF-B03091A933C2@amazon.com> Message-ID: On Wed, Mar 14, 2018 at 9:44 AM, Hameer Abbasi wrote: > I possible, write it as a `gufunc`, so duck arrays can override with > `__array_ufunc__` if > > necessary. -- Marten > > Softmax is a very simple combination of elementary `ufunc`s with two inputs, > the weight vector `w` and the data `x`. Writing it as a `gufunc` would be > going overboard, IMO. Writing it as a combination of `ufunc`s and avoiding > Numpy-specific stuff should be good enough. My mistake - I thought the result was reduced, but you only need a reduction along the way. Writing this in terms of standard functions is certainly fine! -- Marten From jkkulick at amazon.de Wed Mar 14 18:04:46 2018 From: jkkulick at amazon.de (Kulick, Johannes) Date: Wed, 14 Mar 2018 22:04:46 +0000 Subject: [Numpy-discussion] ENH: softmax In-Reply-To: References: <968D67CA-5A81-48EB-87CF-B03091A933C2@amazon.com> Message-ID: Alright. Going for scipy.special then. Thanks for the quick answer. Cheers Johannes ?On 14.03.18, 19:03, "NumPy-Discussion on behalf of Marten van Kerkwijk" wrote: On Wed, Mar 14, 2018 at 9:44 AM, Hameer Abbasi wrote: > I possible, write it as a `gufunc`, so duck arrays can override with > `__array_ufunc__` if > > necessary. -- Marten > > Softmax is a very simple combination of elementary `ufunc`s with two inputs, > the weight vector `w` and the data `x`. Writing it as a `gufunc` would be > going overboard, IMO. Writing it as a combination of `ufunc`s and avoiding > Numpy-specific stuff should be good enough. My mistake - I thought the result was reduced, but you only need a reduction along the way. Writing this in terms of standard functions is certainly fine! -- Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. 
Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B From m.h.vankerkwijk at gmail.com Wed Mar 14 21:27:46 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Wed, 14 Mar 2018 21:27:46 -0400 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: References: Message-ID: Apparently, where and how to discuss enhancement proposals was recently a topic on the python mailing list as well -- see the write-up at LWN: https://lwn.net/SubscriberLink/749200/4343911ee71e35cf/ The conclusion seems to be that one should switch to mailman3... -- Marten From stefanv at berkeley.edu Thu Mar 15 18:29:06 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Thu, 15 Mar 2018 15:29:06 -0700 Subject: [Numpy-discussion] NEP sprint: 21 and 22 March In-Reply-To: <20180309232638.vumxg3z4dzfaz3yo@carbo> References: <20180309232638.vumxg3z4dzfaz3yo@carbo> Message-ID: <20180315222906.xc33qjkgas2k55xs@carbo> Hi everyone, A quick reminder of the NEP sprint that will happen at Berkeley next Wednesday and Thursday. Please let me know if you are interested in joining. Best regards St?fan On Fri, 09 Mar 2018 15:26:38 -0800, Stefan van der Walt wrote: > Hi everyone, > > As you may have noticed, there's been quite a bit of movement recently > around NumPy Enhancement Proposals---on setting specifications, > building infrastructure, as well as writing new proposals. > > To further support this work, we will be hosting an informal NEP > sprint at Berkeley on 21 and 22 March. Our aim is to bring core > contributors and interested community members together to discuss > proposal ideas, write up new NEPs, and polish existing ones. > > Some potential topics of discussion are: > > - Duck arrays > - Array concatenation > - Random number generator seed versioning > - User defined dtypes > - Deprecation pathways for `np.matrix` > - What to do about nditer? > > All community members are welcome to attend. If you are a core > contributor, we may be able to fund some travel costs as well; please > let me know. > > Best regards > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From tcaswell at gmail.com Thu Mar 15 22:56:41 2018 From: tcaswell at gmail.com (Thomas Caswell) Date: Fri, 16 Mar 2018 02:56:41 +0000 Subject: [Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram In-Reply-To: References: Message-ID: Yes I like the name. The primary use-case for Matplotlib is that our `hist` method can take in a list of arrays and produces N histograms in one shot. Currently with 'auto' we only use the first data set to sort out what the bins should be and then re-use those for the rest of the data sets. This will let us get the bins on the merged input, but I take Josef's point that this is not actually what we want.... Tom On Mon, Mar 12, 2018 at 11:35 PM wrote: > On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser > wrote: > >> Given that the bin selection are data driven, transferring them across > datasets might not be so useful. > > > > The main application would be to compute bins across the union of all > > datasets. This is already possibly by using `np.histogram` and > > discarding the first result, but that's super wasteful. > > assuming "union" means a combined dataset. 
> > If you stack datasets, then the number of observations will not be > correct for individual datasets. > > In that case an additional keyword like nobs, or whatever name would > be appropriate for numpy, would be useful, e.g. use the average number > of observations across datasets. > Auxiliary statistic like std could then be computed on the total > dataset (if that makes sense, which would not be the case if the > variance across datasets is larger than the variance within datasets. > > Josef > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 15 23:13:47 2018 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 15 Mar 2018 20:13:47 -0700 Subject: [Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram In-Reply-To: References: Message-ID: Instead of an nobs argument, maybe we should have a version that accepts multiple data sets, so that we have the full information and can improve the algorithm over time. On Mar 15, 2018 7:57 PM, "Thomas Caswell" wrote: > Yes I like the name. > > The primary use-case for Matplotlib is that our `hist` method can take in > a list of arrays and produces N histograms in one shot. Currently with > 'auto' we only use the first data set to sort out what the bins should be > and then re-use those for the rest of the data sets. This will let us get > the bins on the merged input, but I take Josef's point that this is not > actually what we want.... > > Tom > > On Mon, Mar 12, 2018 at 11:35 PM wrote: > >> On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser >> wrote: >> >> Given that the bin selection are data driven, transferring them across >> datasets might not be so useful. >> > >> > The main application would be to compute bins across the union of all >> > datasets. This is already possibly by using `np.histogram` and >> > discarding the first result, but that's super wasteful. >> >> assuming "union" means a combined dataset. >> >> If you stack datasets, then the number of observations will not be >> correct for individual datasets. >> >> In that case an additional keyword like nobs, or whatever name would >> be appropriate for numpy, would be useful, e.g. use the average number >> of observations across datasets. >> Auxiliary statistic like std could then be computed on the total >> dataset (if that makes sense, which would not be the case if the >> variance across datasets is larger than the variance within datasets. >> >> Josef >> >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> > https://mail.python.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
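A rough sketch of what such a multi-data-set variant could look like; the name histogram_bin_edges_multi and the simple pooling strategy are placeholders for discussion, not a worked-out proposal:

import numpy as np

def histogram_bin_edges_multi(datasets, bins="auto", range=None):
    # Naive strategy: pool all samples and let the existing bin-width rules
    # run on the pooled data.  A smarter version could instead weight the
    # rules by the typical per-data-set sample size.
    pooled = np.concatenate([np.ravel(d) for d in datasets])
    _, edges = np.histogram(pooled, bins=bins, range=range)  # counts discarded
    return edges

rng = np.random.RandomState(42)
data = [rng.standard_normal(500), rng.standard_normal(80) + 3]
edges = histogram_bin_edges_multi(data)
counts = [np.histogram(d, bins=edges)[0] for d in data]

Once the proposed histogram_bin_edges lands, the np.histogram call above could be replaced by it directly.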
URL: From ralf.gommers at gmail.com Fri Mar 16 00:35:40 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 15 Mar 2018 21:35:40 -0700 Subject: [Numpy-discussion] NumPy 1.15 release schedule In-Reply-To: References: Message-ID: On Mon, Mar 12, 2018 at 11:44 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > Hi All, > > I'm thinking of branching NumPy in the middle/end of April. That is > quicker than usual, but there don't seem to be any major changes proposed > for the near future, we have merged a reasonable number of PRs, and a > Python 3.7 compatible release of Cython looks to be forthcoming. An early > release will also give us time for possibly two following releases before > we drop Python 2.7 support. With that schedule, I also propose to drop > Python 3.4 support in NumPy 1.16. > > Thoughts? > Sounds fine to me. Thanks Chuck! Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Fri Mar 16 00:45:19 2018 From: matti.picus at gmail.com (matti picus) Date: Fri, 16 Mar 2018 04:45:19 +0000 Subject: [Numpy-discussion] NEP sprint: 21 and 22 March In-Reply-To: <20180315222906.xc33qjkgas2k55xs@carbo> References: <20180309232638.vumxg3z4dzfaz3yo@carbo> <20180315222906.xc33qjkgas2k55xs@carbo> Message-ID: I would love to join but I will be at the PyPy yearly sprint in Switzerland from Saturday to Wednesday, and traveling back to Israel on Thursday. I can join virtually Wednesday, my evening will be your morning. I begin traveling Thurs morning which is sometime Wed afternoon for you and will be offline until I arrive home around 20:00 Israel time, which is Thurs morning. Matti On Fri, 16 Mar 2018 at 00:29, Stefan van der Walt wrote: > Hi everyone, > > A quick reminder of the NEP sprint that will happen at Berkeley next > Wednesday and Thursday. Please let me know if you are interested in > joining. > > Best regards > St?fan > > On Fri, 09 Mar 2018 15:26:38 -0800, Stefan van der Walt wrote: > > Hi everyone, > > > > As you may have noticed, there's been quite a bit of movement recently > > around NumPy Enhancement Proposals---on setting specifications, > > building infrastructure, as well as writing new proposals. > > > > To further support this work, we will be hosting an informal NEP > > sprint at Berkeley on 21 and 22 March. Our aim is to bring core > > contributors and interested community members together to discuss > > proposal ideas, write up new NEPs, and polish existing ones. > > > > Some potential topics of discussion are: > > > > - Duck arrays > > - Array concatenation > > - Random number generator seed versioning > > - User defined dtypes > > - Deprecation pathways for `np.matrix` > > - What to do about nditer? > > > > All community members are welcome to attend. If you are a core > > contributor, we may be able to fund some travel costs as well; please > > let me know. > > > > Best regards > > St?fan > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wieser.eric+numpy at gmail.com Fri Mar 16 01:09:52 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Fri, 16 Mar 2018 05:09:52 +0000 Subject: [Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram In-Reply-To: References: Message-ID: That sounds like a reasonable extension - but I think there still exist cases where you want to treat the data as one uniform set when computing bins (toggling between orthogonal subsets of data) so isn't really a useful replacement. I suppose this becomes relevant when `density` is passed to the individual histogram invocations. Does matplotlib handle that correctly for stacked histograms? On Thu, Mar 15, 2018, 20:14 Nathaniel Smith wrote: > Instead of an nobs argument, maybe we should have a version that accepts > multiple data sets, so that we have the full information and can improve > the algorithm over time. > > On Mar 15, 2018 7:57 PM, "Thomas Caswell" wrote: > >> Yes I like the name. >> >> The primary use-case for Matplotlib is that our `hist` method can take in >> a list of arrays and produces N histograms in one shot. Currently with >> 'auto' we only use the first data set to sort out what the bins should be >> and then re-use those for the rest of the data sets. This will let us get >> the bins on the merged input, but I take Josef's point that this is not >> actually what we want.... >> >> Tom >> >> On Mon, Mar 12, 2018 at 11:35 PM wrote: >> >>> On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser >>> wrote: >>> >> Given that the bin selection are data driven, transferring them >>> across datasets might not be so useful. >>> > >>> > The main application would be to compute bins across the union of all >>> > datasets. This is already possibly by using `np.histogram` and >>> > discarding the first result, but that's super wasteful. >>> >>> assuming "union" means a combined dataset. >>> >>> If you stack datasets, then the number of observations will not be >>> correct for individual datasets. >>> >>> In that case an additional keyword like nobs, or whatever name would >>> be appropriate for numpy, would be useful, e.g. use the average number >>> of observations across datasets. >>> Auxiliary statistic like std could then be computed on the total >>> dataset (if that makes sense, which would not be the case if the >>> variance across datasets is larger than the variance within datasets. >>> >>> Josef >>> >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at python.org >>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Fri Mar 16 03:06:58 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 16 Mar 2018 00:06:58 -0700 Subject: [Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram In-Reply-To: References: Message-ID: Oh sure, I'm not suggesting it be impossible to calculate for a single data set. If nothing else, if we had a version that accepted a list of data sets, then you could always pass in a single-element list :-). On Mar 15, 2018 22:10, "Eric Wieser" wrote: > That sounds like a reasonable extension - but I think there still exist > cases where you want to treat the data as one uniform set when computing > bins (toggling between orthogonal subsets of data) so isn't really a useful > replacement. > > I suppose this becomes relevant when `density` is passed to the individual > histogram invocations. Does matplotlib handle that correctly for stacked > histograms? > > On Thu, Mar 15, 2018, 20:14 Nathaniel Smith wrote: > >> Instead of an nobs argument, maybe we should have a version that accepts >> multiple data sets, so that we have the full information and can improve >> the algorithm over time. >> >> On Mar 15, 2018 7:57 PM, "Thomas Caswell" wrote: >> >>> Yes I like the name. >>> >>> The primary use-case for Matplotlib is that our `hist` method can take >>> in a list of arrays and produces N histograms in one shot. Currently with >>> 'auto' we only use the first data set to sort out what the bins should be >>> and then re-use those for the rest of the data sets. This will let us get >>> the bins on the merged input, but I take Josef's point that this is not >>> actually what we want.... >>> >>> Tom >>> >>> On Mon, Mar 12, 2018 at 11:35 PM wrote: >>> >>>> On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser >>>> wrote: >>>> >> Given that the bin selection are data driven, transferring them >>>> across datasets might not be so useful. >>>> > >>>> > The main application would be to compute bins across the union of all >>>> > datasets. This is already possibly by using `np.histogram` and >>>> > discarding the first result, but that's super wasteful. >>>> >>>> assuming "union" means a combined dataset. >>>> >>>> If you stack datasets, then the number of observations will not be >>>> correct for individual datasets. >>>> >>>> In that case an additional keyword like nobs, or whatever name would >>>> be appropriate for numpy, would be useful, e.g. use the average number >>>> of observations across datasets. >>>> Auxiliary statistic like std could then be computed on the total >>>> dataset (if that makes sense, which would not be the case if the >>>> variance across datasets is larger than the variance within datasets. 
>>>> >>>> Josef >>>> >>>> > _______________________________________________ >>>> > NumPy-Discussion mailing list >>>> > NumPy-Discussion at python.org >>>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Mar 16 03:14:43 2018 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 16 Mar 2018 07:14:43 +0000 Subject: [Numpy-discussion] NEP sprint: 21 and 22 March In-Reply-To: <20180315222906.xc33qjkgas2k55xs@carbo> References: <20180309232638.vumxg3z4dzfaz3yo@carbo> <20180315222906.xc33qjkgas2k55xs@carbo> Message-ID: I will not be joining you for this sprint, but will be in the Bay Area from May 12th to May 25th, and wouldn't mind spending a day visiting you. If it works for you and anyone else want to join we could try to give it a little more structure than "just came over to say hi!" Jaime On Thu, Mar 15, 2018 at 11:29 PM Stefan van der Walt wrote: > Hi everyone, > > A quick reminder of the NEP sprint that will happen at Berkeley next > Wednesday and Thursday. Please let me know if you are interested in > joining. > > Best regards > St?fan > > On Fri, 09 Mar 2018 15:26:38 -0800, Stefan van der Walt wrote: > > Hi everyone, > > > > As you may have noticed, there's been quite a bit of movement recently > > around NumPy Enhancement Proposals---on setting specifications, > > building infrastructure, as well as writing new proposals. > > > > To further support this work, we will be hosting an informal NEP > > sprint at Berkeley on 21 and 22 March. Our aim is to bring core > > contributors and interested community members together to discuss > > proposal ideas, write up new NEPs, and polish existing ones. > > > > Some potential topics of discussion are: > > > > - Duck arrays > > - Array concatenation > > - Random number generator seed versioning > > - User defined dtypes > > - Deprecation pathways for `np.matrix` > > - What to do about nditer? > > > > All community members are welcome to attend. If you are a core > > contributor, we may be able to fund some travel costs as well; please > > let me know. > > > > Best regards > > St?fan > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Fri Mar 16 09:43:41 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 16 Mar 2018 09:43:41 -0400 Subject: [Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram In-Reply-To: References: Message-ID: passing a list of arrays would be useful (aside of discriminating between list and array_like) In that case I would add a keyword like "within=True" to compute the additional statistics like std or iqr on the group demeaned data. This would remove the effect of (mean-)shifted datasets on those auxiliary statistics. aside: An alternative to using a list of arrays would be to include a "groups" indicator as keyword, and if it is not None, then compute based on averages across groups or pooled within statistics. Josef On Fri, Mar 16, 2018 at 3:06 AM, Nathaniel Smith wrote: > Oh sure, I'm not suggesting it be impossible to calculate for a single data > set. If nothing else, if we had a version that accepted a list of data sets, > then you could always pass in a single-element list :-). > > On Mar 15, 2018 22:10, "Eric Wieser" wrote: >> >> That sounds like a reasonable extension - but I think there still exist >> cases where you want to treat the data as one uniform set when computing >> bins (toggling between orthogonal subsets of data) so isn't really a useful >> replacement. >> >> I suppose this becomes relevant when `density` is passed to the individual >> histogram invocations. Does matplotlib handle that correctly for stacked >> histograms? >> >> On Thu, Mar 15, 2018, 20:14 Nathaniel Smith wrote: >>> >>> Instead of an nobs argument, maybe we should have a version that accepts >>> multiple data sets, so that we have the full information and can improve the >>> algorithm over time. >>> >>> On Mar 15, 2018 7:57 PM, "Thomas Caswell" wrote: >>>> >>>> Yes I like the name. >>>> >>>> The primary use-case for Matplotlib is that our `hist` method can take >>>> in a list of arrays and produces N histograms in one shot. Currently with >>>> 'auto' we only use the first data set to sort out what the bins should be >>>> and then re-use those for the rest of the data sets. This will let us get >>>> the bins on the merged input, but I take Josef's point that this is not >>>> actually what we want.... >>>> >>>> Tom >>>> >>>> On Mon, Mar 12, 2018 at 11:35 PM wrote: >>>>> >>>>> On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser >>>>> wrote: >>>>> >> Given that the bin selection are data driven, transferring them >>>>> >> across datasets might not be so useful. >>>>> > >>>>> > The main application would be to compute bins across the union of all >>>>> > datasets. This is already possibly by using `np.histogram` and >>>>> > discarding the first result, but that's super wasteful. >>>>> >>>>> assuming "union" means a combined dataset. >>>>> >>>>> If you stack datasets, then the number of observations will not be >>>>> correct for individual datasets. >>>>> >>>>> In that case an additional keyword like nobs, or whatever name would >>>>> be appropriate for numpy, would be useful, e.g. use the average number >>>>> of observations across datasets. >>>>> Auxiliary statistic like std could then be computed on the total >>>>> dataset (if that makes sense, which would not be the case if the >>>>> variance across datasets is larger than the variance within datasets. 
>>>>> >>>>> Josef >>>>> >>>>> > _______________________________________________ >>>>> > NumPy-Discussion mailing list >>>>> > NumPy-Discussion at python.org >>>>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at python.org >>>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From einstein.edison at gmail.com Fri Mar 16 13:10:06 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Fri, 16 Mar 2018 10:10:06 -0700 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) Message-ID: Hello, everyone. I?ve submitted a PR to add a initializer kwarg to ufunc.reduce. This is useful in a few cases, e.g., it allows one to supply a ?default? value for identity-less ufunc reductions, and specify an initial value for reductions such as sum (other than zero.) Please feel free to review or leave feedback, (although I think Eric and Marten have picked it apart pretty well). https://github.com/numpy/numpy/pull/10635 Thanks, Hameer Sent from Astro for Mac -------------- next part -------------- An HTML attachment was scrubbed... URL: From tcaswell at gmail.com Sat Mar 17 17:42:01 2018 From: tcaswell at gmail.com (Thomas Caswell) Date: Sat, 17 Mar 2018 21:42:01 +0000 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> Message-ID: It would be nice if there was an IntEnum [1] that was taken is an input to `np.asarrayish` and `np.isarrayish` to require a combination of the groups of attributes/methods/semantics. Tom [1] https://docs.python.org/3/library/enum.html#intenum On Sat, Mar 10, 2018 at 7:14 PM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > > ?I think we don't have to make it sounds like there are *that* many types > of compatibility: really there is just array organisation > (indexing/reshaping) and array arithmetic. These correspond roughly to > ShapedLikeNDArray in astropy and NDArrayOperatorMixin in numpy (missing so > far is concatenation). The advantage of the ABC classes is that they can > supply missing methods (say, size, isscalar, __len__, and ndim given shape; > __iter__ given __getitem__, ravel, squeeze, flatten given reshape; etc.). > > -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From einstein.edison at gmail.com Sat Mar 17 18:01:57 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Sat, 17 Mar 2018 15:01:57 -0700 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: Message-ID: It would be nice if there was an IntEnum [1] that was taken is an input to `np.asarrayish` and `np.isarrayish` to require a combination of the groups of attributes/methods/semantics. Don?t you mean IntFlag ? I like Marten?s idea of ?grouping together? related functionality via ABCs and implementing different parts via ABCs (for example, in pydata/sparse we use NDArrayOperatorsMixin for exactly this), but I believe that separate ABCs should be provided for different parts of the interface. Then we can either: 1. Check with isinstance for the ABCs, or 2. Check with hasattr. I like the IntFlag idea most (it seems to be designed for use-cases like these), but a string-based (np.aspyarray(x, functionality=?arithmetic|reductions')) or list-based (np.aspyarray(x, functionality=[?arithmetic?, ?reductions?]) is also fine. It might help to have some sort of a ?dry-run? interface that (given a run of code) figures out which parts you need. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tcaswell at gmail.com Sat Mar 17 18:09:51 2018 From: tcaswell at gmail.com (Thomas Caswell) Date: Sat, 17 Mar 2018 22:09:51 +0000 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: Message-ID: Yes, meant IntFlag :sheep: On Sat, Mar 17, 2018 at 6:02 PM Hameer Abbasi wrote: > > It would be nice if there was an IntEnum [1] that was taken is an input to > `np.asarrayish` and `np.isarrayish` to require a combination of the groups > of attributes/methods/semantics. > > > Don?t you mean IntFlag > ? I like Marten?s > idea of ?grouping together? related functionality via ABCs and implementing > different parts via ABCs (for example, in pydata/sparse we use > NDArrayOperatorsMixin for exactly this), but I believe that separate ABCs > should be provided for different parts of the interface. > > Then we can either: > > 1. Check with isinstance for the ABCs, or > 2. Check with hasattr. > > I like the IntFlag idea most (it seems to be designed for use-cases like > these), but a string-based (np.aspyarray(x, > functionality=?arithmetic|reductions')) or list-based (np.aspyarray(x, > functionality=[?arithmetic?, ?reductions?]) is also fine. > > It might help to have some sort of a ?dry-run? interface that (given a run > of code) figures out which parts you need. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Sat Mar 17 20:25:59 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sun, 18 Mar 2018 00:25:59 +0000 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: Message-ID: I would have thought that a simple tuple of types would be more appropriate than using integer flags, since that means that isinstance can be used on the individual elements. Ideally there?d be a typing.Intersection[TraitA, TraitB] for this kind of thing. ? 
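As a rough sketch of the IntFlag idea, where each flag stands for one group of attributes/methods/semantics; every name below (ArrayishTraits, __arrayish_traits__) is invented for illustration and is not existing NumPy API:

from enum import IntFlag, auto

class ArrayishTraits(IntFlag):
    INDEXING = auto()
    RESHAPING = auto()
    ARITHMETIC = auto()
    REDUCTIONS = auto()

def provides(obj, required):
    # Hypothetical protocol: an object advertises its capability groups via
    # an __arrayish_traits__ attribute; flags are combined with | and the
    # requirement is tested with &.
    offered = getattr(obj, '__arrayish_traits__', ArrayishTraits(0))
    return (offered & required) == required

# e.g. a hypothetical np.asarrayish(x, ArrayishTraits.ARITHMETIC | ArrayishTraits.REDUCTIONS)
# could pass x through only when provides(x, ...) is true.
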
On Sat, 17 Mar 2018 at 15:10 Thomas Caswell wrote: > Yes, meant IntFlag :sheep: > > On Sat, Mar 17, 2018 at 6:02 PM Hameer Abbasi > wrote: > >> >> It would be nice if there was an IntEnum [1] that was taken is an input >> to `np.asarrayish` and `np.isarrayish` to require a combination of the >> groups of attributes/methods/semantics. >> >> >> Don?t you mean IntFlag >> ? I like Marten?s >> idea of ?grouping together? related functionality via ABCs and implementing >> different parts via ABCs (for example, in pydata/sparse we use >> NDArrayOperatorsMixin for exactly this), but I believe that separate ABCs >> should be provided for different parts of the interface. >> >> Then we can either: >> >> 1. Check with isinstance for the ABCs, or >> 2. Check with hasattr. >> >> I like the IntFlag idea most (it seems to be designed for use-cases like >> these), but a string-based (np.aspyarray(x, >> functionality=?arithmetic|reductions')) or list-based (np.aspyarray(x, >> functionality=[?arithmetic?, ?reductions?]) is also fine. >> >> It might help to have some sort of a ?dry-run? interface that (given a >> run of code) figures out which parts you need. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sun Mar 18 11:57:32 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sun, 18 Mar 2018 11:57:32 -0400 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: Message-ID: Yes, a tuple of types would make more sense, given `isinstance` -- string abbreviations for those could be there for convenience. -- Marten On Sat, Mar 17, 2018 at 8:25 PM, Eric Wieser wrote: > I would have thought that a simple tuple of types would be more appropriate > than using integer flags, since that means that isinstance can be used on > the individual elements. Ideally there?d be a typing.Intersection[TraitA, > TraitB] for this kind of thing. > > > On Sat, 17 Mar 2018 at 15:10 Thomas Caswell wrote: >> >> Yes, meant IntFlag :sheep: >> >> On Sat, Mar 17, 2018 at 6:02 PM Hameer Abbasi >> wrote: >>> >>> >>> It would be nice if there was an IntEnum [1] that was taken is an input >>> to `np.asarrayish` and `np.isarrayish` to require a combination of the >>> groups of attributes/methods/semantics. >>> >>> >>> Don?t you mean IntFlag? I like Marten?s idea of ?grouping together? >>> related functionality via ABCs and implementing different parts via ABCs >>> (for example, in pydata/sparse we use NDArrayOperatorsMixin for exactly >>> this), but I believe that separate ABCs should be provided for different >>> parts of the interface. >>> >>> Then we can either: >>> >>> Check with isinstance for the ABCs, or >>> Check with hasattr. >>> >>> I like the IntFlag idea most (it seems to be designed for use-cases like >>> these), but a string-based (np.aspyarray(x, >>> functionality=?arithmetic|reductions')) or list-based (np.aspyarray(x, >>> functionality=[?arithmetic?, ?reductions?]) is also fine. >>> >>> It might help to have some sort of a ?dry-run? interface that (given a >>> run of code) figures out which parts you need. 
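For comparison, a sketch of the tuple-of-types alternative, with small trait ABCs that can also supply derived defaults in the way Marten describes; the class names are again purely illustrative:

from abc import ABC, abstractmethod

class ShapedLike(ABC):
    @property
    @abstractmethod
    def shape(self): ...

    @property
    def ndim(self):
        # a default derived from shape, the kind of helper an ABC can supply
        return len(self.shape)

class ArithmeticLike(ABC):
    @abstractmethod
    def __add__(self, other): ...

def satisfies(obj, required=(ShapedLike, ArithmeticLike)):
    # require every trait in the tuple, checking each element with isinstance
    return all(isinstance(obj, trait) for trait in required)
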
>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Mon Mar 19 21:06:10 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 19 Mar 2018 19:06:10 -0600 Subject: [Numpy-discussion] NEP sprint: 21 and 22 March In-Reply-To: References: <20180309232638.vumxg3z4dzfaz3yo@carbo> <20180315222906.xc33qjkgas2k55xs@carbo> Message-ID: On Fri, Mar 16, 2018 at 1:14 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > I will not be joining you for this sprint, but will be in the Bay Area > from May 12th to May 25th, and wouldn't mind spending a day visiting you. > > If it works for you and anyone else want to join we could try to give it a > little more structure than "just came over to say hi!" > > Jaime > That would be a good time frame for me also. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 22 04:14:23 2018 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 22 Mar 2018 01:14:23 -0700 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> Message-ID: On Sat, Mar 10, 2018 at 4:27 AM, Matthew Rocklin wrote: > I'm very glad to see this discussion. > > I think that coming up with a single definition of array-like may be > difficult, and that we might end up wanting to embrace duck typing instead. > > It seems to me that different array-like classes will implement different > mixtures of features. It may be difficult to pin down a single definition > that includes anything except for the most basic attributes (shape and > dtype?). Consider two extreme cases of restrictive functionality: > > LinearOperators (support dot in a numpy-like way) > Storage objects like h5py (support getitem in a numpy-like way) > > I can imagine authors of both groups saying that they should qualify as > array-like because downstream projects that consume them should not convert > them to numpy arrays in important contexts. I think this is an important point -- there are a lot of subtleties in the interfaces that different objects might want to provide. Some interesting ones that haven't been mentioned: - a "duck array" that has everything except fancy indexing - xarray's arrays are just like numpy arrays in most ways, but they have incompatible broadcasting semantics - immutable vs. mutable arrays When faced with this kind of situation, always it's tempting to try to write down some classification system to capture every possible configuration of interesting behavior. In fact, this is one of the most classic nerd snipes; it's been catching people for literally thousands of years [1]. Most of these attempts fail though :-). So let's back up -- I probably erred in not making this more clear in the NEP, but I actually have a fairly concrete use case in mind here. 
What happened is, I started working on a NEP for __array_concatenate__, and my thought pattern went as follows: 1) Cool, this should work for np.concatenate. 2) But what about all the other variants, like np.row_stack. We don't want __array_row_stack__; we want to express row_stack in terms of concatenate. 3) Ok, what's row_stack? It's: np.concatenate([np.atleast_2d(arr) for arr in arrs], axis=0) 4) So I need to make atleast_2d work on duck arrays. What's atleast_2d? It's: asarray + some shape checks and indexing with newaxis 5) Okay, so I need something atleast_2d can call instead of asarray [2]. And this kind of pattern shows up everywhere inside numpy, e.g. it's the first thing inside lots of functions in np.linalg b/c they do some futzing with dtypes and shape before delegating to ufuncs, it's the first thing the mean() function does b/c it needs to check arr.dtype before proceeding, etc. etc. So, we need something we can use in these functions as a first step towards unlocking the use of duck arrays in general. But we can't realistically go through each of these functions, make an exact list of all the operations/attributes it cares about, and then come up with exactly the right type constraint for it to impose at the top. And these functions aren't generally going to work on LinearOperators or h5py datasets anyway. We also don't want to go through every function in numpy and add new arguments to control this coercion behavior. What we can do, at least to start, is to have a mechanism that passes through objects that aspire to be "complete" duck arrays, like dask arrays or sparse arrays or astropy's unit arrays, and then if it turns out that in practice people find uses for finer-grained distinctions, we can iteratively add those as a second pass. Notice that if a function starts out requiring a "complete" duck array, and then later relaxes that to accept "partial" duck arrays, that's actually increasing the domain of objects that it can act on, so it's a backwards-compatible change that we can do later. So I think we should start out with a concept of "duck array" that's fairly strong but a bit vague on the exact details (e.g., dask.array.Array is currently missing some weird things like arr.ptp() and arr.tolist(), I guess because no-one has ever noticed or cared?). ------------ Thinking things through like this, I also realized that this proposal jumps through hoops to avoid changing np.asarray itself, because I was nervous about changing the rule that its output is always an ndarray... but actually, this is currently the rule for most functions in numpy, and the whole point of this proposal is to relax that rule for most functions, in cases where the user is explicitly passing in a duck-array object. So maybe I'm being overparanoid? I'm genuinely unsure here. Instead of messing about with ABCs, an alternative mechanism would be to add a new method __arrayish__ (hat tip to Tom Caswell for the name :-)), that essentially acts as an override for Python-level calls to np.array / np.asarray, in much the same way that __array_ufunc__ overrides ufuncs, etc. (C level calls to PyArray_FromAny and similar would of course continue to return ndarray objects, and I assume we'd add some argument like require_ndarray= that you could pass to explicitly indicate whether you needed C-level compatibility.) This would also allow objects like h5py datasets to *produce* an arrayish object on demand, even if they aren't one themselves. 
(E.g., imagine some hdf5-like storage that holds sparse arrays instead of regular arrays.) I'm thinking I may write this option up as a second NEP, to compete with my first one. -n [1] See: https://www.wiley.com/en-us/The+Search+for+the+Perfect+Language-p-9780631205104 [2] Actually atleast_2d calls asanyarray, not asarray, but that's just a detail; the way to solve this problem for asanyarray is to first solve it for asarray. -- Nathaniel J. Smith -- https://vorpus.org From mhimes at knights.ucf.edu Wed Mar 21 16:40:55 2018 From: mhimes at knights.ucf.edu (Michael Himes) Date: Wed, 21 Mar 2018 20:40:55 +0000 Subject: [Numpy-discussion] 3D array slicing bug? Message-ID: Hi, I have discovered what I believe is a bug with array slicing involving 3D (and higher) dimension arrays. When slicing a 3D array by a single value for axis 0, all values for axis 1, and a list to slice axis 2, the dimensionality of the resulting 2D array is flipped. However, slicing more than a single index for axis 0 or performing the slicing in two steps results in the correct dimensionality. Below is a quick example to demonstrate this behavior. import numpy as np arr = np.arange(54).reshape(2, 3, 9) list = [0, 2, 4, 5, 8] print(arr.shape) # (2, 3, 9) print(arr[0, :, list].shape) # (5, 3) -- but it should be (3, 5)? print(arr[0][:, list].shape) # (3, 5), as expected print(arr[0:1, :, list].shape) # (1, 3, 5), as expected This behavior carries over to 4D arrays as well, where the axis sliced with a list becomes the 0th axis regardless of order. Below demonstrates that. arr2 = np.arange(324).reshape(2, 3, 6, 9) print(arr2[0, :, :, list].shape) # (5, 3, 6), but I expect (3, 6, 5) arr3 = np.arange(324).reshape(2, 3, 9, 6) print(arr3[0, :, list].shape) # (5, 3, 6), expected (3, 5, 6) print(arr3[0, :, list, :].shape) # same as above Can anyone explain this behavior, or is this a bug? Best, Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Mar 22 05:41:18 2018 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 22 Mar 2018 10:41:18 +0100 Subject: [Numpy-discussion] 3D array slicing bug? In-Reply-To: References: Message-ID: <1521711678.6503.44.camel@iki.fi> ke, 2018-03-21 kello 20:40 +0000, Michael Himes kirjoitti: > I have discovered what I believe is a bug with array slicing > involving 3D (and higher) dimension arrays. When slicing a 3D array > by a single value for axis 0, all values for axis 1, and a list to > slice axis 2, the dimensionality of the resulting 2D array is > flipped. However, slicing more than a single index for axis 0 or > performing the slicing in two steps results in the correct > dimensionality. Below is a quick example to demonstrate this > behavior. > https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing The key part seems to be: "There are two parts to the indexing operation, the subspace defined by the basic indexing (**excluding integers**) and the subspace from the advanced indexing part." -- Pauli Virtanen From einstein.edison at gmail.com Thu Mar 22 06:35:46 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Thu, 22 Mar 2018 11:35:46 +0100 Subject: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray In-Reply-To: References: <1520560316.2962680.1296803088.6C85AC87@webmail.messagingengine.com> Message-ID: I think that with your comments in mind, it may just be best to embrace duck typing, like Matthew suggested. 
I propose the following workflow: - __array_concatenate__ and similar "protocol" functions return NotImplemented if they won't work. - "Base functions" that can be called directly like __getitem__ raise NotImplementedError if they won't work. - __arrayish__ = True Then, something like np.concatenate would do the following: - Call __array_concatenate__ following the same order as ufunc arguments. - If everything fails, raise NotImplementedError (or convert everything to ndarray). Overloaded functions would do something like this (perhaps a simple decorator will do for the repetitive work?): - Try with np.arrayish - Catch NotImplementedError - Try with np.array Then, we use abstract classes just to overload functionality or implement things in terms of others. If something fails, we have a decent fallback. We don't need to do anything special in order to "check" functionality. Feel free to propose changes, but this is the best I could come up with that would require the smallest incremental changes to Numpy while also supporting everything right from the start. On Thu, Mar 22, 2018 at 9:14 AM, Nathaniel Smith wrote: > On Sat, Mar 10, 2018 at 4:27 AM, Matthew Rocklin > wrote: > > I'm very glad to see this discussion. > > > > I think that coming up with a single definition of array-like may be > > difficult, and that we might end up wanting to embrace duck typing > instead. > > > > It seems to me that different array-like classes will implement different > > mixtures of features. It may be difficult to pin down a single > definition > > that includes anything except for the most basic attributes (shape and > > dtype?). Consider two extreme cases of restrictive functionality: > > > > LinearOperators (support dot in a numpy-like way) > > Storage objects like h5py (support getitem in a numpy-like way) > > > > I can imagine authors of both groups saying that they should qualify as > > array-like because downstream projects that consume them should not > convert > > them to numpy arrays in important contexts. > > I think this is an important point -- there are a lot of subtleties in > the interfaces that different objects might want to provide. Some > interesting ones that haven't been mentioned: > > - a "duck array" that has everything except fancy indexing > - xarray's arrays are just like numpy arrays in most ways, but they > have incompatible broadcasting semantics > - immutable vs. mutable arrays > > When faced with this kind of situation, always it's tempting to try to > write down some classification system to capture every possible > configuration of interesting behavior. In fact, this is one of the > most classic nerd snipes; it's been catching people for literally > thousands of years [1]. Most of these attempts fail though :-). > > So let's back up -- I probably erred in not making this more clear in > the NEP, but I actually have a fairly concrete use case in mind here. > What happened is, I started working on a NEP for > __array_concatenate__, and my thought pattern went as follows: > > 1) Cool, this should work for np.concatenate. > 2) But what about all the other variants, like np.row_stack. We don't > want __array_row_stack__; we want to express row_stack in terms of > concatenate. > 3) Ok, what's row_stack? It's: > np.concatenate([np.atleast_2d(arr) for arr in arrs], axis=0) > 4) So I need to make atleast_2d work on duck arrays. What's > atleast_2d? 
It's: asarray + some shape checks and indexing with > newaxis > 5) Okay, so I need something atleast_2d can call instead of asarray [2]. > > And this kind of pattern shows up everywhere inside numpy, e.g. it's > the first thing inside lots of functions in np.linalg b/c they do some > futzing with dtypes and shape before delegating to ufuncs, it's the > first thing the mean() function does b/c it needs to check arr.dtype > before proceeding, etc. etc. > > So, we need something we can use in these functions as a first step > towards unlocking the use of duck arrays in general. But we can't > realistically go through each of these functions, make an exact list > of all the operations/attributes it cares about, and then come up with > exactly the right type constraint for it to impose at the top. And > these functions aren't generally going to work on LinearOperators or > h5py datasets anyway. > > We also don't want to go through every function in numpy and add new > arguments to control this coercion behavior. > > What we can do, at least to start, is to have a mechanism that passes > through objects that aspire to be "complete" duck arrays, like dask > arrays or sparse arrays or astropy's unit arrays, and then if it turns > out that in practice people find uses for finer-grained distinctions, > we can iteratively add those as a second pass. Notice that if a > function starts out requiring a "complete" duck array, and then later > relaxes that to accept "partial" duck arrays, that's actually > increasing the domain of objects that it can act on, so it's a > backwards-compatible change that we can do later. > > So I think we should start out with a concept of "duck array" that's > fairly strong but a bit vague on the exact details (e.g., > dask.array.Array is currently missing some weird things like arr.ptp() > and arr.tolist(), I guess because no-one has ever noticed or cared?). > > ------------ > > Thinking things through like this, I also realized that this proposal > jumps through hoops to avoid changing np.asarray itself, because I was > nervous about changing the rule that its output is always an > ndarray... but actually, this is currently the rule for most functions > in numpy, and the whole point of this proposal is to relax that rule > for most functions, in cases where the user is explicitly passing in a > duck-array object. So maybe I'm being overparanoid? I'm genuinely > unsure here. > > Instead of messing about with ABCs, an alternative mechanism would be > to add a new method __arrayish__ (hat tip to Tom Caswell for the name > :-)), that essentially acts as an override for Python-level calls to > np.array / np.asarray, in much the same way that __array_ufunc__ > overrides ufuncs, etc. (C level calls to PyArray_FromAny and similar > would of course continue to return ndarray objects, and I assume we'd > add some argument like require_ndarray= that you could pass to > explicitly indicate whether you needed C-level compatibility.) > > This would also allow objects like h5py datasets to *produce* an > arrayish object on demand, even if they aren't one themselves. (E.g., > imagine some hdf5-like storage that holds sparse arrays instead of > regular arrays.) > > I'm thinking I may write this option up as a second NEP, to compete > with my first one. 
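A minimal sketch of the fallback dispatch proposed in the workflow above. Neither np.arrayish nor __array_concatenate__ exists in NumPy; the protocol name comes from this thread, the decorator only illustrates the control flow, and a plain list (which implements nothing) exercises the ndarray fallback:

import functools
import numpy as np

def with_ndarray_fallback(ndarray_impl):
    # Decorator sketch: try the duck-array protocol first, and only coerce
    # the arguments to ndarray and use the plain implementation if the
    # protocol raises NotImplementedError.
    def decorator(duck_impl):
        @functools.wraps(duck_impl)
        def wrapper(arrays, **kwargs):
            try:
                return duck_impl(arrays, **kwargs)
            except NotImplementedError:
                return ndarray_impl([np.asarray(a) for a in arrays], **kwargs)
        return wrapper
    return decorator

@with_ndarray_fallback(np.concatenate)
def concatenate(arrays, axis=0):
    # Give each argument's (hypothetical) __array_concatenate__ a chance;
    # NotImplemented means "not mine", and if nobody handles the call we
    # give up so the fallback can take over.
    for a in arrays:
        handler = getattr(type(a), '__array_concatenate__', None)
        if handler is not None:
            result = handler(a, arrays, axis=axis)
            if result is not NotImplemented:
                return result
    raise NotImplementedError

print(concatenate([[1, 2], [3, 4]]))   # plain lists fall back to ndarrays: [1 2 3 4]
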
> > -n > > [1] See: https://www.wiley.com/en-us/The+Search+for+the+Perfect+ > Language-p-9780631205104 > [2] Actually atleast_2d calls asanyarray, not asarray, but that's just > a detail; the way to solve this problem for asanyarray is to first > solve it for asarray. > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Mar 22 05:23:38 2018 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 22 Mar 2018 10:23:38 +0100 Subject: [Numpy-discussion] 3D array slicing bug? In-Reply-To: References: Message-ID: <1521710618.6503.43.camel@iki.fi> ke, 2018-03-21 kello 20:40 +0000, Michael Himes kirjoitti: > I have discovered what I believe is a bug with array slicing > involving 3D (and higher) dimension arrays. When slicing a 3D array > by a single value for axis 0, all values for axis 1, and a list to > slice axis 2, the dimensionality of the resulting 2D array is > flipped. https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing The key part seems to be: "There are two parts to the indexing operation, the subspace defined by the basic indexing (**excluding integers**) and the subspace from the advanced indexing part." From sebastian at sipsolutions.net Thu Mar 22 08:44:38 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 22 Mar 2018 13:44:38 +0100 Subject: [Numpy-discussion] 3D array slicing bug? In-Reply-To: <1521711678.6503.44.camel@iki.fi> References: <1521711678.6503.44.camel@iki.fi> Message-ID: <1521722678.19593.2.camel@sipsolutions.net> This NEP draft has some more hints/explanations if you are interested: https://github.com/seberg/numpy/blob/5becd12914d0402967205579d6f59a9815 1e0d98/doc/neps/indexing.rst#examples Plus, it tries to avoid the word "subspace" hehehe. - Sebastian On Thu, 2018-03-22 at 10:41 +0100, Pauli Virtanen wrote: > ke, 2018-03-21 kello 20:40 +0000, Michael Himes kirjoitti: > > I have discovered what I believe is a bug with array slicing > > involving 3D (and higher) dimension arrays. When slicing a 3D array > > by a single value for axis 0, all values for axis 1, and a list to > > slice axis 2, the dimensionality of the resulting 2D array is > > flipped. However, slicing more than a single index for axis 0 or > > performing the slicing in two steps results in the correct > > dimensionality. Below is a quick example to demonstrate this > > behavior. > > > > https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combi > ning-advanced-and-basic-indexing > > The key part seems to be: "There are two parts to the indexing > operation, the subspace defined by the basic indexing > (**excluding integers**) and the subspace from the advanced indexing > part." > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From matti.picus at gmail.com Thu Mar 22 13:37:03 2018 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 22 Mar 2018 19:37:03 +0200 Subject: [Numpy-discussion] nditer as a context manager Message-ID: |Hello all, PR #9998 (https://github.com/numpy/numpy/pull/9998/) proposes an update to the nditer API, both C and python. 
The issue (link) is that sometimes nditer uses temp arrays via the "writeback" mechanism, the data is copied back to the original arrays "when finished". However "when finished" was implemented using nditer deallocation. This mechanism is implicit and unclear, and relies on refcount semantics which do not work on non-refcount python implementations like PyPY. It also leads to lines of code like "iter=None" to trigger the writeback resolution. On the c-api level the agreed upon solution is to add a new `NpyIter_Close` function in C, this is to be called before `NpyIter_Dealloc`. The reviewers and I would like to ask the wider NumPy community for opinions about the proposed python-level solution: turning the python nditer object into a context manager. This way "writeback" occurs at context manager exit via a call to `NpyIter_Close`, instead of like before when it occurred at nditer deallocation (which might not happen until much later in Pypy, and could be delayed by GC even in Cpython). Another solution that was rejected (https://github.com/numpy/numpy/pull/10184) was to add an nditer.close() python-level function that would not require a context manager It was felt that this is more error-prone since it requires users to add the line for each iterator created. The back-compat issues are that: 1. We are adding a new function to the numpy API, `NpyIter_Close` (pretty harmless) 2. We want people to update their C code using nditer, to call `NpyIter_Close` before they call `NpyIter_Dealloc` and will start raising a deprecation warning if misuse is detected 3. We want people to update their Python code to use the nditer object as a context manager, and will warn if they do not. We tried to minimize back-compat issues, in the sense that old code (which didn't work in PyPy anyway) will still work, although it will now emit deprecation warnings. In the future we also plan to raise an error if an nditer is used in Python without a context manager (when it should have been). For C code, we plan to leave the deprecation warning in place probably forever, as we can only detect the deprecated behavior in the deallocator, where exceptions cannot be raised. Anybody who uses nditers should take a look and please reply if it seems the change will be too painful. For more details, please see the updated docs in that PR Matti (and reviewers) From matti.picus at gmail.com Thu Mar 22 13:43:23 2018 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 22 Mar 2018 19:43:23 +0200 Subject: [Numpy-discussion] nditer as a context manager (reformatted?) In-Reply-To: References: Message-ID: <1522036d-a561-ba14-8dc3-48e329266827@gmail.com> Hello all, PR #9998 (https://github.com/numpy/numpy/pull/9998/) proposes an update to the nditer API, both C and python. The issue (https://github.com/numpy/numpy/issues/9714) is that sometimes nditer uses temp arrays via the "writeback" mechanism, the data is copied back to the original arrays "when finished". However "when finished" was implemented using nditer deallocation. This mechanism is implicit and unclear, and relies on refcount semantics which do not work on non-refcount python implementations like PyPY. It also leads to lines of code like "iter=None" to trigger the writeback resolution. On the c-api level the agreed upon solution is to add a new `NpyIter_Close` function in C, this is to be called before `NpyIter_Dealloc`.
The reviewers and I would like to ask the wider NumPy community for opinions about the proposed python-level solution: turning the python nditer object into a context manager. This way "writeback" occurs at context manager exit via a call to `NpyIter_Close`, instead of like before when it occurred at `nditer` deallocation (which might not happen until much later in Pypy, and could be delayed by GC even in Cpython). Another solution that was rejected (https://github.com/numpy/numpy/pull/10184) was to add an nditer.close() python-level function that would not require a context manager It was felt that this is more error-prone since it requires users to add the line for each iterator created. The back-compat issues are that: 1. We are adding a new function to the numpy API, `NpyIter_Close` (pretty harmless) 2. We want people to update their C code using nditer, to call `NpyIter_Close` before ?they call `NpyIter_Dealloc` and will start raising a deprecation warning if misuse is detected 3. We want people to update their Python code to use the nditer object as a context manager, and will warn if they do not. We tried to minimize back-compat issues, in the sense that old code (which didn't work in PyPy anyway) will still work, although it will now emit deprecation warnings. In the future we also plan to raise an error if an nditer is used in Python without a context manager (when it should have been). For C code, we plan to leave the deprecation warning in place probably forever, as we can only detect the deprecated behavior in the deallocator, where exceptions cannot be raised. Anybody who uses nditers should take a look and please reply if it seems the change will be too painful. For more details, please see the updated docs in that PR Matti (and reviewers) From oc-spam66 at laposte.net Thu Mar 22 15:05:57 2018 From: oc-spam66 at laposte.net (Olivier) Date: Thu, 22 Mar 2018 20:05:57 +0100 Subject: [Numpy-discussion] round(numpy.float64(0.0)) is a numpy.float64 In-Reply-To: <422941419.2737564.1521718689632.JavaMail.zimbra@laposte.net> References: <422941419.2737564.1521718689632.JavaMail.zimbra@laposte.net> Message-ID: Hello, Is it normal, expected and desired that : ????round(numpy.float64(0.0)) is a numpy.float64 while ????round(numpy.float(0.0)) is an integer? I find it disturbing and misleading. What do you think? Has it already been discussed somewhere else? Best regards, Olivier From nathan12343 at gmail.com Thu Mar 22 15:32:57 2018 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Thu, 22 Mar 2018 19:32:57 +0000 Subject: [Numpy-discussion] round(numpy.float64(0.0)) is a numpy.float64 In-Reply-To: References: <422941419.2737564.1521718689632.JavaMail.zimbra@laposte.net> Message-ID: numpy.float is an alias to the python float builtin. https://github.com/numpy/numpy/issues/3998 On Thu, Mar 22, 2018 at 2:26 PM Olivier wrote: > Hello, > > > Is it normal, expected and desired that : > > > round(numpy.float64(0.0)) is a numpy.float64 > > > while > > round(numpy.float(0.0)) is an integer? > > > I find it disturbing and misleading. What do you think? Has it already been > discussed somewhere else? > > > Best regards, > > > Olivier > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
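To make the nditer proposal above concrete, here is the kind of Python-level usage being asked about, assuming PR #9998 lands as described (so that leaving the with-block calls NpyIter_Close and resolves any writeback buffers):

import numpy as np

a = np.arange(6).reshape(2, 3)

with np.nditer(a, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = 2 * x
# Exiting the block triggers the writeback resolution deterministically, so
# the doubled values are guaranteed to be in `a` here and no "it = None"
# line is needed.
print(a)
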
URL: From opossumnano at gmail.com Fri Mar 23 04:24:10 2018 From: opossumnano at gmail.com (Python School Organizers) Date: Fri, 23 Mar 2018 01:24:10 -0700 (PDT) Subject: [Numpy-discussion] =?utf-8?b?W0FOTl0gMTHhtZfKsCBBZHZhbmNlZCBT?= =?utf-8?q?cientific_Programming_in_Python_in_Camerino=2C_Italy=2C_3?= =?utf-8?q?=E2=80=948_September=2C_2018?= Message-ID: <5ab4b9aa.08c41c0a.55881.885a@mx.google.com> 11?? Advanced Scientific Programming in Python ============================================== a Summer School by the G-Node and the University of Camerino https://python.g-node.org Scientists spend more and more time writing, maintaining, and debugging software. While techniques for doing this efficiently have evolved, only few scientists have been trained to use them. As a result, instead of doing their research, they spend far too much time writing deficient code and reinventing the wheel. In this course we will present a selection of advanced programming techniques and best practices which are standard in the industry, but especially tailored to the needs of a programming scientist. Lectures are devised to be interactive and to give the students enough time to acquire direct hands-on experience with the materials. Students will work in pairs throughout the school and will team up to practice the newly learned skills in a real programming project ? an entertaining computer game. We use the Python programming language for the entire course. Python works as a simple programming language for beginners, but more importantly, it also works great in scientific simulations and data analysis. We show how clean language design, ease of extensibility, and the great wealth of open source libraries for scientific computing and data visualization are driving Python to become a standard tool for the programming scientist. This school is targeted at Master or PhD students and Post-docs from all areas of science. Competence in Python or in another language such as Java, C/C++, MATLAB, or Mathematica is absolutely required. Basic knowledge of Python and of a version control system such as git, subversion, mercurial, or bazaar is assumed. Participants without any prior experience with Python and/or git should work through the proposed introductory material before the course. We are striving hard to get a pool of students which is international and gender-balanced: see how far we got in previous years ! Date & Location =============== 3?8 September, 2018. Camerino, Italy. Application =========== You can apply online: https://python.g-node.org/wiki/applications Application deadline: 23:59 UTC, 31 May, 2018. There will be no deadline extension, so be sure to apply on time. Be sure to read the FAQ before applying: https://python.g-node.org/wiki/faq Participation is for free, i.e. no fee is charged! Participants however should take care of travel, living, and accommodation expenses by themselves. Program ======= ? Version control with git and how to contribute to open source projects with GitHub ? Best practices in data visualization ? Organizing, documenting, and distributing scientific code ? Testing scientific code ? Profiling scientific code ? Advanced NumPy ? Advanced scientific Python: decorators, context managers, generators, and elements of object oriented programming ? Writing parallel applications in Python ? Speeding up scientific code with Cython and numba ? Memory-bound computations and the memory hierarchy ? 
Programming in teams Also see the detailed day-by-day schedule: https://python.g-node.org/wiki/schedule Faculty ======= ? Ashwin Trikuta Srinath, Cyberinfrastructure Technology Integration, Clemson University, SC USA ? Jenni Rinker, Department of Wind Energy, Technical University of Denmark, Roskilde Denmark ? Juan Nunez-Iglesias, Melbourne Bioinformatics, University of Melbourne Australia ? Nicolas P. Rougier, Inria Bordeaux Sud-Ouest, Institute of Neurodegenerative Disease, University of Bordeaux France ? Pietro Berkes, NAGRA Kudelski, Lausanne Switzerland ? Rike-Benjamin Schuppner, Institute for Theoretical Biology, Humboldt-Universit?t zu Berlin Germany ? Tiziano Zito, freelance consultant, Berlin Germany ? Zbigniew J?drzejewski-Szmek, Red Hat Inc., Warsaw Poland Organizers ========== For the German Neuroinformatics Node of the INCF (G-Node) Germany: ? Tiziano Zito, freelance consultant, Berlin Germany ? Caterina Buizza, Personal Robotics Lab, Imperial College London UK ? Zbigniew J?drzejewski-Szmek, Red Hat Inc., Warsaw Poland ? Jakob Jordan, Department of Physiology, University of Bern, Switzerland Switzerland For the University of Camerino Italy: ? Flavio Corradini, Computer Science Division, School of Science and Technology, University of Camerino Italy ? Barbara Re, Computer Science Division, School of Science and Technology, University of Camerino Italy Website: https://python.g-node.org Contact: python-info at g-node.org From ralf.gommers at gmail.com Sat Mar 24 23:33:21 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 24 Mar 2018 20:33:21 -0700 Subject: [Numpy-discussion] ANN: SciPy 1.0.1 released Message-ID: On behalf of the SciPy development team I am pleased to announce the availability of Scipy 1.0.1. This is a maintenance release, no new features with respect to 1.0.0. See the release notes below for details. Wheels and sources can be found on PyPI (https://pypi.python.org/pypi/scipy) and on Github (https://github.com/scipy/scipy/releases/tag/v1.0.1). The conda-forge channel will be up to date within a couple of hours. Thanks to everyone who contributed to this release! Cheers, Ralf SciPy 1.0.1 Release Notes ==================== SciPy 1.0.1 is a bug-fix release with no new features compared to 1.0.0. Probably the most important change is a fix for an incompatibility between SciPy 1.0.0 and ``numpy.f2py`` in the NumPy master branch. Authors ======= * Saurabh Agarwal + * Alessandro Pietro Bardelli * Philip DeBoer * Ralf Gommers * Matt Haberland * Eric Larson * Denis Laxalde * Mihai Capot? + * Andrew Nelson * Oleksandr Pavlyk * Ilhan Polat * Anant Prakash + * Pauli Virtanen * Warren Weckesser * @xoviat * Ted Ying + A total of 16 people contributed to this release. People with a "+" by their names contributed a patch for the first time. This list of names is automatically generated, and may not be fully complete. Issues closed for 1.0.1 ----------------------- - `#7493 `__: `ndimage.morphology` functions are broken with numpy 1.13.0 - `#8118 `__: minimize_cobyla broken if `disp=True` passed - `#8142 `__: scipy-v1.0.0 pdist with metric=`minkowski` raises `ValueError:... - `#8173 `__: `scipy.stats.ortho_group` produces all negative determinants... - `#8207 `__: gaussian_filter seg faults on float16 numpy arrays - `#8234 `__: `scipy.optimize.linprog` `interior-point` presolve bug with trivial... 
- `#8243 `__: Make csgraph importable again via `from scipy.sparse import*` - `#8320 `__: scipy.root segfaults with optimizer 'lm' Pull requests for 1.0.1 ----------------------- - `#8068 `__: BUG: fix numpy deprecation test failures - `#8082 `__: BUG: fix solve_lyapunov import - `#8144 `__: MRG: Fix for cobyla - `#8150 `__: MAINT: resolve UPDATEIFCOPY deprecation errors - `#8156 `__: BUG: missing check on minkowski w kwarg - `#8187 `__: BUG: Sign of elements in random orthogonal 2D matrices in "ortho_group_gen"... - `#8197 `__: CI: uninstall oclint - `#8215 `__: Fixes Numpy datatype compatibility issues - `#8237 `__: BUG: optimize: fix bug when variables fixed by bounds are inconsistent... - `#8248 `__: BUG: declare "gfk" variable before call of terminate() in newton-cg - `#8280 `__: REV: reintroduce csgraph import in scipy.sparse - `#8322 `__: MAINT: prevent scipy.optimize.root segfault closes #8320 - `#8334 `__: TST: stats: don't use exact equality check for hdmedian test - `#8477 `__: BUG: signal/signaltools: fix wrong refcounting in PyArray_OrderFilterND - `#8530 `__: BUG: linalg: Fixed typo in flapack.pyf.src. - `#8566 `__: CI: Temporarily pin Cython version to 0.27.3 - `#8573 `__: Backports for 1.0.1 - `#8581 `__: Fix Cython 0.28 build break of qhull.pyx -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Sun Mar 25 16:14:23 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sun, 25 Mar 2018 20:14:23 +0000 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: References: Message-ID: To reiterate my comments in the issue - I'm in favor of this. It seems seem especially valuable for identity-less functions (`min`, `max`, `lcm`), and the argument name is consistent with `functools.reduce`. too. The only argument I can see against merging this would be `kwarg`-creep of `reduce`, and I think this has enough use cases to justify that. I'd like to merge in a few days, if no one else has any opinions. Eric On Fri, 16 Mar 2018 at 10:13 Hameer Abbasi wrote: > Hello, everyone. I?ve submitted a PR to add a initializer kwarg to > ufunc.reduce. This is useful in a few cases, e.g., it allows one to supply > a ?default? value for identity-less ufunc reductions, and specify an > initial value for reductions such as sum (other than zero.) > > Please feel free to review or leave feedback, (although I think Eric and > Marten have picked it apart pretty well). > > https://github.com/numpy/numpy/pull/10635 > > Thanks, > > Hameer > Sent from Astro for Mac > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Mon Mar 26 03:16:28 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 26 Mar 2018 07:16:28 +0000 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: References: Message-ID: This looks like a very logical addition to the reduce interface. It has my support! I would have preferred the more descriptive name "initial_value", but consistency with functools.reduce makes a compelling case for "initializer". On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser wrote: > To reiterate my comments in the issue - I'm in favor of this. 
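(For context on the "identity-less" part: reductions whose ufunc has no identity cannot currently handle empty input at all, which is part of what the proposed kwarg would address. A quick check against a recent NumPy, independent of the PR:)

    import numpy as np

    print(np.add.reduce(np.array([])))   # 0.0, add has an identity, so an empty reduce is defined
    try:
        np.minimum.reduce(np.array([]))  # minimum has no identity ...
    except ValueError as exc:
        print(exc)                       # ... so an empty reduce currently raises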
> > It seems seem especially valuable for identity-less functions (`min`, > `max`, `lcm`), and the argument name is consistent with `functools.reduce`. > too. > > The only argument I can see against merging this would be `kwarg`-creep of > `reduce`, and I think this has enough use cases to justify that. > > I'd like to merge in a few days, if no one else has any opinions. > > Eric > > On Fri, 16 Mar 2018 at 10:13 Hameer Abbasi > wrote: > >> Hello, everyone. I?ve submitted a PR to add a initializer kwarg to >> ufunc.reduce. This is useful in a few cases, e.g., it allows one to supply >> a ?default? value for identity-less ufunc reductions, and specify an >> initial value for reductions such as sum (other than zero.) >> >> Please feel free to review or leave feedback, (although I think Eric and >> Marten have picked it apart pretty well). >> >> https://github.com/numpy/numpy/pull/10635 >> >> Thanks, >> >> Hameer >> Sent from Astro for Mac >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Mon Mar 26 03:54:10 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 26 Mar 2018 07:54:10 +0000 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: References: Message-ID: It turns out I mispoke - functools.reduce calls the argument `initial` On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer wrote: > This looks like a very logical addition to the reduce interface. It has my > support! > > I would have preferred the more descriptive name "initial_value", but > consistency with functools.reduce makes a compelling case for "initializer". > > On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser > wrote: > >> To reiterate my comments in the issue - I'm in favor of this. >> >> It seems seem especially valuable for identity-less functions (`min`, >> `max`, `lcm`), and the argument name is consistent with `functools.reduce`. >> too. >> >> The only argument I can see against merging this would be `kwarg`-creep >> of `reduce`, and I think this has enough use cases to justify that. >> >> I'd like to merge in a few days, if no one else has any opinions. >> >> Eric >> >> On Fri, 16 Mar 2018 at 10:13 Hameer Abbasi >> wrote: >> >>> Hello, everyone. I?ve submitted a PR to add a initializer kwarg to >>> ufunc.reduce. This is useful in a few cases, e.g., it allows one to supply >>> a ?default? value for identity-less ufunc reductions, and specify an >>> initial value for reductions such as sum (other than zero.) >>> >>> Please feel free to review or leave feedback, (although I think Eric and >>> Marten have picked it apart pretty well). 
>>> >>> https://github.com/numpy/numpy/pull/10635 >>> >>> Thanks, >>> >>> Hameer >>> Sent from Astro for Mac >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Mon Mar 26 05:57:14 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Mon, 26 Mar 2018 05:57:14 -0400 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: References: Message-ID: It calls it `initializer` - See https://docs.python.org/3.5/library/functools.html#functools.reduce Sent from Astro for Mac On Mar 26, 2018 at 09:54, Eric Wieser wrote: It turns out I mispoke - functools.reduce calls the argument `initial` On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer wrote: > This looks like a very logical addition to the reduce interface. It has my > support! > > I would have preferred the more descriptive name "initial_value", but > consistency with functools.reduce makes a compelling case for "initializer". > > On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser > wrote: > >> To reiterate my comments in the issue - I'm in favor of this. >> >> It seems seem especially valuable for identity-less functions (`min`, >> `max`, `lcm`), and the argument name is consistent with `functools.reduce`. >> too. >> >> The only argument I can see against merging this would be `kwarg`-creep >> of `reduce`, and I think this has enough use cases to justify that. >> >> I'd like to merge in a few days, if no one else has any opinions. >> >> Eric >> >> On Fri, 16 Mar 2018 at 10:13 Hameer Abbasi >> wrote: >> >>> Hello, everyone. I?ve submitted a PR to add a initializer kwarg to >>> ufunc.reduce. This is useful in a few cases, e.g., it allows one to supply >>> a ?default? value for identity-less ufunc reductions, and specify an >>> initial value for reductions such as sum (other than zero.) >>> >>> Please feel free to review or leave feedback, (although I think Eric and >>> Marten have picked it apart pretty well). >>> >>> https://github.com/numpy/numpy/pull/10635 >>> >>> Thanks, >>> >>> Hameer >>> Sent from Astro for Mac >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at sipsolutions.net Mon Mar 26 06:06:26 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 26 Mar 2018 12:06:26 +0200 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: References: Message-ID: <1522058786.15711.5.camel@sipsolutions.net> Initializer or this sounds fine to me. As an other data point which I think has been mentioned before, `sum` uses start and min/max use default. `start` does not work, unless we also change the code to always use the identity if given (currently that is not the case), in which case it might be nice. However, "start" seems a bit like solving a different issue in any case. Anyway, mostly noise. I really like adding this, the only thing worth discussing a bit is the name :). - Sebastian On Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: > It calls it `initializer` - See https://docs.python.org/3.5/library/f > unctools.html#functools.reduce > > Sent from Astro for Mac > > > On Mar 26, 2018 at 09:54, Eric Wieser > > wrote: > > > > It turns out I mispoke - functools.reduce calls the argument > > `initial` > > > > On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer > > wrote: > > > This looks like a very logical addition to the reduce interface. > > > It has my support! > > > > > > I would have preferred the more descriptive name "initial_value", > > > but consistency with functools.reduce makes a compelling case for > > > "initializer". > > > > > > On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser > > ail.com> wrote: > > > > To reiterate my comments in the issue - I'm in favor of this. > > > > > > > > It seems seem especially valuable for identity-less functions > > > > (`min`, `max`, `lcm`), and the argument name is consistent with > > > > `functools.reduce`. too. > > > > > > > > The only argument I can see against merging this would be > > > > `kwarg`-creep of `reduce`, and I think this has enough use > > > > cases to justify that. > > > > > > > > I'd like to merge in a few days, if no one else has any > > > > opinions. > > > > > > > > Eric > > > > > > > > On Fri, 16 Mar 2018 at 10:13 Hameer Abbasi > > > il.com> wrote: > > > > > Hello, everyone. I?ve submitted a PR to add a initializer > > > > > kwarg to ufunc.reduce. This is useful in a few cases, e.g., > > > > > it allows one to supply a ?default? value for identity-less > > > > > ufunc reductions, and specify an initial value for reductions > > > > > such as sum (other than zero.) > > > > > > > > > > Please feel free to review or leave feedback, (although I > > > > > think Eric and Marten have picked it apart pretty well). 
> > > > > > > > > > https://github.com/numpy/numpy/pull/10635 > > > > > > > > > > Thanks, > > > > > > > > > > Hameer > > > > > Sent from Astro for Mac > > > > > > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From einstein.edison at gmail.com Mon Mar 26 08:20:52 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Mon, 26 Mar 2018 08:20:52 -0400 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: <1522058786.15711.5.camel@sipsolutions.net> References: <1522058786.15711.5.camel@sipsolutions.net> Message-ID: Actually, the behavior right now isn?t that of `default` but that of `initializer` or `start`. This was discussed further down in the PR but to reiterate: `np.sum([10], initializer=5)` becomes `15`. Also, `np.min([5], initializer=0)` becomes `0`, so it isn?t really the default value, it?s the initial value among which the reduction is performed. This was the reason to call it initializer in the first place. I like `initial` and `initial_value` as well, and `start` also makes sense but isn?t descriptive enough. Hameer Sent from Astro for Mac On Mar 26, 2018 at 12:06, Sebastian Berg wrote: Initializer or this sounds fine to me. As an other data point which I think has been mentioned before, `sum` uses start and min/max use default. `start` does not work, unless we also change the code to always use the identity if given (currently that is not the case), in which case it might be nice. However, "start" seems a bit like solving a different issue in any case. Anyway, mostly noise. I really like adding this, the only thing worth discussing a bit is the name :). - Sebastian On Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: It calls it `initializer` - See https://docs.python.org/3.5/library/f unctools.html#functools.reduce Sent from Astro for Mac On Mar 26, 2018 at 09:54, Eric Wieser wrote: It turns out I mispoke - functools.reduce calls the argument `initial` On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer wrote: This looks like a very logical addition to the reduce interface. It has my support! I would have preferred the more descriptive name "initial_value", but consistency with functools.reduce makes a compelling case for "initializer". On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser wrote: To reiterate my comments in the issue - I'm in favor of this. 
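(The proposed semantics can be sketched with plain functools.reduce, which folds an initial value into the reduction in the same way; this is only an illustration of the behaviour described above, not how the PR implements it:)

    from functools import reduce
    import operator

    reduce(operator.add, [10], 5)   # -> 15, like the proposed np.sum([10], initializer=5)
    reduce(min, [5], 0)             # -> 0,  like the proposed np.min([5], initializer=0)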
It seems seem especially valuable for identity-less functions (`min`, `max`, `lcm`), and the argument name is consistent with `functools.reduce`. too. The only argument I can see against merging this would be `kwarg`-creep of `reduce`, and I think this has enough use cases to justify that. I'd like to merge in a few days, if no one else has any opinions. Eric On Fri, 16 Mar 2018 at 10:13 Hameer Abbasi wrote: Hello, everyone. I?ve submitted a PR to add a initializer kwarg to ufunc.reduce. This is useful in a few cases, e.g., it allows one to supply a ?default? value for identity-less ufunc reductions, and specify an initial value for reductions such as sum (other than zero.) Please feel free to review or leave feedback, (although I think Eric and Marten have picked it apart pretty well). https://github.com/numpy/numpy/pull/10635 Thanks, Hameer Sent from Astro for Mac _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Mon Mar 26 11:06:01 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 26 Mar 2018 15:06:01 +0000 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: References: Message-ID: Huh, looks like it has different names in different places. `help(functools.reduce)` shows "initial". On Mon, Mar 26, 2018, 02:57 Hameer Abbasi wrote: > It calls it `initializer` - See > https://docs.python.org/3.5/library/functools.html#functools.reduce > > > Sent from Astro for Mac > > On Mar 26, 2018 at 09:54, Eric Wieser wrote: > > > It turns out I mispoke - functools.reduce calls the argument `initial` > > On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer wrote: > >> This looks like a very logical addition to the reduce interface. It has >> my support! >> >> I would have preferred the more descriptive name "initial_value", but >> consistency with functools.reduce makes a compelling case for "initializer". >> >> On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser >> wrote: >> >>> To reiterate my comments in the issue - I'm in favor of this. >>> >>> It seems seem especially valuable for identity-less functions (`min`, >>> `max`, `lcm`), and the argument name is consistent with `functools.reduce`. >>> too. >>> >>> The only argument I can see against merging this would be `kwarg`-creep >>> of `reduce`, and I think this has enough use cases to justify that. >>> >>> I'd like to merge in a few days, if no one else has any opinions. 
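(The naming mismatch is easy to reproduce: the functools documentation says "initializer" while the C docstring says "initial", and since reduce() does not appear to accept keyword arguments in CPython the name never has to match a keyword. A small check, exact wording varies across Python versions:)

    import functools
    import operator

    print(functools.reduce.__doc__.splitlines()[0])   # e.g. 'reduce(function, iterable[, initial]) -> value'
    try:
        functools.reduce(operator.add, [1, 2], initial=0)
    except TypeError as exc:
        print(exc)   # reduce() rejects keyword arguments here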
>>> >>> Eric >>> >>> On Fri, 16 Mar 2018 at 10:13 Hameer Abbasi >>> wrote: >>> >>>> Hello, everyone. I?ve submitted a PR to add a initializer kwarg to >>>> ufunc.reduce. This is useful in a few cases, e.g., it allows one to supply >>>> a ?default? value for identity-less ufunc reductions, and specify an >>>> initial value for reductions such as sum (other than zero.) >>>> >>>> Please feel free to review or leave feedback, (although I think Eric >>>> and Marten have picked it apart pretty well). >>>> >>>> https://github.com/numpy/numpy/pull/10635 >>>> >>>> Thanks, >>>> >>>> Hameer >>>> Sent from Astro for Mac >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Mar 26 11:16:34 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 26 Mar 2018 17:16:34 +0200 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: References: <1522058786.15711.5.camel@sipsolutions.net> Message-ID: <1522077394.4888.10.camel@sipsolutions.net> OK, the new documentation is actually clear: initializer : scalar, optional The value with which to start the reduction. Defaults to the `~numpy.ufunc.identity` of the ufunc. If ``None`` is given, the first element of the reduction is used, and an error is thrown if the reduction is empty. If ``a.dtype`` is ``object``, then the initializer is _only_ used if reduction is empty. I would actually like to say that I do not like the object special case much (and it is probably the reason why I was confused), nor am I quite sure this is what helps a lot? Logically, I would argue there are two things: 1. initializer/start (always used) 2. default (oly used for empty reductions) For example, I might like to give `np.nan` as the default for some empty reductions, this will not work. I understand that this is a minimal invasive PR and I am not sure I find the solution bad enough to really dislike it, but what do other think? My first expectation was the default behaviour (in all cases, not just object case) for some reason. To be honest, for now I just wonder a bit: How hard would it be to do both, or is that too annoying? It would at least get rid of that annoying thing with object ufuncs (which currently have a default, but not really an identity/initializer). Best, Sebastian On Mon, 2018-03-26 at 08:20 -0400, Hameer Abbasi wrote: > Actually, the behavior right now isn?t that of `default` but that of > `initializer` or `start`. > > This was discussed further down in the PR but to reiterate: > `np.sum([10], initializer=5)` becomes `15`. 
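(A tiny pure-Python sketch of the two concepts being separated here; the helper name reduce_with_default is made up for illustration and is not part of the PR. An initial value always participates in the fold, while a default would only answer the empty case:)

    from functools import reduce
    import math

    def reduce_with_default(func, seq, initial=None, default=None):
        seq = list(seq)
        if not seq:
            # only here does a 'default' matter
            return initial if initial is not None else default
        return reduce(func, seq, initial) if initial is not None else reduce(func, seq)

    reduce_with_default(min, [5], initial=0)        # -> 0: the initial value is folded in
    reduce_with_default(min, [], default=math.nan)  # -> nan: the default only fills the empty case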
> > Also, `np.min([5], initializer=0)` becomes `0`, so it isn?t really > the default value, it?s the initial value among which the reduction > is performed. > > This was the reason to call it initializer in the first place. I like > `initial` and `initial_value` as well, and `start` also makes sense > but isn?t descriptive enough. > > Hameer > Sent from Astro for Mac > > > On Mar 26, 2018 at 12:06, Sebastian Berg > t> wrote: > > > > Initializer or this sounds fine to me. As an other data point which > > I > > think has been mentioned before, `sum` uses start and min/max use > > default. `start` does not work, unless we also change the code to > > always use the identity if given (currently that is not the case), > > in > > which case it might be nice. However, "start" seems a bit like > > solving > > a different issue in any case. > > > > Anyway, mostly noise. I really like adding this, the only thing > > worth > > discussing a bit is the name :). > > > > - Sebastian > > > > > > On Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: > > > It calls it `initializer` - See https://docs.python.org/3.5/libra > > > ry/f > > > unctools.html#functools.reduce > > > > > > Sent from Astro for Mac > > > > > > > On Mar 26, 2018 at 09:54, Eric Wieser > > > com> > > > > wrote: > > > > > > > > It turns out I mispoke - functools.reduce calls the argument > > > > `initial` > > > > > > > > On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer > > > > wrote: > > > > > This looks like a very logical addition to the reduce > > > > > interface. > > > > > It has my support! > > > > > > > > > > I would have preferred the more descriptive name > > > > > "initial_value", > > > > > but consistency with functools.reduce makes a compelling case > > > > > for > > > > > "initializer". > > > > > > > > > > On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser > > > > y at gm > > > > > ail.com> wrote: > > > > > > To reiterate my comments in the issue - I'm in favor of > > > > > > this. > > > > > > > > > > > > It seems seem especially valuable for identity-less > > > > > > functions > > > > > > (`min`, `max`, `lcm`), and the argument name is consistent > > > > > > with > > > > > > `functools.reduce`. too. > > > > > > > > > > > > The only argument I can see against merging this would be > > > > > > `kwarg`-creep of `reduce`, and I think this has enough use > > > > > > cases to justify that. > > > > > > > > > > > > I'd like to merge in a few days, if no one else has any > > > > > > opinions. > > > > > > > > > > > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13 Hameer Abbasi > > > > > @gma > > > > > > il.com> wrote: > > > > > > > Hello, everyone. I?ve submitted a PR to add a initializer > > > > > > > kwarg to ufunc.reduce. This is useful in a few cases, > > > > > > > e.g., > > > > > > > it allows one to supply a ?default? value for identity- > > > > > > > less > > > > > > > ufunc reductions, and specify an initial value for > > > > > > > reductions > > > > > > > such as sum (other than zero.) > > > > > > > > > > > > > > Please feel free to review or leave feedback, (although I > > > > > > > think Eric and Marten have picked it apart pretty well). 
> > > > > > > > > > > > > > https://github.com/numpy/numpy/pull/10635 > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Hameer > > > > > > > Sent from Astro for Mac > > > > > > > > > > > > > > _______________________________________________ > > > > > > > NumPy-Discussion mailing list > > > > > > > NumPy-Discussion at python.org > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.org > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From ben.v.root at gmail.com Mon Mar 26 11:35:32 2018 From: ben.v.root at gmail.com (Benjamin Root) Date: Mon, 26 Mar 2018 11:35:32 -0400 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: <1522077394.4888.10.camel@sipsolutions.net> References: <1522058786.15711.5.camel@sipsolutions.net> <1522077394.4888.10.camel@sipsolutions.net> Message-ID: Hmm, this is neat. I imagine it would finally give some people a choice on what np.nansum([np.nan]) should return? It caused a huge hullabeloo a few years ago when we changed it from returning NaN to returning zero. Ben Root On Mon, Mar 26, 2018 at 11:16 AM, Sebastian Berg wrote: > OK, the new documentation is actually clear: > > initializer : scalar, optional > The value with which to start the reduction. > Defaults to the `~numpy.ufunc.identity` of the ufunc. > If ``None`` is given, the first element of the reduction is used, > and an error is thrown if the reduction is empty. If ``a.dtype`` is > ``object``, then the initializer is _only_ used if reduction is > empty. > > I would actually like to say that I do not like the object special case > much (and it is probably the reason why I was confused), nor am I quite > sure this is what helps a lot? Logically, I would argue there are two > things: > > 1. initializer/start (always used) > 2. default (oly used for empty reductions) > > For example, I might like to give `np.nan` as the default for some > empty reductions, this will not work. I understand that this is a > minimal invasive PR and I am not sure I find the solution bad enough to > really dislike it, but what do other think? 
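(For reference, this is the behaviour being referred to: with NumPy as of this thread, nansum treats NaNs as zero, so an all-NaN input currently sums to 0.0:)

    import numpy as np

    print(np.nansum([np.nan]))       # 0.0
    print(np.nansum([np.nan, 2.0]))  # 2.0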
My first expectation was > the default behaviour (in all cases, not just object case) for some > reason. > > To be honest, for now I just wonder a bit: How hard would it be to do > both, or is that too annoying? It would at least get rid of that > annoying thing with object ufuncs (which currently have a default, but > not really an identity/initializer). > > Best, > > Sebastian > > > On Mon, 2018-03-26 at 08:20 -0400, Hameer Abbasi wrote: > > Actually, the behavior right now isn?t that of `default` but that of > > `initializer` or `start`. > > > > This was discussed further down in the PR but to reiterate: > > `np.sum([10], initializer=5)` becomes `15`. > > > > Also, `np.min([5], initializer=0)` becomes `0`, so it isn?t really > > the default value, it?s the initial value among which the reduction > > is performed. > > > > This was the reason to call it initializer in the first place. I like > > `initial` and `initial_value` as well, and `start` also makes sense > > but isn?t descriptive enough. > > > > Hameer > > Sent from Astro for Mac > > > > > On Mar 26, 2018 at 12:06, Sebastian Berg > > t> wrote: > > > > > > Initializer or this sounds fine to me. As an other data point which > > > I > > > think has been mentioned before, `sum` uses start and min/max use > > > default. `start` does not work, unless we also change the code to > > > always use the identity if given (currently that is not the case), > > > in > > > which case it might be nice. However, "start" seems a bit like > > > solving > > > a different issue in any case. > > > > > > Anyway, mostly noise. I really like adding this, the only thing > > > worth > > > discussing a bit is the name :). > > > > > > - Sebastian > > > > > > > > > On Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: > > > > It calls it `initializer` - See https://docs.python.org/3.5/libra > > > > ry/f > > > > unctools.html#functools.reduce > > > > > > > > Sent from Astro for Mac > > > > > > > > > On Mar 26, 2018 at 09:54, Eric Wieser > > > > com> > > > > > wrote: > > > > > > > > > > It turns out I mispoke - functools.reduce calls the argument > > > > > `initial` > > > > > > > > > > On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer > > > > > wrote: > > > > > > This looks like a very logical addition to the reduce > > > > > > interface. > > > > > > It has my support! > > > > > > > > > > > > I would have preferred the more descriptive name > > > > > > "initial_value", > > > > > > but consistency with functools.reduce makes a compelling case > > > > > > for > > > > > > "initializer". > > > > > > > > > > > > On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser > > > > > y at gm > > > > > > ail.com> wrote: > > > > > > > To reiterate my comments in the issue - I'm in favor of > > > > > > > this. > > > > > > > > > > > > > > It seems seem especially valuable for identity-less > > > > > > > functions > > > > > > > (`min`, `max`, `lcm`), and the argument name is consistent > > > > > > > with > > > > > > > `functools.reduce`. too. > > > > > > > > > > > > > > The only argument I can see against merging this would be > > > > > > > `kwarg`-creep of `reduce`, and I think this has enough use > > > > > > > cases to justify that. > > > > > > > > > > > > > > I'd like to merge in a few days, if no one else has any > > > > > > > opinions. > > > > > > > > > > > > > > Eric > > > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13 Hameer Abbasi > > > > > > @gma > > > > > > > il.com> wrote: > > > > > > > > Hello, everyone. 
I?ve submitted a PR to add a initializer > > > > > > > > kwarg to ufunc.reduce. This is useful in a few cases, > > > > > > > > e.g., > > > > > > > > it allows one to supply a ?default? value for identity- > > > > > > > > less > > > > > > > > ufunc reductions, and specify an initial value for > > > > > > > > reductions > > > > > > > > such as sum (other than zero.) > > > > > > > > > > > > > > > > Please feel free to review or leave feedback, (although I > > > > > > > > think Eric and Marten have picked it apart pretty well). > > > > > > > > > > > > > > > > https://github.com/numpy/numpy/pull/10635 > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Hameer > > > > > > > > Sent from Astro for Mac > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > NumPy-Discussion mailing list > > > > > > > > NumPy-Discussion at python.org > > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > > > _______________________________________________ > > > > > > > NumPy-Discussion mailing list > > > > > > > NumPy-Discussion at python.org > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.org > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Mon Mar 26 11:39:13 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Mon, 26 Mar 2018 11:39:13 -0400 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: References: <1522058786.15711.5.camel@sipsolutions.net> <1522077394.4888.10.camel@sipsolutions.net> Message-ID: That is the idea, but NaN functions are in a separate branch for another PR to be discussed later. You can see it on my fork, if you're interested. On 26/03/2018 at 17:35, Benjamin wrote: Hmm, this is neat. I imagine it would finally give some people a choice on what np.nansum([np.nan]) should return? It caused a huge hullabeloo a few years ago when we changed it from returning NaN to returning zero. Ben Root On Mon, Mar 26, 2018 at 11:16 AM, Sebastian Berg wrote: OK, the new documentation is actually clear: initializer : scalar, optional The value with which to start the reduction. Defaults to the `~numpy.ufunc.identity` of the ufunc. 
If ``None`` is given, the first element of the reduction is used, and an error is thrown if the reduction is empty. If ``a.dtype`` is ``object``, then the initializer is _only_ used if reduction is empty. I would actually like to say that I do not like the object special case much (and it is probably the reason why I was confused), nor am I quite sure this is what helps a lot? Logically, I would argue there are two things: 1. initializer/start (always used) 2. default (oly used for empty reductions) For example, I might like to give `np.nan` as the default for some empty reductions, this will not work. I understand that this is a minimal invasive PR and I am not sure I find the solution bad enough to really dislike it, but what do other think? My first expectation was the default behaviour (in all cases, not just object case) for some reason. To be honest, for now I just wonder a bit: How hard would it be to do both, or is that too annoying? It would at least get rid of that annoying thing with object ufuncs (which currently have a default, but not really an identity/initializer). Best, Sebastian On Mon, 2018-03-26 at 08:20 -0400, Hameer Abbasi wrote: > Actually, the behavior right now isn?t that of `default` but that of > `initializer` or `start`. > > This was discussed further down in the PR but to reiterate: > `np.sum([10], initializer=5)` becomes `15`. > > Also, `np.min([5], initializer=0)` becomes `0`, so it isn?t really > the default value, it?s the initial value among which the reduction > is performed. > > This was the reason to call it initializer in the first place. I like > `initial` and `initial_value` as well, and `start` also makes sense > but isn?t descriptive enough. > > Hameer > Sent from Astro for Mac > > > On Mar 26, 2018 at 12:06, Sebastian Berg > t> wrote: > > > > Initializer or this sounds fine to me. As an other data point which > > I > > think has been mentioned before, `sum` uses start and min/max use > > default. `start` does not work, unless we also change the code to > > always use the identity if given (currently that is not the case), > > in > > which case it might be nice. However, "start" seems a bit like > > solving > > a different issue in any case. > > > > Anyway, mostly noise. I really like adding this, the only thing > > worth > > discussing a bit is the name :). > > > > - Sebastian > > > > > > On Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: > > > It calls it `initializer` - See https://docs.python.org/3.5/libra > > > ry/f > > > unctools.html#functools.reduce > > > > > > Sent from Astro for Mac > > > > > > > On Mar 26, 2018 at 09:54, Eric Wieser > > > com> > > > > wrote: > > > > > > > > It turns out I mispoke - functools.reduce calls the argument > > > > `initial` > > > > > > > > On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer > > > > wrote: > > > > > This looks like a very logical addition to the reduce > > > > > interface. > > > > > It has my support! > > > > > > > > > > I would have preferred the more descriptive name > > > > > "initial_value", > > > > > but consistency with functools.reduce makes a compelling case > > > > > for > > > > > "initializer". > > > > > > > > > > On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser > > > > y at gm > > > > > ail.com> wrote: > > > > > > To reiterate my comments in the issue - I'm in favor of > > > > > > this. 
> > > > > > > > > > > > It seems seem especially valuable for identity-less > > > > > > functions > > > > > > (`min`, `max`, `lcm`), and the argument name is consistent > > > > > > with > > > > > > `functools.reduce`. too. > > > > > > > > > > > > The only argument I can see against merging this would be > > > > > > `kwarg`-creep of `reduce`, and I think this has enough use > > > > > > cases to justify that. > > > > > > > > > > > > I'd like to merge in a few days, if no one else has any > > > > > > opinions. > > > > > > > > > > > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13 Hameer Abbasi > > > > > @gma > > > > > > il.com> wrote: > > > > > > > Hello, everyone. I?ve submitted a PR to add a initializer > > > > > > > kwarg to ufunc.reduce. This is useful in a few cases, > > > > > > > e.g., > > > > > > > it allows one to supply a ?default? value for identity- > > > > > > > less > > > > > > > ufunc reductions, and specify an initial value for > > > > > > > reductions > > > > > > > such as sum (other than zero.) > > > > > > > > > > > > > > Please feel free to review or leave feedback, (although I > > > > > > > think Eric and Marten have picked it apart pretty well). > > > > > > > > > > > > > > https://github.com/numpy/numpy/pull/10635 > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Hameer > > > > > > > Sent from Astro for Mac > > > > > > > > > > > > > > _______________________________________________ > > > > > > > NumPy-Discussion mailing list > > > > > > > NumPy-Discussion at python.org > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.org > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Mon Mar 26 11:45:56 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 26 Mar 2018 17:45:56 +0200 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: References: <1522058786.15711.5.camel@sipsolutions.net> <1522077394.4888.10.camel@sipsolutions.net> Message-ID: <1522079156.4888.12.camel@sipsolutions.net> On Mon, 2018-03-26 at 11:39 -0400, Hameer Abbasi wrote: > That is the idea, but NaN functions are in a separate branch for > another PR to be 
discussed later. You can see it on my fork, if > you're > interested. Except that as far as I understand I am not sure it will help much with it, since it is not a default, but an initializer. Initializing to NaN would just make all results NaN. - Sebastian > On 26/03/2018 at 17:35, Benjamin wrote: Hmm, this is neat. > I imagine it would finally give some people a choice on what > np.nansum([np.nan]) should return? It caused a huge hullabeloo a few > years ago when we changed it from returning NaN to returning zero. > Ben > Root On Mon, Mar 26, 2018 at 11:16 AM, Sebastian Berg > wrote: OK, the new documentation is > actually clear: initializer : scalar, optional The value with which > to > start the reduction. Defaults to the `~numpy.ufunc.identity` of the > ufunc. If ``None`` is given, the first element of the reduction is > used, and an error is thrown if the reduction is empty. If > ``a.dtype`` > is ``object``, then the initializer is _only_ used if reduction is > empty. I would actually like to say that I do not like the object > special case much (and it is probably the reason why I was confused), > nor am I quite sure this is what helps a lot? Logically, I would > argue > there are two things: 1. initializer/start (always used) 2. default > (oly used for empty reductions) For example, I might like to give > `np.nan` as the default for some empty reductions, this will not > work. > I understand that this is a minimal invasive PR and I am not sure I > find the solution bad enough to really dislike it, but what do other > think? My first expectation was the default behaviour (in all cases, > not just object case) for some reason. To be honest, for now I just > wonder a bit: How hard would it be to do both, or is that too > annoying? It would at least get rid of that annoying thing with > object > ufuncs (which currently have a default, but not really an > identity/initializer). Best, Sebastian On Mon, 2018-03-26 at 08:20 > -0400, Hameer Abbasi wrote: > Actually, the behavior right now isn?t > that of `default` but that of > `initializer` or `start`. > > This > was > discussed further down in the PR but to reiterate: > `np.sum([10], > initializer=5)` becomes `15`. > > Also, `np.min([5], initializer=0)` > becomes `0`, so it isn?t really > the default value, it?s the initial > value among which the reduction > is performed. > > This was the > reason to call it initializer in the first place. I like > `initial` > and `initial_value` as well, and `start` also makes sense > but isn?t > descriptive enough. > > Hameer > Sent from Astro for Mac > > > On Mar > 26, 2018 at 12:06, Sebastian Berg > t> > wrote: > > > > Initializer or this sounds fine to me. As an other > data > point which > > I > > think has been mentioned before, `sum` uses > start and min/max use > > default. `start` does not work, unless we > also change the code to > > always use the identity if given > (currently that is not the case), > > in > > which case it might be > nice. However, "start" seems a bit like > > solving > > a different > issue in any case. > > > > Anyway, mostly noise. I really like adding > this, the only thing > > worth > > discussing a bit is the name :). 
> > > > > - Sebastian > > > > > > On Mon, 2018-03-26 at 05:57 -0400, > > Hameer Abbasi wrote: > > > It calls it `initializer` - See > https://docs.python.org/3.5/libra > > > ry/f > > > > unctools.html#functools.reduce > > > > > > Sent from Astro for Mac > > > > > > > > > On Mar 26, 2018 at 09:54, Eric Wieser > > > > > com> > > > > wrote: > > > > > > > > > It turns out I mispoke - functools.reduce calls the argument > > > > > `initial` > > > > > > > > On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer > > > > > wrote: > > > > > This looks like a very > logical addition to the reduce > > > > > interface. > > > > > It has > my support! > > > > > > > > > > I would have preferred the more > descriptive name > > > > > "initial_value", > > > > > but consistency > with functools.reduce makes a compelling case > > > > > for > > > > > > "initializer". > > > > > > > > > > On Sun, Mar 25, 2018 at 1:15 PM > Eric Wieser > > > > y at gm > > > > > ail.com> > wrote: > > > > > > > To reiterate my comments in the issue - I'm in favor of > > > > > > > > > > > > > this. > > > > > > > > > > > > It seems seem especially > > valuable for identity-less > > > > > > functions > > > > > > (`min`, > `max`, `lcm`), and the argument name is consistent > > > > > > with > > > > > > > `functools.reduce`. too. > > > > > > > > > > > > The only > > argument I can see against merging this would be > > > > > > > `kwarg`-creep of `reduce`, and I think this has enough use > > > > > > > > cases to justify that. > > > > > > > > > > > > I'd like to merge in a > few days, if no one else has any > > > > > > opinions. > > > > > > > > > > > > > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13 > > Hameer Abbasi > > > > > @gma > > > > > > il.com> > wrote: > > > > > > > Hello, everyone. I?ve submitted a PR to add a > initializer > > > > > > > kwarg to ufunc.reduce. This is useful in a > few cases, > > > > > > > e.g., > > > > > > > it allows one to supply > a > ?default? value for identity- > > > > > > > less > > > > > > > ufunc > reductions, and specify an initial value for > > > > > > > reductions > > > > > > > > such as sum (other than zero.) > > > > > > > > > > > > > > > > > > > > > > > > > Please feel free to review or leave feedback, (although I > > > > > > > > think Eric and Marten have picked it apart pretty well). 
> > > > > > > > > > > > > > > > > > https://github.com/numpy/numpy/pull/10635 > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Hameer > > > > > > > > > > > > > > > > > > > > > Sent from Astro for Mac > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > NumPy-Discussion mailing list > > > > > > > > NumPy-Discussion at python.org > > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > NumPy-Discussion mailing list > > > > > > > > NumPy-Discussion at python.org > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > _______________________________________________ > > > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > > NumPy-Discussion > mailing list > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion > mailing list > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion > mailing list NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From einstein.edison at gmail.com Mon Mar 26 11:53:01 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Mon, 26 Mar 2018 11:53:01 -0400 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: <1522079156.4888.12.camel@sipsolutions.net> References: <1522077394.4888.10.camel@sipsolutions.net> <1522079156.4888.12.camel@sipsolutions.net> Message-ID: It'll need to be thought out for object arrays and subclasses. But for Regular numeric stuff, Numpy uses fmin and this would have the desired effect. On 26/03/2018 at 17:45, Sebastian wrote: On Mon, 2018-03-26 at 11:39 -0400, Hameer Abbasi wrote: That is the idea, but NaN functions are in a separate branch for another PR to be discussed later. You can see it on my fork, if you're interested. Except that as far as I understand I am not sure it will help much with it, since it is not a default, but an initializer. Initializing to NaN would just make all results NaN. - Sebastian On 26/03/2018 at 17:35, Benjamin wrote: Hmm, this is neat. I imagine it would finally give some people a choice on what np.nansum([np.nan]) should return? It caused a huge hullabeloo a few years ago when we changed it from returning NaN to returning zero. 
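(The distinction being drawn can be seen directly in the ufuncs themselves: minimum propagates NaN, so a NaN starting value would poison the whole reduction, while fmin ignores NaN. A quick check, independent of the PR:)

    import numpy as np

    print(np.minimum.reduce([np.nan, 5.0]))  # nan, minimum propagates NaN
    print(np.fmin.reduce([np.nan, 5.0]))     # 5.0, fmin ignores NaN where it can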
Ben Root On Mon, Mar 26, 2018 at 11:16 AM, Sebastian Berg wrote: OK, the new documentation is actually clear: initializer : scalar, optional The value with which to start the reduction. Defaults to the `~numpy.ufunc.identity` of the ufunc. If ``None`` is given, the first element of the reduction is used, and an error is thrown if the reduction is empty. If ``a.dtype`` is ``object``, then the initializer is _only_ used if reduction is empty. I would actually like to say that I do not like the object special case much (and it is probably the reason why I was confused), nor am I quite sure this is what helps a lot? Logically, I would argue there are two things: 1. initializer/start (always used) 2. default (oly used for empty reductions) For example, I might like to give `np.nan` as the default for some empty reductions, this will not work. I understand that this is a minimal invasive PR and I am not sure I find the solution bad enough to really dislike it, but what do other think? My first expectation was the default behaviour (in all cases, not just object case) for some reason. To be honest, for now I just wonder a bit: How hard would it be to do both, or is that too annoying? It would at least get rid of that annoying thing with object ufuncs (which currently have a default, but not really an identity/initializer). Best, Sebastian On Mon, 2018-03-26 at 08:20 -0400, Hameer Abbasi wrote: > Actually, the behavior right now isn?t that of `default` but that of > `initializer` or `start`. > > This was discussed further down in the PR but to reiterate: > `np.sum([10], initializer=5)` becomes `15`. > > Also, `np.min([5], initializer=0)` becomes `0`, so it isn?t really > the default value, it?s the initial value among which the reduction > is performed. > > This was the reason to call it initializer in the first place. I like > `initial` and `initial_value` as well, and `start` also makes sense > but isn?t descriptive enough. > > Hameer > Sent from Astro for Mac > > > On Mar 26, 2018 at 12:06, Sebastian Berg > t> wrote: > > > > Initializer or this sounds fine to me. As an other data point which > > I > > think has been mentioned before, `sum` uses start and min/max use > > default. `start` does not work, unless we also change the code to > > always use the identity if given (currently that is not the case), > > in > > which case it might be nice. However, "start" seems a bit like > > solving > > a different issue in any case. > > > > Anyway, mostly noise. I really like adding this, the only thing > > worth > > discussing a bit is the name :). > - Sebastian > > > > > > On Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: > > > It calls it `initializer` - See https://docs.python.org/3.5/libra > > > ry/f > > > unctools.html#functools.reduce > > > > > > Sent from Astro for Mac > On Mar 26, 2018 at 09:54, Eric Wieser > > > com> > > > > wrote: > > > > > > > > It turns out I mispoke - functools.reduce calls the argument > > > > `initial` > > > > > > > > On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer > > > > wrote: > > > > > This looks like a very logical addition to the reduce > > > > > interface. > > > > > It has my support! > > > > > > > > > > I would have preferred the more descriptive name > > > > > "initial_value", > > > > > but consistency with functools.reduce makes a compelling case > > > > > for > > > > > "initializer". > > > > > > > > > > On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser > > > > y at gm > > > > > ail.com> wrote: To reiterate my comments in the issue - I'm in favor of > this. 
> > > > > > > > > > > > It seems seem especially valuable for identity-less > > > > > > functions > > > > > > (`min`, `max`, `lcm`), and the argument name is consistent > > > > > > with > `functools.reduce`. too. > > > > > > > > > > > > The only argument I can see against merging this would be > > > > > > `kwarg`-creep of `reduce`, and I think this has enough use > > > > > cases to justify that. > > > > > > > > > > > > I'd like to merge in a few days, if no one else has any > > > > > > opinions. > > > > > > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13 Hameer Abbasi > > > > > @gma > > > > > > il.com> wrote: > > > > > > > Hello, everyone. I?ve submitted a PR to add a initializer > > > > > > > kwarg to ufunc.reduce. This is useful in a few cases, > > > > > > > e.g., > > > > > > > it allows one to supply a ?default? value for identity- > > > > > > > less > > > > > > > ufunc reductions, and specify an initial value for > > > > > > > reductions such as sum (other than zero.) > > > > > > > > > > > > Please feel free to review or leave feedback, (although I > > > > > think Eric and Marten have picked it apart pretty well). > > > > https://github.com/numpy/numpy/pull/10635 > > > > > Thanks, > > > > > > > > > > > > > > Hameer > > > > Sent from Astro for Mac > > > > > > > > > > > > > > _______________________________________________ > > > > > > > NumPy-Discussion mailing list > > > > > > > NumPy-Discussion at python.org > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.org > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion From allanhaldane at gmail.com Mon Mar 26 12:07:59 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Mon, 26 Mar 2018 12:07:59 -0400 Subject: [Numpy-discussion] nditer as a context manager (reformatted?) In-Reply-To: <1522036d-a561-ba14-8dc3-48e329266827@gmail.com> References: <1522036d-a561-ba14-8dc3-48e329266827@gmail.com> Message-ID: <99f982c1-146f-44f0-8695-07bc197746b5@gmail.com> Given the lack of objections, we are probably going forward with this change to nditer. 
Anyone who uses nditers may have to update their code slightly if they want
to avoid deprecation warnings, but otherwise old nditer code should work for
a long time from now.

Allan

On 03/22/2018 01:43 PM, Matti Picus wrote:
> Hello all, PR #9998 (https://github.com/numpy/numpy/pull/9998/) proposes
> an update to the nditer API, both C and python. The issue
> (https://github.com/numpy/numpy/issues/9714) is that sometimes nditer
> uses temp arrays via the "writeback" mechanism, the data is copied back
> to the original arrays "when finished". However "when finished" was
> implemented using nditer deallocation.
>
> This mechanism is implicit and unclear, and relies on refcount semantics
> which do not work on non-refcount python implementations like PyPy. It
> also leads to lines of code like "iter=None" to trigger the writeback
> resolution.
>
> On the c-api level the agreed upon solution is to add a new
> `NpyIter_Close` function in C; this is to be called before
> `NpyIter_Dealloc`.
>
> The reviewers and I would like to ask the wider NumPy community for
> opinions about the proposed python-level solution: turning the python
> nditer object into a context manager. This way "writeback" occurs at
> context manager exit via a call to `NpyIter_Close`, instead of like
> before when it occurred at `nditer` deallocation (which might not happen
> until much later in PyPy, and could be delayed by GC even in CPython).
>
> Another solution that was rejected
> (https://github.com/numpy/numpy/pull/10184) was to add an nditer.close()
> python-level function that would not require a context manager. It was
> felt that this is more error-prone since it requires users to add the
> line for each iterator created.
>
> The back-compat issues are that:
>
> 1. We are adding a new function to the numpy API, `NpyIter_Close`
> (pretty harmless)
>
> 2. We want people to update their C code using nditer, to call
> `NpyIter_Close` before they call `NpyIter_Dealloc`, and will start
> raising a deprecation warning if misuse is detected
>
> 3. We want people to update their Python code to use the nditer object
> as a context manager, and will warn if they do not.
>
> We tried to minimize back-compat issues, in the sense that old code
> (which didn't work in PyPy anyway) will still work, although it will now
> emit deprecation warnings. In the future we also plan to raise an error
> if an nditer is used in Python without a context manager (when it should
> have been). For C code, we plan to leave the deprecation warning in
> place probably forever, as we can only detect the deprecated behavior in
> the deallocator, where exceptions cannot be raised.
>
> Anybody who uses nditers should take a look and please reply if it seems
> the change will be too painful.
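For concreteness, a minimal sketch of the proposed Python-level pattern,
based only on the PR description above; the exact flags and deprecation
behaviour are whatever the PR finally merges, so treat this as illustrative:

    import numpy as np

    a = np.arange(6.0).reshape(2, 3)

    # Previously one had to rely on deallocation (e.g. `it = None`) to
    # trigger writeback resolution; the proposal makes the end of the
    # iteration explicit:
    with np.nditer(a, op_flags=['readwrite']) as it:
        for x in it:
            x[...] = 2 * x
    # Any temporary/writeback buffers are resolved when the `with` block
    # exits, so `a` is guaranteed to be updated here, on CPython and PyPy.
    print(a)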
>
> For more details, please see the updated docs in that PR
>
> Matti (and reviewers)
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From sebastian at sipsolutions.net Mon Mar 26 12:48:47 2018
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 26 Mar 2018 18:48:47 +0200
Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions)
In-Reply-To:
References: <1522077394.4888.10.camel@sipsolutions.net> <1522079156.4888.12.camel@sipsolutions.net>
Message-ID: <1522082927.4888.24.camel@sipsolutions.net>

On Mon, 2018-03-26 at 11:53 -0400, Hameer Abbasi wrote:
> It'll need to be thought out for object arrays and subclasses. But for
> regular numeric stuff, NumPy uses fmin and this would have the desired
> effect.

I do not want to block this, but I would like a clearer opinion about
this issue; `np.nansum` as Benjamin noted would require something like:

    np.nansum([np.nan], default=np.nan)

because

    np.sum([1], initializer=np.nan)
    np.nansum([1], initializer=np.nan)

would both give NaN if the logic is the same as the current `np.sum`.
And yes, I guess for fmin/fmax NaN happens to work. And then there are
many nonsense reduces which could make sense with `initializer`.

Now nansum is not implemented in a way that could make use of the new
kwarg anyway, so maybe it does not matter in some sense. We can in
principle use `default` in nansum and at some point possibly add
`default` to the normal ufuncs. If we argue like that, the only
annoying thing is the `object` dtype which confuses the two use cases
currently.

This confusion IMO is not harmless, because I might want to use it
(e.g. sum with initializer=5), and I would expect things like dropping
in `decimal.Decimal` to work most of the time, while here it would give
silently bad results.

- Sebastian

> On 26/03/2018 at 17:45, Sebastian wrote: On Mon, 2018-03-26 at 11:39
> -0400, Hameer Abbasi wrote: That is the idea, but NaN functions are in
> a separate branch for another PR to be discussed later. You can see it
> on my fork, if you're interested. Except that as far as I understand I
> am not sure it will help much with it, since it is not a default, but
> an initializer. Initializing to NaN would just make all results NaN.
> - Sebastian On 26/03/2018 at 17:35, Benjamin wrote: Hmm, this is neat.
> I imagine it would finally give some people a choice on what
> np.nansum([np.nan]) should return? It caused a huge hullabaloo a few
> years ago when we changed it from returning NaN to returning zero.
> Ben Root On Mon, Mar 26, 2018 at 11:16 AM, Sebastian Berg
> wrote: OK, the new documentation is actually clear:
>
>     initializer : scalar, optional
>         The value with which to start the reduction. Defaults to the
>         `~numpy.ufunc.identity` of the ufunc. If ``None`` is given,
>         the first element of the reduction is used, and an error is
>         thrown if the reduction is empty. If ``a.dtype`` is
>         ``object``, then the initializer is _only_ used if reduction
>         is empty.
>
> I would actually like to say that I do not like the object special
> case much (and it is probably the reason why I was confused), nor am I
> quite sure this is what helps a lot? Logically, I would argue there
> are two things:
>
> 1. initializer/start (always used)
> 2. default (only used for empty reductions)
>
> For example, I might like to give `np.nan` as the default for some
> empty reductions, this will not work.
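As a concrete reference point for the two meanings being distinguished
here, the Python standard library already separates them; this is only an
analogy, not the NumPy API under discussion:

    from functools import reduce
    import operator

    # "initializer"/"start": always folded into the reduction
    reduce(operator.add, [10], 5)   # -> 15
    reduce(operator.add, [], 5)     # -> 5 (also rescues the empty case)

    # "default": only consulted when the reduction is empty
    min([5], default=0)             # -> 5 (the default is ignored)
    min([], default=0)              # -> 0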
> I understand that this is a minimal invasive PR and I am not sure I > find the solution bad enough to really dislike it, but what do other > think? My first expectation was the default behaviour (in all cases, > not just object case) for some reason. To be honest, for now I just > wonder a bit: How hard would it be to do both, or is that too > annoying? It would at least get rid of that annoying thing with > object > ufuncs (which currently have a default, but not really an > identity/initializer). Best, Sebastian On Mon, 2018-03-26 at 08:20 > -0400, Hameer Abbasi wrote: > Actually, the behavior right now isn?t > that of `default` but that of > `initializer` or `start`. > > This > was > discussed further down in the PR but to reiterate: > `np.sum([10], > initializer=5)` becomes `15`. > > Also, `np.min([5], initializer=0)` > becomes `0`, so it isn?t really > the default value, it?s the initial > value among which the reduction > is performed. > > This was the > reason to call it initializer in the first place. I like > `initial` > and `initial_value` as well, and `start` also makes sense > but isn?t > descriptive enough. > > Hameer > Sent from Astro for Mac > > > On Mar > 26, 2018 at 12:06, Sebastian Berg > t> > wrote: > > > > Initializer or this sounds fine to me. As an other > data > point which > > I > > think has been mentioned before, `sum` uses > start and min/max use > > default. `start` does not work, unless we > also change the code to > > always use the identity if given > (currently that is not the case), > > in > > which case it might be > nice. However, "start" seems a bit like > > solving > > a different > issue in any case. > > > > Anyway, mostly noise. I really like adding > this, the only thing > > worth > > discussing a bit is the name :). > > - Sebastian > > > > > > On Mon, 2018-03-26 at 05:57 -0400, Hameer > Abbasi wrote: > > > It calls it `initializer` - See > https://docs.python.org/3.5/libra > > > ry/f > > > > unctools.html#functools.reduce > > > > > > Sent from Astro for Mac > > On Mar 26, 2018 at 09:54, Eric Wieser > > > > com> > > > > wrote: > > > > > > > > It turns out I mispoke - > > functools.reduce calls the argument > > > > `initial` > > > > > > > > > On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer > > > > > wrote: > > > > > This looks like a very logical addition to the > reduce > > > > > > interface. > > > > > It has my support! > > > > > > > > > > > > I would have preferred the more descriptive name > > > > > > "initial_value", > > > > > but consistency with functools.reduce > makes > a compelling case > > > > > for > > > > > "initializer". > > > > > > > > > > > > On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser > > > > > y at gm > > > > > ail.com> wrote: To reiterate my comments in > > > > > the > > issue - I'm in favor of > this. > > > > > > > > > > > > It seems seem > especially valuable for identity-less > > > > > > functions > > > > > > > (`min`, `max`, `lcm`), and the argument name is consistent > > > > > > > > > with > `functools.reduce`. too. > > > > > > > > > > > > The only > > argument I can see against merging this would be > > > > > > > `kwarg`-creep of `reduce`, and I think this has enough use > > > > > > cases to justify that. > > > > > > > > > > > > I'd like to merge in a > few days, if no one else has any > > > > > > opinions. > > > > > > > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13 Hameer > Abbasi > > > > > @gma > > > > > > il.com> wrote: > > > > > > > > Hello, everyone. 
I?ve submitted a PR to add a initializer > > > > > > > > kwarg to ufunc.reduce. This is useful in a few cases, > > > > > > > > > > > > > > > e.g., > > > > > > > it allows one to supply a ?default? > > value for identity- > > > > > > > less > > > > > > > ufunc > reductions, > and specify an initial value for > > > > > > > reductions such as sum > (other than zero.) > > > > > > > > > > > > Please feel free to review > or leave feedback, (although I > > > > > think Eric and Marten have > picked it apart pretty well). > > > > > https://github.com/numpy/numpy/pull/10635 > > > > > Thanks, > > > > > > > > > > > > > > > Hameer > > > > Sent from Astro for Mac > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > NumPy-Discussion mailing list > > > > > > > > > NumPy-Discussion at python.org > > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.org > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > > > NumPy- > Discussion > mailing list > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > > NumPy-Discussion > mailing list > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion > mailing list > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion > mailing list NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion > mailing list NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Mon Mar 26 12:54:51 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 26 Mar 2018 18:54:51 +0200 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: <1522082927.4888.24.camel@sipsolutions.net> References: <1522077394.4888.10.camel@sipsolutions.net> <1522079156.4888.12.camel@sipsolutions.net> <1522082927.4888.24.camel@sipsolutions.net> Message-ID: <1522083291.8319.3.camel@sipsolutions.net> On Mon, 2018-03-26 at 18:48 +0200, Sebastian Berg wrote: > On Mon, 2018-03-26 at 11:53 -0400, Hameer Abbasi wrote: > > It'll need to be thought out for object arrays and subclasses. But > > for > > Regular numeric stuff, Numpy uses fmin and this would have the > > desired > > effect. > > I do not want to block this, but I would like a clearer opinion about > this issue, `np.nansum` as Benjamin noted would require something > like: > > np.nansum([np.nan], default=np.nan) > > because > > np.sum([1], initializer=np.nan) > np.nansum([1], initializer=np.nan) > > would both give NaN if the logic is the same as the current `np.sum`. > And yes, I guess for fmin/fmax NaN happens to work. And then there > are > many nonsense reduces which could make sense with `initializer`. > > Now nansum is not implemented in a way that could make use of the new > kwarg anyway, so maybe it does not matter in some sense. We can in > principle use `default` in nansum and at some point possibly add > `default` to the normal ufuncs. If we argue like that, the only > annoying thing is the `object` dtype which confuses the two use cases > currently. > > This confusion IMO is not harmless, because I might want to use it > (e.g. sum with initializer=5), and I would expect things like > dropping > in `decimal.Decimal` to work most of the time, while here it would > give > silently bad results. > In other words: I am very very much in favor if you get rid that object dtype special case. I frankly not see why not (except that it needs a bit more code change). If given explicitly, we might as well force the use and not do the funny stuff which is designed to be more type agnostic! If it happens to fail due to not being type agnostic, it will at least fail loudly. If you leave that object special case I am *very* hesitant about it. That I think I would like a `default` argument as well, is another issue and it can wait to another day. - Sebastian > - Sebastian > > > > > > > On 26/03/2018 at 17:45, Sebastian wrote: On Mon, 2018-03-26 at > > 11:39 -0400, Hameer Abbasi wrote: That is the idea, but NaN > > functions > > are in a separate branch for another PR to be discussed later. You > > can > > see it on my fork, if you're interested. Except that as far as I > > understand I am not sure it will help much with it, since it is not > > a > > default, but an initializer. Initializing to NaN would just make > > all > > results NaN. - Sebastian On 26/03/2018 at 17:35, Benjamin wrote: > > Hmm, > > this is neat. I imagine it would finally give some people a choice > > on > > what np.nansum([np.nan]) should return? It caused a huge hullabeloo > > a > > few years ago when we changed it from returning NaN to returning > > zero. 
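For reference on the nansum point just quoted, the behaviour being
discussed looks roughly like this today; the `default=` spelling is
hypothetical and is not part of the patch:

    import numpy as np

    np.nansum([np.nan])        # -> 0.0 today: an all-NaN input sums to zero
    np.fmin.reduce([np.nan])   # -> nan: fmin/fmax already propagate an
                               #    all-NaN input

    # A `default` would only apply to the empty/all-NaN case, e.g.
    # np.nansum([np.nan], default=np.nan)   # -> nan (hypothetical)
    # whereas an always-applied initializer of NaN would poison every sum,
    # since np.nan + 1.0 is nan.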
> > Ben Root On Mon, Mar 26, 2018 at 11:16 AM, Sebastian Berg > > wrote: OK, the new documentation is > > actually clear: initializer : scalar, optional The value with which > > to > > start the reduction. Defaults to the `~numpy.ufunc.identity` of the > > ufunc. If ``None`` is given, the first element of the reduction is > > used, and an error is thrown if the reduction is empty. If > > ``a.dtype`` > > is ``object``, then the initializer is _only_ used if reduction is > > empty. I would actually like to say that I do not like the object > > special case much (and it is probably the reason why I was > > confused), > > nor am I quite sure this is what helps a lot? Logically, I would > > argue > > there are two things: 1. initializer/start (always used) 2. default > > (oly used for empty reductions) For example, I might like to give > > `np.nan` as the default for some empty reductions, this will not > > work. > > I understand that this is a minimal invasive PR and I am not sure I > > find the solution bad enough to really dislike it, but what do > > other > > think? My first expectation was the default behaviour (in all > > cases, > > not just object case) for some reason. To be honest, for now I just > > wonder a bit: How hard would it be to do both, or is that too > > annoying? It would at least get rid of that annoying thing with > > object > > ufuncs (which currently have a default, but not really an > > identity/initializer). Best, Sebastian On Mon, 2018-03-26 at 08:20 > > -0400, Hameer Abbasi wrote: > Actually, the behavior right now > > isn?t > > that of `default` but that of > `initializer` or `start`. > > This > > was > > discussed further down in the PR but to reiterate: > `np.sum([10], > > initializer=5)` becomes `15`. > > Also, `np.min([5], > > initializer=0)` > > becomes `0`, so it isn?t really > the default value, it?s the > > initial > > value among which the reduction > is performed. > > This was the > > reason to call it initializer in the first place. I like > > > `initial` > > and `initial_value` as well, and `start` also makes sense > but > > isn?t > > descriptive enough. > > Hameer > Sent from Astro for Mac > > > On > > Mar > > 26, 2018 at 12:06, Sebastian Berg > t> > > wrote: > > > > Initializer or this sounds fine to me. As an other > > data > > point which > > I > > think has been mentioned before, `sum` uses > > start and min/max use > > default. `start` does not work, unless we > > also change the code to > > always use the identity if given > > (currently that is not the case), > > in > > which case it might be > > nice. However, "start" seems a bit like > > solving > > a different > > issue in any case. > > > > Anyway, mostly noise. I really like > > adding > > this, the only thing > > worth > > discussing a bit is the name :). > > > > > - Sebastian > > > > > > On Mon, 2018-03-26 at 05:57 -0400, Hameer > > Abbasi wrote: > > > It calls it `initializer` - See > > https://docs.python.org/3.5/libra > > > ry/f > > > > > unctools.html#functools.reduce > > > > > > Sent from Astro for Mac > > > > > On Mar 26, 2018 at 09:54, Eric Wieser > > > > > > > com> > > > > wrote: > > > > > > > > It turns out I mispoke - > > > > functools.reduce calls the argument > > > > `initial` > > > > > > > > > > > > On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer > > > > > > > > wrote: > > > > > This looks like a very logical addition to the > > reduce > > > > > > > interface. > > > > > It has my support! 
> > > > > > > > > > > > > > > > > > > > > I would have preferred the more descriptive name > > > > > > > "initial_value", > > > > > but consistency with functools.reduce > > makes > > a compelling case > > > > > for > > > > > "initializer". > > > > > > > > > > > > > > > > On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser > > > > > > > > > > > y at gm > > > > > ail.com> wrote: To reiterate my comments in > > > > > > the > > > > issue - I'm in favor of > this. > > > > > > > > > > > > It seems > > seem > > especially valuable for identity-less > > > > > > functions > > > > > > > > > > (`min`, `max`, `lcm`), and the argument name is consistent > > > > > > > > > > > > > > > > > with > `functools.reduce`. too. > > > > > > > > > > > > The only > > > > argument I can see against merging this would be > > > > > > > > `kwarg`-creep of `reduce`, and I think this has enough use > > > > > > > > > cases to justify that. > > > > > > > > > > > > I'd like to merge in > > a > > few days, if no one else has any > > > > > > opinions. > > > > > > > > > > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13 Hameer > > Abbasi > > > > > @gma > > > > > > il.com> wrote: > > > > > > > > > > > Hello, everyone. I?ve submitted a PR to add a > > > > > > > > initializer > > > > > > > > > kwarg to ufunc.reduce. This is useful in a few cases, > > > > > > > > > > > > > > > > > > > > > > > > > > e.g., > > > > > > > it allows one to supply a ?default? > > > > value for identity- > > > > > > > less > > > > > > > ufunc > > reductions, > > and specify an initial value for > > > > > > > reductions such as > > sum > > (other than zero.) > > > > > > > > > > > > Please feel free to > > review > > or leave feedback, (although I > > > > > think Eric and Marten have > > picked it apart pretty well). 
> > > > > > https://github.com/numpy/numpy/pull/10635 > > > > > Thanks, > > > > > > > > > > > > > > > > > > Hameer > > > > Sent from Astro for Mac > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > > > > > > > > > > > > NumPy-Discussion mailing list > > > > > > > > > > > NumPy-Discussion at python.org > > > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.o > > rg > > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussi > > > > > > > > on > > > > > > > > _______________________________________________ > > > > > > > > > > > > > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy- > > Discussion > > mailing list > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > NumPy- > > Discussion > > mailing list > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion > > mailing list > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion > > mailing list NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion > > mailing list NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From einstein.edison at gmail.com Mon Mar 26 12:59:38 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Mon, 26 Mar 2018 12:59:38 -0400 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: <1522083291.8319.3.camel@sipsolutions.net> References: <1522079156.4888.12.camel@sipsolutions.net> <1522082927.4888.24.camel@sipsolutions.net> <1522083291.8319.3.camel@sipsolutions.net> Message-ID: That may be complicated. Currently, the identity isn't used in object dtype reductions. We may need to change that, which could cause a whole lot of other backwards incompatible changes. For example, sum actually including zero in object reductions. Or we could pass in a flag saying an initializer was passed in to change that behaviour. 
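The object-dtype behaviour referred to here can be seen directly; strings
are only summable at all because the integer identity 0 is currently left
out of object reductions (illustrative only):

    import numpy as np

    arr = np.array(['spam', 'eggs'], dtype=object)
    np.add.reduce(arr)   # -> 'spameggs': starts from the first element,
                         #    not from the identity 0

    # If the identity (or a passed-in initial value) were always folded in,
    # the same call would effectively compute 0 + 'spam' + 'eggs' and raise
    # TypeError, which is the backwards-compatibility concern above.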
If this is agreed upon and someone is kind enough to point me to the code, I'd be willing to make this change. On 26/03/2018 at 18:54, Sebastian wrote: On Mon, 2018-03-26 at 18:48 +0200, Sebastian Berg wrote: On Mon, 2018-03-26 at 11:53 -0400, Hameer Abbasi wrote: It'll need to be thought out for object arrays and subclasses. But for Regular numeric stuff, Numpy uses fmin and this would have the desired effect. I do not want to block this, but I would like a clearer opinion about this issue, `np.nansum` as Benjamin noted would require something like: np.nansum([np.nan], default=np.nan) because np.sum([1], initializer=np.nan) np.nansum([1], initializer=np.nan) would both give NaN if the logic is the same as the current `np.sum`. And yes, I guess for fmin/fmax NaN happens to work. And then there are many nonsense reduces which could make sense with `initializer`. Now nansum is not implemented in a way that could make use of the new kwarg anyway, so maybe it does not matter in some sense. We can in principle use `default` in nansum and at some point possibly add `default` to the normal ufuncs. If we argue like that, the only annoying thing is the `object` dtype which confuses the two use cases currently. This confusion IMO is not harmless, because I might want to use it (e.g. sum with initializer=5), and I would expect things like dropping in `decimal.Decimal` to work most of the time, while here it would give silently bad results. In other words: I am very very much in favor if you get rid that object dtype special case. I frankly not see why not (except that it needs a bit more code change). If given explicitly, we might as well force the use and not do the funny stuff which is designed to be more type agnostic! If it happens to fail due to not being type agnostic, it will at least fail loudly. If you leave that object special case I am *very* hesitant about it. That I think I would like a `default` argument as well, is another issue and it can wait to another day. - Sebastian - Sebastian On 26/03/2018 at 17:45, Sebastian wrote: On Mon, 2018-03-26 at 11:39 -0400, Hameer Abbasi wrote: That is the idea, but NaN functions are in a separate branch for another PR to be discussed later. You can see it on my fork, if you're interested. Except that as far as I understand I am not sure it will help much with it, since it is not a default, but an initializer. Initializing to NaN would just make all results NaN. - Sebastian On 26/03/2018 at 17:35, Benjamin wrote: Hmm, this is neat. I imagine it would finally give some people a choice on what np.nansum([np.nan]) should return? It caused a huge hullabeloo a few years ago when we changed it from returning NaN to returning zero. Ben Root On Mon, Mar 26, 2018 at 11:16 AM, Sebastian Berg wrote: OK, the new documentation is actually clear: initializer : scalar, optional The value with which to start the reduction. Defaults to the `~numpy.ufunc.identity` of the ufunc. If ``None`` is given, the first element of the reduction is used, and an error is thrown if the reduction is empty. If ``a.dtype`` is ``object``, then the initializer is _only_ used if reduction is empty. I would actually like to say that I do not like the object special case much (and it is probably the reason why I was confused), nor am I quite sure this is what helps a lot? Logically, I would argue there are two things: 1. initializer/start (always used) 2. 
default (oly used for empty reductions) For example, I might like to give `np.nan` as the default for some empty reductions, this will not work. I understand that this is a minimal invasive PR and I am not sure I find the solution bad enough to really dislike it, but what do other think? My first expectation was the default behaviour (in all cases, not just object case) for some reason. To be honest, for now I just wonder a bit: How hard would it be to do both, or is that too annoying? It would at least get rid of that annoying thing with object ufuncs (which currently have a default, but not really an identity/initializer). Best, Sebastian On Mon, 2018-03-26 at 08:20 -0400, Hameer Abbasi wrote: > Actually, the behavior right now isn?t that of `default` but that of > `initializer` or `start`. > > This was discussed further down in the PR but to reiterate: > `np.sum([10], initializer=5)` becomes `15`. > > Also, `np.min([5], initializer=0)` becomes `0`, so it isn?t really > the default value, it?s the initial value among which the reduction > is performed. > > This was the reason to call it initializer in the first place. I like > `initial` and `initial_value` as well, and `start` also makes sense > but isn?t descriptive enough. > > Hameer > Sent from Astro for Mac > > > On Mar 26, 2018 at 12:06, Sebastian Berg > t> wrote: > > > > Initializer or this sounds fine to me. As an other data point which > > I > > think has been mentioned before, `sum` uses start and min/max use > > default. `start` does not work, unless we also change the code to > > always use the identity if given (currently that is not the case), > > in > > which case it might be nice. However, "start" seems a bit like > > solving > > a different issue in any case. > > > > Anyway, mostly noise. I really like adding this, the only thing > > worth > > discussing a bit is the name :). - Sebastian > > > > > > On Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: > > > It calls it `initializer` - See https://docs.python.org/3.5/libra > > > ry/f > > > unctools.html#functools.reduce > > > > > > Sent from Astro for Mac On Mar 26, 2018 at 09:54, Eric Wieser > com> > > > > wrote: > > > > > > > > It turns out I mispoke - functools.reduce calls the argument > > > > `initial` > > > > > > > On Mon, 26 Mar 2018 at 00:17 Stephan Hoyer > > > wrote: > > > > > This looks like a very logical addition to the reduce interface. > > > > > It has my support! > > > > > > > > > I would have preferred the more descriptive name > > > > > "initial_value", > > > > > but consistency with functools.reduce makes a compelling case > > > > > for > > > > > "initializer". > > > > > On Sun, Mar 25, 2018 at 1:15 PM Eric Wieser > > > > ail.com> wrote: To reiterate my comments in the issue - I'm in favor of > this. > > > > > > > > > > > > It seems seem especially valuable for identity-less > > > > > > functions > > > > (`min`, `max`, `lcm`), and the argument name is consistent > > > with > `functools.reduce`. too. > > > > > > > > > > > > The only argument I can see against merging this would be > > > > > > `kwarg`-creep of `reduce`, and I think this has enough use > > > > cases to justify that. > > > > > > > > > > > > I'd like to merge in a few days, if no one else has any > > > > > > opinions. > > > > > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13 Hameer Abbasi > > > > > @gma > > > > > > il.com> wrote: Hello, everyone. I?ve submitted a PR to add a initializer kwarg to ufunc.reduce. 
This is useful in a few cases, e.g., > > > > > > > it allows one to supply a ?default? value for identity- > > > > > > > less > > > > > > > ufunc reductions, and specify an initial value for > > > > > > > reductions such as sum (other than zero.) > > > > > > > > > > > > Please feel free to review or leave feedback, (although I > > > > > think Eric and Marten have picked it apart pretty well). > > > > https://github.com/numpy/numpy/pull/10635 > > > > > Thanks, > > > > Hameer > > > > Sent from Astro for Mac > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > > > > > NumPy-Discussion at python.org > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.o rg https://mail.python.org/mailman/listinfo/numpy-discussi on _______________________________________________ > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > > NumPy- Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy- Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Mon Mar 26 13:09:52 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 26 Mar 2018 19:09:52 +0200 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: References: <1522079156.4888.12.camel@sipsolutions.net> <1522082927.4888.24.camel@sipsolutions.net> <1522083291.8319.3.camel@sipsolutions.net> Message-ID: <1522084192.8883.6.camel@sipsolutions.net> On Mon, 2018-03-26 at 12:59 -0400, Hameer Abbasi wrote: > That may be complicated. Currently, the identity isn't used in object > dtype reductions. We may need to change that, which could cause a > whole lot of other backwards incompatible changes. For example, sum > actually including zero in object reductions. Or we could pass in a > flag saying an initializer was passed in to change that behaviour. If > this is agreed upon and someone is kind enough to point me to the > code, I'd be willing to make this change. 
I realize the implication, I am not suggesting to change the default behaviour (when no initial=... is passed), I would think about deprecating it, but probably only if we also have the `default` argument, since otherwise you cannot replicate the old behaviour. What I think I would like to see is to change how it works if (and only if) the initializer is passed in. Yes, this will require holding on to some extra information since you will have to know/remember whether the "identity" was passed in or defined otherwise. I did not check the code, but I would hope that it is not awfully tricky to do that. - Sebastian PS: A side note, but I see your emails as a single block of text with no/broken new-lines. > On 26/03/2018 at 18:54, > Sebastian wrote: On Mon, 2018-03-26 at 18:48 +0200, Sebastian Berg > wrote: On Mon, 2018-03-26 at 11:53 -0400, Hameer Abbasi wrote: It'll > need to be thought out for object arrays and subclasses. But for > Regular numeric stuff, Numpy uses fmin and this would have the > desired > effect. I do not want to block this, but I would like a clearer > opinion about this issue, `np.nansum` as Benjamin noted would require > something like: np.nansum([np.nan], default=np.nan) because > np.sum([1], initializer=np.nan) np.nansum([1], initializer=np.nan) > would both give NaN if the logic is the same as the current `np.sum`. > And yes, I guess for fmin/fmax NaN happens to work. And then there > are > many nonsense reduces which could make sense with `initializer`. Now > nansum is not implemented in a way that could make use of the new > kwarg anyway, so maybe it does not matter in some sense. We can in > principle use `default` in nansum and at some point possibly add > `default` to the normal ufuncs. If we argue like that, the only > annoying thing is the `object` dtype which confuses the two use cases > currently. This confusion IMO is not harmless, because I might want > to > use it (e.g. sum with initializer=5), and I would expect things like > dropping in `decimal.Decimal` to work most of the time, while here it > would give silently bad results. In other words: I am very very much > in favor if you get rid that object dtype special case. I frankly not > see why not (except that it needs a bit more code change). If given > explicitly, we might as well force the use and not do the funny stuff > which is designed to be more type agnostic! If it happens to fail due > to not being type agnostic, it will at least fail loudly. If you > leave > that object special case I am *very* hesitant about it. That I think > I > would like a `default` argument as well, is another issue and it can > wait to another day. - Sebastian - Sebastian On 26/03/2018 at 17:45, > Sebastian wrote: On Mon, 2018-03-26 at 11:39 -0400, Hameer Abbasi > wrote: That is the idea, but NaN functions are in a separate branch > for another PR to be discussed later. You can see it on my fork, if > you're interested. Except that as far as I understand I am not sure > it > will help much with it, since it is not a default, but an > initializer. > Initializing to NaN would just make all results NaN. - Sebastian On > 26/03/2018 at 17:35, Benjamin wrote: Hmm, this is neat. I imagine it > would finally give some people a choice on what np.nansum([np.nan]) > should return? It caused a huge hullabeloo a few years ago when we > changed it from returning NaN to returning zero. 
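Coming back to the "know/remember whether the identity was passed in"
point above: one way to express that distinction at the Python level is a
sentinel default. This is only a sketch of the idea; the names
`_NoValue` and `reduce_sketch` are made up here and the real work would
happen in the C reduce machinery:

    import numpy as np

    _NoValue = object()  # sentinel: tells "not passed" apart from any real value

    def reduce_sketch(ufunc, arr, initial=_NoValue):
        if initial is _NoValue:
            # Old behaviour: fall back to ufunc.identity, keeping the
            # existing object-dtype special case discussed in this thread.
            return ufunc.reduce(arr)
        # New behaviour: an explicitly passed value is always folded in,
        # even if it happens to equal ufunc.identity (emulated here by
        # prepending it to the flattened input).
        flat = np.concatenate(([initial], np.ravel(arr)))
        return ufunc.reduce(flat)

NumPy already uses a similar `np._NoValue` sentinel for optional keyword
arguments elsewhere (e.g. `keepdims`), so the pattern is not foreign to the
codebase.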
Ben Root On Mon, Mar > 26, 2018 at 11:16 AM, Sebastian Berg > wrote: OK, the new documentation is actually clear: initializer : > scalar, optional The value with which to start the reduction. > Defaults > to the `~numpy.ufunc.identity` of the ufunc. If ``None`` is given, > the > first element of the reduction is used, and an error is thrown if the > reduction is empty. If ``a.dtype`` is ``object``, then the > initializer > is _only_ used if reduction is empty. I would actually like to say > that I do not like the object special case much (and it is probably > the reason why I was confused), nor am I quite sure this is what > helps > a lot? Logically, I would argue there are two things: 1. > initializer/start (always used) 2. default (oly used for empty > reductions) For example, I might like to give `np.nan` as the default > for some empty reductions, this will not work. I understand that this > is a minimal invasive PR and I am not sure I find the solution bad > enough to really dislike it, but what do other think? My first > expectation was the default behaviour (in all cases, not just object > case) for some reason. To be honest, for now I just wonder a bit: How > hard would it be to do both, or is that too annoying? It would at > least get rid of that annoying thing with object ufuncs (which > currently have a default, but not really an identity/initializer). > Best, Sebastian On Mon, 2018-03-26 at 08:20 -0400, Hameer Abbasi > wrote: > Actually, the behavior right now isn?t that of `default` but > that of > `initializer` or `start`. > > This was discussed further > down in the PR but to reiterate: > `np.sum([10], initializer=5)` > becomes `15`. > > Also, `np.min([5], initializer=0)` becomes `0`, so > it isn?t really > the default value, it?s the initial value among > which the reduction > is performed. > > This was the reason to call > it > initializer in the first place. I like > `initial` and > `initial_value` > as well, and `start` also makes sense > but isn?t descriptive enough. > > > Hameer > Sent from Astro for Mac > > > On Mar 26, 2018 at 12:06, > > Sebastian Berg > t> wrote: > > > > > Initializer or this sounds fine to me. As an other data point which > > > I > > think has been mentioned before, `sum` uses start and min/max > > use > > default. `start` does not work, unless we also change the > code > to > > always use the identity if given (currently that is not the > case), > > in > > which case it might be nice. However, "start" seems > a bit like > > solving > > a different issue in any case. > > > > > Anyway, mostly noise. I really like adding this, the only thing > > > worth > > discussing a bit is the name :). - Sebastian > > > > > > On > Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: > > > It calls > it > `initializer` - See https://docs.python.org/3.5/libra > > > ry/f > > > > > unctools.html#functools.reduce > > > > > > Sent from Astro for Mac On > Mar 26, 2018 at 09:54, Eric Wieser > com> > > > > > wrote: > > > > > > > > It turns out I mispoke - > > functools.reduce calls the argument > > > > `initial` > > > > > > > > On > Mon, 26 Mar 2018 at 00:17 Stephan Hoyer > > > > wrote: > > > > > This looks like a very logical addition to the > reduce > interface. > > > > > It has my support! > > > > > > > > > I would > have > preferred the more descriptive name > > > > > "initial_value", > > > > > > > but consistency with functools.reduce makes a compelling case > > > > > > for > > > > > "initializer". 
> > > > > On Sun, Mar 25, 2018 at > > 1:15 PM Eric Wieser > > > > ail.com> wrote: > To reiterate my comments in the issue - I'm in favor of > this. > > > > > > > > > > > > > It seems seem especially valuable for identity-less > > > > > > > > > > > > > > functions > > > > (`min`, `max`, `lcm`), and the argument > > name is consistent > > > with > `functools.reduce`. too. > > > > > > > > > > > > > > The only argument I can see against merging this would be > > > > > > > `kwarg`-creep of `reduce`, and I think this has enough use > > > > > > > > > > > > > > > cases to justify that. > > > > > > > > > > > > I'd like to > > > > merge > > in a few days, if no one else has any > > > > > > opinions. > > > > > > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13 Hameer > > Abbasi > > > > > @gma > > > > > > il.com> wrote: > Hello, everyone. I?ve submitted a PR to add a initializer kwarg to > ufunc.reduce. This is useful in a few cases, e.g., > > > > > > > it > allows one to supply a ?default? value for identity- > > > > > > > > less > > > > > > > ufunc reductions, and specify an initial value for > > > > > > > > reductions such as sum (other than zero.) > > > > > > > > > > > > > > > > > > > Please feel free to review or leave feedback, (although I > > > > > > > > > > > think Eric and Marten have picked it apart pretty well). > > > > > > > > > > > > https://github.com/numpy/numpy/pull/10635 > > > > > Thanks, > > > > > > Hameer > > > > Sent from Astro for Mac > > > > > > _______________________________________________ > > > NumPy- > Discussion > mailing list > > > > > > > NumPy-Discussion at python.org > > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.o > rg > https://mail.python.org/mailman/listinfo/numpy-discussi on > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > > > NumPy- > Discussion mailing list > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > > NumPy- Discussion > mailing list > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion > mailing list > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion > mailing list NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion > mailing list NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion > mailing list NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion > mailing list NumPy-Discussion at python.org > 
https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From wieser.eric+numpy at gmail.com Mon Mar 26 13:40:34 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 26 Mar 2018 17:40:34 +0000 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: <1522084192.8883.6.camel@sipsolutions.net> References: <1522079156.4888.12.camel@sipsolutions.net> <1522082927.4888.24.camel@sipsolutions.net> <1522083291.8319.3.camel@sipsolutions.net> <1522084192.8883.6.camel@sipsolutions.net> Message-ID: The difficulty in supporting object arrays is that func.reduce(arr, initial=func.identity) and func.reduce(arr) have different meanings - whereas with the current patch, they are equivalent. ? On Mon, 26 Mar 2018 at 10:10 Sebastian Berg wrote: > On Mon, 2018-03-26 at 12:59 -0400, Hameer Abbasi wrote: > > That may be complicated. Currently, the identity isn't used in object > > dtype reductions. We may need to change that, which could cause a > > whole lot of other backwards incompatible changes. For example, sum > > actually including zero in object reductions. Or we could pass in a > > flag saying an initializer was passed in to change that behaviour. If > > this is agreed upon and someone is kind enough to point me to the > > code, I'd be willing to make this change. > > I realize the implication, I am not suggesting to change the default > behaviour (when no initial=... is passed), I would think about > deprecating it, but probably only if we also have the `default` > argument, since otherwise you cannot replicate the old behaviour. > > What I think I would like to see is to change how it works if (and only > if) the initializer is passed in. Yes, this will require holding on to > some extra information since you will have to know/remember whether the > "identity" was passed in or defined otherwise. > > I did not check the code, but I would hope that it is not awfully > tricky to do that. > > - Sebastian > > > PS: A side note, but I see your emails as a single block of text with > no/broken new-lines. > > > > On 26/03/2018 at 18:54, > > Sebastian wrote: On Mon, 2018-03-26 at 18:48 +0200, Sebastian Berg > > wrote: On Mon, 2018-03-26 at 11:53 -0400, Hameer Abbasi wrote: It'll > > need to be thought out for object arrays and subclasses. But for > > Regular numeric stuff, Numpy uses fmin and this would have the > > desired > > effect. I do not want to block this, but I would like a clearer > > opinion about this issue, `np.nansum` as Benjamin noted would require > > something like: np.nansum([np.nan], default=np.nan) because > > np.sum([1], initializer=np.nan) np.nansum([1], initializer=np.nan) > > would both give NaN if the logic is the same as the current `np.sum`. > > And yes, I guess for fmin/fmax NaN happens to work. And then there > > are > > many nonsense reduces which could make sense with `initializer`. Now > > nansum is not implemented in a way that could make use of the new > > kwarg anyway, so maybe it does not matter in some sense. We can in > > principle use `default` in nansum and at some point possibly add > > `default` to the normal ufuncs. 
If we argue like that, the only > > annoying thing is the `object` dtype which confuses the two use cases > > currently. This confusion IMO is not harmless, because I might want > > to > > use it (e.g. sum with initializer=5), and I would expect things like > > dropping in `decimal.Decimal` to work most of the time, while here it > > would give silently bad results. In other words: I am very very much > > in favor if you get rid that object dtype special case. I frankly not > > see why not (except that it needs a bit more code change). If given > > explicitly, we might as well force the use and not do the funny stuff > > which is designed to be more type agnostic! If it happens to fail due > > to not being type agnostic, it will at least fail loudly. If you > > leave > > that object special case I am *very* hesitant about it. That I think > > I > > would like a `default` argument as well, is another issue and it can > > wait to another day. - Sebastian - Sebastian On 26/03/2018 at 17:45, > > Sebastian wrote: On Mon, 2018-03-26 at 11:39 -0400, Hameer Abbasi > > wrote: That is the idea, but NaN functions are in a separate branch > > for another PR to be discussed later. You can see it on my fork, if > > you're interested. Except that as far as I understand I am not sure > > it > > will help much with it, since it is not a default, but an > > initializer. > > Initializing to NaN would just make all results NaN. - Sebastian On > > 26/03/2018 at 17:35, Benjamin wrote: Hmm, this is neat. I imagine it > > would finally give some people a choice on what np.nansum([np.nan]) > > should return? It caused a huge hullabeloo a few years ago when we > > changed it from returning NaN to returning zero. Ben Root On Mon, Mar > > 26, 2018 at 11:16 AM, Sebastian Berg > > wrote: OK, the new documentation is actually clear: initializer : > > scalar, optional The value with which to start the reduction. > > Defaults > > to the `~numpy.ufunc.identity` of the ufunc. If ``None`` is given, > > the > > first element of the reduction is used, and an error is thrown if the > > reduction is empty. If ``a.dtype`` is ``object``, then the > > initializer > > is _only_ used if reduction is empty. I would actually like to say > > that I do not like the object special case much (and it is probably > > the reason why I was confused), nor am I quite sure this is what > > helps > > a lot? Logically, I would argue there are two things: 1. > > initializer/start (always used) 2. default (oly used for empty > > reductions) For example, I might like to give `np.nan` as the default > > for some empty reductions, this will not work. I understand that this > > is a minimal invasive PR and I am not sure I find the solution bad > > enough to really dislike it, but what do other think? My first > > expectation was the default behaviour (in all cases, not just object > > case) for some reason. To be honest, for now I just wonder a bit: How > > hard would it be to do both, or is that too annoying? It would at > > least get rid of that annoying thing with object ufuncs (which > > currently have a default, but not really an identity/initializer). > > Best, Sebastian On Mon, 2018-03-26 at 08:20 -0400, Hameer Abbasi > > wrote: > Actually, the behavior right now isn?t that of `default` but > > that of > `initializer` or `start`. > > This was discussed further > > down in the PR but to reiterate: > `np.sum([10], initializer=5)` > > becomes `15`. 
> > Also, `np.min([5], initializer=0)` becomes `0`, so > > it isn?t really > the default value, it?s the initial value among > > which the reduction > is performed. > > This was the reason to call > > it > > initializer in the first place. I like > `initial` and > > `initial_value` > > as well, and `start` also makes sense > but isn?t descriptive enough. > > > > Hameer > Sent from Astro for Mac > > > On Mar 26, 2018 at 12:06, > > > > Sebastian Berg > t> wrote: > > > > > > Initializer or this sounds fine to me. As an other data point which > > > > I > > think has been mentioned before, `sum` uses start and min/max > > > > use > > default. `start` does not work, unless we also change the > > code > > to > > always use the identity if given (currently that is not the > > case), > > in > > which case it might be nice. However, "start" seems > > a bit like > > solving > > a different issue in any case. > > > > > > Anyway, mostly noise. I really like adding this, the only thing > > > > worth > > discussing a bit is the name :). - Sebastian > > > > > > On > > Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: > > > It calls > > it > > `initializer` - See https://docs.python.org/3.5/libra > > > ry/f > > > > > > > unctools.html#functools.reduce > > > > > > Sent from Astro for Mac On > > Mar 26, 2018 at 09:54, Eric Wieser > com> > > > > > > wrote: > > > > > > > > It turns out I mispoke - > > > > functools.reduce calls the argument > > > > `initial` > > > > > > > > > On > > Mon, 26 Mar 2018 at 00:17 Stephan Hoyer > > > > > wrote: > > > > > This looks like a very logical addition to the > > reduce > > interface. > > > > > It has my support! > > > > > > > > > I would > > have > > preferred the more descriptive name > > > > > "initial_value", > > > > > > > > > but consistency with functools.reduce makes a compelling case > > > > > > > for > > > > > "initializer". > > > > > On Sun, Mar 25, 2018 at > > > > 1:15 PM Eric Wieser > > > > ail.com> wrote: > > To reiterate my comments in the issue - I'm in favor of > this. > > > > > > > > > > > > > > It seems seem especially valuable for identity-less > > > > > > > > > > > > > > > > functions > > > > (`min`, `max`, `lcm`), and the argument > > > > name is consistent > > > with > `functools.reduce`. too. > > > > > > > > > > > > > > > > The only argument I can see against merging this would be > > > > > > > > `kwarg`-creep of `reduce`, and I think this has enough use > > > > > > > > > > > > > > > > > > cases to justify that. > > > > > > > > > > > > I'd like to > > > > > merge > > > > in a few days, if no one else has any > > > > > > opinions. > > > > > > > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13 Hameer > > > > Abbasi > > > > > @gma > > > > > > il.com> wrote: > > Hello, everyone. I?ve submitted a PR to add a initializer kwarg to > > ufunc.reduce. This is useful in a few cases, e.g., > > > > > > > it > > allows one to supply a ?default? value for identity- > > > > > > > > > less > > > > > > > ufunc reductions, and specify an initial value for > > > > > > > > > reductions such as sum (other than zero.) > > > > > > > > > > > > > > > > > > > > > Please feel free to review or leave feedback, (although I > > > > > > > > > > > > > think Eric and Marten have picked it apart pretty well). 
> > > > > > > > > > > > > > > https://github.com/numpy/numpy/pull/10635 > > > > > Thanks, > > > > > > > > Hameer > > > > Sent from Astro for Mac > > > > > > > _______________________________________________ > > > NumPy- > > Discussion > > mailing list > > > > > > > NumPy-Discussion at python.org > > > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.o > > rg > > https://mail.python.org/mailman/listinfo/numpy-discussi on > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy- > > Discussion mailing list > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > NumPy- Discussion > > mailing list > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion > > mailing list > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion > > mailing list NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion > > mailing list NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion > > mailing list NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion > > mailing list NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Mar 26 14:09:00 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 26 Mar 2018 20:09:00 +0200 Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions) In-Reply-To: References: <1522079156.4888.12.camel@sipsolutions.net> <1522082927.4888.24.camel@sipsolutions.net> <1522083291.8319.3.camel@sipsolutions.net> <1522084192.8883.6.camel@sipsolutions.net> Message-ID: <1522087740.11797.7.camel@sipsolutions.net> On Mon, 2018-03-26 at 17:40 +0000, Eric Wieser wrote: > The difficulty in supporting object arrays is that func.reduce(arr, > initial=func.identity) and func.reduce(arr) have different meanings - > whereas with the current patch, they are equivalent. 
True, but the current meaning is:

    func.reduce(arr, initial=, default=func.identity)

in the case of object dtype. Luckily for normal dtypes, func.identity is
both the correct default "default" and a no-op for initial. Thus the name
"identity" kinda works there.

I am also not really sure that both kwargs would make real sense (plus
initial probably disallows default...), but I got some feeling that the
"default" meaning may be even more useful to simplify special casing the
empty case.

Anyway, still just pointing out that it gives me some headaches to see
such a special case for objects :(.

- Sebastian

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From ben.v.root at gmail.com Mon Mar 26 14:24:27 2018 From: ben.v.root at gmail.com (Benjamin Root) Date: Mon, 26 Mar 2018 14:24:27 -0400 Subject: [Numpy-discussion] Right way to do fancy indexing from argsort() result? Message-ID: I seem to be losing my mind... I can't seem to get this to work right. I have a (N, k) array `distances` (along with a bunch of other arrays of the same shape). I need to resort the rows, so I do: indexs = np.argsort(distances, axis=1) How do I use this index array correctly to get back distances sorted along rows? Note, telling me to use `np.sort()` isn't going to work because I need to apply the same indexing to a couple of other arrays. new_dists = distances[indexs] gives me a (N, k, k) array, while new_dists = np.take(indexs, axis=1) gives me a (N, N, k) array. What am I missing? Thanks! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Mar 26 14:28:53 2018 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 26 Mar 2018 11:28:53 -0700 Subject: [Numpy-discussion] Right way to do fancy indexing from argsort() result? In-Reply-To: References: Message-ID: On Mon, Mar 26, 2018 at 11:24 AM, Benjamin Root wrote: > > I seem to be losing my mind... I can't seem to get this to work right. > > I have a (N, k) array `distances` (along with a bunch of other arrays of the same shape). I need to resort the rows, so I do: > > indexs = np.argsort(distances, axis=1) > > How do I use this index array correctly to get back distances sorted along rows? Note, telling me to use `np.sort()` isn't going to work because I need to apply the same indexing to a couple of other arrays. > > new_dists = distances[indexs] > > gives me a (N, k, k) array, while > > new_dists = np.take(indexs, axis=1) > > gives me a (N, N, k) array. > > What am I missing? Broadcasting! new_dists = distances[np.arange(N)[:, np.newaxis], indexs] -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Mon Mar 26 14:34:21 2018 From: ben.v.root at gmail.com (Benjamin Root) Date: Mon, 26 Mar 2018 14:34:21 -0400 Subject: [Numpy-discussion] Right way to do fancy indexing from argsort() result? In-Reply-To: References: Message-ID: Ah, yes, I should have thought about that. Kind of seems like something that we could make `np.take()` do, somehow, for something that is easier to read. Thank you! Ben Root On Mon, Mar 26, 2018 at 2:28 PM, Robert Kern wrote: > On Mon, Mar 26, 2018 at 11:24 AM, Benjamin Root > wrote: > > > > I seem to be losing my mind... I can't seem to get this to work right. > > > > I have a (N, k) array `distances` (along with a bunch of other arrays of > the same shape). I need to resort the rows, so I do: > > > > indexs = np.argsort(distances, axis=1) > > > > How do I use this index array correctly to get back distances sorted > along rows? Note, telling me to use `np.sort()` isn't going to work because > I need to apply the same indexing to a couple of other arrays. > > > > new_dists = distances[indexs] > > > > gives me a (N, k, k) array, while > > > > new_dists = np.take(indexs, axis=1) > > > > gives me a (N, N, k) array. > > > > What am I missing? > > Broadcasting! 
> > new_dists = distances[np.arange(N)[:, np.newaxis], indexs] > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Mon Mar 26 14:36:36 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 26 Mar 2018 18:36:36 +0000 Subject: [Numpy-discussion] Right way to do fancy indexing from argsort() result? In-Reply-To: References: Message-ID: https://github.com/numpy/numpy/issues/8708 is a proposal to add such a function, with an implementation in https://github.com/numpy/numpy/pull/8714 Eric On Mon, 26 Mar 2018 at 11:35 Benjamin Root wrote: > Ah, yes, I should have thought about that. Kind of seems like something > that we could make `np.take()` do, somehow, for something that is easier to > read. > > Thank you! > Ben Root > > > On Mon, Mar 26, 2018 at 2:28 PM, Robert Kern > wrote: > >> On Mon, Mar 26, 2018 at 11:24 AM, Benjamin Root >> wrote: >> > >> > I seem to be losing my mind... I can't seem to get this to work right. >> > >> > I have a (N, k) array `distances` (along with a bunch of other arrays >> of the same shape). I need to resort the rows, so I do: >> > >> > indexs = np.argsort(distances, axis=1) >> > >> > How do I use this index array correctly to get back distances sorted >> along rows? Note, telling me to use `np.sort()` isn't going to work because >> I need to apply the same indexing to a couple of other arrays. >> > >> > new_dists = distances[indexs] >> > >> > gives me a (N, k, k) array, while >> > >> > new_dists = np.take(indexs, axis=1) >> > >> > gives me a (N, N, k) array. >> > >> > What am I missing? >> >> Broadcasting! >> >> new_dists = distances[np.arange(N)[:, np.newaxis], indexs] >> >> -- >> Robert Kern >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Mar 26 21:24:49 2018 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 26 Mar 2018 18:24:49 -0700 Subject: [Numpy-discussion] round(numpy.float64(0.0)) is a numpy.float64 In-Reply-To: References: <422941419.2737564.1521718689632.JavaMail.zimbra@laposte.net> Message-ID: Even knowing that, it's still confusing that round(np.float64(0.0)) isn't the same as round(0.0). The reason is a Python 2 / Python 3 thing: in Python 2, round returns a float, while on Python 3, it returns an integer ? but numpy still uses the python 2 behavior everywhere. I'm not sure if it's possible or worthwhile to change this. If we'd changed it when we first added python 3 support then it would have been easy (and obviously a good idea), but at this point it might be tricky? -n On Thu, Mar 22, 2018 at 12:32 PM, Nathan Goldbaum wrote: > numpy.float is an alias to the python float builtin. > > https://github.com/numpy/numpy/issues/3998 > > > On Thu, Mar 22, 2018 at 2:26 PM Olivier wrote: >> >> Hello, >> >> >> Is it normal, expected and desired that : >> >> >> round(numpy.float64(0.0)) is a numpy.float64 >> >> >> while >> >> round(numpy.float(0.0)) is an integer? 
>> >> >> I find it disturbing and misleading. What do you think? Has it already >> been >> discussed somewhere else? >> >> >> Best regards, >> >> >> Olivier >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Mon Mar 26 21:28:39 2018 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 26 Mar 2018 18:28:39 -0700 Subject: [Numpy-discussion] round(numpy.float64(0.0)) is a numpy.float64 In-Reply-To: References: <422941419.2737564.1521718689632.JavaMail.zimbra@laposte.net> Message-ID: On Mon, Mar 26, 2018 at 6:24 PM, Nathaniel Smith wrote: > Even knowing that, it's still confusing that round(np.float64(0.0)) > isn't the same as round(0.0). The reason is a Python 2 / Python 3 > thing: in Python 2, round returns a float, while on Python 3, it > returns an integer ? but numpy still uses the python 2 behavior > everywhere. > > I'm not sure if it's possible or worthwhile to change this. If we'd > changed it when we first added python 3 support then it would have > been easy (and obviously a good idea), but at this point it might be > tricky? Oh right, and I forgot: part of the reason it's tricky is that it really would have to return a Python 'int', *not* any of numpy's integer types, because floats have a much larger range than numpy integers, e.g.: In [4]: round(1e50) Out[4]: 100000000000000007629769841091887003294964970946560 In [5]: round(np.float64(1e50)) Out[5]: 1e+50 In [6]: np.uint64(round(np.float64(1e50))) Out[6]: 0 (Actually that last case illustrates another weird inconsistency: np.uint64(1e50) -> OverflowError, but np.uint64(np.float64(1e50)) -> 0. I have no idea what's going on there.) -n -- Nathaniel J. Smith -- https://vorpus.org From robert.kern at gmail.com Mon Mar 26 22:29:10 2018 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 26 Mar 2018 19:29:10 -0700 Subject: [Numpy-discussion] round(numpy.float64(0.0)) is a numpy.float64 In-Reply-To: References: <422941419.2737564.1521718689632.JavaMail.zimbra@laposte.net> Message-ID: On Mon, Mar 26, 2018 at 6:28 PM, Nathaniel Smith wrote: > > On Mon, Mar 26, 2018 at 6:24 PM, Nathaniel Smith wrote: > > Even knowing that, it's still confusing that round(np.float64(0.0)) > > isn't the same as round(0.0). The reason is a Python 2 / Python 3 > > thing: in Python 2, round returns a float, while on Python 3, it > > returns an integer ? but numpy still uses the python 2 behavior > > everywhere. > > > > I'm not sure if it's possible or worthwhile to change this. If we'd > > changed it when we first added python 3 support then it would have > > been easy (and obviously a good idea), but at this point it might be > > tricky? > > Oh right, and I forgot: part of the reason it's tricky is that it > really would have to return a Python 'int', *not* any of numpy's > integer types, because floats have a much larger range than numpy > integers, e.g.: I don't think that's the tricky part. We don't have to change anything but our implementation of Python 3's __round__() special method for np.generic scalar types, which would be straightforward. 
The only issue, besides backwards compatibility, is that it would introduce
a new inconsistency between scalars and arrays (which can't use the Python
ints). However, that's "paid for" by the increased compatibility with the
rest of Python. For a special method that is used to interoperate with a
Python builtin function, that's probably the more important consistency to
worry about.

As for the backwards compatibility concern, I don't think it would matter
much. Everyone who has written code that expects round(np.float64(...)) to
return a np.float64 is probably already wrapping that with int() anyways.
Anyone who really wants to keep the scalar type of the output same as the
input can use np.around().

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From josef.pktd at gmail.com  Tue Mar 27 01:03:43 2018
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 27 Mar 2018 01:03:43 -0400
Subject: [Numpy-discussion] round(numpy.float64(0.0)) is a numpy.float64
In-Reply-To: 
References: <422941419.2737564.1521718689632.JavaMail.zimbra@laposte.net>
Message-ID: 

On Mon, Mar 26, 2018 at 10:29 PM, Robert Kern wrote:
> [...]
> As for the backwards compatibility concern, I don't think it would matter
> much. Everyone who has written code that expects round(np.float64(...)) to
> return a np.float64 is probably already wrapping that with int() anyways.
> Anyone who really wants to keep the scalar type of the output same as the
> input can use np.around().

same would need to apply for ceil, floor, trunc, I guess.

However, np.round has a decimal argument that I use pretty often and
that needs to return a float

>>> np.round(5.33333, 2)
5.3300000000000001

Python makes the return type conditional on whether ndigits is used or not
AFAICS.
>>> round(5.33333, 0)
5.0
>>> round(5.33333)
5

(I'm currently using Python 3.4.4)

Josef

From robert.kern at gmail.com  Tue Mar 27 01:52:35 2018
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 26 Mar 2018 22:52:35 -0700
Subject: [Numpy-discussion] round(numpy.float64(0.0)) is a numpy.float64
In-Reply-To: 
References: <422941419.2737564.1521718689632.JavaMail.zimbra@laposte.net>
Message-ID: 

On Mon, Mar 26, 2018 at 10:03 PM, wrote:

> same would need to apply for ceil, floor, trunc, I guess.

ceil and floor don't have __special__ methods for them; math.ceil() and
math.floor() do not defer their implementation to the type. math.trunc()
might (there is a __trunc__), but it looks like math.trunc(np.float64(...))
already returns an int.

I'm not suggesting changing np.ceil(), np.floor(), etc. Nor am I suggesting
that we change np.around(), np.round(), or the .round() method on scalar
types. Only .__round__().

> However, np.round has a decimal argument that I use pretty often and
> that needs to return a float
>
> >>> np.round(5.33333, 2)
> 5.3300000000000001
>
> Python makes the return type conditional on whether ndigits is used or not
> AFAICS.
> >>> round(5.33333, 0)
> 5.0
> >>> round(5.33333)
> 5

Sorry, I took that as a given. If someone followed my suggestion to
implement np.generic.__round__, yes, I intended that they handle both cases
correctly.
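For concreteness, a rough sketch of the semantics I have in mind
(illustrative only -- this is not the actual implementation, and the real
change would have to live in NumPy's C-level scalar types rather than in a
Python helper like this):

import numpy as np

def scalar_round(x, ndigits=None):
    # Hypothetical behaviour of np.generic.__round__:
    if ndigits is None:
        # Like builtin round() on Python 3: return a Python int, whose
        # unlimited range copes with values such as 1e50.
        return round(float(x))
    # With ndigits given, keep the scalar type, as np.around() does.
    return type(x)(np.around(x, decimals=ndigits))

print(scalar_round(np.float64(0.0)))         # 0 (a Python int)
print(scalar_round(np.float64(5.33333), 2))  # 5.33 (still a np.float64)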
But also, to reiterate, I'm not suggesting that we change np.round(). Only the behavior of numpy scalar types under the builtin round() function. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From Catherine.M.Moroney at jpl.nasa.gov Wed Mar 28 20:56:12 2018 From: Catherine.M.Moroney at jpl.nasa.gov (Moroney, Catherine M (398E)) Date: Thu, 29 Mar 2018 00:56:12 +0000 Subject: [Numpy-discussion] best way of speeding up a filtering-like algorithm Message-ID: <43BD456C-B5E0-4203-B069-13A49A53E5F6@jpl.nasa.gov> Hello, I have the following sample code (pretty simple algorithm that uses a rolling filter window) and am wondering what the best way is of speeding it up. I tried rewriting it in Cython by pre-declaring the variables but that didn?t buy me a lot of time. Then I rewrote it in Fortran (and compiled it with f2py) and now it?s lightning fast. But I would still like to know if I could rewrite it in pure python/numpy/scipy or in Cython and get a similar speedup. Here is the raw Python code: def mixed_coastline_slow(nsidc, radius, count, mask=None): nsidc_copy = numpy.copy(nsidc) if (mask is None): idx_coastline = numpy.where(nsidc_copy == NSIDC_COASTLINE_MIXED) else: idx_coastline = numpy.where(mask & (nsidc_copy == NSIDC_COASTLINE_MIXED)) for (irow0, icol0) in zip(idx_coastline[0], idx_coastline[1]): rows = ( max(irow0-radius, 0), min(irow0+radius+1, nsidc_copy.shape[0]) ) cols = ( max(icol0-radius, 0), min(icol0+radius+1, nsidc_copy.shape[1]) ) window = nsidc[rows[0]:rows[1], cols[0]:cols[1]] npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum() nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \ True, False).sum() if (100.0*nsnowice/npoints >= count): nsidc_copy[irow0, icol0] = MISR_SEAICE_THRESHOLD return nsidc_copy and here is my attempt at Cython-izing it: import numpy cimport numpy as cnumpy cimport cython cdef int NSIDC_SIZE = 721 cdef int NSIDC_NO_SNOW = 0 cdef int NSIDC_ALL_SNOW = 100 cdef int NSIDC_FRESHSNOW = 103 cdef int NSIDC_PERMSNOW = 101 cdef int NSIDC_SEAICE_LOW = 1 cdef int NSIDC_SEAICE_HIGH = 100 cdef int NSIDC_COASTLINE_MIXED = 252 cdef int NSIDC_SUSPECT_ICE = 253 cdef int MISR_SEAICE_THRESHOLD = 6 def mixed_coastline(cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc, int radius, int count): cdef int irow, icol, irow1, irow2, icol1, icol2, npoints, nsnowice cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc2 \ = numpy.empty(shape=(NSIDC_SIZE, NSIDC_SIZE), dtype=numpy.uint8) cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] window \ = numpy.empty(shape=(2*radius+1, 2*radius+1), dtype=numpy.uint8) nsidc2 = numpy.copy(nsidc) idx_coastline = numpy.where(nsidc2 == NSIDC_COASTLINE_MIXED) for (irow, icol) in zip(idx_coastline[0], idx_coastline[1]): irow1 = max(irow-radius, 0) irow2 = min(irow+radius+1, NSIDC_SIZE) icol1 = max(icol-radius, 0) icol2 = min(icol+radius+1, NSIDC_SIZE) window = nsidc[irow1:irow2, icol1:icol2] npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum() nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \ True, False).sum() if (100.0*nsnowice/npoints >= count): nsidc2[irow, icol] = MISR_SEAICE_THRESHOLD return nsidc2 Thanks in advance for any advice! Catherine -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wieser.eric+numpy at gmail.com Wed Mar 28 21:43:33 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 29 Mar 2018 01:43:33 +0000 Subject: [Numpy-discussion] best way of speeding up a filtering-like algorithm In-Reply-To: <43BD456C-B5E0-4203-B069-13A49A53E5F6@jpl.nasa.gov> References: <43BD456C-B5E0-4203-B069-13A49A53E5F6@jpl.nasa.gov> Message-ID: Well, one tip to start with: numpy.where(some_comparison, True, False) is the same as but slower than some_comparison Eric On Wed, 28 Mar 2018 at 18:36 Moroney, Catherine M (398E) < Catherine.M.Moroney at jpl.nasa.gov> wrote: > Hello, > > > > I have the following sample code (pretty simple algorithm that uses a > rolling filter window) and am wondering what the best way is of speeding it > up. I tried rewriting it in Cython by pre-declaring the variables but that > didn?t buy me a lot of time. Then I rewrote it in Fortran (and compiled it > with f2py) and now it?s lightning fast. But I would still like to know if > I could rewrite it in pure python/numpy/scipy or in Cython and get a > similar speedup. > > > > Here is the raw Python code: > > > > def mixed_coastline_slow(nsidc, radius, count, mask=None): > > > > nsidc_copy = numpy.copy(nsidc) > > > > if (mask is None): > > idx_coastline = numpy.where(nsidc_copy == NSIDC_COASTLINE_MIXED) > > else: > > idx_coastline = numpy.where(mask & (nsidc_copy == > NSIDC_COASTLINE_MIXED)) > > > > for (irow0, icol0) in zip(idx_coastline[0], idx_coastline[1]): > > > > rows = ( max(irow0-radius, 0), min(irow0+radius+1, > nsidc_copy.shape[0]) ) > > cols = ( max(icol0-radius, 0), min(icol0+radius+1, > nsidc_copy.shape[1]) ) > > window = nsidc[rows[0]:rows[1], cols[0]:cols[1]] > > > > npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, > False).sum() > > nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= > NSIDC_FRESHSNOW), \ > > True, False).sum() > > > > if (100.0*nsnowice/npoints >= count): > > nsidc_copy[irow0, icol0] = MISR_SEAICE_THRESHOLD > > > > return nsidc_copy > > > > and here is my attempt at Cython-izing it: > > > > import numpy > > cimport numpy as cnumpy > > cimport cython > > > > cdef int NSIDC_SIZE = 721 > > cdef int NSIDC_NO_SNOW = 0 > > cdef int NSIDC_ALL_SNOW = 100 > > cdef int NSIDC_FRESHSNOW = 103 > > cdef int NSIDC_PERMSNOW = 101 > > cdef int NSIDC_SEAICE_LOW = 1 > > cdef int NSIDC_SEAICE_HIGH = 100 > > cdef int NSIDC_COASTLINE_MIXED = 252 > > cdef int NSIDC_SUSPECT_ICE = 253 > > > > cdef int MISR_SEAICE_THRESHOLD = 6 > > > > def mixed_coastline(cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc, int > radius, int count): > > > > cdef int irow, icol, irow1, irow2, icol1, icol2, npoints, nsnowice > > cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc2 \ > > = numpy.empty(shape=(NSIDC_SIZE, NSIDC_SIZE), dtype=numpy.uint8) > > cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] window \ > > = numpy.empty(shape=(2*radius+1, 2*radius+1), dtype=numpy.uint8) > > > > nsidc2 = numpy.copy(nsidc) > > > > idx_coastline = numpy.where(nsidc2 == NSIDC_COASTLINE_MIXED) > > > > for (irow, icol) in zip(idx_coastline[0], idx_coastline[1]): > > > > irow1 = max(irow-radius, 0) > > irow2 = min(irow+radius+1, NSIDC_SIZE) > > icol1 = max(icol-radius, 0) > > icol2 = min(icol+radius+1, NSIDC_SIZE) > > window = nsidc[irow1:irow2, icol1:icol2] > > > > npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, > False).sum() > > nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window > <= NSIDC_FRESHSNOW), \ > > True, False).sum() > > > > if (100.0*nsnowice/npoints >= count): > > 
nsidc2[irow, icol] = MISR_SEAICE_THRESHOLD > > > > return nsidc2 > > > > Thanks in advance for any advice! > > > > Catherine > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfoxrabinovitz at gmail.com Thu Mar 29 00:10:08 2018 From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz) Date: Thu, 29 Mar 2018 00:10:08 -0400 Subject: [Numpy-discussion] best way of speeding up a filtering-like algorithm In-Reply-To: References: <43BD456C-B5E0-4203-B069-13A49A53E5F6@jpl.nasa.gov> Message-ID: It looks like you are creating a coastline mask (or a coastline mask + some other mask), and computing the ratio of two quantities in a particular window around each point. If your coastline covers a sufficiently large portion of the image, you may get quite a bit of mileage using an efficient convolution instead of summing the windows directly. For example, you could use scipy.signal.convolve2d with inputs being (nsidc_copy != NSIDC_COASTLINE_MIXED), (nsidc_copy == NSIDC_SEAICE_LOW & nsdic_copy == NSIDC_FRESHSNOW) for the frst array, and a (2*radius x 2*radius) array of ones for the second. You may have to center the block of ones in an array of zeros the same size as nsdic_copy, but I am not sure about that. Another option you may want to try is implementing your window movement more efficiently. If you step your window center along using an algorithm like flood-fill, you can insure that there will be very large overlap between successive steps (even if there is a break in the coastline). That means that you can reuse most of the data you've extracted. You will only need to subtract off the non-overlapping portion of the previous window and add in the non-overlapping portion of the updated window. If radius is 16, giving you a 32x32 window, you go from summing ~1000 pixels per quantity of interest, to summing only ~120 if the window moves along a diagonal, and only 64 if it moves vertically or horizontally. While an algorithm like this will probably give you the greatest boost, it is a pain to implement. If I had to guess, this looks like L2 processing for a multi-spectral instrument. If you don't mind me asking, what mission is this for? I'm working on space-looking detectors at the moment, but have spent many years on the L0, L1b and L1 portions of the GOES-R ground system. - Joe On Wed, Mar 28, 2018 at 9:43 PM, Eric Wieser wrote: > Well, one tip to start with: > > numpy.where(some_comparison, True, False) > > is the same as but slower than > > some_comparison > > Eric > > On Wed, 28 Mar 2018 at 18:36 Moroney, Catherine M (398E) > wrote: >> >> Hello, >> >> >> >> I have the following sample code (pretty simple algorithm that uses a >> rolling filter window) and am wondering what the best way is of speeding it >> up. I tried rewriting it in Cython by pre-declaring the variables but that >> didn?t buy me a lot of time. Then I rewrote it in Fortran (and compiled it >> with f2py) and now it?s lightning fast. But I would still like to know if I >> could rewrite it in pure python/numpy/scipy or in Cython and get a similar >> speedup. 
>> >> >> >> Here is the raw Python code: >> >> >> >> def mixed_coastline_slow(nsidc, radius, count, mask=None): >> >> >> >> nsidc_copy = numpy.copy(nsidc) >> >> >> >> if (mask is None): >> >> idx_coastline = numpy.where(nsidc_copy == NSIDC_COASTLINE_MIXED) >> >> else: >> >> idx_coastline = numpy.where(mask & (nsidc_copy == >> NSIDC_COASTLINE_MIXED)) >> >> >> >> for (irow0, icol0) in zip(idx_coastline[0], idx_coastline[1]): >> >> >> >> rows = ( max(irow0-radius, 0), min(irow0+radius+1, >> nsidc_copy.shape[0]) ) >> >> cols = ( max(icol0-radius, 0), min(icol0+radius+1, >> nsidc_copy.shape[1]) ) >> >> window = nsidc[rows[0]:rows[1], cols[0]:cols[1]] >> >> >> >> npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, >> False).sum() >> >> nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= >> NSIDC_FRESHSNOW), \ >> >> True, False).sum() >> >> >> >> if (100.0*nsnowice/npoints >= count): >> >> nsidc_copy[irow0, icol0] = MISR_SEAICE_THRESHOLD >> >> >> >> return nsidc_copy >> >> >> >> and here is my attempt at Cython-izing it: >> >> >> >> import numpy >> >> cimport numpy as cnumpy >> >> cimport cython >> >> >> >> cdef int NSIDC_SIZE = 721 >> >> cdef int NSIDC_NO_SNOW = 0 >> >> cdef int NSIDC_ALL_SNOW = 100 >> >> cdef int NSIDC_FRESHSNOW = 103 >> >> cdef int NSIDC_PERMSNOW = 101 >> >> cdef int NSIDC_SEAICE_LOW = 1 >> >> cdef int NSIDC_SEAICE_HIGH = 100 >> >> cdef int NSIDC_COASTLINE_MIXED = 252 >> >> cdef int NSIDC_SUSPECT_ICE = 253 >> >> >> >> cdef int MISR_SEAICE_THRESHOLD = 6 >> >> >> >> def mixed_coastline(cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc, int >> radius, int count): >> >> >> >> cdef int irow, icol, irow1, irow2, icol1, icol2, npoints, nsnowice >> >> cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc2 \ >> >> = numpy.empty(shape=(NSIDC_SIZE, NSIDC_SIZE), dtype=numpy.uint8) >> >> cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] window \ >> >> = numpy.empty(shape=(2*radius+1, 2*radius+1), dtype=numpy.uint8) >> >> >> >> nsidc2 = numpy.copy(nsidc) >> >> >> >> idx_coastline = numpy.where(nsidc2 == NSIDC_COASTLINE_MIXED) >> >> >> >> for (irow, icol) in zip(idx_coastline[0], idx_coastline[1]): >> >> >> >> irow1 = max(irow-radius, 0) >> >> irow2 = min(irow+radius+1, NSIDC_SIZE) >> >> icol1 = max(icol-radius, 0) >> >> icol2 = min(icol+radius+1, NSIDC_SIZE) >> >> window = nsidc[irow1:irow2, icol1:icol2] >> >> >> >> npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, >> False).sum() >> >> nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window >> <= NSIDC_FRESHSNOW), \ >> >> True, False).sum() >> >> >> >> if (100.0*nsnowice/npoints >= count): >> >> nsidc2[irow, icol] = MISR_SEAICE_THRESHOLD >> >> >> >> return nsidc2 >> >> >> >> Thanks in advance for any advice! >> >> >> >> Catherine >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From jfoxrabinovitz at gmail.com Thu Mar 29 02:31:22 2018 From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz) Date: Thu, 29 Mar 2018 02:31:22 -0400 Subject: [Numpy-discussion] PR adding support for object arrays to np.isinf, np.isnan, np.isfinite Message-ID: I have opened PR #10820 to add support for `dtype=object` to `np.isinf`, `np.isnan`, `np.isfinite`. 
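For illustration, a small example of the intended behaviour (expected
output under the PR as described below -- on current releases these ufuncs
raise TypeError for object arrays like this, and I have not pasted actual
output from the branch):

import numpy as np
from decimal import Decimal
from fractions import Fraction

# An object array mixing plain floats with "simulated numerical" types
# that implement __float__:
a = np.array([1.0, float("nan"), Decimal("2.5"), Fraction(1, 3)],
             dtype=object)

print(np.isnan(a))     # expected: [False  True False False]
print(np.isfinite(a))  # expected: [ True False  True  True]
print(np.isinf(a))     # expected: [False False False False]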
The PR is a fairly minor change, but I would like to make sure that I understand at least the basics of ufuncs before I start adding support for datetimes and timedeltas to `np.isfinite` and eventually to `np.histogram`. I have left a few comments in areas I am not sure about, and would greatly appreciate feedback, even if the PR is not found suitable for merging. With this PR, object arrays containing any numerical or simulated numerical types (implementing `__float__` or `__complex__` methods) are processed as would be expected. While working on PR, I came up with two questions for the gurus: 1. Am I correct in understanding that `isinf`, `isnan` and `isfinite` currently cast integer inputs to float to process them? Why are integer inputs not optimized to return arrays of all False, False, True, respectively for those functions? 2. Why are `isneginf` and `isposinf` not ufuncs? Is there any reason not to make them ufuncs (besides the renaming of the `y` parameter to `out`, which technically breaks some backward compatibility)? Regards, - Joe From stuart at stuartreynolds.net Thu Mar 29 11:14:16 2018 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Thu, 29 Mar 2018 15:14:16 +0000 Subject: [Numpy-discussion] best way of speeding up a filtering-like algorithm In-Reply-To: References: <43BD456C-B5E0-4203-B069-13A49A53E5F6@jpl.nasa.gov> Message-ID: Install snakeviz to visualize what?s taking all the time. You might want to check out numba.jit(nopython) for optimizing specific sections. On Wed, Mar 28, 2018 at 9:10 PM Joseph Fox-Rabinovitz < jfoxrabinovitz at gmail.com> wrote: > It looks like you are creating a coastline mask (or a coastline mask + > some other mask), and computing the ratio of two quantities in a > particular window around each point. If your coastline covers a > sufficiently large portion of the image, you may get quite a bit of > mileage using an efficient convolution instead of summing the windows > directly. For example, you could use scipy.signal.convolve2d with > inputs being (nsidc_copy != NSIDC_COASTLINE_MIXED), (nsidc_copy == > NSIDC_SEAICE_LOW & nsdic_copy == NSIDC_FRESHSNOW) for the frst array, > and a (2*radius x 2*radius) array of ones for the second. You may have > to center the block of ones in an array of zeros the same size as > nsdic_copy, but I am not sure about that. > > Another option you may want to try is implementing your window > movement more efficiently. If you step your window center along using > an algorithm like flood-fill, you can insure that there will be very > large overlap between successive steps (even if there is a break in > the coastline). That means that you can reuse most of the data you've > extracted. You will only need to subtract off the non-overlapping > portion of the previous window and add in the non-overlapping portion > of the updated window. If radius is 16, giving you a 32x32 window, you > go from summing ~1000 pixels per quantity of interest, to summing only > ~120 if the window moves along a diagonal, and only 64 if it moves > vertically or horizontally. While an algorithm like this will probably > give you the greatest boost, it is a pain to implement. > > If I had to guess, this looks like L2 processing for a multi-spectral > instrument. If you don't mind me asking, what mission is this for? I'm > working on space-looking detectors at the moment, but have spent many > years on the L0, L1b and L1 portions of the GOES-R ground system. 
> - Joe

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov  Thu Mar 29 13:23:36 2018
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 29 Mar 2018 10:23:36 -0700
Subject: [Numpy-discussion] best way of speeding up a filtering-like algorithm
In-Reply-To: <43BD456C-B5E0-4203-B069-13A49A53E5F6@jpl.nasa.gov>
References: <43BD456C-B5E0-4203-B069-13A49A53E5F6@jpl.nasa.gov>
Message-ID: 

sorry, not enough time to look closely, but a couple general comments:

On Wed, Mar 28, 2018 at 5:56 PM, Moroney, Catherine M (398E) <
Catherine.M.Moroney at jpl.nasa.gov> wrote:

> I have the following sample code (pretty simple algorithm that uses a
> rolling filter window) and am wondering what the best way is of speeding
> it up.  I tried rewriting it in Cython by pre-declaring the variables but
> that didn't buy me a lot of time.  Then I rewrote it in Fortran (and
> compiled it with f2py) and now it's lightning fast.

if done right, Cython should be almost as fast as Fortran, and just as fast
if you use the "restrict" correctly (which I hope can be done in Cython):

https://en.wikipedia.org/wiki/Pointer_aliasing

> But I would still like to know if I could rewrite it in pure
> python/numpy/scipy

you can use stride_tricks to make arrays "appear" to be N+1 D, to implement
windows without actually duplicating the data, and then use array
operations on them. This can buy a lot of speed, but will not be as fast
(by a factor of 10 or so) as Cython or Fortran

see:

https://github.com/PythonCHB/IRIS_Python_Class/blob/master/Numpy/code/filter_example.py

for an example in 1D

> or in Cython and get a similar speedup.

see above -- a direct port of your Fortran code to Cython should get you
within a factor of two or so of the Fortran, and then using "restrict" to
let the compiler know your pointers aren't aliased should get you the rest
of the way.

Here is an example of an Automatic Gain Control filter in 1D, implemented
in numpy with stride_tricks, and C and Cython and Fortran.

https://github.com/PythonCHB/IRIS_Python_Class/tree/master/Interfacing_C/agc_example

Note that in that example, I never got C or Cython as fast as Fortran --
but I think using "restrict" in the C would do it.
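(to make the stride_tricks idea above concrete, here is a rough 2-D
sketch -- untested, and the constants, array size and radius below are
just stand-ins based on the original post:)

import numpy as np
from numpy.lib.stride_tricks import as_strided

NSIDC_SEAICE_LOW, NSIDC_FRESHSNOW = 1, 103   # values from the original post
radius = 4
rng = np.random.RandomState(0)
nsidc = rng.randint(0, 254, size=(200, 200)).astype(np.uint8)  # stand-in data

def windowed_view(a, size):
    # Non-copying view of all overlapping (size x size) windows of a 2-D
    # array: only the shape and strides of the view change.
    rows, cols = a.shape
    r, c = a.strides
    return as_strided(a, shape=(rows - size + 1, cols - size + 1, size, size),
                      strides=(r, c, r, c))

# e.g. count the snow/ice pixels in every window in one vectorized step:
snowice = (nsidc >= NSIDC_SEAICE_LOW) & (nsidc <= NSIDC_FRESHSNOW)
nsnowice = windowed_view(snowice, 2 * radius + 1).sum(axis=(2, 3))
print(nsnowice.shape)  # (192, 192): one count per fully interior window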
HTH,

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov  Thu Mar 29 13:26:02 2018
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 29 Mar 2018 10:26:02 -0700
Subject: [Numpy-discussion] best way of speeding up a filtering-like algorithm
In-Reply-To: <43BD456C-B5E0-4203-B069-13A49A53E5F6@jpl.nasa.gov>
References: <43BD456C-B5E0-4203-B069-13A49A53E5F6@jpl.nasa.gov>
Message-ID: 

one other note:

As a rule, using numpy array operations from Cython doesn't buy you much,
as you discovered.
You need to use numpy arrays as n-d containers, and write the loops yourself. You may want to check out numba as another alternative -- it DOES optimize numpy operations.
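Something along these lines is the kind of loop numba (or Cython with typed variables) can make fast. This is an untested sketch, with the constant values taken from your posted code, so double-check them:

from numba import njit

@njit
def mixed_coastline_numba(nsidc, radius, count):
    # Plain nested loops over the ndarray "container"; numba compiles them.
    nsidc2 = nsidc.copy()
    nrows, ncols = nsidc.shape
    for irow in range(nrows):
        for icol in range(ncols):
            if nsidc[irow, icol] != 252:          # NSIDC_COASTLINE_MIXED
                continue
            r1 = max(irow - radius, 0)
            r2 = min(irow + radius + 1, nrows)
            c1 = max(icol - radius, 0)
            c2 = min(icol + radius + 1, ncols)
            npoints = 0
            nsnowice = 0
            for r in range(r1, r2):
                for c in range(c1, c2):
                    v = nsidc[r, c]
                    if v != 252:                  # not mixed coastline
                        npoints += 1
                    if v >= 1 and v <= 103:       # NSIDC_SEAICE_LOW .. NSIDC_FRESHSNOW
                        nsnowice += 1
            # Guard added so an all-coastline window doesn't divide by zero.
            if npoints > 0 and 100.0 * nsnowice / npoints >= count:
                nsidc2[irow, icol] = 6            # MISR_SEAICE_THRESHOLD
    return nsidc2

The first call pays the compilation cost; after that the loops run at roughly C speed.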
-CHB

On Wed, Mar 28, 2018 at 5:56 PM, Moroney, Catherine M (398E) <Catherine.M.Moroney at jpl.nasa.gov> wrote:

> [...]

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From jaime.frio at gmail.com  Thu Mar 29 17:34:36 2018
From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=)
Date: Thu, 29 Mar 2018 21:34:36 +0000
Subject: [Numpy-discussion] best way of speeding up a filtering-like algorithm
In-Reply-To: <43BD456C-B5E0-4203-B069-13A49A53E5F6@jpl.nasa.gov>
References: <43BD456C-B5E0-4203-B069-13A49A53E5F6@jpl.nasa.gov>
Message-ID:

Hi Catherine,

One problem with sliding window algorithms is that the straightforward approach can be very inefficient. Ideally you would not recompute your windowed quantity from all points in the window, but instead reuse the result from an overlapping window, only taking into account the points that changed as the window slid. In your case this can be done efficiently using a summed area table.

Consider these two auxiliary functions:

def summed_area_table(array):
    rows, cols = array.shape
    out = np.zeros((rows + 1, cols + 1), np.intp)
    np.cumsum(array, axis=0, out=out[1:, 1:])
    np.cumsum(out[1:, 1:], axis=1, out=out[1:, 1:])
    return out

def windowed_sum_from_summed_area_table(array, size):
    sat = summed_area_table(array)
    return (sat[:-size, :-size] + sat[size:, size:] -
            sat[:-size, size:] - sat[size:, :-size])

Using these, you can compute npoints and nsnowice for all points in your input nsidc array as:

mask_coastline = nsidc == NSIDC_COASTLINE_MIXED
mask_not_coastline = ~mask_coastline
mask_snowice = (nsidc >= NSIDC_SEAICE_LOW) & (nsidc <= NSIDC_FRESHSNOW)
nsnowice = windowed_sum_from_summed_area_table(mask_snowice, 2*radius + 1)
npoints = windowed_sum_from_summed_area_table(mask_not_coastline, 2*radius + 1)

From here it should be more or less straightforward to reproduce the rest of your calculations.

As written, this code only handles points a distance of at least radius from an array edge. If the edges are important to you, they can also be extracted from the summed area table, but the expressions get ugly: it may be cleaner, even if slower, to pad the masks with zeros before summing them up. Also, if the fraction of points that are in mask_coastline is very small, you may be doing way too many unnecessary calculations.
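A quick way to convince yourself the windowed sums are right is to compare them against a direct computation on a small made-up mask (illustrative sizes only):

import numpy as np

rng = np.random.RandomState(0)
mask = rng.rand(8, 8) > 0.5
size = 3
fast = windowed_sum_from_summed_area_table(mask, size)
slow = np.array([[mask[i:i + size, j:j + size].sum()
                  for j in range(mask.shape[1] - size + 1)]
                 for i in range(mask.shape[0] - size + 1)])
assert (fast == slow).all()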
Good luck!

Jaime

On Thu, Mar 29, 2018 at 3:36 AM Moroney, Catherine M (398E) <Catherine.M.Moroney at jpl.nasa.gov> wrote:

> [...]

--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.

From fernanvieira at gmail.com  Fri Mar 30 09:45:34 2018
From: fernanvieira at gmail.com (Fernando Fernandes Vieira)
Date: Fri, 30 Mar 2018 10:45:34 -0300
Subject: [Numpy-discussion] NEUROLAB - SPYDER
Message-ID:

Hello everyone,

How do I install neurolab in Spyder? Can someone help me?

Att..
_______________________________________________________
FERNANDO FERNANDES VIEIRA
Departamento de Engenharia Sanitária e Ambiental - DESA
Centro de Ciências e Tecnologia - CCT
Universidade Estadual da Paraíba - UEPB
Tel: (83) 3315-3333 (DESA) - (83) 98852-1461 (Pessoal)
e-mail: fernando at uepb.edu.br (fernanvieira at gmail.com)
Campina Grande - PB - Brasil
_______________________________________________________

From solarjoe at posteo.org  Fri Mar 30 11:56:00 2018
From: solarjoe at posteo.org (Joe)
Date: Fri, 30 Mar 2018 17:56:00 +0200
Subject: [Numpy-discussion] NEUROLAB - SPYDER
In-Reply-To:
References:
Message-ID: <068ca20e-baf4-4a2f-e013-fade965f0415@posteo.org>

Hi,

Download here:
https://pypi.python.org/pypi/neurolab

Though, I can't recommend using it. I used it a while ago and it is a pretty basic project that seems to be no longer maintained.

I use Keras / Theano now instead, which is a mature and widely used package.
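To get it into Spyder specifically, it should be enough to install the package into whatever Python environment Spyder is running in -- for example by running "pip install neurolab" from that environment's terminal or the Anaconda Prompt -- and then restart Spyder. (That assumes Spyder and pip are using the same Python environment.)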
Kind regards,
Joe

On 30.03.2018 at 15:45, Fernando Fernandes Vieira wrote:

> [...]

From fernanvieira at gmail.com  Fri Mar 30 15:42:14 2018
From: fernanvieira at gmail.com (Fernando Fernandes Vieira)
Date: Fri, 30 Mar 2018 16:42:14 -0300
Subject: [Numpy-discussion] NEUROLAB - SPYDER
In-Reply-To: <068ca20e-baf4-4a2f-e013-fade965f0415@posteo.org>
References: <068ca20e-baf4-4a2f-e013-fade965f0415@posteo.org>
Message-ID:

Hi Joe,

Thanks for your help.

Att..

_______________________________________________________
FERNANDO FERNANDES VIEIRA
Departamento de Engenharia Sanitária e Ambiental - DESA
Centro de Ciências e Tecnologia - CCT
Universidade Estadual da Paraíba - UEPB
Tel: (83) 3315-3333 (DESA) - (83) 98852-1461 (Pessoal)
e-mail: fernando at uepb.edu.br (fernanvieira at gmail.com)
Campina Grande - PB - Brasil
_______________________________________________________

2018-03-30 12:56 GMT-03:00 Joe:

> [...]

From ralf.gommers at gmail.com  Fri Mar 30 20:03:08 2018
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 30 Mar 2018 17:03:08 -0700
Subject: [Numpy-discussion] ANN: numpydoc 0.8.0 release
Message-ID:

Hi all,

I'm pleased to announce that a new release of numpydoc is available:

- package: https://pypi.python.org/pypi/numpydoc
- documentation (new in this release): https://numpydoc.readthedocs.io/en/latest/

This is a maintenance release with many small improvements. Your documentation will likely render with fewer warnings; for NumPy, for example, it removed ~300 irrelevant warnings while improving the rendered results.

Thanks to everyone who contributed to this release!

Cheers,
Ralf