numpy grant update
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
Hi all,

I wanted to give everyone an update on what's going on with the NumPy grant [1]. As you may have noticed, things have been moving a bit slower than originally hoped -- my health is improving, but unfortunately it has continued to be rocky [2].

Fortunately, I have awesome co-workers, and BIDS has an institutional interest/mandate for figuring out how to make these things happen, so after thinking it over we've decided to reorganize how we're doing things internally and split up the work to let me focus on the core technical/community aspects without getting overloaded. Specifically, Fernando Pérez and Jonathan Dugan [3] are taking on PI/administration duties, Stéfan van der Walt will focus on handling day-to-day management of the incoming hires, and Nelle Varoquaux & Jarrod Millman will also be joining the team (exact details TBD).

This shouldn't really affect any of you, except that you might see some familiar faces with @berkeley.edu emails becoming more engaged. I'm still leading the Berkeley effort, and in any case it's still ultimately the community and NumPy steering council who will be making decisions about the project -- these are just some internal details about how we're planning to manage our contributions. But in the interest of full transparency I figured I'd let you know what's happening.

In other news, the job ad to start the official hiring process has now been submitted for HR review, so it should hopefully be up soon -- depending on how efficient the bureaucracy is. I'll definitely let everyone know as soon as it's posted.

I'll also be giving a lunch talk at BIDS tomorrow to let folks locally know about what's going on, which I think will be recorded -- I'll send around a link afterwards in case others are interested.

-n

[1] https://mail.python.org/pipermail/numpy-discussion/2017-May/076818.html
[2] https://vorpus.org/blog/emerging-from-the-underworld/
[3] https://bids.berkeley.edu/people/jonathan-dugan

-- Nathaniel J. Smith -- https://vorpus.org
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Thu, Oct 19, 2017 at 10:02 AM, Charles R Harris < charlesr.harris@gmail.com> wrote:
Thanks Nathaniel. I'm looking forward to all of those people getting involved. Hiring always takes longer than you want, but next year the pace of development promises to pick up significantly :)

Ralf
![](https://secure.gravatar.com/avatar/d9ac9213ada4a807322f99081296784b.jpg?s=120&d=mm&r=g)
On Thu, Oct 19, 2017, at 23:11, Ralf Gommers wrote: the first package I contributed to in the scientific Python ecosystem, at a time--back when segfaults were still a thing ;)--that could not have been more exciting to a young undergrad. Now a bit older, but hopefully not too rusty ;) Best regards Stéfan
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Wed, Oct 18, 2017 at 10:24 PM, Nathaniel Smith <njs@pobox.com> wrote:
Here's that link: https://www.youtube.com/watch?v=fowHwlpGb34 -n -- Nathaniel J. Smith -- https://vorpus.org
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Nathaniel,

Thanks for the link. The plans sound great! You'll not be surprised to hear I'm particularly interested in the units aspect (and, no, I don't mind at all if we can stop subclassing ndarray...).

Is the idea that there will be a general way to allow a dtype to define how to convert an array to one with another dtype? (Just as one now implicitly is able to convert between, say, int and float.) And, if so, is the idea that one of those conversion possibilities might involve checking units? Or were you thinking of implementing units more directly? The former would seem most sensible, if only so you can initially focus on other things than deciding how to support, say, esu vs emu units, or whether or not to treat radians as equal to dimensionless (which they formally are, but it is not always handy to do so).

Anyway, do keep us posted!

All the best,

Marten

On Thu, Oct 26, 2017 at 3:40 PM, Nathaniel Smith <njs@pobox.com> wrote:
![](https://secure.gravatar.com/avatar/7857f26c1ef2e9bdbfa843f9087710f7.jpg?s=120&d=mm&r=g)
My understanding of this is that the dtype will only hold the unit metadata. So that means units would propagate through calculations automatically, but the dtype wouldn't be able to manipulate the array data (in an in-place unit conversion, for example).

In this world, astropy quantities and yt's YTArray would become containers around an ndarray that would make use of the dtype metadata but also implement all of the unit semantics that they already implement. Since they would become container classes and would no longer be ndarray subclasses, that avoids most of the pitfalls one encounters these days.

Please correct me if I'm wrong, Nathaniel.

-Nathan

On Thu, Oct 26, 2017 at 5:14 PM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
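To make the "container around an ndarray" idea concrete, here is a minimal sketch. Everything in it is hypothetical and for illustration only: the `UnitArray` class, its `convert_to` method, and the `TO_METERS` table are invented names, and this is not how astropy's Quantity or yt's YTArray are implemented. It only shows a wrapper that holds a plain ndarray plus a unit and does the unit handling itself, including an in-place conversion.

```python
import numpy as np

# Hypothetical conversion table: factor that converts each unit to meters.
TO_METERS = {"m": 1.0, "ft": 0.3048, "km": 1000.0}


class UnitArray:
    """Toy container that wraps (rather than subclasses) an ndarray."""

    def __init__(self, values, unit):
        # np.asarray avoids a copy when `values` is already a float ndarray,
        # so in-place conversion below would also modify the caller's array.
        self.value = np.asarray(values, dtype=float)
        self.unit = unit

    def convert_to(self, unit):
        """Convert in place, reusing the wrapped array's buffer."""
        self.value *= TO_METERS[self.unit] / TO_METERS[unit]
        self.unit = unit
        return self

    def __add__(self, other):
        # Bring the other operand into our unit before adding raw values.
        factor = TO_METERS[other.unit] / TO_METERS[self.unit]
        return UnitArray(self.value + other.value * factor, self.unit)


distance = UnitArray([1.0, 2.0], "km").convert_to("m")
total = UnitArray([1.0], "m") + UnitArray([1.0], "ft")
```

Here `distance.value` holds the raw numbers in meters and `distance.unit` holds the unit string, which mirrors the 'arr.value' / 'arr.unit' spellings discussed elsewhere in this thread.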
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
That sounds somewhat puzzling, as units cannot really propagate without them somehow telling how they would change! (E.g., sin(a) is possible only for angular units, and its outcome then depends on that unit.)

But in any case, the mailing list is probably not the best place to discuss this - rather, I look forward to -- and will most happily give feedback on -- a NEP or other more detailed explanation!

-- Marten
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Thu, Oct 26, 2017 at 2:11 PM, Nathan Goldbaum <nathan12343@gmail.com> wrote:
I think that'd be fine actually... dtypes have methods[1] that are invoked to do any operation that involves touching the actual array data. For example, when you copy array data from one place to another (because someone called arr.copy(), or did x[...] = y, or because the ufunc internals need to copy part of the array into a temporary bounce buffer, etc.), you have to let the dtype do that, because only the dtype knows how to safely copy entries of this dtype. (For many dtypes it's just a simple (strided) memmove, but then for the object dtype you have to take care of refcounting...)

Similarly, if your unit dtype implemented casting, then array(..., dtype=WithUnits(float, meters)).astype(WithUnits(float, feet)) would Just Work.

It looks like we don't currently expose a user-level API for doing in-place dtype conversions, but there's no reason we can't add one; all the underlying casting machinery already exists and works on arbitrary memory buffers. (And in the meantime there's a cute trick here [2] you could use to implement it yourself.) And if we do add one, then you could use it equally well to do in-place conversion from float64 to int64 as from float64-in-meters to float64-in-feet.

[1] Well, technically right now they're not methods, but instead a bunch of instance attributes holding C-level function pointers that act like methods. But basically this is just an obfuscated way of implementing methods; it made sense at the time, but in retrospect making them use the more usual Python machinery for this will make things easier.

[2] https://stackoverflow.com/a/4396247/
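For dtypes with the same item size, an in-place conversion can already be approximated with existing NumPy. The following is a minimal sketch under that assumption, and it may or may not be the exact trick in the linked answer [2]: reinterpret the same buffer with the target dtype via a view, then assign the original values through that view so the cast writes back into the same memory. The WithUnits dtype mentioned above is hypothetical and is not used here.

```python
import numpy as np

x = np.array([1.5, 2.0, 3.25], dtype=np.float64)

# Same memory, reinterpreted as int64 (both dtypes are 8 bytes per item).
y = x.view(np.int64)

# Assignment casts float64 -> int64 and stores the result in the shared buffer.
# After this, x's contents are no longer meaningful as floats.
y[:] = x

same_buffer = np.shares_memory(x, y)
```

This only works when the source and target item sizes match; converting float64 to float32 in place this way would need extra care because the result occupies only half of the buffer.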
I don't think you'd need a container class for basic functionality, but it might turn out to be useful for some kind of convenience/backwards-compatibility issues. For example, right now with Quantity you can do 'arr.unit' to get the unit and 'arr.value' to get the raw values with units stripped. It should definitely be possible to support these with spellings like 'arr.dtype.unit' and 'asarray(arr, dtype=float)' (or 'astropy.quantities.value(arr)'), but maybe not the short array attribute based spellings? We'll have to have the discussion about whether we want to provide some mechanism for *dtypes* to add new attributes to the *ndarray* namespace. (There's some precedent in numpy's built-in .real and .imag, but OTOH this is a kind of 'import *' feature that can easily be confusing and create backwards compatibility issues -- what if ndarray and the dtype have a name clash? Keeping in mind that it could be a clash between a third-party dtype we don't even know about and a new ndarray attribute that didn't exist when the third-party dtype was created...) -n -- Nathaniel J. Smith -- https://vorpus.org
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Nathaniel,

That sounds like it could work very well indeed!

Only somewhat related: for the inner loops, I've been wondering whether it might be possible to automatically create composite ufuncs, where the inner loops are executed in some prescribed order, so that for instance one could define

```
sinmul = sin(multiply(Input(1), Input(2)))
```

which would then create a new ufunc with 2 inputs and one output, which would internally first multiply the inputs and then take the sin (you'll see some similarity with an example in the talk you gave...). For this purpose, I'm thinking one could just reuse the iterator, but call the inner loops sequentially (being somewhat smart in that the sin can be done in-place on the output of the multiply). I could see that even complicated "casting" from dtypes could be implemented similarly (it probably already happens for int/float/etc.?)

Anyway, looking forward to hearing more (in due time)!

All the best,

Marten
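The in-place part of this idea can be imitated today with the `out=` argument that ufuncs already accept. This is only a rough sketch: the `Input(...)` syntax above does not exist, and the plain Python function below makes two separate passes over the data instead of sharing one iterator, so it illustrates the buffer reuse but not the proposed fused ufunc.

```python
import numpy as np


def sinmul(a, b):
    """Compute sin(a * b), reusing the multiply output as the sin output."""
    # First pass: multiply into a fresh float64 buffer.
    out = np.multiply(a, b, dtype=np.float64)
    # Second pass: take the sine in place, so no second temporary is allocated.
    np.sin(out, out=out)
    return out


a = np.array([0.0, 0.5, 1.0])
b = np.array([2.0, 3.0, 4.0])
result = sinmul(a, b)
```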
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Thu, Oct 26, 2017 at 1:14 PM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
Well, to some extent the answers here are going to be "you tell me" :-). I'm not an expert in unit handling, and these plans are pretty high-level right now -- there will be lots more discussions to work out details once we've hired people and they're ramping up, and as we work out the larger context around how to improve the dtype system.

But, generally, yeah, one of the things that a custom dtype will need to be able to do is to hook into the casting and ufunc dispatch systems. That means, when you define a dtype, you get to answer questions like "can you cast yourself into float32 without loss of precision?", or "can you cast yourself into int64, truncating values if you have to?". (Or even, "can you cast yourself to <this other unit type>?", which would presumably trigger unit conversion.) And you'd also get to define how things like np.add and np.multiply work for your dtype -- it's already the case that ufuncs have multiple implementations for different dtypes and there's machinery to pick the best one; this would just be extending that to these new dtypes as well.

One possible approach that I think might be particularly nice would be to implement units as a "wrapper dtype". The idea would be that if we have a standard interface that dtypes implement, then not only can you implement those methods yourself to make a new dtype, but you can also call those methods on an existing dtype. So you could do something like:

```
class WithUnits(np.dtype):
    def __init__(self, inner_dtype, unit):
        self.inner_dtype = np.dtype(inner_dtype)
        self.unit = unit

    # Simple operations like bulk data copying are delegated to the inner dtype
    # (Invoked by arr.copy(), making temporary buffers for calculations, etc.)
    def copy_data(self, source, dest):
        return self.inner_dtype.copy_data(source, dest)

    # Other operations like casting can do some unit-specific stuff and then
    # delegate
    def cast_to(self, other_dtype, source, dest):
        if isinstance(other_dtype, WithUnits):
            if other_dtype.unit == self.unit:
                # Something like casting WithUnits(float64, meters) -> WithUnits(float32, meters)
                # So no unit trickiness needed; delegate to the inner dtype to handle the storage
                # conversion (e.g. float64 -> float32)
                self.inner_dtype.cast_to(other_dtype.inner_dtype, source, dest)
        # ... other cases to handle unit conversion, etc. ...
```

And then as a user you'd use it like

np.array([1, 2, 3], dtype=WithUnits(float, meters))

or whatever. (Or some convenience function that ultimately does this.)

This is obviously a hand-wavy sketch; I'm sure the actual details will look very different. But hopefully it gives some sense of the kind of possibilities here?

-n

-- Nathaniel J. Smith -- https://vorpus.org
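For the built-in dtypes, the casting questions quoted above can already be asked through `np.can_cast`, and the per-dtype ufunc implementations are visible through a ufunc's `types` attribute. The snippet below is a small illustration using only existing NumPy; it says nothing about how a future custom-dtype API would expose the same questions.

```python
import numpy as np

# "Can you cast yourself into float32 without loss of precision?"
lossless_to_f32 = np.can_cast(np.float64, np.float32, casting="safe")

# Allowed if we accept precision loss within the same kind (float -> float).
same_kind_to_f32 = np.can_cast(np.float64, np.float32, casting="same_kind")

# "Can you cast yourself into int64, truncating values if you have to?"
truncating_to_i64 = np.can_cast(np.float64, np.int64, casting="unsafe")

# Signatures of the inner loops np.add ships with, e.g. 'dd->d' for float64.
add_loops = np.add.types
has_float64_loop = "dd->d" in add_loops
```

The first check is False and the other three are True: float64 to float32 loses precision, but it is permitted under the looser "same_kind" rule, and float64 to int64 is permitted only under "unsafe".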
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Thu, Oct 26, 2017 at 12:40 PM, Nathaniel Smith <njs@pobox.com> wrote:
Still no update on that job ad (though we're learning interesting things about Berkeley's HR system!), but we did make a little scratch repo to start brainstorming. This is mostly for getting our own thoughts in order, but if anyone's curious then here it is: https://github.com/njsmith/numpy-grant-planning/ -n -- Nathaniel J. Smith -- https://vorpus.org
participants (6)
- Charles R Harris
- Marten van Kerkwijk
- Nathan Goldbaum
- Nathaniel Smith
- Ralf Gommers
- Stefan van der Walt