[Numpy-discussion] Custom Dtype/Units discussion

Ralf Gommers ralf.gommers at gmail.com
Tue Jul 12 02:21:22 EDT 2016

On Tue, Jul 12, 2016 at 7:56 AM, Travis Oliphant <travis at continuum.io> wrote:

> On Mon, Jul 11, 2016 at 12:58 PM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>> On Mon, Jul 11, 2016 at 11:39 AM, Chris Barker <chris.barker at noaa.gov>
>> wrote:
>>> On Sun, Jul 10, 2016 at 8:12 PM, Nathan Goldbaum <nathan12343 at gmail.com>
>>> wrote:
>>>> Maybe this can be an informal BOF session?
>>> Or maybe a formal BoF? After all, how formal do they get?
>>> Anyway, it was my understanding that we really needed to do some
>>> significant refactoring of how numpy deals with dtypes in order to do this
>>> kind of thing cleanly -- so where has that gone since last year?
>>> Maybe this conversation should be about how to build a more flexible
>>> dtype system generally, rather than specifically about unit support.
>>> (though unit support is a great use-case to focus on)
>> Note that Mark Wiebe will also be giving a talk Friday, so he may be
>> around. As the last person to add a type to Numpy and the designer of DyND
>> he might have some useful input. DyND development is pretty active and I'm
>> always curious how we can somehow move in that direction.
> There has been a lot of work over the past 6 months on making DyND
> implement the "pluribus" concept that I have talked about briefly in the
> past. DyND now has a separate C++ ndt data-type library. The Python
> interface to that type library is still unified in the dynd module, but it
> is separable, and work is in progress to make a separate Python wrapper
> for this type library. The dynd type library is datashape, described at
> http://datashape.pydata.org
> This type system is extensible and could be the foundation of a
> refactored NumPy. My view (and what I am encouraging work in the
> direction of) is that array computing in Python should be refactored into a
> "type-subsystem" (I think ndt is the right model there), a generic
> ufunc-system (I think dynd has a very promising approach there as well),
> and then a container (the memoryview already in Python might be
> enough). These modules could be separately installed, maintained, and
> eventually moved into Python itself.
> Then, a potential future NumPy project could be ported to be a layer of
> calculations and connections to other C-libraries on top of this system.
> Many parts of the current code could be re-used in that effort --- or the
> new system could be part of a re-factoring of NumPy to make the innards of
> NumPy more accessible to a JIT compiler.
> We are already far enough along that this could be pursued by a
> motivated person. It would take 18 months to complete the system, but
> first-light would be less than 6 months for a dedicated, motivated, and
> talented resource. DyND is far enough along, as are Cython and/or
> Numba, to make this pretty straightforward. For this refactored
> array-computing project to take the NumPy name, this community would have
> to decide that that is the right thing to do. But other projects like
> Pandas and/or xarray and/or numpy-py and/or NumPy on Jython could use this
> sub-system also.
> It has taken me a long time to actually get to the point where I would
> recommend a specific way forward. I have thought about this for many
> years and don't make these recommendations lightly. The pluribus concept
> is my recommendation about what would be best now and in the future --- and
> I will be pursuing this concept and working to get to a point where this
> community will accept it if possible, because it would be ideal if this new
> array library were still called NumPy.
> My working view is that someone will have to build the new prototype NumPy
> for the community to evaluate whether it's the right approach and get
> consensus that it is the right way forward. There is enough there now
> with DyND, data-shape, and Numba/Cython to do this fairly quickly. It
> is not strictly necessary to use DyND or Numba or even data-shape to
> accomplish this general plan --- but these are already available and a
> great place to start as they have been built explicitly with the intention
> of improving array-computing in Python.
> This potential NumPy could be backwards compatible from an API perspective
> (including a C-API) --- though recompilation would be necessary and there
> would be some semantic differences in corner-cases that could either be
> fixed where necessary or potentially just made part of the new version.
> I will be at the Continuum Happy hour on Thursday at our offices and
> welcome anyone to come discuss things with me there --- I am also willing
> to meet with anyone on Thursday and Friday if I can --- but I don't have a
> ticket to SciPy itself. Please CC me directly if you have questions. I
> try to follow the numpy-discussion mailing list but I am not always
> successful at keeping up.
> To be clear, as some have misinterpreted me in the past: while I
> originally wrote NumPy (borrowing heavily from Numeric and drawing
> inspiration from Numarray and receiving a lot of help for specific modules
> from many of you), the community has continued to develop NumPy and now has
> a proper governance model. I am now simply an interested NumPy user and
> previous NumPy developer who finally has some concrete ideas to share based
> on work that I have been funding, leading, and encouraging for the past
> several years.
> I am still very interested in helping NumPy progress, but we are also
> going to be taking these ideas to create a general concept of the "buffer
> protocol in Python" to enable cross-language code-sharing and more
> code re-use for data analytics among language communities. This is the
> concept of "data-fabric", which is pre-alpha vapor-ware at this point but
> with some ideas expressed at http://datashape.pydata.org and here:
> https://github.com/blaze/datafabric and is something DyND is enabling.
> NumPy itself has a clear governance model and whether NumPy (the project)
> adopts any of the new array-computing concepts I am proposing will depend
> on this community's decisions as well as work done by motivated developers
> willing to work on prototypes. I will be willing to help get funding for
> someone motivated to work on this.

Thanks Travis! I'm going to let the technical parts sink in for a bit
first, but wanted to say already that your continued interest and sharing
of new ideas are much appreciated.
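
For readers following the "container" part of the quoted proposal, here is a
minimal sketch of what the built-in memoryview already provides: a typed,
shaped, zero-copy view over a buffer. It uses only the standard library and
is an illustration, not DyND's or NumPy's actual design; the variable names
are made up for the example.

```python
import array

# A flat buffer of six C doubles (struct format code 'd').
data = array.array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# memoryview exposes the buffer and its type metadata without copying.
flat = memoryview(data)

# cast() goes through a byte view to attach a 2x3 shape to the same memory.
grid = flat.cast("B").cast("d", shape=[2, 3])

# The view carries element format, item size, shape, and strides.
print(grid.format, grid.itemsize, grid.shape, grid.strides)

# Writing through the flat view is visible in the array and the 2-D view,
# because all three names refer to the same underlying memory.
flat[0] = 10.0
print(data[0], grid[0, 0], grid.tolist())
```

Note that memoryview carries type and shape information but has no
arithmetic of its own, which is consistent with the proposal's split between
a container, a type system, and a separate ufunc layer.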
