[Numpy-discussion] Custom Dtype/Units discussion

Travis Oliphant travis at continuum.io
Tue Jul 12 01:56:01 EDT 2016


On Mon, Jul 11, 2016 at 12:58 PM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

>
>
> On Mon, Jul 11, 2016 at 11:39 AM, Chris Barker <chris.barker at noaa.gov>
> wrote:
>
>>
>>
>> On Sun, Jul 10, 2016 at 8:12 PM, Nathan Goldbaum <nathan12343 at gmail.com>
>> wrote:
>>
>>>
>>> Maybe this can be an informal BOF session?
>>>
>>
>> or  maybe a formal BoF? after all, how formal do they get?
>>
>> Anyway, it was my understanding that we really needed to do some
>> significant refactoring of how numpy deals with dtypes in order to do this
>> kind of thing cleanly -- so where has that gone since last year?
>>
>> Maybe this conversation should be about how to build a more flexible
>> dtype system generally, rather than specifically about unit support.
>> (though unit support is a great use-case to focus on)
>>
>
> Note that Mark Wiebe will also be giving a talk Friday, so he may be
> around. As the last person to add a type to Numpy and the designer of DyND
> he might have some useful input. DyND development is pretty active and I'm
> always curious how we can somehow move in that direction.
>
>
There has been a lot of work over the past 6 months on making DyND
implement the "pluribus" concept that I have talked about briefly in the
past. DyND now has a separate C++ ndt data-type library. The Python
interface to that type library is still bundled in the dynd module, but it
is separable, and work is in progress on a separate Python wrapper for
this type library. The dynd type library uses datashape, which is
described at http://datashape.pydata.org

This type system is extensible and could be the foundation of a re-factored
NumPy. My view (and what I am encouraging work in the direction of) is
that array computing in Python should be refactored into three parts: a
"type-subsystem" (I think ndt is the right model there), a generic
ufunc-system (I think dynd has a very promising approach there as well),
and a container (the memoryview already in Python might be enough). These
modules could be separately installed, maintained and eventually moved into
Python itself.
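
To make the three-part split concrete, here is a minimal sketch using only
the Python standard library. It is not DyND or ndt code. The names TYPES,
make_array and elementwise are hypothetical and exist only for this
illustration: a small table stands in for the type subsystem, a plain
memoryview is the container, and one generic function plays the role of a
ufunc.

```python
import struct

# Hypothetical stand-in for a "type subsystem": type name -> struct format code.
TYPES = {"float64": "d", "int32": "i"}


def make_array(type_name, shape, values):
    """Container layer: a plain memoryview over a bytearray."""
    code = TYPES[type_name]
    raw = bytearray(struct.pack(f"{len(values)}{code}", *values))
    return memoryview(raw).cast(code, shape=list(shape))


def elementwise(func, view):
    """Generic 'ufunc' layer: apply func to every element, keep type and shape."""
    # Flatten N-D -> 1-D bytes -> 1-D typed items (cast requires a byte format hop).
    flat = view.cast("B").cast(view.format)
    out = [func(x) for x in flat]
    raw = bytearray(struct.pack(f"{len(out)}{view.format}", *out))
    return memoryview(raw).cast(view.format, shape=list(view.shape))


a = make_array("float64", (2, 3), [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
b = elementwise(lambda x: x * 2, a)
print(b.format, b.shape, b.tolist())
```

The point of the sketch is only that each layer can be swapped out
independently: a richer type library could replace the TYPES table without
touching the container or the element-wise machinery.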

Then, a potential future NumPy project could be ported to be a layer of
calculations and connections to other C libraries on top of this system.
Many parts of the current code could be re-used in that effort --- or the
new system could be part of a re-factoring of NumPy to make the innards of
NumPy more accessible to a JIT compiler.

We are already far enough along that a motivated person could pursue this.
It would take 18 months to complete the system, but first-light would be
less than 6 months for a dedicated, motivated, and talented resource.
DyND, as well as Cython and/or Numba, is far enough along to make this
pretty straightforward. For this re-factored array-computing project to
take the NumPy name, this community would have to decide that that is the
right thing to do. But other projects like Pandas and/or xarray and/or
numpy-py and/or NumPy on Jython could use this sub-system also.

It has taken me a long time to actually get to the point where I would
recommend a specific way forward. I have thought about this for many
years and don't make these recommendations lightly. The pluribus concept
is my recommendation about what would be best now and in the future. I
will be pursuing this concept and working to get to a point where this
community will accept it if possible, because it would be ideal if this new
array library were still called NumPy.

My working view is that someone will have to build the new prototype NumPy
for the community to evaluate whether it's the right approach and get
consensus that it is the right way forward. There is enough there now
with DyND, data-shape, and Numba/Cython to do this fairly quickly. It
is not strictly necessary to use DyND or Numba or even data-shape to
accomplish this general plan --- but these are already available and a
great place to start as they have been built explicitly with the intention
of improving array-computing in Python.

This potential NumPy could be backwards compatible from an API perspective
(including a C-API) --- though recompilation would be necessary and there
would be some semantic differences in corner-cases that could either be
fixed where necessary or simply made part of the new version.

I will be at the Continuum happy hour on Thursday at our offices and
welcome anyone to come discuss things with me there --- I am also willing
to meet with anyone on Thursday and Friday if I can --- but I don't have a
ticket to SciPy itself. Please CC me directly if you have questions. I
try to follow the numpy-discussion mailing list but I am not always
successful at keeping up.

To be clear, as some have mis-interpreted me in the past: while I originally
wrote NumPy (borrowing heavily from Numeric and drawing inspiration from
Numarray and receiving a lot of help for specific modules from many of
you), the community has continued to develop NumPy and now has a proper
governance model. I am now simply an interested NumPy user and previous
NumPy developer who finally has some concrete ideas to share based on work
that I have been funding, leading, and encouraging for the past several
years.

I am still very interested in helping NumPy progress, but we are also going
to be taking these ideas to create a general concept of the "buffer
protocol in Python" that enables cross-language code-sharing and more
code re-use for data analytics among language communities. This is the
concept of "data-fabric", which is pre-alpha vapor-ware at this point but
has some ideas expressed at http://datashape.pydata.org and here:
https://github.com/blaze/datafabric and is something DyND is enabling.

NumPy itself has a clear governance model, and whether NumPy (the project)
adopts any of the new array-computing concepts I am proposing will depend
on this community's decisions as well as work done by motivated developers
willing to work on prototypes. I will be willing to help get funding for
someone motivated to work on this.

Best,

-Travis




> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 

*Travis Oliphant, PhD*
*Co-founder and CEO*


@teoliphant
512-222-5440
http://www.continuum.io