[Numpy-discussion] Numpy BoF at SciPy 2014 - quick report

Fernando Perez fperez.net at gmail.com
Wed Jul 16 23:08:58 EDT 2014


Hi all,

sorry for not posting earlier, post-conference InboxInfinity blues and all
that...

The BoF did go as planned, and it was a good discussion, mostly following
the tentative agenda outlined here:

https://github.com/numpy/numpy/wiki/Numpy-BoF-at-Scipy-2014

Various folks were kind enough to take notes during the conversation on an
Etherpad instance:

https://scipy2014.etherpad.mozilla.org/35

For the sake of completeness and future reference, below I'm including a
copy of the notes in this email.

Other than what's in the notes, my take home from the discussion is mostly
that:

- we probably needed a longer slot than 45 minutes to have a chance to dig
in a little deeper.

- it would have been more productive if a focused numpy sprint had been
also planned, so that there could be more structured follow-up on the ideas
that came up.

It would be great to hear from others who were present at the conference.
In particular, Chris Barker brought up a number of things regarding
datetime and planned on following up during the sprints, but I'm not sure
what ended up happening.

Thanks to everyone who participated!

Cheers

f


#### Copy of Etherpad notes as of 7/16/2014:

Notes from BoF:
  1:30, July 19, 2014


Working with topics on this page:
https://github.com/numpy/numpy/wiki/Numpy-BoF-at-Scipy-2014

chuck: where do we go from here? -- what is the role of numpy now?

Generalized ufuncs -- still some more to do -- (LA stuff - norms)
 - some ufuncs don't implement the array interface -- which are those -- sprint
topic?
 - zeros_like, ones_like, more... (duplicate) github issue:
https://github.com/numpy/numpy/issues/4862

 Here's the original issue: https://github.com/numpy/numpy/issues/3602

Implementation of @ (matrix multiplication)
 - will be in Python 3.5, ~18 months away
 - no work started yet -- have to make sure we do it.
 - @@ was not added.
 - The PEP for numpy is well-defined. Not much thinking to be done. (Good
for a sprint)
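
A minimal sketch of what the operator looks like from the user side, assuming Python 3.5+ and a NumPy release that implements `__matmul__` (neither existed when these notes were taken). For 2-D arrays, `a @ b` is expected to agree with `np.dot`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

# PEP 465: `@` dispatches to __matmul__; for 2-D inputs it matches np.dot.
product = a @ b
same_as_dot = np.array_equal(product, np.dot(a, b))
```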

 Datetime:
  - Can it be done? -- too many calendars -- too many time scales, etc.
  -  Can we cover most applications?
  - DyND -- higher abstraction -- convert to back end implementation
  - Also look at what R and Julia do?
  - Maybe fix up the little issues in datetime64, first?
  - Pandas does not use numpy machinery
    - uses an array of objects: those objects are subclassed from
datetime.datetime
     - does use int64, but gets unboxed on storage.
  - Root cause is using UTC, rather than a naive time.
   - Naive is not associated with a time zone. Can be interpreted in any
way.
    - Ripping out the local timezone on I/O would help.
    - More often than not, using the local timezone is not desired.
   - For example, many experimental data do not attach time zones. (Or
wrong timezone)
   - Consider laboratory time (stopwatch rather than a clock). (timedelta)
   - The C++ committee is standardizing this.
   - A key missing feature is being able to choose your epoch.
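
The "laboratory time" and "choose your epoch" points can be approximated today with `datetime64` and `timedelta64` arithmetic. This is only a sketch: the timestamps and the choice of epoch are made up, and it assumes a NumPy version in which `datetime64` parses offset-free strings as naive times.

```python
import numpy as np

# Timestamps with no UTC offset in the string (hypothetical readings).
stamps = np.array(
    ["2014-07-09T13:30:00", "2014-07-09T13:30:45", "2014-07-09T13:32:00"],
    dtype="datetime64[s]",
)

# A user-chosen epoch, here the start of the hypothetical experiment.
epoch = np.datetime64("2014-07-09T13:30:00", "s")

# Subtraction yields timedelta64: "stopwatch" time since the chosen epoch.
elapsed = stamps - epoch

# Dividing by a unit timedelta gives plain floating-point seconds.
seconds = elapsed / np.timedelta64(1, "s")
```

The epoch here lives in user code rather than in the dtype, which is exactly the gap the last bullet points at.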

New DTypes
 - Example: quad float types. A solution for missing values? Adding units
support.
 - Record & structured arrays play around with dtypes. Needs to be easier
to use these.
 - Improve documentation.
 - How to extend to support things like labeled arrays?
  - This is orthogonal to dtypes.
  - Would rather access time column instead of 3rd column.
  - Would provide a better foundation for pandas.
 - Key is to keep inputs simple.
 - Finish the DataArray push?
  - We are very close. It has been sitting there for a while.
  - If interested, talk at sprints on July 10.
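
To illustrate the "time column instead of 3rd column" point, here is a small sketch using a structured array. The field names and values are invented for the example; it shows name-based access only, not the axis labels a DataArray would add.

```python
import numpy as np

# Three hypothetical fields; "time" is the third one.
records = np.array(
    [(1.5, 20.1, 0.0), (1.7, 20.3, 0.5), (1.6, 20.2, 1.0)],
    dtype=[("x", "f8"), ("y", "f8"), ("time", "f8")],
)

# Access by label: the intent is visible in the code.
by_name = records["time"]

# Access by position: works, but the reader must know the field order.
by_position = records[records.dtype.names[2]]
```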

Missing values?
 - maybe improve masked array.
 - give up for now.
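
For reference, a minimal sketch of how the existing masked array machinery handles a missing reading. The sentinel value `-999.0` is a hypothetical convention, not something discussed at the BoF.

```python
import numpy as np

# -999.0 is a hypothetical sentinel marking a missing reading.
raw = np.array([1.0, -999.0, 3.0, 5.0])
data = np.ma.masked_equal(raw, -999.0)

# Reductions skip masked entries.
mean_valid = data.mean()

# The boolean mask records which entries are missing.
n_missing = int(data.mask.sum())
```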

Inheriting ndarray
 - introduces many bugs.
 - should discourage this, but make it easier to work with it.
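
One common source of such bugs is extra attributes that have to be carried through views and slices by hand. The sketch below uses a hypothetical `Tagged` subclass to show the `__array_finalize__` hook that makes this work; it is an illustration of the mechanism, not a recommended pattern.

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical subclass carrying one extra attribute, `tag`."""

    def __new__(cls, values, tag=None):
        obj = np.asarray(values).view(cls)
        obj.tag = tag
        return obj

    def __array_finalize__(self, obj):
        # Invoked whenever a new Tagged is created from another array
        # (view casting, slicing); copy `tag` across when the source has it.
        if obj is None:
            return
        self.tag = getattr(obj, "tag", None)

t = Tagged([1, 2, 3], tag="volts")
sliced = t[1:]      # goes through __array_finalize__, so `tag` survives
doubled = t * 2     # the result is still a Tagged instance
```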

Dynd
 - The issues discussed so far were motivation for starting dynd
  - for example, a pluggable type system
  - adding a categorical type in numpy (at Continuum) broke lots. Easier in
dynd.
 - Commitment for dynd is to give it a numpy-like API
 - Both need to evolve together.
  - Find ways to make things more uniform (in numpy)
  - Dynd is more in an experimental phase, changing quickly.
 - Can we import dynd as np?
  - Not a goal. More exploratory in this phase.
  - Adding a layer like that at a later time would be good. Not there, yet.
  - Do not want to repeat py2->py3 debacle.
 - Buffer protocol:
  - Supported, but dynd extends it.
  - As a pure C++ library, goal is to freeze once stable so systems beyond
Python can depend on it as a stable interface for working with array data.
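
As background for the buffer protocol item, this sketch shows the standard PEP 3118 round trip between an ndarray and a `memoryview`. It covers only the plain protocol as NumPy exposes it; the dynd extensions mentioned above are not shown.

```python
import numpy as np

arr = np.arange(6, dtype=np.int32).reshape(2, 3)

# memoryview consumes the PEP 3118 buffer that ndarray exports.
view = memoryview(arr)
shape = view.shape

# np.asarray on the memoryview wraps the same memory instead of copying.
back = np.asarray(view)
back[0, 0] = 42

shares = np.shares_memory(arr, back)
```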

Boost::Python
 - Nothing official from numpy for using numpy arrays in C++
 - Not prioritized.
 - Numpy has gotten better about namespace pollution?
 - It kind of works already. Talk to Mike Droettboom

-- 
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail