Numpy BoF at SciPy 2014 - quick report
Hi all,

sorry for not posting earlier, post-conference InboxInfinity blues and all that...

The BoF did go as planned, and it was a good discussion, mostly following the tentative agenda outlined here:

https://github.com/numpy/numpy/wiki/Numpy-BoF-at-Scipy-2014

Various folks were kind enough to take notes during the conversation on an Etherpad instance:

https://scipy2014.etherpad.mozilla.org/35

For the sake of completeness and future reference, below I'm including a copy of the notes in this email. Other than what's in the notes, my take home from the discussion is mostly that:

- we probably needed a longer slot than 45 minutes to have a chance to dig in a little deeper.

- it would have been more productive if a focused numpy sprint had been also planned, so that there could be more structured follow-up on the ideas that came up.

It would be great to hear from others who were present at the conference. In particular, Chris Barker brought up a number of things regarding datetime and planned on following up during the sprints, but I'm not sure what ended up happening.

Thanks to everyone who participated!

Cheers

f

#### Copy of Etherpad notes as of 7/16/2014:

Notes from BoF: 1:30, July 19, 2014

Working with topics on this page:
https://github.com/numpy/numpy/wiki/Numpy-BoF-at-Scipy-2014

chuck: where do we go from here? -- what is the role of numpy now?

Generalized ufuncs
-- still some more to do
-- (LA stuff - norms)
- some ufuncs don't implement array interface -- which are those -- sprint topic?
- zeros_like, ones_like, more...
(duplicate) github issue: https://github.com/numpy/numpy/issues/4862
Here's the original issue: https://github.com/numpy/numpy/issues/3602

Implementation of @ (matrix multiplication)
- will be in Python 3.5, ~18 months out
- no work started yet -- have to make sure we do it.
- @@ was not added.
- The PEP for numpy is well-defined. Not much thinking to be done. (Good for a sprint)

Datetime:
- Can it be done?
-- too many calendars
-- too many time scales, etc.
- Can we cover most applications?
- DyND -- higher abstraction -- convert to back end implementation
- Also look at what R and Julia do?
- Maybe fix up the little issues in datetime64, first?
- Pandas does not use numpy machinery
- uses an array of objects: those objects are subclassed from datetime.datetime
- does use int64, but gets unboxed on storage.
- Root cause is using UTC, rather than a naive time.
- Naive is not associated with a time zone. Can be interpreted in any way.
- Ripping out the local timezone on I/O would help.
- More often than not, using the local timezone is not desired.
- For example, many experimental data do not attach time zones. (Or wrong timezone)
- Consider laboratory time (stopwatch rather than a clock). (timedelta)
- The C++ committee is standardizing this.
- A key feature which is missing is being able to choose your epoch.

New DTypes
- Example: quad float types. A solution for missing values? Adding units support.
- Record & structured arrays play around with dtypes. Needs to be easier to use these.
- Improve documentation.
- How to extend to support things like labeled arrays?
- This is orthogonal to dtypes.
- Would rather access time column instead of 3rd column.
- Would provide a better foundation for pandas.
- Key is to keep inputs simple.
- Finish the DataArray push?
- We are very close. It has been sitting there for a while.
- If interested, talk at sprints on July 10.

Missing values?
- maybe improve masked array.
- give up for now.

Inheriting ndarray
- introduces many bugs.
- should discourage this, but make it easier to work with it.

Dynd
- The issues discussed so far were motivation for starting dynd
- for example, a pluggable type system
- adding a categorical type in numpy (at Continuum) broke lots. Easier in dynd.
- Commitment for dynd is to give it a numpy-like API
- Both need to evolve together.
- Find ways to make things more uniform (in numpy)
- Dynd is more in an experimental phase, changing quickly.
- Can we import dynd as np?
- Not a goal. More exploratory in this phase.
- Adding a layer like that at a later time would be good. Not there, yet.
- Do not want to repeat py2->py3 debacle.
- Buffer protocol:
- Supported, but dynd extends it.
- As a pure C++ library, goal is to freeze once stable so systems beyond Python can depend on it as a stable interface for working with array data.

Boost::Python
- Nothing official from numpy for using numpy arrays in C++
- Not prioritized.
- Numpy has gotten better about namespace pollution?
- It kind of works already. Talk to Mike Droettboom

--
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail
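As a small illustration of the "access time column instead of 3rd column" point in the notes, here is a minimal sketch using an existing NumPy structured dtype. The field names and values are hypothetical, and this only shows what structured arrays already allow; it is not the labeled-array or DataArray design discussed at the BoF.

```python
import numpy as np

# Hypothetical record layout: two coordinates plus a time column.
dtype = np.dtype([("x", "f8"), ("y", "f8"), ("time", "f8")])
data = np.array([(0.0, 1.0, 10.5), (2.0, 3.0, 11.0)], dtype=dtype)

# Access by label: no need to remember that "time" is the third field.
times = data["time"]
print(times)

# The positional route requires knowing the field order up front.
third_field = data.dtype.names[2]
print(third_field)
```

The labeled lookup keeps working if fields are reordered, which is the convenience the notes ask for at the array level.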
On Wed, Jul 16, 2014 at 8:08 PM, Fernando Perez <fperez.net@gmail.com> wrote:
- it would have been more productive if a focused numpy sprint had been also planned, so that there could be more structured follow-up on the ideas that came up.
The trick is finding people to do it -- there are scarily few people with the skills, time, and inclination to work on the core numpy code. Exactly one of them (thanks Chuck!) was there for the sprints this year.

If there were a way to put together a stand-alone numpy sprint at some point, that would be really great!

In particular, Chris Barker brought up a number of things regarding datetime and planned on following up during the sprints, but I'm not sure what ended up happening.
We did indeed follow up. No code was written, but Chuck, Mark W. and I came up with a rough proposal. A handful of other folks came by to chat about it, and seemed to think it would be useful.

In short: some minor changes to time zone handling, with a hook in place to potentially plug in fancier support in the future. Possibly a hook to plug in additional calendars.

We're working on a NEP as we speak (or, correctly speaking, I'm distracted from working on the NEP by reading the numpy list....)

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
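The "choose your epoch" and "laboratory time" points from the BoF notes can be approximated with plain datetime64 and timedelta64 arithmetic. A minimal sketch, assuming hypothetical timestamps and an epoch picked at the start of an experiment; it is not the proposal described above, only an illustration of the idea:

```python
import numpy as np

# Hypothetical timestamps with no time zone attached.
stamps = np.array(
    ["2014-07-09T13:30:00", "2014-07-09T13:30:45", "2014-07-09T13:32:10"],
    dtype="datetime64[s]",
)

# An experiment-specific epoch instead of the built-in 1970-01-01.
epoch = np.datetime64("2014-07-09T13:30:00", "s")

# "Stopwatch" view: elapsed time since the chosen epoch (timedelta64[s]).
elapsed = stamps - epoch

# Plain integer seconds, convenient for storage or plotting.
elapsed_seconds = elapsed.astype("int64")
print(elapsed_seconds)
```

Because only differences are used, the result does not depend on how the timestamps would be mapped to a time zone.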