Hi all,

sorry for not posting earlier, post-conference InboxInfinity blues and all that...

The BoF did go as planned, and it was a good discussion, mostly following the tentative agenda outlined here:

https://github.com/numpy/numpy/wiki/Numpy-BoF-at-Scipy-2014

Various folks were kind enough to take notes during the conversation on an Etherpad instance:

https://scipy2014.etherpad.mozilla.org/35

For the sake of completeness and future reference, I'm including a copy of the notes below.

Other than what's in the notes, my take-home from the discussion is mostly that:

- we probably needed a longer slot than 45 minutes to have a chance to dig in a little deeper.

- it would have been more productive if a focused numpy sprint had also been planned, so that there could be more structured follow-up on the ideas that came up.

It would be great to hear from others who were present at the conference. In particular, Chris Barker brought up a number of things regarding datetime and planned on following up during the sprints, but I'm not sure what ended up happening.

Thanks to everyone who participated!

Cheers

f

#### Copy of Etherpad notes as of 7/16/2014:

Notes from BoF: 1:30, July 19, 2014

Working with topics on this page:
https://github.com/numpy/numpy/wiki/Numpy-BoF-at-Scipy-2014

chuck: where do we go from here? -- what is the role of numpy now?

Generalized ufuncs -- still some more to do -- (LA stuff - norms)
- some ufuncs don't implement array interface -- which are those -- sprint topic?
  - zeros_like, ones_like, more...
  - (duplicate) github issue: https://github.com/numpy/numpy/issues/4862
  - Here's the original issue: https://github.com/numpy/numpy/issues/3602

Implementation of @ (matrix multiplication)
- will be in Python 3.5 (~18 months)
- no work started yet -- have to make sure we do it.
- @@ was not added.
- The PEP for numpy is well-defined. Not much thinking to be done. (Good for a sprint)

Datetime:
- Can it be done?
  -- too many calendars -- too many time scales, etc.
- Can we cover most applications?
- DyND -- higher abstraction -- convert to back end implementation
- Also look at what R and Julia do?
- Maybe fix up the little issues in datetime64, first?
- Pandas does not use numpy machinery
  - uses an array of objects: those objects are subclassed from datetime.datetime
  - does use int64, but gets unboxed on storage.
- Root cause is using UTC, rather than a naive time.
  - Naive is not associated with a time zone. Can be interpreted in any way.
  - Ripping out the locale timezone on I/O would help.
  - More often than not, using the locale timezone is not desired.
  - For example, much experimental data does not carry a time zone (or carries the wrong one).
- Consider laboratory time (stopwatch rather than a clock). (timedelta)
- The C++ committee is standardizing this.
- A key missing feature is being able to choose your epoch.

New DTypes
- Example: quad float types. A solution for missing values? Adding units support.
- Record & structured arrays play around with dtypes. Needs to be easier to use these.
  - Improve documentation.
- How to extend to support things like labeled arrays?
  - This is orthogonal to dtypes.
  - Would rather access time column instead of 3rd column.
  - Would provide a better foundation for pandas.
  - Key is to keep inputs simple.
- Finish the DataArray push?
  - We are very close. It has been sitting there for a while.
  - If interested, talk at sprints on July 10.

Missing values?
- maybe improve masked array.
- give up for now.

Inheriting ndarray
- introduces many bugs.
- should discourage this, but make it easier to work with it.

Dynd
- The issues discussed so far were motivation for starting dynd
  - for example, a pluggable type system
  - adding a categorical type in numpy (at Continuum) broke lots. Easier in dynd.
- Commitment for dynd is to give it a numpy-like API
  - Both need to evolve together.
  - Find ways to make things more uniform (in numpy)
- Dynd is more in an experimental phase, changing quickly.
- Can we import dynd as np?
  - Not a goal. More exploratory in this phase.
  - Adding a layer like that at a later time would be good. Not there, yet.
  - Do not want to repeat py2->py3 debacle.
- Buffer protocol:
  - Supported, but dynd extends it.
- As a pure C++ library, goal is to freeze once stable so systems beyond Python can depend on it as a stable interface for working with array data.

Boost::Python
- Nothing official from numpy for using numpy arrays in C++
  - Not prioritized.
- Numpy has gotten better about namespace pollution?
- It kind of works already. Talk to Mike Droettboom

--
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail