[Numpy-discussion] Numpy 1.6 schedule (was: Numpy 2.0 schedule)

Travis Oliphant oliphant at enthought.com
Sat Mar 5 23:13:12 EST 2011

On Mar 5, 2011, at 5:10 PM, Mark Wiebe wrote:

> On Thu, Mar 3, 2011 at 10:54 PM, Ralf Gommers <ralf.gommers at googlemail.com> wrote:
> <snip>
> >>> I've had a look at the bug tracker, here's a list of tickets for 1.6:
> >>> #1748 (blocker: regression for astype('str'))
> >>> #1619 (issue with dtypes, with patch)
> >>> #1749 (distutils, py 3.2)
> >>> #1601 (distutils, py 3.2)
> >>> #1622 (Solaris segfault, with patch)
> >>> #1713 (Solaris segfault)
> >>> #1631 (Solaris segfault)
> The distutils tickets are resolved.
> >>> Proposed schedule:
> >>> March 15: beta 1
> >>> March 28: rc 1
> >>> April 17: rc 2 (if needed)
> >>> April 24: final release
> Any comments on the schedule or tickets?
> That all looks fine to me. There are a few things that I've changed in the core that could stand some discussion before being finalized in 1.6, mostly due to what was required to make things work without depending on the data type enumeration order. The combination of the numpy and scipy tests was pretty effective, but as Travis mentioned my changes are fairly invasive.
> * When copying array to array, structured types now copy based on field names instead of positions, effectively behaving like a 'dict' instead of a 'labeled tuple'. This behaviour is more intuitive to me, and several fixed bugs such as dtype comparison completely ignoring the structured type data suggest that this changes an area of numpy that has been used in a more limited fashion. It might be worthwhile to introduce a tuple-style flag in a future version which causes data to be copied by position instead of by name, as it is likely useful in some contexts.

This is a semantic change that does make me a tiny bit nervous. Structured arrays are actually used quite a bit in the wild, so this could raise some errors. What I don't know is how often sub-parts of a structured array get copied into other structured arrays whose fields are in a different order. From what I gather, Mark's changes would allow this case and do an arguably useful thing. Previously, a copy was only allowed if the structured array contained the same fields in the same order. This seems like a relaxation of a rule, so it should not raise any errors (unless extant code was relying on the previous errors for some reason).
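
For illustration, here is a minimal sketch of what "copy by field name" means when the source and destination list their fields in a different order. The copy is done explicitly, one field at a time, so the result does not depend on how a particular NumPy version handles direct structured-array assignment. The field names 'a' and 'b' and the sample values are made up for the example.

```python
import numpy as np

# Hypothetical source and destination with the same fields in a different order.
src = np.array([(1, 2.0), (3, 4.0)], dtype=[('a', 'i4'), ('b', 'f8')])
dst = np.zeros(2, dtype=[('b', 'f8'), ('a', 'i4')])

# Copy by name: each field in dst receives the field of the same name in src,
# regardless of where that field sits in either dtype.
for name in src.dtype.names:
    dst[name] = src[name]
```

A positional ("labeled tuple") copy would instead put the values of src's first field into dst's first field, which here would mix up 'a' and 'b'.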

> * Array memory layouts are preserved in many cases. This means that if a, b are Fortran ordered, a+b will be as well. It could be made more pervasive, for example ndarray.copy defaults to C-order, and that could be changed to 'K' to preserve the memory layout by default. Any comments about that?

I like this change quite a bit, but it has similar potential "expectation" issues. I think the default should be changed to 'K' in NumPy 2.0, but perhaps we should preserve C-order for now to avoid the subtle breakages that might occur based on changed expectations. What are others' thoughts?
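
To make the layout question concrete, here is a small sketch of the behaviour under discussion, assuming a NumPy version in which ufuncs preserve layout and ndarray.copy accepts an order argument. The array shapes and values are arbitrary.

```python
import numpy as np

# Two Fortran-ordered (column-major) 2-D arrays.
a = np.asfortranarray(np.arange(12.0).reshape(3, 4))
b = np.asfortranarray(np.ones((3, 4)))

# With layout preservation, the sum of two Fortran-ordered arrays
# is Fortran-ordered as well.
c = a + b

# ndarray.copy defaults to C order, so the layout is not preserved ...
default_copy = a.copy()

# ... unless 'K' (keep the existing layout) is requested explicitly.
k_copy = a.copy(order='K')
```

Code that passes default_copy to a routine expecting C-contiguous data keeps working today; the same code would see a Fortran-ordered array if the default became 'K', which is the kind of subtle breakage mentioned above.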

> * The ufunc uses a more consistent algorithm for loop selection. The previous algorithm was ad hoc and lacked symmetry, while the new algorithm is based on a simple minimization definition. This change exposed a bug in scipy's ndimage, which did not handle all of the numpy data type enums properly, so it's possible there is more code out there which will be affected similarly.

This change has me the most nervous. I'm looking forward to the more consistent algorithm. As I said, the algorithm presently used has been there since Numeric in 1995 (I modified it only a little bit to handle scalar-array casting rules a bit differently). This kind of change will have different corner cases, and these should be understood before a release.

I'm also wondering what happened to the optional arguments to ufuncs (are they still there)? One of these allowed you to choose the loop yourself and bypass the selection algorithm.
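
As a rough sketch of what choosing the loop by hand looks like, the example below uses the dtype and signature keywords as they are spelled in later NumPy releases. That spelling is an assumption here; the keyword available at the time of this thread may have been named differently. The input values are arbitrary.

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.int32)
y = np.array([10, 20, 30], dtype=np.int32)

# Each entry of .types names one inner loop by type codes,
# e.g. 'dd->d' is (float64, float64) -> float64.
available_loops = np.add.types

# Default: the ufunc selects a loop from the input dtypes.
auto = np.add(x, y)

# Explicit: ask for the float64 loop instead of the automatically chosen one.
forced = np.add(x, y, dtype=np.float64)
forced_sig = np.add(x, y, signature='dd->d')
```

Comparing auto.dtype with forced.dtype on a given build is one quick way to see which loop the selection algorithm picked for a pair of inputs.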

> In general, I've used the implementation strategy of substituting my code into the core critical paths of numpy to maximize the amount of exercise it gets. While this creates more short-term hiccups as we are seeing now, it also means the new functionality conforms to the current system better and is much more stable since it is getting well tested.

Thanks again for all the good core-algorithm work, Mark. You have been doing a great job.


> -Mark

Travis Oliphant
Enthought, Inc.
oliphant at enthought.com
