Mailman 3 1.2 tasks - NumPy-Discussion

1.2 tasks

Jarrod Millman

Aug. 4, 2008

5:45 p.m.

Here are the remaining tasks that I am aware of that need to be done before tagging 1.2.0b1 on the 8th.

Median
======
The call signature for median needs to change from

    def median(a, axis=0, out=None, overwrite_input=False):

to

    def median(a, axis=None, out=None, overwrite_input=False):

in both numpy/ma/extras.py and numpy/lib/function_base.py

Histogram
=========
The call signature for histogram needs to change from

    def histogram(a, bins=10, range=None, normed=False, weights=None, new=False):

to

    def histogram(a, bins=10, range=None, normed=False, weights=None, new=True):

in numpy/lib/function_base.py

Documentation
=============
The documentation project needs to merge in its changes. Stefan will take care of this on the 5th.

Please let me know ASAP if there is anything I am missing.

Thanks,

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
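To illustrate what the change of the `axis` default means for callers, here is a small sketch using present-day `numpy.median`, whose default is `axis=None`. The array values are made up for the example.

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# axis=None (the new default): median of the flattened array.
flat_median = np.median(a)

# axis=0 (the old default): one median per column.
column_medians = np.median(a, axis=0)

print(flat_median)
print(column_medians)
```

Code that relied on the old default can keep its behaviour by passing `axis=0` explicitly.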


Darren Dale

August 2008

6:16 p.m.

Hi Jarrod,

I was wondering if someone knowledgeable would please look into the behavior of concatenate() with subclasses of ndarray. I posted at scipy-dev some work on a subclass that handles units, and when two of these are concatenated, a regular ndarray is returned rather than another instance of the subclass. This behavior was discussed a while back at http://thread.gmane.org/gmane.comp.python.numeric.general/14494 . It resurfaced on enthought-dev in Sept-2007, at which point Travis responded (I hope he doesn't mind being quoted):

...

there should be no array creation functions which do not call __array_finalize__ if it is defined, and therefore, the described behavior is a bug that should be reported. The concatenate code should also respect the sub-type of the arrays being concatenated (it seems to as I look at the code...)

However, the larger question of priority occurs in mixed concatenation cases which are probably being encountered. In such cases, the sub-type with the largest priority will be the resulting type (and if it's not your class then your __array_finalize__ will not be called). I wonder if this was what was being encountered rather than a "limitation of NumPy"

I encountered the problem again over the weekend with the package I am working on, which can be checked out from:

    hg clone \
        http://dale.chess.cornell.edu/~darren/cgi-bin/hgwebdir.cgi/quantities \
        quantities

It installs in the usual way. The concat issue can then be reproduced by doing:

    [~] |1> from quantities import NDQuantity
    udunits(3): Already initialized from file "/usr/lib64/python2.5/site-packages/quantities/quantities-data/udunits.dat"
    [~] |2> J=NDQuantity([1.,2.,3.],'J')
    [~] |3> J
        <3> NDQuantity([ 1., 2., 3.]), kg * m^2 / s^2
    [~] |4> import numpy
    [~] |5> numpy.concatenate([J,J])
        <5> array([ 1., 2., 3., 1., 2., 3.])

Maybe this is an issue with my code, and not numpy, but since others have reported the problem before me, perhaps someone could please have a look before releasing 1.2? I don't think array priority is the problem, since I see the issue concatenating two of the same array subtypes.

Regards,
Darren

Pierre GM

8:29 p.m.


Darren,

I ran into similar problems when I started working on numpy.ma and scikits.timeseries, and I remember having conversations with Bryce Hendrix of Enthought about that on this very list.

I think you have a problem in `NDQuantities.__array_finalize__`: you don't initialize `._units` if `obj` doesn't have a `_units`. At least, you should use something like

    self._units = getattr(obj, '_units', None)

which will set `._units` to None if `obj` is not a NDQuantities, and will also ensure that your NDQuantities arrays always have a `._units` attribute. In other words, if I take a view of a ndarray as a NDQuantities, I should have a _units defined by default (None being the most appropriate in that case).

About concatenation:
In a lot of cases, concatenate requires some tweaking. In scikits.timeseries, we have to decide what to do when the series being concatenated overlap. In your case, you have to decide what to do when concatenating 2 arrays A & B, for example:
* If A & B have the same unit, keep it
* If A (B) doesn't have a unit, keep the unit of B (A)
* If the units of A and B are compatible, convert one to the other
* If the units are incompatible, raise an exception.
What I suggest is to implement your own concatenate: check the units, concatenate the arrays as regular ndarrays, take a view of the result, modify the unit of the result accordingly.

About subclassing:
* You may want to check the cookbook: http://www.scipy.org/Subclasses
* You may also want to check other ndarray subclasses, such as MaskedArray, TimeSeries (</pushing product>) for some implementation details.
* You may also want to add your own experience on http://www.scipy.org/Subclasses, so that we don't lose it.

Drop me a line offlist if you'd like, I'd be happy to help (depending on the time at hand).

Cheers.
P.
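The two suggestions in this message (a defaulted `_units` in `__array_finalize__`, and a custom concatenate) could look roughly like the sketch below. `UnitArray` and `concatenate_units` are hypothetical names standing in for the real quantities code, units are plain strings, and unit conversion is left out, so differing units simply raise.

```python
import numpy as np


class UnitArray(np.ndarray):
    """Hypothetical stand-in for NDQuantity: an ndarray carrying a unit string."""

    def __new__(cls, data, units=None):
        obj = np.asarray(data, dtype=float).view(cls)
        obj._units = units
        return obj

    def __array_finalize__(self, obj):
        # Called for views, slices and new-from-template arrays. Falling
        # back to None guarantees every instance has a _units attribute.
        self._units = getattr(obj, "_units", None)


def concatenate_units(arrays):
    """Concatenate arrays using the unit rules described in the message above.

    Conversion between compatible units is not implemented in this
    sketch, so any two different units raise ValueError.
    """
    units = {getattr(a, "_units", None) for a in arrays}
    units.discard(None)  # arrays without a unit adopt the others' unit
    if len(units) > 1:
        raise ValueError("incompatible units: %s" % sorted(units))
    unit = units.pop() if units else None

    # Concatenate as plain ndarrays, then view the result as the subclass.
    plain = np.concatenate([np.asarray(a) for a in arrays])
    result = plain.view(UnitArray)
    result._units = unit
    return result


J = UnitArray([1.0, 2.0, 3.0], "J")
joined = concatenate_units([J, J])
print(type(joined).__name__, joined._units)
```

Doing the concatenation on plain ndarrays keeps the unit bookkeeping in one place instead of relying on how NumPy picks the output type.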

Darren Dale

9:53 p.m.

Hi Pierre,

Thank you for the comments. You made a number of good points and I will look into each. I agree that it looks like I need to write my own concatenate function, but judging from Travis' comment a while back, there may still be an issue with numpy's concatenate. In my example, it is not just that concat is stripping the units, it is returning a different type. Maybe that is not such a bad thing in this case, but I thought it should be brought to the numpy maintainers' attention.

Darren

Darren Dale

10:43 p.m.

On Monday 04 August 2008 6:27:12 pm you wrote:

...

On Monday 04 August 2008 17:53:54 you wrote:

In my example, it is not just that concat is stripping the units, it is returning a different type.

By that, you mean that the result is not a NDQuantity ? Ah.

As you mentioned Travis' email, you must have noticed he was talking about priorities: set an __array_priority__ attribute as a class variable of NDQuantity with a value higher than 1 and hop ! You have a ndquantity at the end. Except that in your case, you can't print it, because it doesn't have a ._units. Not a surprise, as your __array_finalize__ doesn't define a ._units unless the obj parameter is already a ndquantity: here, that's not the case, there's a conversion to ndarray under the hood during concatenate, then a call to __array_finalize__ with a ndarray as obj.

So, in fact, all is well and works as it should on the numpy part.

I stand corrected. I must have misunderstood the documentation of __array_priority__ in the numpy manual. Thank you for helping clear this up! Darren
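A minimal sketch of the `__array_priority__` point, assuming a hypothetical subclass named `Tagged` rather than the real NDQuantity. What `__array_finalize__` receives as `obj` during concatenate may differ between NumPy versions, which is one more reason to default `_units` with `getattr`.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass, used only for this illustration."""

    # A priority above ndarray's default of 0.0 lets this subclass win
    # when NumPy chooses the output type of concatenate.
    __array_priority__ = 10.0

    def __array_finalize__(self, obj):
        # obj is not guaranteed to be a Tagged instance here, so always
        # fall back to a default value.
        self._units = getattr(obj, "_units", None)


J = np.array([1.0, 2.0, 3.0]).view(Tagged)
J._units = "J"

out = np.concatenate([J, J])
print(type(out).__name__, out._units)
```

The result is an instance of the subclass, but its unit is whatever `__array_finalize__` could recover, so the units still have to be set by the caller or by a custom concatenate.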

David Huard

6:41 p.m.


Question: Should histogram raise a warning by default (new=True) to warn users that the behaviour has changed? Or warn only if new=False to remind that the old behaviour will be deprecated in 1.3? I think that users will prefer being annoyed at warnings than surprised by an unexpected change, but repeated warnings can become a nuisance.

To minimize annoyance, we could also offer three possibilities:

new=None (default) : Equivalent to True, print warning about change.
new=True : Don't print warning.
new=False : Print warning about future deprecation.

So those who have already set new=True don't get warnings, and all others are warned. Feedback?

David H.
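The three-way `new` keyword proposed here can be sketched with the standard `warnings` module. The function name `histogram_new_flag` and the message texts are invented for illustration and are not the code that went into NumPy; only the warning logic is shown, no histogram is computed.

```python
import warnings


def histogram_new_flag(new=None):
    """Hypothetical helper: decide which semantics apply for a given `new`."""
    if new is None:
        # Default: use the new behaviour, but tell the user it changed.
        warnings.warn("histogram semantics changed; pass new=True to "
                      "silence this message", UserWarning, stacklevel=2)
        return True
    if not new:
        # Explicit request for the old behaviour: remind about deprecation.
        warnings.warn("the old histogram semantics will be deprecated",
                      DeprecationWarning, stacklevel=2)
        return False
    # new=True: the caller already knows about the change, stay silent.
    return True


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    histogram_new_flag()        # warns about the change
    histogram_new_flag(True)    # silent
    histogram_new_flag(False)   # warns about the deprecation

print([w.category.__name__ for w in caught])
```

Using `None` as a sentinel is what makes it possible to tell "the caller did not pass anything" apart from an explicit `new=True`.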

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Vincent Schut

8:04 a.m.


As a regular user of histogram I say: please warn! Your proposal above seems OK to me. I do have histogram in a lot of kind of old (and sometimes long-running) code of mine, and I certainly would prefer to be warned. Vincent.

David Huard

3:48 p.m.


Thanks for the feedback. Here is what will be printed:

If new=False

The original semantics of histogram is scheduled to be deprecated in NumPy 1.3. The new semantics fixes long-standing issues with outliers handling. The main changes concern
1. the definition of the bin edges, now including the rightmost edge, and
2. the handling of upper outliers, now ignored rather than tallied in the rightmost bin.

Please read the docstring for more information.

If new=None (default)

The semantics of histogram has been modified in the current release to fix long-standing issues with outliers handling. The main changes concern
1. the definition of the bin edges, now including the rightmost edge, and
2. the handling of upper outliers, now ignored rather than tallied in the rightmost bin.
The previous behaviour is still accessible using `new=False`, but is scheduled to be deprecated in the next release (1.3).

*This warning will not be printed in the 1.3 release.*

Please read the docstring for more information.

I modified the docstring to put the emphasis on the new semantics, adapted the tests and updated the ticket.

David
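The two changes named in these warnings can be seen with present-day `numpy.histogram`, which follows the new semantics. The data values below are made up for the example.

```python
import numpy as np

data = [1, 2, 3, 4, 10]  # 10 lies above the requested range

counts, edges = np.histogram(data, bins=3, range=(1, 4))

print(edges)   # four edges for three bins, rightmost edge included
print(counts)  # 4 sits on the rightmost edge and is counted; 10 is dropped
```

The upper outlier is simply ignored, so the counts sum to the number of values inside the range rather than to `len(data)`.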


Jarrod Millman

5:18 p.m.


Thanks for taking care of this. I thought that we were going to remove the new parameter in the 1.3 release. Is that still the plan? If so, shouldn't the warning state "will be removed in the next minor release (1.3)" rather than "is scheduled to be deprecated in the next release (1.3)"? In my mind the old behavior is deprecated in this release (1.2).

The 1.2 release will be longer lived (~6 months) than the 1.1 release and I anticipate several bugfix releases (1.2.1, 1.2.2, 1.2.3, etc). So I think it is reasonable to just remove the old behavior in the 1.3 release.

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

Stéfan van der Walt

5:24 p.m.


Could you put in a check for new=True, and suppress those messages? A user that knows about the changes wouldn't want to see anything. Regards Stéfan

Jarrod Millman

5:36 p.m.


Yes, that is already available. Maybe the warning message for 'new=None' should mention this, though.

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

David Huard

6:01 p.m.


Done


David Huard

5:58 p.m.


The roadmap that I propose is the following:

1.1 we warn about upcoming change, (new=False)
1.2 we make that change, (new=None) + warnings
1.3 we deprecate the old behaviour (new=True), no warnings.
1.4 remove the old behavior and remove the new keyword.

It's pretty much the roadmap exposed in the related ticket that I wrote following discussions on the ML.

This leaves plenty of time for people to make their changes, and my guess is that a lot of people will appreciate this, given that you were asked to delay the changes to histogram.


Jarrod Millman

6:51 p.m.


Sounds good. Thanks for the clarification.

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

Pierre GM

8:11 p.m.

On Monday 04 August 2008 13:45:51 Jarrod Millman wrote:

...

Here are the remaining tasks that I am aware of that need to be done before tagging 1.2.0b1 on the 8th.

...

The call signature for median needs to change from def median(a, axis=0, out=None, overwrite_input=False): to def median(a, axis=None, out=None, overwrite_input=False): in both numpy/ma/extras.py and numpy/lib/function_base.py

Done for numpy.ma.extras (r5607)
