Behavior of .base
On Sun, Sep 30, 2012 at 9:59 PM, Travis Oliphant <travis@continuum.io> wrote:
Hey all,
In a GitHub discussion with Gael and Nathaniel, we came up with a proposal for .base that we should put before this list. Traditionally, .base has always pointed to None for arrays that owned their own memory and to the "most immediate" array object parent for arrays that did not own their own memory. There was a long-standing issue related to running out of stack space that this behavior created.
Recently this behavior was altered so that .base always points to "the original" object holding the memory (something exposing the buffer interface). This created some problems for users who relied on the fact that most of the time .base pointed to an instance of an array object.
The proposal here is to change the behavior of .base for arrays that don't own their own memory so that the .base attribute of an array points to "the most original object" that is still an instance of the type of the array. This would go into the 1.7.0 release so as to correct the issues reported.
What are reactions to this proposal?
-Travis
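To make the three behaviours in this message concrete, here is a minimal pure-Python sketch. It is a toy model, not NumPy code: `RawBuffer`, `ToyArray`, the three policy functions and `chain_of_views` are hypothetical names, and the same-type check uses `isinstance`, which is only one possible reading of "an instance of the type of the array".

```python
class RawBuffer:
    """Hypothetical stand-in for a non-array memory owner (e.g. an mmap)."""


class ToyArray:
    """Toy array: base is None when the array owns its memory."""

    def __init__(self, base=None):
        self.base = base


def immediate_parent(parent, view_type):
    # Traditional behaviour: point at the array the view was made from.
    return parent


def original_owner(parent, view_type):
    # Altered behaviour: walk up to the object that finally holds the memory.
    while getattr(parent, "base", None) is not None:
        parent = parent.base
    return parent


def most_original_same_type(parent, view_type):
    # Proposed behaviour: walk up only while the next base is still an
    # instance of the view's own type (assumption: isinstance semantics).
    while isinstance(getattr(parent, "base", None), view_type):
        parent = parent.base
    return parent


def chain_of_views(policy, depth=3):
    """Build buffer -> first array -> `depth` nested views under a policy."""
    buf = RawBuffer()
    first = ToyArray(base=buf)
    arr = first
    for _ in range(depth):
        arr = ToyArray(base=policy(arr, ToyArray))
    return buf, first, arr


for policy in (immediate_parent, original_owner, most_original_same_type):
    buf, first, last = chain_of_views(policy)
    print(policy.__name__, type(last.base).__name__,
          last.base is first, last.base is buf)
```

In this toy model the first policy leaves the last view pointing at the previous view, so the chain grows with every view; the second makes it point at the raw buffer; the third makes it point at the first array built on that buffer.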
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Sep 30, 2012, at 3:30 PM, Han Genuit <hangenuit@gmail.com> wrote:
I think the current behaviour of the .base attribute is much more stable and predictable than past behaviour. For views, for instance, this makes sure you don't hold references to 'intermediate' views, but always point to the original *base* object. Also, I think a lot of internal logic depends on this behaviour, so I am not in favour of changing this back (yet) again.
Also, considering that this behaviour already exists in past versions of NumPy, namely 1.6, and is very fundamental to how arrays work, I find it strange that it is now up for change in 1.7 at the last minute.
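A small sketch of the point about intermediate views. It assumes a NumPy release in which view chains are collapsed (the behaviour called "current" above) and CPython reference counting; `intermediate_view_is_freed` is a hypothetical helper name, not part of NumPy.

```python
import gc
import weakref

import numpy as np


def intermediate_view_is_freed():
    a = np.arange(10)      # owns its memory, so a.base is None
    b = a[2:]              # view of a
    c = b[1:]              # view of a view
    b_ref = weakref.ref(b)
    points_to_owner = c.base is a
    del b                  # drop the only explicit reference to b
    gc.collect()
    # If c.base still referred to b, the weak reference would stay alive.
    return points_to_owner, b_ref() is None


print(intermediate_view_is_freed())
```

On a release with the older "most immediate parent" behaviour, the second value would be expected to be False, because `c` would keep `b` alive through its `.base`.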
On Sun, Sep 30, 2012 at 10:35 PM, Travis Oliphant <travis@continuum.io> wrote:
We are not talking about changing it "back". The change in 1.6 caused problems that need to be addressed.
Can you clarify your concerns? The proposal is not a major change to the behavior on master, but it does fix a real issue.
-- Travis Oliphant (on a mobile) 512-826-7480
On Sep 30, 2012, at 3:50 PM, Han Genuit <hangenuit@gmail.com> wrote:
Well, the current behaviour makes sure you can have an endless chain of views derived from each other without keeping a copy of each view alive. If I understand correctly, you propose to change this behaviour to one where it would keep a copy of each view alive.
My concern is that the problems that occurred from the 1.6 change are now being given priority over a correct implementation. There are problems with backward compatibility, but most of these are due to lack of documentation and testing. And now there will be a lot of people depending on the new behaviour, which is also something to take into account.
On Sun, Sep 30, 2012 at 10:55 PM, Travis Oliphant <travis@continuum.io> wrote:
I think you are misunderstanding the proposal. The proposal is to traverse the views as far as you can but stop just short of having base point to an object of a different type.
This fixes the infinite chain of views problem but also fixes the problem sklearn was having with base pointing to an unexpected mmap object.
-- Travis Oliphant (on a mobile) 512-826-7480
On Sep 30, 2012, at 4:00 PM, Han Genuit <hangenuit@gmail.com> wrote:
Ah, sorry, I get it. You mean to make sure that base is an object of type ndarray. No problems there. :-)
Yes. Exactly. I realize I didn't explain it very well. For a subtype it would ensure base is a subtype.
Thanks for the feedback.
Travis
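The subtype case can be probed with a short sketch like the following. `Tagged` is a hypothetical subclass introduced only for illustration, and what the loop prints depends on the NumPy version in use; the sketch assumes a release that implements the rule described in this thread.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass used only for this illustration."""


a = np.arange(10)        # plain ndarray that owns its memory
t = a.view(Tagged)       # Tagged view created directly from a
u = t[1:]                # Tagged view of a Tagged view
v = u[1:]                # one level deeper

# For each view, show the type of its base and whether it is a or t.
for name, arr in (("t", t), ("u", u), ("v", v)):
    print(name, type(arr.base).__name__, arr.base is a, arr.base is t)
```

If the rule is in effect, the bases of `u` and `v` should be `Tagged` instances rather than the plain ndarray `a`, while still not forming a chain that grows with every slice.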
On Sun, Sep 30, 2012 at 10:30:52PM +0200, Han Genuit wrote:
> Also, considering that this behaviour already exists in past versions of NumPy, namely 1.6,
I just checked: in numpy 1.6.1, the behaviour is to create an endless chain of base.base.base...
In some sense, what Travis is proposing is going one step in the direction of the old behavior, without its major drawbacks. I am actually very favorable to his suggestion.
My 2 cents,
Gaël
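One way to check which behaviour a given installation has is to count the `.base` hops behind a deeply nested view. This is only a sketch; `base_chain_length` is a hypothetical helper, not a NumPy function.

```python
import numpy as np


def base_chain_length(arr):
    """Count how many .base hops separate arr from an object with no base."""
    hops = 0
    obj = arr
    while getattr(obj, "base", None) is not None:
        obj = obj.base
        hops += 1
    return hops


a = np.arange(1000)
view = a
for _ in range(500):
    view = view[1:]      # each step is a view of the previous view

print(base_chain_length(view))
```

With the 1.6.1 behaviour described above one would expect roughly one hop per view; on a release that collapses the chain the count should stay small no matter how many views are stacked.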
On Sun, Sep 30, 2012 at 8:30 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
It sounds like this would solve the problem in the short term, but it is a bit of a hack in that the behaviour is more complicated than either the original or the current version. So I could see this in 1.7, but it might be preferable in the long term to work out what attributes are needed to solve Gael's problem more directly.
Chuck
That said, I think the proposal needs to be laid out more precisely, with more detail, before it can be understood: perhaps an explanation of the problem followed by an explanation of how the change solves it. A diagram would be helpful and could go into the documentation.
Chuck
It sounds like there are no objections and this has a strong chance to fix the problems. We will put it on the TODO list for the 1.7.0 release.
-Travis
On Mon, Oct 1, 2012 at 6:20 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Sun, Sep 30, 2012 at 8:59 PM, Travis Oliphant <travis@continuum.io> wrote:
> Hey all,
> In a github-discussion with Gael and Nathaniel, we came up with a proposal for .base that we should put before this list. Traditionally, .base has always pointed to None for arrays that owned their own memory and to the "most immediate" array object parent for arrays that did not own their own memory. There was a long-standing issue related to running out of stack space that this behavior created.
To be *completely* accurate, I'd say that they've always pointed to some object that owned the underlying memory. Usually that's an ndarray, but sometimes that's a thing exposing the buffer interface, sometimes it's a thing exposing __array_interface__, sometimes it's a mmap object, sometimes it's some random ad hoc C-level wrapper object[1], etc.
[1] e.g. https://github.com/njsmith/scikits-sparse/blob/master/scikits/sparse/cholmod...
> Recently this behavior was altered so that .base always points to "the original" object holding the memory (something exposing the buffer interface). This created some problems for users who relied on the fact that most of the time .base pointed to an instance of an array object.
> The proposal here is to change the behavior of .base for arrays that don't own their own memory so that the .base attribute of an array points to "the most original object" that is still an instance of the type of the array. This would go into the 1.7.0 release so as to correct the issues reported.
> What are reactions to this proposal?
As a band-aid to avoid breaking some code in 1.7, it seems reasonable to me. I was actually considering proposing basically the same idea. But it's only a band-aid; the larger problem is that we don't *know* what semantics people are relying on for "base" (and probably aren't implementing the ones people think we are, either before or after this change).
As an example of how messy this is: do you know whether Gael's code will still work, after we make this fix, if someone uses as_strided() on a (view of a) memmap array?
Answer: as_strided() creates an ndarray view on an ad-hoc object with __array_interface__ attribute, and this dummy object ends up as the returned ndarray's .base. According to the proposed rule, the .base chain collapsing will stop at this point. So it isn't true that an array that is ultimately backed by mmap will have a .memmap() array as its .base. However, if you read stride_tricks.py, it turns out the dummy object as_strided makes does happen to use the name ".base" for its attribute holding the original array, so Gael's code will work correctly in this case iff he keeps the .base walking code in place (which would otherwise serve no purpose after Travis' change).
Anyway, my point is: If we have to carefully analyze interactions between code in numpy.lib.stride_tricks, numpy.core.memmap, and a third-party library, just to figure out which sorts of reference-counting changes are correct in the core ndarray object, then we have a problem. This is horrible cross-coupling, the sort of thing that, if allowed to proliferate, makes it impossible to ever know whether code is correct or not.
So even if we put in a band-aid for 1.7, we really don't want to be guaranteeing this kind of stuff forever, and should aggressively encourage people to stop using .base in these ways. The mmap thing should really switch to something more reliable and less tightly coupled to the rest of the code all over numpy, like I described here:
http://mail.scipy.org/pipermail/numpy-discussion/2012-September/064003.html
How can we discourage people from doing this in the future? Can we make .base write-only from the Python level (with suitable deprecation period)? Rename it to ._base (likewise) so that it's still possible to peek under the covers but we remind people that it's really an implementation detail with poorly defined semantics that might change?
-n
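The ".base walking code" mentioned in this message can be sketched roughly as follows. `base_chain` is a hypothetical helper, and what sits between the strided view and the original array is an implementation detail of `as_strided` that may differ between NumPy versions, which is exactly the coupling being criticised.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def base_chain(obj):
    """Return [obj, obj.base, obj.base.base, ...] until there is no base."""
    chain = [obj]
    while getattr(chain[-1], "base", None) is not None:
        chain.append(chain[-1].base)
    return chain


a = np.arange(6)
step = a.strides[0]
# Overlapping windows [0, 1], [1, 2], [2, 3]; all offsets stay inside a.
s = as_strided(a, shape=(3, 2), strides=(step, step))

# Print the type of every link between the strided view and the owner.
for link in base_chain(s):
    print(type(link).__name__)
```

Any link that is not an ndarray shows up by type name in the output, so the same loop can be used to see whether an intermediate dummy object is present on a particular installation.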
On Mon, Oct 1, 2012 at 6:20 AM, Nathaniel Smith <njs@pobox.com> wrote:
Hey all,
In a github-discussion with Gael and Nathaniel, we came up with a
On Sun, Sep 30, 2012 at 8:59 PM, Travis Oliphant <travis@continuum.io> wrote: proposal for .base that we should put before this list. Traditionally, .base has always pointed to None for arrays that owned their own memory and to the "most immediate" array object parent for arrays that did not own their own memory. There was a long-standing issue related to running out of stack space that this behavior created.
To be *completely* accurate, I'd say that they've always pointed to some object that owned the underlying memory. Usually that's an ndarray, but sometimes that's a thing exposing the buffer interface, sometimes it's a thing exposing __array_interface__, sometimes it's a mmap object, sometimes it's some random ad hoc C-level wrapper object[1], etc.
[1] e.g. https://github.com/njsmith/scikits-sparse/blob/master/scikits/sparse/cholmod...
Recently this behavior was altered so that .base always points to "the original" object holding the memory (something exposing the buffer interface). This created some problems for users who relied on the fact that most of the time .base pointed to an instance of an array object.
The proposal here is to change the behavior of .base for arrays that don't own their own memory so that the .base attribute of an array points to "the most original object" that is still an instance of the type of the array. This would go into the 1.7.0 release so as to correct the issues reported.
What are reactions to this proposal?
As a band-aid to avoid breaking some code in 1.7, it seems reasonable to me. I was actually considering proposing basically the same idea. But it's only a band-aid; the larger problem is that we don't *know* what semantics people are relying on for "base" (and probably aren't implementing the ones people think we are, either before or after this change).
As an example of how messy this is: do you know whether Gael's code will still work, after we make this fix, if someone uses as_strided() on a (view of a) memmap array?
Answer: as_strided() creates an ndarray view on an ad-hoc object with __array_interface__ attribute, and this dummy object ends up as the returned ndarray's .base. According to the proposed rule, the .base chain collapsing will stop at this point. So it isn't true that an array that is ultimately backed by mmap will have a .memmap() array as its .base. However, if you read stride_tricks.py, it turns out the dummy object as_strided makes does happen to use the name ".base" for its attribute holding the original array, so Gael's code will work correctly in this case iff he keeps the .base walking code in place (which would otherwise serve no purpose after Travis' change).
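A minimal sketch of the kind of ".base walking" discussed above, assuming a recent NumPy. The helper name walk_base is hypothetical, and which intermediate objects appear in the chain is an implementation detail that may differ between versions; the sketch only relies on the chain eventually reaching the array that owns the memory.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def walk_base(obj):
    """Follow .base links until an object has no base; return every object visited.

    Hypothetical helper: it works on anything with a .base attribute, which is
    why it can step through non-array intermediates as well as ndarrays.
    """
    chain = [obj]
    while getattr(chain[-1], "base", None) is not None:
        chain.append(chain[-1].base)
    return chain


a = np.arange(6)

# Build a strided view on top of an ordinary slice of `a`.
s = as_strided(a[1:], shape=(3,), strides=a.strides)

chain = walk_base(s)
root = chain[-1]

# Inspect what sits between the strided view and the owning array.
print([type(o).__name__ for o in chain])
print("root is a:", root is a)
```

The point of the sketch is that the result depends on every object in the chain happening to expose an attribute named .base, which is exactly the coupling the message above objects to.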
Anyway, my point is: If we have to carefully analyze interactions between code in numpy.lib.stride_tricks, numpy.core.memmap, and a third-party library, just to figure out which sorts of reference-counting changes are correct in the core ndarray object, then we have a problem. This is horrible cross-coupling, the sort of thing that, if allowed to proliferate, makes it impossible to ever know whether code is correct or not.
So even if we put in a band-aid for 1.7, we really don't want to be guaranteeing this kind of stuff forever, and should aggressively encourage people to stop using .base in these ways. The mmap thing should really switch to something more reliable and less tightly coupled to the rest of the code all over numpy, like I described here:
http://mail.scipy.org/pipermail/numpy-discussion/2012-September/064003.html
How can we discourage people from doing this in the future? Can we make .base write-only from the Python level (with suitable deprecation period)? Rename it to ._base (likewise) so that it's still possible to peek under the covers but we remind people that it's really an implementation detail with poorly defined semantics that might change?
Well said. This reminds me of the fellow who used genetic programming to design an algorithm for a signal processing chip and discovered that the result was making use of some stray capacitance present on the chip. Here users such as Gael are the genetic programmers and .base is the stray capacitance.
I tend to the ._base idea, but I think this needs to be addressed in detail.
Chuck
On Mon, Oct 1, 2012 at 8:20 AM, Nathaniel Smith <njs@pobox.com> wrote:
[...] How can we discourage people from doing this in the future? Can we make .base write-only from the Python level (with suitable deprecation period)? Rename it to ._base (likewise) so that it's still possible to peek under the covers but we remind people that it's really an implementation detail with poorly defined semantics that might change?
Could we use the simpler .base behavior (fully collapsing the .base chain), but be more aggressive about propagating information like address/filename/offset for np.arrays that are created by slicing, asarray(), etc.?
Ray
(Sorry if I'm missing some context that makes this suggestion idiotic. I'm still trying to catch back up on the list and may have missed relevant discussion on other threads.)
On Mon, Oct 1, 2012 at 8:40 AM, Thouis (Ray) Jones <thouis@gmail.com> wrote:
On Mon, Oct 1, 2012 at 8:20 AM, Nathaniel Smith <njs@pobox.com> wrote:
[...] How can we discourage people from doing this in the future? Can we make .base write-only from the Python level (with suitable deprecation period)? Rename it to ._base (likewise) so that it's still possible to peek under the covers but we remind people that it's really an implementation detail with poorly defined semantics that might change?
Could we use the simpler .base behavior (fully collapsing the .base chain), but be more aggressive about propagating information like address/filename/offset for np.arrays that are created by slicing, asarray(), etc.?
Ray (Sorry if I'm missing some context that makes this suggestion idiotic. I'm still trying to catch back up on the list and may have missed relevant discussion on other threads.)
It might be productive to step back a bit and ask if this is a memmap problem or a workflow problem. My impression is that pickling memmaps is a solution to a higher level problem in Scikits.learn workflow and I'd like more details on what that problem is.
Chuck
On 10/01/2012 04:56 PM, Charles R Harris wrote:
On Mon, Oct 1, 2012 at 8:40 AM, Thouis (Ray) Jones <thouis@gmail.com> wrote:
On Mon, Oct 1, 2012 at 8:20 AM, Nathaniel Smith <njs@pobox.com> wrote:
[...] How can we discourage people from doing this in the future? Can we make .base write-only from the Python level (with suitable deprecation period)? Rename it to ._base (likewise) so that it's still possible to peek under the covers but we remind people that it's really an implementation detail with poorly defined semantics that might change?
Could we use the simpler .base behavior (fully collapsing the .base chain), but be more aggressive about propagating information like address/filename/offset for np.arrays that are created by slicing, asarray(), etc.?
Ray (Sorry if I'm missing some context that makes this suggestion idiotic. I'm still trying to catch back up on the list and may have missed relevant discussion on other threads.)
It might be productive to step back a bit and ask if this is a memmap problem or a workflow problem. My impression is that pickling memmaps is a solution to a higher level problem in Scikits.learn workflow and I'd like more details on what that problem is.
I'm not scikits-learn, but I'm pretty sure this is about wanting to use multiprocessing to parallelise code. You send pickled views of arrays, but the memory is shared amongst all processes (using either a file, or process shared memory).
It would be cool to have some support for this in NumPy itself. The scikits-learn people should chime in here, but a suggestion:
# pickles by reference to process-shared memory, or raises an exception
# if memory can't be process-shared
s = dumps(arr.byref)
# in another process:
arr = loads(s)
Of course, *real* fixes would be to remove the GIL, or push forward the work in CPython on multiple independent interpreters in the same process. But that's rather more difficult.
Dag Sverre
On Mon, Oct 1, 2012 at 3:40 PM, Thouis (Ray) Jones <thouis@gmail.com> wrote:
On Mon, Oct 1, 2012 at 8:20 AM, Nathaniel Smith <njs@pobox.com> wrote:
[...] How can we discourage people from doing this in the future? Can we make .base write-only from the Python level (with suitable deprecation period)? Rename it to ._base (likewise) so that it's still possible to peek under the covers but we remind people that it's really an implementation detail with poorly defined semantics that might change?
Could we use the simpler .base behavior (fully collapsing the .base chain), but be more aggressive about propagating information like address/filename/offset for np.arrays that are created by slicing, asarray(), etc.?
There are definitely other solutions to the memmap pickling problem that don't rely on the semantics of .base at all, yeah. I don't particularly like the one you suggest, because it requires that many pieces of code in many places all be careful to preserve this information, whereas keeping a global table of the process's active memory maps requires only a single piece of new code in the memmap module (and will be more reliable to boot).
But strictly speaking the details here are irrelevant to the discussion about .base itself. There are two questions about .base:
1) Is it okay to break people's code when we release 1.7, given that they have relied on this behaviour that they probably ought not to have?
2) Can we untangle things enough that making changes to .base *won't* break people's code, so that we don't end up having to ask question (1) again in the future?
I'm totally happy with deciding that we need to band-aid the 1.7 release b/c working code is working code and breaking it isn't okay, so long as we also address question (2).
-n
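A rough sketch of what a "global table of active memory maps" could look like, under stated assumptions: the names _ACTIVE_MAPS, register_memmap and locate are hypothetical and are not part of NumPy, the lookup only checks where an array's first byte lives, and removing entries when a map is closed is left out. It is meant to show that the file location of a view can be recovered from its data address alone, without consulting .base.

```python
import os
import tempfile

import numpy as np

# Hypothetical registry of (start address, end address, filename, file offset).
_ACTIVE_MAPS = []


def register_memmap(m):
    """Record the address range and file location of a freshly created np.memmap."""
    start = m.__array_interface__["data"][0]
    _ACTIVE_MAPS.append((start, start + m.nbytes, str(m.filename), m.offset))


def locate(arr):
    """Return (filename, file_offset) if arr's first byte lies in a registered map."""
    addr = arr.__array_interface__["data"][0]
    for start, end, filename, offset in _ACTIVE_MAPS:
        if start <= addr < end:
            return filename, offset + (addr - start)
    return None


# Demo: write 100 int64 values to a scratch file and map it read-only.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "data.bin")
np.arange(100, dtype=np.int64).tofile(path)

m = np.memmap(path, dtype=np.int64, mode="r", shape=(100,))
register_memmap(m)

# A plain ndarray view of part of the map; it carries no filename of its own.
view = np.asarray(m[10:20])

# Recover the file location from the data pointer and re-open the same region.
filename, offset = locate(view)
reopened = np.memmap(filename, dtype=view.dtype, mode="r",
                     shape=view.shape, offset=offset)
print(offset, np.array_equal(reopened, view))
```

In this sketch only the code that creates a map needs to call register_memmap; slicing, asarray() and similar operations need no changes, which is the property argued for above.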
On 09/30/2012 03:59 PM, Travis Oliphant wrote:
Hey all,
In a github-discussion with Gael and Nathaniel, we came up with a proposal for .base that we should put before this list. Traditionally, .base has always pointed to None for arrays that owned their own memory and to the "most immediate" array object parent for arrays that did not own their own memory. There was a long-standing issue related to running out of stack space that this behavior created.
Recently this behavior was altered so that .base always points to "the original" object holding the memory (something exposing the buffer interface). This created some problems for users who relied on the fact that most of the time .base pointed to an instance of an array object.
The proposal here is to change the behavior of .base for arrays that don't own their own memory so that the .base attribute of an array points to "the most original object" that is still an instance of the type of the array. This would go into the 1.7.0 release so as to correct the issues reported.
What are reactions to this proposal?
In the past, I've relied on putting arbitrary Python objects in .base in my C++ to NumPy conversion code to make sure reference counting for array memory works properly. In particular, I've used Python CObjects that hold boost::shared_ptrs, which don't even have a buffer interface. So it sounds like I may be a few steps behind on the rules of what actually should go in .base.
I'm very concerned that if we do demand that .base always point to a NumPy array (rather than an arbitrary Python object or even just one with a buffer interface), there's no longer any way for a NumPy array to hold data allocated by something other than NumPy.
If I want to put external memory in a NumPy array and indicate that it's owned by some non-NumPy Python object, what is the recommended way to do that?
Thanks!
Jim Bosch
On Oct 1, 2012, at 9:11 AM, Jim Bosch wrote:
On 09/30/2012 03:59 PM, Travis Oliphant wrote:
Hey all,
In a github-discussion with Gael and Nathaniel, we came up with a proposal for .base that we should put before this list. Traditionally, .base has always pointed to None for arrays that owned their own memory and to the "most immediate" array object parent for arrays that did not own their own memory. There was a long-standing issue related to running out of stack space that this behavior created.
Recently this behavior was altered so that .base always points to "the original" object holding the memory (something exposing the buffer interface). This created some problems for users who relied on the fact that most of the time .base pointed to an instance of an array object.
The proposal here is to change the behavior of .base for arrays that don't own their own memory so that the .base attribute of an array points to "the most original object" that is still an instance of the type of the array. This would go into the 1.7.0 release so as to correct the issues reported.
What are reactions to this proposal?
In the past, I've relied on putting arbitrary Python objects in .base in my C++ to NumPy conversion code to make sure reference counting for array memory works properly. In particular, I've used Python CObjects that hold boost::shared_ptrs, which don't even have a buffer interface. So it sounds like I may be a few steps behind on the rules of what actually should go in .base.
This should still work, nothing has been proposed to change this use-case.
I'm very concerned that if we do demand that .base always point to a NumPy array (rather than an arbitrary Python object or even just one with a buffer interface), there's no longer any way for a NumPy array to hold data allocated by something other than NumPy.
I don't recall a suggestion to demand that .base always point to a NumPy array. The suggestion is that a view of a view of an array that has your boost::shared_ptr as a PyCObject pointed to by base will have its base point to the first array instead of to the PyCObject (which is what the recent change did).
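A small Python-level sketch of that situation, assuming a NumPy release that includes the proposed rule (1.7 or later, as far as I know). ExternalOwner is a hypothetical stand-in for the PyCObject holding a boost::shared_ptr: a non-NumPy object that owns the memory and exposes it through __array_interface__.

```python
import ctypes

import numpy as np


class ExternalOwner:
    """Hypothetical non-NumPy object that owns a block of memory."""

    def __init__(self, n):
        # Zero-initialised buffer whose lifetime is tied to this object.
        self._buf = (ctypes.c_double * n)()
        self.__array_interface__ = {
            "shape": (n,),
            "typestr": np.dtype(np.float64).str,
            "data": (ctypes.addressof(self._buf), False),  # (address, read-only)
            "version": 3,
        }


owner = ExternalOwner(5)

arr = np.asarray(owner)  # wraps the external memory without copying
v = arr[1:]              # a view of the array
w = v[1:]                # a view of a view

# The first array keeps the non-NumPy owner alive through .base ...
owner_is_base = arr.base is owner
# ... and the view of a view points at that first array, not at the owner.
first_array_is_base = w.base is arr

# Writes through the views land in the externally owned buffer.
w[0] = 7.0
print(owner_is_base, first_array_is_base, owner._buf[2])
```

Putting an arbitrary owner object in .base therefore keeps working; only views layered on top of the first array are affected by the collapsing rule.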
If I want to put external memory in a NumPy array and indicate that it's owned by some non-NumPy Python object, what is the recommended way to do that?
The approach you took is still the way I would recommend doing that. There may be other suggestions.
-Travis
participants (8)
- Charles R Harris
- Dag Sverre Seljebotn
- Gael Varoquaux
- Han Genuit
- Jim Bosch
- Nathaniel Smith
- Thouis (Ray) Jones
- Travis Oliphant