I'm working on getting some old code working with numpy and I noticed that bool_ is not a subclass of int. Given that Python's bool subclasses int and that the other scalar types are subclasses of their respective counterparts, it seems at first glance that numpy.bool_ should subclass Python's bool, which in turn subclasses int. Or am I missing something here? -- . __ . |-\ . . tim.hochberg@ieee.org
Timothy Hochberg wrote:
I'm working on getting some old code working with numpy and I noticed that bool_ is not a subclass of int. Given that python's bool subclasses into and that the other scalar types are subclasses of their respective counterparts it seems at first glance that numpy.bool_ should subclass python's bool, which in turn subclasses int. Or am I missing something here?
That would certainly be desirable. There might be a technical reason why it's not, but if you can do it, and it seems to work for you, let's check it in. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Timothy Hochberg wrote:
I'm working on getting some old code working with numpy and I noticed that bool_ is not a subclass of int. Given that python's bool subclasses into and that the other scalar types are subclasses of their respective counterparts it seems at first glance that numpy.bool_ should subclass python's bool, which in turn subclasses int. Or am I missing something here?
The reason it is not, is because it is not binary compatible with Python's integer. The numpy bool_ is always only 8-bits while the Python integer is 32-bits or 64-bits. This could be changed I suspect, but then it would break the relationship between scalars and their array counterparts and I'm sure we would not want to bump up all bool arrays to 32 or 64-bits. -Travis
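The two facts in this exchange (the subclass relationship and the one-byte storage) can be checked with a minimal sketch. It assumes a reasonably recent NumPy imported as `np`; the variable names are illustrative only, and the results reflect NumPy versions known to me rather than the 2007 code base under discussion.

```python
import numpy as np

# Python's bool is a subclass of int.
py_bool_is_int = issubclass(bool, int)

# numpy.bool_ is not a subclass of int (the observation that started the thread).
np_bool_is_int = issubclass(np.bool_, int)

# A float scalar type does subclass its Python counterpart.
np_float_is_float = issubclass(np.float64, float)

# Each element of a boolean array occupies one byte (8 bits).
bool_itemsize = np.dtype(np.bool_).itemsize

print(py_bool_is_int, np_bool_is_int, np_float_is_float, bool_itemsize)
```

The one-byte item size is what ties the scalar layout to the array layout in Travis's explanation.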
On 7/6/07, Travis Oliphant <oliphant.travis@ieee.org> wrote:
Timothy Hochberg wrote:
I'm working on getting some old code working with numpy and I noticed that bool_ is not a subclass of int. Given that python's bool subclasses into and that the other scalar types are subclasses of their respective counterparts it seems at first glance that numpy.bool_ should subclass python's bool, which in turn subclasses int. Or am I missing something here?
The reason it is not, is because it is not binary compatible with Python's integer. The numpy bool_ is always only 8-bits while the Python integer is 32-bits or 64-bits.
This could be changed I suspect, but then it would break the relationship between scalars and their array counterparts
Do you have an idea off the top of your head how painful this would be from an implementation standpoint? And is there a theoretical reason that it is important that the scalar and array implementations match? I would think that, conceptually, they are all 1-bit integers, and that 8 bits versus 32 or 64 bits is just an implementation detail. My case is not particularly pressing or important, but I have a feeling that this is going to bite other people eventually, in particular if you pull a value out of a boolean array and pass it to some third-party module that doesn't know about numpy. If that function does some sort of check on argument type, which while not common does happen, then it will fail. The workaround is straightforward of course: simply apply bool to scalars when you get them back if you're going to be passing them to a finicky function. That's kind of clunky and surprising though.
and I'm sure we would not want to bump up all bool arrays to 32 or 64-bits.
No. I wouldn't think so. -- . __ . |-\ . . tim.hochberg@ieee.org
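A minimal sketch of the failure mode and the workaround described in this message. The function `finicky` is hypothetical and stands in for any third-party code that checks for a built-in bool; a recent NumPy imported as `np` is assumed.

```python
import numpy as np

def finicky(flag):
    # Hypothetical third-party function that insists on a built-in bool.
    if type(flag) is not bool:
        raise TypeError("expected a Python bool")
    return "yes" if flag else "no"

mask = np.array([True, False])
value = mask[0]                  # a numpy.bool_ scalar, not a Python bool

is_python_bool = isinstance(value, bool)

try:
    finicky(value)               # rejected by the type check
    rejected = False
except TypeError:
    rejected = True

result = finicky(bool(value))    # the workaround: convert explicitly
print(is_python_bool, rejected, result)
```

The explicit `bool(...)` call is the "clunky" step; it is needed only where the receiving code inspects the type rather than the truth value.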
On 7/6/07, *Travis Oliphant* <oliphant.travis@ieee.org <mailto:oliphant.travis@ieee.org>> wrote:
Timothy Hochberg wrote:
I'm working on getting some old code working with numpy and I noticed that bool_ is not a subclass of int. Given that python's bool subclasses int and that the other scalar types are subclasses of their respective counterparts it seems at first glance that numpy.bool_ should subclass python's bool, which in turn subclasses int. Or am I missing something here?
The reason it is not, is because it is not binary compatible with Python's integer. The numpy bool_ is always only 8-bits while the Python integer is 32-bits or 64-bits.
This could be changed I suspect, but then it would break the relationship between scalars and their array counterparts
Do you have and idea off the top of your head head how painful this would be from an implementation standpoint. And is there a theoretical reason that it is important that the scalar and array implementations match? I would think that, conceptually, they are all 1-bit integers, and it seems that the 8-bit, versus 32- or 64-bits is just an implementation detail.
It would probably take about 2-3 hours to make the change and about 3 more hours to fix the problems that were not anticipated. Basically, we would have to special-case the bool like we do the unicode scalar (which also doesn't necessarily match the array-based representation but instead follows the Python implementation). I guess I don't really see a problem in switching just the numpy.bool_ scalar to be a sub-class of the Python bool type and adjusting the code to make the switch when creating a scalar. -Travis
On 7/7/07, Travis Oliphant <oliphant.travis@ieee.org> wrote:
On 7/6/07, *Travis Oliphant* <oliphant.travis@ieee.org <mailto:oliphant.travis@ieee.org>> wrote:
Timothy Hochberg wrote:
I'm working on getting some old code working with numpy and I noticed that bool_ is not a subclass of int. Given that python's bool subclasses int and that the other scalar types are subclasses of their respective counterparts it seems at first glance that numpy.bool_ should subclass python's bool, which in turn subclasses int. Or am I missing something here?
The reason it is not, is because it is not binary compatible with Python's integer. The numpy bool_ is always only 8-bits while the Python integer is 32-bits or 64-bits.
This could be changed I suspect, but then it would break the relationship between scalars and their array counterparts
Do you have and idea off the top of your head head how painful this would be from an implementation standpoint. And is there a theoretical reason that it is important that the scalar and array implementations match? I would think that, conceptually, they are all 1-bit integers, and it seems that the 8-bit, versus 32- or 64-bits is just an implementation detail.
It would probably take about 2-3 hours to make the change and about 3 more hours to fix the problems that were not anticipated. Basically, we would have to special-case the bool like we do the unicode scalar (which also doesn't necessarily match the array-based representation but instead follows the Python implementation).
I guess I don't really see a problem in switching just the numpy.bool_ scalar to be a sub-class of the Python bool type and adjusting the code to make the switch when creating a scalar.
Thanks for info. I'll put this on my list of things to look into, although it may take me a few weeks to get around to it, depending on how busy next week is. I don't see this as urgent, but it seems like a good change to make going forward. -- . __ . |-\ . . tim.hochberg@ieee.org
On 7/7/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
On 7/7/07, Travis Oliphant <oliphant.travis@ieee.org> wrote:
On 7/6/07, *Travis Oliphant* <oliphant.travis@ieee.org <mailto:oliphant.travis@ieee.org>> wrote:
<snip>
Here is a link to PEP 285 <http://www.python.org/dev/peps/pep-0285/> where Guido discusses his reasoning about the bool type. I note that boolean arrays behave as integers under addition of a scalar, but not under addition of boolean arrays, where '+' seems to mean 'or'. The latter looks inconsistent with the Python convention.
In [60]: a
Out[60]: array([ True, True, True, True], dtype=bool)
In [61]: a + a
Out[61]: array([ True, True, True, True], dtype=bool)
In [62]: a + 1
Out[62]: array([2, 2, 2, 2])
In [66]: True + True
Out[66]: 2
Now might be a good time to discuss and document these choices.
Chuck
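The session above can be reproduced as a short script. This is a sketch assuming a recent NumPy imported as `np`; the variable names are mine, and the behaviour shown is what I know from current releases, which happens to agree with the 2007 output quoted here.

```python
import numpy as np

a = np.array([True, True, True, True])

bool_plus_bool = a + a      # stays boolean: '+' acts as logical OR
bool_plus_int = a + 1       # coerced to an integer array
python_sum = True + True    # plain Python: integer arithmetic

print(bool_plus_bool.dtype, bool_plus_bool)
print(bool_plus_int.dtype, bool_plus_int)
print(python_sum)
```

The contrast between the first and last lines is the inconsistency with Python that the rest of the thread argues about.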
On Sat, 7 Jul 2007, Charles R Harris apparently wrote:
In [60]: a
Out[60]: array([ True, True, True, True], dtype=bool)
In [61]: a + a
Out[61]: array([ True, True, True, True], dtype=bool)
Yea! Behaves like a boolean array. And for multiplication too. And in boolean matrices, powers work right. (I use this.)
In [62]: a + 1
Out[62]: array([2, 2, 2, 2])
Yea! Coercion to int, as expected.
In [66]: True + True
Out[66]: 2
Boo! Hopefully Python will "fix" this one day. Cheers, Alan Isaac
On 7/7/07, Alan G Isaac <aisaac@american.edu> wrote:
On Sat, 7 Jul 2007, Charles R Harris apparently wrote:
In [60]: a Out[60]: array([ True, True, True, True], dtype=bool) In [61]: a + a Out[61]: array([ True, True, True, True], dtype=bool)
Yea! Behaves like a boolean array. And for multiplication to. And in boolean matrices, powers work right. (I use this.)
In [62]: a + 1 Out[62]: array([2, 2, 2, 2])
Yea! Coercion to int, as expected.
In [66]: True + True Out[66]: 2
Boo! Hopefully Python will "fix" this one day.
It will almost certainly not. And the fact that numpy and Python are inconsistent this way gives me the creeps. Why not simply use & and | instead of + and *? -- . __ . |-\ . . tim.hochberg@ieee.org
On Mon, 9 Jul 2007, Timothy Hochberg apparently wrote:
Why not simply use & and | instead of + and *?
A couple reasons, none determinative.
1. numpy is right and Python is wrong in this case (but granted, I would usually go with Python in such cases)
2. consistency with Boolean matrices
Elaboration on 2: Boolean matrices currently behave as expected, with standard notation. Related to this, they handle exponents correctly. Suppose arrays are changed as you suggest. Then either
- array behavior and matrix behavior are decoupled, or
- matrix behavior is completely broken for boolean matrices
Alan Isaac
PS Examples of good behavior:
>>> x
matrix([[True, True], [True, False]], dtype=bool)
>>> y
matrix([[False, True], [True, False]], dtype=bool)
>>> x*y
matrix([[True, True], [False, True]], dtype=bool)
>>> x**2
matrix([[True, True], [True, True]], dtype=bool)
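For readers unfamiliar with the Boolean matrix product Alan relies on, here is a plain-Python sketch that does not use numpy at all. The helper `bool_matmul` is a hypothetical name of mine; it computes entry (i, j) as the OR over k of `x[i][k] AND y[k][j]` and is applied to the same x and y as in the PS.

```python
def bool_matmul(x, y):
    """Boolean matrix product: OR over k of (x[i][k] AND y[k][j])."""
    inner = len(y)
    return [
        [any(x[i][k] and y[k][j] for k in range(inner)) for j in range(len(y[0]))]
        for i in range(len(x))
    ]

x = [[True, True], [True, False]]
y = [[False, True], [True, False]]

xy = bool_matmul(x, y)           # corresponds to x*y for boolean matrices
x_squared = bool_matmul(x, x)    # corresponds to x**2

print(xy)
print(x_squared)
```

With ordinary integer arithmetic followed by a cast to bool the results would be the same here, but the intermediate values would be counts rather than truth values.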
On 7/9/07, Alan G Isaac <aisaac@american.edu> wrote:
On Mon, 9 Jul 2007, Timothy Hochberg apparently wrote:
Why not simply use & and | instead of + and *?
A couple reasons, none determinative. 1. numpy is right and Python is wrong in this case (but granted, I would usually go with Python in such cases)
I don't think I agree with this. Once you've decided to make Boolean a subclass of Int, then Python's behavior seems to be the most sensible. One could argue (and people did) about whether that was a good choice, but it's useful for a lot of practical applications. In any event, given that Boolean subclasses Int, I think the current behavior is probably for the best.
2. consistency with Boolean matrices
OK. I sort of read past the fact that you were referring to matrices, not arrays. This doesn't matter to me personally because I don't use the matrix class. I do matrix algebra on occasion, but the matrix class has never been helpful for me. YMMV.
Elaboration on 2: Boolean matrices currently behave as expected, with standard notation. Related to this, they handle exponents correctly.
Suppose arrays are changed as you suggest. Then either - array behavior and matrix behavior are decoupled, or - matrix behavior is completely broken for boolean matrices
Alan Isaac
PS Examples of good behavior:
x matrix([[True, True], [True, False]], dtype=bool) y matrix([[False, True], [True, False]], dtype=bool) x*y matrix([[True, True], [False, True]], dtype=bool) x**2 matrix([[True, True], [True, True]], dtype=bool)
x*y and x**2 are already decoupled for arrays and matrices. What if x*y was simply defined to do a boolean matrix multiply when the arguments are boolean matrices? I don't care about this that much though, so I'll let it drop. -- . __ . |-\ . . tim.hochberg@ieee.org
Hi, On Mon, 9 Jul 2007, Timothy Hochberg apparently wrote:
Why not simply use & and | instead of + and *?
A couple reasons, none determinative. 1. numpy is right a Python is wrong it this case
I don't think I agree with this. Once you've decided to make Boolean a subclass of Int, then Python's behavior seems to be the most sensible. One could argue (and people did) about whether that was a good choice, but it's useful for a lot of practical applications. In any event, given that Boolean subclasses Int, I think the current behavior is probably for the best.
If bool subclasses int, this does not enforce True+True=2. Never. Boolean operations live in Boolean algebra and that's it. It's not the case with integers that cannot be represented with int. Now, if you take the algebra point of view, which is the point here, for a scientific application, you have to have True+True = True. Matthieu
On 7/10/07, Matthieu Brucher <matthieu.brucher@gmail.com> wrote:
Hi,
On Mon, 9 Jul 2007, Timothy Hochberg apparently wrote:
Why not simply use & and | instead of + and *?
A couple reasons, none determinative. 1. numpy is right a Python is wrong it this case
I don't think I agree with this. Once you've decided to make Boolean a
subclass of Int, then Python's behavior seems to be the most sensible. One could argue (and people did) about whether that was a good choice, but it's useful for a lot of practical applications. In any event, given that Boolean subclasses Int, I think the current behavior is probably for the best.
If bool subclasses int, this does not enforce True+True=2. Never. Boolean operation live in the Boole algebra and that's it. It's not the case with integers that cannot be represented with int. Now, if you take the algebra point of view, which is the point here, for a scientific application, you have to have True+True = True. Matthieu
When you talk about algebra, one might have to restrict oneself to '|' and '&' and not use '+' and '-'. E.g.:
True - True = False # right !?
# but if:
True+True = True.
# then
True+True -False = True -False # ????
# here I'm already lost ...
I don't think this can be done in a consistent way. In other words: a "+" operator would also need a corresponding "-" operator, and that will just look funny. I think if you want algebra, you should restrict yourself to "|" (or) and "&" (and).
My two cents, Sebastian
When you talk about algebra - one might have to restrict one self to '|' and '&' -- not use '+' and '-' E.g.: True - True = False # right !?
Not exactly because
- True = + True
So
True - True = True + True = True
You have to stay in the algebra the whole time.
# but if: True+True = True. # then True+True -False = True -False # ???? # here I'm already lost ... I don't think this can be done in a consistent way.
In other words: a "+" operator would also need a corresponding "-" operator, and that will just look funny. I think if you want algebra, you should restrict yourself to "|" (or) and "&" (and)
When you do computations in Boolean algebra, every math book uses + and *; in IT books you see | and &. As numpy is oriented toward scientists, I suppose that the definition of + and * is correct. Matthieu
On Tue, Jul 10, 2007 at 02:39:28PM +0200, Sebastian Haase wrote:
When you talk about algebra - one might have to restrict one self to '|' and '&' -- not use '+' and '-' E.g.: True - True = False # right !? # but if: True+True = True. # then True+True -False = True -False # ???? # here I'm already lost ... I don't think this can be done in a consistent way.
It can. It's called Boolean algebra, and it is a consistent algebra in the mathematical sense (http://en.wikipedia.org/wiki/Boolean_algebra). Actually, what we are talking about is the two-element Boolean algebra (http://en.wikipedia.org/wiki/Two-element_Boolean_algebra), and the mathematical structure we are talking about is a ring; the Wikipedia article is quite comprehensible (http://en.wikipedia.org/wiki/Ring_(mathematics))
In other words: a "+" operator would also need a corresponding "-"
Yes. In other words (the set theory words) each element needs to have an opposite concerning the '+' law. To understand this you need a bit of algebra theory.
* An algebra has 2 laws, let's call them "+" and "*".
* Each law has a neutral element for this law, i.e. an element a for which "a + b = b" for all b in the algebra; let's write these "n+" and "n*".
* Each element a is required to have an inverse for the "+", i.e. an element b for which b + a = n+; let's write the opposite of b "-b".
For integers, n+ = 0, n* = 1. For Booleans, n+ = False, and n+ = True, therefore, as Matthieu points out, -True = True, as True + True = n+ = True, and -False = True, as True + False = n+ = True. So you have a consistent algebra.
Now there is a law for which not every element has an inverse: it is the "*" law. You can check this out for integers. It is also true for booleans. In fact, you can prove that in a ring, n+ cannot have an inverse for the * law (it is the famous divide-by-zero error!).
In conclusion, I would like to stress that, yes, +, - and * are well defined on booleans, the definition is universal, and please don't try to change it. Gaël
On 7/10/07, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Tue, Jul 10, 2007 at 02:39:28PM +0200, Sebastian Haase wrote:
When you talk about algebra - one might have to restrict one self to '|' and '&' -- not use '+' and '-' E.g.: True - True = False # right !? # but if: True+True = True. # then True+True -False = True -False # ???? # here I'm already lost ... I don't think this can be done in a consistent way.
It can, its called the Bool algebra, and it is a consistent algebra, in a mathematical sense of algebra (http://en.wikipedia.org/wiki/Boolean_algebra), actually what we are talking about is the two element bool algebra (http://en.wikipedia.org/wiki/Two-element_Boolean_algebra), and the mathematical structure we are taling about is a ring, the wikipedia article is quite comprehensible (http://en.wikipedia.org/wiki/Ring_(mathematics))
In other words: a "+" operator would also need a corresponding "-"
Yes. In other words (the ensemble theory words) each element needs to have an opposite concerning the '+' law. To understand this you need a bit of algebra theory.
* An algebra has 2 laws, lets call them "+" and "*".
* Each law has a neutral element for this law, ie an element a for which "a + b = b" for all b in the algebra, lets write these "n+", and "n*".
* Each element a is required to have an inverse for the "+", ie an element b for wich b + a = n+, lets write the opposite of b "-b".
For integer, n+ = 0, n* = 1.
For Booleans, n+ = False, and n+ = True, therefore, as Matthieu points out, -True = True, as True + True = n+ = True, and -False = True, as True + False = n+ = True.
So you have a consistent algebra.
Now there is a law for which every element does not have an inverse, it the "*" law. You can check the out for integers. It is also true for booleans. In fact, you can proove that in an ring, n+ cannot have an inverse for the * law (it the famous divide by zero error !).
In conclusion, I would like to stress that, yes, +, - and * are well defined on booleans, the definition is universal, and please don't try to change it.
The proper additive operation to make boolean algebra a ring is 'xor', so that 1 becomes its own inverse. Same thing in sigma rings, where folks used to use exclusive union just to make the algebra to work. But plain 'or' and 'union' work fine and are more intuitive even if they don't give the ring structure. Chuck
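Chuck's point can be checked exhaustively, since there are only two values. The sketch below is plain Python; `additive_inverses` is a hypothetical helper of mine that lists, for each value a, every b with `add(a, b) == False`, taking False as the additive identity.

```python
from itertools import product
from operator import or_, xor

VALUES = (False, True)

def additive_inverses(add, zero=False):
    """Map each value a to the values b with add(a, b) == zero."""
    return {a: [b for b in VALUES if add(a, b) == zero] for a in VALUES}

or_inverses = additive_inverses(or_)     # True has no inverse under OR
xor_inverses = additive_inverses(xor)    # every value is its own inverse

# AND distributes over XOR, as multiplication must in a ring.
distributes = all(
    (a and (b ^ c)) == ((a and b) ^ (a and c))
    for a, b, c in product(VALUES, repeat=3)
)

print(or_inverses, xor_inverses, distributes)
```

So with '+' read as OR there is no subtraction that undoes addition, which is the difficulty Sebastian ran into; with '+' read as XOR the ring axioms hold but True + True is False.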
I found Gael's presentation rather puzzling for two reasons.
1. It appears to contain a `+` vs. `*` confusion. See http://en.wikipedia.org/wiki/Two-element_Boolean_algebra
2. MUCH more importantly: In implementations of TWO, we interpret `-` as unary complementation (not e.g. as additive inverse; note True does not have one). So
-True is False
-False is True
This matches numpy:
>>> -N.array([False])
array([True], dtype=bool)
>>> -N.array([True])
array([False], dtype=bool)
This is a GOOD THING. However, a-b should then just be shorthand for a+(-b). Here numpy does not in my opinion behave correctly:
>>> N.array([False])-N.array([True])
array([True], dtype=bool)
>>> N.array([False])+(-N.array([True]))
array([False], dtype=bool)
The second answer is the right one, in this context. I would call this second answer a bug.
Cheers, Alan Isaac
On Tue, Jul 10, 2007 at 10:36:55AM -0400, Alan G Isaac wrote:
I found Gael's presentation rather puzzling for two reasons.
1. It appears to contain a `+` vs. `*` confusion. See http://en.wikipedia.org/wiki/Two-element_Boolean_algebra
Damn it. I used math conventions, for "+" and "*" (in math the "+" law of a ring is the law for which every element has an inverse). I hadn't realized it was the opposite for intuitive understanding of booleans.
2. MUCH more importantly: In implementations of TWO, we interpret `-` as unary complementation (not e.g. as additive inverse; note True does not have one).
Yes, indeed, as the law for which every element has an inverse is "*", the inverse for the "+" is not defined, and therefore the "-" sign cannot denote it. You are quite right that it is impossible to define "-" on the boolean set in a way that makes it follow traditional integer operations. I don't know what the conclusion of this should be in terms of the original discussion. Sorry for the noise. Gaël
Hi Gael, More important is the following. On Tue, 10 Jul 2007, Alan G Isaac apparently wrote:
N.array([False])-N.array([True]) array([True], dtype=bool) N.array([False])+(-N.array([True])) array([False], dtype=bool)
The second answer is the right one, in this context. I would call this [first!!!] answer a bug.
Do you agree that the first (!!!) answer is a bug? (The basis is apparently performed as follows: integer array subtraction is first performed, and then nonzero ints are converted to True. But this gives the wrong answer and most critically breaks the equivalence of a-b and a+(-b).) Cheers, Alan
On Tue, Jul 10, 2007 at 11:31:35AM -0400, Alan G Isaac wrote:
Do you agree that the first (!!!) answer is a bug? (The basis is apparently performed as follows: integer array subtraction is first performed, and then nonzero ints are converted to True. But this gives the wrong answer and most critically breaks the equivalence of a-b and a+(-b).)
OK, putting aside the useless maths, I agree that specifically having a-b != a+(-b) is a bug. If numpy developers agree, I think the proper solution is:
"""
def __sub__(self, b):
    return self.__add__(-b)
"""
I think this should allow us to have more or less consistent operations. Gaël
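To make the suggestion concrete, here is a toy sketch in plain Python. The class `Bit` is hypothetical and is not numpy code; it assumes, as in Alan's message, that '+' means OR and unary '-' means complement, and it defines subtraction exactly as proposed.

```python
class Bit:
    """Hypothetical two-valued type, used only to illustrate the proposal."""

    def __init__(self, value):
        self.value = bool(value)

    def __add__(self, other):
        return Bit(self.value or other.value)    # '+' read as logical OR

    def __neg__(self):
        return Bit(not self.value)               # unary '-' read as complement

    def __sub__(self, other):
        return self.__add__(-other)              # a - b defined as a + (-b)

    def __eq__(self, other):
        return isinstance(other, Bit) and self.value == other.value

    def __repr__(self):
        return f"Bit({self.value})"

difference = Bit(False) - Bit(True)
explicit = Bit(False) + (-Bit(True))
print(difference, explicit)
```

Under these definitions False - True comes out False, the answer Alan argues for, and a - b agrees with a + (-b) by construction.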
[CHOP: lots of examples]
It looks like bool_s could use some general rejiggering. Let me put forth a concrete proposal that's based on matching bool_ behaviour to that of Python's bools. There is another route that could be taken where bool_ and bool are completely decoupled, but I'll skip over that for now since I don't really think it's a good idea.
1. +,- are arithmetic operators and return ints, not booleans.
2. *,** are arithmetic operators on scalars and arrays and return ints as above.
3. &,|,^ are the logical operators and return booleans.
4. *,** are defined on matrices to perform logical matrix multiplication and exponentiation.
This seems like the simplest route towards something that is both internally self-consistent and consistent with Python.
-- . __ . |-\ . . tim.hochberg@ieee.org
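Items 1 to 3 of the proposal mirror how Python's built-in bool already behaves, which can be seen without numpy. A small sketch (the list and set names are mine):

```python
# Arithmetic operators on Python bools produce ints.
arithmetic = [True + True, True - True, True * True, True ** 2]

# Bitwise/logical operators on Python bools produce bools.
logical = [True & False, True | False, True ^ True]

arithmetic_types = {type(v) for v in arithmetic}
logical_types = {type(v) for v in logical}

print(arithmetic, arithmetic_types)
print(logical, logical_types)
```

Item 4, the logical matrix product, has no counterpart in plain Python and would remain a numpy-specific rule.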
On 7/10/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
[CHOP: lots of examples]
It looks like bool_s could use some general rejiggering. Let me put forth a concrete proposal that's based on matching bool_ behaviour to that of Python's bools. There is another route that could be taken where bool_ and bool are completely decoupled, but I'll skip over that for now since I don't really think it's a good idea.
1. +,- are arithmetic operators and return ints not booleans 2. *,** are arithmetic operators on scalars and arrays and return ints as above. 3. &,|,^ are the logical operators and return booleans. 4. *,** are defined on matrices to perform logical matrix multiplication and exponation.
This seems like the simplest route towards something that is both internally self consistent and consistent with Python.
Looks good to me. At least it would make things consistent with bool_ being a subclass of integers if we go that way. Chuck
On Tue, 10 Jul 2007, Timothy Hochberg wrote:
1. +,- are arithmetic operators and return ints not booleans 2. *,** are arithmetic operators on scalars and arrays and return ints as above. 3. &,|,^ are the logical operators and return booleans. 4. *,** are defined on matrices to perform logical matrix multiplication and exponation.
I am not objecting to this, but I want to make sure the costs are not overlooked. Will multiplication of boolean matrices be different from `dot`? (It will certainly be different from `dot` for "equivalent" 2-d arrays.) If I understand correctly, unary complementation (using `-`) will be lost, so there will be no operator for unary complementation. (You might say, what about `~`, which currently works, but if we are to match Python's behavior, that is lost too.) Cheers, Alan Isaac
Just ran across something that doesn't quite make sense to me at the moment. Here's some code:
>>> numpy.__version__
'1.0.2'
>>> def f1(b, c):
...     b = b.astype(int)
...     c = c.astype(int)
...     return b, c
...
>>> b, c = numpy.fromfunction(f1, (5, 5))
>>> a = numpy.zeros((2, 12, 5, 5), int)
>>> a1 = a[0]
>>> a1[:, b, c].shape
(12, 5, 5)
>>> a[0, :, b, c].shape
(5, 5, 12)   ### why does this not return (12, 5, 5)?
So in a nutshell, it's not completely clear to me why these are returning arrays of different shapes. Can someone shed some light? Thanks, -Mark
On 7/10/07, Mark.Miller <mpmusu@cc.usu.edu> wrote:
Just ran across something that doesn't quite make sense to me at the moment.
Here's some code:
numpy.__version__ '1.0.2'
def f1(b,c): b=b.astype(int) c=c.astype(int) return b,c
b,c = numpy.fromfunction(f1,(5,5)) a=numpy.zeros((2,12,5,5),int) a1=a[0] a1[:,b,c].shape (12, 5, 5) a[0,:,b,c].shape (5, 5, 12) ###why does this not return (12,5,5)?
So in a nutshell, it's not completely clear to me why these are returning arrays of different shapes. Can someone shed some light?
It's because you are using arrays as indices (aka Fancy-Indexing). When you do this everything works differently. In this case, everything is being broadcast to the same shape. As I understand it (and I try to use only the simplest forms of fancy indexing), what you are doing is equivalent to: -- . __ . |-\ . . tim.hochberg@ieee.org
Sorry...can you clarify? I think that some of your message got cut off. -Mark Timothy Hochberg wrote:
It's because you are using arrays as indices (aka Fancy-Indexing). When you do this everything works differently. In this case, everything is being broadcast to the same shape. As I understand it (and I try to use only the simplest forms of fancy indexing), what you are doing is equivalent to:
-- . __ . |-\ .
On 7/10/07, Mark.Miller <mpmusu@cc.usu.edu> wrote:
Sorry...can you clarify? I think that some of your message got cut off.
-Mark
Timothy Hochberg wrote:
It's because you are using arrays as indices (aka Fancy-Indexing). When you do this everything works differently. In this case, everything is being broadcast to the same shape. As I understand it (and I try to use only the simplest forms of fancy indexing), what you are doing is equivalent to:
Sorry about that. The missing line is: a[zeros([5,5]),:,b,c].shape That is, your '0' is being broadcast into a 5x5 array to match the shapes of b and c. That is why the two forms you give are not equivalent. As to why you get that exact shape, I'd have to peruse the fancy indexing docs to figure it out -- things are a little weird when you use multidimensional indexing.
-- . __ . |-\ .
Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
-- . __ . |-\ . . tim.hochberg@ieee.org
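The equivalence described in this reply can be checked directly. This sketch assumes a recent NumPy imported as `np` and uses `np.indices` as a stand-in for the `fromfunction` helper in the original question. As far as I understand NumPy's indexing documentation, when the array-like indices are separated by a slice, the broadcast index dimensions are placed first in the result, which would account for the (5, 5, 12) shape.

```python
import numpy as np

a = np.zeros((2, 12, 5, 5), int)
b, c = np.indices((5, 5))          # two 5x5 integer index arrays

a1 = a[0]
adjacent = a1[:, b, c].shape       # index arrays sit next to each other
separated = a[0, :, b, c].shape    # scalar 0 and the arrays are split by ':'

# The scalar 0 written out as an explicit 5x5 integer index array.
zeros_idx = np.zeros((5, 5), dtype=int)
explicit = a[zeros_idx, :, b, c].shape

print(adjacent, separated, explicit)
```

Note that the explicit index array needs an integer dtype; a float array from a bare `zeros([5, 5])` is not accepted as an index by the versions I have used.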
On Mon, 9 Jul 2007, Timothy Hochberg apparently wrote:
x*y and x**2 are already decoupled for arrays and matrices. What if x*y was simply defined to do a boolean matrix multiply when the arguments are boolean matrices? I don't care about this that much though, so I'll let it drop.
So if x and y are arrays and you use `dot` you would get a different result than turning them into matrices and using `*`? I'd find that pretty odd. I'd also find it odd that equivalent element-by-element operations (`+`, `-`) would then return different outcomes for boolean arrays and boolean matrices. (This is what I meant by "decoupled".)
This is just a user's perspective. I do not pretend to see into the design issues. However, daring to tread where I should not, I offer two observations:
- matrices and 2-d arrays with dtype 'bool' should give the same result for "comparable" operations (where `dot` for arrays compares with `*` for matrices).
- it would be possible to have a new class, say `boolmat`, that implements the expected behavior for boolean matrices and then make matrices and arrays of dtype 'bool' behave in the Python way (e.g., True+True is 2, yuck!). I am definitely NOT advocating this (I like the current arrangement), but it is a possibility.
Cheers, Alan Isaac
PS Here is Guido's justification for bool inheriting from int (http://www.python.org/dev/peps/pep-0285/). It seems that numpy's current behavior is closer to his "ideal world".
6) Should bool inherit from int?
=> Yes.
In an ideal world, bool might be better implemented as a separate integer type that knows how to perform mixed-mode arithmetic. However, inheriting bool from int eases the implementation enormously (in part since all C code that calls PyInt_Check() will continue to work -- this returns true for subclasses of int). Also, I believe this is right in terms of substitutability: code that requires an int can be fed a bool and it will behave the same as 0 or 1. Code that requires a bool may not work when it is given an int; for example, 3 & 4 is 0, but both 3 and 4 are true when considered as truth values.
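The two halves of the substitutability argument in the PEP excerpt can be seen in a few lines of plain Python. The variable names are mine; the list-indexing line is just one illustrative case of "code that requires an int".

```python
int_and = 3 & 4                      # bitwise AND of 0b011 and 0b100
truth_and = bool(3) and bool(4)      # both are true as truth values

substitutable = [10, 20][True]       # a bool used where an int index is required
bool_is_int = isinstance(True, int)

print(int_and, truth_and, substitutable, bool_is_int)
```

The first pair shows why an int cannot safely stand in for a bool, and the second pair shows a bool standing in for an int without trouble.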
On 7/7/07, Travis Oliphant <oliphant.travis@ieee.org> wrote:
On 7/6/07, *Travis Oliphant* <oliphant.travis@ieee.org <mailto:oliphant.travis@ieee.org>> wrote:
Timothy Hochberg wrote:
I'm working on getting some old code working with numpy and I noticed that bool_ is not a subclass of int. Given that python's bool subclasses int and that the other scalar types are subclasses of their respective counterparts it seems at first glance that numpy.bool_ should subclass python's bool, which in turn subclasses int. Or am I missing something here?
The reason it is not, is because it is not binary compatible with Python's integer. The numpy bool_ is always only 8-bits while the Python integer is 32-bits or 64-bits.
This could be changed I suspect, but then it would break the relationship between scalars and their array counterparts
Do you have and idea off the top of your head head how painful this would be from an implementation standpoint. And is there a theoretical reason that it is important that the scalar and array implementations match? I would think that, conceptually, they are all 1-bit integers, and it seems that the 8-bit, versus 32- or 64-bits is just an implementation detail.
It would probably take about 2-3 hours to make the change and about 3 more hours to fix the problems that were not anticipated. Basically, we would have to special-case the bool like we do the unicode scalar (which also doesn't necessarily match the array-based representation but instead follows the Python implementation).
I guess I don't really see a problem in switching just the numpy.bool_ scalar to be a sub-class of the Python bool type and adjusting the code to make the switch when creating a scalar.
I gave this a try. Since so much code is auto-generated, it can be difficult to figure out what's going on in the core matrix stuff. Still, it seems like the solution is almost absurdly easy, consisting of changing only three lines. First off, does this seem right? Code compiled against this patch passes all tests and seems to run my application right, but that's not conclusive. Please let me know if I missed something obvious.
-- . __ . |-\ . . tim.hochberg@ieee.org

===================================================================
Index: numpy/core/code_generators/generate_array_api.py
===================================================================
--- numpy/core/code_generators/generate_array_api.py (revision 3883)
+++ numpy/core/code_generators/generate_array_api.py (working copy)
@@ -17,7 +17,7 @@
 typedef struct {
         PyObject_HEAD
-        npy_bool obval;
+        npy_long obval;
 } PyBoolScalarObject;
Index: numpy/core/include/numpy/arrayscalars.h
===================================================================
--- numpy/core/include/numpy/arrayscalars.h (revision 3883)
+++ numpy/core/include/numpy/arrayscalars.h (working copy)
@@ -1,7 +1,7 @@
 #ifndef _MULTIARRAYMODULE
 typedef struct {
         PyObject_HEAD
-        npy_bool obval;
+        npy_long obval;
 } PyBoolScalarObject;
 #endif
Index: numpy/core/src/multiarraymodule.c
===================================================================
--- numpy/core/src/multiarraymodule.c (revision 3883)
+++ numpy/core/src/multiarraymodule.c (working copy)
@@ -7417,7 +7417,7 @@
         return -1; \
     }
-    SINGLE_INHERIT(Bool, Generic);
+    DUAL_INHERIT(Bool, Bool, Generic);
     SINGLE_INHERIT(Byte, SignedInteger);
     SINGLE_INHERIT(Short, SignedInteger);
 #if SIZEOF_INT == SIZEOF_LONG
On Mon, Jul 09, 2007 at 12:32:02PM -0700, Timothy Hochberg wrote:
I gave this a try. Since so much code is auto-generated, it can be difficult to figure out what's going on in the core matrix stuff. Still, it seems like the solution is almost absurdly easy, consisting of changing only three lines. First off, does this seem right? Code compiled against this patch passes all tests and seems to run my application right, but that's not conclusive.
Please let me know if I missed something obvious.
Can we make this change, or should we discuss the patch further? Any comments, Travis? Stéfan
participants (11)
- Alan G Isaac
- Alan Isaac
- Charles R Harris
- Gael Varoquaux
- Mark.Miller
- Matthieu Brucher
- Robert Kern
- Sebastian Haase
- Stefan van der Walt
- Timothy Hochberg
- Travis Oliphant