
I vaguely recall this generated an array from all the characters.
In [1]: array('123', dtype='c')
Out[1]: array('1', dtype='|S1')
Chuck

On Mon, May 26, 2008 at 11:29 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> I vaguely recall this generated an array from all the characters.
> In [1]: array('123', dtype='c')
> Out[1]: array('1', dtype='|S1')
When was the last time it did otherwise? This behavior is a consequence of treating strings as scalars rather than containers of characters. I believe we settled on this behavior before 1.0.
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On Mon, May 26, 2008 at 1:44 PM, Robert Kern <robert.kern@gmail.com> wrote:
> On Mon, May 26, 2008 at 11:29 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> > I vaguely recall this generated an array from all the characters.
> > In [1]: array('123', dtype='c')
> > Out[1]: array('1', dtype='|S1')
> When was the last time it did otherwise? This behavior is a consequence of treating strings as scalars rather than containers of characters. I believe we settled on this behavior before 1.0.
The 'c' type is special; it is a leftover compatibility type for Numeric. It would, I think, have been several months ago that it behaved differently. Maybe I should check out a version from before Travis's latest fixes for matrix types went in, because there used to be an exception in the code for the 'c' type.
Chuck

On Mon, May 26, 2008 at 1:52 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> On Mon, May 26, 2008 at 1:44 PM, Robert Kern <robert.kern@gmail.com> wrote:
> > On Mon, May 26, 2008 at 11:29 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> > > I vaguely recall this generated an array from all the characters.
> > > In [1]: array('123', dtype='c')
> > > Out[1]: array('1', dtype='|S1')
> > When was the last time it did otherwise? This behavior is a consequence of treating strings as scalars rather than containers of characters. I believe we settled on this behavior before 1.0.
> The 'c' type is special; it is a leftover compatibility type for Numeric. It would, I think, have been several months ago that it behaved differently. Maybe I should check out a version from before Travis's latest fixes for matrix types went in, because there used to be an exception in the code for the 'c' type.
It works the same in r5101, so it looks like it hasn't changed. What I vaguely remembered was the whole string being treated as a sequence of characters, but evidently that is not the case. Probably I remembered the opposite of the case from looking at the code back when.
Chuck

On Mon, May 26, 2008 at 3:03 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> On Mon, May 26, 2008 at 1:52 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> > On Mon, May 26, 2008 at 1:44 PM, Robert Kern <robert.kern@gmail.com> wrote:
> > > On Mon, May 26, 2008 at 11:29 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> > > > I vaguely recall this generated an array from all the characters.
> > > > In [1]: array('123', dtype='c')
> > > > Out[1]: array('1', dtype='|S1')
> > > When was the last time it did otherwise? This behavior is a consequence of treating strings as scalars rather than containers of characters. I believe we settled on this behavior before 1.0.
> > The 'c' type is special; it is a leftover compatibility type for Numeric. It would, I think, have been several months ago that it behaved differently. Maybe I should check out a version from before Travis's latest fixes for matrix types went in, because there used to be an exception in the code for the 'c' type.
> It works the same in r5101, so it looks like it hasn't changed. What I vaguely remembered was the whole string being treated as a sequence of characters, but evidently that is not the case. Probably I remembered the opposite of the case from looking at the code back when.
numpy 1.0 had the behaviour you describe.
>>> import numpy
>>> numpy.__version__
'1.0'
>>> numpy.array('123', dtype='c')
array(['1', '2', '3'], dtype='|S1')
-- Robert Kern

On Mon, May 26, 2008 at 3:13 PM, Robert Kern <robert.kern@gmail.com> wrote:
> numpy 1.0 had the behaviour you describe.
> >>> import numpy
> >>> numpy.__version__
> '1.0'
> >>> numpy.array('123', dtype='c')
> array(['1', '2', '3'], dtype='|S1')
Of course, this has its own inconsistencies:
>>> numpy.dtype('c')
dtype('|S1')
>>> numpy.array('123', dtype='|S1')
array('1', dtype='|S1')
-- Robert Kern

On Mon, May 26, 2008 at 2:15 PM, Robert Kern <robert.kern@gmail.com> wrote:
> On Mon, May 26, 2008 at 3:13 PM, Robert Kern <robert.kern@gmail.com> wrote:
> > numpy 1.0 had the behaviour you describe.
> > >>> import numpy
> > >>> numpy.__version__
> > '1.0'
> > >>> numpy.array('123', dtype='c')
> > array(['1', '2', '3'], dtype='|S1')
> Of course, this has its own inconsistencies:
> >>> numpy.dtype('c')
> dtype('|S1')
> >>> numpy.array('123', dtype='|S1')
> array('1', dtype='|S1')
Since it is a compatibility type, we should probably check to be sure what it is supposed to do. I think Travis would be the one to ask.
Chuck

On Mon, May 26, 2008 at 2:25 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> On Mon, May 26, 2008 at 2:15 PM, Robert Kern <robert.kern@gmail.com> wrote:
> > On Mon, May 26, 2008 at 3:13 PM, Robert Kern <robert.kern@gmail.com> wrote:
> > > numpy 1.0 had the behaviour you describe.
> > > >>> import numpy
> > > >>> numpy.__version__
> > > '1.0'
> > > >>> numpy.array('123', dtype='c')
> > > array(['1', '2', '3'], dtype='|S1')
> > Of course, this has its own inconsistencies:
> > >>> numpy.dtype('c')
> > dtype('|S1')
> > >>> numpy.array('123', dtype='|S1')
> > array('1', dtype='|S1')
> Since it is a compatibility type, we should probably check to be sure what it is supposed to do. I think Travis would be the one to ask.
It's a bug introduced in r5080 by, ahem, yours truly. And I thought I had it fixed. Off to get it right.
Chuck

Charles R Harris wrote:
> I vaguely recall this generated an array from all the characters.
> In [1]: array('123', dtype='c')
> Out[1]: array('1', dtype='|S1')
This may be a bug.
>>> import Numeric
>>> Numeric.array('123','c')
array([1, 2, 3],'c')
My memory of the point of 'c' was to mimic Numeric's behavior for character arrays.
-Travis

On Tue, May 27, 2008 at 1:31 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
> Charles R Harris wrote:
> > I vaguely recall this generated an array from all the characters.
> > In [1]: array('123', dtype='c')
> > Out[1]: array('1', dtype='|S1')
> This may be a bug.
> >>> import Numeric
> >>> Numeric.array('123','c')
> array([1, 2, 3],'c')
> My memory of the point of 'c' was to mimic Numeric's behavior for character arrays.
Current behavior after fix is
In [1]: array('123','c')
Out[1]: array(['1', '2', '3'], dtype='|S1')
Is that correct, then?
Chuck

Charles R Harris wrote:
> On Tue, May 27, 2008 at 1:31 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
> > Charles R Harris wrote:
> > > I vaguely recall this generated an array from all the characters.
> > > In [1]: array('123', dtype='c')
> > > Out[1]: array('1', dtype='|S1')
> > This may be a bug.
> > >>> import Numeric
> > >>> Numeric.array('123','c')
> > array([1, 2, 3],'c')
> > My memory of the point of 'c' was to mimic Numeric's behavior for character arrays.
> Current behavior after fix is
> In [1]: array('123','c')
> Out[1]: array(['1', '2', '3'], dtype='|S1')
> Is that correct, then?
Yes.
-Travis

On Tue, May 27, 2008 at 3:15 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
> Charles R Harris wrote:
> > On Tue, May 27, 2008 at 1:31 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
> > > Charles R Harris wrote:
> > > > I vaguely recall this generated an array from all the characters.
> > > > In [1]: array('123', dtype='c')
> > > > Out[1]: array('1', dtype='|S1')
> > > This may be a bug.
> > > >>> import Numeric
> > > >>> Numeric.array('123','c')
> > > array([1, 2, 3],'c')
> > > My memory of the point of 'c' was to mimic Numeric's behavior for character arrays.
> > Current behavior after fix is
> > In [1]: array('123','c')
> > Out[1]: array(['1', '2', '3'], dtype='|S1')
> > Is that correct, then?
> Yes.
Can we make it so that dtype('c') is preserved instead of displaying '|S1'? It does not behave the same as dtype('|S1') although it compares equal to it.
In [90]: dtype('c')
Out[90]: dtype('|S1')
In [91]: array('123', dtype='c')
Out[91]: array(['1', '2', '3'], dtype='|S1')
In [92]: array('123', dtype=dtype('c'))
Out[92]: array(['1', '2', '3'], dtype='|S1')
In [93]: array('123', dtype=dtype('|S1'))
Out[93]: array('1', dtype='|S1')
In [94]: array('456', dtype=array('123', dtype=dtype('c')).dtype)
Out[94]: array(['4', '5', '6'], dtype='|S1')
In [95]: dtype('c') == dtype('|S1')
Out[95]: True
-- Robert Kern

Robert Kern wrote:
> Can we make it so that dtype('c') is preserved instead of displaying '|S1'? It does not behave the same as dtype('|S1') although it compares equal to it.
We could, with some special-casing in the representation for string data-types. Right now, dtype('c') is equivalent to dtype('S1') except that the type member of the underlying C structure (the char attribute in Python) is 'c' instead of 'S'.
-Travis
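The distinction described in the message above can be checked with a short script. This is only a sketch, not a statement of what any particular NumPy release guarantees: it assumes an installed NumPy that still accepts the 'c' type code and behaves as in the post-fix sessions quoted in this thread. The variable names are made up for illustration.

```python
import numpy as np

c = np.dtype('c')    # Numeric-compatibility character type
s = np.dtype('S1')   # ordinary length-1 string type

# Same memory layout, so the two descriptors compare equal ...
same_layout = (c == s)

# ... but the type character exposed as .char differs.
chars = (c.char, s.char)

# array() treats a str as a sequence of characters only for 'c';
# with 'S1' the str is a scalar truncated to one character.
as_chars = np.array('123', dtype=c)
as_scalar = np.array('123', dtype=s)

print(same_layout, chars, as_chars.shape, as_scalar.shape)
```

Checking `.char` is therefore one way to tell the two apart even though both print as '|S1'.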

2008/5/27 Robert Kern <robert.kern@gmail.com>:
> Can we make it so that dtype('c') is preserved instead of displaying '|S1'? It does not behave the same as dtype('|S1') although it compares equal to it.
It seems alarming to me that they should compare equal but behave differently. Is it possible to change more than just the way it prints?
Anne

Anne Archibald wrote:
> 2008/5/27 Robert Kern <robert.kern@gmail.com>:
> > Can we make it so that dtype('c') is preserved instead of displaying '|S1'? It does not behave the same as dtype('|S1') although it compares equal to it.
> It seems alarming to me that they should compare equal but behave differently. Is it possible to change more than just the way it prints?
Comparison on dtype objects is about memory-layout equivalency. Characters and length-1 strings are equivalent from a memory-layout perspective.
-Travis

On Wed, May 28, 2008 at 7:52 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
> Anne Archibald wrote:
> > 2008/5/27 Robert Kern <robert.kern@gmail.com>:
> > > Can we make it so that dtype('c') is preserved instead of displaying '|S1'? It does not behave the same as dtype('|S1') although it compares equal to it.
> > It seems alarming to me that they should compare equal but behave differently. Is it possible to change more than just the way it prints?
> Comparison on dtype objects is about memory-layout equivalency. Characters and length-1 strings are equivalent from a memory-layout perspective.
That would be fine if dtypes only represented memory layout. However, in this case, they also represent a difference in interpretation of str objects in the array() constructor. That is a real difference that needs to be reflected in __eq__ and __repr__.
-- Robert Kern
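The disagreement above can be illustrated with a toy model in plain Python. This is a hypothetical sketch: `ToyDType` and `toy_array` are invented names and do not reflect how NumPy implements dtypes. Equality here looks only at the layout fields, while the constructor's treatment of a str depends on a field that equality ignores.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyDType:
    """Hypothetical stand-in for a dtype; not NumPy's implementation."""
    kind: str                          # 'S' for byte strings (layout)
    itemsize: int                      # bytes per element (layout)
    char: str = field(compare=False)   # 'c' or 'S'; ignored by __eq__


def toy_array(text, dt):
    """Mimic the str handling discussed in the thread."""
    if dt.char == 'c':
        # Character type: the str is a container of characters.
        return list(text)
    # String type: the str is a scalar, truncated to itemsize.
    return text[:dt.itemsize]


c = ToyDType('S', 1, 'c')
s = ToyDType('S', 1, 'S')

print(c == s)               # equal: same layout fields
print(toy_array('123', c))  # one element per character
print(toy_array('123', s))  # a single truncated scalar
print(repr(c), repr(s))     # repr includes char, so the two differ
```

In this model the repr already distinguishes the two objects without touching equality. Whether equality should also take `char` into account is the open question in the thread.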

Robert Kern wrote:
> On Wed, May 28, 2008 at 7:52 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
> > Anne Archibald wrote:
> > > 2008/5/27 Robert Kern <robert.kern@gmail.com>:
> > > > Can we make it so that dtype('c') is preserved instead of displaying '|S1'? It does not behave the same as dtype('|S1') although it compares equal to it.
> > > It seems alarming to me that they should compare equal but behave differently. Is it possible to change more than just the way it prints?
> > Comparison on dtype objects is about memory-layout equivalency. Characters and length-1 strings are equivalent from a memory-layout perspective.
> That would be fine if dtypes only represented memory layout. However, in this case, they also represent a difference in interpretation of str objects in the array() constructor. That is a real difference that needs to be reflected in __eq__ and __repr__.
I think __repr__ can be changed without trouble. I'm concerned about changing __eq__, however.
-Travis

participants (4)
- Anne Archibald
- Charles R Harris
- Robert Kern
- Travis E. Oliphant