Padding policy in CharArrays
Hi,

I'm experiencing some problems derived from the fact that CharArrays in numarray are padded with spaces. That leads to somewhat curious consequences like this:

    In [180]: a=strings.array(None, itemsize = 4, shape=1)
    In [181]: a[0] = '0'
    In [182]: a >= '0\x00\x00\x00\x01'
    Out[182]: array([1], type=Bool)  # Incorrect

but...

    In [183]: a[0] >= '0\x00\x00\x00\x01'
    Out[183]: False  # correct

While this is not a bug (see the padding policy for chararrays), I think it would be much better to use '\x00' as the default padding. Would there be any problem with that?

If yes, well, I've found a workaround for this, but a quite inelegant one, I'm afraid :-/

Have a Happy New Year!

--
Francesc Altet
http://www.carabos.com/
Cárabos Coop. V.
Enjoy Data
On Sat, 2005-01-01 at 16:44, Francesc Altet wrote:
> Hi,
>
> I'm experiencing some problems derived from the fact that CharArrays in numarray are padded with spaces. That leads to somewhat curious consequences like this:
>
>     In [180]: a=strings.array(None, itemsize = 4, shape=1)
>     In [181]: a[0] = '0'
>     In [182]: a >= '0\x00\x00\x00\x01'
>     Out[182]: array([1], type=Bool)  # Incorrect
>
> but...
>
>     In [183]: a[0] >= '0\x00\x00\x00\x01'
>     Out[183]: False  # correct
>
> While this is not a bug (see the padding policy for chararrays), I think it would be much better to use '\x00' as the default padding. Would there be any problem with that?
The design intent of numarray.strings was that you could use RawCharArray, the base class of CharArray, for NULL-padded arrays. I tried it out like this:

    >>> a=strings.array(None, itemsize = 4, shape=1, kind=strings.RawCharArray)
    >>> a[0] = '0\0\0\0'
    >>> print repr(a >= '0\x00\x00\x00\x01')
    array([0], type=Bool)

You'll note that I "hand padded" the assigned value; because RawCharArray is a little-used feature, it needs more work. I think RawCharArray makes only partial or inconsistent use of NULL padding.
> If yes, well, I've found a workaround on this, but quite inelegant I'm afraid :-/
Give RawCharArray a try; it *is* the basis of CharArray, so it basically works, but there will likely be a few issues to sort out. My guess is that anything that really needs fixing can be added for numarray-1.2.

Regards,
Todd
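
The disagreement between the array comparison and the scalar comparison discussed above can be reproduced with plain Python byte strings. This is only a sketch, not numarray code: it assumes that the array comparison sees the stored, space-padded item, and that item access hands back the value with its trailing spaces stripped. The variable names are made up for illustration.

```python
# Illustration with plain bytes; this does not use numarray.
ITEMSIZE = 4
probe = b"0\x00\x00\x00\x01"

# Assumption: the array stores each item padded with spaces to ITEMSIZE.
stored = b"0".ljust(ITEMSIZE, b" ")

# Array-style comparison: the padded bytes are compared lexicographically.
# The space (0x20) at index 1 sorts after the NUL (0x00) in the probe.
array_style = stored >= probe

# Scalar-style comparison: assumption that item access strips the padding,
# leaving b"0", which is a proper prefix of the probe and so sorts before it.
scalar_style = stored.rstrip(b" ") >= probe

# Hand-padding with NULs, as in the RawCharArray experiment above.
hand_padded = b"0\0\0\0"
hand_padded_result = hand_padded >= probe

print(array_style, scalar_style, hand_padded_result)  # True False False
```

With NUL padding the stored item is a proper prefix of the probe string, so both styles of comparison agree.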
On Monday 03 January 2005 17:40, Todd Miller wrote:
>     >>> a=strings.array(None, itemsize = 4, shape=1, kind=strings.RawCharArray)
>     >>> a[0] = '0\0\0\0'
>     >>> print repr(a >= '0\x00\x00\x00\x01')
>     array([0], type=Bool)
>
> You'll note that I "hand padded" the assigned value; because RawCharArray is a little-used feature, it needs more work. I think RawCharArray makes only partial or inconsistent use of NULL padding.
Well, I've already tried that, but what I would like is to be able to assign values *and* have them padded with NULLs. However, a RawCharArray does not allow this:

    >>> a=strings.array(None, itemsize = 4, shape=1, kind=strings.RawCharArray)
    >>> a
    RawCharArray([' '])
    >>> a[0] = str(0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/local/lib/python2.4/site-packages/numarray/strings.py", line 185, in _setitem
        where[bo:bo+self._itemsize] = self.pad(value)[0:self._itemsize]
    TypeError: right operand length must match slice length
> Give RawCharArray a try; it *is* the basis of CharArray, so it basically works but there will likely be a few issues to sort out. My guess is that anything that really needs fixing can be added for numarray-1.2.
Mmm, perhaps having the possibility to select the pad value at CharArray creation time would be nice.

Cheers,

--
Francesc Altet
http://www.carabos.com/
Cárabos Coop. V.
Enjoy Data
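
The idea of choosing the pad value up front can be sketched in plain Python as follows. This is not numarray code: `pad_item` and `set_item` are hypothetical helpers, and a `bytearray` stands in for the array's storage. The point is that a short value is always brought to exactly `itemsize` bytes before it is written, so the length mismatch seen in the traceback above cannot occur.

```python
def pad_item(value: bytes, itemsize: int, padc: bytes = b" ") -> bytes:
    """Truncate or right-pad value to exactly itemsize bytes (hypothetical helper)."""
    if len(padc) != 1:
        raise ValueError("padc must be a single byte")
    return value[:itemsize].ljust(itemsize, padc)


def set_item(buf: bytearray, index: int, value: bytes,
             itemsize: int, padc: bytes = b" ") -> None:
    """Write value into slot index of a flat fixed-width buffer (hypothetical helper)."""
    start = index * itemsize
    buf[start:start + itemsize] = pad_item(value, itemsize, padc)


# Two slots of four bytes each, initially filled with spaces.
buf = bytearray(b" " * 8)
set_item(buf, 1, b"0", itemsize=4, padc=b"\x00")
print(bytes(buf))  # b'    0\x00\x00\x00'
```

Keeping the space as the default pad byte mirrors the existing CharArray behaviour, so only callers that ask for a different pad byte see a change.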
On Monday 03 January 2005 22:25, Francesc Altet wrote:
> Mmm, perhaps having the possibility to select the pad value at CharArray creation time would be nice.

I've ended up implementing this in numarray. With the attached patches (against numarray 1.1.1), the following works:
    >>> b=strings.array(['0'], itemsize = 4, padc="\x00")
    >>> b.raw()
    RawCharArray(['0\x00\x00\x00'])
    >>> b.raw() >= '0\x00\x00\x00\x01'
    array([0], type=Bool)
While the actual behaviour in numarray 1.1.1 is:

    >>> b=strings.array(['0'], itemsize = 4)
    >>> b.raw()
    RawCharArray(['0   '])
    >>> b.raw() >= '0\x00\x00\x00\x01'
    array([1], type=Bool)
As you may have already noted, I've added a new parameter named padc to the CharArray/RawCharArray constructor. The default pad character is the space (" "), for backward compatibility. All the current tests for CharArray pass with the patch applied.

The new functionality is restricted to what I needed, but I guess it could easily be extended to be completely consistent in other cases. Feel free to add the patch to numarray if you find it appropriate.

Cheers,

--
Francesc Altet
http://www.carabos.com/
Cárabos Coop. V.
Enjoy Data
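
For readers without numarray at hand, the two sessions above can be mimicked with a small stand-in class. This is a rough sketch and not the actual patch: `PaddedStrings`, its `raw()` and its `ge()` method are hypothetical names chosen for illustration, and only the `padc` keyword with its space default is taken from the message.

```python
class PaddedStrings:
    """Toy fixed-width string container with a selectable pad byte (hypothetical)."""

    def __init__(self, values, itemsize, padc=b" "):
        if len(padc) != 1:
            raise ValueError("padc must be a single byte")
        self.itemsize = itemsize
        self.padc = padc
        # Every item is stored truncated or padded to exactly itemsize bytes.
        self._items = [v[:itemsize].ljust(itemsize, padc) for v in values]

    def raw(self):
        """Return the stored items, padding included."""
        return list(self._items)

    def ge(self, other):
        """Element-wise >= against a single byte string, on the raw items."""
        return [item >= other for item in self._items]


probe = b"0\x00\x00\x00\x01"
nul_padded = PaddedStrings([b"0"], itemsize=4, padc=b"\x00")
space_padded = PaddedStrings([b"0"], itemsize=4)  # default pad byte: space

print(nul_padded.raw(), nul_padded.ge(probe))      # [b'0\x00\x00\x00'] [False]
print(space_padded.raw(), space_padded.ge(probe))  # [b'0   '] [True]
```

The two printed results correspond to the `array([0], type=Bool)` and `array([1], type=Bool)` outcomes shown in the sessions above.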
In some kind of cosmic irony, your bona fide patch was filed as Junk Mail by my filter. Anyway, thanks, it's committed in CVS. I added the extra code to handle the PadAll case you flagged as "to be corrected."

Regards,
Todd

On Thu, 2005-01-06 at 12:53, Francesc Altet wrote:
> On Monday 03 January 2005 22:25, Francesc Altet wrote:
> > Mmm, perhaps having the possibility to select the pad value at CharArray creation time would be nice.
>
> I've ended up implementing this in numarray. With the attached patches (against numarray 1.1.1), the following works:
>
>     >>> b=strings.array(['0'], itemsize = 4, padc="\x00")
>     >>> b.raw()
>     RawCharArray(['0\x00\x00\x00'])
>     >>> b.raw() >= '0\x00\x00\x00\x01'
>     array([0], type=Bool)
>
> While the actual behaviour in numarray 1.1.1 is:
>
>     >>> b=strings.array(['0'], itemsize = 4)
>     >>> b.raw()
>     RawCharArray(['0   '])
>     >>> b.raw() >= '0\x00\x00\x00\x01'
>     array([1], type=Bool)
>
> As you may have already noted, I've added a new parameter named padc to the CharArray/RawCharArray constructor. The default pad character is the space (" "), for backward compatibility. All the current tests for CharArray pass with the patch applied.
>
> The new functionality is restricted to what I needed, but I guess it could easily be extended to be completely consistent in other cases. Feel free to add the patch to numarray if you find it appropriate.
>
> Cheers,
> --