Incorrect removal of NULL char in buffers
Hi,

I'm trying to build up numpy arrays from buffers, and I'm getting a somewhat unexpected result. First, for numeric values everything seems OK (i.e. the NULL character is correctly interpreted), and it works the same for both numarray and numpy:

```
In [98]: numarray.array("a\x00b"*4, dtype='Float32',shape=3)
Out[98]: array([ 2.60561966e+20, 8.94319890e-39, 5.92050103e+20], type=Float32)

In [99]: numpy.ndarray(buffer="a\x00b"*4, dtype='Float32',shape=3)
Out[99]: array([ 2.60561966e+20, 8.94319890e-39, 5.92050103e+20], dtype=float32)
```

However, for string values numpy seems to behave strangely. numarray behaves as I'd expect, IMO:

```
In [100]: numarray.strings.array(buffer="a\x00b"*4, itemsize=4, shape=3)
Out[100]: CharArray(['a', '', 'ba'])
```

but numpy doesn't:

```
In [101]: numpy.ndarray(buffer="a\x00b"*4, dtype="S4", shape=3)
Out[101]: array([aba, ba, bab], dtype='|S4')
```

i.e. it seems like numpy is stripping off NULL chars before building the object, and I don't think this is correct.

Cheers,

--
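A minimal sketch of the same experiment in a current NumPy under Python 3 (an assumption: the thread used Python 2 and a 2006 development build, where the buffer was a `str`). `np.frombuffer` stands in for `numpy.ndarray(buffer=...)`, and `tobytes()` is the current name for what older releases called `tostring()`. It checks whether either the float32 view or the `S4` view alters the underlying bytes.

```python
import numpy as np

# The 12-byte buffer from the thread (Python 3 needs a bytes literal).
buf = b"a\x00b" * 4

# View the bytes as three float32 values (native byte order).
floats = np.frombuffer(buf, dtype=np.float32)

# View the same bytes as three fixed-width 4-byte strings.
strings = np.frombuffer(buf, dtype="S4")

# Both views reinterpret the bytes without filtering anything:
# the raw data round-trips exactly.
assert floats.tobytes() == buf
assert strings.tobytes() == buf

# Interior NULs survive element access; only the trailing NUL of the
# middle item (b"\x00ba\x00") is dropped when converting to Python bytes.
print(strings.tolist())  # [b'a\x00ba', b'\x00ba', b'ba\x00b']
```

Nothing is stripped when the array is built; the only NUL that disappears is the one at the end of an element.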
Francesc Altet wrote:
I'm not sure why you think this is "expected." You have non-terminating NULLs in this array and yet they are not printing for you. Just look at the tostring()...
Hmmm. I don't see that at all. This is what I get (numpy version 1.0.dev3233):

```
In [33]: numpy.ndarray(buffer="a\x00b"*4, dtype="S4", shape=3)
Out[33]: array(['a\x00ba', '\x00ba', 'ba\x00b'], dtype='|S4')
```

which to me is very much expected, i.e. only terminating NULLs are stripped off the strings on printing.

I think you are getting different results because string printing used not to include the quotes (which had the side effect of not printing NULLs in the middle of strings). They are still there, just not showing up in your output. In the end, numarray and numpy store the same data internally; only the way it is printed differs a bit.

From my perspective, only NULLs at the end of strings should be stripped off, and that is the (current) behavior of NumPy. You are getting different results because the array printing for strings was recently updated (to insert the quotes so that it makes more sense). Without these changes, I think the NULLs were being stripped away on printing. In other words, something like

```
print 'a\x00ba'
aba
```

is what used to be happening.

-Travis
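As a sketch, the three string outputs quoted in this thread can be reproduced from the same 12 bytes with plain Python slicing. The helper `chunks` is hypothetical. Two points are assumptions inferred from the outputs above, not from either library's source: that the old unquoted printing showed NULs as nothing on the terminal, and that numarray's display stopped at the first NUL.

```python
import numpy as np

buf = b"a\x00b" * 4
ITEMSIZE = 4  # bytes per "S4" element


def chunks(data, itemsize):
    """Hypothetical helper: cut a buffer into fixed-width records."""
    return [data[i:i + itemsize] for i in range(0, len(data), itemsize)]


raw = chunks(buf, ITEMSIZE)

# NumPy's rule as Travis describes it: drop trailing NULs only.
numpy_items = [c.rstrip(b"\x00") for c in raw]
assert numpy_items == np.frombuffer(buf, dtype="S4").tolist()

# Assumed unquoted display: interior NULs render as nothing,
# which matches the old output [aba, ba, bab].
unquoted_display = [c.replace(b"\x00", b"") for c in numpy_items]

# Assumed C-string display (stop at the first NUL),
# which matches numarray's ['a', '', 'ba'].
c_string_display = [c.split(b"\x00", 1)[0] for c in raw]
```

Only `numpy_items` is data that NumPy actually returns; the other two lists differ from it purely in how the same bytes are rendered, which is the point of the reply above.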
participants (2)
- Francesc Altet
- Travis Oliphant