Incorrect removal of NULL char in buffers
Hi,
I'm trying to build up numpy arrays from buffers, and I'm getting a somewhat unexpected result.
First, for numeric values, everything seems OK (i.e. the NULL character is correctly interpreted), and it works the same in both numarray and numpy:
In [98]: numarray.array("a\x00b"*4, dtype='Float32',shape=3)
Out[98]: array([ 2.60561966e+20, 8.94319890e-39, 5.92050103e+20], type=Float32)
In [99]: numpy.ndarray(buffer="a\x00b"*4, dtype='Float32',shape=3)
Out[99]: array([ 2.60561966e+20, 8.94319890e-39, 5.92050103e+20], dtype=float32)
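The numeric reading of the buffer can be cross-checked without numarray or numpy. This is only a sketch using the standard-library struct module, and it assumes little-endian 32-bit floats (the "<3f" format), which is a guess about the machine used above:

```python
import struct

buf = b"a\x00b" * 4  # the same 12-byte buffer used in the thread

# Assumption: little-endian float32 ("<3f"), as on a typical x86 machine.
values = struct.unpack("<3f", buf)

# Show each 4-byte chunk next to the float it decodes to.
chunks = [buf[i:i + 4] for i in range(0, len(buf), 4)]
for chunk, value in zip(chunks, values):
    print(chunk, value)
```

The middle chunk starts and ends with a NUL byte, so it decodes to a very small (denormal) float rather than being dropped.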
However, for string values, numpy seems to behave strangely. numarray behaves as I would expect, IMO:
In [100]: numarray.strings.array(buffer="a\x00b"*4, itemsize=4, shape=3)
Out[100]: CharArray(['a', '', 'ba'])
but numpy doesn't:
In [101]: numpy.ndarray(buffer="a\x00b"*4, dtype="S4", shape=3)
Out[101]: array([aba, ba, bab], dtype='S4')
i.e. it seems that numpy is stripping off NULL chars before building the object, and I don't think this is correct.
Cheers,
Francesc Altet wrote:
> Hi,
> However, for string values, numpy seems to behave strangely. numarray behaves as I would expect, IMO:
> In [100]: numarray.strings.array(buffer="a\x00b"*4, itemsize=4, shape=3)
> Out[100]: CharArray(['a', '', 'ba'])
I'm not sure why you think this is "expected." You have non-terminating NULLs in this array, and yet they are not printing for you.
Just look at the tostring()...
> but numpy doesn't:
> In [101]: numpy.ndarray(buffer="a\x00b"*4, dtype="S4", shape=3)
> Out[101]: array([aba, ba, bab], dtype='S4')
> i.e. it seems that numpy is stripping off NULL chars before building the object, and I don't think this is correct.
Hmmm. I don't see that at all. This is what I get (version of numpy is 1.0.dev3233):
In [33]: numpy.ndarray(buffer="a\x00b"*4, dtype="S4", shape=3)
Out[33]: array(['a\x00ba', '\x00ba', 'ba\x00b'], dtype='S4')
which to me is very much expected, i.e. only terminating NULLs are stripped off of the strings on printing. I think you are getting different results because string printing used to not include the quotes (which had the side effect of not printing NULLs in the middle of strings). They are still there, just not showing up in your output.
In the end both numarray and numpy have the same data stored internally. It's just a matter of how it is being printed that seems to differ a bit. From my perspective, only NULLs at the end of strings should be stripped off and that is the (current) behavior of NumPy.
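The point that only trailing NULLs are dropped on access, while the stored bytes stay intact, can be checked with a short sketch. It assumes a current NumPy on Python 3, so the buffer is a bytes object and tobytes() stands in for the tostring() mentioned above:

```python
import numpy as np

buf = b"a\x00b" * 4  # 12 bytes: a \0 b a \0 b a \0 b a \0 b

# View the buffer as three fixed-width 4-byte strings, as in the thread.
arr = np.ndarray(buffer=buf, dtype="S4", shape=3)

# Element access strips only the trailing NULs of each item;
# NULs at the start or in the middle are kept.
items = arr.tolist()
print(items)

# The underlying storage still holds every byte of the original buffer.
raw = arr.tobytes()
print(raw == buf)
```

The second item occupies the bytes \0 b a \0 in memory, but comes back as a 3-byte string because its final NULL is treated as padding.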
You are getting different results because the array printing for strings was recently updated (to insert the quotes so that it makes more sense). Without these changes, I think the NULLs were being stripped away on printing. In other words, something like
print 'a\x00ba'
aba
used to be happening.
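A small sketch of why a plain print hides the NULL while a quoted repr does not (written for Python 3; the `visible` variable is only an illustration of what a terminal that does not render NUL would show, not something print itself computes):

```python
s = "a\x00ba"  # four characters; the second one is NUL

# The NUL is part of the string and counts toward its length.
length = len(s)
print(length)

# repr() escapes the NUL, which is what quoted array printing relies on.
quoted = repr(s)
print(quoted)

# Hypothetical stand-in for a terminal that silently skips NUL bytes.
visible = s.replace("\x00", "")
print(visible)
```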
Travis
participants (2)

Francesc Altet

Travis Oliphant