numpy 00 character bug?

Nathaniel Rook nrook at
Fri Jun 5 18:14:10 CEST 2009

Hello, all!

I've recently encountered a bug in NumPy's string arrays, where the 00 
ASCII character ('\x00') is not stored properly when put at the end of a 
string.

For example:

Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> import numpy
 >>> print numpy.version.version
 >>> arr = numpy.empty(1, 'S2')
 >>> arr[0] = 'ab'
 >>> arr
 >>> arr[0] = 'c\x00'
 >>> arr

It seems that the string array is using the 00 character to pad strings 
shorter than the maximum size, and so it treats any 00 characters at the 
end of a string as padding.  As long as I never store strings shorter 
than the full width, no information is actually lost here, but I don't 
want to have to re-add my trailing 00s every time I read a value back 
from the array.
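The padding behaviour can be checked directly. This is a minimal sketch, assuming Python 3 and a recent NumPy (so 'S' elements come back as bytes rather than str, unlike the Python 2.5 session above). It suggests the NUL byte is still present in the array's buffer and is only dropped when an element is read back:

```python
import numpy as np

# One element of fixed-width, 2-byte strings, as in the session above.
arr = np.empty(1, dtype="S2")

arr[0] = b"ab"
full = arr[0]                # b'ab'

arr[0] = b"c\x00"
stripped = arr[0]            # b'c': trailing NULs are treated as padding on read

# The NUL byte is still stored; only element access strips it.
raw = arr.tobytes()          # b'c\x00'
codes = arr.view(np.uint8)   # the two underlying byte values, 99 and 0
```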

Is this a well-known bug already?  I couldn't find it on the NumPy bug 
tracker, but I could easily have missed it, or it could have been 
triaged and deemed acceptable because there's no better way to deal with 
arbitrary-length strings.  Is there an easy way to avoid this problem? 
Pretty much any performance-intensive part of my program is going to be 
dealing with these arrays, so I don't want to just replace them with a 
slower dictionary instead.
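One possible workaround that keeps the data in the same array is to read it through a uint8 view instead of indexing the 'S' array directly. This is a sketch, assuming Python 3, a recent NumPy and a C-contiguous 1-D array; `raw_bytes_view` is a hypothetical helper name, not a NumPy function. The view copies nothing, and each row holds the full fixed-width value, trailing NULs included, so whole-array operations stay vectorized:

```python
import numpy as np

def raw_bytes_view(arr):
    """Return an (n, itemsize) uint8 view of a 1-D 'S' array.

    Hypothetical helper: no data is copied, and row i holds every byte of
    element i, including trailing NULs. Requires a contiguous array.
    """
    return arr.view(np.uint8).reshape(len(arr), arr.itemsize)

arr = np.zeros(3, dtype="S4")
arr[0] = b"ab\x00\x00"
arr[1] = b"\x00\x01\x00\x00"
arr[2] = b"wxyz"

raw = raw_bytes_view(arr)
first = arr[0]                  # b'ab': element access strips trailing NULs
first_raw = raw[0].tobytes()    # b'ab\x00\x00': the full 4-byte value
second_raw = raw[1].tobytes()   # b'\x00\x01\x00\x00'

# Whole-array work runs on the view with no per-element Python calls,
# e.g. counting NUL bytes in each element.
nul_counts = (raw == 0).sum(axis=1)
```

Since the view shares memory with `arr`, writes through either one are visible in the other; the trade-off is that values are handled as byte rows rather than Python strings.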

I can't imagine this issue hasn't come up before; I encountered it by 
using NumPy arrays to store Python structs, something I can imagine is 
done fairly often.  As such, I apologize for bringing it up again!
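For the struct use case, the stripping matters because `struct.unpack` needs exactly `struct.calcsize(fmt)` bytes. This is a minimal sketch, assuming Python 3, a recent NumPy and a hypothetical record format `'<hh'` (two little-endian 16-bit integers). The value returned by normal indexing turns out too short to unpack, while a one-element slice's `tobytes()` keeps all four bytes. A NumPy structured dtype with the same packed layout is shown as a possible alternative that avoids the string round trip altogether:

```python
import struct
import numpy as np

FMT = "<hh"  # hypothetical record layout: two little-endian int16 fields
packed = struct.pack(FMT, 1, 0)            # b'\x01\x00\x00\x00'

arr = np.zeros(1, dtype="S%d" % struct.calcsize(FMT))
arr[0] = packed

stripped = arr[0]                          # b'\x01': trailing NULs dropped
try:
    struct.unpack(FMT, stripped)
    unpack_failed = False
except struct.error:                       # buffer has the wrong length
    unpack_failed = True

# A one-element slice keeps the full itemsize, so unpacking succeeds.
fields = struct.unpack(FMT, arr[0:1].tobytes())   # (1, 0)

# Alternative: store the fields directly in a structured dtype whose
# unaligned layout matches FMT byte for byte.
rec = np.zeros(1, dtype=[("a", "<i2"), ("b", "<i2")])
rec[0] = (1, 0)
same_bytes = rec.tobytes() == packed
```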

