[Numpy-discussion] Empty strings not empty?

Tue Dec 29 19:57:50 EST 2009

On Wed, Dec 30, 2009 at 9:33 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
>> Ok, it looks like there are at least two issues:
>>  - if an item in a string array is set to '\x00', this seems to be
>> replaced with '', but '' != '\x00'
>
> Sorry - I'm afraid I don't understand

Compare this:

import numpy as np

x = "\00"  # a single NUL character, same as "\x00"
arr = np.array([x])
lst = [x]

arr[0] == x # False
arr[0] == "" # True

lst[0] == x # True
lst[0] == "" # False
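
The comparison above can be looked at from the storage side. This is a minimal sketch, not a definitive account of numpy's internals: it assumes Python 3 and uses a bytes literal to mirror the Python 2 str of this thread. It prints the dtype numpy picks, the raw buffer, and the item that indexing returns.

```python
import numpy as np

x = b"\x00"  # bytes, to mirror the Python 2 str used in this thread
arr = np.array([x])

print(arr.dtype)      # fixed-width bytes dtype chosen by numpy
print(arr.tobytes())  # raw buffer contents of the array
print(repr(arr[0]))   # item as returned by indexing
print(len(arr[0]))    # length of that returned item
```

The buffer still holds the NUL byte; it is the item returned by indexing that compares equal to the empty string.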

> It looks to me as though the
> buffer contents of [''] is a length 1 string with a 0 byte, and an
> array.size of 1 - is that also what you think?  I guess I think that
> it should be a length 0 string, with an array.size of 0

Array size of 0 would be very weird: it means it would have no items,
whereas it actually has one item (which itself has a size 0). If you
create a list with an empty string (x = [""]), you have len(x) == 1
and len(x[0]) == 0. But an empty string has size 0, so the
corresponding dtype should have an itemsize of 0 (assuming the array
only contains empty strings).
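
The distinction between the number of items and the size of each item can be sketched as follows. It assumes Python 3 with a bytes literal standing in for the Python 2 str; the itemsize numpy reports may depend on the numpy version.

```python
import numpy as np

lst = [""]
arr = np.array([b""])  # bytes, to mirror the Python 2 str of this thread

print(len(lst), len(lst[0]))          # number of items, length of the item
print(arr.size, arr.shape)            # number of items in the array
print(arr.dtype, arr.dtype.itemsize)  # per-item storage chosen by numpy
```

On the numpy versions this was checked against, the array has one item, as the list does, but the dtype reports an itemsize of 1 rather than 0.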

> I
> guess that I will have to special-case the writing code to detect
> 'empty' strings, but I can't (I don't think) distinguish a real string
> with \x00 from an empty string.

In Python "proper", they are different: "\x00" != "". The problem is
that it does not seem possible ATM to create a numpy array with an
empty string.
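
One possible way to keep the two values apart, offered only as a sketch and not as something proposed in this thread, is an object array, which stores references to the Python objects instead of fixed-width buffers. The names `values`, `fixed` and `boxed` are illustrative, and bytes literals again stand in for the Python 2 str.

```python
import numpy as np

values = [b"", b"\x00"]
fixed = np.array(values)                # fixed-width bytes dtype
boxed = np.array(values, dtype=object)  # references to the Python objects

fixed_same = fixed[0] == fixed[1]  # items as returned by the fixed-width array
boxed_same = boxed[0] == boxed[1]  # original Python bytes objects
print(fixed_same, boxed_same)
```

The trade-off is that an object array gives up the contiguous fixed-width buffer, so it may not suit code that writes the buffer out directly.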

David