[Numpy-discussion] Bytes vs. Unicode in Python3

Fri Nov 27 15:52:43 EST 2009

Anne Archibald wrote:

>>> I don't think it makes sense to handle format strings in Unicode
>>> internally -- they should always be coerced to bytes.
>> This should be fine -- we control what is a valid format string, and
>> thus they can always be ASCII-safe.
> 
> I have to disagree. Why should we force the user to use bytes?

One of us misunderstood that -- I THINK the idea was that internally 
numpy would use bytes (for easy conversion to/from char*), but they 
would get converted, so the user could pass in unicode strings (or 
bytes). I guess the question remains as to what you'd get when you 
printed a format string.

> Keep in mind that "coercing" strings to bytes
> requires extra information, namely the encoding.

But that is built in to the unicode object.

I think the idea is that a format string is ALWAYS ASCII -- if there are 
any other characters in there, it's an invalid format anyway.

Unless I misunderstand what a format string is. I think it's a string 
you use to represent a custom dtype -- is that right?
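
To make the coercion idea concrete, here is a rough sketch of what I have 
in mind. The helper name as_format_bytes is made up for illustration (it 
is not a numpy function), and it assumes that a valid format string is 
pure ASCII:

```python
def as_format_bytes(fmt):
    """Hypothetical helper: accept str or bytes, return ASCII-only bytes."""
    if isinstance(fmt, bytes):
        try:
            # Decoding is used only to check that every byte is ASCII.
            fmt.decode("ascii")
        except UnicodeDecodeError:
            raise ValueError("format string must be ASCII: %r" % (fmt,))
        return fmt
    if isinstance(fmt, str):
        try:
            # ASCII is the assumed encoding, so the caller never supplies one.
            return fmt.encode("ascii")
        except UnicodeEncodeError:
            raise ValueError("format string must be ASCII: %r" % (fmt,))
    raise TypeError("format string must be str or bytes")


print(as_format_bytes("<i4"))   # b'<i4'
print(as_format_bytes(b"<i4"))  # b'<i4'
```

With something like this the user can pass either type, and anything 
non-ASCII is rejected as an invalid format rather than silently encoded.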

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov