[Python-Dev] just say no...

Guido van Rossum guido@CNRI.Reston.VA.US
Tue, 16 Nov 1999 08:45:17 -0500


> > Ah, ok. I interpreted 8-bit to mean: 8 bits in length, not
> > "8-bit clean" as you obviously did.
> 
> Hrm. That might be dangerous. Many of the functions that use "t#" assume
> that each character is 8 bits long, i.e. the returned length == the number
> of characters.
> 
> I'm not sure what the implications would be if you interpret the semantics
> of "t#" as multi-byte characters.

Hrm.  Can you quote examples of users of t# who would be confused by
multibyte characters?  I guess that there are quite a few places where
they will be considered illegal, but that's okay -- the string will be
parsed at some point and rejected, e.g. as an illegal filename,
hostname or whatever.  On the other hand, there are quite some places
where I would think that multibyte characters would do just the right
thing.  Many places using t# could just as well be using 's' except
they need to know the length and they don't want to call strlen().
In all cases I've looked at, the reason they need the length is that
they are allocating a buffer (or checking whether it fits in a
statically allocated buffer) -- and there the number of bytes in a
multibyte string is just fine.
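
To make the buffer-size point concrete, here is a minimal sketch.  It
assumes UTF-8 as the multibyte encoding (the thread does not name
one), and the names text, data, buf and fits are purely illustrative.
It only shows that the byte count, not the character count, is the
number that matters when sizing a buffer:

```python
# Illustrative sketch only; UTF-8 is an assumed multibyte encoding.
text = "na\u00efve"          # 5 characters (the third is i-diaeresis)
data = text.encode("utf-8")  # the multibyte form, as raw bytes

char_count = len(text)       # number of characters
byte_count = len(data)       # number of bytes; larger than char_count here

# A buffer allocated from the byte count holds the whole string.
buf = bytearray(byte_count)
buf[:] = data


def fits(data, capacity):
    """Hypothetical check: does data fit in a fixed-size buffer?"""
    return len(data) <= capacity
```

A buffer sized from char_count would be one byte too short for this
string, which is why a caller that only allocates or bounds-checks is
served correctly by the byte length.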

Note that I take the same stance on 's' -- it should return multibyte
characters.

> > What for ?
> 
> How about: "because I'm the application developer, and I say that I want
> the raw bytes in the file."

Here I'm with you, man!

> Greg Stein, http://www.lyra.org/

--Guido van Rossum (home page: http://www.python.org/~guido/)