[Python-Dev] just say no...

Greg Stein gstein@lyra.org
Fri, 12 Nov 1999 15:09:13 -0800 (PST)


On Fri, 12 Nov 1999, M.-A. Lemburg wrote:
> Fredrik Lundh wrote:
>...
> > why?  I don't understand why "s" and "s#" have
> > to deal with encoding issues at all...
> > 
> > > unless, of course, you want to give up Unicode object support
> > > for all APIs using these parsers.
> > 
> > hmm.  maybe that's exactly what I want...
> 
> If we don't add that support, lots of existing APIs won't
> accept Unicode objects instead of strings. While it could be
> argued that automatic conversion to UTF-8 is not transparent
> enough for the user, the other solution of using str(u)
> everywhere would probably make writing Unicode-aware code a
> rather clumsy task and introduce other pitfalls, since str(obj)
> calls PyObject_Str() which also works on integers, floats,
> etc.

No no no...

"s" and "s#" are NOT SUPPOSED TO return a UTF-8 encoding. They are
supposed to return the raw bytes.

If a caller wants 8-bit characters, then that caller will use "t#".

If you want to argue for that separate, encoded buffer, then argue for it
as support for the "t#" format. But do NOT say that it is needed for "s#",
which simply means "give me some bytes."
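
To make the distinction concrete, here is a rough Python sketch. It is not
the parser code, only an illustration. The helper names raw_bytes() and
encoded_8bit() are made up for this example, and modeling the internal
buffer as UTF-16 little-endian is an assumption, not a statement about the
actual implementation.

```python
# Illustrative only: two different byte views of the same Unicode text.
# Both helper names are hypothetical; they are not part of any Python API.

def raw_bytes(u):
    # "s#"-style view: hand back the object's bytes as stored.
    # Assumption: the internal buffer is modeled as UTF-16-LE, no BOM.
    return u.encode("utf-16-le")

def encoded_8bit(u):
    # "t#"-style view: hand back 8-bit character data.
    # Assumption: UTF-8 is the encoding chosen for that view.
    return u.encode("utf-8")

text = "h\u00e9llo"          # five characters, one of them non-ASCII
raw = raw_bytes(text)        # two bytes per character in this model
enc = encoded_8bit(text)     # one byte per ASCII char, two for U+00E9

print(len(raw), len(enc))
print(raw != enc)
```

The two buffers differ in both length and content, so a caller that asks
for "some bytes" and a caller that asks for 8-bit characters cannot be
served by the same buffer.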

-g

--
Greg Stein, http://www.lyra.org/