[Python-Dev] Unicode byte order mark decoding
"Martin v. Löwis"
martin at v.loewis.de
Wed Apr 6 08:06:08 CEST 2005
Stephen J. Turnbull wrote:
> Of course it must be supported. My point is that many strings (in my
> applications, all but those strings that result from slurping in a
> file or process output in one go -- example, not a statistically valid
> sample!) are not the beginning of "what once was a stream". It is
> error-prone (not to mention unaesthetic) to not make that distinction.
>
> "Explicit is better than implicit."
I can't put these two paragraphs together. If you think that explicit
is better than implicit, why do you not want to make different calls
for the first chunk of a stream, and the subsequent chunks?
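To illustrate what "different calls" could look like, here is a minimal sketch in present-day Python 3 (not the 2005-era API). The helper name decode_chunks is hypothetical, and it assumes every chunk ends on a character boundary:

```python
import codecs

def decode_chunks(chunks):
    """Decode byte chunks, treating only the first one as a stream start.

    Hypothetical helper; assumes no multi-byte character is split
    across two chunks.
    """
    decoded = []
    for index, chunk in enumerate(chunks):
        # Explicit choice: strip a signature only at the stream start.
        encoding = "utf-8-sig" if index == 0 else "utf-8"
        decoded.append(chunk.decode(encoding))
    return decoded

parts = decode_chunks([codecs.BOM_UTF8 + b"Hallo", codecs.BOM_UTF8 + b"Welt"])
```

With these assumptions, the leading signature of the first chunk is dropped, while the same three bytes in a later chunk survive as U+FEFF.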
> >>> s=cStringIO.StringIO()
> >>> s1=codecs.getwriter("utf-8")(s)
> >>> s1.write(u"Hallo")
> >>> s.getvalue()
> 'Hallo'
>
> Yes! Exactly (except in reverse, we want to _read_ from the slurped
> stream-as-string, not write to one)! ... and there's no need for a
> utf-8-sig codec for strings, since you can support the usage in
> exactly this way.
However, if there is a utf-8-sig codec for streams, there is currently
no way of *preventing* this codec from also being available for strings.
The very same code is used for streams and for strings, and automatically
so.
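A small sketch of that point, written for present-day Python 3 (where a utf-8-sig codec does exist) rather than for the interpreter under discussion here; the variable names are only illustrative:

```python
import codecs
import io

data = codecs.BOM_UTF8 + "Hallo".encode("utf-8")

# Stream usage: wrap a byte stream in the codec's StreamReader.
reader = codecs.getreader("utf-8-sig")(io.BytesIO(data))
from_stream = reader.read()

# String usage: the same codec name is reachable through decode().
from_string = data.decode("utf-8-sig")
```

Both paths go through the same codec registration, so both drop the signature.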
Regards,
Martin