Re: [Python-Dev] Unicode byte order mark decoding

6 Apr 2005


Stephen J. Turnbull wrote:
...
Of course it must be supported.  My point is that many strings (in my
applications, all but those strings that result from slurping in a
file or process output in one go -- example, not a statistically valid
sample!) are not the beginning of "what once was a stream".  It is
error-prone (not to mention unaesthetic) to not make that distinction.
"Explicit is better than implicit."

I can't put these two paragraphs together. If you think that explicit
is better than implicit, why do you not want to make different calls
for the first chunk of a stream, and the subsequent chunks?
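A minimal sketch of what "different calls" could look like, written for a current Python 3 in which a utf-8-sig codec exists (an assumption relative to this thread, where the codec is still being discussed). The chunk boundary and the variable names are made up for illustration, and the data is ASCII so that no character is split across chunks:

```python
import codecs

# A stream that starts with a UTF-8 signature, split into two chunks.
data = codecs.BOM_UTF8 + "Hallo Welt".encode("utf-8")
first, rest = data[:8], data[8:]

# Explicit: only the first chunk is decoded with the signature-aware codec.
text = first.decode("utf-8-sig") + rest.decode("utf-8")

# For comparison, plain utf-8 keeps the signature as U+FEFF in the result.
naive = first.decode("utf-8")
```

Here `text` is the payload without the signature, while `naive` still begins with U+FEFF.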
...
...
...
...
>>> import codecs, cStringIO
>>> s = cStringIO.StringIO()
>>> s1 = codecs.getwriter("utf-8")(s)
>>> s1.write(u"Hallo")
>>> s.getvalue()
'Hallo'

Yes!  Exactly (except in reverse, we want to _read_ from the slurped
stream-as-string, not write to one)!  ... and there's no need for a
utf-8-sig codec for strings, since you can support the usage in
exactly this way.

However, if there is a utf-8-sig codec for streams, there is currently
no way of *preventing* this codec from also being available for strings.
The very same code is used for streams and for strings, and automatically
so.
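
To illustrate the point, here is a small sketch in current Python 3, where a utf-8-sig codec is part of the standard library (an assumption relative to this thread). It uses `io.BytesIO` in place of `cStringIO`; the variable names are illustrative only. The same codec name serves both the one-shot string call and the stream reader:

```python
import codecs
import io

# Bytes as they might be slurped from a file that starts with a signature.
raw = codecs.BOM_UTF8 + "Hallo".encode("utf-8")

# String-style use: one call on the whole byte string.
from_string = raw.decode("utf-8-sig")

# Stream-style use: wrap the same bytes in a reader for the same codec.
reader = codecs.getreader("utf-8-sig")(io.BytesIO(raw))
from_stream = reader.read()
```

Both `from_string` and `from_stream` hold the text without the signature, since registering a codec makes it reachable through either interface.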

Regards,
Martin

"Martin v. Löwis"