"Martin" == Martin v Löwis firstname.lastname@example.org writes:
Martin> So people do use the "decode-it-all" mode, where no
Martin> sequential access is necessary - yet the beginning of the
Martin> string is still the beginning of what once was a
Martin> stream. This case must be supported.
Of course it must be supported. My point is that many strings (in my applications, all but those strings that result from slurping in a file or process output in one go -- example, not a statistically valid sample!) are not the beginning of "what once was a stream". It is error-prone (not to mention unaesthetic) to not make that distinction.
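To make the distinction concrete, here is a minimal sketch in current Python 3, assuming the bytes were slurped from a file in one go (the variable names are illustrative only). The standard "utf-8-sig" codec strips a leading BOM when decoding a whole buffer. Plain "utf-8" keeps it as an ordinary U+FEFF character, which is what you want for a string that is not the start of a stream.

```python
# Bytes as they might arrive from slurping a whole file that starts with a BOM.
slurped = b"\xef\xbb\xbfHallo"

# Treat the buffer as the beginning of what was once a stream: BOM is consumed.
text_from_stream = slurped.decode("utf-8-sig")

# Treat it as an arbitrary string: a leading U+FEFF survives as a character.
text_plain = slurped.decode("utf-8")

print(repr(text_from_stream))  # 'Hallo'
print(repr(text_plain))        # '\ufeffHallo'
```

The caller states explicitly which case applies by choosing the codec, rather than having the decoder guess.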
"Explicit is better than implicit."
Martin> Whether or not to use the codec would be the application's
Martin> choice.
>> What I think should be provided is a stateful object
>> encapsulating the codec, so as to avoid the need to write
>> out = chunk.encode("utf-8-sig") + chunk.encode("utf-8")
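For comparison, current Python's codecs module offers a stateful object of roughly this kind: codecs.getincrementalencoder("utf-8-sig") returns an encoder class whose instances emit the BOM only on their first encode() call. This is a minimal sketch, assuming the text arrives as a list of chunks; the chunk contents are made up for illustration.

```python
import codecs

# Hypothetical input: text arriving in several pieces.
chunks = ["Hal", "lo, ", "world"]

# One encoder instance remembers whether the BOM has been written,
# so the caller does not special-case the first chunk.
encoder = codecs.getincrementalencoder("utf-8-sig")()
out = b"".join(encoder.encode(c) for c in chunks)
out += encoder.encode("", final=True)  # flush; produces no extra bytes here

print(out)  # b'\xef\xbb\xbfHallo, world'
```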
Martin> No. People who want streaming should use cStringIO, i.e.
Martin> import codecs, cStringIO
Martin> s = cStringIO.StringIO()
Martin> s1 = codecs.getwriter("utf-8")(s)
Martin> s1.write(u"Hallo")
Martin> s.getvalue()
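The recipe quoted above is Python 2 (cStringIO). A rough Python 3 translation uses io.BytesIO instead. This sketch assumes the "utf-8-sig" writer rather than the plain "utf-8" one in the quote, to show that the stream writer keeps state and writes the BOM only once across several write() calls.

```python
import codecs
import io

buf = io.BytesIO()                           # in-memory byte stream (cStringIO's role)
writer = codecs.getwriter("utf-8-sig")(buf)  # StreamWriter instance wrapping buf
writer.write("Hallo")
writer.write(", world")                      # no second BOM: the writer is stateful

print(buf.getvalue())  # b'\xef\xbb\xbfHallo, world'
```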
Yes! Exactly (except in reverse, we want to _read_ from the slurped stream-as-string, not write to one)! ... and there's no need for a utf-8-sig codec for strings, since you can support the usage in exactly this way.
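The reading direction can be sketched the same way in Python 3: wrap the slurped bytes in io.BytesIO and iterate through a "utf-8-sig" stream reader, which drops a BOM only at the very start of the stream. The sample data is made up for illustration.

```python
import codecs
import io

# Whole file read in one go, starting with a UTF-8 BOM.
slurped = b"\xef\xbb\xbffirst line\nsecond line\n"

# Wrap the bytes as a stream and read it sequentially, line by line.
reader = codecs.getreader("utf-8-sig")(io.BytesIO(slurped))
lines = list(reader)

print(lines)  # ['first line\n', 'second line\n']
```

No string-level BOM handling is needed here: the stream reader alone decides what counts as the beginning of the stream.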