Stephen J. Turnbull wrote:
> Martin> With the UTF-8-SIG codec, it would apply to all operation modes of the codec, whether stream-based or from strings.
>
> I had in mind the ability to treat a string as a stream.
Hmm. A string is not a stream, but it could be the contents of a stream.
A typical application of codecs goes like this:
    data = stream.read()
    [analyze data, e.g. by checking whether there is encoding= in <?xml...]
    data = data.decode(encoding analyzed)
So people do use the "decode-it-all" mode, where no sequential access is necessary - yet the beginning of the string is still the beginning of what once was a stream. This case must be supported.
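To make the "decode-it-all" case concrete, here is a rough sketch of what such an application might look like. It assumes a Python 3 interpreter in which the utf-8-sig codec is available, and the helper name decode_all and its simplistic regex sniffing are made up for illustration; real XML encoding detection is more involved.

```python
import codecs
import io
import re

def decode_all(raw, default="utf-8"):
    """Decode a complete byte string that was read from a stream in one go.

    Hypothetical helper: the sniffing below only looks for encoding="..."
    in an XML declaration at the very start of the data.
    """
    match = re.match(
        rb'(?:\xef\xbb\xbf)?<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']',
        raw,
    )
    encoding = match.group(1).decode("ascii") if match else default
    # The beginning of the string is still the beginning of the stream,
    # so for UTF-8 pick the codec that skips a leading signature (BOM).
    if codecs.lookup(encoding).name == "utf-8":
        encoding = "utf-8-sig"
    return raw.decode(encoding)

# Simulate a stream whose content starts with the UTF-8 signature.
stream = io.BytesIO(codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?><a/>')
data = stream.read()
text = decode_all(data)
```

No sequential access is involved here; the signature is dropped simply because the codec is applied to the start of the data.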
> Martin> Whether or not to use the codec would be the application's choice.
>
> What I think should be provided is a stateful object encapsulating the codec. I.e., to avoid the need to write
>
>     out = chunk.encode("utf-8-sig") + chunk.encode("utf-8")
No. People who want streaming should use cStringIO, i.e.
    import cStringIO, codecs
    s = cStringIO.StringIO()
    s1 = codecs.getwriter("utf-8")(s)   # stream writer wrapping the buffer
    s1.write(u"Hallo")
    s.getvalue()
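For comparison, the same pattern with the signature-writing codec might look as follows. This is only a sketch, assuming Python 3, where io.BytesIO plays the role of cStringIO and utf-8-sig is part of the standard library; the variable names are arbitrary.

```python
import codecs
import io

buffer = io.BytesIO()                           # in-memory byte stream
writer = codecs.getwriter("utf-8-sig")(buffer)  # stateful stream writer

# The writer keeps track of whether the signature has been emitted,
# so the chunks can be written without special-casing the first one.
writer.write("Hal")
writer.write("lo")
result = buffer.getvalue()

# What the manual, per-chunk approach would have to spell out instead.
manual = "Hal".encode("utf-8-sig") + "lo".encode("utf-8")
```

Both result and manual should hold the signature once, followed by the encoded text; the writer object is what carries the state between chunks.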