[Python-Dev] Unicode byte order mark decoding
Stephen J. Turnbull
stephen at xemacs.org
Tue Apr 5 14:03:19 CEST 2005
>>>>> "Martin" == Martin v Löwis <martin at v.loewis.de> writes:
Martin> Stephen J. Turnbull wrote:
>> However, this option should be part of the initialization of an
>> IO stream which produces Unicodes, _not_ an operation on
>> arbitrary internal strings (whether raw or Unicode).
Martin> With the UTF-8-SIG codec, it would apply to all operation
Martin> modes of the codec, whether stream-based or from strings.
I had in mind the ability to treat a string as a stream.
Martin> Whether or not to use the codec would be the application's
Martin> choice.
What I think should be provided is a stateful object encapsulating the
codec. I.e., to avoid the need to write
    out = chunk[0].encode("utf-8-sig") + chunk[1].encode("utf-8")
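A minimal sketch of what such a stateful object could look like, assuming a Python version whose codecs module provides incremental encoders (codecs.getincrementalencoder) and the "utf-8-sig" codec. The helper name encode_chunks is hypothetical and used only for illustration; it is not something proposed in this thread.

```python
import codecs


def encode_chunks(chunks, encoding="utf-8-sig"):
    """Encode an iterable of text chunks with one stateful encoder.

    Hypothetical helper: the encoder object remembers whether the
    signature (BOM) has already been written, so the caller does not
    have to switch codecs between the first chunk and the rest.
    """
    encoder = codecs.getincrementalencoder(encoding)()
    parts = [encoder.encode(chunk) for chunk in chunks]
    # Flush any state the encoder may still be holding.
    parts.append(encoder.encode("", final=True))
    return b"".join(parts)


chunk = ["abc", "def"]
out = encode_chunks(chunk)
```

With this, the signature is emitted once, by the first call on the encoder object, and later chunks are encoded as plain UTF-8, which is the same result as the hand-written two-codec expression.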
--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.