[Python-Dev] Unicode byte order mark decoding

Stephen J. Turnbull stephen at xemacs.org
Tue Apr 5 14:03:19 CEST 2005


>>>>> "Martin" == Martin v Löwis <martin at v.loewis.de> writes:

    Martin> Stephen J. Turnbull wrote:

    >> However, this option should be part of the initialization of an
    >> IO stream which produces Unicodes, _not_ an operation on
    >> arbitrary internal strings (whether raw or Unicode).

    Martin> With the UTF-8-SIG codec, it would apply to all operation
    Martin> modes of the codec, whether stream-based or from strings.

I had in mind the ability to treat a string as a stream.
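As an illustration in modern Python 3 terms, here is a minimal sketch. It assumes the `io` module and the "utf-8-sig" codec, both of which arrived in later Python releases than this thread. The point is that a string can be handled either by decoding it in one step or by wrapping it as a stream, where the codec is chosen once, when the stream is initialized:

```python
import io

# Bytes as they might arrive from a file: a UTF-8 BOM, then the text.
raw = b"\xef\xbb\xbfhello, world"

# Plain "utf-8" keeps the BOM as U+FEFF; "utf-8-sig" strips it.
with_bom = raw.decode("utf-8")
without_bom = raw.decode("utf-8-sig")

# Treating the same bytes as a stream: the codec is chosen once, when the
# text wrapper is created, and the BOM is consumed only at the start.
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig")
first = stream.read(5)
rest = stream.read()
```

Later reads on the stream never see a BOM again, because the wrapper's decoder keeps that state.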

    Martin> Whether or not to use the codec would be the application's
    Martin> choice.

What I think should be provided is a stateful object encapsulating the
codec, i.e., something that avoids the need to write

    out = chunk[0].encode("utf-8-sig") + chunk[1].encode("utf-8")

    
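Here is a minimal sketch of such a stateful object. It assumes Python 3, where `codecs.getincrementalencoder` and the "utf-8-sig" codec exist; both postdate this message. The chunk list is hypothetical. The encoder remembers whether it has already emitted the BOM, so every chunk goes through the same object and the caller no longer special-cases the first chunk as in the line above:

```python
import codecs

# A stateful "utf-8-sig" encoder: it writes the BOM on its first call only,
# so every chunk can be fed through the same object.
encoder = codecs.getincrementalencoder("utf-8-sig")()

chunks = ["first chunk, ", "second chunk"]  # hypothetical input pieces
out = b"".join(encoder.encode(chunk) for chunk in chunks)
out += encoder.encode("", final=True)  # flush any pending state

# The matching stateful decoder strips the BOM only at the very start,
# even when the bytes arrive in small pieces (here, 4 bytes at a time).
decoder = codecs.getincrementaldecoder("utf-8-sig")()
text = "".join(decoder.decode(out[i:i + 4]) for i in range(0, len(out), 4))
text += decoder.decode(b"", final=True)
```

The output bytes match what the hand-written expression produces. The difference is that "is this the first chunk?" is tracked by the codec object rather than by the caller.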

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

