[Python-Dev] Unicode byte order mark decoding
Stephen J. Turnbull
stephen at xemacs.org
Wed Apr 6 11:31:21 CEST 2005
>>>>> "Martin" == Martin v Löwis <martin at v.loewis.de> writes:
Martin> I can't put these two paragraphs together. If you think
Martin> that explicit is better than implicit, why do you not want
Martin> to make different calls for the first chunk of a stream,
Martin> and the subsequent chunks?
Because the signature/BOM is not a chunk, it's a header. Handling the
signature/BOM is part of stream initialization, not translation, to my
mind.
The point is that explicitly using a stream shows that initialization
(and finalization) matter. The default can be BOM or not, as a
pragmatic matter. But then the stream data itself can be treated
homogeneously, as implied by the notion of stream.
I think it probably also would solve Walter's conundrum about
buffering the signature/BOM if responsibility for that were moved out
of the codecs and into the objects where signatures make sense.
I don't know whether that's really feasible in the short run---I
suspect there may be a lot of stream-like modules that would need to
be updated---but it would be saner in the long run.
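
To illustrate the header-versus-chunk distinction, here is a minimal
sketch in present-day Python 3. The class name SignatureStrippingReader
and its had_signature attribute are hypothetical, invented for this
example; only the codecs and io calls are standard library. The
signature is consumed once, at initialization, and every later read goes
through a plain utf-8 incremental decoder, so the codec itself never has
to buffer or recognize a BOM.

```python
import codecs
import io


class SignatureStrippingReader:
    """Hypothetical wrapper: handle a UTF-8 signature at initialization only."""

    def __init__(self, raw):
        self._raw = raw
        # Plain utf-8 decoder: it knows nothing about signatures.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        # Initialization: inspect the header position exactly once.
        head = raw.read(len(codecs.BOM_UTF8))
        self.had_signature = head == codecs.BOM_UTF8
        # Without a signature those bytes are ordinary data, so keep them.
        self._pending = b"" if self.had_signature else head

    def read(self, size=-1):
        # Translation: every chunk is treated the same way from here on.
        # Note: the first call may return slightly more than `size` bytes'
        # worth of text, because the held-back header bytes are prepended.
        chunk = self._raw.read(size)
        data = self._pending + chunk
        self._pending = b""
        at_eof = size < 0 or (size > 0 and not chunk)
        return self._decoder.decode(data, final=at_eof)


signed = io.BytesIO(codecs.BOM_UTF8 + "h\u00e9llo".encode("utf-8"))
reader = SignatureStrippingReader(signed)
text = reader.read()
```

Whether a missing signature is an error, or whether one is written on
output by default, stays a policy decision of the stream object, which
is the pragmatic default mentioned above.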
>> Yes! Exactly (except in reverse, we want to _read_ from the
>> slurped stream-as-string, not write to one)! ... and there's
>> no need for a utf-8-sig codec for strings, since you can
>> support the usage in exactly this way.
Martin> However, if there is a utf-8-sig codec for streams, there
Martin> is currently no way of *preventing* this codec from also
Martin> being available for strings. The very same code is used for
Martin> streams and for strings, and automatically so.
And of course it should be. But if it's not possible to move the -sig
facility out of the codecs into the streams, that would be a shame. I
think we should encourage people to use streams where initialization or
finalization semantics are non-trivial, as they are with signatures.
But as long as both utf-8-we-dont-need-no-steenkin-sigs-in-strings and
utf-8-sig are available, I can program as I want to (and refer those
whose strings get cratered by stray BOMs to you<wink>).
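
A small sketch of the difference, assuming a current Python 3 in which a
utf-8-sig codec is part of the standard library (the variable names are
only for illustration). It shows a stray BOM surviving in a string
decoded with plain utf-8, and the same -sig codec being reachable both
for byte strings and through the stream interface, here by reading from
a slurped stream-as-string.

```python
import codecs
import io

data = codecs.BOM_UTF8 + b"spam"

# Plain utf-8 treats the signature as data: a U+FEFF character survives.
plain = data.decode("utf-8")

# utf-8-sig drops a leading signature when decoding a byte string...
stripped = data.decode("utf-8-sig")

# ...and the same codec name is available through the stream interface.
from_stream = codecs.getreader("utf-8-sig")(io.BytesIO(data)).read()
```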
--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.