Stephen J. Turnbull wrote:
"Martin" == Martin v Löwis email@example.com writes:
Martin> I can't put these two paragraphs together. If you think Martin> that explicit is better than implicit, why do you not want Martin> to make different calls for the first chunk of a stream, Martin> and the subsequent chunks?
Because the signature/BOM is not a chunk, it's a header. Handling the signature/BOM is part of stream initialization, not translation, to my mind.
The point is that explicitly using a stream shows that initialization (and finalization) matter. The default can be BOM or not, as a pragmatic matter. But then the stream data itself can be treated homogeneously, as implied by the notion of stream.
I think it probably also would solve Walter's conundrum about buffering the signature/BOM if responsibility for that were moved out of the codecs and into the objects where signatures make sense.
Not really. In every encoding where a sequence of more than one byte maps to one Unicode character, you will always need some kind of buffering. If we remove the handling of initial BOMs from the codecs (except for UTF-16 where it is required), this wouldn't change any buffering requirements.
I don't know whether that's really feasible in the short run---I suspect there may be a lot of stream-like modules that would need to be updated---but it would be a saner in the long run.
I'm not exactly sure, what you're proposing here. That all codecs (even UTF-16) pass the BOM through and some other infrastructure is responsible for dropping it?
Bye, Walter Dörwald