[I18n-sig] JapaneseCodecs 1.1 with ISO-2022-JP codec
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Fri, 24 Nov 2000 22:28:38 +0100
> ReadStream.readline() reads lines by calling readline() of
> the underlying stream object, and converts them into Unicode
> objects. Therefore, line breaking is done in the layer of
> native encodings. I believe it works well for at least the
> three Japanese encodings.
That is the case that may cause trouble. Consider u"Hello\nWorld" in
UTF-16LE; it is encoded as the bytes
H \0 e \0 l \0 l \0 o \0 \n \0 W \0 r \0 l \0 d \0
Now, if you do readline on the underlying stream, you get
H \0 e \0 l \0 l \0 o \0 \n
Passing that to the UTF-16 decoder raises an exception: this is an
odd number of bytes, which is illegal in UTF-16 (the read should have
ended after \n\0 instead).
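A minimal sketch of this failure, written for present-day Python 3 rather than the codec API under discussion. The io.BytesIO object merely stands in for the underlying byte stream; the variable names are illustrative only.

```python
import io

# The UTF-16LE bytes for "Hello\nWorld": every character takes two bytes.
data = "Hello\nWorld".encode("utf-16-le")

# A byte-level readline() stops right after the first 0x0A byte,
# which splits the two-byte newline character in half.
raw = io.BytesIO(data)
first = raw.readline()

try:
    first.decode("utf-16-le")
    failed = False
except UnicodeDecodeError:
    # The decoder rejects the dangling half of the newline character.
    failed = True
```

Here first holds 11 bytes, so the strict decoder has no way to complete the final code unit.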
I was merely asking for confirmation that this is not a problem in
your encodings (i.e. the byte \012 always means newline, no matter
where it appears in the encoding).
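One rough way to check that property is to encode a sample of characters and look for a stray \012 byte. The sketch below assumes Python 3 and its built-in shift_jis, euc_jp and iso2022_jp codecs, which are not necessarily the JapaneseCodecs implementations discussed here; the function name and the sample ranges are made up for illustration. It tests a sample and is not a proof.

```python
# Sample: printable ASCII, kana, and the CJK Unified Ideographs block.
# None of these characters is a newline.
sample = [
    chr(cp)
    for block in (range(0x20, 0x7F), range(0x3040, 0x3100), range(0x4E00, 0xA000))
    for cp in block
]

def lf_byte_is_unambiguous(encoding, chars):
    # True if no sampled character encodes to bytes containing 0x0A.
    for ch in chars:
        try:
            encoded = ch.encode(encoding)
        except UnicodeEncodeError:
            continue  # character not representable in this encoding
        if b"\n" in encoded:
            return False
    return True

results = {
    enc: lf_byte_is_unambiguous(enc, sample)
    for enc in ("shift_jis", "euc_jp", "iso2022_jp", "utf-16-le")
}
```

UTF-16LE is included only as a contrast: U+4E0A encodes there as the bytes 0x0A 0x4E, so the check fails for it, while the three Japanese encodings pass on this sample.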
> However, ReadStream.readlines() reads a Unicode character sequence
> by calling ReadStream.read() internally, and breaks it into a set of
> lines by looking for a CR. Is there a case that this does not work?
No, that should work just fine.
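For comparison, a sketch of the decode-first approach in Python 3. It is not the ReadStream code itself; readlines_via_unicode is a hypothetical helper. Note that str.splitlines() breaks on more than one kind of line terminator, which may differ from what the codec package does.

```python
import io

def readlines_via_unicode(raw, encoding):
    # Decode the whole stream first, then split on line boundaries
    # in the Unicode layer, so no character is ever cut in half.
    text = raw.read().decode(encoding)
    return text.splitlines(keepends=True)

raw = io.BytesIO("Hello\nWorld".encode("utf-16-le"))
lines = readlines_via_unicode(raw, "utf-16-le")
```

Because the split happens after decoding, this works even for UTF-16LE, where the byte-level readline() does not.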
Thanks,
Martin