[I18n-sig] JapaneseCodecs 1.1 with ISO-2022-JP codec

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Fri, 24 Nov 2000 22:28:38 +0100


> ReadStream.readline() reads lines by calling readline() of
> the underlying stream object, and converts them into Unicode
> objects.  Therefore, line breaking is done in the layer of
> native encodings.  I believe it works well for at least the
> three Japanese encodings.

That is the case that may cause trouble. Consider u"Hello\nWorld"
encoded as UTF-16LE; the bytes are

  H \0 e \0 l \0 l \0 o \0 \n \0 W \0 o \0 r \0 l \0 d \0

Now, if you do readline on the underlying stream, you get

  H \0 e \0 l \0 l \0 o \0 \n

Passing that to the UTF-16 decoder raises an exception: this is an
odd number of bytes, which is illegal in UTF-16 (the line should have
ended with \n \0, not just \n).
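
To make the failure concrete, here is a minimal sketch in present-day
Python 3 (an assumption; it is not the JapaneseCodecs code). It uses
io.BytesIO as a stand-in for the underlying byte stream and the
built-in "utf-16-le" codec:

```python
import io

# Encode the example string and wrap the bytes in a stand-in byte stream.
encoded = "Hello\nWorld".encode("utf-16-le")
byte_stream = io.BytesIO(encoded)

# readline() on the byte stream stops right after the byte 0x0A,
# leaving the second half of the newline code unit (0x00) unread.
first_line = byte_stream.readline()

try:
    first_line.decode("utf-16-le")
    decode_error = None
except UnicodeDecodeError as exc:
    # The decoder rejects the dangling final byte.
    decode_error = exc

print(len(first_line), repr(first_line))
print(decode_error)
```

The line returned by readline() has an odd length, so the decoder
cannot pair up the last byte and raises UnicodeDecodeError.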

I was merely asking for confirmation that this is not a problem in
your encodings (i.e. that the byte \012 always means newline, no matter
where it appears in the encoded byte stream).
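
One rough way to check that property is sketched below. It assumes
Python 3 and its built-in codecs "euc_jp", "shift_jis" and
"iso2022_jp", which are not necessarily identical to the codecs in the
JapaneseCodecs package, and the helper name chars_containing_linefeed
is made up for illustration. It only looks at single characters from
the Basic Multilingual Plane, so treat it as a sanity check rather
than a proof:

```python
def chars_containing_linefeed(encoding):
    """Return BMP code points (other than U+000A itself) whose encoded
    form contains the byte 0x0A in the given encoding."""
    offenders = []
    for code_point in range(0x10000):
        if code_point == 0x0A:
            continue  # the newline character itself is expected
        try:
            encoded = chr(code_point).encode(encoding)
        except UnicodeEncodeError:
            continue  # character not representable in this encoding
        if b"\n" in encoded:
            offenders.append(code_point)
    return offenders


for name in ("euc_jp", "shift_jis", "iso2022_jp", "utf-16-le"):
    offenders = chars_containing_linefeed(name)
    print(name, len(offenders), "characters contain byte 0x0A")
```

An empty result for an encoding means that, for the characters tested,
the byte \012 shows up only as a newline; "utf-16-le" is included for
contrast.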

> However, ReadStream.readlines() reads a Unicode character sequence
> by calling ReadStream.read() internally, and breaks it into a set of
> lines by looking for a CR.  Is there a case that this does not work?

No, that should work just fine.
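
The reason is that the splitting happens after decoding, so the line
break is found in Unicode text rather than in raw bytes. A minimal
sketch of that order of operations, again assuming Python 3 and
io.BytesIO as a stand-in stream (not the actual ReadStream code):

```python
import io

encoded = "Hello\nWorld".encode("utf-16-le")

# Decode the whole byte sequence first ...
text = io.BytesIO(encoded).read().decode("utf-16-le")

# ... and only then break the Unicode text into lines.
# Note: str.splitlines() also splits on other line boundaries such as "\r".
lines = text.splitlines(keepends=True)

print(lines)
```

Here even UTF-16 is handled correctly, because no code unit is ever
cut in half.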

Thanks,
Martin