[I18n-sig] JapaneseCodecs 1.1 with ISO-2022-JP codec
Sat, 25 Nov 2000 05:19:25 +0900
* Martin v. Loewis:
| > In addition, the ReadStream classes for all encodings are
| > improved so that read(), readline() and readlines() do not
| > return an empty string unless an EOF is reached.
| Does readline() correctly deal with line breaks under all
| circumstances? In some encodings, detecting line breaking is more
| difficult than looking for a CR or LF byte; e.g. in UCS-2, you need to
| consume two bytes for a line break, and a CR byte may be part of a
| two-byte code that doesn't represent a line break.
| Perhaps your encodings "play nicely" here?
Hmm, I'm not sure...
ReadStream.read() does nothing special with CR and LF characters.
It converts a byte sequence in a native encoding into Unicode
characters using unicode(buffer, encoding). I don't think this
ReadStream.readline() reads lines by calling readline() of
the underlying stream object, and converts them into Unicode
objects. Therefore, line breaking is done in the layer of
native encodings. I believe it works well for at least the
three Japanese encodings.
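
The difference between the two situations can be sketched as follows. This is only an illustration, not the JapaneseCodecs implementation: it assumes a modern Python 3 with its built-in euc_jp, shift_jis and iso2022_jp codecs, and split_then_decode is a hypothetical helper name. It splits on LF at the byte level first and decodes each line afterwards, which works for the sample text in the three Japanese encodings but breaks for UTF-16, where an LF byte can be half of a two-byte code unit.

```python
import io

def split_then_decode(data, encoding):
    """Split on LF at the byte level, then decode each line separately."""
    stream = io.BytesIO(data)  # stands in for the underlying byte stream
    return [raw.decode(encoding) for raw in stream.readlines()]

text = "日本語\nテスト\n"

# For this sample, none of the three encodings produces a stray LF byte
# inside a multibyte character, so byte-level splitting is safe.
japanese_ok = all(
    split_then_decode(text.encode(enc), enc) == text.splitlines(True)
    for enc in ("euc_jp", "shift_jis", "iso2022_jp")
)

# U+010A encodes to the bytes 0A 01 in little-endian UTF-16, so the first
# byte looks like an LF and the stream is cut in the middle of a code unit.
ucs2 = "\u010a\n".encode("utf_16_le")
broken_chunks = io.BytesIO(ucs2).readlines()

try:
    split_then_decode(ucs2, "utf_16_le")
    ucs2_ok = True
except UnicodeDecodeError:
    ucs2_ok = False

print(japanese_ok, broken_chunks, ucs2_ok)
```

A single sample string is of course not a proof for every input; it only shows why the question matters for two-byte encodings such as UCS-2.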
However, ReadStream.readlines() reads a Unicode character
sequence by calling ReadStream.read() internally, and breaks it
into a list of lines by looking for a CR. Is there a case where
this does not work?
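
One case worth checking is text whose lines end in a bare LF, or that mixes line-ending conventions. The sketch below does not reproduce ReadStream.readlines(); it assumes Python 3 and its built-in euc_jp codec, and only compares splitting decoded text on CR alone with str.splitlines(), which recognizes CR, LF and CRLF.

```python
# Sample data mixing CRLF, LF and CR line endings, encoded as EUC-JP.
data = "一行目\r\n二行目\n三行目\r四行目".encode("euc_jp")

# Decode the whole buffer first, then split the resulting text.
text = data.decode("euc_jp")

# Looking for CR only: LF-terminated lines are not separated, and the LF
# of a CRLF pair is left at the start of the following piece.
only_cr = text.split("\r")

# splitlines(True) keeps the line endings and handles CR, LF and CRLF.
all_breaks = text.splitlines(True)

print(only_cr)
print(all_breaks)
```

Note that str.splitlines() also treats a few other characters (for example U+2028) as line boundaries, which may or may not be what a codec's readlines() should do.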
KAJIYAMA, Tamito <email@example.com>