[I18n-sig] JapaneseCodecs 1.1 with ISO-2022-JP codec

Tamito KAJIYAMA kajiyama@grad.sccs.chukyo-u.ac.jp
Sat, 25 Nov 2000 05:19:25 +0900


* Martin v. Loewis:
|
| > In addition, the ReadStream classes for all encodings are
| > improved so that read(), readline() and readlines() do not
| > return an empty string unless an EOF is reached.
| 
| Does readline() correctly deal with line breaks under all
| circumstances? In some encodings, detecting line breaking is more
| difficult than looking for a CR or LF byte; e.g. in UCS-2, you need to
| consume two bytes for a line break, and a CR byte may be part of a
| two-byte code that doesn't represent a line break.
| 
| Perhaps your encodings "play nicely" here?
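Martin's two UCS-2 hazards can be reproduced with Python 3's built-in UTF-16 codecs. In this sketch, `io.BytesIO.readlines()` stands in for a byte-oriented `readline()` on the underlying stream, and the helper name `byte_level_lines` is hypothetical.

```python
import io


def byte_level_lines(data: bytes) -> list:
    """Split raw bytes the way a byte-oriented readline() does: after each 0x0A."""
    return io.BytesIO(data).readlines()


# Case 1: UTF-16-LE encodes "\n" as the two bytes 0x0A 0x00. Splitting after
# the 0x0A byte leaves the 0x00 at the start of the next "line", so every
# following line is misaligned by one byte.
le_lines = byte_level_lines("ab\ncd\n".encode("utf-16-le"))
print(le_lines)  # [b'a\x00b\x00\n', b'\x00c\x00d\x00\n', b'\x00']

# Case 2: in UTF-16-BE the single character U+0D0A is the byte pair 0x0D 0x0A,
# which a byte-oriented reader mistakes for CR LF, splitting one line into two.
be_data = "\u0d0aX\n".encode("utf-16-be")
print(len(byte_level_lines(be_data)))                 # 2 byte-level "lines"
print(len(be_data.decode("utf-16-be").splitlines()))  # 1 real line
```

In both cases, the safe approach for UTF-16 is to decode first and break lines on characters rather than bytes.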

Hmm, I'm not sure...

ReadStream.read() does nothing special with CR and LF characters.
It converts a byte sequence in a native encoding into Unicode
characters using unicode(buffer, "ascii").  I don't think this
causes any trouble.
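The behaviour described for read() (decode the buffer and leave CR and LF alone) can be illustrated with Python 3's standard `codecs.getreader()`. Its StreamReader is used here as a stand-in for the JapaneseCodecs ReadStream, which is an assumption. `io.TextIOWrapper` is included only for contrast, because by default it does translate line endings.

```python
import codecs
import io

text = "日本語\r\nテキスト\r\n"
data = text.encode("euc_jp")

# codecs.getreader() returns a StreamReader class; read() decodes the bytes
# and leaves CR and LF characters exactly as they were.
reader = codecs.getreader("euc_jp")(io.BytesIO(data))
raw = reader.read()

# io.TextIOWrapper defaults to newline=None (universal newlines), which
# translates "\r\n" into "\n" while reading.
translated = io.TextIOWrapper(io.BytesIO(data), encoding="euc_jp").read()

print(raw == text)                          # True
print(translated == "日本語\nテキスト\n")   # True
```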

ReadStream.readline() reads lines by calling the readline() method
of the underlying stream object and converts them into Unicode
objects.  Line breaking is therefore done at the level of the
native encoding.  I believe this works correctly for at least the
three Japanese encodings (EUC-JP, Shift_JIS and ISO-2022-JP).
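Splitting at the byte level is safe when the byte 0x0A never occurs inside a multibyte character and every piece can be decoded on its own. In EUC-JP and Shift_JIS, the bytes of multibyte characters are never 0x0A or 0x0D. In ISO-2022-JP, two-byte codes use only 0x21 to 0x7E, and the encoder switches back to ASCII before a newline. The sketch below checks this empirically with Python 3's built-in codecs. The helper `byte_split_is_safe` and the sample text are hypothetical, and the UTF-16-BE case is included for contrast.

```python
SAMPLE = "日本語のテキスト\nASCII line\r\nかな漢字 mixed\n"


def byte_split_is_safe(text: str, encoding: str) -> bool:
    """Encode text, split the bytes after each 0x0A, decode every piece on
    its own, and compare with splitting the decoded text at each LF."""
    pieces = text.encode(encoding).split(b"\n")
    try:
        decoded = [p.decode(encoding) for p in pieces]
    except UnicodeDecodeError:
        # A piece ended in the middle of a character.
        return False
    return decoded == text.split("\n")


for enc in ("euc_jp", "shift_jis", "iso2022_jp"):
    print(enc, byte_split_is_safe(SAMPLE, enc))  # True for each

# U+0D0A encodes to 0x0D 0x0A in UTF-16-BE, so byte splitting breaks it.
print(byte_split_is_safe("\u0d0aX\n", "utf-16-be"))  # False
```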

However, ReadStream.readlines() reads a Unicode character
sequence by calling ReadStream.read() internally and breaks it
into a list of lines by looking for CR characters.  Is there a
case where this does not work?
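Taking "looking for a CR" literally suggests two problem cases: LF-only (Unix) text, which is never split, and CR LF text, where each LF ends up at the start of the following line. The function below is a hypothetical literal reading of that description, not the actual JapaneseCodecs code. It is compared with Python 3's `str.splitlines(keepends=True)`, which recognises LF, CR and CR LF, among other line boundaries.

```python
def split_after_cr(text: str) -> list:
    """Hypothetical: end a line after every CR character, and nowhere else."""
    lines = []
    start = 0
    while True:
        i = text.find("\r", start)
        if i < 0:
            break
        lines.append(text[start:i + 1])
        start = i + 1
    if start < len(text):
        lines.append(text[start:])  # trailing text without a CR
    return lines


print(split_after_cr("a\nb\n"))      # ['a\nb\n']  (LF-only: one "line")
print(split_after_cr("a\r\nb\r\n"))  # ['a\r', '\nb\r', '\n']
print("a\r\nb\r\n".splitlines(keepends=True))  # ['a\r\n', 'b\r\n']
```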

Thanks,

-- 
KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp>