[Python-Dev] Unicode input issues

M.-A. Lemburg mal@lemburg.com
Mon, 10 Apr 2000 18:39:45 +0200


Guido van Rossum wrote:
> 
> > > Aha!  It actually seems that your read() and readline() are
> > > inconsistent!
> >
> > They are because I haven't yet found a way to implement
> > readline() without buffering read-ahead data. The only way
> > I can think of to implement it without buffering would be
> > to read one char at a time which is much too slow.
> >
> > Buffering is hard to implement right when assuming that
> > streams are stacked... every level would have its own
> > buffering scheme and mixing .read() and .readline()
> > wouldn't work too well. Anyway, I'll give it a try...
> 
> Since you're calling methods on the underlying file object anyway,
> can't you avoid buffering by calling the *corresponding* underlying
> method and doing the conversion on that?

The problem here is that Unicode has far more line
break characters than plain ASCII. The underlying API would
break on ASCII lines (or even worse on those CRLF sequences
defined by the C lib), not the ones I need for Unicode.
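
To illustrate the mismatch, here is a minimal sketch using today's
Python, where str.splitlines() recognizes the Unicode line breaks and a
byte-level readline() only breaks on "\n". The sample string is made up
for the example; it is not taken from the codec code under discussion.

```python
import io

# Made-up sample text containing several Unicode line breaks:
# U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR, U+0085 NEXT LINE,
# plus one plain "\n".
text = "alpha\u2028beta\u2029gamma\x85delta\nepsilon"

# Breaking only on "\n" (what an ASCII-oriented API would do).
ascii_lines = text.split("\n")

# Breaking on every line boundary that str.splitlines() knows about.
unicode_lines = text.splitlines()

# A byte-level readline() on the encoded data also stops at b"\n" only,
# so the first "line" it returns still contains three Unicode breaks.
raw = io.BytesIO(text.encode("utf-8"))
first_raw_line = raw.readline().decode("utf-8")

print(len(ascii_lines), len(unicode_lines))
print(repr(first_raw_line))
```

Converting the result of the underlying readline() would therefore
hand back lines that still need to be split again at the Unicode level.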

BTW, I think that we may need a new Codec class layer
here: .readline() et al. are all text-based methods,
while the Codec base classes clearly work on all kinds of
binary and text data.
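
A rough sketch of what such a text layer could look like, written
against today's Python. The class name TextLineReader, its chunk_size
parameter, and the _fill() helper are all hypothetical and only serve
the illustration; this is not the codecs module's implementation. The
layer keeps one buffer of decoded read-ahead data that both .read() and
.readline() draw from, so mixing the two stays consistent.

```python
import codecs
import io


class TextLineReader:
    """Hypothetical text layer on top of a byte stream (illustration only)."""

    def __init__(self, stream, encoding="utf-8", chunk_size=64):
        self.stream = stream
        # The incremental decoder copes with multi-byte sequences that
        # are split across two chunks.
        self.decoder = codecs.getincrementaldecoder(encoding)()
        self.chunk_size = chunk_size
        self.buffer = ""    # decoded read-ahead data not yet returned
        self.eof = False

    def _fill(self):
        """Read one more chunk from the byte stream into the buffer."""
        data = self.stream.read(self.chunk_size)
        if data:
            self.buffer += self.decoder.decode(data)
        else:
            self.eof = True
            self.buffer += self.decoder.decode(b"", final=True)

    def readline(self):
        """Return the next line, ending at any Unicode line break."""
        while True:
            lines = self.buffer.splitlines(keepends=True)
            # Only trust the first piece once something follows it (or at
            # EOF); a trailing "\r" could still be the start of "\r\n".
            if len(lines) > 1 or (lines and self.eof):
                line = lines[0]
                self.buffer = self.buffer[len(line):]
                return line
            if self.eof:
                return ""
            self._fill()

    def read(self):
        """Return everything that is left, including buffered data."""
        while not self.eof:
            self._fill()
        data, self.buffer = self.buffer, ""
        return data


# Small chunks force line breaks and UTF-8 sequences across chunk borders.
sample = "one\u2028two\r\nthree\x85four".encode("utf-8")
reader = TextLineReader(io.BytesIO(sample), chunk_size=3)
first = reader.readline()
second = reader.readline()
rest = reader.read()
print(repr(first), repr(second), repr(rest))
```

The price is the read-ahead mentioned above: the layer consumes more
bytes from the underlying stream than it has returned, so any other
reader of the same stream would miss the buffered data.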

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/