[Python-Dev] Unicode input issues
M.-A. Lemburg
mal@lemburg.com
Mon, 10 Apr 2000 23:00:53 +0200
Guido van Rossum wrote:
>
> > > Since you're calling methods on the underlying file object anyway,
> > > can't you avoid buffering by calling the *corresponding* underlying
> > > method and doing the conversion on that?
> >
> > The problem here is that Unicode has far more line
> > break characters than plain ASCII. The underlying API would
> > break on ASCII lines (or even worse on those CRLF sequences
> > defined by the C lib), not the ones I need for Unicode.
>
> Hm, can't we just use \n for now?
>
> > BTW, I think that we may need a new Codec class layer
> > here: .readline() et al. are all text based methods,
> > while the Codec base classes clearly work on all kinds of
> > binary and text data.
>
> Not sure what you mean here. Can you explain through an example?
Well, the line concept is really only applicable to text
data. Binary data doesn't have lines and e.g. a ZIP codec
(probably) couldn't implement this kind of method.
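To make the difference between text lines and byte lines concrete, here is a small sketch in modern Python 3 (not the Python 1.6-era codec API under discussion). It assumes str.splitlines() as a stand-in for "Unicode line breaking": it splits on Unicode boundaries such as U+2028 and U+0085, while splitting on "\n" alone, or calling bytes.splitlines() on the encoded data, does not.

```python
# Sample text mixing Unicode line separators with a plain "\n".
text = "one\u2028two\x85three\nfour"

# Splitting on "\n" only, as an ASCII/C-library oriented readline would.
ascii_lines = text.split("\n")

# Splitting on every line boundary Python's str type recognises.
unicode_lines = text.splitlines()

# The same data as UTF-8 bytes: bytes.splitlines() only knows \n, \r and \r\n.
byte_lines = text.encode("utf-8").splitlines()

print(ascii_lines)    # 2 items: U+2028 and U+0085 are not treated as breaks
print(unicode_lines)  # 4 items: one line per Unicode break
print(byte_lines)     # 2 items: the encoded separators stay inside the line
```

Only the text view yields four lines here, which is why a line-oriented method sits naturally on a text codec and not on a general binary one.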
As it turns out, only the .writelines() method needs to know
what kinds of input/output data objects are used (and then
only to be able to specify a joining separator).
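As a rough illustration of why only the joining step cares about the data type, the following Python 3 sketch uses a hypothetical helper, join_for_writelines() (not part of the codecs module), that picks an empty separator of the same type as the chunks it receives: "" for text, b"" for binary data.

```python
from typing import Iterable, Union

Chunk = Union[str, bytes]


def join_for_writelines(chunks: Iterable[Chunk]) -> Chunk:
    """Hypothetical helper: join chunks with a separator of matching type.

    str and bytes cannot be mixed in a join, so the separator must follow
    the type of the data being written.
    """
    items = list(chunks)
    if not items:
        # Assumption: with nothing to join, default to an empty text string.
        return ""
    sep = b"" if isinstance(items[0], bytes) else ""
    return sep.join(items)


print(join_for_writelines(["a\n", "b\n"]))  # text chunks joined with ""
print(join_for_writelines([b"a", b"b"]))    # binary chunks joined with b""
```

Everything else a stream writer does can stay type-agnostic, which matches keeping the class hierarchy shallow.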
I'll just leave things as they are for now: quite shallow
w/r to the class hierarchy.
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/