[Python-Dev] Unicode input issues

M.-A. Lemburg mal@lemburg.com
Mon, 10 Apr 2000 22:34:12 +0200


Guido van Rossum wrote:
> 
> > > Finally, I believe we need a way to discover the encoding used by
> > > stdin or stdout.  I have to admit I know very little about the file
> > > wrappers that Marc wrote -- is it easy to get the encoding out of
> > > them?
> >
> > I'm not sure what you mean: the name of the input encoding ?
> > Currently, only the names of the encoding and decoding functions
> > are available to be queried.
> 
> Whatever is helpful for a module or program that wants to know what
> kind of encoding is used.

Hmm, you mean something like file.encoding? I'll add attributes
holding the encoding names to the wrapper classes; the wrapper
constructor functions will then set them.
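A minimal sketch of what such a constructor function could look like, assuming a wrapper built from codecs.lookup() and codecs.StreamReaderWriter. The name wrap_stream() is hypothetical and only illustrates "set the encoding name in the constructor". For comparison, codecs.open() in current Python 3 sets a similar .encoding attribute on the StreamReaderWriter it returns.

```python
import codecs
import io

def wrap_stream(stream, encoding, errors="strict"):
    """Hypothetical constructor: wrap a byte stream and remember its encoding."""
    info = codecs.lookup(encoding)
    wrapper = codecs.StreamReaderWriter(stream, info.streamreader,
                                        info.streamwriter, errors)
    # Expose the encoding name so callers can query it, like file.encoding.
    wrapper.encoding = encoding
    return wrapper

f = wrap_stream(io.BytesIO("h\u00e9llo".encode("latin-1")), "latin-1")
f.encoding   # 'latin-1'
f.read()     # 'héllo'
```

A program that only receives the wrapper can then inspect f.encoding instead of guessing from the encode/decode function names.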

BTW, I've just added .readline() et al. to the codecs. All
except .readline() were easy to do; for .readline() I simply
delegated line breaking to the underlying stream's .readline()
method. This is far from optimal, but better than not having
the method at all.
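A minimal sketch of the delegation approach, assuming a UTF-8 stream reader built on codecs.StreamReader. The class Utf8LineReader is hypothetical. The underlying byte stream finds the b"\n" and the reader only decodes that chunk. This works for UTF-8 or Latin-1, where the line break is a single byte that never occurs inside another character, but not for something like UTF-16, where "\n" is two bytes; that is presumably part of why the delegation is called far from optimal.

```python
import codecs
import io

class Utf8LineReader(codecs.StreamReader):
    """Hypothetical UTF-8 reader whose readline() delegates to the byte stream."""

    # Decoder the base class expects; returns (text, bytes_consumed).
    decode = codecs.utf_8_decode

    def readline(self, size=None, keepends=True):
        # size is accepted for signature compatibility but ignored in this sketch.
        # Let the underlying stream find the line break, then decode that line.
        raw = self.stream.readline()
        text, _consumed = self.decode(raw, self.errors, True)
        if not keepends:
            text = text.rstrip("\r\n")
        return text

r = Utf8LineReader(io.BytesIO("caf\u00e9\nna\u00efve\n".encode("utf-8")))
r.readline()                 # 'café\n'
r.readline(keepends=False)   # 'naïve'
```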

I also adjusted the interface of the .splitlines() methods:
they now take a different optional argument, keepends:

"""
S.splitlines([keepends]) -> list of strings

Return a list of the lines in S, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends
is given and true.
"""
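A quick illustration of the keepends flag, using today's Python 3 str (where the argument can also be passed as keepends=True). Note that "\r\n" counts as a single line boundary:

```python
text = "first line\nsecond line\r\nthird"

text.splitlines()        # ['first line', 'second line', 'third']
text.splitlines(True)    # ['first line\n', 'second line\r\n', 'third']
```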

This made implementing the above methods very simple, and it
also allows writing codecs that work with other basic storage
types (UserString.py, anyone? ;-).
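A minimal sketch of why this helps, assuming a readlines() that decodes everything first and then defers line breaking to the decoded object's own .splitlines(keepends). The helpers split_decoded() and readlines() are hypothetical; CPython's codecs.StreamReader.readlines() follows essentially this read-then-split pattern. Because only .splitlines() is required, a collections.UserString (the Python 3 home of UserString) works as well, and decoding before splitting also sidesteps the UTF-16 line-break problem.

```python
import codecs
import io
from collections import UserString

def split_decoded(data, keepends=True):
    """Hypothetical helper: only assumes data has a .splitlines(keepends) method."""
    return data.splitlines(keepends)

def readlines(stream, encoding, keepends=True):
    """Hypothetical readlines(): decode the whole stream, then split it."""
    text = codecs.getreader(encoding)(stream).read()
    return split_decoded(text, keepends)

readlines(io.BytesIO("x\ny\n".encode("utf-16")), "utf-16")   # ['x\n', 'y\n']
split_decoded(UserString("x\r\ny"), keepends=False)          # ['x', 'y']
```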

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/