[I18n-sig] Codecs

M.-A. Lemburg mal@lemburg.com
Mon, 05 Jun 2000 15:37:37 +0200

Andy Robinson wrote:
> >
> > Should codecs be returned to the user as objects instead of tuples?
> > Today we have:
> >
> > (UTF8_encode, UTF8_decode,
> >       UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')
> >
> > output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
> >
> > I think this would be a little simpler:
> >
> > output = codecs.lookup('UTF-8').stream_writer(open('/tmp/output', 'wb'))
> >
> > The object solution is more extensible, requires less "bogus"
> > assignments and does not require the user to remember the order of the
> > return values.
> >
> I suggested this a while back, for a different reason.  Right now you get
> four things back from lookup() relating to the given encoding.  But in many
> cases there may be other encoding-specific routines of great use, and
> returning an object would give us a place to hang them;  codec.repair(...)
> and codec.validate(...), for example.  There are accepted and useful bits of
> code around to repair Shift-JIS or EUC data in which one or two bytes are
> corrupt.  We would also have a place to hang language-specific routines.
> So I would be very, very happy to see codecs.lookup return a 'codec object'
> with the four attributes encode, decode, streamreader() and streamwriter()
> rather than a tuple.

(Please also see my other post on the subject...)

The tuple design was chosen for speed and for its simplicity.
Please remember that much of the codec registry is written in C,
and the return value should be easy to access and manage from
there.

Note that things like "validate" and "repair" can be handled
by providing new error handling codes and then checking
the encoding/decoding calls for exceptions.
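
As a rough sketch of what I mean, using only the existing 'strict'
and 'replace' error handling codes: the helper names validate() and
repair() below are hypothetical, and 'replace' merely substitutes a
marker for bad input. A real Shift-JIS or EUC repair would need its
own, smarter error handling code.

```python
import codecs

def validate(data, encoding):
    """Return True if data decodes cleanly under 'strict' handling."""
    decode = codecs.lookup(encoding)[1]   # second tuple entry: decoder
    try:
        decode(data, 'strict')
    except UnicodeError:
        return False
    return True

def repair(data, encoding):
    """Decode data, substituting a replacement marker for bad input."""
    decode = codecs.lookup(encoding)[1]
    text, consumed = decode(data, 'replace')
    return text
```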

New functionality can easily be added to the stream reader/writer
objects which are returned by the factory functions given in
the tuple -- these also allow keeping state and can work on
string-like objects via StringIO.
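
Something along these lines, as a sketch only: CountingWriter and its
chars_written attribute are made up for illustration, and io.BytesIO
is assumed here as the in-memory buffer (it plays the StringIO role on
Python versions where encoded data is a separate bytes type).

```python
import codecs
import io

# Factory functions are the last two entries of the lookup tuple.
utf8_reader = codecs.lookup('utf-8')[2]
utf8_writer = codecs.lookup('utf-8')[3]

class CountingWriter(utf8_writer):
    """Hypothetical stream writer that keeps state between writes."""

    def __init__(self, stream, errors='strict'):
        super().__init__(stream, errors)
        self.chars_written = 0

    def write(self, text):
        # Track how many characters have passed through, then encode.
        self.chars_written += len(text)
        super().write(text)

buffer = io.BytesIO()
writer = CountingWriter(buffer)
writer.write('Gr\u00fc\u00dfe')
writer.write(' aus Python')

# Read the encoded bytes back through the matching stream reader.
buffer.seek(0)
text = utf8_reader(buffer).read()
```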

Perhaps all we need is a simpler interface for codecs.lookup()?
Something like:

encoder = codecs.encoder('utf-8')
# ditto for .decoder, .streamwriter, .streamreader
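
These could be thin wrappers around the existing tuple, so the C side
stays untouched. A minimal sketch, assuming the four names are plain
module-level functions (they are written standalone here, not as
existing members of the codecs module):

```python
import codecs

def encoder(encoding):
    """Return the stateless encoder function for encoding."""
    return codecs.lookup(encoding)[0]

def decoder(encoding):
    """Return the stateless decoder function for encoding."""
    return codecs.lookup(encoding)[1]

def streamreader(encoding):
    """Return the stream reader factory for encoding."""
    return codecs.lookup(encoding)[2]

def streamwriter(encoding):
    """Return the stream writer factory for encoding."""
    return codecs.lookup(encoding)[3]

# Usage: no tuple unpacking and no need to remember the order.
encode = encoder('utf-8')
data, consumed = encode('abc')
```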

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/