[I18n-sig] Storing string encoding information (Pre-PEP: Proposed Python Character Model)

M.-A. Lemburg mal@lemburg.com
Sat, 10 Feb 2001 23:03:51 +0100


"Martin v. Loewis" wrote:
> 
> >               encoded 8-bit string (with encoding
> >                                     information !)
> 
> I'd like to point out that this is something that Bill Janssen always
> wanted to see. In CORBA, they number encodings for efficient
> representation; that's something that Python could do as well. CORBA
> took the OSF charset registry. That was a mistake, they think about
> using the IANA registry now. This registry provides both textual and
> numeric identifiers for encodings (numeric in the form of MIBEnum
> values).

I was thinking of using plain integers which index into a list
of currently used encodings. Every time a new encoding is used,
it is appended to the list and its index is stored
in the generated string objects.
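
A minimal sketch of that idea in Python, only to make the bookkeeping
concrete. The class name EncodingRegistry and its methods are hypothetical,
and lowercasing is a crude stand-in for real encoding-name normalization:

```python
class EncodingRegistry:
    """Hypothetical table mapping small integers to encoding names."""

    def __init__(self):
        self._names = []   # index -> encoding name, in order of first use
        self._index = {}   # encoding name -> index

    def intern(self, name):
        """Return the index for `name`, appending it on first use."""
        key = name.lower()  # assumption: simple case folding is enough here
        try:
            return self._index[key]
        except KeyError:
            idx = len(self._names)
            self._names.append(key)
            self._index[key] = idx
            return idx

    def name(self, idx):
        """Return the encoding name stored at index `idx`."""
        return self._names[idx]


registry = EncodingRegistry()
latin1 = registry.intern("latin-1")   # first encoding seen
utf8 = registry.intern("utf-8")       # second encoding seen
again = registry.intern("Latin-1")    # already known, index is reused
```

A string object would then only need to carry the small integer, and the
indices are meaningful only within the running process.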

This allows us to separate the internal representation of the
encoding from an outside view, e.g. there could be translators
which map the integers into IANA identifiers or OSF charset numbers.
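
As a rough illustration of such a translator, assuming the internal list
happens to hold the three names below: the function names are made up for
this sketch, and the MIBenum table is only a small excerpt of the IANA
registry (US-ASCII = 3, ISO-8859-1 = 4, UTF-8 = 106):

```python
# Internal list as it might look after three encodings have been used.
# The order (and therefore each index) depends on first use in this process.
internal_names = ["utf-8", "iso-8859-1", "us-ascii"]

# Excerpt of the IANA character set registry: name -> MIBenum value.
IANA_MIBENUM = {"us-ascii": 3, "iso-8859-1": 4, "utf-8": 106}


def to_mibenum(index):
    """Translate an internal index to a MIBenum (None if not in the excerpt)."""
    return IANA_MIBENUM.get(internal_names[index])


def from_mibenum(mib):
    """Translate a MIBenum back to an internal index (None if unknown)."""
    for name, value in IANA_MIBENUM.items():
        if value == mib and name in internal_names:
            return internal_names.index(name)
    return None
```

The internal numbering never leaks out this way; only the translator needs
to know about a particular external registry.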

We'd have to find a way to store this encoding information in Python
pickles and the marshal format, though... a job for our compression
experts ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/