
Hi everybody,

I've just uploaded a new snapshot to the secret URL.

New in this snapshot is a generic character mapping codec which can
decode and encode a large number of code pages used on PCs and Macs.

I used a Unicode mapping file parser to automatically generate the
codecs from the mapping files available at http://www.unicode.org/
and then included all those files which use less than 10k for the
Python source code (with comments).

These codecs are thus available and need some serious testing:

ascii.py        cp856.py        iso_8859_6.py
charmap.py      cp857.py        iso_8859_7.py
cp1006.py       cp860.py        iso_8859_8.py
cp1250.py       cp861.py        iso_8859_9.py
cp1251.py       cp862.py        koi8_r.py
cp1252.py       cp863.py        latin_1.py
cp1253.py       cp864.py        mac_cyrillic.py
cp1254.py       cp865.py        mac_greek.py
cp1255.py       cp866.py        mac_iceland.py
cp1256.py       cp869.py        mac_latin2.py
cp1257.py       cp874.py        mac_roman.py
cp1258.py       iso_8859_10.py  mac_turkish.py
cp424.py        iso_8859_13.py  raw_unicode_escape.py
cp437.py        iso_8859_14.py  unicode_escape.py
cp737.py        iso_8859_15.py  unicode_internal.py
cp775.py        iso_8859_2.py   utf_16.py
cp850.py        iso_8859_3.py   utf_16_be.py
cp852.py        iso_8859_4.py   utf_16_le.py
cp855.py        iso_8859_5.py   utf_8.py

All these codecs are stored in the encodings package of the
standard lib and are directly usable via the unicode(input, encoding)
and u"abc".encode(encoding) APIs.

I would like some feedback on which of these code pages are
really in common use... we could then make the less common ones
available as a separate package.

Also, I'm curious whether we should rename the cpXXX.py files to
cp_XXX.py (or just add aliases for them to the encodings/aliases.py
file). The naming scheme usually defines letters-numbers-etc., but
for code pages the above names are quite common.

Another feature of the patch is that it has some optimizations for
short Unicode strings. Unfortunately, the implementation still has
some bugs, so it is currently disabled. To reenable it, edit the file
Objects/unicodeobject.c and set e.g.
#define STAYALIVE_SIZE_LIMIT 5

This will cause Unicode objects whose size is at or below this limit
to stay alive even while they are on the free list.

Note that this is the last patch for the next week. I'll be offline
until 2000-02-28 and then hope to make some serious progress on
documenting the different parts (most docs are still buried in the
C and header files and in the Unicode proposal, which is included in
the file Misc/unicode.txt).

Now it's up to you to give the code the final swirl... :-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
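
For anyone who wants a feel for what the generated codecs do before
testing them, here is a minimal sketch of the table-driven idea. It is
not the code in the patch: the names parse_mapping, charmap_decode and
charmap_encode are made up for illustration, the sketch is written for
a current Python 3 interpreter, and it assumes the mapping-file layout
of "byte value, Unicode code point, comment" separated by whitespace.
The sample table is a hand-picked excerpt using CP1252 values.

```python
# Illustrative sketch only; names and structure are assumptions,
# not the implementation shipped in the snapshot.

SAMPLE_MAPPING = """\
# Excerpt in the layout of a unicode.org mapping file (CP1252 values)
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0x80\t0x20AC\t# EURO SIGN
0x81\t      \t# UNDEFINED
0xE9\t0x00E9\t# LATIN SMALL LETTER E WITH ACUTE
"""


def parse_mapping(text):
    """Return {byte value: Unicode code point} for every defined entry."""
    decoding_map = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte is listed but has no Unicode equivalent
        decoding_map[int(fields[0], 16)] = int(fields[1], 16)
    return decoding_map


def charmap_decode(data, decoding_map):
    """Decode bytes by table lookup; unmapped bytes raise KeyError."""
    return "".join(chr(decoding_map[byte]) for byte in bytearray(data))


def charmap_encode(text, encoding_map):
    """Encode text by table lookup; unmapped characters raise KeyError."""
    return bytes(bytearray(encoding_map[ord(ch)] for ch in text))


DECODING_MAP = parse_mapping(SAMPLE_MAPPING)
# The encoding table is simply the decoding table inverted.
ENCODING_MAP = {code_point: byte for byte, code_point in DECODING_MAP.items()}

if __name__ == "__main__":
    decoded = charmap_decode(b"A\x80\xe9", DECODING_MAP)
    print(repr(decoded))
    print(repr(charmap_encode(decoded, ENCODING_MAP)))
```

A quick sanity check when testing a real codec is to compare such a
hand-built table against the codec itself for a few bytes, e.g. that
decoding b"\x80" with the cp1252 codec gives the euro sign.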