[Python-Dev] New codecs checked in

Walter Dörwald walter at livinglogic.de
Mon Oct 24 12:17:31 CEST 2005


M.-A. Lemburg wrote:

> Walter Dörwald wrote:
> 
>>Martin v. Löwis wrote:
>>
>>>M.-A. Lemburg wrote:
>>>
>>>>I've checked in a whole bunch of newly generated codecs
>>>>which now make use of the faster charmap decoding variant added
>>>>by Walter a short while ago.
>>>>
>>>>Please let me know if you find any problems.
>>>
>>>I think we should work on eliminating the decoding_map variables.
>>>There are some codecs which rely on them being present in other codecs
>>>(e.g. koi8_u.py is based on koi8_r.py); however, this could be updated
>>>to use, say
>>>
>>>decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
>>>         0x00a4: 0x0454, #       CYRILLIC SMALL LETTER UKRAINIAN IE
>>>         0x00a6: 0x0456, #       CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
>>>         0x00a7: 0x0457, #       CYRILLIC SMALL LETTER YI (UKRAINIAN)
>>>         0x00ad: 0x0491, #       CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
>>>         0x00b4: 0x0404, #       CYRILLIC CAPITAL LETTER UKRAINIAN IE
>>>         0x00b6: 0x0406, #       CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
>>>         0x00b7: 0x0407, #       CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
>>>         0x00bd: 0x0490, #       CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
>>>})
>>>
>>>With all these cross-references gone, the decoding_maps could also go.
> 
> I just left them in because I thought they wouldn't do any harm
> and might be useful in some applications.
> 
> Removing them where not directly needed by the codec would not
> be a problem.

Recreating them is quite simple via dict(enumerate(decoding_table)) so I 
think we should remove them.
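
To illustrate, here is a minimal sketch of that round trip. It is written in Python 3 spelling, and the identity table is a made-up stand-in for a real codec's decoding_table, not taken from any codec module. Note that dict(enumerate(...)) maps byte values to characters; if a map to integer code points is wanted, ord() has to be applied as well.

```python
# Hypothetical stand-in for a codec's decoding_table: a 256-character
# string where the character at index i is the decoding of byte value i.
decoding_table = "".join(chr(i) for i in range(256))

# Recreate a decoding_map-style dict: byte value -> character.
decoding_map = dict(enumerate(decoding_table))

# Variant mapping byte value -> integer code point instead.
decoding_map_ord = {byte: ord(char) for byte, char in enumerate(decoding_table)}

print(decoding_map[0x41], hex(decoding_map_ord[0x41]))
```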

>>Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put
>>a complete decoding_table into koi8_u.py?
> 
> KOI8-U is not available as mapping on ftp.unicode.org and
> I only recreated codecs from the mapping files available
> there.

OK, so we'd need something that creates a new decoding table from an old 
one + changes, i.e. something like:

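def update_decoding_table(table, new):
    # copy the table into a list so single entries can be replaced
    table = list(table)
    for (key, value) in new.iteritems():
        # key is the byte value, value the new codepoint
        table[key] = unichr(value)
    return u"".join(table)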
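
As a rough sketch of how this could be used for KOI8-U: the version below is the same helper in Python 3 spelling (chr() and items() in place of unichr() and iteritems()), applied to the eight changes Martin listed. It assumes encodings.koi8_r exposes a 256-character decoding_table, and the name koi8_u_table is only for illustration.

```python
from encodings import koi8_r


def update_decoding_table(table, new):
    """Return a copy of `table` with the entries in `new` replaced.

    `table` is a decoding table string, `new` maps byte values to
    code points.
    """
    table = list(table)
    for key, value in new.items():
        table[key] = chr(value)
    return "".join(table)


# The KOI8-R positions that Martin's example replaces for KOI8-U.
koi8_u_table = update_decoding_table(koi8_r.decoding_table, {
    0xa4: 0x0454,  # CYRILLIC SMALL LETTER UKRAINIAN IE
    0xa6: 0x0456,  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    0xa7: 0x0457,  # CYRILLIC SMALL LETTER YI (UKRAINIAN)
    0xad: 0x0491,  # CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
    0xb4: 0x0404,  # CYRILLIC CAPITAL LETTER UKRAINIAN IE
    0xb6: 0x0406,  # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
    0xb7: 0x0407,  # CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
    0xbd: 0x0490,  # CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
})
```

Since the helper builds a new string, the KOI8-R table itself is left untouched.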

>>I'd like to suggest a small cosmetic change: gencodec.py should output
>>byte values with two hexdigits instead of four. This makes it easier to
>>see what is a byte value and what is a codepoint. And it would make
>>grepping for stuff simpler.
> 
> True.
> 
> I'll rerun the creation with the above changes sometime this
> week.

Great, thanks!

Bye,
    Walter Dörwald

