[Python-Dev] New codecs checked in

Mon Oct 24 11:00:42 CEST 2005

Martin v. Löwis wrote:

> M.-A. Lemburg wrote:
> 
>>I've checked in a whole bunch of newly generated codecs
>>which now make use of the faster charmap decoding variant added
>>by Walter a short while ago.
>>
>>Please let me know if you find any problems.
> 
> I think we should work on eliminating the decoding_map variables.
> There are some codecs which rely on them being present in other codecs
> (e.g. koi8_u.py is based on koi8_r.py); however, this could be updated
> to use, say
> 
> decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
>          0x00a4: 0x0454, #       CYRILLIC SMALL LETTER UKRAINIAN IE
>          0x00a6: 0x0456, #       CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
>          0x00a7: 0x0457, #       CYRILLIC SMALL LETTER YI (UKRAINIAN)
>          0x00ad: 0x0491, #       CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
>          0x00b4: 0x0404, #       CYRILLIC CAPITAL LETTER UKRAINIAN IE
>          0x00b6: 0x0406, #       CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
>          0x00b7: 0x0407, #       CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
>          0x00bd: 0x0490, #       CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
> })
> 
> With all these cross-references gone, the decoding_maps could also go.

Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put 
a complete decoding_table into koi8_u.py?
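
For reference, the derivation quoted above amounts to roughly the 
following. This is only a sketch: patch_decoding_table is a hypothetical 
helper (no such function exists in the codecs module, and it is not the 
update_decoding_map suggested above), and the base table is rebuilt here 
from the koi8_r codec as a stand-in for koi8_r.decoding_table.

```python
# Hypothetical helper, not part of the codecs module.
def patch_decoding_table(base_table, overrides):
    """Return a copy of base_table with selected byte positions replaced.

    base_table is a 256-character string indexed by byte value;
    overrides maps byte value -> Unicode codepoint.
    """
    chars = list(base_table)
    for byte_value, codepoint in overrides.items():
        chars[byte_value] = chr(codepoint)
    return "".join(chars)


# Stand-in for koi8_r.decoding_table: decode every byte value once.
koi8_r_table = bytes(range(256)).decode("koi8_r")

# The eight positions listed in the quoted message.
koi8_u_table = patch_decoding_table(koi8_r_table, {
    0xa4: 0x0454,  # CYRILLIC SMALL LETTER UKRAINIAN IE
    0xa6: 0x0456,  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    0xa7: 0x0457,  # CYRILLIC SMALL LETTER YI (UKRAINIAN)
    0xad: 0x0491,  # CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
    0xb4: 0x0404,  # CYRILLIC CAPITAL LETTER UKRAINIAN IE
    0xb6: 0x0406,  # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
    0xb7: 0x0407,  # CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
    0xbd: 0x0490,  # CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
})
```

A complete table in koi8_u.py would simply be the resulting 256 
characters written out, with no import of koi8_r needed.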

I'd like to suggest a small cosmetic change: gencodec.py should output 
byte values with two hex digits instead of four. This makes it easier to 
see what is a byte value and what is a codepoint. And it would make 
grepping for stuff simpler.

I.e. change:

decoding_map.update({
     0x0080: 0x0402, #  CYRILLIC CAPITAL LETTER DJE

to

decoding_map.update({
     0x80: 0x0402, #  CYRILLIC CAPITAL LETTER DJE

and

decoding_table = (
     u'\x00' #  0x0000 -> NULL

to

decoding_table = (
     u'\x00' # 0x00 -> U+0000 NULL

and

encoding_map = {
     0x0000: 0x0000, #  NULL

to

encoding_map = {
     0x0000: 0x00, #  NULL
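
The formatting itself would be a matter of using different widths for 
the two kinds of values. The following is a minimal sketch of the idea, 
not the actual gencodec.py code; both function names are made up for 
illustration.

```python
def format_decoding_map_entry(byte_value, codepoint, name):
    """Byte value with two hex digits, codepoint with four."""
    return "0x%02x: 0x%04x, #  %s" % (byte_value, codepoint, name)


def format_encoding_map_entry(codepoint, byte_value, name):
    """Codepoint with four hex digits, byte value with two."""
    return "0x%04x: 0x%02x, #  %s" % (codepoint, byte_value, name)


print(format_decoding_map_entry(0x80, 0x0402, "CYRILLIC CAPITAL LETTER DJE"))
print(format_encoding_map_entry(0x0000, 0x00, "NULL"))
```

With that, a two-digit value is always a byte and a four-digit value is 
always a codepoint, so a grep for 0x80 no longer has to be written as 
0x0080.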