[Python-Dev] New codecs checked in
Walter Dörwald
walter at livinglogic.de
Mon Oct 24 11:00:42 CEST 2005
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>
>>I've checked in a whole bunch of newly generated codecs
>>which now make use of the faster charmap decoding variant added
>>by Walter a short while ago.
>>
>>Please let me know if you find any problems.
>
> I think we should work on eliminating the decoding_map variables.
> There are some codecs which rely on them being present in other codecs
> (e.g. koi8_u.py is based on koi8_r.py); however, this could be updated
> to use, say
>
> decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
> 0x00a4: 0x0454, # CYRILLIC SMALL LETTER UKRAINIAN IE
> 0x00a6: 0x0456, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
> 0x00a7: 0x0457, # CYRILLIC SMALL LETTER YI (UKRAINIAN)
> 0x00ad: 0x0491, # CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
> 0x00b4: 0x0404, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
> 0x00b6: 0x0406, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
> 0x00b7: 0x0407, # CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
> 0x00bd: 0x0490, # CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
> })
>
> With all these cross-references gone, the decoding_maps could also go.
Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put
a complete decoding_table into koi8_u.py?
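For comparison, the derived-table variant from the quoted suggestion could be sketched roughly as below. This is only an illustration in current Python 3: patch_decoding_table and KOI8_U_OVERRIDES are hypothetical names, and codecs.update_decoding_map in the quote is likewise a proposal rather than an existing function. The sketch assumes koi8_r.decoding_table is a 256-character string, as in the generated codec modules.

```python
import codecs
from encodings import koi8_r  # generated codec module exposing decoding_table


def patch_decoding_table(base_table, overrides):
    """Return a copy of a 256-entry decoding table with some bytes remapped.

    overrides maps byte values (0x00-0xff) to Unicode code points.
    Hypothetical helper; not part of the codecs module.
    """
    chars = list(base_table)
    for byte, codepoint in overrides.items():
        chars[byte] = chr(codepoint)
    return "".join(chars)


# The eight positions listed in the quoted message (byte -> code point).
KOI8_U_OVERRIDES = {
    0xa4: 0x0454,  # CYRILLIC SMALL LETTER UKRAINIAN IE
    0xa6: 0x0456,  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    0xa7: 0x0457,  # CYRILLIC SMALL LETTER YI (UKRAINIAN)
    0xad: 0x0491,  # CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
    0xb4: 0x0404,  # CYRILLIC CAPITAL LETTER UKRAINIAN IE
    0xb6: 0x0406,  # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
    0xb7: 0x0407,  # CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
    0xbd: 0x0490,  # CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
}

decoding_table = patch_decoding_table(koi8_r.decoding_table, KOI8_U_OVERRIDES)

# Decode two of the remapped bytes with the patched table.
text, consumed = codecs.charmap_decode(b"\xa4\xb4", "strict", decoding_table)
```

The alternative asked about above, a complete table written out in koi8_u.py, needs no helper at all and removes the import dependency on koi8_r.py.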
I'd like to suggest a small cosmetic change: gencodec.py should output
byte values with two hex digits instead of four. This makes it easier to
see what is a byte value and what is a codepoint, and it would make
grepping simpler.
I.e. change:
decoding_map.update({
0x0080: 0x0402, # CYRILLIC CAPITAL LETTER DJE
to
decoding_map.update({
0x80: 0x0402, # CYRILLIC CAPITAL LETTER DJE
and
decoding_table = (
u'\x00' # 0x0000 -> NULL
to
decoding_table = (
u'\x00' # 0x00 -> U+0000 NULL
and
encoding_map = {
0x0000: 0x0000, # NULL
to
encoding_map = {
0x0000: 0x00, # NULL
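As a rough illustration of the proposed output format (not the actual gencodec.py code), the three kinds of entries could be formatted like this. The function names are made up for this sketch; the only point is the width of each field: two hex digits for bytes, four for codepoints.

```python
def decoding_map_entry(byte, codepoint, name):
    """Format a decoding_map line: two-digit byte key, four-digit codepoint."""
    return f"0x{byte:02x}: 0x{codepoint:04x}, # {name}"


def decoding_table_entry(byte, codepoint, name):
    """Format a decoding_table line with the byte and U+XXXX in the comment."""
    if codepoint < 0x100:
        literal = f"u'\\x{codepoint:02x}'"
    else:
        literal = f"u'\\u{codepoint:04x}'"
    return f"{literal} # 0x{byte:02x} -> U+{codepoint:04X} {name}"


def encoding_map_entry(codepoint, byte, name):
    """Format an encoding_map line: four-digit codepoint key, two-digit byte."""
    return f"0x{codepoint:04x}: 0x{byte:02x}, # {name}"


dm_line = decoding_map_entry(0x80, 0x0402, "CYRILLIC CAPITAL LETTER DJE")
dt_line = decoding_table_entry(0x00, 0x0000, "NULL")
em_line = encoding_map_entry(0x0000, 0x00, "NULL")
```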