[Python-Dev] New codecs checked in
Walter Dörwald
walter at livinglogic.de
Mon Oct 24 12:17:31 CEST 2005
M.-A. Lemburg wrote:
> Walter Dörwald wrote:
>
>>Martin v. Löwis wrote:
>>
>>>M.-A. Lemburg wrote:
>>>
>>>>I've checked in a whole bunch of newly generated codecs
>>>>which now make use of the faster charmap decoding variant added
>>>>by Walter a short while ago.
>>>>
>>>>Please let me know if you find any problems.
>>>
>>>I think we should work on eliminating the decoding_map variables.
>>>There are some codecs which rely on them being present in other codecs
>>>(e.g. koi8_u.py is based on koi8_r.py); however, this could be updated
>>>to use, say
>>>
>>>decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
>>> 0x00a4: 0x0454, # CYRILLIC SMALL LETTER UKRAINIAN IE
>>> 0x00a6: 0x0456, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
>>> 0x00a7: 0x0457, # CYRILLIC SMALL LETTER YI (UKRAINIAN)
>>> 0x00ad: 0x0491, # CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
>>> 0x00b4: 0x0404, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
>>> 0x00b6: 0x0406, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
>>> 0x00b7: 0x0407, # CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
>>> 0x00bd: 0x0490, # CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
>>>})
>>>
>>>With all these cross-references gone, the decoding_maps could also go.
>
> I just left them in because I thought they wouldn't do any harm
> and might be useful in some applications.
>
> Removing them where not directly needed by the codec would not
> be a problem.
Recreating them is quite simple via dict(enumerate(decoding_table)), so I
think we should remove them.
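
A minimal sketch of that rebuild, written for Python 3. The three-entry table is an assumption made for illustration (real charmap tables have 256 entries), and the names decoding_map and decoding_map_ints are only placeholders here:

```python
# Shortened stand-in for a charmap decoding table:
# the index is the byte value, the character at that index is the decoded result.
decoding_table = "\x00\x01\u20ac"

# Rebuild a byte -> character mapping from the table.
decoding_map = dict(enumerate(decoding_table))

# If integer code points are wanted as values instead of characters,
# apply ord() while rebuilding.
decoding_map_ints = {byte: ord(char) for byte, char in enumerate(decoding_table)}
```

Note that dict(enumerate(...)) yields characters as values; the second form may be closer to what code expecting integer code points needs.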
>>Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put
>>a complete decoding_table into koi8_u.py?
>
> KOI8-U is not available as mapping on ftp.unicode.org and
> I only recreated codecs from the mapping files available
> there.
OK, so we'd need something that creates a new decoding table from an old
one + changes, i.e. something like:
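def update_decoding_table(table, new):
    # Copy the table into a mutable list of characters.
    table = list(table)
    # Overwrite each changed byte position with its new character.
    for (key, value) in new.iteritems():
        table[key] = unichr(value)
    return u"".join(table)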
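
For illustration, here is the same idea as a Python 3 sketch, applied to the KOI8-U changes quoted earlier in this thread. Two assumptions: the base table is obtained by decoding all 256 byte values through the public "koi8_r" codec rather than by importing a module-level decoding_table, and the names koi8_r_table and koi8_u_table are placeholders.

```python
def update_decoding_table(table, new):
    """Return a copy of `table` with the byte -> code point overrides in `new` applied."""
    chars = list(table)
    for byte, codepoint in new.items():
        chars[byte] = chr(codepoint)
    return "".join(chars)


# Assumption: derive the base table from the public codec API.
# KOI8-R defines all 256 byte values, so this yields a 256-character string.
koi8_r_table = bytes(range(256)).decode("koi8_r")

koi8_u_table = update_decoding_table(koi8_r_table, {
    0xa4: 0x0454,  # CYRILLIC SMALL LETTER UKRAINIAN IE
    0xa6: 0x0456,  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    0xa7: 0x0457,  # CYRILLIC SMALL LETTER YI (UKRAINIAN)
    0xad: 0x0491,  # CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
    0xb4: 0x0404,  # CYRILLIC CAPITAL LETTER UKRAINIAN IE
    0xb6: 0x0406,  # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
    0xb7: 0x0407,  # CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
    0xbd: 0x0490,  # CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
})
```

The input table is left untouched, so the same base table can be reused for other derived codecs.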
>>I'd like to suggest a small cosmetic change: gencodec.py should output
>>byte values with two hex digits instead of four. This makes it easier to
>>see what is a byte value and what is a code point. And it would make
>>grepping for stuff simpler.
>
> True.
>
> I'll rerun the creation with the above changes sometime this
> week.
Great, thanks!
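
A rough sketch of what the two-hex-digit output suggested above could look like. format_mapping_line is a hypothetical helper invented for this example; it is not taken from gencodec.py, and the exact indentation and layout are assumptions:

```python
def format_mapping_line(byte, codepoint, comment=""):
    """Format one mapping entry: two hex digits for the byte, four for the code point."""
    line = "    0x%02x: 0x%04x," % (byte, codepoint)
    if comment:
        line += " # " + comment
    return line


line = format_mapping_line(0xa4, 0x0454, "CYRILLIC SMALL LETTER UKRAINIAN IE")
```

With this layout a search for "0xa4:" only matches the byte-value side of an entry, which is the grepping benefit mentioned above.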
Bye,
Walter Dörwald