[I18n-sig] error handling in charmap-based codecs

M.-A. Lemburg mal@lemburg.com
Thu, 21 Dec 2000 19:46:56 +0100

"Martin v. Loewis" wrote:
> > This is because I wanted to avoid having to put a huge number of
> > mappings to None into the codec dictionaries. This would have
> > caused the codec modules and dictionaries to become much larger
> > than acceptable for the standard distribution.
> I can't see the problem. If KeyError means "character not in the
> target character set", then why exactly would you have to put mappings
> to None into the codec dictionaries? Can you please give an example of
> a mapping that would need to be changed?

A mapping to None means: this mapping is undefined, so raise an
exception. Keys missing from the map fall back to the identity
(Latin-1) mapping. If "undefined" were the default for missing keys
instead, then all cpXXX.py modules would have to include every 1-1
mapping explicitly, e.g. 0x0020: 0x0020. That would make the tables
substantially larger.
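
As a rough sketch of the lookup rule just described (the function
name charmap_lookup is made up for illustration; this is not the
actual C code of the charmap codec):

```python
def charmap_lookup(decoding_map, byte):
    """Map one byte value to a Unicode ordinal using a charmap-style dict."""
    try:
        target = decoding_map[byte]
    except KeyError:
        # No entry at all: fall back to the identity (Latin-1) mapping.
        return byte
    if target is None:
        # Explicit None: the code point is declared undefined.
        raise UnicodeError("character maps to <undefined>: 0x%02x" % byte)
    return target


# Tiny illustrative map: one real mapping, one explicitly undefined entry.
decoding_map = {0x0080: 0x2500, 0x0081: None}
```

With this rule a codec table only needs entries where it differs from
Latin-1, plus None entries for the positions it leaves unassigned.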

To explicitly declare a mapping undefined, you have to add a
mapping to None. The standard codecs currently lack some of these
entries, which is what causes the bug you reported on SF.
A proper fix would involve adding the relevant mappings to all
decode maps in the standard codecs.
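
A minimal sketch of what such a fix looks like in a codec module.
The table below is a shortened, hypothetical excerpt, and the helper
undefined_code_points is an illustrative name, not part of any codec:

```python
# Shortened, hypothetical excerpt of a cpXXX-style decode map.
decoding_map = {
    0x0080: 0x20ac,  # EURO SIGN
    0x0082: 0x201a,  # SINGLE LOW-9 QUOTATION MARK
}

# Explicitly declare the unassigned positions undefined, so that they
# raise an exception instead of silently decoding as Latin-1.
decoding_map.update({
    0x0081: None,  # UNDEFINED
    0x008d: None,  # UNDEFINED
})


def undefined_code_points(mapping):
    """Return the sorted byte values that are explicitly mapped to None."""
    return sorted(code for code, target in mapping.items() if target is None)
```

Listing the None entries this way makes it easy to check a table
against the published code page definition.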

> > > I can't see any reason for defaulting to *Latin-1*.
> >
> > See above. The encodings using the charmap codec are usually
> > only minor modifications of Latin-1.
> I see, but I don't see. Let's take koi8_r.py as an example. It has a
> complete mapping for the range 128..255, the rest (0..127) is intended
> as a 1:1 mapping. I can't see a problem writing
> decoding_map = codecs.identity_dictionary(range(0,128))
> decoding_map.update({
>         0x0080: 0x2500, #       BOX DRAWINGS LIGHT HORIZONTAL
>         0x0081: 0x2502, #       BOX DRAWINGS LIGHT VERTICAL
> ...
> })
> where codecs.identity_dictionary is defined as
> def identity_dictionary(rng):
>     res = {}
>     for i in rng:
>         res[i] = i
>     return res
> That will produce somewhat larger dictionaries once a codec is *used*,
> but it won't change the distribution significantly.

True; that would be a possibility at run time. Perhaps
we ought to provide more tools for creating those mapping
tables?
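
One possible shape for such tools, purely as a sketch: the names
build_decoding_map and build_encoding_map are assumptions made up
here and do not refer to anything in the codecs module.

```python
def build_decoding_map(overrides, undefined=(), identity=range(128)):
    """Build a decode table at import time.

    identity  -- byte values that map to themselves
    overrides -- dict of byte value -> Unicode ordinal
    undefined -- byte values to mark as undefined (mapped to None)
    """
    decoding_map = {}
    for code in identity:
        decoding_map[code] = code
    decoding_map.update(overrides)
    for code in undefined:
        decoding_map[code] = None
    return decoding_map


def build_encoding_map(decoding_map):
    """Invert a decode table, skipping the undefined (None) entries."""
    encoding_map = {}
    for code, target in decoding_map.items():
        if target is not None:
            encoding_map[target] = code
    return encoding_map


# Usage: ASCII identity range plus two KOI8-R box drawing characters.
koi8_like = build_decoding_map({
    0x0080: 0x2500,  # BOX DRAWINGS LIGHT HORIZONTAL
    0x0081: 0x2502,  # BOX DRAWINGS LIGHT VERTICAL
})
```

The source files stay small this way, and the full dictionary is
only built once the codec module is actually imported.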

> > Huh ? The solution is simple: you only have to add mappings to None
> > as appropriate. There's no need to change the codec.
> So how can I correct the koi8_r codec without changing the C code?

Simple: add the missing mappings to None for the range 0..255.
The mapping lives in the Python module koi8_r.py -- there's
no need to touch any C code.

Marc-Andre Lemburg
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/