[I18n-sig] error handling in charmap-based codecs
M.-A. Lemburg
mal@lemburg.com
Thu, 21 Dec 2000 19:46:56 +0100
"Martin v. Loewis" wrote:
>
> > This is because I wanted to avoid having to put a huge number of
> > mappings to None into the codec dictionaries. This would have
> > caused the codec modules and dictionaries to become much larger
> > than acceptable for the standard distribution.
>
> I can't see the problem. If KeyError means "character not in the
> target character set", then why exactly would you have to put mappings
> to None into the codec dictionaries? Can you please give an example of
> a mapping that would need to be changed?
A mapping to None means: this mapping is undefined, so raise an
exception. If this were the default, then all cpXXX.py would have
to include all 1-1 mappings explicitly, e.g. 0x0020: 0x0020.
This would make the tables substantially larger.
To explicitly declare a mapping undefined, you'd have to add
mappings to None. The lack of such mappings is what causes the bug
you reported on SF.
A proper fix would involve adding the relevant mappings to all
decode maps in the standard codecs.
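To make the lookup rule above concrete, here is a minimal pure-Python
sketch. It only models the behaviour described in this message (no entry
means the Latin-1 default, None means undefined); it is not the C
implementation, and the name decode_with_map and the toy map are made
up for illustration.

```python
def decode_with_map(data, decoding_map):
    """Decode bytes using the charmap rules described in this message."""
    chars = []
    for byte in data:
        if byte not in decoding_map:
            # No entry at all: fall back to the Latin-1 default (identity).
            chars.append(chr(byte))
            continue
        target = decoding_map[byte]
        if target is None:
            # Explicit None: the mapping is declared undefined.
            raise UnicodeError("byte 0x%02x maps to <undefined>" % byte)
        chars.append(chr(target))
    return "".join(chars)


# Hypothetical toy map: 0x80 is remapped, 0x81 is declared undefined.
toy_map = {0x80: 0x2500, 0x81: None}
decoded = decode_with_map(b"A\x80", toy_map)
```

With this toy map, the byte 0x41 has no entry and passes through
unchanged, while decoding a 0x81 byte raises UnicodeError.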
> > > I can't see any reason for defaulting to *Latin-1*.
> >
> > See above. The encodings using the charmap codec are usually
> > only minor modifications of Latin-1.
>
> I see, but I don't see. Let's take koi8_r.py as an example. It has a
> complete mapping for the range 128..255, the rest (0..127) is intended
> as a 1:1 mapping. I can't see a problem writing
>
> decoding_map = codecs.identity_dictionary(range(0,128))
> decoding_map.update({
>
> 0x0080: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
> 0x0081: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
> ...
> })
>
> where codecs.identity_dictionary is defined as
>
> def identity_dictionary(rng):
>     res = {}
>     for i in rng:
>         res[i] = i
>     return res
>
> That will produce somewhat larger dictionaries once a codec is *used*,
> but it won't change the distribution significantly.
True; building the maps at runtime would be one possibility. Perhaps
we ought to provide more tools for creating those mapping
tables?
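As a rough idea of what such a tool could look like, the sketch below
builds a decoding map from an identity range plus codec-specific
overrides. The helper name make_decoding_map is hypothetical and not
part of the codecs module; the two override entries are the ones quoted
above for koi8_r.

```python
def make_decoding_map(identity_range, overrides):
    """Build a decoding map: identity entries first, then overrides."""
    # Start with explicit 1:1 entries for every code in identity_range.
    decoding_map = {code: code for code in identity_range}
    # Codec-specific entries replace or extend the identity entries.
    decoding_map.update(overrides)
    return decoding_map


# Only two of the koi8_r entries are shown; the rest are elided.
decoding_map = make_decoding_map(
    range(128),
    {
        0x0080: 0x2500,  # BOX DRAWINGS LIGHT HORIZONTAL
        0x0081: 0x2502,  # BOX DRAWINGS LIGHT VERTICAL
    },
)
```

The source module stays small because only the overrides are written
out; the identity entries exist only once the map has been built.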
> > Huh ? The solution is simple: you only have to add mappings to None
> > as appropriate. There's no need to change the codec.
>
> So how can I correct the koi8_r codec without changing the C code?
Simple: add the missing mappings to None for the range 0..255.
The mapping lives in the Python module koi8_r.py -- there's
no need to touch any C code.
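A sketch of how such a patch could be generated rather than typed by
hand. It assumes that codes 0..127 are meant to keep the default
identity mapping and that any other code in 0..255 without an entry is
undefined; which entries are actually missing from koi8_r.py is not
listed here. The helper name add_undefined and the toy map are
hypothetical.

```python
def add_undefined(mapping, keep_default=range(128), full=range(256)):
    """Return a copy of mapping with None added for unmapped codes."""
    patched = dict(mapping)
    for code in full:
        # Leave existing entries and the default identity range alone;
        # everything else is explicitly declared undefined.
        if code not in patched and code not in keep_default:
            patched[code] = None
    return patched


# Toy map with a single entry, standing in for a real codec table.
toy_map = {0x80: 0x2500}
patched = add_undefined(toy_map)
```

The original dictionary is left untouched, so the patched copy can be
compared against it before being put into the module.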
--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/