[Python-Dev] Unicode charmap decoders slow

"Martin v. Löwis" martin at v.loewis.de
Wed Oct 5 00:08:45 CEST 2005


Walter Dörwald wrote:
>> This array would have to be sparse, of course.
> 
> 
> For encoding yes, for decoding no.
[...]
> For decoding it should be sufficient to use a unicode string of length
> 256. u"\ufffd" could be used for "maps to undefined". Or the string
> might be shorter and byte values greater than the length of the string
> are treated as "maps to undefined" too.

Right. That's what I meant by "sparse": you somehow need to represent
"no value".

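To make the decoding scheme concrete, here is a minimal sketch in Python
of a lookup through a string table. The names decode_with_table,
UNDEFINED and ascii_table are hypothetical and chosen for illustration
only; this is not the actual codec implementation. It assumes that
u"\ufffd" marks "no value" and that byte values at or beyond the end of
a short table are undefined as well.

```python
UNDEFINED = "\ufffd"  # assumed marker for "maps to undefined"


def decode_with_table(data, table):
    """Decode bytes by indexing into a string of at most 256 characters."""
    out = []
    for pos, byte in enumerate(data):
        # A table shorter than 256 leaves the trailing byte values undefined.
        ch = table[byte] if byte < len(table) else UNDEFINED
        if ch == UNDEFINED:
            raise UnicodeDecodeError(
                "charmap", data, pos, pos + 1, "character maps to <undefined>"
            )
        out.append(ch)
    return "".join(out)


# Hypothetical toy table covering only the 128 ASCII code points.
ascii_table = "".join(chr(i) for i in range(128))
```

With this table, decode_with_table(b"abc", ascii_table) returns "abc",
while any byte from 0x80 upwards raises UnicodeDecodeError.
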
> This might work, although nobody has complained about charmap encoding
> yet. Another option would be to generate a big switch statement in C
> and let the compiler decide about the best data structure.

I would try to avoid generating C code at all costs. Maintaining the 
build processes will just be a nightmare.
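
For the encoding direction, where the table does have to be sparse, one
way to stay clear of generated C is to derive the map at run time from
the decoding string. The following is only a sketch under that
assumption: build_encoding_map, encode_with_map and toy_table are
hypothetical names, and a plain dict stands in for whatever sparse
structure would actually be chosen.

```python
UNDEFINED = "\ufffd"  # assumed marker for "maps to undefined"


def build_encoding_map(decoding_table):
    """Invert a decoding string into a sparse {code point: byte} dict."""
    return {
        ord(ch): byte
        for byte, ch in enumerate(decoding_table)
        if ch != UNDEFINED
    }


def encode_with_map(text, encoding_map):
    """Encode text one character at a time through the sparse map."""
    out = bytearray()
    for pos, ch in enumerate(text):
        try:
            out.append(encoding_map[ord(ch)])
        except KeyError:
            raise UnicodeEncodeError(
                "charmap", text, pos, pos + 1, "character maps to <undefined>"
            ) from None
    return bytes(out)


# Hypothetical toy table: ASCII for bytes 0..127, byte 0x80 for EURO SIGN.
toy_table = "".join(chr(i) for i in range(128)) + "\u20ac"
toy_map = build_encoding_map(toy_table)
```

Only the code points that actually occur in the table take up space in
the dict, so no array over the whole Unicode range is needed.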

Regards,
Martin

