[Python-Dev] Unicode charmap decoders slow

"Martin v. Löwis" martin at v.loewis.de
Thu Oct 6 09:04:14 CEST 2005


Hye-Shik Chang wrote:
> If the encoding optimization can be easily done in Walter's approach,
> the fastmap codec would be too expensive a way to meet the objective,
> because we would have to maintain not only the fastmap but also the
> charmap for backward compatibility.

IMO, whether a new function is added or whether the existing function
becomes polymorphic (depending on the type of table being passed) is
a minor issue. Clearly, the charmap API needs to stay for backwards
compatibility; in terms of code size or maintenance, I would actually
prefer separate functions.

One issue apparently is people tweaking the existing dictionaries,
with additional entries they think belong there. I don't think we
need to preserve compatibility with that approach in 2.5, but I
also think that breakage should be obvious: the dictionary should
either go away completely at run-time, or be stored under a
different name, so that any attempt to modify the dictionary
raises an exception instead of silently having no effect.
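As a hypothetical illustration of making the breakage obvious (using
types.MappingProxyType, a modern construct that did not exist in 2005;
the sample entries are invented), a codec module could expose only a
read-only view under the old name:

```python
from types import MappingProxyType

# The real dict is kept under a private name; only a read-only view is
# published under the legacy name, so old code that tweaks it fails
# loudly instead of silently having no effect.
_decoding_dict = {0x41: 0x0041, 0x80: 0x20AC}  # invented sample entries

decoding_dict = MappingProxyType(_decoding_dict)

# decoding_dict[0x42] = 0x0042  # would raise TypeError
```

Deleting the name entirely after the fast tables are built would have
the same effect, with a NameError instead of a TypeError.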

I envision a layout of the codec files like this:

decoding_dict = ...
decoding_map, encoding_map = codecs.make_lookup_tables(decoding_dict)

I think it should be possible to build efficient tables in a single
pass over the dictionary, so startup time should be fairly small
(given that the dictionaries are currently built incrementally, anyway,
due to the way dictionary literals work).
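A single-pass build along those lines might look like the following
sketch. Note that make_lookup_tables and its exact signature are part of
the proposal, not an existing codecs API, and the table entries are
invented for illustration:

```python
def make_lookup_tables(decoding_dict):
    """Build a 256-entry decoding table and the inverse encoding map
    in a single pass over the decoding dictionary."""
    # Index = byte value; entry = Unicode ordinal, or None if unmapped.
    decoding_map = [None] * 256
    encoding_map = {}
    for byte, ordinal in decoding_dict.items():
        decoding_map[byte] = ordinal
        if ordinal is not None:
            encoding_map[ordinal] = byte
    return decoding_map, encoding_map

# Invented sample: 'A' maps to itself, 0x80 to U+20AC, 0xFF is undefined.
decoding_dict = {0x41: 0x0041, 0x80: 0x20AC, 0xFF: None}
decoding_map, encoding_map = make_lookup_tables(decoding_dict)
```

Since each dictionary entry is visited exactly once, the cost at module
import time stays proportional to the size of the dictionary that is
being built anyway.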

Regards,
Martin

