[issue19619] Blacklist base64, hex, ... codecs from bytes.decode() and str.encode()

Fri Nov 22 01:51:20 CET 2013

STINNER Victor added the comment:

> There is no "codec registry" - there is only the default codec search
> function, the encodings import namespace, the normalisation algorithm
> and the alias dictionary.

interp->codec_search_cache can be seen as the "registry". If you store codecs in two different registries depending on a property, an attribute, whatever, you keep O(1) complexity (no extra strcmp or attribute lookup on each lookup). The overhead is only paid when you load a codec for the first time.

It should not be so hard to add a second dictionary.

You don't need to touch all parts of the codecs machinery, only interp->codec_search_cache.

It would not be possible to have the same name in both registries. So codecs.lookup() would still return any kind of codec; it would just look in two dictionaries instead of one. codecs.encode/decode would therefore be unchanged too (if you want to keep these functions ;-)).

Only bytes.decode/str.encode would be modified, so that they look up codecs in the "text codecs" registry only.
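
A minimal Python sketch of the two-dictionary idea, for illustration only. The real cache is the C-level interp->codec_search_cache; the names text_cache, other_cache, register, lookup_any and lookup_text are hypothetical, and the is_text flag is just an assumed way of classifying a codec when it is first loaded.

```python
# Hypothetical model of the proposal; not the CPython implementation.
text_cache = {}    # text codecs only (str <-> bytes)
other_cache = {}   # every other codec, e.g. base64 or hex


def register(name, codec, is_text):
    # The classification cost is paid once, when the codec is first loaded.
    if name in text_cache or name in other_cache:
        raise ValueError("codec %r is already registered" % name)
    target = text_cache if is_text else other_cache
    target[name] = codec


def lookup_any(name):
    # Roughly what codecs.lookup() would do: at most two O(1) dict lookups.
    if name in text_cache:
        return text_cache[name]
    if name in other_cache:
        return other_cache[name]
    raise LookupError("unknown encoding: %s" % name)


def lookup_text(name):
    # Roughly what bytes.decode()/str.encode() would do: text codecs only.
    try:
        return text_cache[name]
    except KeyError:
        raise LookupError("unknown text encoding: %s" % name) from None
```

With this layout a non-text codec is simply absent from the dictionary that bytes.decode/str.encode consult, so no per-call check of a codec attribute is needed.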

Yet another option: add a new dictionary, but leave interp->codec_search_cache unchanged. Text codecs would then be registered twice: once in interp->codec_search_cache, and once in the second dictionary. bytes.decode/str.encode would look up only in the text codecs dictionary, instead of interp->codec_search_cache. That's all ;-)
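
The double-registration variant could be modelled in Python roughly as follows. This is only a sketch under assumptions: search_cache stands in for interp->codec_search_cache, while text_only, cache_codec and lookup_for_str_methods are hypothetical names, and is_text is an assumed classification flag.

```python
# Hypothetical model of the second option; not the CPython implementation.
search_cache = {}   # stands in for interp->codec_search_cache, left unchanged
text_only = {}      # new dictionary that holds text codecs only


def cache_codec(name, codec, is_text):
    # Every codec still goes into the existing cache, as it does today.
    search_cache[name] = codec
    # Text codecs are additionally stored in the second dictionary.
    if is_text:
        text_only[name] = codec


def lookup_for_str_methods(name):
    # bytes.decode()/str.encode() would consult only the text dictionary.
    try:
        return text_only[name]
    except KeyError:
        raise LookupError("unknown text encoding: %s" % name) from None
```

Here codecs.lookup() and codecs.encode/decode would keep reading search_cache exactly as before; only the str/bytes methods change which dictionary they read.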

> Victor pointed out this should now raise LookupError rather than TypeError.

If you are willing to raise a LookupError, doesn't the "two registries" option become the more obvious choice?

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19619>
_______________________________________