[issue19619] Blacklist base64, hex, ... codecs from bytes.decode() and str.encode()

Nick Coghlan report at bugs.python.org
Sat Nov 16 13:44:33 CET 2013


Nick Coghlan added the comment:

Now that I understand Victor's proposal better, I actually agree with it, I just think the attribute names need to be "encodes_to" and "decodes_to".

With Victor's proposal, *input* validity checks (including type checks) would remain the responsibility of the codec itself. What the new attributes would enable is *output* type checks *without having to perform the encoding or decoding operation first*. codecs will be free to leave these as None to retain the current behaviour of "try it and see".

The specific field names "input_type" and "output_type" aren't accurate, since the acceptable input types for encoding or decoding are likely to be more permissive than the specific output type for the other operation. Most of the binary codecs, for example, accept any bytes-like object as input, but produce bytes objects as output for both encoding and decoding. For Unicode encodings, encoding is strictly str->bytes, but decoding is generally the more permissive bytes-like object -> str.

I would still suggest providing the following helper function in the codecs module (the name has changed from my earlier suggestion and I now suggest implementing it in terms of Victor's suggestion with more appropriate field names):

    def is_text_encoding(name):
        """Returns true if the named encoding is a Unicode text encoding"""
        info = codecs.lookup(name)
        return info.encodes_to is bytes and info.decodes_to is str

This approach covers all the current stdlib codecs:

- the text encodings encode to bytes and decode to str
- the binary transforms encode to bytes and also decode to bytes
- the lone text transform (rot_13) encodes and decodes to str

This approach also makes it possible for a type inference engine (like mypy) to potentially analyse codec use, and could be expanded in 3.5 to offer type checked binary and text transform APIs that filtered codecs appropriately according to their output types.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19619>
_______________________________________


More information about the Python-bugs-list mailing list