[portland] detecting implicit encoding conversion

jason kirtland jek at discorporate.us
Thu Feb 25 02:08:07 CET 2010


On Wed, Feb 24, 2010 at 4:49 PM, Christopher Hiller
<chiller at decipherinc.com> wrote:
> List,
>
> I'm having a difficult time with this particular problem.  I have a codebase
> where I would like to find all occurrences of implicit decodes.  It's
> difficult to do this with grep, and I was wondering if there was another way
> by means of decorators or monkeypatching or compiler/parse tree analysis or
> something.  An example:
>
> foo = u'bar' + 'baz'
>
> This implicitly decodes "baz" using the system default encoding.  In my case
> this encoding is ASCII.
>
> However -- and this is where problems can arise -- what if you had this:
>
> foo = u'bar' + 'büz'
>
> ...which raises a UnicodeDecodeError at runtime when the default encoding
> is ASCII (or a SyntaxError at compile time, if the source file has no
> coding declaration covering the non-ASCII literal).
>
> Any ideas?  I'm having problems googling for solutions because I'm not
> entirely sure what to google for.

I went through this process myself recently.  The path I took was to
swap out the default unicode codec for one that explodes, run the
unit tests, and incrementally fix the problems.  The code is open
source and you can snag it here:

http://bitbucket.org/jek/flatland/src/75d8155a30a2/tests/__init__.py
http://bitbucket.org/jek/flatland/src/75d8155a30a2/tests/_util.py

The short version looks like:

import codecs

class NoCoercionCodec(codecs.Codec):
    def encode(self, input, errors='strict'):
        raise UnicodeError("encoding coercion blocked")

    def decode(self, input, errors='strict'):
        raise UnicodeError("encoding coercion blocked")
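For completeness, here is a sketch of how such a codec can be wired into
the codec registry.  The codec name 'no_coerce' and the search function
are my own illustration, not names from flatland; the final activation
step under Python 2 was sys.setdefaultencoding (reachable only after
reload(sys)), shown as a comment because it no longer exists in Python 3.

```python
import codecs

class NoCoercionCodec(codecs.Codec):
    """A codec whose only job is to fail loudly."""
    def encode(self, input, errors='strict'):
        raise UnicodeError("encoding coercion blocked")

    def decode(self, input, errors='strict'):
        raise UnicodeError("encoding coercion blocked")

def _search(name):
    # 'no_coerce' is an illustrative name, not one used by flatland.
    if name != 'no_coerce':
        return None
    codec = NoCoercionCodec()
    return codecs.CodecInfo(name='no_coerce',
                            encode=codec.encode,
                            decode=codec.decode)

codecs.register(_search)

# Any conversion routed through the codec now explodes:
try:
    codecs.decode(b'baz', 'no_coerce')
except UnicodeError as exc:
    print(exc)  # encoding coercion blocked

# Under Python 2 the codec was then installed globally, e.g.:
#   reload(sys); sys.setdefaultencoding('no_coerce')
# after which every implicit decode during the test run raises.
```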

The real version is a little longer.  The stdlib does some implicit
conversions, and in my case I didn't want those to explode.
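One way to carve out that exemption -- an assumption on my part, not
necessarily what the linked _util.py does -- is to peek at the calling
frame and let stdlib modules fall back to plain ASCII while everything
else still explodes:

```python
import codecs
import os.path
import sys

STDLIB_DIR = os.path.dirname(os.__file__)  # directory the stdlib lives in

def caller_outside_stdlib(depth=2):
    """True when the frame `depth` levels up is not a stdlib module."""
    frame = sys._getframe(depth)
    return not frame.f_code.co_filename.startswith(STDLIB_DIR)

class LenientNoCoercionCodec(codecs.Codec):
    """Block coercion from application code, but let the stdlib's
    own internal conversions proceed via plain ASCII."""

    def decode(self, input, errors='strict'):
        if caller_outside_stdlib():
            raise UnicodeError("encoding coercion blocked")
        return codecs.ascii_decode(input, errors)

    def encode(self, input, errors='strict'):
        if caller_outside_stdlib():
            raise UnicodeError("encoding coercion blocked")
        return codecs.ascii_encode(input, errors)
```

Frame inspection via sys._getframe is CPython-specific and the depth is
fragile; it's shown only to illustrate the shape of the exemption.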

