[New-bugs-announce] [issue11322] encoding package's normalize_encoding() function is too slow

Marc-Andre Lemburg report at bugs.python.org
Fri Feb 25 16:55:32 CET 2011

New submission from Marc-Andre Lemburg <mal at egenix.com>:

I don't know who changed the encodings package's normalize_encoding() function (it wasn't me), but the current implementation is really slow.

The original version used the .translate() method, which is a lot faster and can be adapted just as well to work with the Unicode variant of .translate().

import __builtin__  # Python 2 module; needed by the hasattr() check below

_norm_encoding_map = ('                                              . '
                      '0123456789       ABCDEFGHIJKLMNOPQRSTUVWXYZ     '
                      ' abcdefghijklmnopqrstuvwxyz                     '
                      '                                                '
                      '                                                '
                      '                ')

def normalize_encoding(encoding):

    """ Normalize an encoding name.

        Normalization works as follows: all non-alphanumeric
        characters except the dot used for Python package names are
        collapsed and replaced with a single underscore, e.g. '  -;#'
        becomes '_'. Leading and trailing underscores are removed.

        Note that encoding names should be ASCII only; if they do use
        non-ASCII characters, these must be Latin-1 compatible.

    """
    # Make sure we have an 8-bit string, because .translate() works
    # differently for Unicode strings.
    if hasattr(__builtin__, "unicode") and isinstance(encoding, unicode):
        # Note that .encode('latin-1') does *not* use the codec
        # registry, so this call doesn't recurse. (See unicodeobject.c
        # PyUnicode_AsEncodedString() for details)
        encoding = encoding.encode('latin-1')
    return '_'.join(encoding.translate(_norm_encoding_map).split())
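
For illustration only, here is a minimal sketch of how the same idea might look with the Unicode variant of .translate() on Python 3. It is not the stdlib code: the names normalize_translate, normalize_loop, compare and _norm_encoding_table are hypothetical, and the per-character loop is just a stand-in baseline to time against. The sketch assumes that code points above Latin-1 are passed through unchanged by the translate variant, which differs from the .encode('latin-1') behaviour above.

```python
import string
import timeit

# Assumption: keep ASCII letters, digits and the dot, as in the map above.
_KEEP = frozenset(string.ascii_letters + string.digits + ".")

# str.translate() takes a mapping from code points to replacements;
# every other Latin-1 code point is mapped to a space.
_norm_encoding_table = {cp: " " for cp in range(256) if chr(cp) not in _KEEP}


def normalize_translate(encoding):
    """Hypothetical Python 3 variant of the translate-based approach."""
    if isinstance(encoding, bytes):
        encoding = encoding.decode("latin-1")
    # split() collapses runs of spaces; join() inserts single underscores.
    return "_".join(encoding.translate(_norm_encoding_table).split())


def normalize_loop(encoding):
    """Illustrative per-character baseline, not quoted from the stdlib."""
    chars = []
    pending = False  # True once a separator has been seen
    for c in encoding:
        if c in _KEEP:
            if pending and chars:
                chars.append("_")
            chars.append(c)
            pending = False
        else:
            pending = True
    return "".join(chars)


def compare(name="ISO-8859-1", number=100000):
    """Return (translate_seconds, loop_seconds) for `number` calls each."""
    t_translate = timeit.timeit(lambda: normalize_translate(name), number=number)
    t_loop = timeit.timeit(lambda: normalize_loop(name), number=number)
    return t_translate, t_loop
```

Calling compare() returns the two timings so they can be checked on a given interpreter; no particular numbers are claimed here.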

components: Unicode
messages: 129386
nosy: lemburg
priority: normal
severity: normal
status: open
title: encoding package's normalize_encoding() function is too slow
type: performance
versions: Python 3.3

Python tracker <report at bugs.python.org>
