[issue11322] encoding package's normalize_encoding() function is too slow

Marc-Andre Lemburg report at bugs.python.org
Sat Feb 26 00:06:54 CET 2011


Marc-Andre Lemburg <mal at egenix.com> added the comment:

STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner at haypocalc.com> added the comment:
> 
> We should first implement the same algorithm in the 3 normalization functions and add tests for them (at least for the function in the encodings package):
> 
>  - normalize_encoding() in encodings: it doesn't convert to lowercase and keeps non-ASCII letters
>  - normalize_encoding() in unicodeobject.c
>  - normalizestring() in codecs.c
> 
> normalize_encoding() in encodings is more lenient than the other two functions: it normalizes "  utf   8  " to 'utf_8'. But it doesn't convert to lowercase and it keeps non-ASCII letters: "UTF-8é" is normalized to "UTF_8é".
> 
> I don't know whether the normalization functions should be more or less strict, but I think they should all give the same result.
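
For reference, a minimal sketch of the behaviour described above, assuming
the encodings.normalize_encoding() implementation as it stood at the time of
this report (later versions of the function drop non-ASCII characters
instead of keeping them):

    from encodings import normalize_encoding

    # Runs of whitespace and punctuation collapse to a single underscore;
    # leading and trailing separators are dropped:
    print(normalize_encoding("  utf   8  "))   # -> 'utf_8'

    # Case is preserved, and non-ASCII letters pass through unchanged:
    print(normalize_encoding("UTF-8é"))        # -> 'UTF_8é'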

Please see this message for an explanation of why we have those
three functions, why they are different, and what their respective
application domains are:

http://bugs.python.org/issue5902#msg129257

This ticket is only about the encodings package's codec search
function, not the other two functions, and I don't want to change
its semantics, just improve its performance.

----------
title: encoding package's normalize_encoding() function is too slow -> encoding package's normalize_encoding() function is too	slow

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue11322>
_______________________________________

