[Python-Dev] [Python-checkins] cpython: Fix _Py_normalize_encoding(): ensure that buffer is big enough to store "utf-8"

Thu Nov 7 22:38:32 CET 2013

2013/11/7 Benjamin Peterson <benjamin at python.org>:
> 2013/11/7 victor.stinner <python-checkins at python.org>:
>> http://hg.python.org/cpython/rev/99afa4c74436
>> changeset:   86995:99afa4c74436
>> user:        Victor Stinner <victor.stinner at gmail.com>
>> date:        Thu Nov 07 13:33:36 2013 +0100
>> summary:
>>   Fix _Py_normalize_encoding(): ensure that buffer is big enough to store "utf-8"
>> if the input string is NULL
>>
>> files:
>>   Objects/unicodeobject.c |  2 ++
>>   1 files changed, 2 insertions(+), 0 deletions(-)
>>
>>
>> diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
>> --- a/Objects/unicodeobject.c
>> +++ b/Objects/unicodeobject.c
>> @@ -2983,6 +2983,8 @@
>>      char *l_end;
>>
>>      if (encoding == NULL) {
>> +        if (lower_len < 6)
>
> How about doing something like strlen("utf-8") rather than hardcoding that?

Full code:

    if (encoding == NULL) {
        if (lower_len < 6)
            return 0;
        strcpy(lower, "utf-8");
        return 1;
    }

In my opinion, it is easy to guess that 6 is len("utf-8") + 1 byte for the NUL terminator.

Calling strlen() at runtime may slow down a function in the fast path
of PyUnicode_Decode() and PyUnicode_AsEncodedString(), which are
important functions. I know that some compilers can evaluate strlen()
at compile time, but I don't see the need to replace 6 with
strlen("utf-8")+1.
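
For illustration only, here is a minimal sketch of a form that avoids both
the hardcoded 6 and a runtime strlen() call: sizeof applied to a string
literal is a compile-time constant that already counts the trailing NUL.
The names DEFAULT_ENCODING and copy_default_encoding are hypothetical and
are not part of CPython, and the _Static_assert line assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro for this sketch; not a CPython name. */
#define DEFAULT_ENCODING "utf-8"

/* sizeof on a string literal includes the terminating NUL, so this equals
   strlen("utf-8") + 1 and is evaluated at compile time (C11 assumed). */
_Static_assert(sizeof(DEFAULT_ENCODING) == 6,
               "\"utf-8\" plus NUL should be 6 bytes");

/* Hypothetical helper mirroring the NULL-encoding branch quoted above:
   returns 0 if the buffer is too small, otherwise copies the name
   (including the NUL) and returns 1. */
static int
copy_default_encoding(char *lower, size_t lower_len)
{
    assert(lower != NULL);
    if (lower_len < sizeof(DEFAULT_ENCODING))
        return 0;
    memcpy(lower, DEFAULT_ENCODING, sizeof(DEFAULT_ENCODING));
    return 1;
}
```

With a buffer of 6 bytes or more the helper returns 1 and the buffer holds
"utf-8"; with a smaller buffer it returns 0 and leaves the buffer untouched.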

Victor