[Python-Dev] [Python-checkins] cpython: Fix _Py_normalize_encoding(): ensure that buffer is big enough to store "utf-8"

Eric V. Smith eric at trueblade.com
Thu Nov 7 23:06:12 CET 2013


On 11/7/2013 4:38 PM, Victor Stinner wrote:
> 2013/11/7 Benjamin Peterson <benjamin at python.org>:
>> 2013/11/7 victor.stinner <python-checkins at python.org>:
>>> http://hg.python.org/cpython/rev/99afa4c74436
>>> changeset:   86995:99afa4c74436
>>> user:        Victor Stinner <victor.stinner at gmail.com>
>>> date:        Thu Nov 07 13:33:36 2013 +0100
>>> summary:
>>>   Fix _Py_normalize_encoding(): ensure that buffer is big enough to store "utf-8"
>>> if the input string is NULL
>>>
>>> files:
>>>   Objects/unicodeobject.c |  2 ++
>>>   1 files changed, 2 insertions(+), 0 deletions(-)
>>>
>>>
>>> diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
>>> --- a/Objects/unicodeobject.c
>>> +++ b/Objects/unicodeobject.c
>>> @@ -2983,6 +2983,8 @@
>>>      char *l_end;
>>>
>>>      if (encoding == NULL) {
>>> +        if (lower_len < 6)
>>
>> How about doing something like strlen("utf-8") rather than hardcoding that?
> 
> Full code:
> 
>     if (encoding == NULL) {
>         if (lower_len < 6)
>             return 0;
>         strcpy(lower, "utf-8");
>         return 1;
>     }
> 
> In my opinion, it is easy to guess that 6 is len("utf-8") + 1 byte for NUL.
> 
> Calling strlen() at runtime may slow down a function in the fast path
> of PyUnicode_Decode() and PyUnicode_AsEncodedString(), which are
> important functions. I know that some compilers can evaluate strlen()
> at compile time, but I don't see the need to replace 6 with
> strlen("utf-8")+1.

Then how about at least a comment about how 6 is derived?

         if (lower_len < 6)   /* 6 == strlen("utf-8") + 1 */
             return 0;
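
Another option that avoids both the magic number and any runtime strlen() call is sizeof on a character array, which counts the trailing NUL and is a compile-time constant. The sketch below is only an illustration of that idea, not what the changeset does: copy_default_encoding() and default_encoding are hypothetical names, and the real _Py_normalize_encoding() has a different signature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only; not the CPython function. */
static int
copy_default_encoding(char *lower, size_t lower_len)
{
    /* sizeof(default_encoding) == strlen("utf-8") + 1 == 6,
       evaluated by the compiler, so there is no runtime strlen(). */
    static const char default_encoding[] = "utf-8";

    if (lower_len < sizeof(default_encoding))
        return 0;   /* buffer too small for "utf-8" plus the NUL */
    memcpy(lower, default_encoding, sizeof(default_encoding));
    return 1;
}
```

With a 6-byte or larger buffer this should return 1 and leave "utf-8" in the buffer; with a 5-byte buffer it should return 0 and leave the buffer untouched.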


Eric.



