[Python-Dev] cpython: PyUnicode_Join() checks output length in debug mode

Tue Oct 4 23:41:54 CEST 2011

On 10/03/11 23:35, victor.stinner wrote:
> http://hg.python.org/cpython/rev/bfd8b5d35f9c
> changeset:   72623:bfd8b5d35f9c
> user:        Victor Stinner <victor.stinner at haypocalc.com>
> date:        Mon Oct 03 23:36:02 2011 +0200
> summary:
>   PyUnicode_Join() checks output length in debug mode
> 
> PyUnicode_CopyCharacters() may copy fewer characters than the requested size
> if the input string is shorter than the requested length. (This is very
> unlikely, but who knows!?)
> 
> Also avoid calling PyUnicode_CopyCharacters() if the string is empty.
> 
> files:
>   Objects/unicodeobject.c |  34 +++++++++++++++++++---------
>   1 files changed, 23 insertions(+), 11 deletions(-)
> 
> 
> diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
> --- a/Objects/unicodeobject.c
> +++ b/Objects/unicodeobject.c
> @@ -8890,20 +8890,32 @@
>  
>      /* Catenate everything. */
>      for (i = 0, res_offset = 0; i < seqlen; ++i) {
> -        Py_ssize_t itemlen;
> +        Py_ssize_t itemlen, copied;
>          item = items[i];
> +        /* Copy item, and maybe the separator. */
> +        if (i && seplen != 0) {
> +            copied = PyUnicode_CopyCharacters(res, res_offset,
> +                                              sep, 0, seplen);
> +            if (copied < 0)
> +                goto onError;
> +#ifdef Py_DEBUG
> +            res_offset += copied;
> +#else
> +            res_offset += seplen;
> +#endif
> +        }
>          itemlen = PyUnicode_GET_LENGTH(item);
> -        /* Copy item, and maybe the separator. */
> -        if (i) {
> -            if (PyUnicode_CopyCharacters(res, res_offset,
> -                                         sep, 0, seplen) < 0)
> +        if (itemlen != 0) {
> +            copied = PyUnicode_CopyCharacters(res, res_offset,
> +                                              item, 0, itemlen);
> +            if (copied < 0)
>                  goto onError;
> -            res_offset += seplen;
> -        }
> -        if (PyUnicode_CopyCharacters(res, res_offset,
> -                                     item, 0, itemlen) < 0)
> -            goto onError;
> -        res_offset += itemlen;
> +#ifdef Py_DEBUG
> +            res_offset += copied;
> +#else
> +            res_offset += itemlen;
> +#endif
> +        }
>      }
>      assert(res_offset == PyUnicode_GET_LENGTH(res));

I don't understand this change. Why would you not always add "copied" once you
already have it? It seems to be the more correct version anyway.

Georg