[issue7330] PyUnicode_FromFormat segfault

Mon Feb 21 04:18:05 CET 2011

Ray.Allen <ysj.ray at gmail.com> added the comment:

> > > With your patch, "%.200s" truncates the input string to 200 *characters*, but I think that it should truncate to 200 *bytes*, as printf does.
> > 
> > Sorry, I don't understand. The result of PyUnicode_FromFormatV() is a unicode object. How can it be truncated to 200 *bytes*?

> You can truncate the input char* on the call to PyUnicode_DecodeUTF8:
> pass a size smaller than strlen(s).

Now I wonder how we should treat the precision of '%s'. First of all, PyUnicode_FromFormat() should behave like C printf(). In C printf(), the precision of %s gives the maximum number of characters (in C, bytes) to be written; if the argument is longer, the output is truncated. That means the precision is applied to the final result. Python's PyUnicode_FromFormat() produces unicode strings, so the width and precision should be applied to the final unicode string.

Formatting happens in two stages: first each parameter is converted to a unicode string, then the width and precision are applied to it. So I doubt that we should apply the precision at the conversion stage, that is, in PyUnicode_DecodeUTF8(). In my opinion, the precision should be applied not to the input chars but to the output unicode characters.

I hope I haven't misunderstood anything.

So, haypo, what's your opinion?

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7330>
_______________________________________