
unicodeobject.c contains this code:

    PyErr_Format(PyExc_ValueError,
                 "unsupported format character '%c' (0x%x) "
                 "at index %i",
                 c, c, fmt - 1 - PyUnicode_AS_UNICODE(uformat));

c is a Py_UNICODE; applying C's %c to it only takes the lowest 8 bits, so '%\u3000' % 1 results in an error message containing "'\000' (0x3000)". Is this worth fixing? I'd say no, since the hex value is more useful for Unicode strings anyway.

(I still wanted to mention this little buglet, since I just touched this bit of code.)

--amk
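A minimal sketch of the truncation amk describes. It assumes a 16-bit code unit, named py_unicode_t here as a hypothetical stand-in for Py_UNICODE (whose real width depends on the build). C's %c converts its int argument to unsigned char, so only the low 8 bits of the character reach the output.

```c
#include <stdio.h>

/* Hypothetical stand-in for a 16-bit Py_UNICODE (assumption for this sketch). */
typedef unsigned short py_unicode_t;

/* Returns the byte that C's %c actually emits for a given code unit.
   The argument is promoted to int, then %c converts it to unsigned char,
   so everything above the low 8 bits is discarded. */
unsigned char percent_c_byte(py_unicode_t c)
{
    char buf[2];  /* one output byte plus the terminating NUL */
    snprintf(buf, sizeof buf, "%c", (int)c);
    return (unsigned char)buf[0];
}
```

Called with 0x3000 (the ideographic space from the example above), this yields 0x00, which is why the message shows '\000' next to the correct hex value.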

Sounds like the '%c' should just be deleted. --Guido van Rossum (home page: http://www.python.org/~guido/)

"A.M. Kuchling" wrote:
Why would you want to fix it? Format characters will always be ASCII and thus 7-bit -- there's really no need to expand the set of possibilities beyond 8 bits ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

[MAL]
[AMK]
This message is for characters that aren't format characters, which therefore includes all characters >127.
I'm with the wise man who suggested dropping the %c in this case and just displaying the hex value. Although it would be more readable to drop the %c if and only if the bogus format character isn't printable 7-bit ASCII. Which is obvious, yes? A new if/else isn't going to hurt anything.
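The if/else suggested above might look something like the following sketch. The function name format_bad_char_message and the py_unicode_t type are hypothetical, and "printable 7-bit ASCII" is assumed to mean the range 0x20..0x7E; the message text mirrors the original PyErr_Format call but writes into a plain buffer instead of raising an exception.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a 16-bit Py_UNICODE (assumption for this sketch). */
typedef unsigned short py_unicode_t;

/* Builds the error text, showing the character itself only when it is
   printable 7-bit ASCII (0x20..0x7E); otherwise only the hex value is shown.
   Returns the snprintf result. */
int format_bad_char_message(char *buf, size_t size, py_unicode_t c, int index)
{
    if (c >= 0x20 && c < 0x7F)
        return snprintf(buf, size,
                        "unsupported format character '%c' (0x%x) at index %i",
                        (int)c, (unsigned)c, index);
    else
        return snprintf(buf, size,
                        "unsupported format character 0x%x at index %i",
                        (unsigned)c, index);
}
```

With this split, '%\u3000' % 1 would report only the hex value, while an ordinary typo such as '%y' still shows the offending character.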

"M.-A. Lemburg" <mal@lemburg.com>:
But the error message is being produced because the character is NOT a valid format character. One reason for that might be that it's not in the 7-bit range!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

Greg Ewing wrote:
True. I think removing %c completely in that case is the right solution (in case you don't want to convert the Unicode char using the default encoding to a string first).

-- 
Marc-Andre Lemburg

mal wrote:
how likely is it that a human programmer will use a bad formatting character that's not in the ASCII range? -1 on removing it -- people shouldn't have to learn the octal ASCII table just to be able to fix trivial typos. +1 on mapping the character back to a string in the same way as "repr" -- that is, print ASCII characters as is, map anything else to an octal escape. +0 on leaving it as it is, or mapping non-printables to "?". </F>
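The repr-style mapping proposed above could be sketched roughly as follows, taking the description literally: printable 7-bit ASCII passes through unchanged and anything else becomes a backslash-octal escape. The names repr_char and py_unicode_t are hypothetical, and zero-padding the octal value to at least three digits is an assumption of this sketch; a real patch might choose a different escape form for characters above 0xFF.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a 16-bit Py_UNICODE (assumption for this sketch). */
typedef unsigned short py_unicode_t;

/* Renders one code unit for the error message: printable 7-bit ASCII
   (0x20..0x7E) as is, anything else as a backslash followed by its octal
   value, zero-padded to at least three digits. Returns the snprintf result. */
int repr_char(char *buf, size_t size, py_unicode_t c)
{
    if (c >= 0x20 && c < 0x7F)
        return snprintf(buf, size, "%c", (int)c);
    return snprintf(buf, size, "\\%03o", (unsigned)c);
}
```

The resulting string would then be substituted into the message with %s instead of %c, so '%\u3000' would report '\30000' (0x3000) rather than a truncated NUL byte.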

Fredrik Lundh wrote:
Not very likely... the most common case of this error is probably the use of % as percent sign in a formatting string. The next character in those cases is usually whitespace.
Agreed.

-- 
Marc-Andre Lemburg

participants (7)
- A.M. Kuchling
- Andrew Kuchling
- Fredrik Lundh
- Greg Ewing
- Guido van Rossum
- M.-A. Lemburg
- Tim Peters