[issue6697] Python 3.1 segfaults when invalid UTF-8 characters are passed from command line

Martin v. Löwis report at bugs.python.org
Wed Aug 19 22:20:06 CEST 2009


Martin v. Löwis <martin at v.loewis.de> added the comment:

> It would be unfortunate to replace all usages of _PyUnicode_AsString to
> check the return value.

I agree with MAL: we do need to check for errors returned from
_PyUnicode_AsString, and it would be best if we created a fail-safe
version of it.
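
A rough Python-level analogue of such a fail-safe conversion is sketched below. The real `_PyUnicode_AsString` is a C function that signals failure by returning NULL, so each caller has to check. This sketch instead tries strict UTF-8 first and falls back to backslash escapes rather than failing. The name `as_utf8_failsafe` and the choice of the `backslashreplace` handler are illustrative assumptions, not CPython's API.

```python
def as_utf8_failsafe(text: str) -> bytes:
    """Hypothetical fail-safe UTF-8 conversion that never raises for str input.

    Strict UTF-8 is tried first; characters that strict UTF-8 rejects
    (lone surrogates) are replaced with backslash escapes instead of
    propagating an error to the caller.
    """
    try:
        return text.encode("utf-8")  # strict: raises on lone surrogates
    except UnicodeEncodeError:
        return text.encode("utf-8", "backslashreplace")


print(as_utf8_failsafe("spam"))
print(as_utf8_failsafe("sp\udcffam"))
```

Because the fallback cannot fail, callers would not need an error check. The trade-off is that the escaped bytes no longer round-trip to the original string.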

In the specific case (getattr), it might also be useful to create a
result that is unicode-escaped, i.e. with \u escapes for all non-ASCII
non-printable characters.
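
One possible reading of this is sketched below in Python. It escapes every character that is not printable ASCII with `\x`, `\u` or `\U` escapes, so that an attribute name containing a lone surrogate can still be placed in an error message. The helper name is hypothetical, and the exact set of characters to escape is an assumption.

```python
def escape_non_printable_ascii(text: str) -> str:
    """Hypothetical helper: keep printable ASCII, escape everything else."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isprintable():
            out.append(ch)
        else:
            cp = ord(ch)
            if cp <= 0xFF:
                out.append(f"\\x{cp:02x}")      # e.g. newline, Latin-1 letters
            elif cp <= 0xFFFF:
                out.append(f"\\u{cp:04x}")      # BMP, including lone surrogates
            else:
                out.append(f"\\U{cp:08x}")      # astral code points
    return "".join(out)


name = "sp\udcffam"
print(f"object has no attribute '{escape_non_printable_ascii(name)}'")
```

Backslashes themselves are left unescaped, so the result is meant for human-readable messages rather than round-tripping. The builtin `ascii()` is a ready-made alternative that escapes all non-ASCII characters but also adds surrounding quotes.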

For _PyUnicode_AsString, I'm uncertain whether supporting half
surrogates is a good idea. Unless there is a compelling reason to
support them, I think we should leave that as-is. Your example is not
compelling: I think the unicode string should be escaped, anyway.
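
The distinction can be seen from Python. The snippet below reflects the behaviour of current Python 3 releases, not the C function's own code path. Strict UTF-8 refuses to encode a lone ("half") surrogate such as the one `surrogateescape` produces for an undecodable byte. The `surrogatepass` handler, by contrast, encodes it to three bytes that a strict UTF-8 decoder would in turn reject.

```python
# surrogateescape maps the undecodable byte 0xFF to the lone surrogate U+DCFF.
lone = b"\xff".decode("utf-8", "surrogateescape")

# Strict UTF-8 (the default error handler) refuses to encode it.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# 'surrogatepass' emits the three-byte form anyway; it is not valid UTF-8.
passed = lone.encode("utf-8", "surrogatepass")

# 'surrogateescape' instead round-trips back to the original byte.
escaped = lone.encode("utf-8", "surrogateescape")

print(strict_ok, passed, escaped)  # False b'\xed\xb3\xbf' b'\xff'
```

Supporting half surrogates in `_PyUnicode_AsString` would presumably mean something like the `surrogatepass` behaviour. The resulting bytes are not valid UTF-8, which is one reason to be cautious about it.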

The OP's case is also not compelling; we should print an error
message saying that the source code is incorrectly encoded.
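
A minimal sketch of that behaviour assumes the source text arrives as a `str` in which undecodable bytes have become lone surrogates (as `surrogateescape` produces). Such text is rejected up front with a clear message instead of letting a later conversion fail. The function name `require_valid_source` is hypothetical. Raising SyntaxError mirrors what Python reports for a source file that is not valid UTF-8.

```python
def require_valid_source(source: str, filename: str = "<string>") -> str:
    """Hypothetical check: return source if it is strict UTF-8, else raise."""
    try:
        source.encode("utf-8")  # strict: lone surrogates raise UnicodeEncodeError
    except UnicodeEncodeError as exc:
        raise SyntaxError(
            f"{filename}: source code is incorrectly encoded "
            f"({exc.reason} at position {exc.start})"
        ) from None
    return source


bad = b"print('\xff')".decode("utf-8", "surrogateescape")
try:
    require_valid_source(bad, "<cmdline>")
except SyntaxError as err:
    print(err)
```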

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6697>
_______________________________________

