[New-bugs-announce] [issue10509] PyTokenizer_FindEncoding can lead to a segfault if bad characters are found
report at bugs.python.org
Mon Nov 22 19:37:12 CET 2010
New submission from Andreas Stührk <andy-python at hammerhartes.de>:
If a non-ASCII character is found and there is no encoding cookie, a SyntaxError is raised (in `decoding_fgets`) that includes the path of the file (via ``tok->filename``), but that path is never set, so formatting the message dereferences a null pointer. You can easily reproduce the crash by calling `imp.find_module("badsyntax")`, where "badsyntax" is a Python file containing a non-ASCII character (see e.g. the attached unit test), as `find_module` uses `PyTokenizer_FindEncoding`. Note that Python 3.1 uses `snprintf()` to format the error message, and some implementations of `snprintf()` explicitly check for null pointers, so it might not crash there.
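A reproduction along those lines can be sketched as follows. The file name "badsyntax.py" and the byte used (``0xff``, which is not valid UTF-8, so the tokenizer cannot decode the line with the default encoding) are my choices, not taken from the attached test. The child interpreter is expected to die on affected builds, so it is run in a subprocess rather than crashing the current one; on builds without the bug the same call ends with an ordinary SyntaxError (or, on versions where `imp` has been removed, an ImportError), so the child exits non-zero either way:

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A source file with a non-UTF-8 byte and no encoding cookie.
    with open(os.path.join(tmp, "badsyntax.py"), "wb") as f:
        f.write(b"# \xff\n")

    # Hypothetical driver: trigger PyTokenizer_FindEncoding via
    # imp.find_module, in a child process so a segfault is contained.
    driver = "import imp; imp.find_module('badsyntax', [%r])" % tmp
    proc = subprocess.run([sys.executable, "-c", driver],
                          capture_output=True)

    # Non-zero exit either way: a crash on affected 3.1/3.2 builds,
    # an ordinary exception elsewhere.
    print(proc.returncode != 0)
```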
One possible fix is to set ``tok->filename`` to something like "<unknown>". Attached is a patch that does that and adds a unit test for imp.
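The actual patch is to the C tokenizer, but the idea of the fallback can be illustrated in a few lines of Python (this is only an analogy, not the patched code; the function name and message text are invented for illustration):

```python
def format_decoding_error(filename, lineno):
    """Build the SyntaxError message, never formatting a missing path.

    Mirrors the proposed fix: when no file name was ever set (the
    situation PyTokenizer_FindEncoding leaves tok->filename in),
    substitute "<unknown>" instead of formatting a null pointer.
    """
    if filename is None:
        filename = "<unknown>"  # the fallback the patch introduces
    return ("Non-ASCII character in file '%s' on line %d, "
            "but no encoding declared" % (filename, lineno))

print(format_decoding_error(None, 3))
```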
components: Interpreter Core
title: PyTokenizer_FindEncoding can lead to a segfault if bad characters are found
versions: Python 3.1, Python 3.2
Python tracker <report at bugs.python.org>