On 27.12.2010 21:12, victor.stinner wrote:
Author: victor.stinner
Date: Mon Dec 27 21:12:13 2010
New Revision: 87518
Log: Issue #10778: decoding_fgets() decodes the filename from the filesystem encoding instead of UTF-8.
Modified: python/branches/py3k/Parser/tokenizer.c
Modified: python/branches/py3k/Parser/tokenizer.c
==============================================================================
--- python/branches/py3k/Parser/tokenizer.c (original)
+++ python/branches/py3k/Parser/tokenizer.c Mon Dec 27 21:12:13 2010
@@ -545,6 +545,7 @@
 {
     char *line = NULL;
     int badchar = 0;
+    PyObject *filename;
     for (;;) {
         if (tok->decoding_state == STATE_NORMAL) {
             /* We already have a codec associated with
@@ -585,12 +586,16 @@
     if (badchar) {
         /* Need to add 1 to the line number, since this line has not been counted, yet. */
-        PyErr_Format(PyExc_SyntaxError,
-            "Non-UTF-8 code starting with '\\x%.2x' "
-            "in file %.200s on line %i, "
-            "but no encoding declared; "
-            "see http://python.org/dev/peps/pep-0263/ for details",
-            badchar, tok->filename, tok->lineno + 1);
+        filename = PyUnicode_DecodeFSDefault(tok->filename);
+        if (filename != NULL) {
+            PyErr_Format(PyExc_SyntaxError,
+                    "Non-UTF-8 code starting with '\\x%.2x' "
+                    "in file %.200U on line %i, "
+                    "but no encoding declared; "
+                    "see http://python.org/dev/peps/pep-0263/ for details",
+                    badchar, filename, tok->lineno + 1);
+            Py_DECREF(filename);
+        }

Hmm, and if decoding fails, we return a Unicode error (without context) instead of a syntax error? That doesn't seem like a good trade-off when the file name is only displayed in a message.

Georg