Re: [Python-Dev] r87518 - python/branches/py3k/Parser/tokenizer.c
On 27.12.2010 21:12, victor.stinner wrote:
Author: victor.stinner Date: Mon Dec 27 21:12:13 2010 New Revision: 87518
Log: Issue #10778: decoding_fgets() decodes the filename from the filesystem encoding instead of UTF-8.
Modified: python/branches/py3k/Parser/tokenizer.c
Modified: python/branches/py3k/Parser/tokenizer.c
==============================================================================
--- python/branches/py3k/Parser/tokenizer.c (original)
+++ python/branches/py3k/Parser/tokenizer.c Mon Dec 27 21:12:13 2010
@@ -545,6 +545,7 @@
 {
     char *line = NULL;
     int badchar = 0;
+    PyObject *filename;
     for (;;) {
         if (tok->decoding_state == STATE_NORMAL) {
             /* We already have a codec associated with
@@ -585,12 +586,16 @@
     if (badchar) {
         /* Need to add 1 to the line number, since this line
            has not been counted, yet. */
-        PyErr_Format(PyExc_SyntaxError,
-                "Non-UTF-8 code starting with '\\x%.2x' "
-                "in file %.200s on line %i, "
-                "but no encoding declared; "
-                "see http://python.org/dev/peps/pep-0263/ for details",
-                badchar, tok->filename, tok->lineno + 1);
+        filename = PyUnicode_DecodeFSDefault(tok->filename);
+        if (filename != NULL) {
+            PyErr_Format(PyExc_SyntaxError,
+                    "Non-UTF-8 code starting with '\\x%.2x' "
+                    "in file %.200U on line %i, "
+                    "but no encoding declared; "
+                    "see http://python.org/dev/peps/pep-0263/ for details",
+                    badchar, filename, tok->lineno + 1);
+            Py_DECREF(filename);
+        }
Hmm, and in case decoding fails, we return a Unicode error (without context) instead of a syntax error? Doesn't seem like a good trade-off when the file name is just displayed in a message.
Georg
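The concern raised here can be sketched in Python. This is only an illustration under stated assumptions: `report_bad_byte` and `report_bad_byte_with_fallback` are hypothetical helpers, not CPython functions, and a strict UTF-8 decode stands in for any filename decoder that is able to fail. The first helper decodes and then formats, so a decoding failure replaces the syntax error; the second shows one possible way to keep the syntax error by falling back to a lossy decoding for display.

```python
def _format_message(filename, badchar, lineno):
    # Same shape as the message in the commit, shortened for the sketch.
    return (
        "Non-UTF-8 code starting with '\\x%.2x' in file %.200s on line %i, "
        "but no encoding declared" % (badchar, filename, lineno)
    )


def report_bad_byte(filename_bytes, badchar, lineno, encoding="utf-8"):
    """Decode first, then build the SyntaxError (hypothetical helper).

    If decoding raises, the caller gets a UnicodeDecodeError and never
    sees the SyntaxError.
    """
    filename = filename_bytes.decode(encoding)  # strict: may raise
    return SyntaxError(_format_message(filename, badchar, lineno))


def report_bad_byte_with_fallback(filename_bytes, badchar, lineno, encoding="utf-8"):
    """Variant that always returns a SyntaxError (hypothetical helper)."""
    try:
        return report_bad_byte(filename_bytes, badchar, lineno, encoding)
    except UnicodeDecodeError:
        # Lossy decoding never fails; it is used for display only.
        shown = filename_bytes.decode(encoding, "replace")
        return SyntaxError(_format_message(shown, badchar, lineno))
```

With a Latin-1 file name such as `b"caf\xe9.py"`, the first helper raises `UnicodeDecodeError`, while the second still reports the syntax error with a replacement character in the file name.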
On Monday 27 December 2010 at 22:22 +0100, Georg Brandl wrote:
On 27.12.2010 21:12, victor.stinner wrote:
Author: victor.stinner Date: Mon Dec 27 21:12:13 2010 New Revision: 87518
Log: Issue #10778: decoding_fgets() decodes the filename from the filesystem encoding instead of UTF-8.
Hmm, and in case decoding fails, we return a Unicode error (without context) instead of a syntax error?
Yes, but it is very unlikely. I don't see any case in which the decoder can fail, although a memory error can still occur.
Doesn't seem like a good trade-off when the file name is just displayed in a message.
What do you suggest? --
(oops, I posted an incomplete message, stupid mailer)
On Monday 27 December 2010 at 22:22 +0100, Georg Brandl wrote:
On 27.12.2010 21:12, victor.stinner wrote:
Author: victor.stinner Date: Mon Dec 27 21:12:13 2010 New Revision: 87518
Log: Issue #10778: decoding_fgets() decodes the filename from the filesystem encoding instead of UTF-8.
Hmm, and in case decoding fails, we return a Unicode error (without context) instead of a syntax error?
Yes, but it is very unlikely. I don't see any case in which the decoder can fail, although a memory error can still occur.
Doesn't seem like a good trade-off when the file name is just displayed in a message.
What do you suggest?
--
Preparing the decoded filename in PyParser_ParseStringFlagsFilenameEx() and PyParser_ParseFileFlagsEx() would avoid this issue.
Victor
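The idea of preparing the decoded filename up front can be sketched in Python as follows. The names `TokenizerState` and `parse_entry_point` are hypothetical and only stand in for the C tokenizer state and the `PyParser_*` entry points; `os.fsdecode` is used here as the Python-level counterpart of decoding from the filesystem encoding. The point of the sketch is that the error path no longer decodes anything, so it cannot fail with a Unicode error.

```python
import os


class TokenizerState:
    """Hypothetical stand-in for the C tokenizer state; not a CPython type."""

    def __init__(self, filename):
        self.filename = filename  # already a str, decoded by the caller
        self.lineno = 0

    def bad_byte_error(self, badchar):
        # No decoding happens here, so building the message cannot raise
        # a UnicodeDecodeError in place of the SyntaxError.
        return SyntaxError(
            "Non-UTF-8 code starting with '\\x%.2x' in file %.200s on line %i, "
            "but no encoding declared"
            % (badchar, self.filename, self.lineno + 1)
        )


def parse_entry_point(filename_bytes):
    """Hypothetical entry point: decode the filename once, before tokenizing."""
    filename = os.fsdecode(filename_bytes)  # uses the filesystem encoding
    return TokenizerState(filename)
```

Any decoding problem would then surface once, at the entry point, before tokenizing starts.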
On 28.12.2010 01:07, Victor Stinner wrote:
(oops, I posted an incomplete message, stupid mailer)
On Monday 27 December 2010 at 22:22 +0100, Georg Brandl wrote:
On 27.12.2010 21:12, victor.stinner wrote:
Author: victor.stinner Date: Mon Dec 27 21:12:13 2010 New Revision: 87518
Log: Issue #10778: decoding_fgets() decodes the filename from the filesystem encoding instead of UTF-8.
Hmm, and in case decoding fails, we return a Unicode error (without context) instead of a syntax error?
Yes, but it is very unlikely. I don't see any case in which the decoder can fail, although a memory error can still occur.
Doesn't seem like a good trade-off when the file name is just displayed in a message.
What do you suggest?
If the probability is so low, it's probably not worth changing. I'm just somewhat sensitive to changes that enforce correctness by taking away useful information from the user.
Georg
On Tuesday 28 December 2010 at 10:12 +0100, Georg Brandl wrote:
Author: victor.stinner Date: Mon Dec 27 21:12:13 2010 New Revision: 87518
Log: Issue #10778: decoding_fgets() decodes the filename from the filesystem encoding instead of UTF-8.
Hmm, and in case decoding fails, we return a Unicode error (without context) instead of a syntax error?
I created an issue for this problem, and also to prepare for full Unicode support in the import machinery. I patched the import machinery of Python 3 to support undecodable bytes, but Python 3 doesn't support unencodable characters on Windows (see issue #3080).
http://bugs.python.org/issue10785
Victor
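The two cases mentioned here, undecodable bytes and unencodable characters, can be illustrated in Python. This is a sketch under assumptions: it uses the `surrogateescape` error handler from PEP 383 with an explicit UTF-8 codec so the result does not depend on the local filesystem encoding, and it uses `cp1252` merely as an example of a Windows ANSI code page. The helper names are hypothetical.

```python
def decode_filename(raw, encoding="utf-8"):
    """Decode bytes that may not be valid in `encoding` (hypothetical helper).

    Undecodable bytes are mapped to lone surrogates, so decoding never fails.
    """
    return raw.decode(encoding, "surrogateescape")


def encode_filename(name, encoding="utf-8"):
    """Inverse of decode_filename: lone surrogates turn back into raw bytes."""
    return name.encode(encoding, "surrogateescape")


def is_encodable(name, encoding):
    """Return True if every character of `name` exists in `encoding`."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


raw = b"caf\xe9.py"               # Latin-1 bytes, not valid UTF-8
name = decode_filename(raw)       # the byte 0xE9 becomes the surrogate U+DCE9
round_trip = encode_filename(name)
```

Here `round_trip` equals `raw`, so an undecodable name survives decoding and re-encoding unchanged. The opposite problem has no such escape hatch: `is_encodable("\u4e2d.py", "cp1252")` is false because the character has no representation in that code page.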
participants (2)
- Georg Brandl
- Victor Stinner