[Python-Dev] r87518 - python/branches/py3k/Parser/tokenizer.c

Mon Dec 27 22:22:05 CET 2010

On 27.12.2010 21:12, victor.stinner wrote:
> Author: victor.stinner
> Date: Mon Dec 27 21:12:13 2010
> New Revision: 87518
> 
> Log:
> Issue #10778: decoding_fgets() decodes the filename from the filesystem
> encoding instead of UTF-8.
> 
> 
> Modified:
>    python/branches/py3k/Parser/tokenizer.c
> 
> Modified: python/branches/py3k/Parser/tokenizer.c
> ==============================================================================
> --- python/branches/py3k/Parser/tokenizer.c	(original)
> +++ python/branches/py3k/Parser/tokenizer.c	Mon Dec 27 21:12:13 2010
> @@ -545,6 +545,7 @@
>  {
>      char *line = NULL;
>      int badchar = 0;
> +    PyObject *filename;
>      for (;;) {
>          if (tok->decoding_state == STATE_NORMAL) {
>              /* We already have a codec associated with
> @@ -585,12 +586,16 @@
>      if (badchar) {
>          /* Need to add 1 to the line number, since this line
>             has not been counted, yet.  */
> -        PyErr_Format(PyExc_SyntaxError,
> -            "Non-UTF-8 code starting with '\\x%.2x' "
> -            "in file %.200s on line %i, "
> -            "but no encoding declared; "
> -            "see http://python.org/dev/peps/pep-0263/ for details",
> -            badchar, tok->filename, tok->lineno + 1);
> +        filename = PyUnicode_DecodeFSDefault(tok->filename);
> +        if (filename != NULL) {
> +            PyErr_Format(PyExc_SyntaxError,
> +                    "Non-UTF-8 code starting with '\\x%.2x' "
> +                    "in file %.200U on line %i, "
> +                    "but no encoding declared; "
> +                    "see http://python.org/dev/peps/pep-0263/ for details",
> +                    badchar, filename, tok->lineno + 1);
> +            Py_DECREF(filename);
> +        }

Hmm, and if decoding fails, we now raise a Unicode error (without context)
instead of a syntax error?  That doesn't seem like a good trade-off when the file
name is only displayed in a message.
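One way to keep the SyntaxError on this path is to treat a failed decode as a display problem only and fall back to the raw bytes, which is what the pre-commit code printed with `%.200s`. The sketch below is plain C, not the CPython API, so it builds the message string instead of setting an exception. `try_decode_filename` and `format_badchar_message` are hypothetical names. The decoder is a stand-in for `PyUnicode_DecodeFSDefault` that treats any non-ASCII byte as a decoding failure purely to exercise the fallback; real filesystem decoding behaves differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for filesystem-encoding decoding: copies the
   name into out, but "fails" on any non-ASCII byte (or if out is too
   small) so that the fallback path below can be exercised. */
int try_decode_filename(const char *raw, char *out, size_t outsize)
{
    size_t i;
    for (i = 0; raw[i] != '\0'; i++) {
        if ((unsigned char)raw[i] > 0x7f || i + 1 >= outsize)
            return -1;              /* decoding failed */
        out[i] = raw[i];
    }
    out[i] = '\0';
    return 0;
}

/* Build the "Non-UTF-8 code" message. A decoding failure only changes
   how the file name is shown (raw bytes, as before r87518); the caller
   still always has a message to report as a syntax error. */
void format_badchar_message(char *buf, size_t size, const char *raw_filename,
                            int badchar, int lineno)
{
    char decoded[256];
    const char *shown;

    assert(buf != NULL && size > 0);
    if (try_decode_filename(raw_filename, decoded, sizeof decoded) == 0)
        shown = decoded;
    else
        shown = raw_filename;       /* fall back to the undecoded bytes */

    /* lineno + 1: the offending line has not been counted yet */
    snprintf(buf, size,
             "Non-UTF-8 code starting with '\\x%.2x' "
             "in file %.200s on line %i, "
             "but no encoding declared; "
             "see http://python.org/dev/peps/pep-0263/ for details",
             badchar, shown, lineno + 1);
}
```

The point of the fallback is that the user-visible error class stays the same whether or not the name decodes; only the rendering of the name degrades.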

Georg