[Python-Dev] python3k : imp.find_module raises SyntaxError
Ron Adam
rrr at ronadam.com
Thu Nov 25 18:22:58 CET 2010
On 11/25/2010 08:30 AM, Emile Anclin wrote:
>
> hello,
>
> working on Pylint, we have a lot of voluntary corrupted files to test
> Pylint behavior; for instance
>
> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
> # -*- coding: IBO-8859-1 -*-
> """ check correct unknown encoding declaration
> """
>
> __revision__ = 'éééé'
>
>
> and we try to find that module :
> find_module('func_unknown_encoding', None). But python3 raises SyntaxError
> in that case ; it didn't raise SyntaxError on python2 nor does so on our
> func_nonascii_noencoding and func_wrong_encoding modules (with obvious
> names)
>
> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
> [GCC 4.3.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from imp import find_module
> >>> find_module('func_unknown_encoding', None)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> SyntaxError: encoding problem: with BOM
> >>> find_module('func_wrong_encoding', None)
> (<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py',
> ('.py', 'U', 1))
> >>> find_module('func_nonascii_noencoding', None)
> (<_io.TextIOWrapper name=6 encoding='utf-8'>,
> 'func_nonascii_noencoding.py', ('.py', 'U', 1))
>
>
> So what is the reason of this selective behavior?
> Furthermore, there is BOM in our func_unknown_encoding.py module.
I don't think there is a clear reason by design. Also try importing the
same modules directly and note the differences in the errors you get.
For example, here is the problem that brought this to my attention in
Python 3.2:
>>> find_module('test/badsyntax_pep3120')
Segmentation fault
>>> from test import badsyntax_pep3120
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.2/test/badsyntax_pep3120.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xf6' in file
/usr/local/lib/python3.2/test/badsyntax_pep3120.py on line 1, but no
encoding declared; see http://python.org/dev/peps/pep-0263/ for details
The import statement uses parser.c, and tokenizer.c indirectly, to import a
file, but the imp module uses tokenizer.c directly. They aren't consistent
in how they handle errors because the error messages are generated in
different places depending on what the error is, *and* what the code path
to get to that point was, *and* whether or not a filename was set. For the
example above with imp.find_module(), the filename isn't set, so you get a
different error than if you used import, which uses the parser module, and
that does set the filename.
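As a rough illustration of how the same bad encoding declaration can surface through two different code paths, here is a minimal sketch. It is only an analogy: it uses the pure-Python tokenize.detect_encoding() and the built-in compile(), not the C-level tokenizer.c path that imp takes, and the helper names and sample byte strings are made up for this example. The exact message text may differ between Python versions, so none is shown here.

```python
import io
import tokenize


def encoding_error_via_tokenize(source: bytes):
    """Return the SyntaxError message raised by encoding detection alone,
    or None if the encoding declaration is accepted."""
    try:
        tokenize.detect_encoding(io.BytesIO(source).readline)
    except SyntaxError as exc:
        return str(exc)
    return None


def encoding_error_via_compile(source: bytes, filename: str = "<demo>"):
    """Return the SyntaxError message raised by the full compile path,
    or None if the source compiles. The filename is a placeholder."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return str(exc)
    return None


# Hypothetical sample inputs modelled on the Pylint test file quoted above.
BAD_COOKIE = b"# -*- coding: IBO-8859-1 -*-\n__revision__ = 'x'\n"
GOOD_COOKIE = b"# -*- coding: utf-8 -*-\n__revision__ = 'x'\n"

if __name__ == "__main__":
    for label, data in (("bad cookie", BAD_COOKIE), ("good cookie", GOOD_COOKIE)):
        print(label, "| tokenize:", encoding_error_via_tokenize(data))
        print(label, "| compile: ", encoding_error_via_compile(data))
```

Both helpers report a SyntaxError for the unknown encoding, but the wording each one produces is decided in a different place, which is the kind of inconsistency described above.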
From what I've seen, it would help if the imp module were rewritten to use
parser.c like the import statement does, rather than tokenizer.c directly.
The error handling in parser.c is much better than in tokenizer.c. Possibly
tokenizer.c could be cleaned up after that and be made much simpler.
Ron Adam