[Python-Dev] python3k : imp.find_module raises SyntaxError

Thu Nov 25 18:22:58 CET 2010

On 11/25/2010 08:30 AM, Emile Anclin wrote:
>
> hello,
>
> working on Pylint, we have a lot of voluntary corrupted files to test
> Pylint behavior; for instance
>
> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
> # -*- coding: IBO-8859-1 -*-
> """ check correct unknown encoding declaration
> """
>
> __revision__ = 'éééé'
>
>
> and we try to find that module :
> find_module('func_unknown_encoding', None). But python3 raises SyntaxError
> in that case ; it didn't raise SyntaxError on python2 nor does so on our
> func_nonascii_noencoding and func_wrong_encoding modules (with obvious
> names)
>
> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
> [GCC 4.3.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> from imp import find_module
>>>> find_module('func_unknown_encoding', None)
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> SyntaxError: encoding problem: with BOM
>>>> find_module('func_wrong_encoding', None)
> (<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py',
> ('.py', 'U', 1))
>>>> find_module('func_nonascii_noencoding', None)
> (<_io.TextIOWrapper name=6 encoding='utf-8'>,
> 'func_nonascii_noencoding.py', ('.py', 'U', 1))
>
>
> So what is the reason of this selective behavior?
> Furthermore, there is BOM in our func_unknown_encoding.py module.

I don't think there is a clear reason by design.  Also try importing the
same modules directly and note the differences in the errors you get.
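
A minimal sketch of the import side of that comparison, assuming a current
Python 3 with importlib.  The helper name import_error_for and the module
name hypothetical_bad_encoding are made up for illustration; the source
bytes mimic the header of the func_unknown_encoding.py file quoted above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Mimics the quoted test file: a coding cookie naming a codec that does not exist.
BAD_SOURCE = b"# -*- coding: IBO-8859-1 -*-\n__revision__ = 'x'\n"


def import_error_for(source, module_name="hypothetical_bad_encoding"):
    """Write `source` to a temporary module, import it, and return the
    exception that the import raised, or None if the import succeeded."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, module_name + ".py").write_bytes(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make sure the new file is noticed
        try:
            importlib.import_module(module_name)
        except Exception as exc:
            return exc
        finally:
            # Undo the side effects so repeated calls do not interfere.
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
    return None


if __name__ == "__main__":
    exc = import_error_for(BAD_SOURCE)
    # Print the exception type and message so they can be compared by eye
    # with whatever the other code path reports.
    print(type(exc).__name__, exc)
```

The exact wording of the message may differ between Python versions, so
the sketch only reports it rather than matching on it.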

For example, here is the problem that brought this to my attention in
Python 3.2:

 >>> find_module('test/badsyntax_pep3120')
Segmentation fault

 >>> from test import badsyntax_pep3120
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/local/lib/python3.2/test/badsyntax_pep3120.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xf6' in file 
/usr/local/lib/python3.2/test/badsyntax_pep3120.py on line 1, but no 
encoding declared; see http://python.org/dev/peps/pep-0263/ for details

The import statement uses parser.c, and tokenizer.c indirectly, to import a
file, but the imp module uses tokenizer.c directly.  They aren't consistent
in how they handle errors because the error messages are generated in
different places depending on what the error is, *and* what the code path
to get to that point was, *and* whether or not a filename was set.  For the
example above with imp.find_module(), the filename isn't set, so you get a
different error than if you used import, which goes through parser.c, and
that does set the filename.
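
The "filename set or not" effect can be seen from Python as well.  This is
only an analogy, not the C code path: it uses tokenize.detect_encoding()
from the pure-Python tokenize module, which (as far as I can tell from its
source) words the error differently depending on whether the stream it
reads from has a name.  The helper names and the sample bytes are
assumptions for illustration.

```python
import io
import os
import tempfile
import tokenize

# Same kind of header as the quoted test file: an unknown codec name.
BAD_SOURCE = b"# -*- coding: IBO-8859-1 -*-\n__revision__ = 'x'\n"


def encoding_error(readline):
    """Return the SyntaxError raised by tokenize.detect_encoding, or None."""
    try:
        tokenize.detect_encoding(readline)
    except SyntaxError as exc:
        return exc
    return None


def compare_named_and_unnamed(source=BAD_SOURCE):
    """Run the same bytes through a nameless and a named stream."""
    # In-memory stream: it has no .name attribute, so no filename is known.
    unnamed = encoding_error(io.BytesIO(source).readline)

    # Real file opened in binary mode: the stream object carries a .name.
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(source)
        with open(path, "rb") as handle:
            named = encoding_error(handle.readline)
    finally:
        os.remove(path)
    return path, unnamed, named


if __name__ == "__main__":
    path, unnamed, named = compare_named_and_unnamed()
    print("without filename:", unnamed)
    print("with filename:   ", named)
```

Both calls fail on the same bytes; only the message changes with the
available filename, which is the same kind of inconsistency described above.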

From what I've seen, it would help if the imp module were rewritten to use
parser.c like the import statement does, rather than tokenizer.c directly.
The error handling in parser.c is much better than in tokenizer.c.  Possibly
tokenizer.c could be cleaned up after that and be made much simpler.

Ron Adam