python3k : imp.find_module raises SyntaxError

Hello,

Working on Pylint, we have a lot of deliberately corrupted files to test Pylint's behavior; for instance:

    $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
    # -*- coding: IBO-8859-1 -*-
    """ check correct unknown encoding declaration """
    __revision__ = 'éééé'

We try to find that module with find_module('func_unknown_encoding', None), but Python 3 raises SyntaxError in that case. It did not raise SyntaxError on Python 2, nor does it do so on our func_nonascii_noencoding and func_wrong_encoding modules (with self-explanatory names):

    Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
    [GCC 4.3.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from imp import find_module
    >>> find_module('func_unknown_encoding', None)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    SyntaxError: encoding problem: with BOM
    >>> find_module('func_wrong_encoding', None)
    (<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py', ('.py', 'U', 1))
    >>> find_module('func_nonascii_noencoding', None)
    (<_io.TextIOWrapper name=6 encoding='utf-8'>, 'func_nonascii_noencoding.py', ('.py', 'U', 1))

So what is the reason for this selective behavior? Furthermore, there is BOM in our func_unknown_encoding.py module.

-- 
Emile Anclin <emile.anclin@logilab.fr>
http://www.logilab.fr/ http://www.logilab.org/
Scientific computing and knowledge management

On 11/25/2010 08:30 AM, Emile Anclin wrote:
> So what is the reason of this selective behavior? Furthermore, there is BOM in our func_unknown_encoding.py module.

I don't think there is a clear reason by design. Also try importing the same modules directly and note the differences in the errors you get. For example, here is the problem that brought this to my attention in Python 3.2:

    >>> find_module('test/badsyntax_pep3120')
    Segmentation fault

    >>> from test import badsyntax_pep3120
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.2/test/badsyntax_pep3120.py", line 1
    SyntaxError: Non-UTF-8 code starting with '\xf6' in file /usr/local/lib/python3.2/test/badsyntax_pep3120.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

The import statement uses parser.c, and tokenizer.c indirectly, to import a file, but the imp module uses tokenizer.c directly. They aren't consistent in how they handle errors, because the error messages are generated in different places depending on what the error is, *and* what the code path to get to that point was, *and* whether or not a filename was set.

For the example above with imp.find_module(), the filename isn't set, so you get a different error than if you used import, which uses the parser module, and that does set the filename.

From what I've seen, it would help if the imp module were rewritten to use parser.c like the import statement does, rather than tokenizer.c directly. The error handling in parser.c is much better than in tokenizer.c. Possibly tokenizer.c could be cleaned up after that and be made much simpler.

Ron Adam

On 25 November 11:22, Ron Adam wrote:
> I don't think there is a clear reason by design. Also try importing the same modules directly and noting the differences in the errors you get.

IMO the point is that we can consider it a bug that find_module tries to read the content of the file at all, no? Though it seems to be doing this only for encoding detection or the like, since find_module doesn't choke on a module containing another kind of syntax error.

So the question is: should we deal with this in pylint/astng, or can we expect this to be fixed at some point?

-- 
Sylvain Thénault
LOGILAB, Paris (France)
Python, Debian, Agile methods training: http://www.logilab.fr/formations
Custom software development: http://www.logilab.fr/services
CubicWeb, the semantic web framework: http://www.cubicweb.org
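The observation that only encoding problems trigger the error can be illustrated with the pure-Python tokenize module. This is a minimal sketch and an assumption-laden stand-in: tokenize.detect_encoding is not the C code path that imp.find_module used in 3.2, and the helper name cookie_status and the sample sources are made up for illustration. It only shows that encoding detection looks at the coding cookie and nothing else.

```python
import io
import tokenize


def cookie_status(source_bytes):
    """Return the detected encoding, or the SyntaxError text if detection fails.

    Hypothetical helper, not part of any library.
    """
    try:
        # detect_encoding only inspects the first two lines (BOM and cookie).
        encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
        return encoding
    except SyntaxError as exc:
        return "SyntaxError: " + str(exc)


# A source file declaring a codec that does not exist.
unknown_cookie = b"# -*- coding: IBO-8859-1 -*-\n__revision__ = 'x'\n"
# A source file with an ordinary syntax error and no coding cookie.
broken_syntax = b"def broken(:\n    pass\n"

unknown_result = cookie_status(unknown_cookie)
broken_result = cookie_status(broken_syntax)
print(unknown_result)
print(broken_result)
```

The unknown cookie is rejected during detection, while the file with an ordinary syntax error passes, because nothing beyond the cookie is examined at this stage.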

On Mon, Nov 29, 2010 at 03:53, Sylvain Thénault <sylvain.thenault@logilab.fr> wrote:
> IMO the point is that we can consider as a bug the fact that find_module tries to somewhat read the content of the file, no? Though it seems to be doing this only for encoding detection or the like, since find_module doesn't choke on a module containing another kind of syntax error.
>
> So the question is, should we deal with this in pylint/astng, or can we expect this to be fixed at some point?

Considering these semantics changed between Python 2 and 3 w/o a discernable benefit (I would consider it a negative as finding a module should not be impacted by syntactic correctness; the full act of importing should be the only thing that cares about that), I would consider it a bug that should be filed.

On 11/29/2010 01:22 PM, Brett Cannon wrote:
> Considering these semantics changed between Python 2 and 3 w/o a discernable benefit (I would consider it a negative as finding a module should not be impacted by syntactic correctness; the full act of importing should be the only thing that cares about that), I would consider it a bug that should be filed.

imp.find_module() returns an open file io object, and its output feeds directly into imp.load_module():

    >>> imp.find_module('pydoc')
    (<_io.TextIOWrapper name=4 encoding='utf-8'>, '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))

So I think imp.find_module() is supposed to be used when you *do* want to do the full act of importing, and not just for finding out if or where module xyz exists.

Ron

On 29 November 14:21, Ron Adam wrote:
> So I think the imp.find_module() is supposed to be used when you *do* want to do the full act of importing and not for just finding out if or where module xyz exists.

In Python 2, find_module was usable for that purpose, and this is a needed API for a tool like Pylint. Is there another way to do so with Python 3?

-- 
Sylvain Thénault

On Tue, Nov 30, 2010 at 00:34, Sylvain Thénault <sylvain.thenault@logilab.fr> wrote:
> in python 2, find_module was usable for such usage, and this is a needed api for a tool like pylint. Is there another way to do so with python 3?

At the moment, no. Best option would be to create an importlib.find_module function which returns a loader if the module is found, else returns None. The loader can have its get_source method called to read the source code (w/o verification). I have this planned for Python 3.3 but not 3.2 with us so close to 3.2b1.
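For readers on later Python versions: releases from 3.4 onward provide importlib.util.find_spec, which returns a spec carrying a loader, along the lines described above. The following is a minimal sketch under stated assumptions, not the API that was planned in this message: the module name func_unknown_encoding_demo and the temporary directory are invented for the example, and the raw bytes are read with the loader's get_data method so that no decoding or verification takes place.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Hypothetical test module with an unknown encoding declaration.
tmpdir = tempfile.mkdtemp()
module_name = "func_unknown_encoding_demo"
source = b"# -*- coding: IBO-8859-1 -*-\n__revision__ = 'x'\n"
with open(os.path.join(tmpdir, module_name + ".py"), "wb") as f:
    f.write(source)

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
try:
    # Locating the module only checks that the file exists; the source is not parsed.
    spec = importlib.util.find_spec(module_name)
finally:
    sys.path.remove(tmpdir)

# The loader can hand back the undecoded bytes, leaving validation to the caller.
raw = spec.loader.get_data(spec.origin)
print(spec.origin)
print(raw == source)
```

A tool such as Pylint can then decide for itself how to report the bad encoding declaration, since finding the file and interpreting its contents are separate steps here.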
Python-Dev mailing list, Python-Dev@python.org, http://mail.python.org/mailman/listinfo/python-dev

On Mon, Nov 29, 2010 at 12:21, Ron Adam <rrr@ronadam.com> wrote:
> The output of imp.find_module() returns an open file io object, and its output feeds directly into imp.load_module().
>
> So I think the imp.find_module() is supposed to be used when you *do* want to do the full act of importing and not for just finding out if or where module xyz exists.

Going with your line of argument, why can't imp.load_module be the call that figures out there is a syntax error? If you look at this from the perspective of PEP 302, finding a module has absolutely nothing to do with the validity of the found source, just that something was found somewhere which (hopefully) contains code that represents the module.
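The separation argued for here, where finding succeeds and only loading reports the syntax error, can be sketched with the importlib API of later Python versions (3.5+ for module_from_spec). This is an illustration under assumptions, not the behaviour of imp in 3.2: the module name broken_syntax_demo and its contents are invented for the example.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Hypothetical module containing an ordinary syntax error.
tmpdir = tempfile.mkdtemp()
module_name = "broken_syntax_demo"
with open(os.path.join(tmpdir, module_name + ".py"), "w") as f:
    f.write("def broken(:\n    pass\n")

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
try:
    # Finding: succeeds, because the source is never compiled at this point.
    spec = importlib.util.find_spec(module_name)
    module = importlib.util.module_from_spec(spec)
    try:
        # Loading: compiles and executes the source, so the error surfaces here.
        spec.loader.exec_module(module)
        load_error = None
    except SyntaxError as exc:
        load_error = exc
finally:
    sys.path.remove(tmpdir)

print(spec.origin)
print(type(load_error).__name__)
```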

On 11/30/2010 01:41 PM, Brett Cannon wrote:
> Going with your line of argument, why can't imp.load_module be the call that figures out there is a syntax error? If you look at this from the perspective of PEP 302, finding a module has absolutely nothing to do with the validity of the found source, just that something was found somewhere which (hopefully) contains code that represents the module.

The part that I'm looking at is: what would find_module return if the encoding is bad or not found?

    <_io.TextIOWrapper name=4 encoding='bad_encoding'>

Maybe we could have some library introspection function in the inspect module for just looking in the library rather than loading modules. But I think those would have the same issues, as packages need to be loaded in order to find sub-modules.*

* It almost seems like the concept of a sub-module (in a package) is flawed. I'm not sure I can explain what causes me to feel that way at the moment though.

Ron

On Wed, Dec 1, 2010 at 8:48 AM, Ron Adam <rrr@ronadam.com> wrote:
> * It almost seems like the concept of a sub-module (in a package) is flawed. I'm not sure I can explain what causes me to feel that way at the moment though.

It isn't flawed, it is just a *lot* more complicated than most people realise (cf. PEP 302).

In this case, the signature of find_module (returning an already open file) is unfortunate, but probably necessary given the way the import internals currently work. As Brett says, returning a loader would be preferable, but the builtin import machinery doesn't have proper loaders defined (and won't until we manage to get to the point where importlib *is* the import machinery).

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 11/30/2010 07:19 PM, Nick Coghlan wrote:
> It isn't flawed, it is just a *lot* more complicated than most people realise (cf. PEP 302).

Yes, it's realising that it is a *lot* more *complicated* that gets me. Flawed isn't the right word; it's rather a feeling that things could have been simpler if perhaps some things were done differently.

Here is the gist of the ideas I got from these feelings. (Food for thought, YMMV and all that.)

Python doesn't have a nice way to define a collection of modules that isn't also a package. So we have packages used to organise modules, and packages inside other packages. A collection of modules wouldn't require importing a package before importing a module in it.

Another idea is to have a way to split a large module into files, and have it still *be* a module, and not a package. And also to be able to tell what is what by looking at the directory structure.

The train of thought these things came from is: how can we get back to having the directory tree hold enough info that it's clear what is what? And how can we avoid some of the *interdependent* nesting?

> In this case, the signature of find_module (returning an already open file) is unfortunate, but probably necessary given the way the import internals currently work. As Brett says, returning a loader would be preferable, but the builtin import machinery doesn't have proper loaders defined (and won't until we manage to get to the point where importlib *is* the import machinery).

I'll be looking forward to the new loaders. :-)

Cheers,
Ron

On Wed, Dec 1, 2010 at 3:59 PM, Ron Adam <rrr@ronadam.com> wrote:
> Yes, it's realising that it is a *lot* more *complicated*, that gets me. Flawed isn't the right word, it's rather a feeling things could have been simpler if perhaps some things were done differently.

*That* feeling I can understand. The import system has steadily acquired features over time, with each addition constrained by backwards compatibility concerns with all the past additions, including the exotic hacks people were using to fake features that were added more cleanly later.

For the directory-as-module-not-package idea, you could probably implement a PEP 302 importer/loader that did that (independent of the stdlib). It would have the advantage of avoiding a lot of the pickle compatibility problems that a "flat package" like the new unittest layout can cause. However, you would need to be very careful with it, since all the files would be sharing a common globals() namespace.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
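A rough sketch of what such a directory-as-module importer might look like follows. Everything in it is an assumption made for illustration: the class names DirModuleFinder and DirModuleLoader, the ".mod" directory suffix, and the demo module dirmod_demo are all hypothetical, and it uses the find_spec/exec_module protocol of later Python versions (3.4+) rather than the original PEP 302 method names.

```python
import importlib
import importlib.abc
import importlib.util
import os
import sys
import tempfile


class DirModuleLoader(importlib.abc.Loader):
    """Hypothetical loader: runs every .py file of a directory in ONE module."""

    def __init__(self, directory):
        self.directory = directory

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        for filename in sorted(os.listdir(self.directory)):
            if filename.endswith(".py"):
                path = os.path.join(self.directory, filename)
                with open(path, "rb") as source_file:
                    code = compile(source_file.read(), path, "exec")
                # All files share module.__dict__, so names can silently collide.
                exec(code, module.__dict__)


class DirModuleFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: treats '<root>/<name>.mod/' as a single module."""

    def __init__(self, root):
        self.root = root

    def find_spec(self, fullname, path=None, target=None):
        directory = os.path.join(self.root, fullname + ".mod")
        if os.path.isdir(directory):
            return importlib.util.spec_from_loader(fullname, DirModuleLoader(directory))
        return None


# Demo: one module split across two files that rely on the shared namespace.
root = tempfile.mkdtemp()
module_dir = os.path.join(root, "dirmod_demo.mod")
os.mkdir(module_dir)
with open(os.path.join(module_dir, "a_funcs.py"), "w") as f:
    f.write("def area():\n    return SIDE * SIDE\n")
with open(os.path.join(module_dir, "b_consts.py"), "w") as f:
    f.write("SIDE = 3\n")

finder = DirModuleFinder(root)
sys.meta_path.append(finder)
try:
    dirmod_demo = importlib.import_module("dirmod_demo")
finally:
    sys.meta_path.remove(finder)

result = dirmod_demo.area()
print(result)
```

The function in one file finds a constant defined in the other only because both were executed in the same globals, which is also why a second file redefining SIDE or area would overwrite the first without warning.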

Nick Coghlan wrote:
> For the directory-as-module-not-package idea ... you would need to be very careful with it, since all the files would be sharing a common globals() namespace.

One of the things I like about Python's module system is that once I know which module a name was imported from, I also know which file to look in for its definition. If a module can be spread over several files, that feature would be lost.

-- 
Greg

On Wed, Dec 1, 2010 at 8:22 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
> One of the things I like about Python's module system is that once I know which module a name was imported from, I also know which file to look in for its definition. If a module can be spread over several files, that feature would be lost.

There are many potential problems with the idea; I just chose to mention one of the ones that could easily make the affected code *break* :)

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 12/01/2010 04:39 AM, Nick Coghlan wrote:
> There are many potential problems with the idea, I just chose to mention one of the ones that could easily make the affected code *break* :)

Right. It would require additional pieces as well. Ron :-)

On Monday 29 November 2010 20:22:22 Brett Cannon wrote:
> Considering these semantics changed between Python 2 and 3 w/o a discernable benefit (I would consider it a negative as finding a module should not be impacted by syntactic correctness; the full act of importing should be the only thing that cares about that), I would consider it a bug that should be filed.

OK, here it is: http://bugs.python.org/issue10588

Since I did not understand all of it, I just quoted Brett Cannon in the ticket.

-- 
Emile Anclin <emile.anclin@logilab.fr>

Participants (6):
- Brett Cannon
- Emile Anclin
- Greg Ewing
- Nick Coghlan
- Ron Adam
- Sylvain Thénault