Prevent importing yourself?

Hi, A common question we get in the #python IRC channel is, "I tried importing a module, but I get an AttributeError trying to use the things it said it provided." Turns out the beginner named their own file the same as the module they were trying to use. That is, they want to try (for example) the "azure" package. So they make a file called azure.py, and start with "import azure". The import succeeds, but it has none of the contents the documentation claims, because they have imported themselves. It's baffling, because they have used the exact statements shown in the examples, but it doesn't work. Could we make this a more obvious failure? Is there ever a valid reason for a file to import itself? Is this situation detectable in the import machinery? --Ned.

Ned Batchelder wrote:
Could we make this a more obvious failure? Is there ever a valid reason for a file to import itself?
I've done it occasionally, but only when doing something very unusual, and I probably wouldn't mind having to pull it out of sys.modules in cases like that. -- Greg

On Fri, Jan 29, 2016 at 05:42:18PM -0500, Ned Batchelder wrote:
I feel this is only a partial fix. I've been bitten by something like this, but not precisely like this. The difference in how I experienced this makes it enough for me to say: I don't think your suggestion is that useful. What I experienced was having collisions on the python-path, and modules from my codebase colliding with libraries in the stdlib (or outside it). For example, a library might import one of its dependencies which coincidentally had the same name as one of the libraries I have. Maybe a suggestion would be to add the path of the module to the error message? Currently the message is sjoerdjob$ cat json.py import json TEST = '{"foo": "bar"}' print(json.loads(TEST)) sjoerdjob$ python3.5 json.py Traceback (most recent call last): File "json.py", line 1, in <module> import json File "/Users/sjoerdjob/Development/spikes/importself/json.py", line 5, in <module> print(json.loads(TEST)) AttributeError: module 'json' has no attribute 'loads' But maybe the error could/should be AttributeError: module 'json' (imported from /Users/sjoerdjob/Development/spikes/importself/json.py) has no attribute 'loads'. As another corner case, consider the following: #json.py JSON_DATA = '{"foo": "bar"}' #mod_a.py import json def parse(blob): return json.loads(blob) #mod_b.py from json import JSON_DATA from mod_a import parse print(parse(JSON_DATA)) (Now, consider that instead of 'json' we chose a less common module name.). You still get the error `module 'json' has no attribute 'loads'`. In this case, I think it's more helpful to know the filename of the 'json' module. For me that'd sooner be a trigger to "What's going on.", because I haven't been bitten by the 'import self' issue as often as 'name collision in dependency tree'. (Of course, another option would be to look for other modules of the same name when you get an attribute-error on a module to aid debugging, but I think that's too heavy-weight.) Kind regards, Sjoerd Job

On Fri, Jan 29, 2016 at 6:42 PM, Sjoerd Job Postmus <sjoerdjob@sjec.nl> wrote:
Yes, your example is actually more likely to happen, and it happened to me many times. One reason is some of the stdlib module names are kind of commons. Once I defined my own random.py and then another time I had a requests.py which collided with requests library. I think the right solution is assume every import error needs some guidance, some hints. Don't just target a specific problem. Ned probably familiar with this, in the case of Ansible, if Ansible cannot resolve and locate the role you specify in the playbook, Ansible will complain and give this error message: ERROR: cannot find role in /current/path/roles/some-role or /current/path/some-role or /etc/ansible/roles/some-role So the import error should be more or less like this AttributeError: module 'json' has no attribute 'loads' Possible root causes: * json is not found in the current PYTHONPATH. Python tried /current/python/site-packages/json, /current/python/site-packages/json.py. For the full list of PYTHONPATH, please refer to THIS DOC ON PYTHON.ORG. * your current module has the same name as the module you intent to import. You can even simplify this to say possible cause please go to this doc on python.org and we can go verbose there. John

On 29 January 2016 at 23:42, Sjoerd Job Postmus <sjoerdjob@sjec.nl> wrote:
Another way that the error can arrive is if your script has the same name as an installed module that is indirectly imported. I commonly see my students choose the name "random.py" for a script which can lead to this problem: $ echo 'import urllib2' > random.py $ python random.py Traceback (most recent call last): File "random.py", line 1, in <module> import urllib2 File "/usr/lib/python2.7/urllib2.py", line 94, in <module> import httplib File "/usr/lib/python2.7/httplib.py", line 80, in <module> import mimetools File "/usr/lib/python2.7/mimetools.py", line 6, in <module> import tempfile File "/usr/lib/python2.7/tempfile.py", line 35, in <module> from random import Random as _Random ImportError: cannot import name Random To fully avoid this error you need to know every possible top-level module/package name and not use any of them as the name of your script. This would be avoided if module namespaces had some nesting e.g. 'import stdlib.random' but apparently flat is better than nested...
I think that would be an improvement. It would still be a problem for absolute beginners but at least the error message gives enough information to spot the problem. In general though I think it's unfortunate that it's possible to be able to override installed or even stdlib modules just by having a .py file with the same name in the same directory as the running script. I had a discussion with a student earlier today about why '.' is not usually on PATH for precisely this reason: you basically never want the ls (or whatever) command to run a program in the current directory. -- Oscar

On Sat, Jan 30, 2016 at 11:13 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
One solution would be to always work in a package. As of Python 3, implicit relative imports don't happen, so you should be safe. Maybe there could be a flag like -m that means "run as if current directory is a module"? You can change to a parent directory and run "python3 -m dir.file" to run dir/file.py; if "python3 -r file" could run file.py from the current directory (and assume the presence of an empty __init__.py if one isn't found), that would prevent all accidental imports - if you want to grab a file from right next to you, that's "from . import otherfile", which makes perfect sense. It'd be 100% backward compatible, as the new behaviour would take effect only if the option is explicitly given. Doable? ChrisA

On Jan 29, 2016, at 15:42, Sjoerd Job Postmus <sjoerdjob@sjec.nl> wrote:
Yes. The version of this I've seen most from novices is that they write a program named "json.py" that imports and uses requests, which tries to use the stdlib module json, which gives them an AttributeError on json.loads. (One of my favorite questions on StackOverflow came from a really smart novice who'd written a program called "time.py", and he got an error about time.time on one machine, but not another. He figured out that obviously, requests wants him to define his own time function, which he was able to do by using the stuff in datetime. And he figured out the probable difference between the two machines--the working one had an older version of requests. He just wanted to know why requests didn't document this new requirement that they'd added. :))
Maybe a suggestion would be to add the path of the module to the error message?
That would probably help, but think about what it entails: Most AttributeErrors aren't on module objects, they're on instances of user-defined classes with a typo, or on None because the user forgot a "return" somewhere, or on str because the user didn't realize the difference between the string representation of an object and the objects, etc. To make matters worse, AttributeError objects don't even carry the name of the object being attributed, so even if you wanted to make tracebacks do some magic if isinstance(obj, types.ModuleType), there's no way to do it. So, that means you'd have to make ModuleType.__getattr__ do the special error message formatting.
If that could be done only when the exception escapes to top level and dumps s traceback, that might be reasonable. And it would _definitely_ be helpful. But I don't think it's possible without major changes.

On 30 January 2016 at 00:16, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Oh yeah, good point. Somehow I read the AttributeError as an ImportError e.g. $ python random.py Traceback (most recent call last): File "random.py", line 1, in <module> import urllib2 File "/usr/lib/python2.7/urllib2.py", line 94, in <module> import httplib File "/usr/lib/python2.7/httplib.py", line 80, in <module> import mimetools File "/usr/lib/python2.7/mimetools.py", line 6, in <module> import tempfile File "/usr/lib/python2.7/tempfile.py", line 35, in <module> from random import Random as _Random ImportError: cannot import name Random That error message could be changed to something like ImportError: cannot import name Random from module 'random' (/home/oscar/random.py) Attribute errors would be more problematic. -- Oscar

On Fri, Jan 29, 2016 at 04:16:46PM -0800, Andrew Barnert wrote:
True. Most AttributeErrors are on user-defined classes with a typo. But that's not the case we're discussing here. Here we are discussing how a user should debug the effects of module name collisions, and the resulting AttributeError. I would expect it to be quite unlikely that two modules with the same name each have a class with the same name, and you accidentally initialize the wrong one. More likely (in my experience) is that you get an AttributeError on a module (in the case of module-name collisions).
To make matters worse, AttributeError objects don't even carry the name of the object being attributed, so even if you wanted to make tracebacks do some magic if isinstance(obj, types.ModuleType), there's no way to do it.
So, that means you'd have to make ModuleType.__getattr__ do the special error message formatting.
Yes, indeed. That's what I was thinking of. I decided to write up a quick hack that added the filename to the exception string. sjoerdjob$ ../python mod_a.py Traceback (most recent call last): File "mod_a.py", line 4, in <module> print(parse(JSON_DATA)) File "/home/sjoerdjob/dev/cpython/tmp/mod_b.py", line 4, in parse return json.loads(blob) AttributeError: module 'json' (loaded from /home/sjoerdjob/dev/cpython/tmp/json.py) has no attribute 'loads' Here's the patch, in case anyone is interested. diff --git a/Objects/moduleobject.c b/Objects/moduleobject.c index 24c5f4c..5cc144a 100644 --- a/Objects/moduleobject.c +++ b/Objects/moduleobject.c @@ -654,17 +654,25 @@ module_repr(PyModuleObject *m) static PyObject* module_getattro(PyModuleObject *m, PyObject *name) { - PyObject *attr, *mod_name; + PyObject *attr, *mod_name, *mod_file; attr = PyObject_GenericGetAttr((PyObject *)m, name); if (attr || !PyErr_ExceptionMatches(PyExc_AttributeError)) return attr; PyErr_Clear(); if (m->md_dict) { _Py_IDENTIFIER(__name__); mod_name = _PyDict_GetItemId(m->md_dict, &PyId___name__); if (mod_name) { - PyErr_Format(PyExc_AttributeError, + _Py_IDENTIFIER(__file__); + mod_file = _PyDict_GetItemId(m->md_dict, &PyId___file__); + if (mod_file && PyUnicode_Check(mod_file)) { + PyErr_Format(PyExc_AttributeError, + "module '%U' (loaded from %U) has no attribute '%U'", mod_name, mod_file, name); + } else { + PyErr_Format(PyExc_AttributeError, "module '%U' has no attribute '%U'", mod_name, name); + } return NULL; } else if (PyErr_Occurred()) { Unfortunately, I do think this might impose **some** performance issue, but on the other hand, I'd be inclined to think that attribute-errors on module objects are not that likely to begin with, except for typos and issues like these. (And of course the case that you have to support older versions of Python with a slower implementation, but you most often see those checks being done at the module-level, so it would only impact load-time and not running-time.) The added benefit would be quicker debugging when finally having posted to a forum: "Ah, I see from the message that the path of the module is not likely a standard-library path. Maybe you have a name collision? Check for files or directories named '<module name here>(.py)' in your working directory / project / ... .
No, indeed, that was also my expectation: helpful, but too big a hassle to be worth it.

On Jan 29, 2016, at 22:58, Sjoerd Job Postmus <sjoerdjob@sjec.nl> wrote:
Right. So my point is, either we have to do the extra work in module.__getattr__ when formatting the string, or we have to extend the interface of AttributeError to carry more information in general (the object and attr name, presumably). The latter may be better, but it's also clearly not going to happen any time soon. (People have been suggesting since before 3.0 that all the standard exceptions should have more useful info, but nobody's volunteered to change the hundreds of lines of C code, Python code, and docs to do it...) So, the only argument against your idea I can see is the potential performance issues. Which should be pretty easy to dismiss with a microbenchmark showing it's pretty small even in the worst case and a macrobenchmark showing it's not even measurable in real code, right?

On Jan 30, 2016 6:45 PM, "Andrew Barnert via Python-ideas" < python-ideas@python.org> wrote:
wrote:
Pep 473 centralizes all of this requests. https://www.python.org/dev/peps/pep-0473/. I started adding support for name error, the most simple change and it turned out to be much more complex as I had thought. I had a couple of tests which were failing and didn't have the bandwidth to debug.
So, the only argument against your idea I can see is the potential
performance issues. Which should be pretty easy to dismiss with a microbenchmark showing it's pretty small even in the worst case and a macrobenchmark showing it's not even measurable in real code, right?

On 30 January 2016 at 08:42, Ned Batchelder <ned@nedbatchelder.com> wrote:
We could potentially detect when __main__ is being reimported under a different name and issue a user visible warning when it happens, but we can't readily detect a file importing itself in the general case (since it may be an indirect circular reference rather than a direct). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 1/30/16 4:30 AM, Nick Coghlan wrote:
I thought about the indirect case, and for the errors I'm trying to make clearer, the direct case is plenty. While we're at it though, re-importing __main__ is a separate kind of behavior that is often a problem, since it means you'll have the same classes defined twice. --Ned.
Cheers, Nick.

On 30 January 2016 at 21:19, Ned Batchelder <ned@nedbatchelder.com> wrote:
In that case, the only problem I see off the top of my head with emitting a warning for direct self-imports is that it would rely on import system behaviour we're currently trying to reduce/minimise: the import machinery needing visibility into the globals for the module initiating the import. It's also possible that by the time we get hold of the __spec__ for the module being imported, we've already dropped our reference to the importing module's globals, so we can't check against __file__ any more. However, I'd need to go read the code to remember how quickly we get to extracting just the globals of potential interest.
Right, but it combines with the name shadowing behaviour to create a *super* confusing combination when you write a *script* that shadows the name of a standard library module: http://python-notes.curiousefficiency.org/en/latest/python_concepts/import_t... Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 30 January 2016 at 11:57, Nick Coghlan <ncoghlan@gmail.com> wrote:
Maybe this is because I don't really understand how the import machinery works but I would say that if I run $ python random.py Then the interpreter should be able to know that __main__ is called "random" and know the path to that file. It should also be evident if '' is at the front of sys.path then "import random" is going to import that same module. Why is it difficult to detect that case? I think it would be better to try and solve the problem a little more generally though. Having yesterday created a file called random.py in my user directory (on Ubuntu 15.04) I get the following today: $ cat random.py import urllib2 $ python3 Python 3.4.3 (default, Mar 26 2015, 22:03:40) [GCC 4.9.2] on linux Type "help", "copyright", "credits" or "license" for more information.
Original exception was: File "<stdin>", line 1 SyntaxError: starred assignment target must be in a list or tuple -- Oscar

On 30 January 2016 at 23:20, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Yes, this is the case I originally said we could definitely detect. The case I don't know if we can readily detect is the one where a module *other than __main__* is imported a second time under a different name. However, I'm not sure that latter capability would be at all useful, so it probably doesn't matter whether or not it's feasible. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Jan 30, 2016 at 06:19:35AM -0500, Ned Batchelder wrote:
As far as I can tell, importing __main__ is fine. It's only when you import __main__ AND the main module under its real name at the same time that you can run into problems -- and even then, not always. The sort of errors I've seen involve something like this: import myscript import __main__ # this is actually myscript a = myscript.TheClass() # later assert isinstance(a, __main__.TheClass) which fails, because myscript and __main__ don't share state, despite actually coming from the same source file. So I think it's pretty rare for something like this to actually happen. I've never seen it happen by accident, I've only seen it done deliberately as a counter-example to to the "modules are singletons" rule. -- Steve

On Sat, Jan 30, 2016 at 2:47 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Not only is importing __main__ fine, it's actually unavoidable... __main__ is just an alias to the main script's namespace. -- test_main.py -- class Foo: pass import __main__ # Prints "True" print(Foo is __main__.Foo) -- end test_main.py -- So importing __main__ never creates any new copies of any singletons; it's only importing the main script under its filesystem name that creates the problem. -n -- Nathaniel J. Smith -- https://vorpus.org

On 1/30/16 5:47 PM, Steven D'Aprano wrote:
Something like this does happen in the real world. A class is defined in the main module, and then the module is later imported with its real name. Now you have __main__.Class and module.Class both defined. You don't need to actually "import __main__" for it to happen. __main__.Class is used implicitly from the main module simply as Class. --Ned.

On Sat, Jan 30, 2016 at 07:58:49PM -0500, Ned Batchelder wrote:
Ah, yes of course you are correct, you don't need an explicit "import __main__", but you do need an explicit "import module" somewhere. I think that in order to have a problem, you need something like this set of circumstances: (1) You need a module which is intended to be used as BOTH an executable script and an importable library. (2) Your module ends up directly or indirectly importing itself when running as __main__. (3) Your code relies on the "module is a singleton" invariant that you have just violated. (If you ever do something like `if type(obj) is MyClass`, you're relying on that invariant.) (If your module is never __main__, you have no problem. If your module is always __main__ and never imported under the real file name, you have no problem. But if it is both, you may have a problem.) I'm prepared to believe that actually diagnosing this error can be difficult, especially for those who have never come across this before. But I don't think it is especially common. Rather than trying to ban self imports, could we change sys.modules so it caches the module object under *both* the original name and '__main__'? # running "script.py" as the main module sys.modules['__main__'] = sys.modules['script'] = module_object # now "import script" will pick up the same module object as __main__ If the script name isn't a valid identifier, there's no need for the second entry, since it can't be imported. (I have a vague feeling I've already asked this question before, but I can't find any mail with it. If I have asked it, and it's been answered, my apologies. What was the answer again?) -- Steve

(Wow, I'm late to this thread.) FTR, the underlying cause is that a script's directory is prepended to sys.path. [1] Otherwise importing your script wouldn't be a problem nearly as often. This is something I've hoped we could address at some point. [2] Note that PEP 395 [3] aimed to help with a number of related issues and PEP 432 [4] may still help make it easier to sort out interpreter-startup-related matters like this. Personally I'm still in favor of deprecating "sys.path[0] auto-initialisation", but I'm not gonna hold my breath on that. :) -eric [1] https://hg.python.org/cpython/file/7e48300c7f3b/Modules/main.c#l251 https://hg.python.org/cpython/file/7e48300c7f3b/Lib/runpy.py#l258 [2] http://bugs.python.org/issue13475 [3] http://www.python.org/dev/peps/pep-0395/ [4] http://www.python.org/dev/peps/pep-0432/ On Fri, Jan 29, 2016 at 3:42 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
participants (14)
-
Andrew Barnert
-
Chris Angelico
-
Eric Snow
-
Greg Ewing
-
John Wong
-
Nathaniel Smith
-
Ned Batchelder
-
Nick Coghlan
-
Oscar Benjamin
-
Ryan Gonzalez
-
Sebastian Kreft
-
Sjoerd Job Postmus
-
Steven D'Aprano
-
Terry Reedy