Re: [Python-ideas] [Python-Dev] If you shadow a module in the standard library that IDLE depends on, bad things happen

CC'ing Python-Ideas. Follow-ups to Python-Ideas please. On Thu, Oct 29, 2015 at 09:22:15PM -0400, Terry Reedy wrote:
Terry is right. Shadowing should be possible, but it should require a deliberate decision on the part of the programmer.

Consider the shell, say, bash or similar. My understanding is that the shell PATH deliberately excludes the current directory because of the possibility of malicious software shadowing the usual commands in /bin etc. If you want to run an executable in the current directory, you have to provide the path to it explicitly: ./myscript rather than just myscript. Now Python isn't exactly the shell, so I'm not proposing that Python do the same thing. But surely we can agree on the following?

- Shadowing explicitly installed packages, including the stdlib, is *occasionally* useful.
- But when shadowing occurs, it is *nearly always* accidental.
- Such accidental shadowing often causes problems.
- Furthermore, debugging shadowing problems is sometimes tricky even for experienced coders, and almost impossible for beginners. (It's not until you've been burned once or thrice by shadowing that you recognise the symptoms, at which point it is then usually easy to debug.)
- Hence, we should put the onus on those who want to shadow installed packages to do so *explicitly*, or at least make it easier to avoid accidental shadowing.

I propose the following two changes:

(1) Beginning with Python 3.6, the current directory goes at the end of sys.path rather than the beginning by default. Instead of:

    >>> print(sys.path)
    ['', '/this', '/that', '/another']

we will have this instead:

    >>> print(sys.path)
    ['/this', '/that', '/another', '']

Those who don't shadow installed packages won't notice any difference. Scripts which deliberately or unintentionally shadow installed packages will break from this change. I don't have a problem with this: you can't fix harmful behaviour without breaking code that depends on that harmful behaviour. Additionally, I expect that those who rely on the current behaviour will be a small minority, much smaller than those who will be bitten by accidental shadowing into the indefinite future. And if you want the old behaviour back, it is easy to get, by changing the path before doing your imports:

    import sys
    if sys.path[-1] == "":
        sys.path = [""] + sys.path[:-1]

or equivalent. I do not believe that it is onerous for those who want shadowing to have to take steps to do so explicitly. That code can be added to your scripts on a case-by-case basis, to your PYTHONSTARTUP file, by modifying your site.py, or (I think) by putting it into the sitecustomize or usercustomize modules.

(2) IDLE doesn't need to wait for Python 3.6 to make this change. I believe that IDLE is permitted to make backwards-incompatible changes in minor releases, so there is no reason why it can't change the path effective immediately.

That's a simpler fix than scanning the entire path, raising warnings (which beginners won't understand and will either ignore or panic over), or other complex solutions. It may not prevent *every* shadowing incident, but it will improve the situation immeasurably.

Thoughts?

-- Steve
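(For illustration, here is a minimal sketch of what that opt-in could look like as a usercustomize module under the proposed 3.6 layout; the defensive checks are my own assumption, not part of the proposal as written.)

    # usercustomize.py -- restore the old behaviour of searching the
    # current directory first, under the proposed sys.path layout.
    import sys

    if sys.path and sys.path[-1] == "" and sys.path[0] != "":
        del sys.path[-1]          # drop the trailing cwd entry...
        sys.path.insert(0, "")    # ...and put it back at the front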

On 01.11.15 08:06, Steven D'Aprano wrote:
Unfortunately this is not such a small minority:
https://code.openhub.net/search?s=%22sys.path.pop(0)%22&p=0

But we can work around this problem by adding a harmless path at the start of sys.path:

    sys.path.insert(0, '/non-existing-stub-path')

or

    sys.path.insert(0, sys.path[0])

On Sun, Nov 01, 2015 at 08:41:03AM +0200, Serhiy Storchaka wrote:
The search results contain MANY duplicates. For example, in the first ten results, there are three duplicates of "common.py" from "kongove's autotest", and two duplicates of "common.py" from "Chromium OS". The first hit does:

    sys.path.insert(0, path.dirname(__file__))
    import objects
    from objects import constants
    sys.path.pop(0)

which seems to be a very common pattern: insert something at the start of the path, then pop it out later. That's harmless, and won't be affected by shifting where "" is inserted. So I think that this search is not a good test for code that will be affected.

Besides, anyone who unconditionally pops the first item from sys.path is already on shaky ground. You should not assume that the first item will always be "", since it may have been changed before your code runs, e.g. by the PYTHONSTARTUP file, usercustomize, etc.

Now, we shouldn't break people's code for no good reason, not even if it is already broken, but we have a good reason: having "" at the start of sys.path breaks code that inadvertently shadows other modules. (And it may even be a security risk.)

-- Steve
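(A slightly more defensive version of that insert-then-pop pattern -- my own sketch, not code from the search results -- removes exactly the entry it added instead of blindly popping slot zero:)

    import sys
    from contextlib import contextmanager

    @contextmanager
    def prepended_path(entry):
        """Temporarily put *entry* at the front of sys.path, then remove
        that exact entry again, even if an import in the block fails."""
        sys.path.insert(0, entry)
        try:
            yield
        finally:
            try:
                sys.path.remove(entry)
            except ValueError:
                pass  # someone else already removed it

    # Usage:
    # with prepended_path(os.path.dirname(__file__)):
    #     import objects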

On 11/1/2015 1:06 AM, Steven D'Aprano wrote:
Serhiy pointed out that code that unconditionally executes sys.path.pop(0) to avoid shadowing of its stdlib imports might break if it imports from what is now sys.path[1]. This currently (2.7, 3.4, 3.5) is <somewhere>pythonxy.zip. (In 2.7 and 3.4 on Windows, somewhere = /windows/system32; in 3.5, the install directory.) I say 'might' because I don't know if the presence of pythonxy.zip always means that <installdir>/lib is absent. If it is gone, the imports will fail. Since use of the zip is rare, the transition might be doable. Adding a guard like "if sys.path[0] == '':" is backward compatible.

If the import system had available the set of /lib modules, as I proposed on python-list, then the error handler could diagnose the situation of a lib module import failing because of an existing pythonxy.zip not being on sys.path. I believe the set would also help for generating deprecation messages.
The question is whether it would be too onerous for those explicitly avoiding shadowing now to have to change their explicit code.
IDLE can change how it operates with its own code. I plan to add a conditional pop before IDLE does its own imports. However, IDLE is not free to change how user code is executed, except to better imitate CPython than it does now. Hence '' will have to be put back for user code execution. -- Terry Jan Reedy
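(A minimal sketch of that conditional pop and restore -- hypothetical code, not IDLE's actual implementation:)

    import sys

    had_cwd_entry = bool(sys.path) and sys.path[0] == ''
    if had_cwd_entry:
        del sys.path[0]   # avoid shadowing while IDLE imports its own modules

    # ... IDLE-internal imports would happen here ...

    if had_cwd_entry:
        sys.path.insert(0, '')   # put '' back before running user code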

On Sun, Nov 1, 2015 at 5:06 PM, Steven D'Aprano <steve@pearwood.info> wrote:
+1. As Serhiy says, though, this will additionally break scripts that _protect against_ shadowing. We could have an easily-recognized shim in slot zero, as a compatibility measure; not duplicating the next one, but a clearly invalid entry. I suggest None, which appears to work:

    rosuav@sikorsky:~$ python3
    Python 3.6.0a0 (default:9ca59b3cc18b, Oct 16 2015, 15:25:11) [GCC 4.9.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
The None entry is happily ignored. Shim removal with "if not sys.path[0]: sys.path.pop(0)" will still remove it.

But I had a quick look at the link Serhiy posted, and two of the three entries that I dug into were actually not removing the blank entry (the other removed it and replaced it with a hard path); they had previously inserted something at the beginning of sys.path, and were removing _that_ entry. They won't need to be changed. I did a similar search on GitHub and, again, most of the results weren't actually removing the '' entry from the beginning.

So the None could be put there as a compatibility measure for 3.6, and then dropped in 3.7 - or even just left out altogether, so a small number of programs break. Would have been nice to do this at Python 3.0, as it'd have been on par with absolute imports and other edge cases. Oh well.

ChrisA
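(For completeness, a removal guard that copes with either the current '' entry or the hypothetical None shim might look like this -- a sketch, not code from the thread:)

    import sys

    # Drop a leading cwd marker, whether it is '' (today's convention) or
    # a None shim (the compatibility idea above); leave anything else alone.
    if sys.path and sys.path[0] in ('', None):
        del sys.path[0]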

In a message of Sun, 01 Nov 2015 17:06:30 +1100, "Steven D'Aprano" writes:
This is a bad idea, if you mean 'shadows anything in site-packages'. I write a perfectly good working program, which then silently breaks because somebody happens to install a site package with a name conflict with my code. Can you imagine being in the middle of writing and debugging such a thing and having everything start failing because suddenly your program isn't the one being found? How long is it going to take you to stop looking at your own code, and your own setup, for the problem and begin looking at what packages got installed by anybody else sharing this machine, and what they are named?

It is not necessarily going to make the teachers' lives any better. They will trade the confusion of 'things are acting strangely around here' for 'I just wrote a program and the stupid language cannot find it'. People whose code inadvertently shadows something are better off with a warning about the potential problem.

Laura

On Sun, Nov 1, 2015 at 6:53 PM, Laura Creighton <lac@openend.se> wrote:
I'm fairly sure Steven's talking about the standard library, not arbitrary packages people might happen to have installed. Whether he thought about it or not, the solution to both your problems seems to be this:

    sys.path = ['/usr/local/lib/python36.zip', '/usr/local/lib/python3.6',
                '/usr/local/lib/python3.6/plat-linux',
                '/usr/local/lib/python3.6/lib-dynload', '',
                '/home/rosuav/.local/lib/python3.6/site-packages',
                '/usr/local/lib/python3.6/site-packages']

Equivalently, for the system Python:

    sys.path = ['/usr/lib/python3.4', '/usr/lib/python3.4/plat-x86_64-linux-gnu',
                '/usr/lib/python3.4/lib-dynload', '',
                '/usr/local/lib/python3.4/dist-packages',
                '/usr/lib/python3/dist-packages']

So the current directory is after everything that's logically the standard library, but before things that would be installed separately (site-packages, dist-packages, etc). It'd no longer be possible to shadow wave.py, but you could shadow sqlalchemy.py.

ChrisA
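(A sketch of how a script could rearrange its own path into that ordering today -- assuming site.getsitepackages() is available, which it is not in some virtualenv setups:)

    import site
    import sys

    def demote_cwd_entry():
        """Move the '' (script directory / cwd) entry so it comes after
        the stdlib directories but before site-packages, as in the
        layout above."""
        if '' not in sys.path:
            return
        sys.path.remove('')
        site_dirs = set(site.getsitepackages()) | {site.getusersitepackages()}
        for index, entry in enumerate(sys.path):
            if entry in site_dirs:
                sys.path.insert(index, '')
                return
        sys.path.append('')   # no site-packages entry found; put '' last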

In a message of Sun, 01 Nov 2015 18:58:36 +1100, Chris Angelico writes:
But the kid who just wrote string.py or turtle.py will still have the 'why isn't this working at all?' experience instead of something that warns her what her problem is. Laura

On Sun, Nov 1, 2015 at 7:08 PM, Laura Creighton <lac@openend.se> wrote:
Right. The warning when you save a file of that name is still a useful thing; it's orthogonal to this, though. (FWIW I think the warning's a good idea, but it's no panacea.) ChrisA

In a message of Sun, 01 Nov 2015 20:02:23 +1100, Chris Angelico writes:
Adding warnings to IDLE when you save a file is a fine idea, and will help IDLE users avoid pain. I want to help _everybody_ with a change to Python, so that it issues a warning when you shadow something in the standard library. Something like:

    Warning: local file /u/lac/junk/string.py shadows module named string in the Standard Library

A note for whoever implements this: teachers will find the message slightly more useful if it prints the full path name.

Laura
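(As a rough illustration of the check Laura is asking for -- my sketch, not an actual CPython patch -- something could scan the script's directory for files whose names collide with standard library modules. The list of stdlib names is the hard part: sys.stdlib_module_names exists from Python 3.10 onward, while older versions would need a precomputed list.)

    import os
    import sys
    import warnings

    def warn_about_stdlib_shadowing(directory):
        # Fall back to an empty set where sys.stdlib_module_names is absent.
        stdlib_names = set(getattr(sys, 'stdlib_module_names', ()))
        for filename in os.listdir(directory):
            name, ext = os.path.splitext(filename)
            if ext == '.py' and name in stdlib_names:
                warnings.warn(
                    "local file %s shadows module named %s in the Standard Library"
                    % (os.path.join(os.path.abspath(directory), filename), name))

    # e.g. warn_about_stdlib_shadowing(os.path.dirname(os.path.abspath(sys.argv[0])) or '.')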

On Sun, Nov 01, 2015 at 10:49:49AM +0100, Laura Creighton wrote:
Adding warnings to IDLE when you save a file is a fine idea, and will help IDLE users avoid pain.
IDLE users must be different from most computer users then, because most computer users don't read warnings or errors. http://ignorethecode.net/blog/2008/10/31/nobody-reads/ http://stackoverflow.com/questions/125269/how-would-you-handle-users-who-don...
It's not enough to protect the stdlib. I've seen people accidentally shadow numpy, or other third-party modules.

-- Steve

In a message of Sun, 01 Nov 2015 22:42:50 +1100, "Steven D'Aprano" writes:
This is actually my old field of research. The reason that most users don't read dialog boxes is that they are using operating systems that chat them up with dialog boxes all the time. They are used to having a dialog box mean 'the computer is working as usual'. They do not associate dialog boxes with warnings or errors. Thus they are oblivious to them.

This is the argument against running your C code with lint every single time. People get used to seeing:

    __Warning: long assignment may lose accuracy__

treat it as noise, and then when you actually want them to find such errors they are unable to find them. It's quite astonishing. They can read the lint output to you out loud, and _still_ not be able to do the exercise where they were to find rounding errors in their programs. However, when you divide people into groups and make one group use lint all the time, and tell the others not to use it until they get to this question -- poof. The second group finds the rounding errors.

So if, unbeknownst to me, lots and lots of people are shadowing the stdlib, then if we issue warnings we may blunt their ability to see warnings in general. And that would be a downside. But the upside is that all of the people who did this inadvertently would get a warning that actually explained what is going on. And the other thing that is well understood is that people who are learning tend to see warnings and whatnot -- simply because they haven't had the time and the experience to file the warnings as 'just more noise'.
Yes, but I very much don't want any warnings when you shadow third party modules. If shadowing third party modules produces a warning, then we will end up with 'all warnings get ignored' in very short order. We want warnings to be rare, rare enough that people don't get used to seeing them.

If you can point me at a large community of regular intentional shadowers of the standard library, that quite likely would be enough for me to think that warning people is a bad idea -- having people get into the habit of ignoring warnings is a really dreadful outcome. But the argument isn't 'we shouldn't issue warnings because people don't read them' but rather 'we shouldn't make warnings commonplace because that makes people incapable of reading them'.
-- Steve
(This ignores a different group of learners who have been studied, people who cannot learn and read at the same time. Only some of these people have the sorts of problems that get called dyslexia. But video tutorials are making life a lot easier for these people nowadays. And it is hard to see how adding a warning could harm these learners.) Laura

On 01/11/15 09:49, Laura Creighton wrote:
I think that a solution to this could be something along the lines of having a command line switch (let's say '-w') or other way of enabling a "shadow module checking" mechanism.

When an exception is thrown that is caught by the default handler and output to stderr, an additional line of output could be appended:

    "Note: This problem could be caused by a module that shadows a standard library or third party module. Run Python with the '-w' switch for more detail."

I'm not sure if this would be suitable for _all_ exceptions or only those from certain points in the exception hierarchy.

When the '-w' switch is enabled, the default exception handler would instead enumerate all active modules and, for any which has another module with the same name further down the search path than the one that is loaded, would output some sort of diagnostic message (i.e., a list of which loaded modules are shadowing _something_else_ that would have been loaded otherwise) - similar to your example below:
This would prevent "power users" from always having to see a potentially large diagnostic after each uncaught exception, but generally remind everyone that such a thing is possible and "here's the easy way to check for it". A bit like Valgrind's memcheck tool's "--leak-check=full" option - when run without it, if there are leaks then memcheck reminds the user that that's the way to start digging down into what might be causing them. E.
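(Sketching Erik's '-w' diagnostic at user level as an excepthook -- assumptions: top-level modules only, and shadowing detected by looking for same-named files further down sys.path. A rough illustration, not a proposal-quality implementation.)

    import os
    import sys
    import traceback

    def _find_shadowers():
        """Yield (name, loaded_from, also_found_at) for already-imported
        top-level modules whose name also exists further down sys.path."""
        for name, module in list(sys.modules.items()):
            loaded_from = getattr(module, '__file__', None)
            if not loaded_from or '.' in name:
                continue
            loaded_dir = os.path.dirname(os.path.abspath(loaded_from))
            past_loaded_entry = False
            for entry in sys.path:
                entry_dir = os.path.abspath(entry or os.getcwd())
                if not past_loaded_entry:
                    past_loaded_entry = (entry_dir == loaded_dir)
                    continue
                for candidate in (os.path.join(entry_dir, name + '.py'),
                                  os.path.join(entry_dir, name, '__init__.py')):
                    if os.path.exists(candidate):
                        yield name, loaded_from, candidate
                        break

    def _excepthook(exc_type, exc_value, exc_tb):
        traceback.print_exception(exc_type, exc_value, exc_tb)
        for name, loaded_from, shadowed in _find_shadowers():
            print("note: module %r (loaded from %s) shadows %s"
                  % (name, loaded_from, shadowed), file=sys.stderr)

    sys.excepthook = _excepthook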

On Nov 1, 2015 14:45, "Erik" <python@lucidity.plus.com> wrote:
having a command line switch (let's say '-w') or other way of enabling a "shadow module checking" mechanism.

When an exception is thrown that is caught by the default handler and output to stderr, an additional line of output could be appended: "Note: This problem could be caused by a module that shadows a standard library or third party module. Run Python with the '-w' switch for more detail." I'm not sure if this would be suitable for _all_ exceptions or only those from certain points in the exception hierarchy.

When the '-w' switch is enabled, the default exception handler would instead enumerate all active modules and, for any which has another module with the same name further down the search path than the one that is loaded, would output some sort of diagnostic message (i.e., a list of which loaded modules are shadowing _something_else_ that would have been loaded otherwise) - similar to your example below:

Something like: Warning: local file /u/lac/junk/string.py shadows module named string in the Standard Library

... that's the way to start digging down into what might be causing them.

I was thinking of something like this too. A list of stdlib modules could be created statically when Python is originally built (if it isn't already). This would cause no change in behaviour unless an error actually happens. And maybe, to avoid any performance cost, the message could only be inserted when the interpreter actually exits, if that makes any sense at all.

On Sun, Nov 01, 2015 at 08:02:23PM +1100, Chris Angelico wrote:
I disagree. As shown by the tutor and python-list mailing lists, beginners don't read error messages. Warning them that they're about to save a file using the name of a stdlib module will be no different. They'll either ignore the warning, and still be in the dark as to why their code is broken, or they'll panic that they did something wrong, and possibly lose their as yet unsaved work. What produces the warning? Python, or the editor? If Python, you're annoying experienced programmers who intend to do what they do. If the editor, you do nothing about people using a different editor, or people who move and rename files in the shell. -- Steve

Honestly, shadowing modules is something that should be solved by renaming modules. If you are worrying about shadowing ONLY the standard library - guess what? Those names don't change often, and are well known. Don't use those names. If you are talking about shadowing site-packages, or any package anywhere on sys.path, you are bound to break something somewhere in a hard-to-debug way. It is sufficient to allow things to break and instruct people to jump into the REPL, import a module and check __file__.

On 11/1/2015 02:58, Chris Angelico wrote:

On Mon, Nov 2, 2015 at 12:25 AM, Alexander Walters <tritium-list@sdamon.com> wrote:
Well known? Okay. No cheating now! Which of these names will shadow something from the standard library, and which won't?

    code.py  cgi.py  chunk.py  cmd.py  cprofile.py  gc.py  html.py
    imp.py  mailbox.py  numbers.py  test.py  this.py  types.py  wave.py

Every one of these is a reasonably plausible name for a quick throw-away script (maybe a test script as you learn how to use something). How many of them will be problematic?

ChrisA
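(One way to answer the quiz empirically -- a sketch, best run from a directory with no stray .py files so local modules don't skew the result:)

    import importlib.util

    names = ['code', 'cgi', 'chunk', 'cmd', 'cprofile', 'gc', 'html',
             'imp', 'mailbox', 'numbers', 'test', 'this', 'types', 'wave']
    for name in names:
        spec = importlib.util.find_spec(name)
        # spec.origin is 'built-in' for built-in modules, a file path for
        # ordinary ones, and spec is None if the name is not importable.
        print(name, '->', spec.origin if spec else 'not found')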

On Mon, Nov 02, 2015 at 12:44:27AM +1100, Chris Angelico wrote:
Does it matter? It really only matters if you shadow something that gets imported. It's not exactly great practice to name your script "code.py", but if you do, and it doesn't directly or indirectly try to import the code library, it's harmless. That's why it is harmful to present users with an alert (which they won't read) warning them that they're shadowing a stdlib module. -- Steve

On Mon, Nov 2, 2015 at 2:06 AM, Steven D'Aprano <steve@pearwood.info> wrote:
It matters under two circumstances:

1) Currently, if the stdlib module ends up getting imported *even indirectly* by something you use. The indirect import is the subtle one here.

2) Under the proposal to reorder sys.path, if you want to import *your* module, you wouldn't be able to.

I'm responding to the suggestion that the standard library names are "well known" to the point where people can be told not to use them. I say they are not; sure, everyone knows that 'sys.py' would be a bad name, and you wouldn't call your module "pprint.py" unless you deliberately want to shadow the normal standard library module, but there are a lot that are more obscure.

Suppose you start learning about Python's data types, and you create a "types.py" that lets you play with strings, integers, floats, and so on. Then, in the next lesson, you meet enumerations, and enum.py.

    $ touch types.py; python3 -c "import enum"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/local/lib/python3.6/enum.py", line 2, in <module>
        from types import MappingProxyType, DynamicClassAttribute
    ImportError: cannot import name 'MappingProxyType'

Tell me, is this an implausible scenario? Should a person just learning about Python's type system have to also learn not to use the name "types.py"?

ChrisA
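(When this bites, the quickest diagnosis is the one Alexander mentioned earlier: check where each suspect module was actually loaded from. A tiny sketch of that REPL habit:)

    import importlib

    for name in ('types', 'enum'):    # names from the scenario above
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            print(name, '-> import failed:', exc)
        else:
            print(name, '->', getattr(module, '__file__', '<built-in or frozen>'))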

In a message of Sun, 01 Nov 2015 08:25:54 -0500, Alexander Walters writes:
The problem is that the sort of people who make these errors don't know the names. Knowing that 'turtle.py' is a bad name for 'my very first program that does turtle graphics' is a more advanced skill than the turtle program writers, or even their teachers, can be expected to have.

I'm willing to have the kids and their teachers (and all the other beginners) suffer if the downside is that an even only moderately-sized set of legitimate stdlib shadowers end up with an inability to notice warning messages, because they are now ubiquitous. I just think that legitimate stdlib shadowers are rare. And I suspect that many of them do this once, for a short period, not every day of their lives. Thus the 'ignore this noise' is much less likely to develop with them.

Python warnings are rare. Choosing to add a new one is a serious step. I do not want to open the floodgates and end up warning about all possibly-silly practice -- I am one of the cognitive psychology boffins who go around asking/nudging/reminding people to warn less often if you want your most serious warnings to be heeded. I just think the tradeoffs here are worth it.

Laura

On 11/1/2015 08:53, Laura Creighton wrote:

[...] pure detriment to the user. Wanting these warnings is wanting to disadvantage the new user - debugging shadowed names is a skill you need in your toolbox, and this deprives people of that vital early lesson.

On Sun, Nov 01, 2015 at 08:53:17AM +0100, Laura Creighton wrote:
In a message of Sun, 01 Nov 2015 17:06:30 +1100, "Steven D'Aprano" writes:
I'm willing to consider that "" should appear in the middle of sys.path:

    [standard-library-directories, "", site-package-directories]

but that still allows accidental shadowing of third-party packages. I don't think that the problem of accidentally shadowing numpy is less deserving of a solution than accidentally shadowing random. They're both problems.

The downside of using a search path, instead of explicit full pathnames, is that you can have conflicts where two or more modules have the same name, and only the first can be imported. That's unavoidable. We can only try to minimize the risk of accidental shadowing, not prevent it altogether.
Now just a minute, I'm not proposing that this change occur at random, in the middle of a debugging session. Not even in a point (micro) release. So the code will only break (if it breaks) when you upgrade from version 3.x to 3.6 or higher. Moving to a new minor version number is a big step, and people expect that things *could* break. (They don't necessarily expect that they *will* break, but it's certainly a possibility that they might.)

As an experienced developer, what do you do when code that works in one version stops working in the next? Everyone I know immediately assumes that something has changed in the new Python version, and proceeds from there. In this case, they'll be right. Hopefully they will read the release notes. Or the documentation. Or ask on StackOverflow. The same applies when you install a new package: "Everything worked fine yesterday." "What changed?" "Well, I installed a new package with the same name as one of my modules." "Well there you go."

Of course we can come up with scenarios that are more confusing. Suppose you have a program that you don't use very often, say, once a year, that relies on "" being at the front of the path, and then somebody who isn't you installs a site-wide third-party package that now shadows one of your modules, and you don't find out about it for six months. But this sort of thing can already happen: any time you add a third-party module, it may shadow *other* third-party modules that are found later in the path. So I'm not creating a brand new failure mode, I'm just changing how it applies a little.

There are all sorts of debugging strategies available to an experienced developer, starting from "print out module.__file__" to "use the debugger", which a beginner who has just written "turtle.py" would never think of. If I'm going to break anyone's code, I'd rather it be yours than theirs, since I have every confidence you can debug any problems fairly quickly.

I don't suggest breaking working code lightly, but the issue of shadowing the stdlib is an ever-present millstone around every Python developer's neck. The stdlib alone, never mind site packages, is too big to expect everyone to memorise what names they should not use. We can't realistically expect people to avoid shadowing. But we can shift it from "easy to shadow, and mostly affecting those who are least able to cope" to "harder to shadow, and mostly affecting those who can cope".

-- Steve

In a message of Mon, 02 Nov 2015 02:44:10 +1100, "Steven D'Aprano" writes:
And I do, seriously. I think that shadowing third party packages is something I do all the time. I am not willing to share a namespace with all the package writers in the universe, and should I write a package with the same name as somebody else's third party package, I damn well want mine, not some other one. But learners shadowing the stdlib are a separate case.

We've had similar arguments before, the slippery-slope sort. Either you don't want to solve any problem unless all get solved, or else you are afraid that a minor precedent here will lead to all hell breaking loose when it is extended to its ultimate conclusion. But I think the entire purpose of being a human being is to make such judgments. And I think that this is an easy one. Shadowing the stdlib can be treated as a catchable mistake. Shadowing an outside package is something we want to do all the time.

Ah, I think your thinking is less than clear on this one.

1. My sysadmin does an update and suddenly the system Python is a different one than the one where I wrote my code. Some poor sysadmin soul who never wrote anything gets to deal with the fact that my code now has name conflicts with packages that we have, but that I never knew about and never imported.

2. Things fall over because the new version of Python, in effect, shadows my code.

I am not willing to share my namespace with every other Python package creator on the planet. I am a consultant. When my code breaks, they call me. I am unwilling to have my code break every time some human being decides to make a package named something I already used over the last 18 years of writing Python code. (I know that 3.x has not been out that long, but 18 years from now, the same condition holds.)

Laura

On Nov 1, 2015 7:44 AM, "Steven D'Aprano" <steve@pearwood.info> wrote:
On Sun, Nov 01, 2015 at 08:53:17AM +0100, Laura Creighton wrote:
In a message of Sun, 01 Nov 2015 17:06:30 +1100, "Steven D'Aprano"
writes:
Scripts which deliberately or unintentionally shadow installed packages will break from this change. I don't have a problem with this. You can't [...]
In this case there actually is a common use case [1] that involves using the "" entry to intentionally shadow third-party packages: if you're hacking on foo v1.3.dev and want to test out the code you're writing as opposed to the installed version of foo v1.2, then switching to the root of your source checkout before running python or nosetests or whatever suffices.

(OTOH it is also true that numpy has had enough users run into problems from trying to run python while accidentally in the root of a source checkout that numpy/__init__.py contains special-case code to check for this situation and error out early.)

-n

[1] Commonality here is assessed using the standard experimental procedure, i.e., "*I* do it so it's common".
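(A hedged sketch of the kind of source-checkout guard Nathaniel describes -- numpy's real check differs in its details, so treat this purely as an illustration of the idea: a package's __init__ refuses to run when it looks like it is being imported from an un-built checkout.)

    import os

    def _running_from_source_checkout(package_dir):
        # Heuristic: an installed copy of a package does not normally sit
        # right next to the project's setup.py the way a checkout does.
        parent = os.path.dirname(os.path.abspath(package_dir))
        return os.path.exists(os.path.join(parent, 'setup.py'))

    if _running_from_source_checkout(os.path.dirname(__file__)):
        raise ImportError(
            "You appear to be importing this package from its source "
            "directory; change directory or install the package first.")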

Steven D'Aprano wrote:
It will also break scripts that rely on looking at the first element of sys.path to find the directory they are running from. Such code is NOT relying on harmful behaviour. -- Greg

On 11/1/2015 6:22 PM, Greg Ewing wrote:
'' is not very informative ;-). I suspect the full path was added in the past, but it is not now. import os; os.getcwd() works better.

-- Terry Jan Reedy

Terry Reedy wrote:
You misunderstand. The point is *not* to find the cwd, it's to find the directory containing the main script, so you can load resources related to it. There are probably other and possibly better ways to go about that, but it works and is sometimes used. The proposed change would break it. -- Greg
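(For reference, a small sketch of the alternatives Greg alludes to -- locating the main script's directory without relying on sys.path[0]; run it as a script, since __file__ is not defined at the interactive prompt:)

    import os
    import sys

    script_dir_from_path = sys.path[0]   # what such code relies on today
    script_dir_from_argv = os.path.dirname(os.path.abspath(sys.argv[0]))
    script_dir_from_file = os.path.dirname(os.path.abspath(__file__))

    print(script_dir_from_path, script_dir_from_argv, script_dir_from_file)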

participants (11)
- Alexander Walters
- Chris Angelico
- Erik
- Greg Ewing
- Laura Creighton
- MRAB
- Nathaniel Smith
- Serhiy Storchaka
- Steven D'Aprano
- Terry Reedy
- Todd