Add 'module' module, similar to 'keyword' module

This idea results from the issue of user files shadowing stdlib files on import. There was a thread on pydev about this yesterday. There is also the opposite issue of builtin modules shadowing user files.
The keyword module provides kwlist and the iskeyword function. kwlist is used in some other stdlib modules and can be used by syntax highlighters (as in IDLE). kwlist is updated by the module's main function.
A module module would have at least liblist and an islibmodule function. liblist would contain all directories with __init__.py and all .py files. (I don't think files within package directories should be included, as there is no direct shadowing problem.) A Python-oriented editor could then warn on save requests: "This name matches a stdlib name in /Lib. If you run python in this directory, you will not be able to import the stdlib module. Continue?"
The module should also have binlist and isbinmodule for builtin modules. (I do not know how to get such a list. If necessary, an API could be added.) An editor could then warn: "This name matches a builtin stdlib name. You will not be able to import this file. Continue?"
-- Terry Jan Reedy
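As a rough illustration of the proposal (not an existing API), the two lib helpers could be sketched as below. The names liblist and islibmodule come from the message; the lib_dir parameter and the assumption that the directory holding os.py is the stdlib's Lib directory are illustrative only.

```python
import os
from pathlib import Path


def liblist(lib_dir=None):
    """Return sorted top-level module names found in lib_dir (hypothetical helper)."""
    # Assumption: the directory that holds os.py is the stdlib Lib directory.
    lib_dir = Path(lib_dir or os.path.dirname(os.path.abspath(os.__file__)))
    names = set()
    for entry in lib_dir.iterdir():
        if entry.is_dir() and (entry / "__init__.py").is_file():
            names.add(entry.name)   # package directory
        elif entry.is_file() and entry.suffix == ".py":
            names.add(entry.stem)   # single-file module
    return sorted(names)


def islibmodule(name, lib_dir=None):
    """Return True if name matches a top-level entry in lib_dir (hypothetical helper)."""
    return name in liblist(lib_dir)
```

Files inside package directories are deliberately not scanned, matching the parenthetical remark in the proposal.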

There's sys.builtin_module_names which returns the names of the hardcoded builtin modules. Dynamically loaded modules can be found by searching sys.path in the usual way -- importlib should know. I wonder if just asking importlib whether it can locate a given module would be enough?
--Guido van Rossum (python.org/~guido)
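A minimal sketch of "asking importlib", assuming importlib.util.find_spec is the call meant here. The helper name can_locate is hypothetical.

```python
import importlib.util
import sys


def can_locate(name):
    """Return True if the import system can find a top-level module called name."""
    if name in sys.builtin_module_names:
        return True   # compiled into the interpreter
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for odd names or modules with a broken __spec__
        return False
```

Note that this only says whether some module of that name is importable, not whether it is the stdlib one.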

On 10/30/2015 9:19 PM, Guido van Rossum wrote:
There's sys.builtin_module_names which returns the names of the hardcoded builtin modules.
Great. With this solved, I opened an issue for IDLE. https://bugs.python.org/issue25522
The default search order is stdlib builtins, local user files, /lib files, so the shadowing issue is the opposite for builtin and /lib modules. Hence a different message is needed.
-- Terry Jan Reedy
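The search order described above can be inspected on a running interpreter. On a default CPython, sys.meta_path typically lists BuiltinImporter before PathFinder, and PathFinder then walks sys.path, where the script directory usually comes before the stdlib directories. The helper name finder_names is hypothetical, and the exact output depends on the installation.

```python
import sys


def finder_names():
    """Names of the finders in sys.meta_path, in the order they are consulted."""
    return [getattr(f, "__name__", type(f).__name__) for f in sys.meta_path]


names = finder_names()
print(names)          # finder order for this interpreter
print(sys.path[:1])   # first path entry; typically the script directory or ''
```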

On Fri, 30 Oct 2015 at 22:57 Terry Reedy <tjreedy@udel.edu> wrote:
A quick and dirty way to use importlib is to get the location of the stdlib (os.__file__ should work since it's hard-coded in the interpreter as representing where the stdlib is) and then use importlib.util.find_spec() for a module name to check whether the file location in the spec has the same location prefix as os (make sure you use absolute paths, since that isn't guaranteed if you don't execute site.py). I've now seen this use case, the logging one, and the 2to3 module rename. I'm starting to wonder if there is some general solution that should get added to the import machinery that can serve these cases more easily than importers or __import__ overrides, which can be tricky to get right. -Brett
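The prefix comparison can be sketched roughly as follows. The helper name found_in_stdlib and the site-packages exclusion are assumptions added for illustration; built-in and frozen modules have no file location, so this check reports False for them.

```python
import importlib.util
import os
from pathlib import Path

# Assumption: the directory that holds os.py is the stdlib directory.
STDLIB_DIR = Path(os.__file__).resolve().parent


def found_in_stdlib(name):
    """Return True if name resolves to a file under the stdlib directory."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return False
    if spec is None or not spec.has_location:
        return False   # not found, or no file location (built-in, frozen)
    try:
        rel = Path(spec.origin).resolve().relative_to(STDLIB_DIR)
    except ValueError:
        return False   # located outside the stdlib directory
    # site-packages often sits below the stdlib directory; do not count it.
    return "site-packages" not in rel.parts
```

A False result for a stdlib name such as "keyword" would suggest that a file elsewhere on sys.path is shadowing it.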

Could we add a context manager to importlib (or perhaps site or sys) to temporarily disable imports from non-standard paths? I don't see any safe way to change the default behavior, but no reason we can't make it easy for applications to self-isolate.

On Sat, 31 Oct 2015 at 11:33 Steve Dower <steve.dower@python.org> wrote:
There's https://www.python.org/dev/peps/pep-0406/ which tried to get isolated import data. As for an immediate solution, the trick is whether we have stored somewhere what the standard path entries are, or whether we have a reliable way to figure out which paths are standard. There is also the problem of not being thread-safe when swapping out sys.path entries. -Brett
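A minimal sketch of such a context manager, assuming that sysconfig's "stdlib" and "platstdlib" paths (plus the DESTSHARED directory on POSIX builds) are an acceptable definition of "standard". The name stdlib_only_path is hypothetical, and the thread-safety problem mentioned above is not solved here.

```python
import contextlib
import os
import sys
import sysconfig


@contextlib.contextmanager
def stdlib_only_path():
    """Temporarily restrict sys.path to stdlib directories (hypothetical helper)."""
    paths = sysconfig.get_paths()
    candidates = [
        paths.get("stdlib"),
        paths.get("platstdlib"),
        sysconfig.get_config_var("DESTSHARED"),   # lib-dynload; None on Windows
    ]
    keep = {os.path.realpath(p) for p in candidates if p}
    saved = sys.path[:]
    # Not thread-safe: other threads see the trimmed path as well.
    sys.path[:] = [p for p in saved if p and os.path.realpath(p) in keep]
    try:
        yield
    finally:
        sys.path[:] = saved   # restore the original entries
```

Modules already cached in sys.modules are unaffected, so this only influences imports that happen for the first time inside the with block.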

On Fri, Oct 30, 2015 at 09:09:00PM -0400, Terry Reedy wrote:
`liblist` cannot usefully be a statically created list, since the availability of modules on the path is dynamic: the path can change, and files can be added or removed. Nor can it be limited to .py files, since .pyc and other extensions can be imported. I don't know what `islibmodule` is supposed to do. Surely all libraries are modules?
(I don't think files within package directories should be included, as there is no direct shadowing problem.)
Sounds like you want a private "fix_shadowing_of_modules_for_IDLE" module, rather than a general-purpose "module" module. For a general-purpose "module" module, whether or not there is a shadowing problem is irrelevant. By the way, it should be clear from the above that the name "module" is atrocious -- it makes it awkward to talk about the module, and it will itself be shadowed by anyone (especially beginners) who creates a "module.py" file. "modtools" might be a better name.
I don't think this is much of a solution to the shadowing problem. For starters, it relies on people using a specific editor. It assumes that files won't be renamed or moved outside of the editor. It depends on the user actually paying attention and reading the error message, which beginners notoriously don't do. Many people will either blindly continue (and hence shadow), or dismiss the dialog and then get into a flap that they can't save their work.
binlist is easy: sys.builtin_module_names. I don't know what `isbinmodule` means either. Presumably you don't actually mean "is binary module", but "is builtin module". I believe that the canonical way to do that is to check that the module has no __file__ attribute, i.e. not hasattr(module, "__file__"). -- Steve
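Built-in modules carry no __file__ attribute, so the attribute check amounts to testing for its absence. A small sketch follows; the helper name is_builtin is hypothetical, and the check is a heuristic, since some other modules (for example frozen ones) may also lack __file__.

```python
import json
import sys


def is_builtin(module):
    """Return True if the module object has no __file__, as built-in modules do."""
    return not hasattr(module, "__file__")


print(is_builtin(sys))    # sys is compiled into the interpreter
print(is_builtin(json))   # json is loaded from a file in the stdlib
```

Checking module.__name__ against sys.builtin_module_names is a stricter alternative.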

On 10/31/2015 4:14 AM, Steven D'Aprano wrote:
A response to my already half-dead proposal. Literally 10 minutes after I posted it, Guido replied that half of what I wanted already existed as sys.builtin_module_names. I acknowledged that, two hours before Steven's post, by saying I would go ahead and use this tuple for IDLE.
General comments: The possibility of name clashes arises because Python does not limit import to the stdlib. As Steven noted, Python cannot prevent this. What it could do is search all possible import sources with each import and report clashes before picking one. I am not proposing this. Whether Python picks a user file or a stdlib file when both have the same name depends on how the stdlib version is implemented.
Specific comment: I limited the proposal to the stdlib because a) the stdlib is fixed for a given version of CPython on a particular OS*, and b) the reported problems of beginners that I have seen, where they are stuck on what to do, involve the stdlib. (I could have made this limitation clearer in my original first paragraph.)
*Except as modules are omitted in a particular build.
sys.builtin_module_names exists because the information is not otherwise exposed and because it is needed and used in several places.
Revised and reduced proposal: If other people would find it useful, add all_stdlib_toplevel_module_names - builtin_module_names to sys as something equivalent to .other_stdlib_module_names or .python_coded_module_names. Anyone wanting all_toplevel_module_names could add the two. Or add the latter, and let others subtract. In the meantime, I will adapt the code in test__all__ that creates such a list from the lib directory.
-- Terry Jan Reedy
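For readers on later Python versions: sys.stdlib_module_names was added in Python 3.10, so the subtraction described in the revised proposal can be sketched directly. The variable name other_stdlib_module_names is taken from the message; the empty fallback for older interpreters is an assumption of this sketch.

```python
import sys

# sys.stdlib_module_names exists on Python 3.10 and later; fall back to empty.
all_stdlib = frozenset(getattr(sys, "stdlib_module_names", ()))
builtin = frozenset(sys.builtin_module_names)

# Stdlib modules that are not compiled into the interpreter.
other_stdlib_module_names = all_stdlib - builtin
print(len(builtin), len(other_stdlib_module_names))
```

The difference still includes extension modules loaded from shared libraries, so it is closer to "other stdlib" than to "Python-coded".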

On 2015-10-31 15:45, Terry Reedy wrote:
One thing I've sometimes wondered about for Python 4000 is the idea of putting the whole standard library under a single top-level package (e.g., from stdlib import sys). This would be a big change but would reduce the surprises that can arise from stdlib modules named things like "string", "parser", etc. I see a brief mention of this in PEP 3108 but that's it. Was there more discussion of the idea? -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On 2015-11-01 17:10, Terry Reedy wrote:
That doesn't handle the situation where you want to import a third-party module that is neither part of the stdlib nor part of your own project. But I guess my question is answered anyhow. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

participants (6)
- Brendan Barnwell
- Brett Cannon
- Guido van Rossum
- Steve Dower
- Steven D'Aprano
- Terry Reedy