New sys.module_names attribute in Python 3.10: list of all stdlib modules
Hi,

I just added a new sys.module_names attribute, list (technically a frozenset) of all stdlib module names: https://bugs.python.org/issue42955

There are multiple use cases:

* Group stdlib imports when reformatting a Python file,
* Exclude stdlib imports when computing dependencies.
* Exclude stdlib modules when listing extension modules on crash or fatal error, only list 3rd party extension (already implemented in master, see bpo-42923 ;-)).
* Exclude stdlib modules when tracing the execution of a program using the trace module.
* Detect typo and suggest a fix: ImportError("No module named maths. Did you mean 'math'?",) (test the nice friendly-traceback project!).

Example:
>>> 'asyncio' in sys.module_names
True
>>> 'numpy' in sys.module_names
False

>>> len(sys.module_names)
312
>>> type(sys.module_names)
<class 'frozenset'>

>>> sorted(sys.module_names)[:10]
['__future__', '_abc', '_aix_support', '_ast', '_asyncio', '_bisect', '_blake2', '_bootsubprocess', '_bz2', '_codecs']
>>> sorted(sys.module_names)[-10:]
['xml.dom', 'xml.etree', 'xml.parsers', 'xml.sax', 'xmlrpc', 'zipapp', 'zipfile', 'zipimport', 'zlib', 'zoneinfo']
The list is opinionated and defined by its documentation:

    A frozenset of strings containing the names of standard library modules.

    It is the same on all platforms. Modules which are not available on some platforms and modules disabled at Python build are also listed. All module kinds are listed: pure Python, built-in, frozen and extension modules. Test modules are excluded.

    For packages, only sub-packages are listed, not sub-modules. For example, ``concurrent`` package and ``concurrent.futures`` sub-package are listed, but not ``concurrent.futures.base`` sub-module.

    See also the :attr:`sys.builtin_module_names` list.

The design (especially the fact of having the same list on all platforms) comes from the use cases listed above. For example, running isort should produce the same output on any platform, and should not depend on whether the Python stdlib was split into multiple packages on Linux (which is done by most popular Linux distributions).

The list is generated by the Tools/scripts/generate_module_names.py script: https://github.com/python/cpython/blob/master/Tools/scripts/generate_module_...

When you add a new module, you must run "make regen-module-names", otherwise a pre-commit check will fail on your PR ;-) The list of Windows extensions is currently hardcoded in the script (contributions to discover them are welcome; since the list is short and rarely changes, I didn't feel the need to spend time on that).

Currently (Python 3.10.0a4+), there are 312 names in sys.module_names, stored in Python/module_names.h: https://github.com/python/cpython/blob/master/Python/module_names.h

It was decided to include "helper" modules like "_aix_support", which is used by sysconfig. But test modules like _testcapi are excluded to make the list shorter (it's rare to run the CPython test suite outside Python).

There are 83 private modules, with names starting with an underscore (this counts _abc but also __future__), which leaves 229 public names:
>>> len([name for name in sys.module_names if not name.startswith('_')])
229
This new attribute may help to define "what is the Python stdlib" ;-) Victor -- Night gathers, and now my watch begins. It shall not end until my death.
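To illustrate the typo-suggestion use case from the announcement, here is a minimal sketch built on difflib.get_close_matches(). It assumes Python 3.10 or later, where the attribute is available under its final name sys.stdlib_module_names (the rename is discussed later in this thread). The helper names suggest_stdlib_module() and import_error_message(), and the 0.8 cutoff, are hypothetical choices for this example; they are not part of CPython or of friendly-traceback.

```python
import difflib
import sys

# sys.stdlib_module_names exists on Python 3.10+ only. The empty-set
# fallback is an assumption so that the sketch still loads on older versions.
STDLIB_NAMES = getattr(sys, "stdlib_module_names", frozenset())


def suggest_stdlib_module(name, names=STDLIB_NAMES):
    """Return the stdlib module name closest to `name`, or None."""
    # n=1 keeps only the best candidate; cutoff=0.8 is an arbitrary threshold.
    matches = difflib.get_close_matches(name, names, n=1, cutoff=0.8)
    return matches[0] if matches else None


def import_error_message(name, names=STDLIB_NAMES):
    """Build an error message with an optional "Did you mean" hint."""
    message = f"No module named {name!r}."
    suggestion = suggest_stdlib_module(name, names)
    if suggestion is not None:
        message += f" Did you mean {suggestion!r}?"
    return message


if __name__ == "__main__":
    print(import_error_message("maths"))
```

Only names are compared, so nothing is imported while computing the suggestion.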
On Mon, 25 Jan 2021 14:03:22 +0100 Victor Stinner <vstinner@python.org> wrote:
> The list is opinionated and defined by its documentation:
So "the list is opinionated" means there can be false negatives, i.e. some stdlib modules which are not present in this list? This will probably make life harder for third-party software that wants to answer the question "is module XXX a stdlib module or does it need to be distributed separately?". Regards Antoine.
On Mon, Jan 25, 2021 at 4:18 PM Antoine Pitrou <antoine@python.org> wrote:
> On Mon, 25 Jan 2021 14:03:22 +0100 Victor Stinner <vstinner@python.org> wrote:
> > The list is opinionated and defined by its documentation:
>
> So "the list is opinionated" means there can be false negatives, i.e. some stdlib modules which are not present in this list?
Test modules of the stdlib are excluded. Example:
>>> import sys
>>> '_testcapi' in sys.module_names # _testcapi extension
False
>>> 'test' in sys.module_names # Lib/test/ package
False
>>> import _testcapi
>>> _testcapi
<module '_testcapi' from '/home/vstinner/python/master/build/lib.linux-x86_64-3.10-pydebug/_testcapi.cpython-310d-x86_64-linux-gnu.so'>
>>> import test
>>> test
<module 'test' from '/home/vstinner/python/master/Lib/test/__init__.py'>
It can be changed if it's an issue. That's also why I sent an email to python-dev, to see if there is something wrong with sys.module_names definition.

Victor
Just _names_?

There's a recurring error case when a 3rd-party module overrides a standard one if it happens to have the same name. If you filter such a module out, you're shooting yourself in the foot...

-- Regards, Ivan
On Mon, Jan 25, 2021 at 06:46:51PM +0300, Ivan Pozdeev via Python-Dev wrote:
> There's a recurring error case when a 3rd-party module overrides a standard one if it happens to have the same name.
Any argument and expectation is off in this case. We shouldn't worry about such scenarios. -- Senthil
Hi Ivan, On Mon, Jan 25, 2021 at 4:53 PM Ivan Pozdeev via Python-Dev <python-dev@python.org> wrote:
> Just _names_? There's a recurring error case when a 3rd-party module overrides a standard one if it happens to have the same name. If you filter such a module out, you're shooting yourself in the foot...
Overriding stdlib modules has been discussed in the issue.

For example, it was proposed to add an attribute to all stdlib modules (__stdlib__=True or __author__ = 'PSF'), and then check whether the attribute exists. The problem is that importing a module to check for its attribute can cause side effects or fail, and so cannot be used for some use cases. For example, it would be surprising to open a web browser window when running isort on Python code containing "import antigravity". Another problem is that third parties can also add the attribute to pretend that their code is part of the stdlib.

In a previous version of my PR, I added a note about sys.path and overriding stdlib modules, but I have been asked to remove it. Feel free to propose a PR to add such a note if you consider that it's related to sys.module_names.

Please read the discussion at https://bugs.python.org/issue42955 and https://github.com/python/cpython/pull/24238

Victor
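The side-effect concern above is about executing a module just to inspect it. As a hedged illustration (this is not how sys.module_names works, and it is not something proposed in the issue), importlib.util.find_spec() can report where a top-level name would be loaded from without running the module body. The function name find_origin_without_import() is hypothetical.

```python
import importlib.util
import sys


def find_origin_without_import(name):
    """Return where top-level module `name` would be loaded from, or None.

    For a top-level name, importlib.util.find_spec() only runs the import
    finders; the module body is not executed.
    """
    if "." in name:
        # For dotted names find_spec() imports the parent package first,
        # which would bring the side-effect problem back.
        raise ValueError("only top-level module names are supported")
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print(find_origin_without_import("antigravity"))
    # The lookup alone should not have imported the module.
    print("antigravity" in sys.modules)
```

Comparing the returned origin with the interpreter's stdlib directory is one possible way to notice a third-party module that shadows a stdlib name, but that comparison is a heuristic and is left out of the sketch.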
Hello,

In general, I love the idea and implementation. I'm not in love with the name though: it makes it sound like it contains the names of all modules imported/available. We already have sys.modules, which contains all imported modules. So without deeper knowledge, sys.module_names looks very close to sys.modules.keys() or to all available modules. Can we name it sys.stdlib_modules_names instead, to clarify that this is the standard-library-only subset and not all modules available to the interpreter?

Thanks,
Hi Bernat,

"stdlib_module_names" was my first idea but it looks too long, so I chose "module_names". But someone on Twitter and now you asked me why not "stdlib_module_names", so I wrote a PR to rename module_names to sys.stdlib_module_names: https://github.com/python/cpython/pull/24332

At least "stdlib_module_names" better summarizes its definition: "A frozenset of strings containing the names of standard library modules".

Victor
On Mon, Jan 25, 2021 at 06:17:09PM +0100, Victor Stinner wrote:
> Hi Bernat,
>
> "stdlib_module_names" was my first idea but it looks too long, so I chose "module_names". But someone on Twitter and now you asked me why not "stdlib_module_names", so I wrote a PR to rename module_names to sys.stdlib_module_names: https://github.com/python/cpython/pull/24332
>
> At least "stdlib_module_names" better summarizes its definition: "A frozenset of strings containing the names of standard library modules".
Your first instinct that it is too long is correct. Just call it "stdlib" or "stdlib_names". The fact that it is a frozen set of module names will be obvious from just looking at it, and there is no need for the name to explain everything about it. We have:

* `dir()`, not `sorted_dir_names()`;
* `sys.prefix`, not `sys.site_specific_directory_path_prefix`;
* `sys.audit`, not `sys.raise_audit_hook_event`;
* `sys.exit()`, not `sys.exit_python()`;
* `sys.float_info`, not `sys.float_prec_and_low_level_info`;

etc. Python has very good documentation and excellent introspection capabilities. Names should act as a short reminder of the meaning; there is no need to encode a full description into a long and verbose name.

-- Steve
On Tue, 26 Jan 2021 10:36:10 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
> Your first instinct that it is too long is correct.
Disagreed. This is niche enough that it warrants a long but explicit name, rather than some ambiguous shortcut.
> Just call it "stdlib" or "stdlib_names".
If you call it "stdlib", then you should make it a namedtuple that will expose various information, such as "sys.stdlib.module_names". Regards Antoine.
On Tue, Jan 26, 2021 at 1:26 AM Antoine Pitrou <antoine@python.org> wrote:
> > Your first instinct that it is too long is correct.
>
> Disagreed. This is niche enough that it warrants a long but explicit name, rather than some ambiguous shortcut.
I agree w/ Antoine that a more descriptive name for such a niche (but useful!) attribute makes sense. -Brett
On Tue, Jan 26, 2021 at 12:08:03PM -0800, Brett Cannon wrote:
> On Tue, Jan 26, 2021 at 1:26 AM Antoine Pitrou <antoine@python.org> wrote:
> [...]
> > Disagreed. This is niche enough that it warrants a long but explicit name, rather than some ambiguous shortcut.
>
> I agree w/ Antoine that a more descriptive name for such a niche (but useful!) attribute makes sense.
This descriptive name is *literally incorrect*. By design, it doesn't list modules. It only lists sub-packages and not sub-modules, to keep the number of entries more manageable.

(Personally, I don't think an extra hundred or two names makes that much difference. It's going to be a big list one way or the other.)

So by the current rules, many stdlib modules are not included and the name is inaccurate.

If you're not going to list all the dotted modules of a package, why distinguish sub-modules from sub-packages? It is confusing and awkward to have only some dotted modules listed based on the **implementation**.

(We need a good term for "things you can import that create a module *object*", regardless of whether they are a *module file* or a *package*. I'm calling them dotted modules for lack of a better name.)

By the current rules, many stdlib modules are not listed, and you can't see why unless you know their implementation:

* urllib - listed
* urllib.parse - not listed

* collections - listed
* collections.abc - not listed

* email - listed
* email.parser - not listed
* email.mime - listed  # Surprise!

So we have this weird situation where an implementation detail of the dotted module (whether it is a file `package/module.py` or `package/module/__init__.py`) determines whether it shows up or not.

And because the file system structure of a module is not part of its API, that implementation detail could change without warning.

I think that either of:

1. list *all* package dotted modules regardless of whether they are implemented as a sub-module or sub-package;

2. list *no* package dotted modules, only the top-level package;

would be better than this inconsistent hybrid of only listing some dotted modules.

(Excluding the test modules is fine.)

-- Steve
On Wed, 27 Jan 2021 11:05:28 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
> This descriptive name is *literally incorrect*. By design, it doesn't list modules. It only lists sub-packages and not sub-modules, to keep the number of entries more manageable.
>
> (Personally, I don't think an extra hundred or two names makes that much difference. It's going to be a big list one way or the other.)
>
> So by the current rules, many stdlib modules are not included and the name is inaccurate.
Ok, then "stdlib_package_names"? :-) Regards Antoine.
On Wed, Jan 27, 2021 at 10:44:00AM +0100, Antoine Pitrou wrote:
> Ok, then "stdlib_package_names"? :-)
Heh :-) I see your smiley, and I'm not going to argue about the name any further. I have my preference, but if the consensus is stdlib_module_names, so be it. But I think the inconsistency between sub-modules and sub-packages is important. We should either list all sub-whatever or none of them, rather than only some of them. -- Steve
Hi Steven,

That makes sense to me: I wrote https://github.com/python/cpython/pull/24353 to exclude sub-packages. The change removes 12 sub-packages from sys.stdlib_module_names and len(sys.stdlib_module_names) becomes 300 :-)

-"concurrent.futures",
-"ctypes.macholib",
-"distutils.command",
-"email.mime",
-"ensurepip._bundled",
-"lib2to3.fixes",
-"lib2to3.pgen2",
-"multiprocessing.dummy",
-"xml.dom",
-"xml.etree",
-"xml.parsers",
-"xml.sax",

With that change, the names in sys.stdlib_module_names don't contain "." anymore. So to check if "email.message" is a stdlib module name, drop everything after the first dot and check if "email" is in sys.stdlib_module_names.

In practice, it is not possible to add a sub-package or a sub-module to a stdlib module, so this limitation (excluding sub-packages and sub-modules) sounds reasonable to me.

Victor
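The first-dot check described in this message can be sketched as follows. The function name is_stdlib_name() is hypothetical, and the sketch assumes Python 3.10+ for sys.stdlib_module_names (with an empty-set fallback elsewhere, which is an assumption of this example).

```python
import sys

# Available on Python 3.10+; the empty fallback is an assumption of this
# sketch so that it still loads on older interpreters.
STDLIB_NAMES = getattr(sys, "stdlib_module_names", frozenset())


def is_stdlib_name(module_name, names=STDLIB_NAMES):
    """Check a possibly dotted module name against a set of top-level names."""
    # Keep only the part before the first dot: "email.message" -> "email".
    top_level = module_name.partition(".")[0]
    return top_level in names


if __name__ == "__main__":
    print(is_stdlib_name("email.message"))
    print(is_stdlib_name("numpy.linalg"))
```

This is a name check only: it says nothing about whether the module is actually installed or whether another module shadows it on sys.path.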
On Tue, Jan 26, 2021 at 12:44 AM Steven D'Aprano <steve@pearwood.info> wrote:
> Your first instinct that it is too long is correct. Just call it "stdlib" or "stdlib_names". The fact that it is a frozen set of module names will be obvious from just looking at it, and there is no need for the name to explain everything about it. We have:
The sys module already has a sys.modules attribute, and so sys.module_names sounds like "give me the names of all imported modules": sys.modules.keys(). It's confusing. Just after I announced the creation of the attribute, at least 3 people told me that they were confused by the name. Also, my PR was approved quickly 3 times, which confirms that the rename was a good idea ;-)

In general, I agree that short names are great ;-) For example, I like a short obj.name() rather than obj.getname() when it doesn't make sense to set the name.

Naming is a hard problem :-D

Victor
On Tue, Jan 26, 2021 at 10:56:57AM +0100, Victor Stinner wrote:
> On Tue, Jan 26, 2021 at 12:44 AM Steven D'Aprano <steve@pearwood.info> wrote:
> [...]
> > Your first instinct that it is too long is correct. Just call it "stdlib" or "stdlib_names". The fact that it is a frozen set of module names will be obvious from just looking at it, and there is no need for the name to explain everything about it. We have:
>
> The sys module already has a sys.modules attribute, and so sys.module_names sounds like "give me the name of all imported modules": sys.modules.keys().

Then it's a good thing I didn't propose calling it "module_names" :-)

-- Steve
If the length of the name is any kind of issue, since the stdlib only contains modules (and packages), why not just sys.stdlib_names?
On 1/26/2021 8:32 PM, Steve Holden wrote:
> If the length of the name is any kind of issue, since the stdlib only contains modules (and packages), why not just sys.stdlib_names?
And since the modules can vary between platforms and builds, why wouldn't this be sysconfig.stdlib_names rather than sys.stdlib_names? "Modules that were built into the stdlib" sounds more like sysconfig, and having an accurate list seems better than one that specifies (e.g.) distutils, ensurepip, resource or termios when those are absent. Cheers, Steve
On Tue, Jan 26, 2021 at 10:04 PM Steve Dower <steve.dower@python.org> wrote:
> On 1/26/2021 8:32 PM, Steve Holden wrote:
> > If the length of the name is any kind of issue, since the stdlib only contains modules (and packages), why not just sys.stdlib_names?
>
> And since the modules can vary between platforms and builds, why wouldn't this be sysconfig.stdlib_names rather than sys.stdlib_names?
The list is the same on all platforms on purpose ;-) Example:
>>> 'winsound' in sys.stdlib_module_names
True
>>> 'ossaudiodev' in sys.stdlib_module_names
True
For example, grouping stdlib imports using sys.stdlib_module_names gives the same output on any platform, even if there were missing dependencies when you built Python.

Victor
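A rough sketch of that grouping, in the spirit of what an import sorter might do (it is not isort's actual implementation). It parses source code with the ast module, so nothing is imported, and splits the top-level names into stdlib and everything else. The function name group_imports() is hypothetical, and sys.stdlib_module_names requires Python 3.10+.

```python
import ast
import sys

# Available on Python 3.10+; the empty fallback is an assumption of this
# sketch so that it still loads on older interpreters.
STDLIB_NAMES = getattr(sys, "stdlib_module_names", frozenset())


def group_imports(source, names=STDLIB_NAMES):
    """Return (stdlib, other): sorted top-level names imported by `source`."""
    imported = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import os.path" counts as the top-level name "os".
            imported.update(alias.name.partition(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level > 0 is a relative import of the current package: skip it.
            imported.add(node.module.partition(".")[0])
    stdlib = sorted(name for name in imported if name in names)
    other = sorted(name for name in imported if name not in names)
    return stdlib, other


if __name__ == "__main__":
    example = "import os.path\nimport winsound\nimport numpy as np\n"
    print(group_imports(example))
```

Because the name set is the same on every platform, "winsound" lands in the stdlib group even when the sketch runs on Linux, where the module itself cannot be imported.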
On Tue, Jan 26, 2021 at 11:19:13PM +0100, Victor Stinner wrote:
> On Tue, Jan 26, 2021 at 10:04 PM Steve Dower <steve.dower@python.org> wrote:
> > On 1/26/2021 8:32 PM, Steve Holden wrote:
> > > If the length of the name is any kind of issue, since the stdlib only contains modules (and packages), why not just sys.stdlib_names?
> >
> > And since the modules can vary between platforms and builds, why wouldn't this be sysconfig.stdlib_names rather than sys.stdlib_names?
>
> The list is the same on all platforms on purpose ;-) Example:
>
> >>> 'winsound' in sys.stdlib_module_names
> True
Right. This is (I think) Steve's point: the list is inaccurate, because the existence of 'winsound' in the stdlib_module_names doesn't mean that the module 'winsound' exists. -- Steve
On Wed, Jan 27, 2021 at 1:16 AM Steven D'Aprano <steve@pearwood.info> wrote:
Right. This is (I think) Steve's point: the list is inaccurate, because the existence of 'winsound' in the stdlib_module_names doesn't mean that the module 'winsound' exists.
This point is addressed by the definition of the list, in the sys.stdlib_module_names documentation: "It is the same on all platforms. Modules which are not available on some platforms and modules disabled at Python build are also listed. All module kinds are listed: pure Python, built-in, frozen and extension modules. Test modules are excluded." https://docs.python.org/dev/library/sys.html#sys.stdlib_module_names
As I wrote previously, there are use cases which *require* the list to be the same on all platforms.
Moreover, in practice, it's quite hard to build a list of available stdlib module names. You need to build the extension modules, try to import them, then rebuild the list of modules, which requires rebuilding Python. It's not convenient.
Also, there are different definitions of "available". For example, "import multiprocessing" can fail on some platforms if there is no lock implementation available. A module being installed on the system doesn't guarantee that importing it will work. IMO the only reliable way to check if a module can be imported... is to import it. And then you hit the issue of import side effects again.
There are different ways to filter the sys.stdlib_module_names list to only list "available" modules: try to import them, or use pkgutil.iter_modules() or pkgutil.walk_packages(). IMO it should remain out of the scope of sys.stdlib_module_names.
Victor
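One possible way to do that filtering outside of sys, sketched under assumptions: it uses importlib.util.find_spec(), which locates a top-level module without executing it, so it sidesteps import side effects but only answers "can the import system find it", not "will the import succeed". The helper name locatable_stdlib_modules is made up for this example.

```python
import importlib.util
import sys

# Assumption: fall back to an empty set on Python < 3.10.
STDLIB_NAMES = getattr(sys, "stdlib_module_names", frozenset())


def locatable_stdlib_modules(names=STDLIB_NAMES):
    """Return the names for which the import system can find a spec."""
    found = set()
    for name in names:
        if "." in name:
            # find_spec() imports parent packages for dotted names;
            # skip them to stay free of import side effects.
            continue
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is not None:
            found.add(name)
    return found


# Names listed in sys.stdlib_module_names but not found on this build.
print(sorted(STDLIB_NAMES - locatable_stdlib_modules()))
```

A module reported as locatable can still fail at import time (the multiprocessing case above), so this is weaker than actually importing.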
On Mon, Jan 25, 2021 at 7:51 AM Ivan Pozdeev via Python-Dev < python-dev@python.org> wrote:
Just _names_? There's a recurring error case when a 3rd-party module overrides a standard one if it happens to have the same name. If you filter such a module out, you're shooting yourself in the foot...
Would another use case be to support issuing a warning if a third-party module is imported whose name matches a standard one? A related use case would be to build on this and define a function that accepts an already imported module and returns whether it is from the standard library. Unlike the module_names attribute, this function would reflect the reality of the underlying module, and so not have false positives as with doing a name check alone.
--Chris
On Mon, Jan 25, 2021 at 6:37 PM Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
On Mon, Jan 25, 2021 at 7:51 AM Ivan Pozdeev via Python-Dev <python-dev@python.org> wrote:
Just _names_? There's a recurring error case when a 3rd-party module overrides a standard one if it happens to have the same name. If you filter such a module out, you're shooting yourself in the foot...
Would another use case be to support issuing a warning if a third-party module is imported whose name matches a standard one? A related use case would be to build on this and define a function that accepts an already imported module and returns whether it is from the standard library. Unlike the module_names attribute, this function would reflect the reality of the underlying module, and so not have false positives as with doing a name check alone.
This is a different use case which requires a different solution. sys.module_names solves some specific use cases (that I listed in my first email).
In Python 3.9, you can already check if a module's __file__ is in the sysconfig.get_paths()['stdlib'] directory. You don't need to modify Python for that.
If you would also like to check if an *extension* module comes from the stdlib, you need to get the "lib-dynload" directory. I failed to find a programmatic way to get this directory; maybe a new API would be needed for that.
Victor
On Mon, Jan 25, 2021 at 2:05 PM Victor Stinner <vstinner@python.org> wrote:
On Mon, Jan 25, 2021 at 6:37 PM Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
On Mon, Jan 25, 2021 at 7:51 AM Ivan Pozdeev via Python-Dev <python-dev@python.org> wrote:
Just _names_? There's a recurring error case when a 3rd-party module overrides a standard one if it happens to have the same name. If you filter such a module out, you're shooting yourself in the foot...
Would another use case be to support issuing a warning if a third-party module is imported whose name matches a standard one? A related use case would be to build on this and define a function that accepts an already imported module and returns whether it is from the standard library. Unlike the module_names attribute, this function would reflect the reality of the underlying module, and so not have false positives as with doing a name check alone.
This is a different use case which requires a different solution. sys.module_names solve some specific use cases (that I listed in my first email).
In Python 3.9, you can already check if a module __file__ is in the sysconfig.get_paths()['stdlib'] directory. You don't need to modify Python for that.
But to issue a warning when a standard module is being overridden like I was suggesting, wouldn’t you also need to know whether the name of the module being imported is a standard name, which is what sys.module_names provides?
--Chris
If you also would like to check if an *extension* module comes from the stdlib, you need to get the "lib-dynload" directory. I failed to find a programmatic way to get this directory, maybe new API would be needed for that.
Victor
On Mon, Jan 25, 2021, at 18:44, Chris Jerdonek wrote:
But to issue a warning when a standard module is being overridden like I was suggesting, wouldn’t you also need to know whether the name of the module being imported is a standard name, which is what sys.module_names provides?
I don't think the warning would be only useful for stdlib modules... has any thought been given to warning when a module being imported from the current directory / script directory is the same as an installed package?
On Mon, Jan 25, 2021 at 10:23 PM Random832 <random832@fastmail.com> wrote:
On Mon, Jan 25, 2021, at 18:44, Chris Jerdonek wrote:
But to issue a warning when a standard module is being overridden like I was suggesting, wouldn’t you also need to know whether the name of the module being imported is a standard name, which is what sys.module_names provides?
I don't think the warning would be only useful for stdlib modules... has any thought been given to warning when a module being imported from the current directory / script directory is the same as an installed package?
Related to this, I wonder if another application of sys.stdlib_module_names could be for installers: when installing a new distribution, a warning could be issued if it attempts to install a module or package whose name is already in sys.stdlib_module_names. I don't know offhand what happens if one were to try to do that today.
--Chris
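To make the installer idea concrete, here is a minimal sketch of such a check. It is purely hypothetical: stdlib_name_conflicts is a made-up helper, the list of names a distribution would install is passed in by hand, and no real installer is claimed to work this way.

```python
import sys

# Assumption: fall back to an empty set on Python < 3.10.
STDLIB_NAMES = getattr(sys, "stdlib_module_names", frozenset())


def stdlib_name_conflicts(top_level_names):
    """Return the to-be-installed names that collide with stdlib names."""
    return sorted(name for name in top_level_names if name in STDLIB_NAMES)


# Hypothetical top-level names shipped by a distribution being installed.
conflicts = stdlib_name_conflicts(["requests", "json"])
if conflicts:
    print("warning: these names shadow standard library modules:", conflicts)
```

Since the list is the same on all platforms, such a warning would also fire for a name like 'winsound' on Linux, where the stdlib module itself is absent.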
On Mon, 25 Jan 2021 23:05:07 +0100 Victor Stinner <vstinner@python.org> wrote:
This is a different use case which requires a different solution. sys.module_names solve some specific use cases (that I listed in my first email).
In Python 3.9, you can already check if a module __file__ is in the sysconfig.get_paths()['stdlib'] directory. You don't need to modify Python for that.
Is this reliable? What if the stdlib is zipped or frozen in some way?
If you also would like to check if an *extension* module comes from the stdlib, you need to get the "lib-dynload" directory.
So you're saying the need is already fulfilled, even though it only has a cryptic (*) and partial solution?
(*) who would think about `sysconfig.get_paths()['stdlib']` on their own?
Regards
Antoine.
I probably wouldn't think of that on my own, but the need is rare enough that having the recipe in the documentation (preferably including the docstring) might be enough. (Or it might not.)
That's not possible.
Stdlib can be arranged any way a user/maintainer wishes (zipped stdlib and virtual environments are just two examples), so there's no way to tell if the module's location is "right". Downstream changes are also standard practice so there's no way to verify a module's contents, either.
As such, there's no way to tell if any given module being imported is a standard or a 3rd-party one.
On 25.01.2021 20:33, Chris Jerdonek wrote:
On Mon, Jan 25, 2021 at 7:51 AM Ivan Pozdeev via Python-Dev <python-dev@python.org> wrote:
Just _names_? There's a recurring error case when a 3rd-party module overrides a standard one if it happens to have the same name. If you filter such a module out, you're shooting yourself in the foot...
Would another use case be to support issuing a warning if a third-party module is imported whose name matches a standard one? A related use case would be to build on this and define a function that accepts an already imported module and returns whether it is from the standard library. Unlike the module_names attribute, this function would reflect the reality of the underlying module, and so not have false positives as with doing a name check alone.
--Chris
-- Regards, Ivan
On Mon, Jan 25, 2021 at 11:22 PM Ivan Pozdeev via Python-Dev <python-dev@python.org> wrote:
That's not possible.
Stdlib can be arranged any way a user/maintainer wishes (zipped stdlib and virtual environments are just two examples), so there's no way to tell if the module's location is "right". Downstream changes are also standard practice so there's no way to verify a module's contents, either.
As such, there's no way to tell if any given module being imported is a standard or a 3rd-party one.
By the way, IMO it's also a legit use case on an old Python version to override a stdlib module with a patched or more recent version, to get a bugfix for example ;-) Even if it's an uncommon use case, it can solve some practical issues. Victor
Fortunately for you :), this whole argument is not against the feature per se, but only against using it to blindly filter module lists for automated bug reports.
On 26.01.2021 1:34, Victor Stinner wrote:
On Mon, Jan 25, 2021 at 11:22 PM Ivan Pozdeev via Python-Dev <python-dev@python.org> wrote:
That's not possible.
Stdlib can be arranged any way a user/maintainer wishes (zipped stdlib and virtual environments are just two examples), so there's no way to tell if the module's location is "right". Downstream changes are also standard practice so there's no way to verify a module's contents, either.
As such, there's no way to tell if any given module being imported is a standard or a 3rd-party one.
By the way, IMO it's also a legit use case on an old Python version to override a stdlib module with a patched or more recent version, to get a bugfix for example ;-) Even if it's an uncommon use case, it can solve some practical issues.
Victor
-- Regards, Ivan
On 1/25/21 5:03 AM, Victor Stinner wrote:
I just added a new sys.module_names attribute, list (technically a frozenset) of all stdlib module names
The list is opinionated and defined by its documentation
For packages, only sub-packages are listed, not sub-modules. For example, ``concurrent`` package and ``concurrent.futures`` sub-package are listed, but not ``concurrent.futures.base`` sub-module.
I'm not sure I understand the above. Is it fair to say that any stdlib module, except for private or test (./Lib/test/*) modules, that can be imported are listed in `sys.module_names`? My confusion stems from being able to import `concurrent.futures` but not `concurrent.futures.base`.
--
~Ethan~
On Mon, Jan 25, 2021 at 6:39 PM Ethan Furman <ethan@stoneleaf.us> wrote:
For packages, only sub-packages are listed, not sub-modules. For example, ``concurrent`` package and ``concurrent.futures`` sub-package are listed, but not ``concurrent.futures.base`` sub-module.
I'm not sure I understand the above. Is it fair to say that any stdlib module, except for private or test (./Lib/test/*) modules,
Private modules are listed: __future__, _abc, _aix_support, etc.
that can be imported are listed in `sys.module_names`? My confusion stems from being able to import `concurrent.futures` but not `concurrent.futures.base`.
For packages, I chose to exclude sub-modules just to keep the list short. ~300 items can be displayed and read manually. If you want to check if "asyncio.base_events" is a stdlib module, extract the "asyncio" string and check if "asyncio" is part of the list.
sys.module_names cannot be used directly if you need to get the exhaustive list of all modules including sub-modules. pkgutil.iter_modules() can be used to list the modules of a package:
>>> import asyncio, pkgutil
>>> [mod.name for mod in pkgutil.iter_modules(path=asyncio.__path__)]
['__main__', 'base_events', 'base_futures', 'base_subprocess', 'base_tasks', 'constants', 'coroutines', 'events', 'exceptions', 'format_helpers', 'futures', 'locks', 'log', 'mixins', 'proactor_events', 'protocols', 'queues', 'runners', 'selector_events', 'sslproto', 'staggered', 'streams', 'subprocess', 'tasks', 'threads', 'transports', 'trsock', 'unix_events', 'windows_events', 'windows_utils']
One drawback is that if the stdlib contained packages without an __init__.py file, a third party project could add a sub-module to them (ex: inject encodings/myencoding.py into the encodings package). But it seems like all Lib/ sub-directories contain an __init__.py file, so it's not an issue in practice.
If we include sub-modules, sys.module_names grows from 312 names to 813 names (2.6x more). Two examples:
"collections",
+"collections.abc",
"concurrent",
"concurrent.futures",
+"concurrent.futures._base",
+"concurrent.futures.process",
+"concurrent.futures.thread",
Just the encodings package contains 121 sub-modules.
Victor
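The two techniques from this message (look up only the top-level name, and list a package's sub-modules on demand) could be combined roughly as follows. This is a sketch: is_stdlib_name and list_submodules are names invented for the example, and unlike the asyncio snippet above, list_submodules locates the package with importlib.util.find_spec() rather than importing it, which is a choice made here and not something the message prescribes.

```python
import importlib.util
import pkgutil
import sys

# Assumption: fall back to an empty set on Python < 3.10.
STDLIB_NAMES = getattr(sys, "stdlib_module_names", frozenset())


def is_stdlib_name(dotted_name):
    """Check a possibly dotted name by its top-level component only."""
    return dotted_name.partition(".")[0] in STDLIB_NAMES


def list_submodules(package_name):
    """List the direct sub-modules of a top-level package."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.submodule_search_locations is None:
        return []  # not found, or a plain module rather than a package
    return sorted(
        info.name
        for info in pkgutil.iter_modules(
            spec.submodule_search_locations, prefix=package_name + "."
        )
    )


print(is_stdlib_name("asyncio.base_events"))
print(list_submodules("concurrent"))
```

list_submodules() only goes one level deep; an exhaustive listing like the 813 names mentioned above would need to recurse into sub-packages.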
I see a bunch of similar -- but not quite the same -- use cases. I feel like instead of a set, it should be a dict pointing to an object with attributes that describe the module in various ways (top-level vs subpackage, installed on this machine or not, test module or not, etc).
I'll understand if this seems like scope creep, but try not to rule it out as a future enhancement. (e.g., don't promise it will be precisely a set, as opposed to the keys of a map.)
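To show what such a mapping might look like, here is a purely hypothetical sketch built on top of the existing set. Nothing like StdlibModuleInfo exists in CPython; the class, its fields and describe_stdlib are all invented for illustration, and only attributes that can be derived from the name itself are included.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class StdlibModuleInfo:
    """Hypothetical per-module description; not a real CPython API."""
    name: str
    is_private: bool     # name starts with an underscore
    is_subpackage: bool  # dotted name such as "xml.dom"
    is_builtin: bool     # listed in sys.builtin_module_names


def describe_stdlib(names):
    """Build a name -> StdlibModuleInfo mapping from a set of names."""
    return {
        name: StdlibModuleInfo(
            name=name,
            is_private=name.startswith("_"),
            is_subpackage="." in name,
            is_builtin=name in sys.builtin_module_names,
        )
        for name in names
    }


# Assumption: fall back to an empty set on Python < 3.10.
STDLIB_INFO = describe_stdlib(getattr(sys, "stdlib_module_names", frozenset()))
print("asyncio" in STDLIB_INFO)
```

Membership tests on the dict (or on its keys() view) behave like the tests on the frozenset, which is the compatibility point raised above.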
participants (13)
- Antoine Pitrou
- Bernat Gabor
- Brett Cannon
- Chris Jerdonek
- Ethan Furman
- Ivan Pozdeev
- Jim J. Jewett
- Random832
- Senthil Kumaran
- Steve Dower
- Steve Holden
- Steven D'Aprano
- Victor Stinner