PEP: Distributing a Subset of the Standard Library
Hi!

We have written a draft PEP entitled "Distributing a Subset of the Standard Library" that aims to standardize and improve how Python handles omissions from its standard library. This is relevant both to Python itself as well as to Linux and other distributions that are packaging it, as they already separate out parts of the standard library into optionally installable packages.

Ideas leading up to this PEP were discussed on the python-dev mailing list:
https://mail.python.org/pipermail/python-dev/2016-July/145534.html

Rendered PEP: https://fedora-python.github.io/pep-drafts/pep-A.html

------------------------------------------------------------------------

Abstract
========

Python is sometimes being distributed without its full standard library. However, there is as of yet no standardized way of dealing with importing a missing standard library module. This PEP proposes a mechanism for identifying which standard library modules are missing and puts forth a method of how attempts to import a missing standard library module should be handled.

Motivation
==========

There are several use cases for including only a subset of Python's standard library. However, there is so far no formal specification of how to properly implement distribution of a subset of the standard library. Namely, how to safely handle attempts to import a missing *stdlib* module, and display an informative error message.

CPython
-------

When one of Python's standard library modules (such as ``_sqlite3``) cannot be compiled during a Python build because of missing dependencies (e.g. SQLite header files), the module is simply skipped.

If you then install this compiled Python and use it to try to import one of the missing modules, Python will go through the ``sys.path`` entries looking for it. It won't find it among the *stdlib* modules and thus it will continue on to ``site-packages`` and fail with a ModuleNotFoundError_ if it doesn't find it.

.. _ModuleNotFoundError: https://docs.python.org/3.7/library/exceptions.html#ModuleNotFoundError

This can confuse users who may not understand why a cleanly built Python is missing standard library modules.

Linux and other distributions
-----------------------------

Many Linux and other distributions are already separating out parts of the standard library to standalone packages. Among the most commonly excluded modules are the ``tkinter`` module, since it draws in a dependency on the graphical environment, and the ``test`` package, as it only serves to test Python internally and is about as big as the rest of the standard library put together.

The methods of omission of these modules differ. For example, Debian patches the file ``Lib/tkinter/__init__.py`` to envelop the line ``import _tkinter`` in a *try-except* block and upon encountering an ``ImportError`` it simply adds the following to the error message: ``please install the python3-tk package`` [#debian-patch]_. Fedora and other distributions simply don't include the omitted modules, potentially leaving users baffled as to where to find them.

Specification
=============

When, for any reason, a standard library module is not to be included with the rest, a file with its name and the extension ``.missing.py`` shall be created and placed in the directory the module itself would have occupied. This file can contain any Python code, however, it *should* raise a ModuleNotFoundError_ with a helpful error message.

Currently, when Python tries to import a module ``XYZ``, the ``FileFinder`` path hook goes through the entries in ``sys.path``, and in each location looks for a file whose name is ``XYZ`` with one of the valid suffixes (e.g. ``.so``, ..., ``.py``, ..., ``.pyc``). The suffixes are tried in order. If none of them are found, Python goes on to try the next directory in ``sys.path``.

The ``.missing.py`` extension will be added to the end of the list, and configured to be handled by ``SourceFileLoader``. Thus, if a module is not found in its proper location, the ``XYZ.missing.py`` file is found and executed, and further locations are not searched.

The CPython build system will be modified to generate ``.missing.py`` files for optional modules that were not built.

Rationale
=========

The mechanism of handling missing standard library modules through the use of the ``.missing.py`` files was chosen due to its advantages both for CPython itself and for Linux and other distributions that are packaging it.

The missing pieces of the standard library can be subsequently installed simply by putting the module files in their appropriate location. They will then take precedence over the corresponding ``.missing.py`` files. This makes installation simple for Linux package managers.

This mechanism also solves the minor issue of importing a module from ``site-packages`` with the same name as a missing standard library module. Now, Python will import the ``.missing.py`` file and won't ever look for a *stdlib* module in ``site-packages``.

In addition, this method of handling missing *stdlib* modules can be implemented in a succinct, non-intrusive way in CPython, and thus won't add to the complexity of the existing code base.

The ``.missing.py`` file can be customized by the packager to provide any desirable behaviour. While we strongly recommend that these files only raise a ModuleNotFoundError_ with an appropriate message, there is no reason to limit customization options.

Ideas leading up to this PEP were discussed on the `python-dev mailing list`_.

.. _`python-dev mailing list`: https://mail.python.org/pipermail/python-dev/2016-July/145534.html

Backwards Compatibility
=======================

No problems with backwards compatibility are expected. Distributions that are already patching Python modules to provide custom handling of missing dependencies can continue to do so unhindered.

Reference Implementation
========================

Reference implementation can be found on `GitHub`_ and is also accessible in the form of a `patch`_.

.. _`GitHub`: https://github.com/torsava/cpython/pull/1
.. _`patch`: https://github.com/torsava/cpython/pull/1.patch

References
==========

.. [#debian-patch] http://bazaar.launchpad.net/~doko/python/pkg3.5-debian/view/head:/patches/tk...

Copyright
=========

This document has been placed in the public domain.

------------------------------------------------------------------------

Regards,
Tomas Orsava
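For illustration, the behaviour described in the Specification can be approximated with today's ``importlib`` without patching CPython. The sketch below is not the reference implementation: it builds a standalone ``FileFinder`` whose suffix list ends with ``.missing.py`` and hands those files to ``SourceFileLoader``, as the draft describes. The helper names (``write_missing_stub``, ``make_finder``, ``load_from``) are hypothetical and exist only for this example.

```python
import importlib.machinery as machinery
import importlib.util
import os

MISSING_SUFFIX = ".missing.py"  # suffix proposed by the draft PEP


def write_missing_stub(directory, module_name, hint):
    """Create <module_name>.missing.py that raises a helpful error."""
    path = os.path.join(directory, module_name + MISSING_SUFFIX)
    with open(path, "w", encoding="utf-8") as f:
        f.write(
            "raise ModuleNotFoundError(\n"
            f"    {hint!r},\n"
            f"    name={module_name!r},\n"
            ")\n"
        )
    return path


def make_finder(directory):
    """Return a FileFinder that tries '.py' first and '.missing.py' last."""
    suffixes = list(machinery.SOURCE_SUFFIXES) + [MISSING_SUFFIX]
    return machinery.FileFinder(directory, (machinery.SourceFileLoader, suffixes))


def load_from(directory, module_name):
    """Find and execute module_name in directory, or return None if absent."""
    spec = make_finder(directory).find_spec(module_name)
    if spec is None:
        return None
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # a stub raises ModuleNotFoundError here
    return module
```

Because the ``.py`` suffix is tried before ``.missing.py``, dropping a real module file into the same directory makes it win over the stub, which is the installation story the Rationale relies on.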
On Tue, Nov 29, 2016 at 12:28 AM, Tomas Orsava <torsava@redhat.com> wrote:
We have written a draft PEP entitled "Distributing a Subset of the Standard Library" that aims to standardize and improve how Python handles omissions from its standard library. This is relevant both to Python itself as well as to Linux and other distributions that are packaging it, as they already separate out parts of the standard library into optionally installable packages.
Thanks for writing this up!

Since you're already working on GitHub, it's probably most straightforward for you to create a PR against the PEPs repository: https://github.com/python/peps

Looks like the next available PEP number is 534.

ChrisA
PEP editor
On 28 November 2016 at 13:28, Tomas Orsava <torsava@redhat.com> wrote:
The ``.missing.py`` extension will be added to the end of the list, and configured to be handled by ``SourceFileLoader``. Thus, if a module is not found in its proper location, the ``XYZ.missing.py`` file is found and executed, and further locations are not searched.
Am I right to think that if a user had a file tkinter.missing.py in the current directory, then they'd get that in preference to the stdlib tkinter? Obviously this is no different from having a tkinter.py file in that directory, so it's not like this is a major problem, but it might be worth pointing out this minor incompatibility.

Also, and possibly more of an issue, use of the ".missing.py" file will mean that a user can't provide their own implementation of the module later on sys.path. I don't know if this is a significant issue on Unix platforms. On Windows, there is a 3rd party implementation of the curses module which (as I understand it) can be user installed. If Python included a curses.missing.py, that would no longer work.

Certainly these are only minor points, but worth considering.

Paul
On 11/28/2016 03:32 PM, Paul Moore wrote:
On 28 November 2016 at 13:28, Tomas Orsava <torsava@redhat.com> wrote:
The ``.missing.py`` extension will be added to the end of the list, and configured to be handled by ``SourceFileLoader``. Thus, if a module is not found in its proper location, the ``XYZ.missing.py`` file is found and executed, and further locations are not searched.
Am I right to think that if a user had a file tkinter.missing.py in the current directory, then they'd get that in preference to the stdlib tkinter? Obviously this is no different from having a tkinter.py file in that directory, so it's not like this is a major problem, but it might be worth pointing out this minor incompatibility.
Correct, both tkinter.py and tkinter.missing.py in the current directory will take precedence. I will note this in the backwards compatibility section.
Also, and possibly more of an issue, use of the ".missing.py" file will mean that a user can't provide their own implementation of the module later on sys.path. I don't know if this is a significant issue on Unix platforms. On Windows, there is a 3rd party implementation of the curses module which (as I understand it) can be user installed. If Python included a curses.missing.py, that would no longer work.
Certainly these are only minor points, but worth considering.
I believe I may have found the Windows curses implementation, it's called PDCurses [0], and this website [1] appears to be distributing it under the name `curses`.

Could some Windows user please check if compiling Python with the current reference implementation [2] of this PEP indeed generates a `curses.missing.py` file among the stdlib files? If so, we might consider skipping the generation of the .missing.py file for the curses module on Windows.

[0] http://pdcurses.sourceforge.net/
[1] http://www.lfd.uci.edu/~gohlke/pythonlibs/#curses
[2] https://www.python.org/dev/peps/pep-0534/#reference-implementation

Thank you for the feedback!
On 28 November 2016 at 15:51, Tomas Orsava <torsava@redhat.com> wrote:
I believe I may have found the Windows curses implementation, it's called PDCurses [0], and this website [1] appears to be distributing it under the name `curses`.
My apologies, I should have included a pointer. That is indeed the distribution I was thinking of.
Could some Windows user please check if compiling Python with the current reference implementation [2] of this PEP indeed generates a `curses.missing.py` file among the stdlib files? If so, we might consider skipping the generation of the .missing.py file for the curses module on Windows.
I'll see if I can make some time to do the test. But as the change is to setup.py, and the Windows build uses Visual Studio project files to do the build, I expect that it won't generate missing.py files on Windows. In actual fact, that may be the simplest solution, to note that the build part of this change is restricted to Unix (non-Windows) platforms specifically. As there's no real concept of a "distribution version" of Python on Windows, it's probably not something that will be that important on that platform (and support for .missing.py files is there, it would just be necessary for distributors to manually create those files as needed). Paul
Overall I think this is a good idea. I have one nit:

It seems that there are two possible strategies for searching the .missing.py file:

1. (Currently in the PEP) search it at the same time as the .py file when walking along sys.path.
   - Pro: prevents confusion when the user accidentally has their own matching file later in sys.path.
   - Con: prevents the user from installing a matching file intentionally (e.g. a 3rd party version).

2. After exhausting sys.path, search it again just for .missing.py files (or perhaps remember the location of the .missing.py file during the first search but don't act immediately on it -- this has the same effect).
   - Pro: allows user to install their own version.
   - Con: if the user has a matching file by accident, that file will be imported, causing more confusion.

I personally would weigh these so as to prefer (2). The option of installing your own version when the standard version doesn't exist seems reasonable; there may be reasons that you can't or don't want to install the distribution's version. I don't worry much about the danger of accidental name conflicts (have you ever seen this?).

--Guido

On Mon, Nov 28, 2016 at 8:13 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 28 November 2016 at 15:51, Tomas Orsava <torsava@redhat.com> wrote:
I believe I may have found the Windows curses implementation, it's called PDCurses [0], and this website [1] appears to be distributing it under the name `curses`.
My apologies, I should have included a pointer. That is indeed the distribution I was thinking of.
Could some Windows user please check if compiling Python with the current reference implementation [2] of this PEP indeed generates a `curses.missing.py` file among the stdlib files? If so, we might consider skipping the generation of the .missing.py file for the curses module on Windows.
I'll see if I can make some time to do the test. But as the change is to setup.py, and the Windows build uses Visual Studio project files to do the build, I expect that it won't generate missing.py files on Windows. In actual fact, that may be the simplest solution, to note that the build part of this change is restricted to Unix (non-Windows) platforms specifically. As there's no real concept of a "distribution version" of Python on Windows, it's probably not something that will be that important on that platform (and support for .missing.py files is there, it would just be necessary for distributors to manually create those files as needed).
Paul _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido van Rossum (python.org/~guido)
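The difference between the two search strategies in this message can be shown with a small simulation. This is only a sketch under simplifying assumptions: each ``sys.path`` entry is modelled as a ``(directory, set_of_filenames)`` pair, only the ``.py`` suffix is considered for real modules, and the function names ``find_option1`` and ``find_option2`` are made up for the example.

```python
MISSING_SUFFIX = ".missing.py"


def find_option1(name, path_entries):
    """Option 1: check for the stub in each directory as it is visited."""
    for directory, filenames in path_entries:
        if name + ".py" in filenames:
            return ("module", directory)
        if name + MISSING_SUFFIX in filenames:
            return ("missing", directory)  # stops the search here
    return None


def find_option2(name, path_entries):
    """Option 2: act on a stub only after every directory has been tried."""
    first_stub = None
    for directory, filenames in path_entries:
        if name + ".py" in filenames:
            return ("module", directory)
        if first_stub is None and name + MISSING_SUFFIX in filenames:
            first_stub = directory  # remember, but keep searching
    return ("missing", first_stub) if first_stub is not None else None


# A stdlib directory shipping only a stub, and a user-installed replacement.
path_entries = [
    ("/usr/lib/python3", {"curses.missing.py"}),
    ("/usr/lib/python3/site-packages", {"curses.py"}),
]
```

With these entries, ``find_option1("curses", path_entries)`` reports the stub in the first directory, while ``find_option2("curses", path_entries)`` reaches the replacement in ``site-packages``, which is the PDCurses scenario raised earlier in the thread.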
I would also prefer (2) for exactly the example given in this thread. The Windows version of curses.missing.py could raise a ModuleNotFoundError saying that curses is not available on Windows, but a developer who wants to can install PDCurses to implement the stdlib module.

I don't think the few cases where a stdlib package is both missing and has a name collision with an incompatible module are enough to outweigh the benefits of being able to install a third-party package to implement a missing part of the stdlib.

Alex

On 2016-11-28 11:38 AM, Guido van Rossum wrote:
Overall I think this is a good idea. I have one nit:
It seems that there are two possible strategies for searching the .missing.py file:
1. (Currently in the PEP) search it at the same time as the .py file when walking along sys.path. - Pro: prevents confusion when the user accidentally has their own matching file later in sys.path. - Con: prevents the user from installing a matching file intentionally (e.g. a 3rd party version).
2. After exhausting sys.path, search it again just for .missing.py files (or perhaps remember the location of the .missing.py file during the first search but don't act immediately on it -- this has the same effect). - Pro: allows user to install their own version. - Con: if the user has a matching file by accident, that file will be imported, causing more confusion.
I personally would weigh these so as to prefer (2). The option of installing your own version when the standard version doesn't exist seems reasonable; there may be reasons that you can't or don't want to install the distribution's version. I don't worry much about the danger of accidental name conflicts (have you ever seen this?).
--Guido
On Nov 28, 2016 8:38 AM, "Guido van Rossum" <guido@python.org> wrote:
Overall I think this is a good idea. I have one nit:
It seems that there are two possible strategies for searching the .missing.py file:
1. (Currently in the PEP) search it at the same time as the .py file when walking along sys.path.
   - Pro: prevents confusion when the user accidentally has their own matching file later in sys.path.
   - Con: prevents the user from installing a matching file intentionally (e.g. a 3rd party version).
2. After exhausting sys.path, search it again just for .missing.py files (or perhaps remember the location of the .missing.py file during the first search but don't act immediately on it -- this has the same effect).
   - Pro: allows user to install their own version.
   - Con: if the user has a matching file by accident, that file will be imported, causing more confusion.
I personally would weigh these so as to prefer (2). The option of installing your own version when the standard version doesn't exist seems reasonable; there may be reasons that you can't or don't want to install the distribution's version. I don't worry much about the danger of accidental name conflicts (have you ever seen this?).

I was going to make a similar comment, because it seems to me that it could make sense for a redistributor to want to move some bits of the stdlib into wheels, and this is most conveniently handled by letting bits of the stdlib live in site-packages.

Also note that in Guido's option 2, we only incur the extra fstat calls if the import would otherwise fail. In option 1, there are extra fstat calls (and thus disk seeks etc.) adding some overhead to every import.

-n
On Mon, Nov 28, 2016 at 9:14 AM, Nathaniel Smith <njs@pobox.com> wrote:
Also note that in Guido's option 2, we only incur the extra fstat calls if the import would otherwise fail. In option 1, there are extra fstat calls (and thus disk seeks etc.) adding some overhead to every import.
Oh, that's an important consideration! Yes, adding .missing.py to the list of extensions would cause extra stat() calls and potentially slow down every import. -- --Guido van Rossum (python.org/~guido)
On 29 November 2016 at 03:28, Guido van Rossum <guido@python.org> wrote:
On Mon, Nov 28, 2016 at 9:14 AM, Nathaniel Smith <njs@pobox.com> wrote:
Also note that in Guido's option 2, we only incur the extra fstat calls if the import would otherwise fail. In option 1, there are extra fstat calls (and thus disk seeks etc.) adding some overhead to every import.
Oh, that's an important consideration! Yes, adding .missing.py to the list of extensions would cause extra stat() calls and potentially slow down every import.
This is the second time I've seen "but stat calls!" concern in relation to import today, so I'll echo what Brett pointed out in his reply: the import system in recent 3.x releases, along with the importlib2 backport to Python 2.7, builds a cache of the directory contents for path entries rather than making multiple stat calls. The current 3.x source code for that is at https://hg.python.org/cpython/file/tip/Lib/importlib/_bootstrap_external.py#... The significant reduction in the number of stat calls through better caching is the main way the 3.3 import reimplementation in Python managed to be competitive performance-wise with the previous C implementation, and faster when importing from a network filesystem :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
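The caching idea described here can be sketched in a few lines. This is a simplified illustration, not the actual ``FileFinder`` code: the class name ``CachedDirectory`` and its ``refreshes`` counter are invented for the example, and details such as case-insensitive file systems and packages are ignored. The point is that, once the directory listing is cached, trying one more suffix such as ``.missing.py`` is a set lookup rather than another ``stat()`` call.

```python
import os


class CachedDirectory:
    """Caches one directory listing and refreshes it when the mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._names = frozenset()
        self.refreshes = 0  # how many times the directory was listed

    def _refresh_if_stale(self):
        # One stat() of the directory itself per lookup, not one per suffix.
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            self._names = frozenset(os.listdir(self.path))
            self._mtime = mtime
            self.refreshes += 1

    def find(self, module_name, suffixes):
        """Return the first existing candidate path, or None."""
        self._refresh_if_stale()
        for suffix in suffixes:
            candidate = module_name + suffix
            if candidate in self._names:  # in-memory lookup, no file system call
                return os.path.join(self.path, candidate)
        return None
```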
On 11/28/2016 11:38 AM, Guido van Rossum wrote:
Overall I think this is a good idea. I have one nit:
It seems that there are two possible strategies for searching the .missing.py file:
1. (Currently in the PEP) search it at the same time as the .py file when walking along sys.path. - Pro: prevents confusion when the user accidentally has their own matching file later in sys.path. - Con: prevents the user from installing a matching file intentionally (e.g. a 3rd party version).
2. After exhausting sys.path, search it again just for .missing.py files (or perhaps remember the location of the .missing.py file during the first search but don't act immediately on it -- this has the same effect). - Pro: allows user to install their own version.
The Windows distribution, for instance, could have a mod.missing.py file for every non-Windows, Unix-only module. This would cut down on 'Why did this import fail?' questions on SO and python-list. And without breaking anything. Even for non-beginners, it would save having to look up whether an import failure is inherent on the platform or due to a typo.
- Con: if the user has a matching file by accident, that file will be imported, causing more confusion.
I personally would weigh these so as to prefer (2). The option of installing your own version when the standard version doesn't exist seems reasonable; there may be reasons that you can't or don't want to install the distribution's version. I don't worry much about the danger of accidental name conflicts (have you ever seen this?).
The accidental conflict reports I have seen were due to scripts in the same directory as the program, rather than modules in site-packages.

-- Terry Jan Reedy
On Mon, Nov 28, 2016 at 11:54 AM, Terry Reedy <tjreedy@udel.edu> wrote:
On 11/28/2016 11:38 AM, Guido van Rossum wrote:
Overall I think this is a good idea. I have one nit:
It seems that there are two possible strategies for searching the .missing.py file:
1. (Currently in the PEP) search it at the same time as the .py file when walking along sys.path. - Pro: prevents confusion when the user accidentally has their own matching file later in sys.path. - Con: prevents the user from installing a matching file intentionally (e.g. a 3rd party version).
2. After exhausting sys.path, search it again just for .missing.py files (or perhaps remember the location of the .missing.py file during the first search but don't act immediately on it -- this has the same effect). - Pro: allows user to install their own version.
The Windows distribution, for instance, could have a mod.missing.py file for every non-Windows, Unix-only module. This would cut down on 'Why did this import fail?' questions on SO and python-list. And without breaking anything. Even for non-beginners, it would save having to look up whether an import failure is inherent on the platform or due to a typo.
Yes, this is why I like the proposal (but it would still have this benefit with option (2)).
- Con: if the user has a matching file by accident, that file will be imported, causing more confusion.
I personally would weigh these so as to prefer (2). The option of installing your own version when the standard version doesn't exist seems reasonable; there may be reasons that you can't or don't want to install the distribution's version. I don't worry much about the danger of accidental name conflicts (have you ever seen this?).
The accidental conflict reports I have seen were due to scripts in the same directory as the program, rather than modules in site-packages.
Yes we get those occasionally -- but do those ever happen specifically for stdlib module names that aren't installed because the platform doesn't support them or requires them to be installed separately? I.e. would the .missing.py feature actually have benefitted those reports? -- --Guido van Rossum (python.org/~guido)
On 11/28/2016 08:38 AM, Guido van Rossum wrote:
2. After exhausting sys.path, search it again just for .missing.py files (or perhaps remember the location of the .missing.py file during the first search but don't act immediately on it -- this has the same effect). - Pro: allows user to install their own version. - Con: if the user has a matching file by accident, that file will be imported, causing more confusion.
I personally would weigh these so as to prefer (2). The option of installing your own version when the standard version doesn't exist seems reasonable; there may be reasons that you can't or don't want to install the distribution's version.
I also agree that (2) is the better option. General Python philosophy seems to be to not prohibit actions unless there is a *really* good reason (e.g. sum()ing strings), and option (1) would require installing third-party modules into the stdlib to work around the prohibition.

-- ~Ethan~
On 11/28/2016 05:38 PM, Guido van Rossum wrote:
Overall I think this is a good idea. I have one nit:
It seems that there are two possible strategies for searching the .missing.py file:
1. (Currently in the PEP) search it at the same time as the .py file when walking along sys.path. - Pro: prevents confusion when the user accidentally has their own matching file later in sys.path. - Con: prevents the user from installing a matching file intentionally (e.g. a 3rd party version).
2. After exhausting sys.path, search it again just for .missing.py files (or perhaps remember the location of the .missing.py file during the first search but don't act immediately on it -- this has the same effect). - Pro: allows user to install their own version. - Con: if the user has a matching file by accident, that file will be imported, causing more confusion.
I personally would weigh these so as to prefer (2). The option of installing your own version when the standard version doesn't exist seems reasonable; there may be reasons that you can't or don't want to install the distribution's version. I don't worry much about the danger of accidental name conflicts (have you ever seen this?).
--Guido
Solution (2) is a very good alternative and can be implemented using a metapath hook as Steve proposed elsewhere in this thread [0].

We considered a similar metapath hook when designing the PEP, but decided against it, to better match the current behavior of third-party modules not being able to replace parts of stdlib. Note that as Brett says elsewhere in the thread, due to caching there would be no extra stat() calls in the usual case.

On the other hand, we aren't familiar with Windows, where replacing missing stdlib modules seems to be standard practice. Thanks for letting us know.

With a metapath hook, .missing.py files are probably overkill, and the hook can just look at one file (or a static compiled-in list) of ModuleNotFound/ImportError messages for all missing modules, as M.-A. Lemburg and others are suggesting. We'll just need to think about coordinating how the list is generated/updated: the current PEP implicitly allows other parties, besides Python and the distributors, to step in cleanly if they need to; needing to update a single list could lead to messy hacks.

We'll update the PEP to go with solution (2).

[0] https://mail.python.org/pipermail/python-ideas/2016-November/043837.html

Tomas Orsava
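A metapath hook of the kind described in this message could look roughly like the following. It is a minimal sketch, not the PEP's implementation: the class name ``MissingModuleFinder``, the ``install`` helper, and the idea of passing the messages in as a plain dict are all assumptions made for the example. Appending the finder to the end of ``sys.meta_path`` means it is consulted only after every other finder has failed, which gives the behaviour of solution (2).

```python
import importlib.abc
import sys


class MissingModuleFinder(importlib.abc.MetaPathFinder):
    """Last-resort finder that turns known-missing names into clear errors."""

    def __init__(self, messages):
        self.messages = dict(messages)  # module name -> error message

    def find_spec(self, fullname, path=None, target=None):
        message = self.messages.get(fullname)
        if message is None:
            return None  # not ours; the normal ModuleNotFoundError follows
        raise ModuleNotFoundError(message, name=fullname)


def install(messages):
    """Append the finder so it runs only after all other finders fail."""
    finder = MissingModuleFinder(messages)
    sys.meta_path.append(finder)
    return finder
```

A module installed anywhere on ``sys.path`` is found by the regular finders first, so a third-party replacement keeps working and the custom message appears only when nothing else provides the module.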
On 29 November 2016 at 20:54, Tomas Orsava <torsava@redhat.com> wrote:
With a metapath hook, .missing.py files are probably overkill, and the hook can just look at one file (or a static compiled-in list) of ModuleNotFound/ImportError messages for all missing modules, as M.-A. Lemburg and others are suggesting. We'll just need to think about coordinating how the list is generated/updated: the current PEP implicitly allows other parties, besides Python and the distributors, to step in cleanly if they need to—needing to update a single list could lead to messy hacks.
What if, rather than using an explicitly file-based solution, this was instead defined as a new protocol module, where the new metapath hook imported a "__missing__" module and called a particular function in it (e.g. "__missing__.module_not_found(modname)")?

The default missing module implementation hook would just handle CPython's optional modules, but redistributors could patch it to use a mechanism that made sense for them. For example, if we ever get to the point where the Fedora RPM database includes "Provides: pythonXYimport(module.of.interest)" data in addition to "Provides: pythonXYdist(pypi-package-name)", the right system package to import could be reported for any module, not just standard library ones that have been split out (with the trade-off being that any such checks would make optional imports a bit slower to fail, but that could be mitigated in various ways).

Specific applications could also implement their own missing module handling by providing a __missing__.py file alongside their __main__.py, and relying on directory and/or zipfile execution, or else by monkeypatching the __missing__ module at runtime.

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
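The protocol-module idea can be sketched as follows. Only the names ``__missing__`` and ``module_not_found(modname)`` come from the message above; the finder class, the ``make_default_handler`` helper, and the choice to build the handler module in memory are assumptions made so the example is self-contained.

```python
import importlib
import importlib.abc
import sys
import types


class MissingProtocolFinder(importlib.abc.MetaPathFinder):
    """Delegates unresolved imports to __missing__.module_not_found()."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname == "__missing__":
            return None  # avoid recursing when __missing__ itself is absent
        try:
            handler = importlib.import_module("__missing__")
        except ModuleNotFoundError:
            return None  # no handler available; default error applies
        handler.module_not_found(fullname)  # may raise a tailored error
        return None


def make_default_handler(messages):
    """Build a stand-in __missing__ module from a name -> message mapping."""
    module = types.ModuleType("__missing__")

    def module_not_found(modname):
        if modname in messages:
            raise ModuleNotFoundError(messages[modname], name=modname)

    module.module_not_found = module_not_found
    return module
```

In this sketch a redistributor or application would swap in its own handler by placing a different module under ``sys.modules["__missing__"]`` or by shipping a ``__missing__.py`` on ``sys.path``, while the finder itself stays unchanged.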
On Tue, 29 Nov 2016 at 06:49 Nick Coghlan <ncoghlan@gmail.com> wrote:
> What if, rather than using an explicitly file-based solution, this was instead defined as a new protocol module, where the new metapath hook imported a "__missing__" module and called a particular function in it (e.g. "__missing__.module_not_found(modname)")?
You can answer this question the best, Nick, but would it be worth defining a _stdlib.py that acts as both a marker for where the stdlib is installed -- instead of os.py which is the current marker -- and which also stores metadata like an attribute called `missing` which is a dict that maps modules to ModuleNotFoundError messages? Although maybe this is too specific of a solution (or still too general and we use an e.g. missing.json off of sys.path which contains the same mapping).

Otherwise MAL touched on the solution I always had in the back of my head where we let people register a callback that gets passed the name of any module that wasn't found through sys.meta_path. We could either have the return value mean nothing and by default raise ModuleNotFoundError, have the return value be what to set the module to and raise an exception as expected, or have it be more error-specific and return an exception to raise (all of these options also ask whether a default callback doing what is normal is provided or if it's None by default and import continues to provide the default semantics). The perk of the callback is it removes the order sensitivity of any sys.meta_path or sys.path_hooks solution where people might be doing sys.meta_path.append(custom_finder) and thus won't necessarily trigger if a new hook for missing modules is put at the end by default.

My personal vote is a callback called at https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap.py#L9... with a default implementation that raises ModuleNotFoundError just like the current line does.
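To make the callback option concrete, here is a rough sketch of the variant in which the callback returns an exception to raise (or None for the default behaviour). importlib has no such registration point, so this wraps importlib.import_module instead of patching _bootstrap.py; set_missing_module_callback, import_module, lookup and the MISSING mapping are all illustrative names.

```python
import importlib

# Hypothetical registration slot; None keeps the default import semantics.
_missing_callback = None


def set_missing_module_callback(callback):
    """Register a callable taking a module name, returning an exception or None."""
    global _missing_callback
    _missing_callback = callback


def import_module(name):
    """Import ``name``; consult the callback only if ``name`` itself is not found."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as exc:
        if _missing_callback is None or exc.name != name:
            raise  # a different module is missing, or nothing is registered
        replacement = _missing_callback(name)
        if replacement is None:
            raise  # callback declined: keep the standard error
        raise replacement from exc


# Example data in the spirit of the proposed `missing` mapping (contents invented).
MISSING = {
    "_sqlite3_demo": "_sqlite3_demo was not built: SQLite headers were not found",
}


def lookup(name):
    message = MISSING.get(name)
    return ModuleNotFoundError(message, name=name) if message else None


set_missing_module_callback(lookup)
```

Because the callback runs only after the whole import machinery has failed, it does not depend on where other code has inserted finders in sys.meta_path, which is the perk described above.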
On 11/29/2016 1:33 PM, Brett Cannon wrote:
> Otherwise MAL touched on the solution I always had in the back of my head where we let people register a callback that gets passed the name of any module that wasn't found through sys.meta_path. We could either have the return value mean nothing and by default raise ModuleNotFoundError, have the return value be what to set the module to and raise an exception as expected, or have it be more error-specific and return an exception to raise (all of these options also ask whether a default callback doing what is normal is provided or if it's None by default and import continues to provide the default semantics). The perk of the callback is it removes the order sensitivity of any sys.meta_path or sys.path_hooks solution where people might be doing sys.meta_path.append(custom_finder) and thus won't necessarily trigger if a new hook for missing modules is put at the end by default.
How about having a sys.meta_path_last_chance (or whatever), which is identical to anything else on sys.meta_path, but is always guaranteed to be called last. Then instead of:

    meta_path = sys.meta_path

it would be:

    meta_path = sys.meta_path + ([sys.meta_path_last_chance] if sys.meta_path_last_chance else [])

There's no need to invent a new callback signature, or any new logic anywhere: it's literally just another metapath importer, but always guaranteed to be last.

Eric.
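The last-chance idea can be simulated outside the interpreter with a small lookup loop. This is only a sketch: sys.meta_path_last_chance does not exist, so a module-level variable stands in for it, and find_spec_with_last_chance is a hypothetical helper that mirrors the expression above rather than the real import machinery.

```python
import importlib.abc
import sys


class LastChanceFinder(importlib.abc.MetaPathFinder):
    """Ordinary metapath finder that raises a tailored error for known names."""

    def __init__(self, messages):
        self.messages = messages

    def find_spec(self, fullname, path=None, target=None):
        message = self.messages.get(fullname)
        if message is not None:
            raise ModuleNotFoundError(message, name=fullname)
        return None


# Stand-in for the proposed sys.meta_path_last_chance attribute.
meta_path_last_chance = LastChanceFinder(
    {"tkinter_demo_missing": "please install the python3-tk package"}
)


def find_spec_with_last_chance(name):
    """Walk the regular finders first, then the guaranteed-last one."""
    meta_path = sys.meta_path + (
        [meta_path_last_chance] if meta_path_last_chance else []
    )
    for finder in meta_path:
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:
            continue  # skip legacy finders without find_spec
        spec = find_spec(name, None)
        if spec is not None:
            return spec
    return None
```

Here find_spec_with_last_chance("json") returns a normal spec, an unknown name returns None, and the name registered with the last-chance finder raises the tailored ModuleNotFoundError regardless of what else has been appended to sys.meta_path.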
On 30 November 2016 at 04:33, Brett Cannon <brett@python.org> wrote:
> You can answer this question the best, Nick, but would it be worth defining a _stdlib.py that acts as both a marker for where the stdlib is installed -- instead of os.py which is the current marker -- and which also stores metadata like an attribute called `missing` which is a dict that maps modules to ModuleNotFoundError messages? Although maybe this is too specific of a solution (or still too general and we use an e.g. missing.json off of sys.path which contains the same mapping).
Really, I think the ideal solution from a distro perspective would be to enable something closer to what bash and other shells support for failed CLI calls:

    $ blender
    bash: blender: command not found...
    Install package 'blender' to provide command 'blender'? [N/y] n

This would allow redistributors to point folks towards platform packages (via apt/yum/dnf/PyPM/conda/Canopy/etc) for the components they provide, and towards pip/PyPI for everything else (and while we don't have a dist-lookup-by-module-name service for PyPI *today*, it's something I hope we'll find a way to provide sometime in the next few years).

I didn't suggest that during the Fedora-level discussions of this PEP because it didn't occur to me - the elegant simplicity of the new import suffix as a tactical solution to the immediate "splitting the standard library" problem [1] meant I missed that it was really a special case of the general "provide guidance on obtaining missing modules from the system package manager" concept.

The problem with that idea however is that while it provides the best possible interactive user experience, it's potentially really slow, and hence too expensive to do for every import error - we would instead need to find a way to run with Wolfgang Maier's suggestion of only doing this for *unhandled* import errors.

Fortunately, we do have the appropriate mechanisms in place to support that approach:

1. For interactive use, we have sys.excepthook
2. For non-interactive use, we have the atexit module

As a simple example of the former:

    >>> import sys
    >>> def module_missing(modname):
    ...     return f"Module not found: {modname}"
    ...
    >>> def my_except_hook(exc_type, exc_value, exc_tb):
    ...     if isinstance(exc_value, ModuleNotFoundError):
    ...         print(module_missing(exc_value.name))
    ...     else:
    ...         # Report every other exception the usual way
    ...         sys.__excepthook__(exc_type, exc_value, exc_tb)
    ...
    >>> sys.excepthook = my_except_hook
    >>> import foo
    Module not found: foo
    >>> import foo.bar
    Module not found: foo
    >>> import sys.bar
    Module not found: sys.bar

For the atexit handler, that could be installed by the `site` module, so the existing mechanisms for disabling site module processing would also disable any default exception reporting hooks. Folks could also register their own handlers via either `sitecustomize.py` or `usercustomize.py`.

And at that point the problem starts looking less like "Customise the handling of missing modules" and more like "Customise the rendering and reporting of particular types of unhandled exceptions". For example, a custom handler for subprocess.CalledProcessError could introspect the original command and use `shutil.which` to see if the requested command was even visible from the current process (and, in a redistributor provided Python, indicate which system packages to install to obtain the requested command).
> My personal vote is a callback called at https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap.py#L9... with a default implementation that raises ModuleNotFoundError just like the current line does.
Ethan's observation about try/except import chains has got me thinking that limiting this to handling errors within the context of a single import statement will be problematic, especially given that folks can already write their own metapath hook for that case if they really want to.

Cheers,
Nick.

[1] For folks wondering "This problem has existed for years, why suddenly worry about it now?", Fedora's in the process of splitting out an even more restricted subset of the standard library for system tools to use: https://fedoraproject.org/wiki/Changes/System_Python

That means "You're relying on a missing stdlib module" is going to come up more often for system tools developers trying to stick within that restricted subset.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
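The subprocess.CalledProcessError example mentioned in this message could look roughly like the following. It is a sketch only: describe_failed_command and reporting_excepthook are hypothetical names, the command is assumed to be a string or a sequence of strings, and no package lookup is attempted.

```python
import shlex
import shutil
import subprocess
import sys


def describe_failed_command(exc_value):
    """Return a hint when the command behind a CalledProcessError is not on PATH."""
    if not isinstance(exc_value, subprocess.CalledProcessError):
        return None
    cmd = exc_value.cmd
    # Assumption: cmd is either a shell-style string or a sequence of strings.
    argv = shlex.split(cmd) if isinstance(cmd, str) else [str(part) for part in cmd]
    if not argv:
        return None
    program = argv[0]
    if shutil.which(program) is None:
        return f"Command not found on PATH: {program}"
    return None


def reporting_excepthook(exc_type, exc_value, exc_tb):
    """Print the normal traceback, then add a hint if one is available."""
    sys.__excepthook__(exc_type, exc_value, exc_tb)
    hint = describe_failed_command(exc_value)
    if hint is not None:
        print(hint, file=sys.stderr)
```

Assigning reporting_excepthook to sys.excepthook (for example from sitecustomize.py) would activate it for unhandled exceptions only, so handled failures and try/except chains are unaffected.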
On 11/30/2016 03:56 AM, Nick Coghlan wrote:
> For the atexit handler, that could be installed by the `site` module, so the existing mechanisms for disabling site module processing would also disable any default exception reporting hooks. Folks could also register their own handlers via either `sitecustomize.py` or `usercustomize.py`.
Is there some reason not to use sys.excepthook for both interactive and non-interactive use? From the docs:

"When an exception is raised and uncaught, the interpreter calls sys.excepthook with three arguments, the exception class, exception instance, and a traceback object. In an interactive session this happens just before control is returned to the prompt; in a Python program this happens just before the program exits. The handling of such top-level exceptions can be customized by assigning another three-argument function to sys.excepthook."

Though I believe the default sys.excepthook function is currently written in C, so it wouldn't be very easy for distributors to customize it. Maybe it could be made to read module=error_message pairs from some external file, which would be easier to modify?

Yours aye,
Tomas
On 3 December 2016 at 02:56, Tomas Orsava <torsava@redhat.com> wrote:
> Is there some reason not to use sys.excepthook for both interactive and non-interactive use? From the docs:
>
> "When an exception is raised and uncaught, the interpreter calls sys.excepthook with three arguments, the exception class, exception instance, and a traceback object. In an interactive session this happens just before control is returned to the prompt; in a Python program this happens just before the program exits. The handling of such top-level exceptions can be customized by assigning another three-argument function to sys.excepthook."
No, that was just me forgetting that sys.excepthook was also called for unhandled exceptions in non-interactive mode. It further strengthens the argument for seeing how far we can get with just the flexibility CPython already provides, though.
> Though I believe the default sys.excepthook function is currently written in C, so it wouldn't be very easy for distributors to customize it. Maybe it could be made to read module=error_message pairs from some external file, which would be easier to modify?
The default implementation is written in C, but distributors could patch site.py to replace it with a custom one written in Python. For example, publish a "fedora-hooks" module to PyPI (so non-system Python installations or applications regularly run without the site module can readily use the same hooks if they choose to do so), and then patch site.py in the system Python to do:

    import fedora_hooks
    fedora_hooks.install_excepthook()

The nice thing about that approach is it wouldn't need a new switch to turn it off - it would get turned off with all the other site-specific customisations when -S or -I is used. It would also better open things up to redistributor experimentation in existing releases (2.7, 3.5, etc) before we commit to a specific approach in the reference interpreter (such as adding an optional 'platform.hooks' submodule that vendors may provide, and relevant stdlib APIs will then call automatically to override the default upstream provided processing).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 12/03/2016 05:08 AM, Nick Coghlan wrote:
> The nice thing about that approach is it wouldn't need a new switch to turn it off - it would get turned off with all the other site-specific customisations when -S or -I is used. It would also better open things up to redistributor experimentation in existing releases (2.7, 3.5, etc) before we commit to a specific approach in the reference interpreter (such as adding an optional 'platform.hooks' submodule that vendors may provide, and relevant stdlib APIs will then call automatically to override the default upstream provided processing).
Ah, but of course! That leaves us with only one part of the PEP unresolved: When the build process is unable to compile some modules when building Python from source (such as _sqlite3 due to missing sqlite headers), it would be great to provide a custom message when one then tries to import such a module using the compiled Python.

Do you see a 'pretty' solution for that within this framework?

Yours aye,
Tomas
On 5 December 2016 at 19:56, Tomas Orsava <torsava@redhat.com> wrote:
> Ah, but of course! That leaves us with only one part of the PEP unresolved: When the build process is unable to compile some modules when building Python from source (such as _sqlite3 due to missing sqlite headers), it would be great to provide a custom message when one then tries to import such a module using the compiled Python.
>
> Do you see a 'pretty' solution for that within this framework?
I'm not sure it qualifies as 'pretty', but one approach would be to have a './Modules/missing/' directory that gets pre-populated with checked in "<name>.py" files for extension modules that aren't always built. When getpath.c detects it's running from a development checkout, it would add that directory to sys.path (just before site-packages), while 'make install' and 'make altinstall' would only copy files from that directory into the installation target if the corresponding extension modules were missing.

Essentially, that would be the "name.missing.py" part of the draft proposal for optional standard library modules, just with a regular "name.py" module name and a tweak to getpath.c.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
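A placeholder "<name>.py" file of this kind only needs to raise on import. The sketch below writes one into a temporary directory and appends that directory to sys.path to show the effect; write_missing_stub, the demo module name and the message wording are hypothetical, and appending to the end of sys.path is a simplification of the "just before site-packages" placement described above.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_missing_stub(directory, modname, reason):
    """Create <modname>.py that raises ModuleNotFoundError when imported."""
    message = f"No module named {modname!r} ({reason})"
    source = (
        "# Placeholder for an extension module that was not built.\n"
        f"raise ModuleNotFoundError({message!r}, name={modname!r})\n"
    )
    path = Path(directory) / f"{modname}.py"
    path.write_text(source, encoding="utf-8")
    return path


# Demo: a temporary directory stands in for an installed Modules/missing/.
stub_dir = tempfile.mkdtemp()
write_missing_stub(
    stub_dir, "_sqlite3_stub_demo", "SQLite headers were not found at build time"
)
sys.path.append(stub_dir)
importlib.invalidate_caches()
```

After this, "import _sqlite3_stub_demo" fails with the explanatory message. Because the stub is an ordinary module found late on sys.path, a real extension module installed earlier on the path would take precedence over it.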
On 12/05/2016 01:42 PM, Nick Coghlan wrote:
> Essentially, that would be the "name.missing.py" part of the draft proposal for optional standard library modules, just with a regular "name.py" module name and a tweak to getpath.c.
To my eye that looks like a complicated mechanism necessitating changes to several parts of the codebase. Have you considered modifying the default sys.excepthook implementation to read a list of modules and error messages from a file that was generated during the build process? To me that seems simpler, and the implementation will be only in one place.

In addition, distributors could just populate that file with their data, thus we would have one mechanism for both use cases.

Tomas
On 5 December 2016 at 22:53, Tomas Orsava <torsava@redhat.com> wrote:
> To my eye that looks like a complicated mechanism necessitating changes to several parts of the codebase. Have you considered modifying the default sys.excepthook implementation to read a list of modules and error messages from a file that was generated during the build process? To me that seems simpler, and the implementation will be only in one place.
>
> In addition, distributors could just populate that file with their data, thus we would have one mechanism for both use cases.
That's certainly another possibility, and one that initially appears to confine most of the complexity to sys.excepthook(). However, the problem you run into in that case is that CPython, by default, doesn't have any configuration files other than site.py, sitecustomize.py, usercustomize.py and whatever PYTHONSTARTUP points to for interactive use. The only non-executable one that is currently defined is the recommendation to redistributors in PEP 493 for file-based configuration of HTTPS-verification-by-default backports to earlier 2.7.x versions.

Probably the closest analogy I can think of is the way we currently generate _sysconfigdata-<assorted-build-qualifiers>.py in order to capture the build time settings such that sysconfig.get_config_vars() can report them at runtime.

So using _sysconfigdata as inspiration, it would likely be possible to provide a "sysconfig.get_missing_modules()" API that the default sys.excepthook() could use to report that a particular import didn't work because an optional standard library module hadn't been built.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
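As a sketch of how the _sysconfigdata analogy might work, the build could write a small generated data module and a lookup function could read it back. There is no sysconfig.get_missing_modules() today; render_missing_data, get_missing_modules, explain_missing and the generated variable name are all assumptions for illustration.

```python
import pprint


def render_missing_data(missing):
    """Build-time step: render the source of a generated data module."""
    lines = [
        "# Generated at build time (illustrative); do not edit.",
        "missing_modules = " + pprint.pformat(missing),
        "",
    ]
    return "\n".join(lines)


def get_missing_modules(source):
    """Runtime step: load the generated mapping back from its source text."""
    namespace = {}
    exec(source, namespace)  # the source comes from our own build, not user input
    return dict(namespace["missing_modules"])


def explain_missing(modname, missing):
    """Message a default excepthook could append for an unbuilt optional module."""
    reason = missing.get(modname)
    if reason is None:
        return None
    return f"Optional standard library module {modname!r} was not built: {reason}"


# Example round trip with invented build results.
generated = render_missing_data({"_sqlite3": "SQLite header files were not found"})
missing_at_runtime = get_missing_modules(generated)
```

In a real build the generated text would be written to a file next to the other generated data and imported normally; exec is used here only to keep the example self-contained.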
On 12/06/2016 03:27 AM, Nick Coghlan wrote:
> Probably the closest analogy I can think of is the way we currently generate _sysconfigdata-<assorted-build-qualifiers>.py in order to capture the build time settings such that sysconfig.get_config_vars() can report them at runtime.
>
> So using _sysconfigdata as inspiration, it would likely be possible to provide a "sysconfig.get_missing_modules()" API that the default sys.excepthook() could use to report that a particular import didn't work because an optional standard library module hadn't been built.
Quite interesting. And sysconfig.get_missing_modules() wouldn't even have to be generated during the build process, because it would be called only when the import has failed, at which point it is obvious Python was built without said component (like _sqlite3). So do you see that as an acceptable solution?

Do you prefer the one you suggested previously?

Alternatively, can the contents of site.py be generated during the build process? Because if some modules couldn't be built, a custom implementation of sys.excepthook might be generated there with the data for the modules that failed to be built.

Regards,
Tom
On 7 December 2016 at 02:50, Tomas Orsava <torsava@redhat.com> wrote:
> Quite interesting. And sysconfig.get_missing_modules() wouldn't even have to be generated during the build process, because it would be called only when the import has failed, at which point it is obvious Python was built without said component (like _sqlite3). So do you see that as an acceptable solution?
Oh, I'd missed that - yes, the sysconfig API could potentially be something like `sysconfig.get_stdlib_modules()` and `sysconfig.get_optional_modules()` instead of specifically reporting which ones were missed by the build process. There'd still be some work around generating the manifests backing those APIs at build time (including getting them right for Windows as well), but it would make some other questions that are currently annoying to answer relatively straightforward (see http://stackoverflow.com/questions/6463918/how-can-i-get-a-list-of-all-the-p... for more on that)
> Do you prefer the one you suggested previously?
The only strong preference I have around how this is implemented is that I don't want to add complex single-purpose runtime infrastructure for the task. For all of the other specifics, I think it makes sense to err on the side of "What will be easiest to maintain over time?"
> Alternatively, can the contents of site.py be generated during the build process? Because if some modules couldn't be built, a custom implementation of sys.excepthook might be generated there with the data for the modules that failed to be built.
We don't really want site.py itself to be auto-generated (although it could be updated to use Argument Clinic selectively if we deemed that to be an appropriate thing to do), but there's no problem with generating either data modules or normal importable modules that get accessed from site.py.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I know that you started this thread focusing on the stdlib, but for the purpose of distributors, the scope goes far beyond just the stdlib. Basically any Python module or package which the distribution can provide should be usable as the basis for a nice error message pointing to the package to install.

Now, it's the distribution which knows which modules/packages are available, so we don't need a list of stdlib modules in Python to help with this. The helper function (whether called via sys.excepthook() or perhaps a new sys.importerrorhook()) would then check the imported module name against this list and write out the message pointing the user to the missing package.

A list of stdlib modules may still be useful, but it comes with its own set of problems, which should be irrelevant for this use case: some stdlib modules are optional and only available if the system provides (and Python can find) certain libs (or header files during compilation). For a distribution there are no optional stdlib modules, since the distributor will know the complete list of available modules in the distribution, including their external dependencies.

In other words: Python already provides all the necessary logic to enable implementing the suggested use case.
Cheers, Nick.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 07 2016)
Python Projects, Coaching and Consulting ... http://www.egenix.com/ Python Database Interfaces ... http://products.egenix.com/ Plone/Zope Database Interfaces ... http://zope.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/
On 7 December 2016 at 18:33, M.-A. Lemburg <mal@egenix.com> wrote:
I know that you started this thread focusing on the stdlib, but for the purpose of distributors, the scope goes far beyond just the stdlib.
Basically any Python module or package which the distribution can provide should be usable as basis for a nice error message pointing to the package to install.
The PEP draft covered two questions:

- experienced redistributors breaking the standard library up into pieces
- optional modules for folks building their own Python (even if they're new to that)
Now, it's the distribution which knows which modules/packages are available, so we don't need a list of stdlib modules in Python to help with this.
Right, that's the case that we realised can be covered entirely by the suggestion "patch site.py to install a different default sys.excepthook()"
A list of stdlib modules may still be useful, but it comes with its own set of problems, which should be irrelevant for this use case: some stdlib modules are optional and only available if the system provides (and Python can find) certain libs (or header files during compilation).
While upstream changes turned out not to be necessary for the "distributor breaking up the standard library" use case, they may still prove worthwhile in making import errors more informative in the case of "I just built my own Python from upstream sources and didn't notice (or didn't read) the build message indicating that some modules weren't built".

Given the precedent of the sysconfig metadata generation, providing some form of machine-readable build-time-generated module manifest should be pretty feasible if someone was motivated to implement it, and we already have the logic to track which optional modules weren't built in order to generate the message at the end of the build process.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan writes:
While upstream changes turned out not to be necessary for the "distributor breaking up the standard library" use case, they may still prove worthwhile in making import errors more informative in the case of "I just built my own Python from upstream sources and didn't notice (or didn't read) the build message indicating that some modules weren't built".
This case-by-case line of argument gives me a really bad feeling. Do we have to play whack-a-mole with every obscure message that pops up that somebody might not be reading? OK, this is a pretty common and confusing case, but surely there's something more systematic (and flexible vs. turning every error message into a complete usage manual ... which tl;dr) we can do. One way to play would be an interactive checklist-based diagnostic module (ie, a "rule-based expert system") that could be plugged into IDEs or even into sys.excepthook. Given Python's excellent introspective facilities, with a little care the rule interpreter could be designed with access to namespaces to provide additional detail or tweak rule priority. We could even build in a learning engine to give priority to users' habitual bugs (including typical mistaken diagnoses). That said, I don't have time to work on it :-(, so feel free to ignore me. And I grant that since AFAIK we have zero existing code for the engine and rule database, it might be a good idea to do something for some particular obscure errors in the 3.7 timeframe.
On 07.12.2016 13:57, Nick Coghlan wrote:
On 7 December 2016 at 18:33, M.-A. Lemburg <mal@egenix.com> wrote:
I know that you started this thread focusing on the stdlib, but for the purpose of distributors, the scope goes far beyond just the stdlib.
Basically any Python module or package which the distribution can provide should be usable as basis for a nice error message pointing to the package to install.
The PEP draft covered two questions:
- experienced redistributors breaking the standard library up into pieces
- optional modules for folks building their own Python (even if they're new to that)
Now, it's the distribution which knows which modules/packages are available, so we don't need a list of stdlib modules in Python to help with this.
Right, that's the case that we realised can be covered entirely by the suggestion "patch site.py to install a different default sys.excepthook()"
A list of stdlib modules may still be useful, but it comes with its own set of problems, which should be irrelevant for this use case: some stdlib modules are optional and only available if the system provides (and Python can find) certain libs (or header files during compilation).
While upstream changes turned out not to be necessary for the "distributor breaking up the standard library" use case, they may still prove worthwhile in making import errors more informative in the case of "I just built my own Python from upstream sources and didn't notice (or didn't read) the build message indicating that some modules weren't built".
Given the precedent of the sysconfig metadata generation, providing some form of machine-readable build-time-generated module manifest should be pretty feasible if someone was motivated to implement it, and we already have the logic to track which optional modules weren't built in order to generate the message at the end of the build process.
True, but the build process only covers C extensions. Writing the information somewhere for Python to pick up would be easy, though (just dump the .failed* lists somewhere). For pure Python modules, I suppose the install process could record all installed modules. Put all this info into a generated "_sysconfigstdlib" module, import this into sysconfig and you're set.

Still, in all the years I've been using Python I never ran into a situation where I was interested in such information. For cases where a module is optional, you usually write a try...except and handle this on a case-by-case basis. That's safer than relying on some build time generated list, since the Python binary may well have been built on a different machine than the one the application is currently running on and so, even if an optional module is listed as built successfully, it may still fail to import.

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 07 2016)
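The generated-module idea from the message above could look roughly like the sketch below. It is only an illustration: the "_sysconfigstdlib" name comes from the message, but `render_manifest`, `load_manifest`, `try_import`, `BUILT_MODULES` and `FAILED_MODULES` are hypothetical names, and nothing like this is assumed to exist in sysconfig. The `try_import` helper shows the case-by-case try...except that stays necessary, because a module listed as built may still fail to import on the machine where the application runs.

```python
import importlib


def render_manifest(built, failed):
    """Return source text for a hypothetical generated data module."""
    lines = [
        "# Generated at build time (hypothetical sketch); do not edit.",
        f"BUILT_MODULES = {sorted(built)!r}",
        f"FAILED_MODULES = {sorted(failed)!r}",
    ]
    return "\n".join(lines) + "\n"


def load_manifest(source):
    """Execute the generated source and return its namespace as a dict."""
    namespace = {}
    exec(source, namespace)
    return namespace


def try_import(name):
    """Case-by-case check: return the module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Example data only; a real build would pass in its own lists.
manifest = load_manifest(render_manifest({"zlib", "_ssl"}, {"_sqlite3"}))
```

The manifest answers "what did the build intend to provide?", while `try_import` answers "what actually works here?"; the two can disagree, which is the point made above.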
On Mon, Nov 28, 2016, at 10:51, Tomas Orsava wrote:
Could some Windows user please check if compiling Python with the current reference implementation [2] of this PEP indeed generates a `curses.missing.py` file among the stdlib files? If so, we might consider skipping the generation of the .missing.py file for the curses module on Windows.
"Skip it for curses on Windows" doesn't seem like an acceptable solution, because tomorrow there could be another module, on another platform, that needs a similar fix. I think it'd be better to fix the logic. Searching the whole path for whatever.py before searching for whatever.missing.py makes sense to me and I'm not sure why this isn't the proposal. Honestly, though, I'm not sure of the need for the PEP in general. "However, there is as of yet no standardized way of dealing with importing a missing standard library module." is simply not true. The standardized way of dealing with it is that the import statement will raise an ImportError exception. Why exactly is that not good enough? A distribution could, for example, include an excepthook in site.py that prints an informative error message when an ImportError is unhandled for a list of modules that it knows about. Or they could modify the *default* excepthook in the interpreter itself.
On 11/28/2016 08:35 AM, Random832 wrote:
On Mon, Nov 28, 2016, at 10:51, Tomas Orsava wrote:
Could some Windows user please check if compiling Python with the current reference implementation [2] of this PEP indeed generates a `curses.missing.py` file among the stdlib files? If so, we might consider skipping the generation of the .missing.py file for the curses module on Windows.
"Skip it for curses on Windows" doesn't seem like an acceptable solution, because tomorrow there could be another module, on another platform, that needs a similar fix. I think it'd be better to fix the logic. Searching the whole path for whatever.py before searching for whatever.missing.py makes sense to me [...]
Agreed.
Honestly, though, I'm not sure of the need for the PEP in general. "However, there is as of yet no standardized way of dealing with importing a missing standard library module." is simply not true. The standardized way of dealing with it is that the import statement will raise an ImportError exception. Why exactly is that not good enough?
Because it is unfriendly. Helpful error messages are a great tool to both beginner and seasoned programmers.
A distribution could, for example, include an excepthook in site.py that prints an informative error message when an ImportError is unhandled for a list of modules that it knows about. [...]
As you say above, that list will fall out of date. Better to have a standard method that is easily implemented. -- ~Ethan~
On Mon, Nov 28, 2016 at 12:05:01PM -0800, Ethan Furman wrote:
Honestly, though, I'm not sure of the need for the PEP in general. "However, there is as of yet no standardized way of dealing with importing a missing standard library module." is simply not true. The standardized way of dealing with it is that the import statement will raise an ImportError exception. Why exactly is that not good enough?
Because it is unfriendly. Helpful error messages are a great tool to both beginner and seasoned programmers.
Random already covers that. There's no reason why packagers can't fix that.
A distribution could, for example, include an excepthook in site.py that prints an informative error message when an ImportError is unhandled for a list of modules that it knows about. [...]
As you say above, that list will fall out of date. Better to have a standard method that is easily implemented.
I think you have misunderstood Random's observation. Random notes correctly that treating "curses on Windows" as a special case will get out of date. Today it's curses, tomorrow it might be curses and foo, then curses foo and bar, then just foo. Who knows? And what about Linux and Mac users? Might we start deploying third-party replacements for Windows-only std lib modules? (If any.)

This is effectively a black-list:

- don't add a .missing file for these modules

where the list depends on guessing what *other* people do.

But Random's observation doesn't apply to the packager. They cannot fall out of date, since they're generating a *white-list* of modules they have split out of the std lib into a separate package. Instead of the packager doing this:

- remove foo, bar, baz from the standard python package;
- add foo.missing, bar.missing, baz.missing to the python package;
- add foo, bar, baz to the python-extras package

Random suggests that they do this:

- remove foo, bar, baz from the standard python package;
- add foo, bar, baz to the list of modules that ImportError knows about;
- add foo, bar, baz to the python-extras package.

It can no more get out of date than can the .missing files.

Instead of adding a complex mechanism for searching the PYTHONPATH twice, the second time looking for .missing files, here's a counter proposal:

- Add a single config file in a known, standard place, call it "missing.ini" for the sake of the argument;

- If present, that file should be a list of module names as keys and custom error messages as values;

      foo: try running "yum install spam-python"
      bar: try running "yum install spam-python"
      baz: try running "yum install eggs-python"

- When ImportError is raised, Python looks at that file, and if the module name is found, it gives the custom error message in addition to the regular error message:

      import foo
      ImportError: No module named 'foo'
      try running "yum install spam-python"

-- Steve
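As a rough sketch of how the counter proposal might behave, the code below parses the "name: message" lines and appends the custom message to the regular error text. The function names are invented for illustration, and the parsing rules (one entry per line, blank lines and `#` comments skipped) are an assumption; the proposal itself only fixes the idea of module names as keys and messages as values.

```python
def parse_missing_file(text):
    """Parse 'module: message' lines into a dict, skipping blanks and comments."""
    hints = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, message = line.partition(":")
        if sep:
            hints[name.strip()] = message.strip()
    return hints


def describe_import_error(exc, hints):
    """Return the regular error text, plus the custom hint when one exists."""
    text = f"{type(exc).__name__}: {exc}"
    hint = hints.get(getattr(exc, "name", None))
    if hint:
        text += "\n" + hint
    return text


# Sample contents of the hypothetical "missing.ini" file, taken from the message above.
SAMPLE = '''
foo: try running "yum install spam-python"
bar: try running "yum install spam-python"
baz: try running "yum install eggs-python"
'''

hints = parse_missing_file(SAMPLE)
error = ImportError("No module named 'foo'", name="foo")
print(describe_import_error(error, hints))
```

Because the lookup is a plain dictionary access keyed on the exception's `name` attribute, a successful import never touches the file, and an unknown module falls through to the unmodified error text.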
On Mon, Nov 28, 2016, at 15:05, Ethan Furman wrote:
Because it is unfriendly. Helpful error messages are a great tool to both beginner and seasoned programmers.
There won't be a helpful error message unless the distributor writes one.
A distribution could, for example, include an excepthook in site.py that prints an informative error message when an ImportError is unhandled for a list of modules that it knows about. [...]
As you say above, that list will fall out of date. Better to have a standard method that is easily implemented.
Whatever the standard method is, it has to be something we can direct distributors to modify, it's simply not something Python can do on its own (which means maybe distributors should be part of the conversation here). The default exception hook is as good a place as any. Maybe write most of the logic and get the distributors to just populate an empty-by-default array of structs with the module name and error message (and what about localization?)

And the idea that building a ".missing.py" for every optional module that's disabled is going to be adequate is a bit naive. For one thing, they're not going to *be* disabled, the distributors are going to build the whole thing and break up the installed files into packages. And you've still got to get the distributors to actually put their friendly error message in those files, and the missing.py files are build artifacts instead of a source file that they can patch.
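The "empty-by-default table that distributors populate" idea could be sketched as a replacement exception hook along these lines. All names here (`KNOWN_MISSING`, `hint_for`, `excepthook_with_hints`) are hypothetical, and the sample entry merely reuses the Debian wording quoted in the PEP draft; a real table would ship empty upstream.

```python
import sys

# Hypothetical data a distributor would maintain; upstream it would be empty.
KNOWN_MISSING = {
    "tkinter": "please install the python3-tk package",
}


def hint_for(exc):
    """Return an install hint for an ImportError, or None if nothing is known."""
    if not isinstance(exc, ImportError):
        return None
    name = getattr(exc, "name", None)
    if name is None:
        return None
    # Look up the top-level package so "tkinter.ttk" maps to "tkinter".
    return KNOWN_MISSING.get(name.partition(".")[0])


def excepthook_with_hints(exc_type, exc, tb):
    """Print the normal traceback, then a hint if the module is known."""
    sys.__excepthook__(exc_type, exc, tb)
    hint = hint_for(exc)
    if hint is not None:
        print(f"Hint: {hint}", file=sys.stderr)


# Only uncaught exceptions reach the hook, so try/except fallbacks are unaffected.
sys.excepthook = excepthook_with_hints
```

Since the hook only runs for exceptions that nobody caught, code that already guards optional imports with try/except behaves exactly as before.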
On 11/28/2016 05:46 PM, Random832 wrote:
On Mon, Nov 28, 2016, at 15:05, Ethan Furman wrote:
Because it is unfriendly. Helpful error messages are a great tool to both beginner and seasoned programmers.
There won't be a helpful error message unless the distributor writes one.
The purpose of this PEP, if I understand correctly, is to settle on a standard for the location of that distributor-written helpful error message. As a bonus, CPython itself can use the same mechanism for modules that are possible to build in the stdlib, but weren't. This would be useful for folks that build their own version.
Whatever the standard method is, it has to be something we can direct distributors to modify, it's simply not something Python can do on its own.
<smart alec> --- Yo, Distributor! If you move tkinter to a separate module, please add a tkinter.missing file in your main python package where tkinter is supposed to be; this file should contain a helpful message on how to install the tkinter package! --- There. Done. </smart alec>
The default exception hook is as good a place as any. Maybe write most of the logic and get the distributors to just populate an empty-by-default array of structs with the module name and error message (and what about localization?)
This might handle the stdlib portion, but the PEP's solution could be easily extended to handle any Python application that is installable in pieces. As far as localization -- it's a small text file, surely there are mechanisms already to deal with that? (I don't know, it's not a problem I have to deal with.)
And the idea that building a ".missing.py" for every optional module that's disabled is going to be adequate is a bit naive.
Who said disabled? The PEP says missing, as in not there -- not disabled, as in there but ... what? not able to be used?
For one thing, they're not going to *be* disabled,
Ah, whew -- we agree on something! ;)
the distributors are going to build the whole thing and break up the installed files into packages. And you've still got to get the distributors to actually put their friendly error message in those files,
No, we don't. We provide a mechanism for them to use, and they use it or don't at their whim. It's a quality-of-implementation issue.
and the missing.py files are build artifacts instead of a source file that they can patch.
They are the ones that decide how to segment the stdlib, so they get to do the work. I would imagine, or at least hope, that they have the build and segmentation code under version control -- they can patch that. -- ~Ethan~
On Mon, Nov 28, 2016 at 02:32:16PM +0000, Paul Moore wrote:
Also, and possibly more of an issue, use of the ".missing.py" file will mean that a user can't provide their own implementation of the module later on sys.path. I don't know if this is a significant issue on Unix platforms. On Windows, there is a 3rd party implementation of the curses module which (as I understand it) can be user installed. If Python included a curses.missing.py, that would no longer work.
Certainly these are only minor points, but worth considering.
I don't think that's a minor point, I think its a very important one. -- Steve
On 11/28/2016 05:28 AM, Tomas Orsava wrote:
Rendered PEP: https://fedora-python.github.io/pep-drafts/pep-A.html
Overall +1, but using Guido's #2 option instead for handling *.missing.py files (searching all possible locations for the module before falling back to the stdlib xxx.missing.py default). -- ~Ethan~
On Mon, Nov 28, 2016 at 12:51 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
On 11/28/2016 05:28 AM, Tomas Orsava wrote:
Rendered PEP: https://fedora-python.github.io/pep-drafts/pep-A.html
Overall +1, but using Guido's #2 option instead for handling *.missing.py files (searching all possible locations for the module before falling back to the stdlib xxx.missing.py default).
Actually the .missing.py feature would be useful for other use cases, so it shouldn't be limited to the stdlib part of sys.path. (Also I'm withdrawing my idea of searching for it while searching for the original .py since that would burden successful imports with extra stat() calls.) -- --Guido van Rossum (python.org/~guido)
On 11/28/2016 01:01 PM, Guido van Rossum wrote:
On Mon, Nov 28, 2016 at 12:51 PM, Ethan Furman wrote:
On 11/28/2016 05:28 AM, Tomas Orsava wrote:
Rendered PEP: https://fedora-python.github.io/pep-drafts/pep-A.html
Overall +1, but using Guido's #2 option instead for handling *.missing.py files (searching all possible locations for the module before falling back to the stdlib xxx.missing.py default).
Actually the .missing.py feature would be useful for other use cases, so it shouldn't be limited to the stdlib part of sys.path. (Also I'm withdrawing my idea of searching for it while searching for the original .py since that would burden successful imports with extra stat() calls.)
Absolutely. The key point in your counter proposal is not failing at the first .missing.py file possible, but rather searching all possible locations first. If we do the full search for the import first, then a full search for the .missing.py, and that ends up not hurting performance at all for successful imports -- well, that's just icing on the cake. :)

One "successful" use-case that would be impacted is the fallback import idiom:

    try:
        # this would do two full searches before getting the error
        import BlahBlah
    except ImportError:
        import blahblah

-- ~Ethan~
On 28 November 2016 at 21:11, Ethan Furman <ethan@stoneleaf.us> wrote:
One "successful" use-case that would be impacted is the fallback import idiom:
    try:
        # this would do two full searches before getting the error
        import BlahBlah
    except ImportError:
        import blahblah
Under this proposal, the above idiom could potentially now fail. If there's a BlahBlah.missing.py, then that will get executed rather than an ImportError being raised, so the fallback wouldn't be executed. This could actually be a serious issue for code that currently protects against optional stdlib modules not being available like this. There's no guarantee that I can see that a .missing.py file would raise ImportError (even if we said that was the intended behaviour, there's nothing to enforce it). Could the proposal execute the .missing.py file and then raise ImportError? I could imagine that having problems of its own, though... Paul
On 28.11.2016 22:26, Paul Moore wrote:
On 28 November 2016 at 21:11, Ethan Furman <ethan@stoneleaf.us> wrote:
One "successful" use-case that would be impacted is the fallback import idiom:
    try:
        # this would do two full searches before getting the error
        import BlahBlah
    except ImportError:
        import blahblah
Under this proposal, the above idiom could potentially now fail. If there's a BlahBlah.missing.py, then that will get executed rather than an ImportError being raised, so the fallback wouldn't be executed. This could actually be a serious issue for code that currently protects against optional stdlib modules not being available like this. There's no guarantee that I can see that a .missing.py file would raise ImportError (even if we said that was the intended behaviour, there's nothing to enforce it).
Could the proposal execute the .missing.py file and then raise ImportError? I could imagine that having problems of its own, though...
How about addressing both concerns by triggering the search for .missing.py only if an ImportError bubbles up uncaught (a bit similar to StopIteration nowadays)? Wolfgang
On Mon, Nov 28, 2016 at 1:26 PM, Paul Moore <p.f.moore@gmail.com> wrote:
One "successful" use-case that would be impacted is the fallback import idiom:
    try:
        # this would do two full searches before getting the error
        import BlahBlah
    except ImportError:
        import blahblah
Under this proposal, the above idiom could potentially now fail.
Higher in the thread, someone said that ImportError was not robust enough, because it didn't give nearly as meaningful an error message as it might.
There's no guarantee that I can see that a .missing.py file would raise ImportError (even if we said that was the intended behaviour, there's nothing to enforce it).
There is nothing to enforce all sorts of things -- I don't think it's so wrong to have it in the spec that .missing.py files NEED to raise an ImportError, and they could give nice meaningful error messages that way without breaking old code.
Could the proposal execute the .missing.py file and then raise ImportError? I could imagine that having problems of its own, though...
If the ImportError is raised by the surrounding code, then it would need a protocol to get the nice error message -- raising an Exception is already a protocol -- let's use that one.

-CHB

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On 11/28/2016 01:26 PM, Paul Moore wrote:
On 28 November 2016 at 21:11, Ethan Furman <ethan@stoneleaf.us> wrote:
One "successful" use-case that would be impacted is the fallback import idiom:
    try:
        # this would do two full searches before getting the error
        import BlahBlah
    except ImportError:
        import blahblah
Under this proposal, the above idiom could potentially now fail. If there's a BlahBlah.missing.py, then that will get executed rather than an ImportError being raised, so the fallback wouldn't be executed.
Which is why the strong recommendation is for the .missing.py file to raise an ImportError exception, but with a useful error message, such as "Tkinter is not currently installed. Install python-tkinter to get it."
This could actually be a serious issue for code that currently protects against optional stdlib modules not being available like this. There's no guarantee that I can see that a .missing.py file would raise ImportError (even if we said that was the intended behaviour, there's nothing to enforce it).
Presumably the folks doing the splitting know what they are doing. Any CPython auto-generated .missing.py files would be correct: raise ImportError("tkinter was not compiled due to ...").
Could the proposal execute the .missing.py file and then raise ImportError? I could imagine that having problems of its own, though...
Yeah, I don't think that's a good idea. -- ~Ethan~
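The point of the recommendation can be checked with a small experiment. The sketch below does not implement the proposed ".missing.py" suffix; as a stand-in it writes an ordinary module that raises at import time, which is assumed to be close enough to show how the fallback idiom reacts. The module name "blahblah_demo" and its message are made up for the example.

```python
import os
import sys
import tempfile

# Stand-in for a ".missing.py" file: an ordinary module that raises on import.
# The module name "blahblah_demo" is made up for this example.
STUB_SOURCE = (
    "raise ModuleNotFoundError(\n"
    "    'blahblah_demo is not installed; install the separate package',\n"
    "    name='blahblah_demo')\n"
)

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "blahblah_demo.py"), "w", encoding="utf-8") as f:
        f.write(STUB_SOURCE)
    sys.path.insert(0, tmp)
    try:
        try:
            import blahblah_demo as impl
        except ImportError as exc:
            # ModuleNotFoundError subclasses ImportError, so the fallback runs.
            message = str(exc)
            import json as impl
    finally:
        sys.path.remove(tmp)
```

As long as the stub raises ImportError or a subclass, existing `except ImportError` fallbacks keep working and also gain access to the friendlier message; a stub that raised anything else, or nothing at all, would break them.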
On Mon, Nov 28, 2016 at 5:28 AM, Tomas Orsava <torsava@redhat.com> wrote: [...]
Specification =============
When, for any reason, a standard library module is not to be included with the rest, a file with its name and the extension ``.missing.py`` shall be created and placed in the directory the module itself would have occupied. This file can contain any Python code, however, it *should* raise a ModuleNotFoundError_ with a helpful error message.
Currently, when Python tries to import a module ``XYZ``, the ``FileFinder`` path hook goes through the entries in ``sys.path``, and in each location looks for a file whose name is ``XYZ`` with one of the valid suffixes (e.g. ``.so``, ..., ``.py``, ..., ``.pyc``). The suffixes are tried in order. If none of them are found, Python goes on to try the next directory in ``sys.path``.
The ``.missing.py`` extension will be added to the end of the list, and configured to be handled by ``SourceFileLoader``. Thus, if a module is not found in its proper location, the ``XYZ.missing.py`` file is found and executed, and further locations are not searched.
I'd suggest that we additionally specify that if we find a foo.missing.py, then the code is executed but -- unlike a regular module load -- it's not automatically inserted into sys.modules["foo"]. That seems like it could only create confusion. And it doesn't restrict functionality, because if someone really wants to implement some clever shenanigans, they can always modify sys.modules["foo"] by hand.

This also suggests that the overall error-handling flow for 'import foo' should look like:

1) run foo.missing.py
2) if it raises an exception: propagate that
3) otherwise, if sys.modules["foo"] is missing: raise some variety of ImportError.
4) otherwise, use sys.modules["foo"] as the object that should be bound to 'foo' in the original invoker's namespace

I think this might make everyone who was worried about exception handling downthread happy -- it allows a .missing.py file to successfully import if it really wants to, but only if it explicitly fulfills the 'import' requirement that the module should somehow be made available.

-n

-- Nathaniel J. Smith -- https://vorpus.org
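The four steps above might translate into something like the following sketch. It works on the source text of a hypothetical foo.missing.py rather than hooking into the real import machinery, and the function name `import_via_missing_file` is invented for illustration.

```python
import sys


def import_via_missing_file(name, source):
    """Apply the four steps to the source text of a hypothetical name.missing.py."""
    # Step 1: run the code without registering anything in sys.modules.
    # Step 2: any exception it raises simply propagates to the caller.
    exec(compile(source, f"{name}.missing.py", "exec"), {"__name__": name})
    # Step 3: the code finished but did not provide the module itself.
    if name not in sys.modules:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    # Step 4: the code registered a module by hand, so use it.
    return sys.modules[name]
```

With this flow a file that forgets to raise still produces an ImportError subclass, so the fallback idiom discussed elsewhere in the thread cannot be silently broken by a careless .missing.py file.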
On 28Nov2016 1419, Nathaniel Smith wrote:
I'd suggest that we additionally specify that if we find a foo.missing.py, then the code is executed but -- unlike a regular module load -- it's not automatically inserted into sys.modules["foo"]. That seems like it could only create confusion. And it doesn't restrict functionality, because if someone really wants to implement some clever shenanigans, they can always modify sys.modules["foo"] by hand.
In before Brett says "you can do this with an import hook", because, well, we can do this with an import hook :) Given that, this wouldn't necessarily need to be an executable file. The finder could locate a "foo.missing" file and raise ModuleNotFoundError with the contents of the file as the message. No need to allow/require any Python code at all, and no risk of polluting sys.modules. Cheers, Steve
On 28Nov2016 1433, Steve Dower wrote:
On 28Nov2016 1419, Nathaniel Smith wrote:
I'd suggest that we additionally specify that if we find a foo.missing.py, then the code is executed but -- unlike a regular module load -- it's not automatically inserted into sys.modules["foo"]. That seems like it could only create confusion. And it doesn't restrict functionality, because if someone really wants to implement some clever shenanigans, they can always modify sys.modules["foo"] by hand.
In before Brett says "you can do this with an import hook", because, well, we can do this with an import hook :)
And since I suggested it, here's a rough proof-of-concept:

    import importlib.abc
    import os
    import sys

    class MissingPathFinder(importlib.abc.MetaPathFinder):
        def find_spec(self, fullname, path, target=None):
            for p in (path or sys.path):
                file = os.path.join(p, fullname + ".missing")
                if os.path.isfile(file):
                    with open(file, 'r', encoding='utf-8') as f:
                        raise ModuleNotFoundError(f.read())

    sys.meta_path.append(MissingPathFinder())
    import foo

Add a "foo.missing" file to your working directory and you'll get the message from that instead of the usual one.

Cheers, Steve
On Mon, 28 Nov 2016 at 14:49 Steve Dower <steve.dower@python.org> wrote:
On 28Nov2016 1433, Steve Dower wrote:
On 28Nov2016 1419, Nathaniel Smith wrote:
I'd suggest that we additionally specify that if we find a foo.missing.py, then the code is executed but -- unlike a regular module load -- it's not automatically inserted into sys.modules["foo"]. That seems like it could only create confusion. And it doesn't restrict functionality, because if someone really wants to implement some clever shenanigans, they can always modify sys.modules["foo"] by hand.
In before Brett says "you can do this with an import hook", because, well, we can do this with an import hook :)
And since I suggested it, here's a rough proof-of-concept:
    import importlib.abc
    import os
    import sys

    class MissingPathFinder(importlib.abc.MetaPathFinder):
        def find_spec(self, fullname, path, target=None):
            for p in (path or sys.path):
                file = os.path.join(p, fullname + ".missing")
                if os.path.isfile(file):
                    with open(file, 'r', encoding='utf-8') as f:
                        raise ModuleNotFoundError(f.read())

    sys.meta_path.append(MissingPathFinder())
    import foo
Add a "foo.missing" file to your working directory and you'll get the message from that instead of the usual one.
Since this PEP directly affects import I'm going to weigh in.

First, this won't necessarily create more stat calls, depending on how it's implemented. importlib.machinery.FileFinder, which does the searching for files on sys.path, caches directory contents for as long as the granularity of the file system's mtime (e.g. a 1 second mtime granularity means directory contents are cached for 1 second). This means that if the check occurs within that granularity (whether immediately after looking for *.py or in some way through a second pass) then there's no file system overhead.

Second, as proposed the PEP probably shouldn't change importlib.machinery.SourceFileLoader and instead should return some new loader that only handles these *.missing.py files (just like a different loader is returned for extension modules). This allows the loader to be simpler and avoids causing any custom loader to no longer implement current semantics (although I have tried to structure things in importlib to make it so subclassing is an attractive option for people, so this isn't vital, just something to at least consider).

Third, Steve channeled me properly and this actually doesn't require any changes to any pre-existing code and can instead be implemented as an importlib.abc.MetaPathFinder that is at the end of sys.meta_path, which means there wouldn't be any local shadowing of modules available farther down sys.path (although this would lead to more stat calls).

Fourth, if you make a meta-path finder and use static data you do away with any performance issue with the file system. And since it would be installed at the end of sys.meta_path -- and thus after importlib.machinery.PathFinder -- you effectively shadow it with a successful import and so there's no need to worry about the information leaking out unless someone mucks with sys.meta_path.

Fifth, people have asked for some way to catch/log/manipulate the response of import when a module isn't found, e.g. renaming modules in 2/3 migrations was the first major instance of this, but I've heard others wanting to log this to detect what modules they should install for users in a cloud environment. This is a specific solution to a general problem that some have asked for, so it might warrant thinking about whether a more general solution could work (but never enough to warrant me trying to solve it above other issues).

Sixth, this would be easier to deal with if import got refactored into its own object and out of the sys module for easier manipulation of the import process. ;)

Seventh, these *.missing.py files, if they are directly executed, are totally going to be abused like *.pth files, I can just feel it in my bones. We need to be okay with this if we accept this PEP as-is.
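A rough sketch of the "meta-path finder with static data" idea from the fourth point might look like the following. This is only an illustration under stated assumptions: the class name StaticMissingFinder, the module name demo_optional_mod, and the package name in the message are all hypothetical and do not come from the thread.

```python
import importlib.abc
import sys


class StaticMissingFinder(importlib.abc.MetaPathFinder):
    """Raise a helpful error for modules listed in a static table.

    No file system access happens here: the table lives in memory.
    """

    def __init__(self, messages):
        self._messages = dict(messages)

    def find_spec(self, fullname, path=None, target=None):
        message = self._messages.get(fullname)
        if message is not None:
            raise ModuleNotFoundError(message, name=fullname)
        return None  # not one of ours; let the import fail as usual


# Hypothetical table a distributor might ship (names are made up).
finder = StaticMissingFinder({
    "demo_optional_mod": (
        "demo_optional_mod is not installed; "
        "please install the python3-demo package"
    ),
})

# Appended last, so it only runs after the regular finders found nothing.
sys.meta_path.append(finder)

try:
    import demo_optional_mod
except ModuleNotFoundError as exc:
    print(exc.msg)
```

Because the finder sits at the end of sys.meta_path, a module that is actually installed is found by the earlier finders and the table is never consulted for it.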
On 29.11.2016 00:50, Brett Cannon wrote:
Seventh, these *.missing.py files if they are directly executed are totally going to be abused like *.pth files, I can just feel it in my bones. We need to be okay with this if we accept this PEP as-is.
Since the purpose of the PEP was to allow distributors to guide users through the installation process of extra packages in order to get access to parts of the stdlib which are not installed, I think the PEP is overly broad in concept to address this one use case.

Just as with .pth files, the possibility to hook arbitrary code execution into the module search path will get abused for all kinds of weird things, esp. if the whole sys.path is scanned for the .missing.py module and not only the part where the stdlib lives (as was suggested in the thread).

So why not limit the PEP to just the intended use case?

I.e. define a static list of modules which do make up the Python stdlib and then have the importer turn a ModuleNotFoundError into a nice distribution-specific error message, if and only if the imported module is from the set of stdlib modules. The change of the error message could be done by having the distributor patch the importer, or we could have the importer call a function defined via sitecustomize.py by the distributor to return a message.

Thinking about this some more...

We don't even need a list of stdlib modules. Simply define a general purpose import error formatting function, e.g. sys.formatimporterror(), pass in the name of the module and let it determine the error message based on the available information.

A distributor could then provide a custom function that knows about the installed Python packages and then guides the user to install any missing ones.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Nov 29 2016)
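To make the formatting-function idea concrete, here is a minimal sketch. Note that sys.formatimporterror() is only a suggestion in this thread and does not exist in Python; the function format_import_error, the wrapper import_with_hint, and the DISTRO_PACKAGES table below are hypothetical stand-ins written as ordinary module-level code.

```python
import importlib

# Hypothetical distributor-supplied table: top-level module -> package name.
DISTRO_PACKAGES = {"demo_gui": "python3-demo-gui"}


def format_import_error(name):
    """Return the error message for a failed import of `name`."""
    package = DISTRO_PACKAGES.get(name.partition(".")[0])
    if package is None:
        return f"No module named {name!r}"
    return f"No module named {name!r}; try installing the {package} package"


def import_with_hint(name):
    """Import `name`, rewording the error through format_import_error()."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as exc:
        # exc.name is the module that was actually not found, which may be
        # a dependency of `name` rather than `name` itself.
        missing = exc.name or name
        raise ModuleNotFoundError(
            format_import_error(missing), name=missing
        ) from None
```

In the proposal itself the import machinery would call the formatter; the explicit wrapper here only stands in for that step so the sketch can run on an unmodified interpreter.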
On Tue, Nov 29, 2016 at 4:13 AM, M.-A. Lemburg <mal@egenix.com> wrote:
On 29.11.2016 00:50, Brett Cannon wrote:
Seventh, these *.missing.py files if they are directly executed are totally going to be abused like *.pth files, I can just feel it in my bones. We need to be okay with this if we accept this PEP as-is.
Since the purpose of the PEP was to allow distributors to guide users through the installation process of extra packages in order to get access to parts of the stdlib which are not installed, I think the PEP is overly broad in concept to address this one use case.
Just as with .pth files, the possibility to hook arbitrary code execution into the module search path will get abused for all kinds of weird things, esp. if the whole sys.path is scanned for the .missing.py module and not only the part where the stdlib lives (as was suggested in the thread).
So why not limit the PEP to just the intended use case?
I think a better question is why should we artificially limit the PEP? This is something that could be useful outside of the stdlib. At least for Linux packages it is common to split out optional components of a Python package into separate Linux packages to limit the size and dependencies of the main package. This could help a lot in that situation.
I.e. define a static list of modules which do make up the Python stdlib and then have the importer turn a ModuleNotFoundError error into a nice distribution specific error message, if and only if the imported module is from the set of stdlib modules.
This is hard to do in a general sense. The point is to be able to tell the user what package they should install to get that functionality, but there is no general rule as to what the package should be named, and platform-specific modules would not be installable at all. So every module would need its own error message defined.
Thinking about this some more...
We don't even need a list of stdlib modules. Simply define a general purpose import error formatting function, e.g. sys.formatimporterror(), pass in the name of the module and let it determine the error message based on the available information.
A distributor could then provide a custom function that knows about the installed Python packages and then guides the user to install any missing ones.
This is getting pretty complicated compared to simply defining a one-line text file containing the error message with the module name somewhere in the file name, as others have proposed.
On Tue, Nov 29, 2016 at 10:55:14AM -0500, Todd wrote:
On Tue, Nov 29, 2016 at 4:13 AM, M.-A. Lemburg <mal@egenix.com> wrote:
Just as with .pth files, the possibility to hook arbitrary code execution into the module search path will get abused for all kinds of weird things, esp. if the whole sys.path is scanned for the .missing.py module and not only the part where the stdlib lives (as was suggested in the thread).
So why not limit the PEP to just the intended use case?
I think a better question is why should we artificially limit the PEP?
Because YAGNI. Overly complex, complicated systems which do more than is needed "because it might be useful one day" are an anti-pattern.

The intended use-case is to allow Linux distributions to customize the error message on ImportError. From there, it is a small step to allow *other* people to do the same thing. But it is a BIG step to go from that to a solution that executes arbitrary code. Before we take that big step, we ought to have a good reason.
This is something that could be useful outside of the stdlib.
Sure. I don't think there is any proposal to prevent people outside of Linux package distributors from using this mechanism. I'm not sure how this would even be possible: if Red Hat or Debian can create a .missing file, so can anyone else.
At least for Linux packages it is common to split out optional components of a Python package into separate Linux packages to limit the size and dependencies of the main package. This could help a lot in that situation.
Well... I'm not sure how common that is. But it doesn't really matter. This is a good argument for having a separate .missing file for each module, rather than a single flat registry of custom error messages. Separate .missing files will allow any Python package to easily install their own message.

But either way, whether there's a single registry or an import hook that searches for .missing files if and only if the import failed, I haven't seen a strong argument for allowing arbitrary Python code.

(Earlier I suggested such a flat registry -- I now withdraw that suggestion. I'm satisfied that a separate spam.missing file containing the custom error message when spam.py cannot be found is a better way to handle this.)

--
Steve
On 28 November 2016 at 22:33, Steve Dower <steve.dower@python.org> wrote:
Given that, this wouldn't necessarily need to be an executable file. The finder could locate a "foo.missing" file and raise ModuleNotFoundError with the contents of the file as the message. No need to allow/require any Python code at all, and no risk of polluting sys.modules.
I like this idea. Would it completely satisfy the original use case for the proposal? (Or, to put it another way, is there any specific need for arbitrary code execution in the missing.py file?) Paul
On 29.11.2016 10:39, Paul Moore wrote:
On 28 November 2016 at 22:33, Steve Dower <steve.dower@python.org> wrote:
Given that, this wouldn't necessarily need to be an executable file. The finder could locate a "foo.missing" file and raise ModuleNotFoundError with the contents of the file as the message. No need to allow/require any Python code at all, and no risk of polluting sys.modules.
I like this idea. Would it completely satisfy the original use case for the proposal? (Or, to put it another way, is there any specific need for arbitrary code execution in the missing.py file?)
The only thing that I could think of so far would be cross-platform .missing.py files that query the system (e.g. using the platform module) to generate adequate messages for the specific platform or distro. E.g., correctly recommend to use dnf install or yum install or apt install, etc.
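A sketch of what such a platform-aware message generator could look like follows. It is an assumption-laden illustration: instead of the platform module it probes for a package manager with shutil.which, and the helper name install_hint and the INSTALL_COMMANDS table are hypothetical. Package names also differ between distributions, which this sketch does not try to solve.

```python
import shutil

# Hypothetical lookup order: (executable to probe, command to recommend).
INSTALL_COMMANDS = (
    ("dnf", "dnf install"),
    ("yum", "yum install"),
    ("apt", "apt install"),
)


def install_hint(package, which=shutil.which):
    """Suggest an install command based on the package manager found.

    `which` is injectable so the lookup can be tested without touching
    the real system.
    """
    for tool, command in INSTALL_COMMANDS:
        if which(tool):
            return f"{command} {package}"
    return f"install the {package} package with your system's package manager"


# A .missing.py file could then end with something like:
#   raise ModuleNotFoundError("tkinter is not installed; try: "
#                             + install_hint("python3-tk"))
print(install_hint("python3-tk"))
```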
On 29 November 2016 at 10:51, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
On 29.11.2016 10:39, Paul Moore wrote:
On 28 November 2016 at 22:33, Steve Dower <steve.dower@python.org> wrote:
Given that, this wouldn't necessarily need to be an executable file. The finder could locate a "foo.missing" file and raise ModuleNotFoundError with the contents of the file as the message. No need to allow/require any Python code at all, and no risk of polluting sys.modules.
I like this idea. Would it completely satisfy the original use case for the proposal? (Or, to put it another way, is there any specific need for arbitrary code execution in the missing.py file?)
The only thing that I could think of so far would be cross-platform .missing.py files that query the system (e.g. using the platform module) to generate adequate messages for the specific platform or distro. E.g., correctly recommend to use dnf install or yum install or apt install, etc.
Yeah. I'd like to see a genuine example of how that would be used in practice, otherwise I'd be inclined to suggest YAGNI. (Particularly given that this PEP is simply a standardised means of vendor customisation - for special cases, vendors obviously still have the capability to patch or override standard behaviour in any way they like). Paul
On Nov 29, 2016 5:51 AM, "Wolfgang Maier" <wolfgang.maier@biologie.uni-freiburg.de> wrote:
On 29.11.2016 10:39, Paul Moore wrote:
On 28 November 2016 at 22:33, Steve Dower <steve.dower@python.org> wrote:
Given that, this wouldn't necessarily need to be an executable file. The finder could locate a "foo.missing" file and raise ModuleNotFoundError with the contents of the file as the message. No need to allow/require any Python code at all, and no risk of polluting sys.modules.
I like this idea. Would it completely satisfy the original use case for the proposal? (Or, to put it another way, is there any specific need for arbitrary code execution in the missing.py file?)
The only thing that I could think of so far would be cross-platform .missing.py files that query the system (e.g. using the platform module) to generate adequate messages for the specific platform or distro. E.g., correctly recommend to use dnf install or yum install or apt install, etc.
In those cases it would probably be as easy, if not easier, to do that at build-time, which would get us back to simple text files. Making a standard script is hard, if not impossible, in many cases because the package name often does not match the module name. So you are going to need manual intervention in many cases, and modifying a one-line text file is going to be easier than modifying a script.
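As an illustration of the build-time approach, a packaging script could write the one-line text files from a hand-maintained table. This is a sketch under assumptions: the function name write_missing_files, the ".missing" file naming, and the example table are hypothetical, and only top-level module names are handled.

```python
from pathlib import Path


def write_missing_files(stdlib_dir, messages):
    """Write one `<module>.missing` text file per omitted module.

    `messages` maps a top-level module name to the error message that
    should be shown when that module cannot be imported.
    """
    target = Path(stdlib_dir)
    written = []
    for module, message in sorted(messages.items()):
        path = target / f"{module}.missing"
        path.write_text(message + "\n", encoding="utf-8")
        written.append(path)
    return written


# Hypothetical table maintained by the packager; package names often do
# not match module names, so each entry is filled in by hand.
EXAMPLE_MESSAGES = {
    "tkinter": "please install the python3-tk package",
}
```

Editing an entry in such a table, or the resulting one-line file, needs no knowledge of Python on the packager's side.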
On 28.11.2016 23:19, Nathaniel Smith wrote:
I'd suggest that we additionally specify that if we find a foo.missing.py, then the code is executed but -- unlike a regular module load -- it's not automatically inserted into sys.modules["foo"]. That seems like it could only create confusion. And it doesn't restrict functionality, because if someone really wants to implement some clever shenanigans, they can always modify sys.modules["foo"] by hand.
This also suggests that the overall error-handling flow for 'import foo' should look like:
1) run foo.missing.py
2) if it raises an exception: propagate that
3) otherwise, if sys.modules["foo"] is missing: raise some variety of ImportError.
4) otherwise, use sys.modules["foo"] as the object that should be bound to 'foo' in the original invoker's namespace
I think this might make everyone who was worried about exception handling downthread happy -- it allows a .missing.py file to successfully import if it really wants to, but only if it explicitly fulfills the 'import' requirement that the module should somehow be made available.
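The four steps above can be sketched as a small function. This is not how the import system is implemented; it is a hypothetical helper (import_via_missing_file) that takes the source text of a foo.missing.py file directly, so the flow can be followed without patching importlib.

```python
import sys


def import_via_missing_file(name, source):
    """Follow the proposed flow for `import <name>`.

    `source` is the text of the (hypothetical) <name>.missing.py file.
    """
    # Steps 1 and 2: run the file; any exception it raises propagates.
    code = compile(source, f"{name}.missing.py", "exec")
    exec(code, {"__name__": f"{name}.missing"})

    # Step 3: the file ran cleanly but did not provide the module.
    if name not in sys.modules:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)

    # Step 4: the file registered a module itself; hand that back.
    return sys.modules[name]
```

A file that only raises ModuleNotFoundError with a custom message exercises steps 1 and 2, while a file that assigns to sys.modules[name] reaches step 4.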
A refined (from my previous post which may have ended up too nested) alternative: instead of triggering an immediate search for a .missing.py file, why not have the interpreter intercept any ModuleNotFoundError that bubbles up to the top without being caught, then use the name attribute of the exception to look for the .missing.py file. Agreed, this is more complicated to implement, but it would avoid any performance loss in situations where running code knows how to deal with the missing module anyway.

Wolfgang
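One way to approximate this "only at the top level" behaviour today is sys.excepthook, which runs only for exceptions nobody caught. The sketch below makes some assumptions that are not in the post: it reads plain-text "<name>.missing" files rather than executing .missing.py, and the names find_missing_message and missing_aware_excepthook are hypothetical.

```python
import os
import sys


def find_missing_message(name, search_path=None):
    """Return the text of the first `<name>.missing` file found, or None."""
    entries = sys.path if search_path is None else search_path
    for entry in entries:
        candidate = os.path.join(entry or os.curdir, name + ".missing")
        if os.path.isfile(candidate):
            with open(candidate, encoding="utf-8") as f:
                return f.read().strip()
    return None


def missing_aware_excepthook(exc_type, exc, tb):
    """Print the normal traceback, then a hint for an unfound module."""
    sys.__excepthook__(exc_type, exc, tb)
    if isinstance(exc, ModuleNotFoundError) and exc.name:
        hint = find_missing_message(exc.name)
        if hint:
            print(hint, file=sys.stderr)


# The file lookup happens only when the exception is about to end the
# program, so code that catches ImportError pays nothing extra.
sys.excepthook = missing_aware_excepthook
```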
On 11/28/2016 02:42 PM, Wolfgang Maier wrote:
A refined (from my previous post which may have ended up too nested) alternative: instead of triggering an immediate search for a .missing.py file, why not have the interpreter intercept any ModuleNotFoundError that bubbles up to the top without being caught, then use the name attribute of the exception to look for the .missing.py file. Agreed, this is more complicated to implement, but it would avoid any performance loss in situations where running code knows how to deal with the missing module anyway.
So we only have the hit when the exception is going to kill the interpreter? +1 -- ~Ethan~
On Tue, Nov 29, 2016 at 9:19 AM, Nathaniel Smith <njs@pobox.com> wrote:
This also suggests that the overall error-handling flow for 'import foo' should look like:
1) run foo.missing.py
2) if it raises an exception: propagate that
3) otherwise, if sys.modules["foo"] is missing: raise some variety of ImportError.
4) otherwise, use sys.modules["foo"] as the object that should be bound to 'foo' in the original invoker's namespace
+1, because this also provides a coherent way to reword the try/except import idiom:

    # Current idiom
    # somefile.py
    try:
        import foo
    except ImportError:
        import subst_foo as foo

    # New idiom:
    # foo.missing.py
    import subst_foo as foo
    import sys; sys.modules["foo"] = foo

    # somefile.py
    import foo

ChrisA
On 28.11.2016 23:52, Chris Angelico wrote:
+1, because this also provides a coherent way to reword the try/except import idiom:
    # Current idiom
    # somefile.py
    try:
        import foo
    except ImportError:
        import subst_foo as foo

    # New idiom:
    # foo.missing.py
    import subst_foo as foo
    import sys; sys.modules["foo"] = foo

    # somefile.py
    import foo
Hmm. I would rather take this example as an argument against the proposed behavior. It invites too many clever hacks. I thought that the idea was that .missing.py does *not* act as a replacement module, but, more or less, just as a message generator.
On Mon, Nov 28, 2016 at 6:00 PM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
On 28.11.2016 23:52, Chris Angelico wrote:
+1, because this also provides a coherent way to reword the try/except import idiom:
    # Current idiom
    # somefile.py
    try:
        import foo
    except ImportError:
        import subst_foo as foo

    # New idiom:
    # foo.missing.py
    import subst_foo as foo
    import sys; sys.modules["foo"] = foo

    # somefile.py
    import foo
Hmm. I would rather take this example as an argument against the proposed behavior. It invites too many clever hacks. I thought that the idea was that .missing.py does *not* act as a replacement module, but, more or less, just as a message generator.
Is there a reason we need a full-blown message generator? Why couldn't there just be a text file, and the contents of that text file are used as the error message for an ImportError?
On Tue, Nov 29, 2016 at 09:52:21AM +1100, Chris Angelico wrote:
+1, because this also provides a coherent way to reword the try/except import idiom:
    # Current idiom
    # somefile.py
    try:
        import foo
    except ImportError:
        import subst_foo as foo
Nice, clean and self-explanatory: import a module, if it fails, import its replacement. Obviously that idiom must die. *wink*
    # New idiom:
    # foo.missing.py
    import subst_foo as foo
    import sys; sys.modules["foo"] = foo

    # somefile.py
    import foo
Black magic where the replacement happens out of sight. What if I have two files?

    # a.py
    try:
        import spam
    except ImportError:
        import ham as spam

    # b.py
    try:
        import spam
    except ImportError:
        import cornedbeef as spam

The current idiom makes it the responsibility of the importer to decide what happens when the import fails. Perhaps it is as simple as importing a substitute module, but it might log a warning, monkey-patch some classes, who knows? The important thing is that the importer decides whether to fail or try something else.

Your proposal means that the importer no longer has any say in the matter. The administrator of the site chooses what .missing files get added, they decide whether or not to log a warning, they decide whether to substitute ham.py or cornedbeef.py for spam.

I do not like this suggestion, and the mere possibility that it could happen makes this entire .missing suggestion a very strong -1 from me.

To solve the simple problem of providing a more useful error message when an ImportError occurs, we do not need to execute arbitrary code in a .missing.py file. That's using a nuclear-powered bulldozer to crack a peanut.

--
Steve
On Tue, Nov 29, 2016 at 12:14 PM, Steven D'Aprano <steve@pearwood.info> wrote:
What if I have two files?
    # a.py
    try:
        import spam
    except ImportError:
        import ham as spam

    # b.py
    try:
        import spam
    except ImportError:
        import cornedbeef as spam
In the same project? Then you already have a maintenance nightmare, because 'spam' will sometimes mean the same module (with state shared between the files), but might mean two distinct modules (and thus unrelated module objects). In different projects? They won't conflict. ChrisA
On 29 Nov 2016, at 02:48, Chris Angelico <rosuav@gmail.com> wrote:
On Tue, Nov 29, 2016 at 12:14 PM, Steven D'Aprano <steve@pearwood.info> wrote:
What if I have two files?
    # a.py
    try:
        import spam
    except ImportError:
        import ham as spam

    # b.py
    try:
        import spam
    except ImportError:
        import cornedbeef as spam
In the same project? Then you already have a maintenance nightmare, because 'spam' will sometimes mean the same module (with state shared between the files), but might mean two distinct modules (and thus unrelated module objects). In different projects? They won't conflict.
Well, you *might* have a maintenance nightmare, but you might not. In particular, I should point out that “spam” is just a name (more correctly referred to as a.spam and b.spam.) If the “spam” module is intended to have global state that the “a” and “b” modules use to communicate then obviously this is a problem. But if it isn’t, then there is exactly no problem with each module choosing its own fallback.

As a really silly example, consider sqlite3 again. If there were third-party modules that both implement the sqlite3 API, then there is no reason for each module to agree on what sqlite3 module they use unless types are being passed between them. If we consider “a” and “b” as truly separate non-communicating modules, then there’s no issue at all.

Cory
participants (20)

- Alexandre Brault
- Brett Cannon
- Chris Angelico
- Chris Barker
- Cory Benfield
- Eric V. Smith
- Ethan Furman
- Guido van Rossum
- M.-A. Lemburg
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Random832
- Stephen J. Turnbull
- Steve Dower
- Steven D'Aprano
- Terry Reedy
- Todd
- Tomas Orsava
- Wolfgang Maier