On Thu, Sep 2, 2021 at 10:46 PM Gregory Szorc <gregory.szorc@gmail.com> wrote:
Over in https://bugs.python.org/issue45020 there is some exciting work around expanding the use of the frozen importer to speed up Python interpreter startup. I wholeheartedly support the effort and don't want to discourage progress in this area.
Simultaneously, I've been down this path before with PyOxidizer and feel like I have some insight to share.
Thanks for the support and for taking the time to share your insight! Your work on PyOxidizer is really neat. Before I dive in to replying, I want to be clear about what we are discussing here. There are two related topics: the impact of freezing stdlib modules and usability problems with frozen modules in general (stdlib or not). https://bugs.python.org/issue45020 is concerned with the former but prompted some good discussion about the latter. From what I understand, this python-dev thread is more about the latter (and then some). That's totally worth discussing! I just don't want the two topics to be unnecessarily conflated.

FYI, frozen modules (effectively the .pyc data) are compiled into the Python binary and then loaded from there during import rather than from the filesystem. This allows us to avoid disk access, giving us a performance benefit, but we still have to unmarshal and execute the module code. It also allows us to have the import machinery written in pure Python (importlib._bootstrap and importlib._bootstrap_external). (Thanks Brett!)

While frozen modules are derived from .py files, they currently have some differences from the corresponding source modules: the loader (which has less capability), the repr, frozen packages have __path__ set to [], and frozen modules don't have __file__, __cached__, etc. set. This has been the case for a long time. MAL worked on addressing __file__ but the effort stalled out. (See https://bugs.python.org/issue45020#msg400769 and especially https://bugs.python.org/issue21736.) The challenge with solving this for non-stdlib modules is that the frozen importer would need help to know where to find corresponding .py files.

bpo-45020 is about freezing a small subset of the stdlib as a performance improvement. It's the 11 stdlib modules (plus encodings) that get imported every time during "./python -c pass". Freezing them provides a roughly 15% startup time improvement.
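For anyone who wants to see which modules an interpreter pulls in at startup, here is a minimal sketch. The helper name modules_loaded_at_startup is hypothetical, and the exact list it returns depends on the Python version, the platform, and the local site configuration, so it should not be expected to match the list discussed in this thread.

```python
import subprocess
import sys


def modules_loaded_at_startup(extra_args=()):
    """Return the sorted names found in sys.modules of a fresh interpreter."""
    # Use a child process so the imports of the current process are not counted.
    code = "import sys; print('\\n'.join(sorted(sys.modules)))"
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # Module names contain no whitespace, so splitting on whitespace is safe.
    return result.stdout.split()


if __name__ == "__main__":
    for name in modules_loaded_at_startup():
        print(name)
```

Passing extra_args=("-S",) or ("-I",) shows how skipping site or running in isolated mode changes the set.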
(The 11 modules are: abc, codecs, encodings, io, _collections_abc, _sitebuiltins, os, os.path, genericpath, site, and stat. Maybe there are a few other modules it would make sense to freeze but we're starting with those 11.) This work is probably somewhat affected by the differences between frozen and source modules, and we may need to set an appropriate __file__ on frozen stdlib modules to avoid impacting folks that expect any of those stdlib modules to have it set. Otherwise, for bpo-45020 there likely isn't much more we need to do about frozen stdlib modules shipping with CPython by default. Regardless, bpo-45020 doesn't introduce any new problems; rather it slightly exposes the existing ones.

In contrast to the use of frozen modules in default Python builds, there are a number of tools in the community for freezing modules (both stdlib and not) into custom Python binaries, like PyOxidizer and MAL's PyRun. Such tools would benefit from broader compatibility between frozen modules and the corresponding source modules. Consequently the tool maintainers would be the most likely drivers of any effort to improve frozen modules (which the discussion with MAL and Gregory bears out). The tools would especially benefit if those improvements could apply to non-stdlib modules, which requires a more complex solution than is needed for stdlib modules.

The (relative) extreme is to throw out the existing frozen module approach (or even the "unmarshal + exec" approach of source-based modules) and replace it with something more efficient and/or more compatible (and cross-platform). From what I understood, this is the main focus of this thread. It's interesting stuff and I hope the discussion renders a productive result.

FTR, in bpo-45020 Gregory helpfully linked to some insightful material related to PyOxidizer and frozen modules:

* https://github.com/indygreg/PyOxidizer/issues/69
* https://pyoxidizer.readthedocs.io/en/stable/oxidized_importer_behavior_and_c...
* https://pypi.org/project/oxidized-importer/ and https://pyoxidizer.readthedocs.io/en/stable/oxidized_importer.html

With that said, on to replying. :)
I don't think I'll be offending anyone by saying the existing CPython frozen importer is quite primitive in terms of functionality: it does the minimum it needs to do to support importing module bytecode embedded in the interpreter binary [for purposes of bootstrapping the Python-based importlib modules]. The C struct representing frozen modules is literally just the module name and a pointer to a sized buffer containing bytecode.
I suppose one question is if "primitive" is enough. The current approach is certainly straightforward and relatively easy to quickly wrap one's brain around. Would an alternative approach provide sufficient advantage to offset extra complexity or the cost of changing things in case it isn't more complex ("status quo wins a stalemate")?
In issue45020 there is talk of enhancing the functionality of the frozen importer to support its potential broader use. For example, setting __file__ or exposing .__loader__.get_source(). I support the overall initiative.
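To make the missing pieces concrete, here is a small sketch that inspects import metadata without importing anything. It assumes a standard CPython build, which ships a tiny frozen test module named __hello__; the describe helper is hypothetical, and json is used only as a convenient source-based module for comparison.

```python
import importlib.machinery
import importlib.util


def describe(name):
    """Summarize import metadata for a module without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(name)
    loader = spec.loader
    return {
        # "frozen", "built-in", or a filesystem path for source modules.
        "origin": spec.origin,
        # True only when the origin is a location the module can be loaded from;
        # the import system sets __file__ from the spec only in that case.
        "has_location": spec.has_location,
        # The frozen loader is a class; source loaders are instances.
        "loader": getattr(loader, "__name__", type(loader).__name__),
    }


if __name__ == "__main__":
    print(describe("__hello__"))
    print(describe("json"))
    # The frozen loader cannot hand back source text.
    print(importlib.machinery.FrozenImporter.get_source("__hello__"))
```

The frozen spec reports no location and its loader returns None from get_source(), which is the gap being discussed here.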
However, introducing enhanced functionality of the frozen importer will at the C level require either:
bpo-45020 isn't about improving the functionality of the frozen importer but rather about using it to speed up startup (and then not breaking users that expect __file__ on stdlib modules).
a) backwards incompatible changes to the C API to support additional metadata on frozen modules (or at the very least a supplementary API that fragments what a "frozen" module is).
What part of the C-API, specifically? I'm aware of PyImport_ImportFrozenModule() and PyImport_ImportFrozenModuleObject(), as well as PyImport_FrozenModules, none of which would need to change (nor would become backward-incompatible). I most certainly could have missed something. Other than that API, it's all implementation details. We cover it with tooling (like Tools/scripts/freeze_modules.py and Tools/freeze/freeze.py) rather than C-API, no?
b) CPython only hacks to support additional functionality for "freezing" the standard library for purposes of speeding up startup.
That's definitely what we would do in the short-term. However, any solution we would pursue would definitely have to be done in a way that doesn't break when used with non-stdlib modules. Basically we're aiming to preserve the status quo behavior where it matters. (FWIW, "hack" isn't the word I'd use. :) As a core developer I'm firmly committed to the health of the project, which includes keeping code as maintainable as possible and pursuing solid solutions even if they only solve some of the problems we'd like to address.)
I'm not a CPython core developer, but neither "a" nor "b" seems ideal to me. "a" is backwards incompatible. "b" seems like a stop-gap solution until a more generic version is available outside the CPython standard library.
From my experience with PyOxidizer and software in general, here is what I think is going to happen:
1. CPython enhances the frozen importer to be usable in more situations.
2. Python programmers realize this solution has performance and ease-of-distribution wins and want to use it more.
3. Limitations in the frozen importer are found. Bugs are reported. Feature requests are made.
4. The frozen importer keeps getting incrementally extended or Python developers grow frustrated that its enhancements are only available to the standard library.
Yeah, that's usually how it goes in open source. :) That said, with bpo-45020 the only change relative to the frozen importer is that we would use it for more stdlib modules. So I suppose it could make more people aware of the idea of frozen modules, though I hope not -- that would probably only happen if they start getting unexpected failures, which is what we'd like to avoid. All the limitations are already there. I suppose the relevant question is about the community weight behind those steps. I expect that twitter will never blow up with threads about frozen modules. :)
You end up slowly reimplementing the importing mechanism in C (remember Python 2?) or disappoint users.
I'm not sure I follow. What part of the import system would be reimplemented in C? The frozen importer is written in pure Python with a few small helpers written in C. I expect that nearly all necessary changes would happen in Lib/importlib/_bootstrap.py and not Python/import.c.
Rather than extending the frozen importer, I would suggest considering an alternative solution that is far more useful to the long-term success of Python: I would consider building a fully-featured, generic importer that is capable of importing modules and resource data from a well-defined and portable serialization format / data structure that isn't defined by C structs and APIs.
Instead of defining module bytecode (and possible additional minimal metadata) in C structs in a frozen modules array (or an equivalent C API), what if we instead defined a serialization format for representing the contents of loadable Python data (module source, module bytecode, resource files, extension module library data, etc)? We could then point the Python interpreter at instances of this data structure (in memory or in files) so it could import/load the resources within using a meta path importer.
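As a rough illustration of the meta path importer half of this idea, the sketch below serves modules from an in-memory mapping. Everything in it is hypothetical (the DictFinder class, the demo_pkg names, the "in-memory" origin string), and it stores source text rather than bytecode purely to keep the example short.

```python
import importlib.abc
import importlib.util
import sys


class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader that serves modules from a dict."""

    def __init__(self, sources):
        # Maps fully qualified module name -> (source text, is_package).
        self._sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self._sources:
            return None  # let the next finder on sys.meta_path try
        _, is_package = self._sources[fullname]
        return importlib.util.spec_from_loader(
            fullname, self, origin="in-memory", is_package=is_package
        )

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        source, _ = self._sources[module.__name__]
        code = compile(source, f"<in-memory {module.__name__}>", "exec")
        exec(code, module.__dict__)

    def get_source(self, fullname):
        return self._sources[fullname][0]


sys.meta_path.insert(0, DictFinder({
    "demo_pkg": ("", True),
    "demo_pkg.greet": ("def hello():\n    return 'hello from memory'\n", False),
}))

from demo_pkg.greet import hello

print(hello())
```

Because the spec has no filesystem location, the resulting modules get an empty __path__ and no __file__, much like frozen modules; a fuller design would have to decide what to report there.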
What if this serialization format were designed so that it was extremely efficient to parse and imports could be serviced with the same trivially minimal overhead that the frozen importer currently has? We could embed these data structures in produced binaries and achieve the same desirable results we'll be getting in issue45020 all while delivering a more generic solution.
FWIW, the performance benefits from bpo-45020 are almost completely from avoiding disk access. We still pay the cost of unmarshaling and executing each module's code object. So any solution that can be more efficient than "unmarshal + exec" would be a total win! Involving the filesystem mostly kills any benefit (with the possible exception of zipimport where you take the hit once). Note that the concept of improving on "unmarshal + exec" isn't new and a variety of prior art exists. In fact, there are many possible approaches to beating the performance of "unmarshal + exec", with varying degrees of complexity and effectiveness. Guido has been exploring several, e.g. https://github.com/faster-cpython/ideas/issues/84.
What if this serialization format were portable across machines? The entire Python ecosystem could leverage it as a container format for distributing Python resources. Rather than splatting dozens or hundreds of files on the filesystem, you could write a single file with all of a package's resources. Bugs around filesystem implementation details such as case (in)sensitivity and Unicode normalization go away. Package installs are quicker. Run-time performance is better due to faster imports.
(OK, maybe that last point brings back bad memories of eggs and you instinctively reject the idea. Or you have concerns about development ergonomics when module source code isn't in standalone editable files. These are fair points!)
What if the Python interpreter gains an "app mode" where it is capable of being paired with a single "resources file" and running the application within? Think running zip applications today, but a bit faster, more tailored to Python, and more fully featured.
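For comparison, this is roughly what running a zip application looks like today using the stdlib zipapp module. The file names and contents are made up for the example; it only shows the existing mechanism, not the proposed "app mode".

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    # Lay out a tiny application: an entry point plus one helper module.
    src = pathlib.Path(tmp) / "app"
    src.mkdir()
    (src / "__main__.py").write_text("import helper\nprint(helper.message())\n")
    (src / "helper.py").write_text(
        "def message():\n    return 'running from a single file'\n"
    )

    # Bundle the directory into one archive the interpreter can execute.
    target = pathlib.Path(tmp) / "app.pyz"
    zipapp.create_archive(src, target)

    # The interpreter runs __main__.py from inside the archive.
    output = subprocess.run(
        [sys.executable, str(target)],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()

print(output)
```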
What if an efficient binary serialization format could be leveraged as a cache to speed up subsequent interpreter startups?
Brett Cannon and I (and others) have talked on several occasions about a possible replacement for the marshal format. Just keep in mind that there are 3 performance-impacting parts to importing a (cached) source module: disk access, unmarshal, exec. (If not cached then there's even more disk access, as well as a marshal step.) It will be hard for a replacement to get past 2x performance improvement without solving all three.
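The three parts can be separated in a small experiment like the one below. It is a sketch under several assumptions: the module is synthetic, the payload is raw marshal data (real .pyc files add a small header in front), and the file was just written, so it is almost certainly served from the OS cache and the disk number will understate a cold read.

```python
import marshal
import os
import tempfile
import time

# A synthetic module with many small functions, so each step has work to do.
SOURCE = "\n".join(f"def f{i}(x):\n    return x + {i}" for i in range(2000))

# The "marshal step": normally paid once, when the cache file is written.
payload = marshal.dumps(compile(SOURCE, "<demo>", "exec"))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def read_bytes(path):
    with open(path, "rb") as fh:
        return fh.read()


data, t_disk = timed(read_bytes, tmp.name)      # 1. disk access
code, t_unmarshal = timed(marshal.loads, data)  # 2. unmarshal
namespace = {}
_, t_exec = timed(exec, code, namespace)        # 3. exec
os.unlink(tmp.name)

for label, seconds in [("disk", t_disk), ("unmarshal", t_unmarshal), ("exec", t_exec)]:
    print(f"{label:>9}: {seconds * 1e6:8.1f} us")
```

A frozen module skips only the first step, which is why a replacement format has to address unmarshal and exec as well to gain much more.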
These were all considerations on my mind in the early days of PyOxidizer when I realized that the frozen importer and zip importers were lacking the features I desired and I would need to find an alternative solution.
One thing led to another and I have incrementally developed the "Python packed resources" data format (https://pyoxidizer.readthedocs.io/en/stable/pyoxidizer_packed_resources.html). This is a binary format for representing Python source code, bytecode, resource files, extension modules, even shared libraries that extension modules rely on!
Coupled with this format is the oxidized_importer meta path finder (https://pypi.org/project/oxidized-importer/ and https://pyoxidizer.readthedocs.io/en/latest/oxidized_importer.html) capable of servicing imports and resource loading from these "Python packed resources" data structures.
From a super high level, PyOxidizer assembles an instance of "Python packed resources" containing the CPython standard library and any additional Python packages you point it at and produces an executable with a main() that starts a Python interpreter, configures oxidized_importer.OxidizedFinder to read from the configured packed resources data structure (which may be embedded in the binary or loaded from a mmap()d file), and invokes some Python code inside to run your application.
oxidized_importer has an API for reading and writing "Python packed resources" data structures. You can even use it to build your own PyOxidizer-like utilities (https://pyoxidizer.readthedocs.io/en/latest/oxidized_importer_freezing_appli...).
This is great stuff! I'm sure a deep look at it is in order. :)
I bring this work up because I believe that if you set yourself on a path to build a performant and fully featured importer/finder, you will inevitably build something with properties very similar to what I have built. To be uncompromising on performance, you'll want to roll your own data format that is in tune with Python's specific needs and avoids I/O and overhead when possible. To fully support the long-tail of features in Python's importing mechanism, you need the ability to richly - and efficiently - express metadata like whether a module is a package. It is possible to shoehorn this [meta]data into formats like tar and zip. But it won't be as efficient as rolling your own data structure. And when it comes to interpreter startup overhead, performance does matter.
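To give a feel for what "rolling your own data structure" can mean, here is a toy container with a fixed-size index that records a package flag per entry and lets payloads be sliced out of a buffer without copying. This is not the "Python packed resources" format; the layout, the magic bytes, and the function names are all invented for illustration.

```python
import struct

MAGIC = b"PYRS"                 # hypothetical magic bytes
HEADER = struct.Struct("<4sI")  # magic, entry count
ENTRY = struct.Struct("<HBII")  # name length, flags, data offset, data length
FLAG_PACKAGE = 0x01


def pack(resources):
    """Serialize {name: (payload bytes, is_package)} into a single blob."""
    index, names, blobs = [], [], []
    offset = 0
    for name, (payload, is_package) in resources.items():
        encoded = name.encode("utf-8")
        flags = FLAG_PACKAGE if is_package else 0
        index.append(ENTRY.pack(len(encoded), flags, offset, len(payload)))
        names.append(encoded)
        blobs.append(payload)
        offset += len(payload)
    # Layout: header, fixed-size index, names, then payloads.
    header = HEADER.pack(MAGIC, len(resources))
    return header + b"".join(index) + b"".join(names) + b"".join(blobs)


def parse(buffer):
    """Return {name: (is_package, memoryview)} without copying payloads."""
    view = memoryview(buffer)
    magic, count = HEADER.unpack_from(view, 0)
    if magic != MAGIC:
        raise ValueError("not a packed resources blob")
    entries = [
        ENTRY.unpack_from(view, HEADER.size + i * ENTRY.size) for i in range(count)
    ]
    cursor = HEADER.size + count * ENTRY.size
    data_start = cursor + sum(entry[0] for entry in entries)
    result = {}
    for name_len, flags, offset, length in entries:
        name = bytes(view[cursor:cursor + name_len]).decode("utf-8")
        cursor += name_len
        start = data_start + offset
        result[name] = (bool(flags & FLAG_PACKAGE), view[start:start + length])
    return result
```

Because parse() accepts any buffer, the same code works on bytes embedded in a binary or on an mmap()d file, and a finder can answer "is this a package?" from the index alone.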
Am I suggesting CPython use oxidized_importer? No. It is implemented in Rust and CPython can't take a Rust dependency.
Am I suggesting CPython support the "Python packed resources" data format as-is? No. The exact format today isn't suitable for CPython: I didn't design it with consideration for use beyond PyOxidizer's use case and there are still a ton of missing features.
What I am suggesting is that Python developers think about the idea of standardizing a Python-centric container format for holding "Python resources" and a built-in/stdlib meta path finder for using it. Think of this as "frozen/zip importer 2.0" but with a more strongly defined and portable data format that is detached from C struct definitions. This could potentially solve a lot of problems around startup/import performance. And if you wanted to extend it to packaging/distribution, I think it could solve a lot of problems there too. (If you designed the format properly, I think it would be possible to converge with the use case of wheels.) (But I understand the skepticism about making the leap to packaging: that is an absurdly complex problem space!)
If this idea sounds radical to you, I get the skepticism. I didn't want to incur this work/complexity when writing PyOxidizer either. But a long series of investigations and ruling out alternatives led me down this path. With the benefit of hindsight I believe the type of solution is sound and it is inevitable Python gains something like this in the standard library or at least sees something like this in wide use in the wild. I say that because multiple Python app distribution tools have reinvented solutions to the general problem of "package multiple modules/resources in a single, efficient-to-load file/binary" in different ways: the solutions in the standard library (frozen and zip importers) and in package distribution (wheels or eggs) just aren't sufficient, as they each lack critical features. oxidized_importer _might_ be the most robust of these solutions to also be available as a standalone package on PyPI.
This doesn't sound radical at all! The concept makes sense and my gut tells me there is a good solution out there. As always with Python core development, it's a matter of finding volunteers to drive the effort. :)
I would encourage you to play around with oxidized_importer outside the context of PyOxidizer. I think you'll be pleasantly surprised by its performance and ability to emulate most of the common parts of the importlib APIs. The API for working with "Python packed resources" data structures isn't great. But only because I haven't spent much effort in making it so.
I believe there's a path to adding a meta path importer to the stdlib that - like oxidized_importer - reads resource data from a well-defined data structure while retaining the performance of the frozen importer with the full feature set of PathFinder. I would suggest this as a better longer term solution than trying to incrementally evolve the frozen or zip importers to fit this use case. You could probably implement most of it in Python and freeze the bytecode into the interpreter like we do with PathFinder, leaving only the performance-sensitive parser to be implemented in C.
All that being said, what I advocate for is obviously a lot of scope bloat versus doing some quick work to enable use of the frozen importer on a few dozen stdlib modules to speed up interpreter startup as is being discussed in issue45020. The practical engineer in me supports doing the quick and dirty solution now for the quick win. But I do encourage thinking bigger towards longer-term solutions, especially if you find yourself tempted to incrementally add features to frozen importer. I believe there is a market need for a stdlib meta path importer that reads a highly optimized and portable format similar to the solutions I've devised for PyOxidizer. Let me know how I can help incorporate one in the standard library.
Thanks again for bringing this up. Finding a good solution for this is definitely on my mind, even if diving in isn't an option right now. I for one would be glad to chat about this if you need more feedback.

-eric