A fast startup patch (was: Python startup time)
Hello,

Yesterday Neil Schemenauer mentioned some work that a colleague of mine (CCed) and I have done to improve CPython start-up time. Given the recent discussion, it seems timely to discuss what we are doing and whether it is of interest to other people hacking on the CPython runtime.

There are many ways to reduce the start-up time overhead. For this experiment, we are specifically targeting the cost of unmarshaling heap objects from compiled Python bytecode. Our measurements show this specific cost to represent 10% to 25% of the start-up time among the applications we have examined.

Our approach to eliminating this overhead is to store unmarshaled objects into the data segment of the python executable. We do this by processing the compiled python bytecode for a module, creating native object code with the unmarshaled objects in their in-memory representation, and linking this into the python executable.

When a module is imported, we simply return a pointer to the top-level code object in the data segment directly without invoking the unmarshaling code or touching the file system. What we are doing is conceptually similar to the existing capability to freeze a module, but we avoid non-trivial unmarshaling costs.

The patch is still under development and there is still a little bit more work to do. With that in mind, the numbers look good, but please take these with a grain of salt.

Baseline

$ bench "./python.exe -c ''"
benchmarking ./python.exe -c ''
time                 31.46 ms   (31.24 ms .. 31.78 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 32.08 ms   (31.82 ms .. 32.63 ms)
std dev              778.1 μs   (365.6 μs .. 1.389 ms)

$ bench "./python.exe -c 'import difflib'"
benchmarking ./python.exe -c 'import difflib'
time                 32.82 ms   (32.64 ms .. 33.02 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 33.17 ms   (33.01 ms .. 33.44 ms)
std dev              430.7 μs   (233.8 μs .. 675.4 μs)

With our patch

$ bench "./python.exe -c ''"
benchmarking ./python.exe -c ''
time                 24.86 ms   (24.62 ms .. 25.08 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 25.58 ms   (25.36 ms .. 25.94 ms)
std dev              592.8 μs   (376.2 μs .. 907.8 μs)

$ bench "./python.exe -c 'import difflib'"
benchmarking ./python.exe -c 'import difflib'
time                 25.30 ms   (25.00 ms .. 25.55 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 26.78 ms   (26.30 ms .. 27.64 ms)
std dev              1.413 ms   (747.5 μs .. 2.250 ms)
variance introduced by outliers: 20% (moderately inflated)

Here are some numbers with the patch, but with the stat calls preserved to isolate just the marshaling effects.

Baseline

$ bench "./python.exe -c 'import difflib'"
benchmarking ./python.exe -c 'import difflib'
time                 34.67 ms   (33.17 ms .. 36.52 ms)
                     0.995 R²   (0.990 R² .. 1.000 R²)
mean                 35.36 ms   (34.81 ms .. 36.25 ms)
std dev              1.450 ms   (1.045 ms .. 2.133 ms)
variance introduced by outliers: 12% (moderately inflated)

With our patch (and calls to stat)

$ bench "./python.exe -c 'import difflib'"
benchmarking ./python.exe -c 'import difflib'
time                 30.24 ms   (29.02 ms .. 32.66 ms)
                     0.988 R²   (0.968 R² .. 1.000 R²)
mean                 31.86 ms   (31.13 ms .. 32.75 ms)
std dev              1.789 ms   (1.329 ms .. 2.437 ms)
variance introduced by outliers: 17% (moderately inflated)

(This work was done in CPython 3.6 and we are exploring back-porting to 2.7 so we can run the hg startup benchmarks in the performance test suite.)

This is effectively a drop-in replacement for the frozen module capability and (so far) required only minimal changes to the runtime. To us, it seems like a very nice win without compromising on compatibility or complexity.
I am happy to discuss more of the technical details until we have a public patch available. I hope this provides some optimism around the possibility of improving the start-up time of CPython. What do you all think? Kindly, Carl
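For a rough sense of the cost being targeted here, one can time marshal.loads() on a module's cached bytecode from a stock interpreter. A minimal sketch, assuming a 3.6-style .pyc whose header (magic + mtime + source size) is 12 bytes (3.7+ uses 16) and that the .pyc has already been generated:

```python
# Hedged sketch: estimate the unmarshal cost the patch eliminates by timing
# marshal.loads() on difflib's cached bytecode.
import importlib.util
import marshal
import time

source = importlib.util.find_spec("difflib").origin
with open(importlib.util.cache_from_source(source), "rb") as f:
    payload = f.read()[12:]  # skip the .pyc header (3.6 layout)

start = time.perf_counter()
code = marshal.loads(payload)  # the work the patch moves to link time
elapsed = (time.perf_counter() - start) * 1e3
print("unmarshaled %s in %.3f ms" % (code.co_filename, elapsed))
```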
+1 to the concept!

On Thu, May 3, 2018 at 1:13 PM, Carl Shapiro <carl.shapiro@gmail.com> wrote:
Hello,
Yesterday Neil Schemenauer mentioned some work that a colleague of mine (CCed) and I have done to improve CPython start-up time. Given the recent discussion, it seems timely to discuss what we are doing and whether it is of interest to other people hacking on the CPython runtime.
There are many ways to reduce the start-up time overhead. For this experiment, we are specifically targeting the cost of unmarshaling heap objects from compiled Python bytecode. Our measurements show this specific cost to represent 10% to 25% of the start-up time among the applications we have examined.
Our approach to eliminating this overhead is to store unmarshaled objects into the data segment of the python executable. We do this by processing the compiled python bytecode for a module, creating native object code with the unmarshaled objects in their in-memory representation, and linking this into the python executable.
When a module is imported, we simply return a pointer to the top-level code object in the data segment directly without invoking the unmarshaling code or touching the file system. What we are doing is conceptually similar to the existing capability to freeze a module, but we avoid non-trivial unmarshaling costs.
The patch is still under development and there is still a little bit more work to do. With that in mind, the numbers look good, but please take these with a grain of salt.
Baseline
$ bench "./python.exe -c ''" benchmarking ./python.exe -c '' time 31.46 ms (31.24 ms .. 31.78 ms) 1.000 R² (0.999 R² .. 1.000 R²) mean 32.08 ms (31.82 ms .. 32.63 ms) std dev 778.1 μs (365.6 μs .. 1.389 ms)
$ bench "./python.exe -c 'import difflib'" benchmarking ./python.exe -c 'import difflib' time 32.82 ms (32.64 ms .. 33.02 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 33.17 ms (33.01 ms .. 33.44 ms) std dev 430.7 μs (233.8 μs .. 675.4 μs)
With our patch
$ bench "./python.exe -c ''" benchmarking ./python.exe -c '' time 24.86 ms (24.62 ms .. 25.08 ms) 0.999 R² (0.999 R² .. 1.000 R²) mean 25.58 ms (25.36 ms .. 25.94 ms) std dev 592.8 μs (376.2 μs .. 907.8 μs)
$ bench "./python.exe -c 'import difflib'" benchmarking ./python.exe -c 'import difflib' time 25.30 ms (25.00 ms .. 25.55 ms) 0.999 R² (0.998 R² .. 1.000 R²) mean 26.78 ms (26.30 ms .. 27.64 ms) std dev 1.413 ms (747.5 μs .. 2.250 ms) variance introduced by outliers: 20% (moderately inflated)
Here are some numbers with the patch, but with the stat calls preserved to isolate just the marshaling effects.
Baseline
$ bench "./python.exe -c 'import difflib'" benchmarking ./python.exe -c 'import difflib' time 34.67 ms (33.17 ms .. 36.52 ms) 0.995 R² (0.990 R² .. 1.000 R²) mean 35.36 ms (34.81 ms .. 36.25 ms) std dev 1.450 ms (1.045 ms .. 2.133 ms) variance introduced by outliers: 12% (moderately inflated)
With our patch (and calls to stat)
$ bench "./python.exe -c 'import difflib'" benchmarking ./python.exe -c 'import difflib' time 30.24 ms (29.02 ms .. 32.66 ms) 0.988 R² (0.968 R² .. 1.000 R²) mean 31.86 ms (31.13 ms .. 32.75 ms) std dev 1.789 ms (1.329 ms .. 2.437 ms) variance introduced by outliers: 17% (moderately inflated)
(This work was done in CPython 3.6 and we are exploring back-porting to 2.7 so we can run the hg startup benchmarks in the performance test suite.)
This is effectively a drop-in replacement for the frozen module capability and (so far) required only minimal changes to the runtime. To us, it seems like a very nice win without compromising on compatibility or complexity. I am happy to discuss more of the technical details until we have a public patch available.
I hope this provides some optimism around the possibility of improving the start-up time of CPython. What do you all think?
Kindly,
Carl
On 4 May 2018 at 06:13, Carl Shapiro <carl.shapiro@gmail.com> wrote:
Hello,
Yesterday Neil Schemenauer mentioned some work that a colleague of mine (CCed) and I have done to improve CPython start-up time. Given the recent discussion, it seems timely to discuss what we are doing and whether it is of interest to other people hacking on the CPython runtime.
There are many ways to reduce the start-up time overhead. For this experiment, we are specifically targeting the cost of unmarshaling heap objects from compiled Python bytecode. Our measurements show this specific cost to represent 10% to 25% of the start-up time among the applications we have examined.
Our approach to eliminating this overhead is to store unmarshaled objects into the data segment of the python executable. We do this by processing the compiled python bytecode for a module, creating native object code with the unmarshaled objects in their in-memory representation, and linking this into the python executable.
When a module is imported, we simply return a pointer to the top-level code object in the data segment directly without invoking the unmarshaling code or touching the file system. What we are doing is conceptually similar to the existing capability to freeze a module, but we avoid non-trivial unmarshaling costs.
This definitely seems interesting, but is it something you'd see us being able to take advantage of for conventional Python installations, or is it more something you'd expect to be useful for purpose-built interpreter instances? (e.g. if Mercurial were running their own Python, they could precache the heap objects for their commonly imported modules in their custom interpreter binary, regardless of whether those were standard library modules or not).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Fri, May 4, 2018 at 5:14 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
This definitely seems interesting, but is it something you'd see us being able to take advantage of for conventional Python installations, or is it more something you'd expect to be useful for purpose-built interpreter instances? (e.g. if Mercurial were running their own Python, they could precache the heap objects for their commonly imported modules in their custom interpreter binary, regardless of whether those were standard library modules or not).
Yes, this would be a win for a conventional Python installation as well. Specifically, users and their scripts would enjoy a reduction in cold-startup time. In the numbers I showed yesterday, the version of the interpreter with our patch applied included unmarshaled data for the modules that always appear on the sys.modules list after an ordinary interpreter cold-start. I believe it is worthwhile to include that set of modules in the standard CPython interpreter build. Expanding that set to include the commonly imported modules might be an additional win, especially for short-running scripts.
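For reference, the set Carl describes can be enumerated on any build. A quick, unofficial sketch that lists what a bare cold start pulls into sys.modules:

```python
# List the modules present in sys.modules after a bare interpreter start --
# the natural candidates for baking into the executable's data segment.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "import sys; print('\\n'.join(sorted(sys.modules)))"],
    stdout=subprocess.PIPE,
)
print(result.stdout.decode())
```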
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter? Off the top of my head:

- We'd be making the in-memory layout of those objects part of the .pyc format, so we couldn't change that within a minor release. I suspect this wouldn't be a big change though, since we already commit to ABI compatibility for C extensions within a minor release? In principle there are some cases where this would be different (e.g. adding new fields at the end of an object is generally ABI compatible), but this might not be an issue for the types of objects we're talking about.

- There's some memory management concern, since these are, y'know, heap objects, and we wouldn't be heap-allocating them. The main constraint would be that you couldn't free them one at a time, but would have to free the whole block at once. But I think it at least wouldn't be too hard to track whether any of the objects in the block are still alive, and free the whole block if there aren't any. E.g., we could have an object flag that means "when this object is freed, don't call free(); instead find the containing block and decrement its live-object count." You probably need this flag even in the current version, right? (And the flag could also be an escape hatch if we did need to change object size: check for the flag before accessing the new fields.) Or maybe you could get clever tracking object liveness on a page-by-page basis; not sure it's worth it though. Unloading module-level objects is pretty rare.

- I'm assuming these objects can have pointers to each other, and to well-known constants like None, so you need some kind of relocation engine to fix those up. Right now I guess you're probably using the one built into the dynamic loader? In theory it shouldn't be too hard to write our own – basically just a list of offsets in the block where we need to add the base address or write the address of a well-known constant, I think? (A toy sketch of this idea follows the quoted message below.)

Anything else I'm missing?

On Fri, May 4, 2018, 16:06 Carl Shapiro <carl.shapiro@gmail.com> wrote:
On Fri, May 4, 2018 at 5:14 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
This definitely seems interesting, but is it something you'd see us being able to take advantage of for conventional Python installations, or is it more something you'd expect to be useful for purpose-built interpreter instances? (e.g. if Mercurial were running their own Python, they could precache the heap objects for their commonly imported modules in their custom interpreter binary, regardless of whether those were standard library modules or not).
Yes, this would be a win for a conventional Python installation as well. Specifically, users and their scripts would enjoy a reduction in cold-startup time.
In the numbers I showed yesterday, the version of the interpreter with our patch applied included unmarshaled data for the modules that always appear on the sys.modules list after an ordinary interpreter cold-start. I believe it is worthwhile to include that set of modules in the standard CPython interpreter build. Expanding that set to include the commonly imported modules might be an additional win, especially for short-running scripts.
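To make Nathaniel's relocation point concrete, here is a toy, purely illustrative sketch of the "list of offsets" idea: patching pointer slots in a pre-built block with a base address or a well-known constant's address. None of this is CPython API; all names are made up:

```python
# Toy relocation engine (illustrative only, not CPython code): a frozen
# block stores internal pointers as block-relative offsets plus a table of
# slots that must point at well-known singletons. At load time we patch
# each 8-byte slot in place.
import struct

def relocate(block, base, internal_fixups, constant_fixups):
    """Patch pointer slots in `block` (a bytearray) in place.

    internal_fixups -- offsets whose stored value is block-relative: add base
    constant_fixups -- {offset: absolute address} for shared singletons
    """
    for off in internal_fixups:
        (rel,) = struct.unpack_from("<Q", block, off)
        struct.pack_into("<Q", block, off, base + rel)
    for off, addr in constant_fixups.items():
        struct.pack_into("<Q", block, off, addr)

# One internal pointer at offset 0 referring to offset 32, and a slot at
# offset 8 that must point at the interpreter's single None object.
block = bytearray(64)
struct.pack_into("<Q", block, 0, 32)
relocate(block, base=0x7F0000000000, internal_fixups=[0],
         constant_fixups={8: id(None)})  # id() is the address in CPython
```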
On 5 May 2018 at 11:58, Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Off the top of my head:
We'd be making the in-memory layout of those objects part of the .pyc format, so we couldn't change that within a minor release. I suspect this wouldn't be a big change though, since we already commit to ABI compatibility for C extensions within a minor release? In principle there are some cases where this would be different (e.g. adding new fields at the end of an object is generally ABI compatible), but this might not be an issue for the types of objects we're talking about.
I'd frame this one a bit differently: what if we had a platform-specific variant of the pyc format that was essentially a frozen module packaged as an extension module? We probably couldn't quite do that for arbitrary Python modules *today* (due to the remaining capability differences between regular modules and extension modules), but multi-phase initialisation gets things *much* closer to parity, and running embedded bytecode instead of accessing the C API directly should avoid the limitations that exist for classes defined in C.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Or have parallel "pyh" (Python "heap") files, that are architecture specific... (But that would cost more stat calls.)
On Sat, May 5, 2018, 10:40 AM Eric Fahlgren <ericfahlgren@gmail.com> wrote:
On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Or have parallel "pyh" (Python "heap") files, that are architecture specific... (But that would cost more stat calls.)
I ask because arch specific byte code files are a big change in consumers' expectations. It's not necessarily a bad change but it should be communicated to downstreams so they can decide how to adjust to it. Linux distros which ship byte code files will need to build them for each arch, for instance. People who ship just the byte code as an obfuscation of the source code will need to decide whether to ship packages for each arch they care about or change how they distribute.

-Toshio
On Sat, May 5, 2018, 11:34 Toshio Kuratomi <a.badger@gmail.com> wrote:
On Sat, May 5, 2018, 10:40 AM Eric Fahlgren <ericfahlgren@gmail.com> wrote:
On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Or have parallel "pyh" (Python "heap") files, that are architecture specific... (But that would cost more stat calls.)
I ask because arch specific byte code files are a big change in consumers' expectations. It's not necessarily a bad change but it should be communicated to downstreams so they can decide how to adjust to it.
Linux distros which ship byte code files will need to build them for each arch, for instance. People who ship just the byte code as an obfuscation of the source code will need to decide whether to ship packages for each arch they care about or change how they distribute.
That's a good point. One way to minimize the disruption would be to include both the old and new info in the .pyc files, so at load time if the new version is incompatible then you can fall back on the old way, even if it's a bit slower.

I think in the vast majority of cases currently .pyc files are built on the same architecture where they're used? Pip and Debian/Ubuntu and the interpreter's automatic compilation-on-import all build .pyc files on the computer where they'll be run.

It might also be worth double-checking how much the memory layout of these objects even varies. Obviously it'll be different for 32- and 64-bit systems, but beyond that, most ISAs and OSes and compilers use pretty similar struct layout rules AFAIK... we're not talking about actual machine code.

-n
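A trivial way to inspect the layout-relevant properties Nathaniel mentions on any given build (just the obvious knobs, as a sketch):

```python
# The layout-relevant properties of the running build: pointer width,
# byte order, and a couple of object sizes that follow from them.
import struct
import sys

print("pointer width:", struct.calcsize("P") * 8, "bits")
print("byte order:   ", sys.byteorder)
print("sizeof(int 0):", sys.getsizeof(0), "bytes")
print("sizeof('')   :", sys.getsizeof(""), "bytes")
```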
On 5.5.2018 21:00, Nathaniel Smith wrote:
I think in the vast majority of cases currently .pyc files are built on the same architecture where they're used?
On Fedora (and by extension also on RHEL and CentOS) this is not true. When the package is noarch (no extension module shipped, only pure Python) it is built and bytecompiled on a random architecture. Bytecompilation happens during build time. If bytecode gets arch specific, we'd need to make all our Python packages arch specific or switch to install-time bytecompilation.

--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok
On 5/5/2018 2:33 PM, Toshio Kuratomi wrote:
On Sat, May 5, 2018, 10:40 AM Eric Fahlgren <ericfahlgren@gmail.com> wrote:

On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:

On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Or have parallel "pyh" (Python "heap") files, that are architecture specific... (But that would cost more stat calls.)
I ask because arch specific byte code files are a big change in consumers' expectations. It's not necessarily a bad change but it should be communicated to downstreams so they can decide how to adjust to it.
Linux distros which ship byte code files will need to build them for each arch, for instance. People who ship just the byte code as an obfuscation of the source code will need to decide whether to ship packages for each arch they care about or change how they distribute.
It is an advertised feature that CPython *can* produce cross-platform version-specific .pyc files. I believe this should continue, at least for a few releases. They are currently named modname.cpython-xy.pyc, with optional '.opt-1' and '.opt-2' tags inserted before '.pyc', in __pycache__. These name formats should continue to mean what they do now. I believe *can* should not mean *always*.

Architecture-specific files will need an additional architecture tag, such as win32 and win64, anyway. Or would bitness and endianness be sufficient across platforms?

If we make architecture-specific the default, we could add startup, compile, and compileall options for the cross-platform format. Or maybe add a recompile function that imports cross-platform .pycs and outputs local-architecture .pycs.

--
Terry Jan Reedy
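The current naming scheme Terry describes is visible through importlib; the arch-tagged variant in the trailing comment below is hypothetical, not an existing format:

```python
# How CPython names tagged .pyc files today (output shown for a 3.6 build);
# an architecture tag would presumably be one more dash-separated field.
import importlib.util

print(importlib.util.cache_from_source("mod.py"))
# -> __pycache__/mod.cpython-36.pyc
print(importlib.util.cache_from_source("mod.py", optimization=2))
# -> __pycache__/mod.cpython-36.opt-2.pyc
# hypothetical arch-specific name: __pycache__/mod.cpython-36-win64.pyc
```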
On Sat, 5 May 2018 at 10:41 Eric Fahlgren <ericfahlgren@gmail.com> wrote:
On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Or have parallel "pyh" (Python "heap") files, that are architecture specific...
.pyc files have tags to specify details about them (e.g. were they compiled with -OO), so this isn't an "all or nothing" option, nor does it require a different file extension. There just needs to be an appropriate finder that knows how to recognize a .pyc file with the appropriate tag that can be used, and then a loader that knows how to read that .pyc.
(But that would cost more stat calls.)
Nope, we actually cache directory contents, so file existence lookups are essentially free (this is why importlib.invalidate_caches() exists: specifically to work around cases where the timestamp is too coarse to detect a directory content mutation).

-Brett
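A rough sketch of the finder/loader shape Brett describes. The arch-specific suffix and the marshal fallback are assumptions for illustration, not an existing CPython mechanism:

```python
# Illustrative finder/loader pair for a hypothetical arch-tagged .pyc.
# The suffix below is made up; a real implementation would map the payload
# and fix up pointers rather than fall back to marshal as done here.
import importlib.abc
import importlib.util
import marshal
import os
import sys

ARCH_SUFFIX = ".cpython-36-x86_64.pyc"  # hypothetical tag

class PreheatedLoader(importlib.abc.Loader):
    def __init__(self, path):
        self.path = path

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        with open(self.path, "rb") as f:
            payload = f.read()
        code = marshal.loads(payload[12:])  # stand-in for the fast path
        exec(code, module.__dict__)

class PreheatedFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        name = fullname.rpartition(".")[2]
        for entry in (path or sys.path):
            candidate = os.path.join(entry, "__pycache__", name + ARCH_SUFFIX)
            if os.path.exists(candidate):
                return importlib.util.spec_from_loader(
                    fullname, PreheatedLoader(candidate))
        return None  # fall through to the regular finders

sys.meta_path.insert(0, PreheatedFinder())
```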
On 6 May 2018 at 05:34, Brett Cannon <brett@python.org> wrote:
On Sat, 5 May 2018 at 10:41 Eric Fahlgren <ericfahlgren@gmail.com> wrote:
On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Or have parallel "pyh" (Python "heap") files, that are architecture specific...
.pyc files have tags to specify details about them (e.g. were they compiled with -OO), so this isn't an "all or nothing" option, nor does it require a different file extension. There just needs to be an appropriate finder that knows how to recognize a .pyc file with the appropriate tag that can be used, and then a loader that knows how to read that .pyc.
Right, this is the kind of change I had in mind (perhaps in combination with Diana Clarke's suggestion from several months back to make pyc tagging more feature-flag centric, rather than the current focus on a numeric optimisation level). We also wouldn't ever generate this hypothetical format implicitly - similar to the new deterministic pycs in 3.7, they'd be something you had to explicitly request via a compileall invocation.

In the Linux distro use case then, the relevant distro packaging helper scripts and macros could generate traditional cross-platform pyc files for no-arch packages, but automatically switch to the load-time optimised arch-specific format if the package was already arch-specific.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
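For comparison, today's explicit opt-in looks like this; the arch-specific flag in the comment is hypothetical:

```python
# Explicitly compiling a tree to .pyc, as compileall does today. A
# hypothetical arch-specific mode might hang off the same entry point,
# e.g. `python -m compileall --arch-specific Lib/` (made-up flag).
import compileall

compileall.compile_dir("Lib/", quiet=1)
```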
On 5/5/2018 10:30 AM, Toshio Kuratomi wrote:
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Lots of room in the __pycache__ folder. As compilation of the .py module proceeds, could it be determined if there is anything that needs to be architecture specific, and emit an architecture-specific one or an architecture-independent one as appropriate? Data structures are mostly bitness-dependent, no?

But if an architecture-specific .pyc is required, could/should it be structured and named according to the OS conventions also: .dll, .so, etc.? Even if it doesn't contain executable code, the bytecode could be contained in appropriate data sections, and there has been talk about doing relocation of pointers in such pre-compiled data structures, and the linker _already_ can do that sort of thing...
On Fri, May 4, 2018 at 6:58 PM, Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
The system we have developed can create a shared object file for each compiled Python file. However, such a representation is not directly usable. First, certain shared constants, such as interned strings, must be kept globally unique across object code files. Second, some marshaled objects, such as the hashed collections, must be initialized with randomization state that is not available until after the hosting runtime has been initialized.

We are able to work around the first issue by generating a heap image with the transitive closure of all modules that will be loaded, which allows us to easily maintain uniqueness guarantees. We are able to work around the second issue with some unobservable changes to the affected data structures.

Based on our numbers, it appears there should be some hesitancy--at this time--about changing the format of compiled Python files for the sake of load-time performance. In contrast, the data shows that a focused change to address file system inefficiencies has the potential to broadly and transparently deliver benefit to users without affecting existing code or workflows.
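Both constraints Carl names are visible from the Python level. A small demo (run it twice to see the second point):

```python
# First constraint: interned strings must stay globally unique. Two frozen
# images must not each carry a private copy of the same identifier.
import sys

a = sys.intern("some_identifier")
b = sys.intern("some_identifier")
assert a is b  # one object, shared process-wide

# Second constraint: string hashes are randomized per process (unless
# PYTHONHASHSEED is pinned), so dict/set bucket layout baked in at build
# time would be stale at run time. This prints a different value each run:
print(hash("some_identifier"))
```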
“the data shows that a focused change to address file system inefficiencies has the potential to broadly and transparently deliver benefit to users without affecting existing code or workflows.”

This is consistent with a Node.js experiment I heard about where they compiled an entire application into a single (HUGE!) .js file. Reading a single large file from disk is quicker than many small files on every significant file system I’m aware of.

Is there benefit to supporting import of .tar files as we currently do .zip? Or perhaps having a special fast-path for uncompressed .zip files?

Top-posted from my Windows phone
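Import-from-zip already works out of the box, for what it's worth. A minimal demonstration (the bundle name and module are arbitrary):

```python
# Bundle a module into an archive (uncompressed: ZIP_STORED is the default)
# and import it back through the stdlib's zipimport machinery.
import sys
import zipfile

with zipfile.ZipFile("bundle.zip", "w") as zf:
    zf.writestr("hello.py", "MESSAGE = 'hi from the bundle'\n")

sys.path.insert(0, "bundle.zip")
import hello
print(hello.MESSAGE)
```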
participants (11)

- Brett Cannon
- Carl Shapiro
- Eric Fahlgren
- Glenn Linderman
- Gregory P. Smith
- Miro Hrončok
- Nathaniel Smith
- Nick Coghlan
- Steve Dower
- Terry Reedy
- Toshio Kuratomi