[Distutils] Introducing XAR - SquashFS based mountable executables - Calling OS/Distro Maintainers
Hi distutils,

Today Facebook open sourced XAR, a competitor to PEX and other zip based distribution methods for Python (and potentially other languages). Basically its claim to fame is that start up time for large modules is similar to regular on file system modules, because files are extracted on read via SquashFS mounted executables. For more information read our blog post:
- https://code.fb.com/data-infrastructure/xars-a-more-efficient-open-source-sy...

The bdist_xar 'wheel' like plugin is also now on PyPI: https://pypi.org/project/xar/ (https://github.com/facebookincubator/xar/).

The main reason I wanted to post here is that I'd love to reach out to our OS maintainers on list (e.g. Mr Stinner) and talk about getting the components of XAR into the OS package repos. The main components that make sense are:
- xarexec_fuse: https://github.com/facebookincubator/xar/
- squashfuse: a newer version with squashfuse_ll (and optionally zstd support) https://github.com/vasi/squashfuse
- squashfs-tools: a newer version with zstd support (made optional I guess)

Once we had this, it would make building XARs as easy as building wheels, even more so when we can define OS level dependencies for it!

Please feel free to reach out to me directly with any questions etc.

Also want to note, Sumanah suggested I float this here. I apologize if this is a misuse of the list; if so, please ignore. I was torn on whether or not this is appropriate.

Thanks,
Cooper
On 14 July 2018 at 06:51, Cooper Ry Lees wrote:
Hi distutils,
Today Facebook Open Sourced a competitor to PEX and other zip based distribution methods for Python (and potentially other languages). Basically its claim to fame is start up time for large modules being similar to regular on file system modules due to extracting on read via SquashFS mounted executables. For more information read our blog post:
- https://code.fb.com/data-infrastructure/xars-a-more-efficient-open-source-sy...
The bdist_xar 'wheel' like plugin is also now in PyPI: https://pypi.org/project/xar/ (https://github.com/facebookincubator/xar/).
Main reason I wanted to post here was I'd love to reach out to our OS maintainers on list (e.g. Mr Stinner) and talk about getting the components of XAR into the OS's package repos. The main components that make sense are:
- xarexec_fuse: https://github.com/facebookincubator/xar/
- squashfuse: a newer version with squashfuse_ll (and optionally zstd support) https://github.com/vasi/squashfuse
- squashfs-tools: a newer version with zstd support (made optional I guess)
For Fedora and derivatives, I'll forward your post over to the fedora-python list at https://lists.fedorahosted.org/archives/list/python-devel@lists.fedoraprojec... (since getting XAR building supported in Fedora would be the typical first step towards getting it supported in any Fedora downstreams).
A quick internet search didn't give me a clear answer on whether or not FUSE is properly supported in WSL yet, so it would be interesting to know whether xar files are WSL compatible, or if they still need a full Linux kernel for now.
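One way to probe the WSL question on a given machine is to check for the pieces a FUSE mount normally needs. The sketch below is only a rough pre-flight check, not part of XAR: `fuse_prerequisites` is a hypothetical helper, and it assumes the conventional Linux names (`/dev/fuse`, `fusermount` or `fusermount3`) plus the `squashfuse_ll` binary mentioned above. Finding all three does not prove that a XAR will actually mount.

```python
import os
import shutil


def fuse_prerequisites() -> dict:
    """Report which of the usual FUSE building blocks are visible here.

    Assumption: conventional Linux names. This is a heuristic check only;
    it does not attempt an actual mount.
    """
    return {
        # Kernel side: the FUSE character device.
        "dev_fuse": os.path.exists("/dev/fuse"),
        # Userspace helper used for unprivileged mounts (fuse2 or fuse3 name).
        "fusermount": any(
            shutil.which(name) is not None
            for name in ("fusermount", "fusermount3")
        ),
        # Low-level squashfuse binary named in the announcement.
        "squashfuse_ll": shutil.which("squashfuse_ll") is not None,
    }


if __name__ == "__main__":
    for name, present in fuse_prerequisites().items():
        print(f"{name}: {'found' if present else 'missing'}")
```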
Once we had this, it would make building XARs as easy as building wheels, even more so when we can define OS level dependencies for it!
Please feel free to reach out to me directly with any questions etc. - Also want to note, Sumanah suggested I float this here, I apologize if this is a misuse of the list. If so, please ignore. I was torn whether or not this is appropriate.
Making folks aware of new contenders in the Python-app-distribution space is definitely a reasonable use of the distutils-sig list :)

Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Cooper Ry Lees wrote on 7/13/18 13:51:
Today Facebook Open Sourced a competitor to PEX and other zip based distribution methods for Python (and potentially other languages). Basically its claim to fame is start up time for large modules being similar to regular on file system modules due to extracting on read via SquashFS mounted executables. For more information read our blog post: - https://code.fb.com/data-infrastructure/xars-a-more-efficient-open-source-sy...
This is really interesting! Thanks for releasing it and announcing it. I should mention that LinkedIn (my day job) recently released 'shiv' as a more modern, and in our case much faster, zipapp container similar to pex.

http://shiv.readthedocs.io/en/latest/

I have some quick questions about XAR:

* How do you achieve faster hot start times with XAR over native file system? That's a bit unexpected, although based on our shiv work, I can imagine some things about how you start Python or lay out the code that might provide better hot start times (e.g. fewer entries on sys.path and a fanatical avoidance of pkg_resources). OTOH, I'd think that relying on FUSE would impose some additional overhead over native file systems.

* Is there any practical operational or performance limits on the number of mounted XARs you can have? E.g. what's the impact of deploying say a few thousand XARs on any particular machine (generally, of Linux and macOS - I'm not as concerned about Windows :)

* Do you just put the XAR mountpoint's bin on everyone's $PATH or do you symlink those bins into a standard $PATH location?

Cheers,
-Barry
Hi Barry, I'm aware of shiv, thus my "and other zip based distribution methods." Also, you're welcome to come and meet and greet with XAR people if you want to know all the ins and outs here on campus. We're not far away + I'm always in pypa-dev + can use that Twitter thing :) Specific answers below.
On Jul 16, 2018, at 3:56 PM, Barry Warsaw wrote:
Cooper Ry Lees wrote on 7/13/18 13:51:
Today Facebook Open Sourced a competitor to PEX and other zip based distribution methods for Python (and potentially other languages). Basically its claim to fame is start up time for large modules being similar to regular on file system modules due to extracting on read via SquashFS mounted executables. For more information read our blog post: - https://code.fb.com/data-infrastructure/xars-a-more-efficient-open-source-sy...
This is really interesting! Thanks for releasing it and announcing it. I should mention that LinkedIn (my day job) recently released 'shiv' as a more modern, and in our case much faster, zipapp container similar to pex.
http://shiv.readthedocs.io/en/latest/
I have some quick questions about XAR:
* How do you achieve faster hot start times with XAR over native file system? That's a bit unexpected, although based on our shiv work, I can imagine some things about how you start Python or lay out the code that might provide better hot start times (e.g. fewer entries on sys.path and a fanatical avoidance of pkg_resources). OTOH, I'd think that relying on FUSE would impose some additional overhead over native file systems.
I usually only see start times similar to the native file system, sometimes only slightly faster. Our published numbers are probably on modules too small to be super accurate for all use cases.
* Is there any practical operational or performance limits on the number of mounted XARs you can have? E.g. what's the impact of deploying say a few thousand XARs on any particular machine (generally, of Linux and macOS - I'm not as concerned about Windows :)
We have tiers (think of that as a collection of servers for a particular service) that have 100s mounted at times. Have not hit 1000s, but have not seen any huge problems with 100s. What do your servers do that would need 1000s of XARs running at once? Seems like that could be optimized.
* Do you just put the XAR mountpoint's bin on everyone's $PATH or do you symlink those bins into a standard $PATH location?
Standard PATH location via RPM installs.
Cheers, -Barry -- Distutils-SIG mailing list -- distutils-sig@python.org To unsubscribe send an email to distutils-sig-leave@python.org https://mail.python.org/mm3/mailman3/lists/distutils-sig.python.org/ Message archived at https://mail.python.org/mm3/archives/list/distutils-sig@python.org/message/7...
* Is there any practical operational or performance limits on the number of mounted XARs you can have? E.g. what's the impact of deploying say a few thousand XARs on any particular machine (generally, of Linux and macOS - I'm not as concerned about Windows :)
We have tiers (think of that as a collection of servers for a particular service) that have 100s mounted at times. Have not hit 1000s, but have not seen any huge problems with 100s. What do your servers do that would need 1000s of XARs running at once? Seems like that could be optimized.
I’m not thinking about just our server deployed applications, but our command line tools too. Combined, that’s roughly in the ballpark of 1000 or so Python applications (more if we count other languages).
* Do you just put the XAR mountpoint's bin on everyone's $PATH or do you symlink those bins into a standard $PATH location?
Standard PATH location via RPM installs.
Cool, thanks! -Barry
This is very cool. I'm glad to see attention towards packaging useful
applications and not just libraries for web development in a virtualenv,
and the limitations of zip compression are felt by our Python applications
comprised of many separately compressed files.
Is bdist_xar calling out to pip to install all of a package's dependencies
(or all that are not already installed) into the xar as part of the build?
On Thu, Jul 19, 2018 at 10:55 AM, Daniel Holth wrote:
This is very cool. I'm glad to see attention towards packaging useful applications and not just libraries for web development in a virtualenv, and the limitations of zip compression are felt by our Python applications comprised of many separately compressed files. Is bdist_xar calling out to pip to install all of a package's dependencies (or all that are not already installed) into the xar as part of the build?
bdist_xar doesn't install dependencies by default, but if you're missing a dependency it will prompt you to pass the --download flag, which loads the pip entry point to download (but not install) the missing dependencies as wheels or sdists. It's currently disabled by default because it was merged a day before release, but I intend to enable it by default. For reference, the code is here https://github.com/facebookincubator/xar/blob/master/xar/pip_installer.py#L1....
-Nick
On Mon, Jul 16, 2018 at 3:56 PM, Barry Warsaw wrote:
* How do you achieve faster hot start times with XAR over native file system? That's a bit unexpected, although based on our shiv work, I can imagine some things about how you start Python or lay out the code that might provide better hot start times (e.g. fewer entries on sys.path and a fanatical avoidance of pkg_resources). OTOH, I'd think that relying on FUSE would impose some additional overhead over native file systems.
Hi all,

I collected the XAR benchmark numbers. I spent some time today investigating what exactly is causing the difference between native and XAR start times. The native installation I was benchmarking against used `pkg_resources.load_entry_point()` to run the script, while XAR called the entry point directly.

I didn't realize that `pip install .` behavior differs based on whether `wheel` is installed or not. When wheel is installed, pip uses `bdist_wheel` to build the package and uses its own loader script, which calls the entry point directly. When it isn't, pip uses `python setup.py install` to build the package and uses the setuptools loader script, which uses `pkg_resources.load_entry_point()` to call the script.

If I rerun the experiment with the wheel loader script (without pkg_resources), I see these hot start times:

black: 0.171 s (vs 0.208 s for XAR)
jupyter-nbextension: 0.165 s (vs 0.179 s for XAR)

Without the pkg_resources difference we have a small overhead over native installs.

Best,
Nick
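To see how much of a start time comes from the loader alone, one can time a fresh interpreter with and without the import. The sketch below is not the benchmark behind the numbers above; `startup_time` is a hypothetical helper, and it assumes only that `pkg_resources` (shipped with setuptools) may or may not be importable in the environment where it runs.

```python
import subprocess
import sys
import time


def startup_time(statement: str, runs: int = 3) -> float:
    """Best wall-clock time in seconds for a fresh interpreter to run `statement`.

    Taking the minimum over several runs approximates a hot start, since
    later runs benefit from a warm file system cache.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            [sys.executable, "-c", statement],
            check=True,                  # raise if the statement fails
            stderr=subprocess.DEVNULL,   # hide warnings and tracebacks
        )
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    baseline = startup_time("pass")
    print(f"bare interpreter:     {baseline:.3f} s")
    try:
        with_import = startup_time("import pkg_resources")
    except subprocess.CalledProcessError:
        print("pkg_resources is not importable in this environment")
    else:
        print(f"import pkg_resources: {with_import:.3f} s")
```

The difference between the two figures is roughly the cost a `pkg_resources` based loader script adds before the application's own entry point runs.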
On Mon, Jul 16, 2018 at 06:14:52PM -0700, Nick Terrell wrote:
I collected the XAR benchmark numbers. I spent some time today investigating what exactly is causing the difference between native and XAR start times. The native installation I was benchmarking against used `pkg_resources.load_entry_point()` to run the script, while XAR called the entry point directly.
Benchmarking against pkg_resources is a bit like running a race when your opponent has an iron cannonball chained to their leg: https://github.com/pypa/setuptools/issues/510 ;)

<snip updated numbers>

Marius Gedminas
--
A secret: don't tell DARPA I'm not building the sun destroying weapon they think I am.
-- Michael Salib, the author of Starkiller
On Jul 16, 2018, at 18:14, Nick Terrell wrote:
I collected the XAR benchmark numbers. I spent some time today investigating what exactly is causing the difference between native and XAR start times. The native installation I was benchmarking against used `pkg_resources.load_entry_point()` to run the script, while XAR called the entry point directly.
black: 0.171 s (vs 0.208 s for XAR)
jupyter-nbextension: 0.165 s (vs 0.179 s for XAR)
Without the pkg_resources difference we have a small overhead over native installs.
Thanks Nick, that definitely jibes with our analysis. pkg_resources can be a hidden source of significant overhead, imposed at import time. That’s the main reason why importlib.resources was born.

Cheers,
-Barry
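For readers who have not used it, here is a minimal sketch of reading a packaged data file through `importlib.resources` instead of `pkg_resources`. It uses the `files()` API, which needs Python 3.9 or later and therefore postdates this thread; `demo_pkg`, `greeting.txt`, and `read_packaged_text` are made-up names used only for illustration.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_packaged_text(package: str, resource: str) -> str:
    """Read a text resource shipped inside `package` without pkg_resources."""
    return (
        importlib.resources.files(package)
        .joinpath(resource)
        .read_text(encoding="utf-8")
    )


# Build a throwaway package on disk so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"  # hypothetical package name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("", encoding="utf-8")
    (pkg_dir / "greeting.txt").write_text(
        "hello from a packaged resource\n", encoding="utf-8"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the new directory is noticed
    try:
        text = read_packaged_text("demo_pkg", "greeting.txt")
    finally:
        sys.path.remove(tmp)

print(text, end="")
```

Unlike `pkg_resources`, this path does not scan every installed distribution at import time, which is the overhead discussed above.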
participants (6)
- Barry Warsaw
- Cooper Ry Lees
- Daniel Holth
- Marius Gedminas
- Nick Coghlan
- Nick Terrell