[Distutils] PEX at Twitter (re: PEX - Twitter's multi-platform executable archive format for Python)

Brian Wickman wickman at gmail.com
Fri Jan 31 22:57:05 CET 2014


>
> > For practical reasons we've always needed "not zip-safe" PEX files
> > where all code is written to disk prior to execution (ex:  legacy
> > Django applications that have __file__-relative business logic)
> > so we decided to just throw away the magical zipimport stuff and
> > embrace using disk as a pure cache.
>
> I take it this is because of dependencies (whether your own legacy code
> or external dependencies) that haven't been designed to run from zip, and
> where it's not worth it (or otherwise not possible) to re-engineer to run
> from zip? If there were other reasons (apart from the need to maintain the
> magic machinery, or problems with it), can you say what they were?
>

The two major reasons here are interpreter compatibility and minimizing
what we felt to be hacks, which falls under "maintain the magic machinery"
I suppose.  There are myriad other practical reasons.  Here are some:

Our recursive zipimporter was not very portable.  We relied upon things
like a mutable zipimport._zip_directory_cache, but this was both
undocumented and subject to change (and was always immutable in PyPy, so we
couldn't support PyPy).  Code that touches _zip_directory_cache has since
been removed in pkg_resources (that change also broke us, around distribute
0.6.42.)  Things like the change from find_in_zip to find_in_zips broke us
just a few weeks ago (setuptools 1.1.7 => 2.x.)  Our monkeypatches to fix
bugs (or features, who knows?) exposed in setuptools by our recursive
zipimporter, e.g.
https://bitbucket.org/tarek/distribute/issue/274/disparity-in-metadata-resource-_listdir
usually stopped working over time, since they were patching undocumented
parts of the API.  This has only accelerated recently since more attention
has been drawn to that code by PEPs 426/440 etc.

Even today PyPy has bugs with large collections of zipimported namespace
packages regardless of zipimport implementation.  We eventually surmounted
some of these issues, but rather than be doomed to repeat them in the
future, we decided to ditch our custom importer instead and rely upon the
one in the stdlib when possible.
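
As a rough illustration of the "disk as a pure cache" approach described
above (not PEX's actual code), the sketch below unpacks an archive into a
directory named after the archive's hash and puts that directory on
sys.path, so the ordinary filesystem importer does the rest and __file__
points at real files.  The names extract_to_cache, activate and cache_root
are hypothetical, and Python 3 is assumed.

```python
import hashlib
import os
import shutil
import sys
import tempfile
import zipfile


def extract_to_cache(archive_path, cache_root):
    """Unpack a zip into cache_root/<sha1-of-archive> once; return that directory.

    Hypothetical helper for illustration only; not taken from PEX.
    """
    digest = hashlib.sha1()
    with open(archive_path, "rb") as fp:
        # Hash in 64 KiB chunks so the archive is never read into memory whole.
        for chunk in iter(lambda: fp.read(1 << 16), b""):
            digest.update(chunk)
    target = os.path.join(cache_root, digest.hexdigest())
    if os.path.isdir(target):
        return target  # cache hit: this exact archive was unpacked before

    os.makedirs(cache_root, exist_ok=True)
    # Extract into a temporary sibling directory and rename it afterwards,
    # so a half-written entry never appears under the final name.
    staging = tempfile.mkdtemp(dir=cache_root)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(staging)
    try:
        os.rename(staging, target)
    except OSError:
        # Most likely another process populated the entry first; drop ours.
        shutil.rmtree(staging, ignore_errors=True)
    return target


def activate(archive_path, cache_root):
    """Make the archive's contents importable through the normal file importer."""
    path = extract_to_cache(archive_path, cache_root)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path
```

Keying the cache on a content hash means a rebuilt archive gets a fresh
directory instead of silently reusing stale files; the cost is that old
entries have to be cleaned up by some other means.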

We also wrote our own stubs for .so files so that they could be extracted
from nested zips:
https://github.com/twitter/commons/blob/677780ea56af610359b3ccb30accb5b9ecd713d8/src/python/twitter/common/python/distiller.py#L34
This turns out to be fairly memory intensive.  One of our most
production-critical Python applications, the Thermos executor component of
Aurora (see https://github.com/apache/incubator-aurora), manages much of
Twitter's infrastructure and as such needs to have as small a memory
footprint as possible.  The process of pulling out one .so file (a 40MB
library in one of our core dependencies) took ~150MB of RAM due to
zipimport making two copies of it in RAM (in order to do a string
comparison) before writing it to disk as an eager resource.  This became
problematic because the executor itself only needs ~45MB RSS in steady
state, which is the overhead we'd prefer to allocate to its kernel
container, but the reality is that we have to allocate for peak.  While
100MB extra RSS doesn't sound like much, when you multiply by O(10s) tasks
per machine and O(1000s) machines, it translates to O(TBs) of RAM, which is
O( :-( $ ) to Twitter.
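
For comparison, a minimal sketch of extracting a single member from a
(flat, non-nested) zip by streaming it to disk in fixed-size chunks, using
only the stdlib zipfile and shutil modules.  With this approach peak
memory should be governed by the chunk size and decompressor buffers
rather than by the size of the .so.  The function name and the default
chunk size are assumptions for illustration, and this is not what the
distiller code linked above does.

```python
import os
import posixpath
import shutil
import zipfile


def extract_member_streaming(archive_path, member, dest_dir, chunk_size=1 << 20):
    """Copy one zip member into dest_dir chunk by chunk; return the output path.

    Hypothetical helper: handles a flat zip only, not nested archives.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Zip member names always use "/" as the separator, hence posixpath.
    out_path = os.path.join(dest_dir, posixpath.basename(member))
    with zipfile.ZipFile(archive_path) as zf:
        # zf.open() returns a file-like object that decompresses lazily,
        # so copyfileobj only ever holds about chunk_size bytes at a time.
        with zf.open(member) as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst, chunk_size)
    return out_path
```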

Lastly, there are social reasons.  It's just hard to convince most
engineers to use things like pkg_resources or pkgutil to manipulate
resources when for them the status quo is just using __file__.  Bizarrely,
the social challenges are just as hard as the above-mentioned technical
challenges.
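
To make the contrast concrete, here is a small sketch of reading a
packaged data file through pkgutil.get_data instead of a __file__-relative
path.  The helper name load_resource and the package and file names in the
comments are made up for illustration.

```python
import pkgutil


def load_resource(package, name):
    """Return the bytes of a data file shipped inside `package`.

    `name` is a "/"-separated path relative to the package.  Because the
    read goes through the package's loader, it works whether the package
    was imported from a directory or from a zip archive.
    """
    data = pkgutil.get_data(package, name)
    if data is None:
        # get_data returns None when the loader cannot serve data files.
        raise IOError("loader for %r does not support get_data" % package)
    return data


# The status-quo idiom this replaces only works when the package lives on
# disk (package and file names here are hypothetical):
#
#   path = os.path.join(os.path.dirname(__file__), "templates", "greeting.txt")
#   with open(path, "rb") as fp:
#       data = fp.read()
#
# Loader-based equivalent:
#
#   data = load_resource("myapp", "templates/greeting.txt")
```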



> Also, is your production usage confined to POSIX platforms, or does
> Windows have any role to play? If so, did you run into any particular
> issues relating to Windows that might not be common knowledge?
>
>
We only target Linux and OS X in practice.  I've gotten PEX working with
CPython (2.x >= 2.6 and 3.x >= 3.2), PyPy and Jython on those platforms.  I
dabbled with IronPython briefly but never got it to work.  I don't see any
really good reason why PEX files couldn't be launched on Windows, but I've
never tried.

~brian