> > For practical reasons we've always needed "not zip-safe" PEX files
> > where all code is written to disk prior to execution (ex:  legacy
> > Django applications that have __file__-relative business logic)
> > so we decided to just throw away the magical zipimport stuff and
> > embrace using disk as a pure cache.

> I take it this is because of dependencies (whether your own legacy code
> or external dependencies) that haven't been designed to run from zip, and
> where it's not worth it (or otherwise not possible) to re-engineer to run
> from zip? If there were other reasons (apart from the need to maintain the
> magic machinery, or problems with it), can you say what they were?

The two major reasons here are interpreter compatibility and minimizing what we felt to be hacks, which falls under "maintain the magic machinery" I suppose.  There are myriad other practical reasons.  Here are some:

Our recursive zipimporter was not very portable.  We relied upon things like a mutable zipimport._zip_directory_cache, but this was both undocumented and subject to change (and was always immutable in PyPy, so we couldn't support it).  Code that touches _zip_directory_cache has since been removed from pkg_resources (that change also broke us, around distribute 0.6.42).  Things like the change from find_in_zip to find_in_zips broke us just a few weeks ago (setuptools 1.1.7 => 2.x).  Our monkeypatches to fix bugs (or features, who knows?) that our recursive zipimporter exposed in setuptools, e.g. https://bitbucket.org/tarek/distribute/issue/274/disparity-in-metadata-resource-_listdir, usually stopped working over time, since they were patching undocumented parts of the API.  This has only accelerated recently, since more attention has been drawn to that code by PEPs 426/440 etc.
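To make that concrete, here is roughly the kind of thing our importer had to do -- a simplified sketch, not our actual code, and the function name is made up for illustration:

```python
import zipimport

def _invalidate_zip_entry(archive_path):
    # CPython keeps a module-level dict mapping archive path -> parsed
    # zip central directory.  Dropping an entry forces zipimport to
    # re-read the archive the next time something imports from it.
    # This is exactly the sort of undocumented mutation that was
    # impossible on PyPy and kept breaking across setuptools/distribute
    # releases.
    zipimport._zip_directory_cache.pop(archive_path, None)
```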

Even today PyPy has bugs with large collections of zipimported namespace packages, regardless of zipimport implementation.  We eventually surmounted some of these issues, but rather than be doomed to repeat them in the future, we decided to ditch our custom importer and rely upon the one in the stdlib when possible.
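For context, these are ordinary setuptools-era namespace packages, declared per-distribution in __init__.py; it's merging many of these __path__ contributions out of (nested) zips that hits the bugs.  A typical declaration:

```python
# __init__.py of a setuptools-style namespace package.
# Every distribution contributing to the namespace ships this same stanza,
# and the import machinery has to merge all of their __path__ entries --
# including entries that live inside zip archives.
__import__('pkg_resources').declare_namespace(__name__)
```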

We also wrote our own stubs for .so files so that they could be extracted from nested zips:
https://github.com/twitter/commons/blob/677780ea56af610359b3ccb30accb5b9ecd713d8/src/python/twitter/common/python/distiller.py#L34
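For reference, the stubs follow roughly the same __bootstrap__ pattern that setuptools generates for extensions in zipped eggs.  A simplified, Python-2-era sketch (not the actual distiller output, and "foo" is a placeholder name):

```python
# foo.py -- generated stub that stands in for foo.so inside the zip.
def __bootstrap__():
    global __bootstrap__, __loader__, __file__
    import imp
    import pkg_resources
    # Extract the real extension module out of the archive to a cache
    # directory on disk, then load it in place of this stub module.
    __file__ = pkg_resources.resource_filename(__name__, 'foo.so')
    __loader__ = None
    del __bootstrap__, __loader__
    imp.load_dynamic(__name__, __file__)
__bootstrap__()
```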
This extraction turns out to be fairly memory intensive.  One of our most production-critical Python applications, the Thermos executor component of Aurora (see https://github.com/apache/incubator-aurora), manages much of Twitter's infrastructure and as such needs as small a memory footprint as possible.  Pulling out a single .so file (a 40MB library in one of our core dependencies) took ~150MB of RAM, because zipimport makes two copies of it in RAM (in order to do a string comparison) before writing it to disk as an eager resource.  This became problematic because the executor itself only needs ~45MB RSS in steady state, which is the overhead we'd prefer to allocate to its kernel container, but the reality is that we have to allocate for peak.  While 100MB of extra RSS doesn't sound like much, when you multiply by O(10s) of tasks per machine and O(1000s) of machines (100MB x 10 x 1,000 is already ~1TB), it translates to O(TBs) of RAM, which is O( :-( $ ) to Twitter.
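Once you accept writing everything to disk up front, the boring fix becomes available: stream the member out of the zip with a bounded buffer instead of materializing the whole thing in memory the way zipimport's get_data() does.  A rough sketch of the idea (hypothetical paths and names, not the PEX implementation):

```python
import shutil
import zipfile

def extract_member(archive_path, member, dest_path):
    # Stream the zip member to disk in fixed-size chunks rather than
    # reading the whole (possibly 40MB+) .so into memory first.
    with zipfile.ZipFile(archive_path) as zf:
        with zf.open(member) as src, open(dest_path, 'wb') as dst:
            shutil.copyfileobj(src, dst, length=256 * 1024)

# e.g. extract_member('app.pex', '.deps/somelib/somelib.so',
#                     '/tmp/pex-cache/somelib.so')
```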

Lastly, there are social reasons.  It's just hard to convince most engineers to use things like pkg_resources or pkgutil to manipulate resources when, for them, the status quo is just using __file__.  Bizarrely, the social challenges are just as hard as the above-mentioned technical challenges.
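To illustrate the gap: the two snippets below do the same thing, but only the second keeps working when the package is imported from a zip (the resource path is made up for the example):

```python
import os
import pkgutil

# The status quo: __file__-relative access.  Works only when the package
# is an actual directory on disk.
here = os.path.dirname(__file__)
with open(os.path.join(here, 'templates', 'index.html')) as fp:
    template = fp.read()

# The zip-safe spelling via the stdlib; pkg_resources.resource_string()
# is the setuptools equivalent.  get_data() returns bytes (or None if
# the loader doesn't support it).
data = pkgutil.get_data(__package__, 'templates/index.html')
template = data.decode('utf-8')
```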

 
> Also, is your production usage confined to POSIX platforms, or does
> Windows have any role to play? If so, did you run into any particular
> issues relating to Windows that might not be common knowledge?


We only target Linux and OS X in practice.  I've gotten PEX working with CPython (>=2.6 and >=3.2), PyPy and Jython on those platforms.  I dabbled with IronPython briefly but never got it to work.  I don't see any really good reason why PEX files couldn't be launched on Windows, but I've never tried.

~brian