[Distutils] PEX at Twitter (re: PEX - Twitter's multi-platform executable archive format for Python)

Nick Coghlan ncoghlan at gmail.com
Sat Feb 1 03:33:29 CET 2014

On 1 February 2014 05:31, Brian Wickman <wickman at gmail.com> wrote:
> This is in response to Vinay's thread, but since I wasn't subscribed to
> distutils-sig, I couldn't easily respond directly to it.
>
> Vinay's right: the technology here isn't revolutionary, but what's notable
> is that we've been using it in production for almost 3 years at Twitter.
> It's also been open-sourced for a couple of years at
> https://github.com/twitter/commons/tree/master/src/python/twitter/common/python
> but not widely announced (it is, after all, just a small subdirectory in a
> fairly large mono-repo, and was only recently published independently to
> PyPI as twitter.common.python).
>
> PEX files are just executable zip files with hashbangs, containing a
> carefully constructed __main__.py and a PEX-INFO file, which is a
> JSON-encoded dictionary describing how to scrub and bootstrap sys.path
> and the like.
> They work equally well unpacked into a standalone directory.
>
> In practice PEX files are simultaneously our replacement for virtualenv
> and our way of distributing Python applications to production.  We
> could use virtualenv for this, but it's hard to argue with a deployment
> process that is literally "cp".  Furthermore, most of our machines don't
> have compiler toolchains or external network access, so hermetically
> sealing all dependencies once at build time (possibly for multiple
> platforms, since all our developers use Macs) has huge appeal.  This is
> even more important at Twitter, where it's common to run a dozen
> different Python applications on the same box at the same time, some
> using 2.6, some 2.7, some PyPy, and each with its own versions of the
> underlying dependencies.
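
The file layout described above (hashbang, then a zip holding __main__.py and a JSON PEX-INFO) can be sketched with nothing but the standard library. This is a minimal illustration under stated assumptions, not PEX's actual builder: the helper name build_pex_like, the default interpreter line, and whatever keys are placed in PEX-INFO are hypothetical placeholders, and PEX's real PEX-INFO schema is not reproduced here.

```python
import json
import os
import stat
import zipfile


def build_pex_like(path, main_source, pex_info, interpreter="/usr/bin/env python"):
    """Write a hashbang line followed by a zip archive to `path`.

    Hypothetical helper: `pex_info` is an arbitrary dict standing in for
    PEX-INFO; it is NOT PEX's real metadata schema.
    """
    with open(path, "wb") as f:
        # The hashbang lets the OS hand the file to a Python interpreter.
        f.write(("#!" + interpreter + "\n").encode("utf-8"))
        # Zip readers locate the central directory from the end of the file,
        # so the leading hashbang bytes do not break the archive.
        with zipfile.ZipFile(f, "w", zipfile.ZIP_DEFLATED) as zf:
            # Python runs this module when given the archive as a script.
            zf.writestr("__main__.py", main_source)
            # Metadata a real bootstrapper would use to set up sys.path;
            # this sketch only stores it.
            zf.writestr("PEX-INFO", json.dumps(pex_info))
    # Add execute bits so the file can be launched directly, e.g. ./app.pex.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

With main_source set to something like "print('hello')\n", both "python app.pex" and "./app.pex" run the __main__.py stored inside the archive, so deploying it really is a single file copy.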

Ah, very interesting - this is exactly the kind of thing we were
trying to enable with the executable directory/zip file support in
Python 2.6, and then we went and forgot to cover it in the original
"What's New in Python 2.6?" doc, so an awful lot of people never
learned about the existence of the feature :P

As Daniel noted, PEP 441 is at least as much about letting people know
"Hey, Python has supported direct execution of directories and zip
archives since Python 2.6!" as it is about actually providing some
tools that better support doing things that way :)
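
For anyone who hasn't met the feature: a directory (or zip archive) with a top-level __main__.py can be handed straight to the interpreter, and the stdlib's runpy.run_path() offers much the same behaviour from inside Python. A minimal sketch; the directory name myapp and the greeting are placeholders.

```python
import os
import runpy
import subprocess
import sys
import tempfile

# Build a throwaway application directory (placeholder name) with a
# top-level __main__.py.
app_dir = os.path.join(tempfile.mkdtemp(), "myapp")
os.makedirs(app_dir)
with open(os.path.join(app_dir, "__main__.py"), "w") as f:
    f.write("GREETING = 'hello from ' + __name__\nprint(GREETING)\n")

# Equivalent of typing `python myapp` at a shell prompt.
output = subprocess.check_output([sys.executable, app_dir]).decode()

# In-process counterpart: run_path() temporarily puts the directory on
# sys.path, executes its __main__ module, and returns a copy of its globals.
module_globals = runpy.run_path(app_dir, run_name="__main__")
```

Zipping the same directory's contents and passing the .zip file instead of the directory works the same way.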

> Speaking to recent distutils-sig threads, we used to go way out of our
> way to never hit disk (going so far as to build our own .egg packager and
> a pure-Python recursive zipimport implementation, so that we could import
> from eggs within zips and write ephemeral .so files to dlopen and then
> unlink), but we've since moved away from that position for simplicity's
> sake.  For practical reasons we've always needed "not zip-safe" PEX files,
> where all code is written to disk prior to execution (e.g. legacy Django
> applications that have __file__-relative business logic), so we decided
> to just throw away the magical zipimport stuff and embrace using disk as
> a pure cache.  This seems philosophically more compatible with the
> direction wheels are going, for example.
>
> Since there's been more movement in the PEP space recently, we've been
> evolving PEX to be as standards-compliant as possible, which is why
> I've been more visible lately re: talks, .whl support and the like.  I'd
> also love to chat more about PEX and how it relates to things like PEP
> 441 and/or other attempts like pyzzer.
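
Using disk "as a pure cache" can be sketched as unpacking the archive once into a directory named after a hash of its contents, then reusing that directory on later runs. This is an illustrative sketch, not how PEX itself lays out its cache: the helper name extract_to_cache, the choice of SHA-1, and the one-directory-per-hash layout are all assumptions.

```python
import hashlib
import os
import shutil
import tempfile
import zipfile


def extract_to_cache(archive_path, cache_root):
    """Unpack archive_path into cache_root/<sha1 of archive> once, then reuse it.

    Hypothetical helper; not PEX's actual cache layout.
    """
    h = hashlib.sha1()
    with open(archive_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    target = os.path.join(cache_root, h.hexdigest())
    if os.path.isdir(target):
        # Cache hit: the on-disk copy is derived purely from the archive.
        return target
    os.makedirs(cache_root, exist_ok=True)
    # Extract into a private staging directory first.
    staging = tempfile.mkdtemp(dir=cache_root)
    try:
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(staging)
        try:
            # Publish the finished extraction in one rename.
            os.rename(staging, target)
        except OSError:
            # Another process may have created target first; keep its copy.
            if not os.path.isdir(target):
                raise
    finally:
        # No-op after a successful rename; otherwise discards the staging dir.
        shutil.rmtree(staging, ignore_errors=True)
    return target
```

A launcher would then put the returned directory at the front of sys.path, so code that locates data files relative to __file__ sees ordinary paths on disk. Staging and then renaming is one way to keep concurrent launches from ever observing a half-written cache entry.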

I think it's very interesting and relevant indeed :)

The design in PEP 441 just involves providing some very basic
infrastructure around making use of the direct execution support,
mostly in order to make the feature more discoverable in the first
place, but it sounds like Twitter have a much richer and
battle-hardened approach in PEX.
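
For comparison, the basic stdlib tooling proposed in PEP 441 did later ship, as the zipapp module in Python 3.5. A minimal sketch of packaging a directory with it; the directory name hello_app and the module hello are placeholders.

```python
import os
import subprocess
import sys
import tempfile
import zipapp

# Placeholder source directory holding an ordinary module with a main().
src = os.path.join(tempfile.mkdtemp(), "hello_app")
os.makedirs(src)
with open(os.path.join(src, "hello.py"), "w") as f:
    f.write("def main():\n    print('hello from zipapp')\n")

target = src + ".pyz"
# main="module:function" makes zipapp generate a __main__.py that calls
# hello.main(); interpreter= writes the hashbang line and, because target is
# a filename, also marks the archive executable.
zipapp.create_archive(src, target, interpreter="/usr/bin/env python3", main="hello:main")

output = subprocess.check_output([sys.executable, target]).decode()
```

The command-line equivalent is: python -m zipapp hello_app -m hello:main -p "/usr/bin/env python3"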

It may still prove to be more appropriate to keep the stdlib
infrastructure very basic (i.e. more at the PEP 441 level), leaving
something like PEX free to provide cross-version consistency. I'll
wait and see how the public documentation for PEX evolves before
forming a firm opinion one way or the other, but one possible outcome
would be a situation akin to the pyvenv vs virtualenv distinction,
where pyvenv has the benefit of always being available even in
environments where it is difficult to get additional third party tools
approved, but also has the inherent downside of Python version
dependent differences in available features and behaviour.


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
