Using emacs' unexec to speed Python startup (was Re: [Python-Dev] Startup time)

Jeff Epler jepler@unpythonic.net
Sun, 18 May 2003 20:22:14 -0500


On Fri, May 16, 2003 at 03:09:31PM -0400, Barry Warsaw wrote:
> Skip, you're going about this all wrong.  We already have the technology
> to start Python up blazingly fast.  All you have to do <wink> is port
> XEmacs's unexec code.  Then you load up Python with all the modules you
> think you're going to need, unexec it, then the next time it starts up
> like lightning.  Disk space is cheap!

I gave it a try, starting with 2.3b1 and using FSF Emacs 21.3's unexelf.c.
An unexec'd binary loads faster than 'python -S -c pass', and seems to
work properly with two exceptions and a few limitations.

The only change to Python is in main(): I use mallopt() to force all
allocations to go through brk() instead of through mmap(), because unexec
doesn't support mmap'd memory.  I also used Modules/Setup.local to make
some normally-shared modules not shared (for the same reason).
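To see which modules would need the Setup.local treatment, something like
the following could list the loaded modules that come from shared objects.
This is only a sketch in present-day Python 3 (not the 2.3b1 used here);
is_shared() and shared_modules() are hypothetical helper names, and the
suffix test relies on importlib.machinery.EXTENSION_SUFFIXES rather than
the plain ".so" check that dump.py does.

```python
import sys
from importlib.machinery import EXTENSION_SUFFIXES


def is_shared(path):
    """True if path looks like a dynamically loaded extension module."""
    return bool(path) and path.endswith(tuple(EXTENSION_SUFFIXES))


def shared_modules():
    """Names of currently loaded modules that come from shared objects."""
    names = []
    for name, mod in list(sys.modules.items()):
        if mod is None:
            continue  # negatively cached entry
        if is_shared(getattr(mod, "__file__", None)):
            names.append(name)
    return sorted(names)


if __name__ == "__main__":
    # Each name printed is a candidate for a static entry in Setup.local.
    for name in shared_modules():
        print(name)
```

Run it after importing whatever the dumped binary is supposed to contain;
an empty listing means nothing loaded so far would trip the shared-module
check.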

dump.py loads the requested modules (an argument of the form -<module>
forces that module to *not* be found, by storing None under its name in
sys.modules) and then calls unexec.dump(), producing a new binary with
the given name.
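The "-<module>" trick relies on a None entry in sys.modules making the
import fail. A small stand-alone demonstration, again as a Python 3 sketch
rather than the 2.3 code in the post (import_blocked is a hypothetical
helper name, and "colorsys" in the usage note is just an arbitrary stdlib
module):

```python
import sys

_MISSING = object()


def import_blocked(name):
    """Return True if importing `name` fails while sys.modules[name] is None."""
    saved = sys.modules.get(name, _MISSING)
    sys.modules[name] = None  # same negative-cache entry dump.py installs
    try:
        __import__(name)
    except ImportError:
        return True
    else:
        return False
    finally:
        # Put sys.modules back the way we found it.
        if saved is _MISSING:
            del sys.modules[name]
        else:
            sys.modules[name] = saved
```

For example, import_blocked("colorsys") reports whether the block took
effect, and a normal "import colorsys" works again afterwards because the
None entry has been removed.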

$ time ./python -S -c pass         # best 'real' of 5 runs
real    0m0.054s
user    0m0.040s
sys     0m0.010s
$ time ./python -c 'import cgi'    # best 'real' of 5 runs
real    0m0.127s
user    0m0.110s
sys     0m0.010s
$ strace -e open ./python -c 'import cgi' 2>&1 | grep -v ENOENT | wc -l
     88
$ ./python dump.py cgipython -_ssl cgi
$ time ./cgipython -c 'import cgi' # best 'real' of 5 runs
real    0m0.039s
user    0m0.020s
sys     0m0.020s
$ strace -e open ./cgipython -c 'import cgi' 2>&1 | grep -v ENOENT | wc -l
      9
$ ./python dump.py dython
-rwxrwxr-x    1 jepler   jepler    4983713 May 18 19:42 cgipython
-rwxrwxr-x    1 jepler   jepler    3603737 May 18 19:39 python
-rwxrwxr-x    1 jepler   jepler    4541345 May 18 19:55 dython

(a minimal unexec'd python is about 900k bigger than the regular Python
binary; compare dython with python in the listing above)
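For reference, the arithmetic behind the numbers in the transcript can be
checked with a few lines of Python. All figures are copied from the
listing above; the variable names are made up for illustration, and the
"speedup" is just the ratio of the best 'real' times for 'import cgi'.

```python
# File sizes in bytes, from the listing above.
python_size = 3603737
dython_size = 4541345      # unexec'd with no extra modules
cgipython_size = 4983713   # unexec'd with cgi preloaded

# Best 'real' times in seconds for -c 'import cgi'.
plain_import_cgi = 0.127
dumped_import_cgi = 0.039

minimal_overhead = dython_size - python_size
cgi_overhead = cgipython_size - python_size
speedup = plain_import_cgi / dumped_import_cgi

print("minimal dump overhead: %d bytes (%.0f KiB)"
      % (minimal_overhead, minimal_overhead / 1024.0))
print("cgi dump overhead:     %d bytes (%.0f KiB)"
      % (cgi_overhead, cgi_overhead / 1024.0))
print("import cgi speedup:    %.1fx" % speedup)
```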

I'm running the test suite now... it hangs in test_signal for some reason.
test_thread seems to hang too, which may be related (but test_threading
completes?).

$ ./dython Lib/test/regrtest.py -x test_signal -x test_thread
[...]
225 tests OK.
26 tests skipped:
    test_aepack test_al test_bsddb3 test_bz2 test_cd test_cl
    test_curses test_email_codecs test_gl test_imgfile
    test_linuxaudiodev test_macfs test_macostools test_nis
    test_normalization test_ossaudiodev test_pep277 test_plistlib
    test_scriptpackages test_socket_ssl test_socketserver
    test_sunaudiodev test_timeout test_urllibnet test_winreg
    test_winsound
1 skip unexpected on linux2:
    test_bz2

Well, if it worked right it'd sure be interesting.  OTOH, unexelf.c is
GPL'd and there's also the nightmare of different unex* for different
platforms.  

Jeff

########################################################################
# dump.py
import unexec, sys

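# Arguments after the output name are modules to preload; a leading "-"
# blocks a module instead, so it is never pulled in as a dependency.
for m in sys.argv[2:]:
	if m[0] == "-":
		sys.modules[m[1:]] = None
		continue
	__import__(m)

# unexec doesn't support mmap'd memory, so refuse to dump if any loaded
# module came from a shared object.
for m in sys.modules.keys():
	mod = sys.modules[m]
	if mod is None:
		continue # negatively cached entry
	if not hasattr(mod, "__file__"):
		continue # builtin module
	if mod.__file__.endswith(".so"):
		raise RuntimeError("Cannot dump with shared module %s" % m)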

unexec.dump(sys.argv[1], sys.executable)


/**********************************************************************/
/* unexecmodule.c (needs unexec() eg from unexelf.c)                  */
#include <Python.h>
#include <unistd.h>	/* _exit() */

extern void unexec (char *new_name, char *old_name, unsigned data_start,
		    unsigned bss_start, unsigned entry_address);

/* dump(filename, symfile): write the running process image to filename.
   Does not return; the dumping process exits with status 99. */
static PyObject *dump_python(PyObject *self, PyObject *args) {
	char *filename, *symfile;
	if(!PyArg_ParseTuple(args, "ss", &filename, &symfile))
		return NULL;
	unexec(filename, symfile, 0, 0, (unsigned)Py_Main);
	_exit(99);
}

static PyMethodDef dump_methods[] = {
	{"dump", dump_python, METH_VARARGS,
		PyDoc_STR("dump(filename, symfile) -> None")},
	{NULL, NULL}
};

PyDoc_STRVAR(module_doc,
"Support for undumping the Python executable, a la Emacs");

PyMODINIT_FUNC
initunexec(void)
{
	Py_InitModule3("unexec", dump_methods, module_doc);
}

########################################################################
# Setup.local

# Edit this file for local setup changes
unexec unexecmodule.c unexelf.c
time timemodule.c
_socket socketmodule.c
_random _randommodule.c
math mathmodule.c
fcntl fcntlmodule.c