CPython loading modules into memory

sjbrown shandy.b at gmail.com
Wed Feb 11 20:47:00 EST 2009


On Feb 11, 2:00 pm, "Martin v. Löwis" <mar... at v.loewis.de> wrote:
> > Can someone describe the details of how Python loads modules into
> > memory?  I assume once the .py file is compiled to .pyc that it is
> > mmap'ed in.  But that assumption is very naive.  Maybe it uses an
> > anonymous mapping?  Maybe it does other special magic?  This is all
> > very alien to me, so if someone could explain it in terms that a
> > person who never usually worries about memory could understand, that
> > would be much appreciated.
>
> There is no magic whatsoever. Python opens a sequential file descriptor
> for the .pyc file, and then reads it in small chunks, "unmarshalling"
> it (indeed, the marshal module is used to restore Python objects).
>
> The marshal format is an object serialization in a type-value encoding
> (sometimes type-length-value), with type codes for:
> - None, True, False
> - 32-bit ints, 64-bit ints (unmarshalled into int/long)
> - floats, complex
> - arbitrary-sized longs
> - strings, unicode
> - tuples (length + marshal data of values)
> - lists
> - dicts
> - code objects
> - a few others
>
> Result of unmarshalling is typically a code object.
>
> > Follow up: is this process different if the modules are loaded from a
> > zipfile?
>
> No; it uncompresses into memory, and then unmarshals from there
> (compressed block for compressed block).
>
> > If there is a link that covers this info, that'd be great too.
>
> See the description of the marshal module.
>
> HTH,
> Martin
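
A minimal sketch of the marshal round trip described above, for anyone who
wants to try it. It assumes a current Python 3 rather than the 2.5 discussed
in this thread; the names source, code and namespace are only for
illustration. It compiles a one-line module body, marshals the resulting code
object, checks the leading type code, and unmarshals it again.

```python
import marshal

# Compile a tiny module body into a code object, roughly what happens
# before a .pyc is written.
source = "x = 6 * 7\n"
code = compile(source, "<demo>", "exec")

# marshal.dumps produces the serialized form that follows the .pyc header.
data = marshal.dumps(code)

# The first byte is the type code. For a code object it is 'c'; newer
# marshal versions may also set the high bit as a reference flag, so
# mask it off before comparing.
type_code = chr(data[0] & 0x7F)

# Unmarshalling gives back a code object that can be executed.
restored = marshal.loads(data)
namespace = {}
exec(restored, namespace)

# Plain containers and scalars round-trip through the same format.
sample = (None, True, 1.5, "s", [1, 2], {"k": 2 ** 70})
sample_copy = marshal.loads(marshal.dumps(sample))
```

After this runs, namespace["x"] holds the value computed by the restored code
object, and sample_copy compares equal to sample.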


Thanks for the answers.  For my own edification, and in case anyone is
interested, I confirmed this by looking at import.c and marshal.c in
the Python 2.5.4 source.  Looks like the actual reading of the file is
done in the marshal.c function PyMarshal_ReadLastObjectFromFile.  It
is read sequentially using a small buffer on the heap.
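
The same steps can be sketched from pure Python. This is not what import.c
does internally, just an approximation of it under some assumptions: it
targets CPython 3.7+, where the .pyc header is 16 bytes (Python 2.5 used 8
bytes: a 4-byte magic number and a 4-byte mtime), and the helper names
load_code_from_pyc and demo are made up for the example.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile


def load_code_from_pyc(pyc_path, header_size=16):
    """Read a .pyc file and unmarshal the code object after its header.

    header_size=16 is an assumption that holds for CPython 3.7+.
    """
    with open(pyc_path, "rb") as f:
        header = f.read(header_size)
        # The first four bytes identify the bytecode version.
        if header[:4] != importlib.util.MAGIC_NUMBER:
            raise ValueError("magic number mismatch")
        # marshal.load reads the serialized object from the open file.
        return marshal.load(f)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.py")
        with open(src, "w") as f:
            f.write("answer = 6 * 7\n")
        # py_compile.compile returns the path of the file it wrote.
        pyc = py_compile.compile(src, cfile=os.path.join(tmp, "hello.pyc"))
        code = load_code_from_pyc(pyc)
        namespace = {}
        exec(code, namespace)
        return namespace["answer"]
```

Calling demo() compiles a throwaway module, reads the .pyc back by hand, and
returns the value the module assigned to answer.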

-sjbrown


