[Tutor] python memory management
Steven D'Aprano
steve at pearwood.info
Thu Sep 1 14:10:02 EDT 2016
On Thu, Sep 01, 2016 at 02:12:11PM +0000, monikajg at netzero.net wrote:
> Hi:
> Can somebody please explain how memory is managed by python? What kind
> of memory it uses? What structures use what kind of memory?
> If many people work on the same project and have many instances of the
> same object how do they ensure that all instances are killed before
> the programs exit? Apparently if one of the programmer leaves a
> reference to object it might not be automatically deleted by python on
> exit. What is the command to do this?
>
> Could somebody please explain how this works, especially on projects
> involving multiple programmers?
In general, you (almost) never need to care about memory management;
Python will do it for you.
The number of programmers writing the code doesn't matter. What matters
is how many times the program is running *at the same time*. Each time
it runs, your computer's operating system (Windows, Linux, Mac OS X)
will start what is called "a process", running the Python interpreter.
When the process exits at the end, the OS will reclaim all the memory
used and make it available for the next process.
While the program is running, the OS divides memory among many
different processes. On my computer, right now, I have over 200
processes running. Most of them are handled by the OS, but the others
include my email program, my web browser, a few text editors, my desktop
manager, and many others. The OS manages the memory allocation.
As far as Python is concerned, it manages its own memory from what the
OS gives it. When you assign a value:
name = "Inigo Montoya"
the Python interpreter allocates a chunk of memory in the memory heap to
hold the string. It then tracks whether or not the string is being used.
So long as the string is being used by your program, or *could possibly*
be used, Python will hold onto that string, forever.
But as soon as it sees that it can no longer be used, it will free the
memory and reuse it.
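You can peek at this tracking in CPython with `sys.getrefcount`, which
reports how many references an object has (a CPython-specific sketch; the
function counts its own argument as one extra reference, and the list here
is just an example object built at run time):

```python
import sys

name = ["Inigo", "Montoya"]     # an object built at run time
before = sys.getrefcount(name)  # 2 here: `name`, plus getrefcount's own argument
alias = name                    # a second reference to the same object
after = sys.getrefcount(name)   # one higher: Python knows it is still in use
print(before, after)
```

As long as that count is above zero, Python keeps the object alive.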
This process is called "garbage collection". You can google for more
information, or ask here. Different Python interpreters use different
garbage collectors:
IronPython uses the .Net garbage collector;
Jython uses the Java garbage collector;
PyPy has a few different ones that you can choose from;
and the CPython (that's the standard Python you are probably running)
interpreter has two, a simple "reference counter" GC that works very
fast but not very thoroughly, and a more thorough GC that picks up
anything the reference counter can't handle.
(Mostly reference cycles: if one object has a reference to another, and
that second object also has a reference to the first, that's a cycle.
The reference counter can't deal with that, but the second GC can.)
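You can watch the second GC clean up a cycle yourself, using the standard
`gc` module (a sketch; the `Node` class is just an illustration):

```python
import gc

class Node:
    def __init__(self):
        self.partner = None

gc.collect()           # clear out any pre-existing garbage first
a, b = Node(), Node()
a.partner = b
b.partner = a          # now a -> b and b -> a: a reference cycle
del a, b               # the reference counts never reach zero on their own
freed = gc.collect()   # the cycle-detecting GC finds the cycle and frees it
print(freed >= 2)      # True: at least the two Node objects were reclaimed
```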
Let's track the life-span of a chunk of memory. Suppose you write the
following code in a module:
name = "Inigo Montoya"
print(name)
name = "The Dread Pirate Roberts"
The second assignment frees up the string "Inigo Montoya", as no part of
your program can possibly access the old value any more, since it has
been replaced by the new value. So the garbage collector frees that
chunk of memory and makes it available for something else. This happens
automatically, and virtually instantly. (Strictly speaking, CPython keeps
string literals cached as part of the compiled module, but the principle
holds for any value built at run time.)
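You can watch that happen with a weak reference, which observes an object
without keeping it alive (a sketch using a small illustrative class,
since CPython caches plain string literals):

```python
import weakref

class Name:
    def __init__(self, text):
        self.text = text

name = Name("Inigo Montoya")
watcher = weakref.ref(name)     # a weak reference does not keep the object alive
name = Name("The Dread Pirate Roberts")  # rebinding drops the last strong reference
print(watcher())                # None: CPython freed the old object at once
```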
You never need to care about allocating or deallocating memory. The
interpreter has its own memory manager to do that, with a garbage
collector to deallocate memory.
So, when do you need to care about memory?
- If you are writing a C extension, you have to manage your own memory.
- If you're using the ctypes module, you have access to the C code of
the interpreter, so you have to care about managing your own memory.
- If you're creating massively huge strings or lists, you might have to
worry about running out of memory.
For example, I once locked up my computer by foolishly creating a HUGE
list:
# Don't try this at home!
L = list(range(100**100))
That would need to find enough memory for a list with 10**200 entries:
one hundred million trillion trillion trillion trillion trillion
trillion trillion trillion trillion trillion trillion trillion trillion
trillion trillion trillion entries. Each entry needs at least a
four-byte pointer (eight bytes on a 64-bit system), so the total is ...
well, it's a lot. Much more than my poor computer has.
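The arithmetic is easy to check (assuming eight bytes per list slot, as
on a 64-bit build):

```python
entries = 100 ** 100              # == 10**200 entries
bytes_per_slot = 8                # one pointer per list slot, at minimum
needed = entries * bytes_per_slot
# a trillion terabytes is 10**12 * 2**40 bytes -- still nowhere near enough
print(needed > 10**12 * 2**40)    # True
```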
On Windows, this will fail quite quickly with a MemoryError, no real
harm done, but on Linux (which I use) the OS will gamely, or perhaps
stupidly, try very hard to allocate a trillion trillion trillion ...
trillion terabytes of memory, locking up my computer. (I let it run
overnight, and eventually needed to just pull the plug to reset it.)
Fortunately, even on Linux there are ways to tell the OS not to be so
stupid, which means that Python will raise a MemoryError and no real
harm is done, but at the time I didn't know about them.
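One such knob, on Linux and other Unix-like systems, is the process's
address-space limit, which you can set from within Python using the
standard `resource` module (a sketch; the 2 GiB cap is an arbitrary
choice, and the limit applies only to the current process):

```python
import resource

# Cap this process's address space at roughly 2 GiB, so a runaway
# allocation raises MemoryError quickly instead of thrashing the machine.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (2 * 1024**3, hard))

try:
    L = list(range(10**9))        # needs several gigabytes at minimum
    refused = False
except MemoryError:
    refused = True
    print("allocation refused, no real harm done")
```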
So that's about it, really: if you're writing long-running server-class
programs, you might need to care about memory; if you're trying to
process huge files bigger than the amount of RAM you have, you need to
think about memory... but most of the time, just write your code and let
the Python garbage collector manage it for you.
Feel free to ask questions if anything is unclear!
--
Steve