[Tutor] python memory management
Steven D'Aprano
steve at pearwood.info
Thu Sep 1 14:10:02 EDT 2016
On Thu, Sep 01, 2016 at 02:12:11PM +0000, monikajg at netzero.net wrote:
> Hi:
> Can somebody please explain how memory is managed by python? What kind
> of memory it uses? What structures use what kind of memory?
> If many people work on the same project and have many instances of the
> same object how do they ensure that all instances are killed before
> the programs exit? Apparently if one of the programmer leaves a
> reference to object it might not be automatically deleted by python on
> exit. What is the command to do this?
>
> Could somebody please explain how this works, especially on projects
> involving multiple programmers?
In general, you (almost) never need to care about memory management;
Python will do it for you.
The number of programmers writing the code doesn't matter. What matters
is how many times the program is running *at the same time*. Each time
it runs, your computer's operating system (Windows, Linux, Mac OS X)
will start what is called "a process", running the Python interpreter.
When the process exits at the end, the OS will reclaim all the memory
used and make it available for the next process.
While the program is running, the OS divides memory among many
different processes. On my computer, right now, I have over 200
processes running. Most of them are handled by the OS, but the others
include my email program, my web browser, a few text editors, my desktop
manager, and many others. The OS manages the memory allocation.
As far as Python is concerned, it manages its own memory from what the
OS gives it. When you assign a value:
name = "Inigo Montoya"
the Python interpreter allocates a chunk of memory in the memory heap to
hold the string. It then tracks whether or not the string is being used.
So long as the string is being used by your program, or *could possibly*
be used, Python will hold onto that string, forever.
But as soon as it sees that it can no longer be used, it will free the
memory and reuse it.
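You can peek at this tracking in CPython with `sys.getrefcount`, which
reports how many references an object has (a CPython-specific sketch; the
function counts its own argument as one extra reference, and the list here
is just an example object built at run time):

```python
import sys

name = ["Inigo", "Montoya"]     # an object built at run time
before = sys.getrefcount(name)  # 2 here: `name`, plus getrefcount's own argument
alias = name                    # a second reference to the same object
after = sys.getrefcount(name)   # one higher: Python knows it is still in use
print(before, after)
```

As long as that count is above zero, Python keeps the object alive.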
This process is called "garbage collection". You can google for more
information, or ask here. Different Python interpreters use different
garbage collectors:
IronPython uses the .Net garbage collector;
Jython uses the Java garbage collector;
PyPy has a few different ones that you can choose from;
and the CPython (that's the standard Python you are probably running)
interpreter has two, a simple "reference counter" GC that works very
fast but not very thoroughly, and a more thorough GC that picks up
anything the reference counter can't handle.
(Mostly reference cycles: if one object has a reference to another, and
that second object also has a reference to the first, that's a cycle.
The reference counter can't deal with that, but the second GC can.)
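You can watch the second GC clean up a cycle yourself, using the standard
`gc` module (a sketch; the `Node` class is just an illustration):

```python
import gc

class Node:
    def __init__(self):
        self.partner = None

gc.collect()           # clear out any pre-existing garbage first
a, b = Node(), Node()
a.partner = b
b.partner = a          # now a -> b and b -> a: a reference cycle
del a, b               # the reference counts never reach zero on their own
freed = gc.collect()   # the cycle-detecting GC finds the cycle and frees it
print(freed >= 2)      # True: at least the two Node objects were reclaimed
```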
Let's track the life-span of a chunk of memory. Suppose you write the
following code in a module:
name = "Inigo Montoya"
print(name)
name = "The Dread Pirate Roberts"
The second assignment frees up the string "Inigo Montoya", as no part of
your program can possibly access the old value any more, since it has
been replaced by the new value. So the garbage collector frees that
chunk of memory and makes it available for something else. This happens
automatically, and virtually instantly. (Strictly speaking, CPython keeps
string literals cached as part of the compiled module, but the principle
holds for any value built at run time.)
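You can watch that happen with a weak reference, which observes an object
without keeping it alive (a sketch using a small illustrative class,
since CPython caches plain string literals):

```python
import weakref

class Name:
    def __init__(self, text):
        self.text = text

name = Name("Inigo Montoya")
watcher = weakref.ref(name)     # a weak reference does not keep the object alive
name = Name("The Dread Pirate Roberts")  # rebinding drops the last strong reference
print(watcher())                # None: CPython freed the old object at once
```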
You never need to care about allocating or deallocating memory. The
interpreter has its own memory manager to do that, with a garbage
collector to deallocate memory.
So, when do you need to care about memory?
- If you are writing a C extension, you have to manage your own memory.
- If you're using the ctypes module, you have access to the C code of
the interpreter, so you have to care about managing your own memory.
- If you're creating massively huge strings or lists, you might have to
worry about running out of memory.
For example, I once locked up my computer by foolishly creating a HUGE
list:
# Don't try this at home!
L = list(range(100**100))
That would need to find enough memory for a list with 10**200 entries:
one hundred million trillion trillion trillion trillion trillion
trillion trillion trillion trillion trillion trillion trillion trillion
trillion trillion trillion entries. Each entry needs at least a
four-byte pointer (eight bytes on a 64-bit system), so the total is ...
well, it's a lot. Much more than my poor computer has.
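The arithmetic is easy to check (assuming eight bytes per list slot, as
on a 64-bit build):

```python
entries = 100 ** 100              # == 10**200 entries
bytes_per_slot = 8                # one pointer per list slot, at minimum
needed = entries * bytes_per_slot
# a trillion terabytes is 10**12 * 2**40 bytes -- still nowhere near enough
print(needed > 10**12 * 2**40)    # True
```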
On Windows, this will fail quite quickly with a MemoryError, no real
harm done, but on Linux (which I use) the OS will gamely, or perhaps
stupidly, try very hard to allocate a trillion trillion trillion ...
trillion terabytes of memory, locking up my computer. (I let it run
overnight, and eventually needed to just pull the plug to reset it.)
Fortunately, even on Linux there are ways to tell the OS not to be so
stupid, which means that Python will raise a MemoryError and no real
harm is done, but at the time I didn't know about them.
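One such knob, on Linux and other Unix-like systems, is the process's
address-space limit, which you can set from within Python using the
standard `resource` module (a sketch; the 2 GiB cap is an arbitrary
choice, and the limit applies only to the current process):

```python
import resource

# Cap this process's address space at roughly 2 GiB, so a runaway
# allocation raises MemoryError quickly instead of thrashing the machine.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (2 * 1024**3, hard))

try:
    L = list(range(10**9))        # needs several gigabytes at minimum
    refused = False
except MemoryError:
    refused = True
    print("allocation refused, no real harm done")
```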
So that's about it, really: if you're writing long-running server-class
programs, you might need to care about memory; if you're trying to
process huge files bigger than the amount of RAM you have, you need to
think about memory... but most of the time, just write your code and let
the Python garbage collector manage it for you.
Feel free to ask questions if anything is unclear!
--
Steve