Please help with MemoryError

Jeremy jlconlin at gmail.com
Fri Feb 12 15:45:31 CET 2010


On Feb 11, 6:50 pm, Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> wrote:
> On Thu, 11 Feb 2010 15:39:09 -0800, Jeremy wrote:
> > My Python program now consumes over 2 GB of memory and then I get a
> > MemoryError.  I know I am reading lots of files into memory, but not 2GB
> > worth.
>
> Are you sure?
>
> Keep in mind that Python has a comparatively high overhead due to its
> object-oriented nature. If you have a list of characters:
>
> ['a', 'b', 'c', 'd']
>
> there is the (small) overhead of the list structure itself, but each
> individual character is not a single byte, but a relatively large object:
>
>  >>> import sys
>  >>> sys.getsizeof('a')
> 32
>
> So if you read (say) a 500MB file into a single giant string, you will
> have 500MB plus the overhead of a single string object (which is
> negligible). But if you read it into a list of 500 million single
> characters, you will have the overhead of a single list, plus 500 million
> strings, and that's *not* negligible: 32 bytes each instead of 1.
>
> So try to avoid breaking a single huge string into vast numbers of tiny
> strings all at once.
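
A minimal sketch of that comparison, assuming CPython and using a small
stand-in string instead of a real file. The names text and chars are
hypothetical. CPython may share identical one-character strings, so the
sketch measures only the list's own pointer array, which is already larger
than the string it came from:

```python
import sys

text = "x" * 100000              # stand-in for file contents (hypothetical data)

# One big string: a single object header plus the characters themselves.
one_string = sys.getsizeof(text)

# A list of one-character strings: getsizeof reports only the list's
# pointer array, not the string objects the pointers refer to.
chars = list(text)
list_only = sys.getsizeof(chars)

# Size of a single one-character string object, for comparison with 1 byte.
per_char = sys.getsizeof("x")

print(one_string, list_only, per_char)
```

The exact numbers depend on the Python version and on whether the build is
32-bit or 64-bit, so none are quoted here.
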
>
> > I thought I didn't have to worry about memory allocation in
> > Python because of the garbage collector.
>
> You don't have to worry about explicitly allocating memory, and you
> almost never have to worry about explicitly freeing memory (unless you
> are making objects that, directly or indirectly, contain themselves --
> see below); but unless you have an infinite amount of RAM available, you
> can of course run out of memory if you use it all up :)
>
> > On this note I have a few
> > questions.  FYI I am using Python 2.6.4 on my Mac.
>
> > 1.    When I pass a variable to the constructor of a class does it copy
> > that variable or is it just a reference/pointer?  I was under the
> > impression that it was just a pointer to the data.
>
> Python's calling model is the same whether you pass to a class
> constructor or any other function or method:
>
> x = ["some", "data"]
> obj = f(x)
>
> The function f (which might be a class constructor) sees the exact same
> list as you assigned to x -- the list is not copied first. However,
> there's no promise made about what f does with that list -- it might copy
> the list, or make one or more additional lists:
>
> def f(a_list):
>     another_copy = a_list[:]               # a shallow copy of the list
>     another_list = map(str.upper, a_list)  # a second list built from it
>
> > 2.    When do I need
> > to manually allocate/deallocate memory and when can I trust Python to
> > take care of it?
>
> You never need to manually allocate memory.
>
> You *may* need to deallocate memory if you make "reference loops", where
> one object refers to itself:
>
> l = []  # make an empty list
> l.append(l)  # add the list l to itself
>
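> Python's cycle collector (the gc module) can break such reference loops
> itself, including longer ones like the following. In Python 2.x the
> exception is a loop containing objects that define __del__, which you
> may need to break yourself: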
>
> a = []
> b = {2: a}
> c = (None, b)
> d = [1, 'z', c]
> a.append(d)  # a reference loop
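
As an aside on the loop above: in CPython the cycle collector in the gc
module is generally able to reclaim loops like this once no outside names
refer to them. A minimal sketch, assuming CPython. The Node class and the
probe name are hypothetical, and Node exists only because plain lists
cannot be weakly referenced:

```python
import gc
import weakref

class Node(object):
    """Hypothetical container standing in for the list 'a'."""
    def __init__(self):
        self.items = []

a = Node()
b = {2: a}
c = (None, b)
d = [1, 'z', c]
a.items.append(d)       # a -> d -> c -> b -> a: a reference loop

probe = weakref.ref(a)  # observes 'a' without keeping it alive
del a, b, c, d          # no names are left, but the loop still refers to itself

gc.collect()            # ask the cycle collector to run now
print(probe() is None)  # whether the Node was reclaimed
```
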
>
> Python will deallocate objects when they are no longer in use. They are
> always considered in use any time you have them assigned to a name, or in
> a list or dict or other structure which is in use.
>
> You can explicitly remove a name with the del statement. For example:
>
> x = ['my', 'data']
> del x
>
> After deleting the name x, the list object itself is no longer in use
> anywhere and Python will deallocate it. But consider:
>
> x = ['my', 'data']
> y = x  # y now refers to THE SAME list object
> del x
>
> Although you have deleted the name x, the list object is still bound to
> the name y, and so Python will *not* deallocate the list.
>
> Likewise:
>
> x = ['my', 'data']
> y = [None, 1, x, 'hello world']
> del x
>
> Although now the list isn't bound to a name, it is inside another list,
> and so Python will not deallocate it.
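
The last case can be checked with a weak reference. This is a sketch
assuming CPython. The Data class and the probe name are hypothetical, with
Data standing in for the list because lists do not support weak references:

```python
import gc
import weakref

class Data(object):
    """Hypothetical stand-in for the ['my', 'data'] list."""

x = Data()
probe = weakref.ref(x)               # does not count as a use of the object
y = [None, 1, x, 'hello world']

del x                                # the name is gone...
alive_in_list = probe() is not None  # ...but the list y still holds the object

del y                                # drop the last container holding it
gc.collect()                         # not needed on CPython; harmless elsewhere
gone = probe() is None

print(alive_in_list, gone)
```
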
>
> > 3.    Any good practice suggestions?
>
> Write small functions. Any temporary objects created by the function will
> be automatically deallocated when the function returns.
>
> Avoid global variables. They are a good way to inadvertently end up with
> multiple long-lasting copies of data.
>
> Try to keep data in one big piece rather than lots of little pieces.
>
> But contradicting the above, if the one big piece is too big, it will be
> hard for the operating system to swap it in and out of virtual memory,
> causing thrashing, which is *really* slow. So aim for big, but not huge.
>
> (By "big" I mean megabyte-sized; by "huge" I mean hundreds of megabytes.)
>
> If possible, avoid reading the entire file in at once, and instead
> process it line-by-line.
>
> Hope this helps,
>
> --
> Steven

Wow, what a great bunch of responses.  Thank you very much.  If I
understand correctly, the suggestions seem to be:
1.    Write algorithms to read a file one line at a time instead of
reading the whole thing
2.    Use lots of little functions so that temporary objects can fall
out of scope and be freed.
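
A minimal sketch of both suggestions together, assuming a plain text file.
The function name count_long_lines and the temporary demo file are
hypothetical and only serve the illustration:

```python
import os
import tempfile

def count_long_lines(path, min_length):
    """Count lines at least min_length characters long, one line at a time."""
    count = 0
    with open(path) as handle:
        for line in handle:          # the file object yields one line per step
            if len(line.rstrip("\n")) >= min_length:
                count += 1
    return count                     # local names are released on return

# Small demonstration on a temporary file (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("short\na much longer line\nmid line\n")
    demo_path = tmp.name

result = count_long_lines(demo_path, 8)
os.remove(demo_path)
print(result)
```

Only one line is held in memory at a time, and nothing the function
created outlives the call except the returned count.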

You also confirmed what I thought was true: all variables are passed
"by reference", so I don't need to worry about the data being copied
(unless I do that explicitly).
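
A small sketch of the difference, with hypothetical class names Holder and
CopyingHolder. The first keeps the very object it was given; the second
makes an explicit shallow copy:

```python
class Holder(object):
    """Hypothetical class that keeps whatever it is given."""
    def __init__(self, data):
        self.data = data            # stores a reference; nothing is copied

class CopyingHolder(object):
    """Hypothetical class that makes its own shallow copy."""
    def __init__(self, data):
        self.data = list(data)      # explicit copy: a second list in memory

x = ["some", "data"]
shared = Holder(x)
copied = CopyingHolder(x)

x.append("more")                    # mutate the original list

print(shared.data is x)             # identity check against the original
print(copied.data)                  # the copy does not see the append
```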

Thanks!
Jeremy


