[Python-Dev] Changing pymalloc behaviour for long running
processes
Evan Jones
ejones at uwaterloo.ca
Tue Oct 19 21:55:37 CEST 2004
On Oct 19, 2004, at 14:00, Martin v. Löwis wrote:
>> Some posts to various lists [1] have stated that this is not a real
>> problem because virtual memory takes care of it. This is fair if you
>> are talking about a couple megabytes. In my case, I'm talking about
>> ~700 MB of wasted RAM, which is a problem.
> This is not true. The RAM is not wasted. As you explain later, the
> pages will be swapped out to swap space, making the RAM available
> again for other tasks.
Well, it isn't "wasted," but it is not optimal. If the pages were
freed, the OS would use them for disk cache (or for other programs).
However, because the operating system believes that these pages contain
live data, it must do one of two things:
a) Live with less disk cache (lower performance for disk I/O).
b) Pre-emptively swap the pages to disk, which is super slow. (On
Linux, you can control how pre-emptive the kernel is by adjusting the
"swappiness" sysctl.)
If it chooses to swap them out, the next time Python touches those
pages, it will pause as the OS reads them back from disk.
It can only help the system's performance if we give it hints about
which pages are no longer in use.
>> If Python ever wants to use them again, they will be brought in from
>> swap.
> Yes. However, your assumption is that Python never wants to use them
> again, because the peak memory consumption is only local.
I am trying to correct the situation where Python is not going to use
the pages for a long time. For most applications, Python's memory
allocation policies are fine, but if you have a long-running process
that does nothing most of the time (say, a low-usage server) or does
some huge pre-processing (my application), it keeps a ton of memory
around for no reason. Right now, Python has very poor performance for
my application because I have this massive memory peak, and very low
average memory usage.
Were I using Java, its usage would grow and shrink accordingly, thanks
to the garbage collector releasing memory to the OS. Yes, with Python,
we can't compact memory, but I think we can still do better than
nothing.
> As the working set grows or shrinks, pages
> get swapped in and out. As Tim explains, this is really hard to
> avoid.
If you explicitly tell the operating system that the pages are unused,
it won't swap them out unless it actually needs to. Right now, a lot of
pages are
being swapped in and out that are actually *garbage*.
> Unfortunately, as Tim explains, there is no way to reliably
> "inform" the system. free(3) may or may not be taken as such
> information.
As noted before, free() may not be sufficient, but mmap or madvise are.
> The garbage collector holds the GIL. So while there could be other
> threads running, they must not manipulate any PyObject*. If they try
> to, they need to obtain the GIL first, which will make them block
> until the garbage collector is complete.
But as noted in a previous message, some extensions may not do this
correctly, and try to do PyObject_Free anyway. Is that the problem that
obmalloc tries to avoid? If the problem is only the possibility of
PyObject_Free being called while another thread has the GIL, then I can
probably avoid that issue.
> That will ultimately depend on the patches. The feature itself would
> be fine, as Tim explains.
Great! That's basically what I am looking for.
> However, patches might be rejected because:
[snip]
Of course, I certainly hope that Python wouldn't accept garbage
patches! :)
Thank you for your comments,
Evan Jones
--
Evan Jones: http://evanjones.ca/
"Computers are useless. They can only give answers" - Pablo Picasso