Mem "leak" w/ long-running network apps?

Fri Apr 18 00:36:19 EDT 2003

[Dave Brueck]
> I've spent the last few weeks trying to track down a memory leak in a
> long-running network app (an HTTP proxy for large-ish objects) and am
> trying to find out if anyone else has encountered something similar.

Overall, I've spent months of my life tracking down memory leaks in apps
that run for such a short time I can't really conceive of it <wink>.

> I narrowed the problem down to the receiving side of the proxy, and today
> in socketmodule.c I noticed that when you call sock.recv(n) a string of
> size n is created and then resized after the recv to the actual size of
> the data received. There's nothing wrong with that in and of itself, but
> when I replaced it with the function below to do receives the ever-growing
> memory problem went away (so now instead of calling sock.recv(amnt) I do a
> specialmodule.recv(sock, amnt)). The function merely uses a static buffer
> and then creates a Python string of just the size needed - the resize is
> what got eliminated. I initially encountered the problem on Python 2.1.3
> but then moved to 2.2.2 to use the gc module enhancements.
>
> My questions are:
>
> (1) has anybody else run into a similar problem before?

Oh sure.  It can pop up lots of ways.  The ultimate culprit is usually the
platform's malloc implementation.  It should be better under Python 2.3,
because the new release moves heaven and hell to keep the platform malloc
away from managing memory for small objects, which often delays their "Hey,
I've gone insane!" points beyond visible experience.

> This may not be very common because the app is a little unusual in that
> it's long-running, handles hundreds of concurrent connections, each
> connection is usually for a large (tens to hundreds of megabytes) object,
> the data is all proxied rather than being served off disk or generated by
> the app, and both the upstream and downstream connections are generally
> fast (aggregate throughput for the server is usually in the 100-200 Mbps
> range for a P3 900).
>
> (2) Any ideas on why using the normal socket.recv resulted in ever-growing
> memory use? I've spent ages using the gc module and other tools to track
> down objects that should have been freed, cyclic references, etc., and
> don't see any problems there and I don't think the memory is really being
> leaked (in the C sense). Could it be a heavily fragmented heap or
> something like that?

Almost certainly, yes.  You didn't say what "the typical" value for n is,
but since you made your static buffer 1MB, I assume it can be up to that.
Python mallocs that much, then reallocs down to the actual size and returns
the object to your program.  Your program then probably allocates a fair
number of tiny objects.  A less-than-stellar malloc can chop those off "the
high end" of the memory realloc returned, leaving a hole in the address
space that isn't big enough to hold the next call to recv.  So the next
block of n comes after the last block, etc -- iterate to disaster.  The used
part of the address space ends up looking like sparse Swiss cheese.
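The allocation pattern described above can be reproduced outside Python with a few lines of C.  This is only a sketch under assumptions: the names (run_recv_pattern, pattern_stats), the tiny-object size and the count per round are all made up for illustration, and whether the address span actually balloons depends entirely on the platform malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TINY_SIZE      32   /* assumed size of one "tiny object" */
#define TINY_PER_ROUND 8    /* assumed number of tiny objects per receive */

typedef struct {
    size_t    live_bytes;   /* bytes still held when the loop ends */
    uintptr_t lowest;       /* lowest block start address observed */
    uintptr_t highest;      /* highest block end address observed */
} pattern_stats;

/* Widen the observed address range to cover the block [p, p + size). */
static void observe(pattern_stats *st, const void *p, size_t size)
{
    uintptr_t start = (uintptr_t)p;
    if (st->lowest == 0 || start < st->lowest)
        st->lowest = start;
    if (start + size > st->highest)
        st->highest = start + size;
}

/* Repeat "malloc the worst case, realloc down, allocate tiny objects".
 * The tiny objects outlive the big block, as they would in the proxy.
 * Returns 0 on success, -1 if any allocation fails. */
static int run_recv_pattern(size_t max_n, size_t actual_n, int rounds,
                            pattern_stats *st)
{
    assert(rounds > 0 && actual_n > 0 && actual_n <= max_n);

    void **tiny = calloc((size_t)rounds * TINY_PER_ROUND, sizeof *tiny);
    size_t ntiny = 0;
    int rc = 0;

    st->live_bytes = 0;
    st->lowest = 0;
    st->highest = 0;
    if (tiny == NULL)
        return -1;

    for (int r = 0; r < rounds && rc == 0; r++) {
        /* What sock.recv(n) does: allocate n bytes, then shrink. */
        char *big = malloc(max_n);
        if (big == NULL) { rc = -1; break; }
        char *data = realloc(big, actual_n);
        if (data == NULL) { free(big); rc = -1; break; }
        observe(st, data, actual_n);

        /* The program's own small, longer-lived allocations. */
        for (int i = 0; i < TINY_PER_ROUND; i++) {
            void *p = malloc(TINY_SIZE);
            if (p == NULL) { rc = -1; break; }
            observe(st, p, TINY_SIZE);
            tiny[ntiny++] = p;
        }

        free(data);   /* the received data has been forwarded */
    }

    st->live_bytes = ntiny * TINY_SIZE;
    for (size_t i = 0; i < ntiny; i++)
        free(tiny[i]);
    free(tiny);
    return rc;
}
```

Comparing st.highest - st.lowest against st.live_bytes after a run gives a crude hint of how spread out the survivors are.  Treat it as a hint only: some mallocs serve requests this large from separate mappings, which inflates the span for reasons unrelated to fragmentation.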

Python 2.3 won't make your platform malloc smarter, but it grabs large
chunks and carves them up itself for small objects.  While it wasn't the
intent, this has the pleasant side effect of making some platform mallocs
much better behaved across the large allocations Python still asks them to
do directly.  It's really difficult to write a malloc that handles a wild
mix of very small and very large requests gracefully; 2.3 helps by taking
the small requests out of the platform malloc's hair.
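To make "grabs large chunks and carves them up itself" concrete, here is a minimal fixed-size pool in C.  It is a sketch of the general technique only, not Python 2.3's actual allocator: BLOCK_SIZE, ARENA_SIZE and every function name are hypothetical, and a real small-object allocator handles many size classes and multiple arenas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 64              /* hypothetical small-object size class */
#define ARENA_SIZE (256 * 1024)    /* one large request to the platform malloc */

typedef union block {
    union block  *next;                 /* meaningful only while the block is free */
    unsigned char payload[BLOCK_SIZE];  /* storage handed to the caller */
} block;

typedef struct {
    block *chunk;       /* the single big chunk obtained from malloc */
    block *free_list;   /* released blocks, most recently freed first */
    size_t next_fresh;  /* index of the first never-used block */
    size_t capacity;    /* number of blocks in the chunk */
} small_pool;

/* Ask the platform malloc for one big chunk; returns 0 on success. */
static int pool_init(small_pool *p)
{
    p->capacity = ARENA_SIZE / sizeof(block);
    p->chunk = malloc(p->capacity * sizeof(block));
    p->free_list = NULL;
    p->next_fresh = 0;
    return p->chunk != NULL ? 0 : -1;
}

/* Hand out a block without touching the platform malloc. */
static void *pool_alloc(small_pool *p)
{
    if (p->free_list != NULL) {          /* reuse a released block first */
        block *b = p->free_list;
        p->free_list = b->next;
        return b;
    }
    if (p->next_fresh < p->capacity)     /* otherwise carve a fresh one */
        return &p->chunk[p->next_fresh++];
    return NULL;   /* a real allocator would grab another arena here */
}

/* Return a block to the pool; the platform malloc never sees it. */
static void pool_free(small_pool *p, void *ptr)
{
    block *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}

/* Give the whole chunk back to the platform malloc in one call. */
static void pool_destroy(small_pool *p)
{
    free(p->chunk);
    p->chunk = NULL;
    p->free_list = NULL;
    p->next_fresh = 0;
    p->capacity = 0;
}
```

The point for this thread is that the platform malloc sees one ARENA_SIZE request instead of thousands of tiny ones, so the small objects can no longer nibble pieces off the end of whatever large block was freed last.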

> I'd notice that after my tests ran for a long time I'd stop them and
> memory usage would drop down after a while by a few megabytes, and upon
> starting my tests again (without restarting my app) mem usage would drop
> down some more but not all the way down and then it would gradually grow
> again to a new high-water mark, so that overnight my process was using
> hundreds of MB of RAM.  With my recv-replacement I'm holding steady at
> about 30 MB, which is normal.

If you want to know more, scour your platform docs for whatever debug info
your local malloc can deliver.

> (3) Does anyone see any glaring errors in my function below? Seems to work
> well enough. :)
>
> Anyway, I don't think there's a bug in Python, and my function is
> certainly not patch-worthy because it really works just for my use, and
> now that my problem is gone it's mostly out of curiosity that I'm trying
> to better understand why my problem is gone, but I'd appreciate any
> insight or hints.
>
> Thanks,
> Dave
>
> Here's the function. I can get away with using a single large buffer for
> all my receives because while the server is a mixture of threading and
> poll-based, all the I/O happens sequentially in one thread against sockets
> that are known to be ready for I/O.

Heh.  That sounds regrettable, eventually <wink>.

> #define MAX_RECV_SIZE 1048576
> static char RECV_BUFF[MAX_RECV_SIZE];
>
> static PyObject *
> recv_wrapper(PyObject *self, PyObject *args)
> {
>     PyObject *py_sock;
>     PyObject *py_str;
>     int sock;              /* File descriptor of the socket to read from */
>     int len, n;
>
>     if (!PyArg_ParseTuple(args, "Oi:recv", &py_sock, &len))
>       return NULL;
>
>     if (len < 0)
>     {
>       PyErr_SetString(PyExc_ValueError, "negative buffersize in recv");
>       return NULL;
>     }
>
>     if (len > MAX_RECV_SIZE-1)

The "-1" is mysterious here.  If you've got N bytes allocated, why can't you
receive N bytes?

>     {
>       PyErr_SetString(PyExc_ValueError, "buffersize too large in recv");
>       return NULL;
>     }
>
>     sock = PyObject_AsFileDescriptor(py_sock);

Should follow this by

      if (sock < 0)
          return NULL;

although that assumes legit filedescs can't be negative.  Anally correct
would be

      if (sock == -1 && PyErr_Occurred())
          return NULL;

>     Py_BEGIN_ALLOW_THREADS;
>     n = recv(sock, RECV_BUFF, len, 0);
>     Py_END_ALLOW_THREADS;
>     if (n == -1)
>       return PyErr_SetFromErrno(RecvError);
>
>     py_str = PyString_FromStringAndSize(RECV_BUFF, n);
>     if (py_str == NULL)
>       return NULL;
>
>     return py_str;

The last 5 lines could be replaced by

    return PyString_FromStringAndSize(RECV_BUFF, n);
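For anyone who wants to try the static-buffer idea without building a Python extension, the same shape can be sketched in plain POSIX C.  The assumptions here are mine: recv_exact_copy is a hypothetical name, malloc plus memcpy stands in for PyString_FromStringAndSize, and the size check allows the full MAX_RECV_SIZE bytes rather than one fewer.  Like the original, it is only safe when all receives happen in one thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_RECV_SIZE 1048576
static char recv_buff[MAX_RECV_SIZE];   /* shared by every call: not thread-safe */

/* Receive up to len bytes from fd into the static buffer, then return a
 * malloc'ed copy of exactly the bytes received.  *out_len is set to the
 * byte count, or to -1 on error.  The caller frees the result.
 * Returns NULL on error or if len exceeds the buffer. */
static char *recv_exact_copy(int fd, size_t len, ssize_t *out_len)
{
    if (len > MAX_RECV_SIZE) {          /* N bytes of buffer can hold N bytes */
        *out_len = -1;
        return NULL;
    }

    ssize_t n = recv(fd, recv_buff, len, 0);
    *out_len = n;
    if (n < 0)
        return NULL;

    /* Allocate only what arrived, so nothing is ever shrunk afterwards.
     * Ask for at least 1 byte so a zero-length read still yields a
     * valid pointer. */
    char *copy = malloc(n > 0 ? (size_t)n : 1);
    if (copy == NULL) {
        *out_len = -1;
        return NULL;
    }
    memcpy(copy, recv_buff, (size_t)n);
    return copy;
}
```

The cost is one extra copy per receive.  In exchange the heap only ever sees requests for the size that actually arrived, which is the part that made the growth go away in the report above.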