Bug report: memory leak in python 1.5.2

Wed May 26 18:17:56 EDT 1999

Fred L. Drake <fdrake at cnri.reston.va.us> wrote:

: Michael P. Reilly writes:
:  > There is also the situation where some UNIX systems put the environment
:  > initially in the u area, and it is difficult to programmatically determine
:  > where different runtime segments are (where is the heap vs. where is the
:  > u area).
:  > 
:  > Fred, your solution should work because it takes the problem case: what
:  > to do with the string initially, but I think it might be better to copy
:  > the values at module initialization time.  I've included an addition to
:  > Fred's patch to be called instead of the PyDict_New() function (in the
:  > module init function).
: 
:   If I understand correctly, your patch avoids the problem of memory
: leaked from the initial environment (a static size).  Is this correct?

My initializer only deals with the possible UNIX implementations of the
environment, by copying the environment (supposedly) before it is
used.  There are no guarantees about how putenv will modify the
environment.  AIX 4.2.1 states that the environment is expanded as
needed for new values, but some tests show that putenv/getenv still use
the passed (borrowed) values.  If a blanket, system-independent
implementation mixes environment variables changed by Python with ones
that were not, I foresee problems.  If the environment is going to be
treated as volatile memory, we should initialize it in a more
appropriate manner (in my thinking).

Remember that the original definition of UNIX put the environment (and
argument list) in the u area, inaccessible to the process except through
the copies passed to main() and through putenv/getenv.  I've never used
Linux, but I wouldn't be surprised if it used something along those
lines.  Mixing dynamic strings with static ones could get dangerous (I
recently debugged a library that would return either a malloc'd string
or a static string indiscriminately).

Regardless, it was just a suggestion for consistency.  Your patch would
work except on a few very odd systems where the environment isn't
quite what people expect; I was trying to handle that case.  But... read
on, MacDuff.

:   If so, I'm not sure it's worth the extra code.  My intention was to
: avoid the Python-induced leak that would allow a long-running Python
: script that occasionally created a subprocess to become a MemoryError
: traceback.  ;-)  In the case of systems without a lot of memory
: available, the environment should be kept small to begin with (making
: the additional data structures created by the startup code more of a
: problem).
:   I don't think I've ever checked the size of the "typical" UNIX
: environment; "printenv | wc -c" tells me I'm running under 2Kb in a
: fresh shell.  Is that enough to worry about, and slow down
: initialization?

A common bare minimum environment is: HOME, USER, SHELL, MAIL.  That's
not all that much (it should be less than 256 bytes on virtually every
system).

Most systems have an upper bound based on a limit of the argument list
and the environment (ARG_MAX).  POSIX only requires this to be at least
4k, but most systems have it larger (SunOS is at 1Mb).  This means that
there's a limit to how much we would be initializing anyway (with my
addition).
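
For anyone who wants to check these numbers on their own system, here
is a rough sketch in present-day Python.  The helper names
environment_size() and arg_max() are made up for illustration, and the
size is only an estimate (it counts each "NAME=value" entry plus a
terminating NUL, and ignores the pointer array):

```python
import os


def environment_size(env=None):
    """Estimate the bytes used by the NAME=value strings in env.

    Each entry is counted as len(name) + len(value) + 2, where the
    extra 2 covers the '=' separator and the terminating NUL byte.
    """
    if env is None:
        env = os.environ
    return sum(
        len(os.fsencode(name)) + len(os.fsencode(value)) + 2
        for name, value in env.items()
    )


def arg_max():
    """Return the ARG_MAX limit reported by the system, or None."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on some platforms, and the name may be
        # unknown on others; treat both as "not available".
        return None


if __name__ == "__main__":
    print("environment bytes (estimate):", environment_size())
    print("ARG_MAX:", arg_max())
```

Whatever the two numbers turn out to be, the first is bounded by the
second at process start, which is the bound on the copying work
mentioned above.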

:  > Also, how should we deal with this in terms of C applications that might
:  > change the environment?  (Embedders beware!)
: 
:   In this case, we may not clear all the possible garbage, but we only 
: leak for keys that are:
: 
:      1.  Changed from Python at least once, then
:      2.  Changed from C, and
:      3.  Never changed from Python again.
: 
:   Note that only one copy of the variable gets leaked, not an infinite 
: succession.
:   In the case of two Python putenv() calls with C putenv() calls
: in between, we don't introduce any new leaks; the effect is that the
: data from the first Python putenv() isn't collected until the second
: Python putenv().  This is acceptable.
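
The bookkeeping described above can be checked with a toy model.  This
is only a sketch, not the patch itself: it assumes the scheme keeps a
dictionary keyed by variable name that holds the last string Python
handed to putenv(), and the class and method names (PutenvModel,
python_putenv, c_putenv, stale) are invented for the illustration.

```python
import itertools


class PutenvModel:
    """Toy model of who holds which "NAME=value" string."""

    def __init__(self):
        self._serial = itertools.count(1)
        self.environ = {}   # name -> block currently installed
        self.held = {}      # name -> block Python still keeps alive
        self.released = []  # text of blocks Python has let go of

    def _alloc(self, owner, name, value):
        # The serial number makes every allocation distinct, even when
        # two callers install the same text.
        return (next(self._serial), owner, f"{name}={value}")

    def python_putenv(self, name, value):
        block = self._alloc("python", name, value)
        self.environ[name] = block
        old = self.held.get(name)
        if old is not None:
            # Replacing the dictionary entry drops the previous string.
            self.released.append(old[2])
        self.held[name] = block

    def c_putenv(self, name, value):
        # C code installs its own string; the dictionary is not told.
        self.environ[name] = self._alloc("c", name, value)

    def stale(self):
        """Strings Python still holds that are no longer installed."""
        return [
            block[2]
            for name, block in self.held.items()
            if self.environ.get(name) != block
        ]
```

Running Python, C, C, Python changes on one variable through this model
leaves exactly one stale string after the C calls, and none after the
second Python call, which matches the three conditions listed above.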

True; my thought was of some system where a cowboy programmer decides
to take control of the environment, making everything borrowed in his
library.  Then you get:

  * cowboy initializes
  * python changes (with borrowed memory)
  * cowboy frees python-borrowed memory and makes change
  * python changes again attempting to free already freed memory
  ...

My point wasn't that we should handle this case, just to make it known
somewhere that Python is now attempting to borrow some aspects of the
environment (depending on the platform).  We can't babysit everything,
but taking control of a system-managed facility does take
responsibility.

(I try not to go so far as making these types of statements anymore,
however hypothetical - too many "people" take it personally.  It's
better to let intelligent people read and make their own conclusions.)

:  > From a programming standpoint, I don't think that it should be "proper"
:  > to be changing the environment all that much.  Its purpose is to
:  > propagate values to child processes, not to store runtime values.
: 
:   I agree.  Processes that run a lot of children, like HTTP servers
: running CGI scripts, won't be using their own environment to do this,
: but will create the desired environments on the fly.  (Especially if
: they're threaded!)

Yes, my point in that statement was that people shouldn't be leaking
much with the current implementation we have - assuming they use
it "properly".  Maybe the docs should do a better job of explaining its
purpose.
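
As an illustration of the "proper" use, here is a sketch of building a
child's environment on the fly instead of changing the parent's.  It
uses the subprocess module from later Python releases (it did not exist
in 1.5.2), and the helper name run_child_with_env and the variable
PUTENV_DEMO_VALUE are invented for the example:

```python
import os
import subprocess
import sys


def run_child_with_env(overrides):
    """Run a child that reports one variable, using a private environment.

    The parent's os.environ is copied, the overrides are applied to the
    copy, and only the copy is handed to the child.  No putenv() call is
    made in the parent, so nothing accumulates there.
    """
    child_env = dict(os.environ)
    child_env.update(overrides)
    result = subprocess.run(
        [
            sys.executable,
            "-c",
            "import os; print(os.environ['PUTENV_DEMO_VALUE'])",
        ],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_child_with_env({"PUTENV_DEMO_VALUE": "hello"}))
```

The parent's own environment is untouched afterwards, which is the
property that matters for a long-running server.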

  -Arcege