The current behavior of the atexit module is that if any of the exit
handlers raises an exception, the remaining handlers are not run. Greg
Chapman posted a bug report about this:
http://www.python.org/sf/1052242
Greg proposed catching any exceptions and continuing so that all exit
handlers at least have a chance to run, and Raymond agrees with him. I
attached a patch to the ticket that adds a flag to determine the behavior, on
the principle that atexit has been around long enough that someone out there
probably relies on the early exit behavior. This is the old Python chestnut
of using a flag to preserve existing behavior as the default while allowing
users to set the flag to get the new behavior.
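For concreteness, here's a rough sketch of the two behaviors, loosely modeled
on atexit's internal _run_exitfuncs(); the "run_all" flag is a made-up name
for illustration, not what the patch actually uses:

    import traceback

    _exithandlers = []          # (func, args, kwargs) tuples, as in atexit.py

    def _run_exitfuncs(run_all=False):
        while _exithandlers:
            func, args, kwargs = _exithandlers.pop()
            try:
                func(*args, **kwargs)
            except SystemExit:
                raise
            except:
                if not run_all:
                    raise               # current behavior: first failure stops the rest
                traceback.print_exc()   # proposed behavior: report it and keep going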
I'm happy to go either way, but thought perhaps a quick poll of the troops
might be in order, hence this note.
Skip
This one came up while working on ZODB:
weakref callback vs. gc vs. threads
http://www.python.org/sf/1055820
Short course: in the presence of weakrefs, cyclic gc is still hosed (it
turns out that neither threads nor weakref callbacks are necessary to get
hosed).
temp2a.py there demonstrates there's a problem, but in an unclear way
(hundreds of objects, hundreds of weakrefs and weakref callbacks (all via
WeakValueDictionary internals), 3 threads). OTOH, there's nothing clever or
tricky about it. Sooner or later, it just fails (an accessible instance of
a user-defined class gets its __dict__ cleared "by magic").
temp2b.py reduces it to 2 objects and 1 thread. This is contrived, but is
deterministic.
temp2c.py boosts it to 3 objects, and is a nightmare: it shows that the
problem can occur during a gc collection that doesn't see *any* objects
having a weakref with a callback. There is a weakref with a callback here,
but it's attached to an object in an older generation, and collection of a
younger generation triggers that callback indirectly. Because this is such
a nasty case (no amount of analysis of the objects in the generation being
collected can deduce that it's possible for a weakref callback to run),
there are extensive comments and an ASCII-art diagram in the file.
Even worse, temp2d.py shows we can get in trouble even if there's never a
weakref with a callback. It's enough to have one weakref (without a
callback), and one object with a __del__ method.
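To make the shape of that last case concrete, here's a rough sketch of the
arrangement: a cycle, a global weakref (no callback) into it, and a
__del__-bearing object whose only reference is hidden inside the cycle. This
is illustrative only, it is not temp2d.py; what the weakref actually sees
mid-collection depends on the interpreter, and the temp2*.py files attached
to the report are the real demonstrations.

    import gc, weakref

    class Node(object):
        pass

    class Finalizer(object):
        # Has __del__, but is not itself part of any cycle.
        def __del__(self):
            # Arbitrary Python code running while gc tears the cycle down;
            # it may still reach the cyclic trash via the global weakref.
            print("weakref sees: %r" % (wr(),))

    # Park the finalizer in an older generation so a collection of the
    # youngest generation never examines it directly.
    keeper = Finalizer()
    gc.collect()

    # Build cyclic trash in the youngest generation: a cycle a <-> b, a
    # global weakref (no callback) to b, and the only reference to `keeper`
    # hidden inside the cycle.
    a, b = Node(), Node()
    a.other, b.other = b, a
    a.keeper = keeper
    wr = weakref.ref(b)
    del a, b, keeper

    # Clearing the cycle drops the last reference to `keeper`, so its
    # __del__ runs in the middle of the collection.  (The generation
    # argument to collect() needs an interpreter newer than 2.3/2.4.)
    gc.collect(0)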
Offhand, I don't have a plausible solution that will work. The elegant
<wink> analysis in gc_weakref.txt missed what should have been obvious even
then: cyclic trash is still potentially reachable via "global" weakrefs, so
if any Python code whatsoever can run while gc is breaking cycles (whether
via __del__ or via wr callback), global weakrefs can resurrect cyclic trash.
That suggests some Draconian approaches.
Anyone have a bright idea? It's remarkable how long we've managed to go
without noticing that everything is disastrously broken here <0.9 wink>.
On Tue, 19 Oct 2004 12:02:14 +0200 (CEST), Evan Jones
<ejones(a)uwaterloo.ca> wrote:
> Subject: [Python-Dev] Changing pymalloc behaviour for long running
> processes
>
[ snip ]
>
> The short version of the problem is that obmalloc.c never frees memory.
> This is a great strategy if the application runs for a short time then
> quits, or if it has fairly constant memory usage. However, applications
> with very dynamic memory needs and that run for a long time do not
> perform well because Python hangs on to the peak amount of memory
> required, even if that memory is only required for a tiny fraction of
> the run time. With my application, I have a Python process which occupies
> 1 GB of RAM for ~20 hours, even though it only uses that 1 GB for about
> 5 minutes. This is a problem that needs to be addressed, as it
> negatively impacts the performance of Python when manipulating very
> large data sets. In fact, I found a mailing list post where the poster
> was looking for a workaround for this issue, but I can't find it now.
>
> Some posts to various lists [1] have stated that this is not a real
> problem because virtual memory takes care of it. This is fair if you
> are talking about a couple megabytes. In my case, I'm talking about
> ~700 MB of wasted RAM, which is a problem. First, this is wasting space
> which could be used for disk cache, which would improve the performance
> of my system. Second, when the system decides to swap out the pages
> that haven't been used for a while, they are dirty and must be written
> to swap. If Python ever wants to use them again, they will be brought
> in from swap. This is much worse than informing the system that the
> pages can be discarded, and allocating them again later. In fact, the
> other native object types (ints, lists) seem to realize that holding on
> to a huge amount of memory indefinitely is a bad strategy, because they
> explicitly limit the size of their free lists. So why is this not a
> good idea for other types?
>
> Does anyone else see this as a problem?
>
This is such a big problem for us that we had to rewrite some of our daemons
to fork request handlers so that the memory would be freed. That's the only
way we've found to deal with it, and it seems that's the preferred Python
way of doing things: processes, IPC, fork, etc. instead of threads.
In order to be able to release memory, the interpreter has to allocate
memory in chunks bigger than the minimum that can be returned to the
OS; e.g., on Linux that'd be 256 bytes (IIRC), so that libc's malloc would
use mmap to allocate that chunk. Otherwise, if the memory was
obtained with brk, then in virtually all OSes and malloc implementations
it won't be returned to the OS even if the interpreter frees the memory.
For example, consider the following code in the interactive interpreter:
for i in range(10000000):
    pass
That run will create a lot of little integer objects and the virtual memory
size of the interpreter will quickly grow to 155MB and then drop to 117MB.
The 117MB left over is all those little integer objects that are no longer
in use but that the interpreter will reuse as needed.
When the system needs memory, it will page out to swap the pages where these
objects were allocated.
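If you want to watch this happen, something like the following quick hack
reads the interpreter's virtual size on Linux (it just parses /proc, the
exact numbers will vary, and range() only builds a real list on the 2.x
interpreters we're talking about):

    import os

    def vm_size_kb():
        # Parse VmSize out of /proc/<pid>/status; Linux-specific.
        for line in open("/proc/%d/status" % os.getpid()):
            if line.startswith("VmSize:"):
                return int(line.split()[1])

    print("before: %s kB" % vm_size_kb())
    big = range(10000000)      # lots of small ints plus one big list
    print("peak:   %s kB" % vm_size_kb())
    del big                    # the list's memory can go back to the OS;
                               # the small-integer blocks generally do not
    print("after:  %s kB" % vm_size_kb())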
In our application, paging to swap is extremely bad because sometimes
we're running the OS booted from the net without swap. The daemon has to
loop over lists of 20 to 40 thousand items at a time, and it quickly grows to
60MB on the first run and then continues to grow from there. When something
else needs memory, it tries to swap and then crashes.
In the example above, the difference between 155MB and 117MB is 38MB, which I
assume is the size of the list object returned by 'range()', which contains the
references to the integers. The list goes away when the interpreter finishes
running the loop, and because it was already known how big it was going to be,
it was allocated as one big chunk using mmap (my speculation). As a result, that
memory was given back to the OS and the virtual memory size of the interpreter
went down from 155MB to 117MB.
Regards,
--
Luis P Caamano
Atlanta, GA USA
PS
I rarely post to python-dev (this is probably the first time), so please let
me take this opportunity to thank all the Python developers for all your
efforts. Such a great language, and a great tool. My respect and admiration
to all of you.
There is a subtlety in CreateProcess in the Win32 API in that if one
specifies an environment (via the lpEnvironment argument), the
environment strings (A) must be sorted alphabetically and (B) that sort
must be case-insensitive. See the Remarks section on:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/ba…
If this is not done, then surprises can happen with the use of
{Get|Set}EnvironmentVariable in the created process:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/ba…
Neither _subprocess.pyd (supporting the new subprocess.py module on
Windows) nor PyWin32's CreateProcess binding does this. I haven't done so
yet, but I should be able to put together a test case for subprocess.py
for this. We came across such a surprise when using my process.py module,
which uses this PyWin32 code (which it looks like _subprocess.c borrowed).
Fixing (A) is easy with a "PyList_Sort(keys)" and some other minor
changes to _subprocess.c::getenvironment() -- and to
win32process.i::CreateEnvironmentString() in PyWin32.
However, I'd like some guidance on the best way to case-insensitively
sort a Python list in C code to fix (B). The best thing I see would be
to expose PyString_Lower/PyUnicode_Lower and/or
PyString_Upper/PyUnicode_Upper so they can be used to .lower()/.upper()
the given environment mapping keys for sorting.
Does that sound reasonable? Is there some problem to this approach that
anyone can see?
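For what it's worth, the ordering requirement itself looks like this at the
Python level (purely illustrative; the real fix belongs in _subprocess.c and
win32process.i, and this uses the 2.4 sorted() builtin):

    # Build a CreateProcess environment block with the names sorted
    # case-insensitively, as the Remarks section requires.
    def make_env_block(env):
        items = sorted(env.items(), key=lambda kv: kv[0].upper())
        return "".join("%s=%s\0" % (k, v) for k, v in items) + "\0"

    env = {"Path": r"C:\Windows", "COMSPEC": r"C:\Windows\system32\cmd.exe",
           "systemroot": r"C:\Windows"}
    print(repr(make_env_block(env)))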
Trent
--
Trent Mick
trentm(a)activestate.com
[Tim Delaney]
#- I think those three platforms are sufficiently representative of Python
#- users, so if it works on them, and the code looks good to a reviewer, it
#- should be committed. It's not exactly a large patch after all ...
Do you want to take a look at it? ;)
#- What's the bug number? I've got a FreeBSD (5.2.1) virtual machine sitting
#- around I could try it on (tomorrow - bed time now ;).
The bug is 1050828.
Thanks!
. Facundo
People:
I have had these doubts for a while now, and while I learned a lot about this
from Raymond Hettinger, I still have some loose ends. I don't know if
there's an official position or if it's just developer common sense (which I
still don't have), but I didn't find an article/PEP about this. Does such a
paper exist?
For now, I'll ask you about a specific issue: there's an open bug about
the reindent.py tool, which has an issue with the reindented code file's
metadata (more specifically, its permissions). So I came up with a solution (a
small patch, three lines) which leaves the reindented file with the same
permissions as the original one.
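The idea is simply to carry the original file's permission bits over to the
rewritten file, something along these lines (a sketch of the approach only,
not the actual patch; the rewrite() name is just a stand-in):

    import os

    def reindent_preserving_mode(path, rewrite):
        # `rewrite` stands in for whatever rewrites the file in place.
        mode = os.stat(path).st_mode   # remember the original permissions
        rewrite(path)                  # reindenting replaces the file...
        os.chmod(path, mode)           # ...so put the old mode back afterwards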
My problem is that I cannot decide whether to commit those changes. We're in
beta, and I don't know if the changes will be tested enough, on enough
platforms, before the final release (for now, it's tested on Linux, Win2k and
MacOS X).
And if I don't commit these now, when?
Thank you all!
. Facundo
Hello
I am a little new to Python....
I want to develop a database-related application, with options like
add, save, modify, delete, previous, last, next, and first, for maintaining
employee records. I want to use an .mdb file or a dBASE database
as the backend.
In VB we can use ADO or DAO, and in Java we use JDBC, so for Python
please guide me on developing a basic application....
Regards
Sandeep
(re-sent and modified after I recognized that my
hardware clock is broken; I need a new notebook)
Dear community,
I would love to publish Stackless 3.1, of course.
Also I know that there is some inherent bug in it.
This has been the state of things for four months now.
I am currently in a very tight project and have no
time to dig into this problem.
BUT IT IS URGENT!
I'm looking for a person who would take on the job of
finding the buglet. (S)He would need to debug and
nail down the problem in a commercial application, which I cannot
make public, and would need to sign an NDA with me.
The success payment would be $500, minimum. If the problem
turns out to be very hard (by some yet-to-be-defined standard
of very hard, to be negotiated), it can be increased
to $1000.
If my app works afterwards, Stackless 3.1 is just fine
and can go out to the public.
If it doesn't work, no payment happens.
The identified problem needs to be documented by a
reproducible test case.
If somebody is interested, please contact me privately.
And be aware: this is really not easy stuff. You need to
be a real hardcore system hacker with many years of
experience.
(Armin, Bob, Jeff, Lutz, Richard, Stefan, Stephan?)
Here is the CVS path to the dev trunk:
CVSROOT=:pserver:anonymous@stackless.com:/home/cvs
cvs co slpdev/src/2.3/dev
The cheapest complete solution wins. Hurry up :-)
Sincerely -- chris
--
Christian Tismer :^) <mailto:tismer@stackless.com>
tismerysoft GmbH : Have a break! Take a ride on Python's
Carmerstr. 2 : *Starship* http://starship.python.net/
10623 Berlin : PGP key -> http://wwwkeys.pgp.net/
work +49 30 31 86 04 18 home +49 30 802 86 56 mobile +49 173 24 18 776
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/
Comments? Is it safe or not to assume that by the time Python has
started, only fds 0, 1 and 2 are open?
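For what it's worth, a throwaway check of that assumption looks something
like this (not part of the test, just a way to poke at a given platform):

    import os

    def open_fds(max_fd=256):
        # Return the descriptors below max_fd that are currently open.
        fds = []
        for fd in range(max_fd):
            try:
                os.fstat(fd)
            except OSError:
                continue
            fds.append(fd)
        return fds

    print(open_fds())   # [0, 1, 2] if the assumption holds on this platform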
/Peter
> I think this is the core of the problem. The test_close_fds
> test works like this:
>
> All file descriptors in the forked child (except 0,1,2) are
> closed. Then the Python binary is executed via execvp(). A
> small test program is passed to the Python binary via the -c
> command line option. If the OS and subprocess module works
> correctly, we can be sure of that by the time of the
> execve() system call, only file descriptors (0,1,2) are open
> (well, the errpipe as well, but let's leave that out for
> now). But, by the time the Python binary starts executing
> the small program, all sorts of things may have happened.
> I'm not really sure we can trust Python not to open files
> during startup. For example, if we have a PYTHONSTARTUP
> file, that open file will have a file descriptor, perhaps 3.
>
> On one hand, this bug could indicate a bug in the Python
> interpreter itself: perhaps a file descriptor leak. On the
> other hand, this test might be a bit too unsafe.
>
> So probably, this test should be removed.