[CC'ed to python-dev, Barbara Mattson]
Barbara's encountered an apparent problem with test_longexp in Python 2.2 on MacOS X. test_longexp creates a big list expression and eval()'s it. The problem is that it takes an exceedingly long time to run, at least more than half an hour (at which point she interrupted it).
The two curious things are that 1) while test_longexp requires a lot of memory and often thrashes on a low-memory machine (I found there are 2 or 3 bugs in the SF bugtracker to this effect), the MacOS box in question has a gigabyte of RAM, and 2) Python 2.1.1 *doesn't* show the problem. Quoting from her report:
I tried the test_longexp by hand:
REPS = XXX
l = eval("[" + "2," * REPS + "]")
print len(l)
changing REPS from 1000 to 50000. 1000 and 10000 ran fairly quickly - under a minute. However, 25000 took about 5 minutes and 50000 took 23 minutes. I'm not about to try 65580 (I need to get some real work done today, after all :-). BTW, out of curiosity, I tried the same thing under 2.1.1, and even for REPS = 70000 it took less than a minute.
Any clues?
--amk (www.amk.ca) "Peri, how would you like to meet a genius?" "I thought I already have." -- The Doctor and Peri, in "Mark of the Rani"
amk> test_longexp creates a big list expression and eval()'s it.  The
amk> problem is that it takes an exceedingly long time to run, at least
amk> more than half an hour (at which point she interrupted it).
...
amk> changing REPS from 1000 to 50000.  1000 and 10000 ran fairly
amk> quickly - under a minute.  However, 25000 took about 5 minutes and
amk> 50000 took 23 minutes.
...
amk> Any clues?
Try configuring using --with-pymalloc to see if Vladimir's Python-specific object allocator helps. Even with a gigabyte of RAM, perhaps the malloc free list is getting badly fragmented, causing it to churn forever trying to coalesce memory blocks.
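[Editorial note: on modern CPython the `--with-pymalloc` setting can be inspected at runtime; this is a sketch using today's `sysconfig` module, which did not exist in 2002, so it could not have helped Barbara directly.]

```python
import sysconfig

# Check whether this interpreter build enabled pymalloc.  On most modern
# CPython builds this prints 1; on builds where the Makefile variable is
# not recorded (e.g. Windows), get_config_var() returns None.
print("WITH_PYMALLOC =", sysconfig.get_config_var("WITH_PYMALLOC"))
```

In 2002 the equivalent check was whether `--with-pymalloc` had been passed to `configure` when building the interpreter.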
[Andrew Kuchling]
[CC'ed to python-dev, Barbara Mattson]
Barbara's encountered an apparent problem with test_longexp in Python 2.2 on MacOS X. test_longexp creates a big list expression and eval()'s it. The problem is that it takes an exceedingly long time to run, at least more than half an hour (at which point she interrupted it).
The two curious things are that 1) while test_longexp requires a lot of memory and often thrashes on a low-memory machine (I found there are 2 or 3 bugs in the SF bugtracker to this effect), the MacOS box in question has a gigabyte of RAM, and 2) Python 2.1.1 *doesn't* show the problem.
The test takes about 2 seconds on my box (Win98SE, 256MB, 866MHz), in 2.2 or 2.1.1, and I don't know of any Mac-specific code that might get touched here except for the C library. So Skip's suggestion to try pymalloc is a good one -- although it's hard to see in advance why that would make a difference in this specific case.
Quoting from her report:
I tried the test_longexp by hand:
REPS = XXX
l = eval("[" + "2," * REPS + "]")
print len(l)
Break it into smaller steps so we can narrow down possible causes:
REPS = 50000
print "building list guts"
guts = "2," * REPS

print "building input string"
input = "[" + guts + "]"

print "compiling the input string"
code = compile(input, "<input string>", "eval")

print "executing"
thelist = eval(code)
print len(thelist)
When REPS is large, what's the last thing that gets printed before the huge delay starts?
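[Editorial note: Tim's experiment above, transcribed into modern Python 3 syntax so it runs today; the `timed` helper is an addition of this transcription, not part of the original message, and REPS is kept modest so the sketch finishes quickly.]

```python
import time

REPS = 20000  # Barbara used up to 50000

def timed(label, func):
    """Run func(), report how long it took, and return its result."""
    start = time.perf_counter()
    result = func()
    print("%-28s %.3f seconds" % (label, time.perf_counter() - start))
    return result

guts = timed("building list guts", lambda: "2," * REPS)
source = timed("building input string", lambda: "[" + guts + "]")
code = timed("compiling the input string",
             lambda: compile(source, "<input string>", "eval"))
thelist = timed("executing", lambda: eval(code))
print(len(thelist))
```

Whichever phase the timer shows ballooning as REPS grows is where to look; on a 2002-era interpreter exhibiting the bug, the big number would appear next to the compile step.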
On Tue, Jan 08, 2002 at 01:41:37PM -0500, Tim Peters wrote:
Break it into smaller steps so we can narrow down possible causes:
You should have cc'ed Barbara on that. I forwarded your message to her and she wrote back (eventually):
BTW, I forgot to pass this on yesterday, but I tried the code in Tim Peters' e-mail yesterday and the delay happens during the code = compile(...) statement.
She's going to install sshd on her machine, so maybe this weekend I'll be able to log in, compile Python from source, and poke around in an effort to figure out what's going on.
--amk (www.amk.ca) Our lives are different from anybody else's. That's the exciting thing. Nobody in the universe can do what we're doing! -- The Doctor, in "Tomb of the Cybermen"
[Tim]
Break it into smaller steps so we can narrow down possible causes:
[Andrew Kuchling]
You should have cc'ed Barbara on that.
I did a Reply-All. In the copy I got back from Python-Dev,
mattson@milkyway.gsfc.nasa.gov
was in the cc list. If that didn't reach her, sorry, but I don't think I could have done more than I did.
I forwarded your message to her and she wrote back (eventually):
BTW, I forgot to pass this on yesterday, but I tried the code in Tim Peters' e-mail yesterday and the delay happens during the code = compile(...) statement.
So it's somewhere in the front end -- that's a real help <wink>.
She's going to install sshd on her machine, so maybe this weekend I'll be able to log in, compile Python from source, and poke around in an effort to figure out what's going on.
Did she try Skip's suggestion to try pymalloc? Given that we believe there is no Mac-specific code here outside libc, the first suggestion was (and remains) the best. The front end will be doing a whale of a lot of mallocs. If it's like "the usual" malloc disease under glibc, the delays would appear during the free()s.
On Thu, Jan 10, 2002 at 03:37:46PM -0500, Tim Peters wrote:
I did a Reply-All. In the copy I got back from Python-Dev, ...
Oops, I misread the headers of the original mail. Sorry!
Did she try Skip's suggestion to try pymalloc? Given that we believe there is no Mac-specific code here outside libc, the first suggestion was (and
She hasn't compiled it herself yet, but that's the first thing I'll try.
--amk
>> Did she try Skip's suggestion to try pymalloc?
amk> She hasn't compiled it herself yet, but that's the first thing I'll
amk> try.
I did try that when the problem was first raised. I just tried it again. It did have a positive effect:
w/ threads and w/o pymalloc:
    user  system  elapsed   CPU
    7.64    0.72  0:09.73   85%
    7.86    0.45  0:08.66   95%
    7.66    0.66  0:08.32   99%
w/o threads and w/ pymalloc:
    user  system  elapsed   CPU
    5.44    0.58  0:06.85   87%
    5.57    0.46  0:06.02  100%
    5.57    0.45  0:06.02   99%
The above was with my memory usage trimmed about as far down as I could get it (turned off X, for example). My apologies that the two runs differ in the threads setting as well; it's just what I happened to have lying around on the disk.
Skip
[Skip Montanaro]
I did try that when the problem was first raised. I just tried it again. It did have a positive effect:
w/ threads and w/o pymalloc:
    user  system  elapsed   CPU
    7.64    0.72  0:09.73   85%
    7.86    0.45  0:08.66   95%
    7.66    0.66  0:08.32   99%
w/o threads and w/ pymalloc:
    user  system  elapsed   CPU
    5.44    0.58  0:06.85   87%
    5.57    0.46  0:06.02  100%
    5.57    0.45  0:06.02   99%
Skip, I think this is irrelevant to the OP's problem. You're telling us you can save a few seconds running test_longexp on a box with barely enough memory to run it at all. Barbara is worried about shaving *hours* off it on a box with gobs of memory to spare.
Still, I expect pymalloc will fix her problem (since malloc is the only suspect on the list, it better <wink>).
>> BTW, I forgot to pass this on yesterday, but I tried the code in Tim
>> Peters' e-mail yesterday and the delay happens during the code =
>> compile(...) statement.
I saw the same effect on my Linux laptop (with a mere 128MB). The disk went nuts when it tried compiling
"[" + "2," * 200000 + "]"
VM size as reported by top went to 98.5MB. This does not appear to be exclusively a 2.2 issue, as I got this with the fresh 2.1.2 I built this morning.
If you consider what this compiles to:
    LOAD_CONST          1 (2)
    LOAD_CONST          1 (2)
    LOAD_CONST          1 (2)
    LOAD_CONST          1 (2)
    ...
    LOAD_CONST          1 (2)
    LOAD_CONST          1 (2)
    LOAD_CONST          1 (2)
    LOAD_CONST          1 (2)
    BUILD_LIST      200000
To generate that it has to generate and parse a pretty deep abstract syntax tree. It looks like symtable_node gets called once for each list element. There are probably other functions that are called once per list element as well.
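[Editorial note: the one-instruction-per-element shape Skip describes can still be seen with today's `dis` module. One caveat for a modern reproduction: current CPython folds a list display of all-constant elements into a single tuple constant plus LIST_EXTEND, which would hide the effect, so this sketch substitutes a name `x` for the constant 2.]

```python
import dis

REPS = 5  # small on purpose; the shape is the same at 200000

# "[x,x,x,x,x,]" -- a name per element, so the compiler cannot fold the
# list into one constant and must emit one load per element.
code = compile("[" + "x," * REPS + "]", "<input string>", "eval")
ops = [instr.opname for instr in dis.get_instructions(code)]
print(ops)
```

The output shows REPS LOAD_NAME instructions followed by a single BUILD_LIST, mirroring the LOAD_CONST/BUILD_LIST pattern above.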
Skip
On Thu, 10 Jan 2002, Andrew Kuchling wrote:
On Tue, Jan 08, 2002 at 01:41:37PM -0500, Tim Peters wrote:
Break it into smaller steps so we can narrow down possible causes:
You should have cc'ed Barbara on that. I forwarded your message to her and she wrote back (eventually):
BTW, I forgot to pass this on yesterday, but I tried the code in Tim Peters' e-mail yesterday and the delay happens during the code = compile(...) statement.
She's going to install sshd on her machine, so maybe this weekend I'll be able to log in, compile Python from source, and poke around in an effort to figure out what's going on.
IMHO, Barbara's problem is almost certainly related to the system malloc(), and if that is the case the only effective antidote is pymalloc.
However, just enabling WITH_PYMALLOC isn't enough, as it's currently configured to be used only for object allocation and doesn't help the parser.
I did attach a patch enabling pymalloc for all interpreter memory management to a long post to python-dev (which AMK might recall from his python-dev summaries) about my research into test_longexp problems with the OS/2+EMX port. That research revealed that test_longexp causes the parser to go ballistic with small mallocs. While pymalloc solved the test_longexp problem, using it for all interpreter memory management caused about a 60% performance hit (on OS/2 + EMX).
On OS/2 the problem appeared to be overallocation (allocating 3-4x as much memory as actually requested), but I recall reading a thread on python-list wherein people reported system malloc()s that attempt to coalesce blocks which had a similar slowdown effect in another set of circumstances (I don't recall the details - might have been related to dictionaries?).
Although OS/X's BSD heritage goes back to FreeBSD 3.2, I wouldn't have expected either sort of problem from that source, as I've had none of these problems with my own FreeBSD systems. In fact, last night I ran the test suite on a CVS-derived build on a 486/100 FreeBSD 4.4 system with only 32MB of RAM and 128MB of swap, and test_longexp passed, and in only minutes (the whole test suite took ~25-30 mins for the first pass, ie without .pyc files). OS/X may have acquired a malloc() of a different heritage, though.
I did keep a copy of the instrumented malloc() output from my OS/2 research if you're interested, although it probably isn't as helpful as it could be (Python 2.0 vintage)... Likewise my crude debug malloc wrapper... and I might be able to dig up my pymalloc_for_all patch...
-- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia