Slow Ref cleanup in 2.2.1 with 100K+ objects on Linux. - or SAP DBAPI problem?
Brad Clements
bkc at Murkworks.com
Thu May 22 19:49:31 EDT 2003
I've run into a strange situation, looking for commentary. I'll have done my
own workaround by the time you get this, so I don't need a critical answer.
I've written a program to copy database records from Interbase to SAP. I
have to change foreign key values during this process by looking up parent
table records as child records are copied.
I'm copying one table at a time, but a child may reference several other
parent tables. I keep a cache using a dict of these other records.
So .. 230K+ records have been copied, and during this process it probably
pulled in 200K+ other related parent records and is keeping all of those in
RAM (duh, bad design).
But interestingly, the copy subroutine has finished and is "returning" to
its parent, yet the process seems hung. I thought I had read somewhere
about a problem cleaning up refs (there aren't any cycles in this case),
but I can't recall exactly.
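One way to check whether dict teardown alone is slow is to time it in isolation. The following is a minimal sketch, not the code from SQLCopy.py: `build_cache` and `time_teardown` are hypothetical names, the tuples are stand-ins for the real parent records, and it is written for a current Python 3 rather than 2.2.1. CPython has to touch every entry to decrement its reference count when the dict is freed, so on a box that is swapping each of those touches may turn into a page-in.

```python
import time


def build_cache(n):
    # Stand-in for the parent-record cache: n keys mapping to small tuples.
    return {i: (i, "record-%d" % i) for i in range(n)}


def time_teardown(n):
    # Build the cache, then measure only the time taken to free it.
    cache = build_cache(n)
    start = time.perf_counter()
    del cache  # last reference dropped; CPython frees every entry here
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (10_000, 100_000):
        print(n, "entries freed in %.4f s" % time_teardown(n))
```

If this runs quickly on a machine with free RAM, the hang is more likely down to swapping or to the database driver than to reference cleanup as such.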
This is Python 2.2.1 on RH 9, on a
dual Xeon 2.2 GHz machine with 1GB RAM.
Look at top:
7:35pm up 38 days, 9:07, 4 users, load average: 6.37, 6.93, 6.68
198 processes: 173 sleeping, 1 running, 24 zombie, 0 stopped
CPU0 states: 3.3% user, 1.4% system, 0.0% nice, 94.1% idle
CPU1 states: 2.4% user, 3.1% system, 0.0% nice, 93.3% idle
CPU2 states: 0.4% user, 0.5% system, 0.0% nice, 98.1% idle
CPU3 states: 0.2% user, 0.4% system, 0.1% nice, 98.3% idle
Mem: 1030372K av, 1020480K used, 9892K free, 0K shrd, 2624K buff
Swap: 4096552K av, 1370228K used, 2726324K free 21208K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
16173 bkc 10 -5 988M 762M 718M D < 0.3 75.7 38:00 python2 SQLCopy.py -e /tmp/copyerrors2.txt --copy STSouthTables STSouthTablesS Charges +
And vmstat:
[bkc@strader /service]# vmstat 5
   procs                      memory    swap          io     system         cpu
 r  b  w    swpd   free   buff  cache   si   so    bi    bo   in    cs  us  sy  id
 0  1  0 1370228   9668   2572  19964    0    0     2     1    0     1   1   1   1
 0  2  0 1370228   9692   2628  22872  816    0  1431   131  840  1423   2   2  96
 0 12  4 1370284  10192   2632  23652   55 1902   466  1927 1010   993   1   2  97
 0 17  4 1370284  10180   2632  23684   26 1466    32  1466  985   623   1   1  98
 0 17  4 1370680  10064   2364  23960   37 1468    38  1468  931   326   1   1  98
 0  3  0 1370604   9336   2100  31420  402  230  2690   314  921  1381   3   4  93
It looks to me like all the processes are deadlocked trying to swap in. I
niced python a little but that made no change.
Note the load average is right up there, but the CPU time is 95% idle!
Anyone seen a condition like this before?
--
Hey, I tried to ^C it and I got:
-11987 COMMUNIC sql03_catch_signal: caught signal 2
-11987 COMMUNIC sql03_catch_signal: caught signal 2
I had to kill it. I wonder if it was blocked in SAP DBAPI and it's not a
cleanup problem.
Oh well, just musing.
I'll re-run this with a smarter cache design and see what happens.
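For reference, one possible shape for such a cache is a bounded least-recently-used dict, so memory stays capped however many parent records get looked up. This is only a sketch under assumptions: `BoundedCache`, `max_entries` and `loader` are hypothetical names, `loader` stands in for whatever query fetches a parent record by key, and it uses `collections.OrderedDict` from a current Python 3 (not available in 2.2.1).

```python
from collections import OrderedDict


class BoundedCache:
    """Least-recently-used cache holding at most max_entries records."""

    def __init__(self, max_entries, loader):
        self.max_entries = max_entries
        self.loader = loader  # hypothetical callable: key -> parent record
        self._data = OrderedDict()

    def get(self, key):
        if key in self._data:
            # Cache hit: mark the entry as most recently used.
            self._data.move_to_end(key)
            return self._data[key]
        # Cache miss: fetch the record, store it, evict the oldest if over limit.
        value = self.loader(key)
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)
        return value

    def __len__(self):
        return len(self._data)
```

Usage would look like `parents = BoundedCache(10000, fetch_parent)` followed by `parents.get(old_key)` for each child row, where `fetch_parent` is again a made-up name for the real lookup. The limit of 10000 is arbitrary and would need tuning against the available RAM.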
--
Novell DeveloperNet Sysop #5