FreeBSD 7 amd64 and large memory tests
I noticed issue 3862 (http://bugs.python.org/issue3862), which appears to be exposed only on FreeBSD 7.0 amd64 (perhaps 64 bit FreeBSD 7, but amd64 is the only 64bit hardware I can check). As far as I can make out from a couple of quick checks, this appears to be happening because the process is killed before the malloc() that exhausts the available swap space is allowed to report an out of memory condition. FreeBSD's default process virtual memory limit (ulimit -v) on both i386 and amd64 is "unlimited" for 6.3 and 7.0. Setting ulimit -v to a finite value on 7.0 amd64 seems to give rise to "normal" malloc() behaviour so that test_array passes. FreeBSD 7 uses a different malloc() implementation than FreeBSD 6 and earlier, and the new implementation uses mmap() in preference to sbrk(). So I suspect that behaviour was "normal" on FreeBSD 6 amd64 (can't test at the moment as I don't have that installed anywhere). Behaviour appears "normal" on FreeBSD 7 i386 as well. I haven't yet tried posting a query to a FreeBSD list, as it could simply be a bug on amd64, but I was wondering whether there was anything (other than deactivating tests and documenting use of ulimit -v on this platform) that could be done to work around this behaviour. Andrew. -- ------------------------------------------------------------------------- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac@pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia
I haven't yet tried posting a query to a FreeBSD list, as it could simply be a bug on amd64, but I was wondering whether there was anything (other than deactivating tests and documenting use of ulimit -v on this platform) that could be done to work around this behaviour.
I think it should be possible to debug this (for people with access to such a system, and appropriate skills). I find it hard to believe that a large malloc will simply crash the process, rather than returning NULL. More likely, there is a NULL returned somewhere, and Python (or libc) fails to check for it.

Regards,
Martin
Martin v. Löwis wrote:
I haven't yet tried posting a query to a FreeBSD list, as it could simply be a bug on amd64, but I was wondering whether there was anything (other than deactivating tests and documenting use of ulimit -v on this platform) that could be done to work around this behaviour.
I think it should be possible to debug this (for people with access to such a system, and appropriate skills).
I find it hard to believe that a large malloc will simply crash the process, rather than returning NULL. More likely, there is a NULL returned somewhere, and Python (or libc) fails to check for it.
A simple C program doing repetitive malloc() calls, much as pymalloc would under continuous demand, does indeed see no NULL from malloc() when swap is exhausted; the process just gets KILLed (the allocated memory does have to be written to in order to force the situation...).

I'll take this up with the FreeBSD folk, but I'm open to ideas as to how best to deal with the problem in the context of the test suite, pending resolution by FreeBSD.

Regards,
Andrew.
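The probe Andrew describes can be sketched in Python as well. This is a bounded illustration of mine, not Andrew's actual C program: overcommitting kernels hand out address space freely and only commit real pages on first write, so each block must be touched to force the issue. The cap keeps the sketch safe to run; the real probe loops until the process is killed or the allocator fails.

```python
# Bounded sketch of the allocate-and-touch probe (an illustration,
# not Andrew's actual C program).  Each block is written to because
# overcommitting kernels only back memory with real pages when the
# pages are first touched.
CHUNK = 1024 * 1024   # 1 MiB per allocation
LIMIT = 64            # capped at 64 MiB here; the real probe loops
                      # until the process dies or allocation fails
blocks = []
for _ in range(LIMIT):
    block = bytearray(CHUNK)
    block[::4096] = b"x" * len(block[::4096])  # write one byte per 4 KiB page
    blocks.append(block)
print("touched %d MiB without incident" % len(blocks))
```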
On Sep 17, 2008, at 10:53 AM, Andrew MacIntyre wrote:
Martin v. Löwis wrote:
I haven't yet tried posting a query to a FreeBSD list, as it could simply be a bug on amd64, but I was wondering whether there was anything (other than deactivating tests and documenting use of ulimit -v on this platform) that could be done to work around this behaviour.

I think it should be possible to debug this (for people with access to such a system, and appropriate skills). I find it hard to believe that a large malloc will simply crash the process, rather than returning NULL. More likely, there is a NULL returned somewhere, and Python (or libc) fails to check for it.
A simple C program doing a repetitive malloc(), much as pymalloc would with continuous demand, does indeed not see any NULL from malloc() when swap is exhausted but the process gets KILLed (the allocated memory does have to be written to to force the situation...)
I'll take this up with FreeBSD folk, but I'm open to ideas as to how best to deal with the problem in the context of the test suite pending resolution by FreeBSD.
Linux does the same thing, unless the user has explicitly configured that behaviour off. Search the web for "linux overcommit"; it's controlled by the vm.overcommit_memory sysctl. Although Linux's default is a heuristic that might make Python's test case work right in most cases, malloc returning NULL is not something you can actually depend on.

James
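On Linux the policy James refers to can be inspected directly. A Linux-only sketch (the /proc path does not exist on FreeBSD; the value meanings are those documented for vm.overcommit_memory):

```python
# Read the current Linux overcommit policy (the vm.overcommit_memory
# sysctl James mentions).  Linux-only: /proc/sys/vm/ is not present
# on FreeBSD.
with open("/proc/sys/vm/overcommit_memory") as f:
    mode = int(f.read())

# 0 = heuristic overcommit (the default), 1 = always overcommit,
# 2 = strict accounting, where malloc() can actually return NULL
print("overcommit policy:", {0: "heuristic", 1: "always", 2: "never"}[mode])
```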
Unbelievable as this may seem, this crazy over-committing malloc
behavior is by now "a classic" -- I first fought against it in 1990,
when IBM released AIX 3 for its then-new RS/6000 line of workstations;
in a later minor release they did provide a way to optionally switch
this off, but, like on Linux, it's a system-wide switch, NOT
per-process:-(.
I concur with http://www.win.tue.nl/~aeb/linux/lk/lk-9.html (the best
explanation I know of the subject, and recommended reading) which, on
this subject, says "Linux on the other hand is seriously broken."
(just like AIX 3 was). Sad to learn that BSD is now also broken in
the same way:-(.
Alex
On Wed, Sep 17, 2008 at 08:21:55AM -0700, Alex Martelli wrote:
Unbelievable as this may seem, this crazy over-committing malloc behavior is by now "a classic" -- I first fought against it in 1990, when IBM released AIX 3 for its then-new RS/6000 line of workstations; in a later minor release they did provide a way to optionally switch this off, but, like on Linux, it's a system-wide switch, NOT per-process:-(.
I concur with http://www.win.tue.nl/~aeb/linux/lk/lk-9.html (the best explanation I know of the subject, and recommended reading) which, on this subject, says "Linux on the other hand is seriously broken." (just like AIX 3 was). Sad to learn that BSD is now also broken in the same way:-(.
It's not "now" also broken; it has been that way for a very long time. For example, see this message I wrote back in July 1999 complaining about FreeBSD overcommit: http://www.mail-archive.com/freebsd-hackers@freebsd.org/msg01056.html
I'll take this up with FreeBSD folk, but I'm open to ideas as to how best to deal with the problem in the context of the test suite pending resolution by FreeBSD.
Not sure what the test's purpose is: if it is to test that you get a MemoryError in cases where you ask for more than Python can represent, and the test errs on systems where the requested size is actually representable, then the solution is to fix the test case.

If the test's purpose is to trigger a memory error for cases when the system runs out of memory, the test case should set a ulimit to less than the physical memory.

It might be useful to have an interpreter-maintained limit on the amount of memory Python can consume, but such a feature is clearly out of scope for the current state of the code.

Regards,
Martin
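Martin's ulimit suggestion can be done from within Python via the resource module. A sketch of mine, not actual test-suite code: RLIMIT_AS corresponds to "ulimit -v", the 512 MiB / 2 GiB figures are arbitrary, and the allocation runs in a child process so the limit can't disturb the parent.

```python
import subprocess
import sys

# Run the capped allocation in a child so the limit can't hurt this
# process.  With an address-space limit in force, the huge request
# fails with MemoryError instead of the process being killed.
child_code = r"""
import resource
# The in-Python equivalent of "ulimit -v": cap the address space
# at 512 MiB (an arbitrary figure for this sketch).
limit = 512 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
try:
    bytearray(2 * 1024 * 1024 * 1024)   # ask for 2 GiB
    print("allocation unexpectedly succeeded")
except MemoryError:
    print("MemoryError")
"""
result = subprocess.run([sys.executable, "-c", child_code],
                        capture_output=True, text=True)
print(result.stdout.strip())
```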
On Sep 17, 2008, at 12:45 PM, Martin v. Löwis wrote:
I'll take this up with FreeBSD folk, but I'm open to ideas as to how best to deal with the problem in the context of the test suite pending resolution by FreeBSD.
Not sure what the test purpose is: if it is to test that you get a MemoryError in cases where you ask for more than Python could represent, and the test errs on system where the requested size is actually representable, the solution then is to fix the test case.
If the test purpose is to trigger a memory error for cases when the system runs out of memory, the test case should set a ulimit to less than the physical memory.
It might be useful to have an interpreter-maintained limit on the amount of memory Python can consume, but such a feature is clearly out of scope for the current state of the code.
There is an option, at least on Linux, to hack around this by using LD_PRELOAD to substitute another memory manager that responds the way the tests need... at least that was what I was told today at lunch (if ulimit is not enough for any reason).

--
Leonardo Santagada
santagada at gmail.com
There is an option, at least on Linux, to hack around this by using LD_PRELOAD to substitute another memory manager that responds the way the tests need... at least that was what I was told today at lunch (if ulimit is not enough for any reason).
For Python, there would be much less hackish ways: most if not all calls to memory allocators are channelled through the Python memory API, which could also be diverted at the source level.

Regards,
Martin
Andrew MacIntyre wrote:
I'll take this up with FreeBSD folk, but I'm open to ideas as to how best to deal with the problem in the context of the test suite pending resolution by FreeBSD.
The response I got from Jason Evans (author of the new malloc() implementation), along with that of another respondent, indicates that the behaviour on FreeBSD 7.1 and later will (mostly) be restored to something similar to 6.x and earlier, through the default use of sbrk() and consequent obedience to the data segment size limit (ulimit -d), which defaults to 512MB in a standard FreeBSD install in recent times.

The residual problem (as of 7.1) is that malloc() defaults to falling back to the mmap() strategy when it can't get more address space via sbrk(). As noted in the tracker item for issue 3862, the only way to control this is the virtual memory size limit (ulimit -v), which unfortunately defaults to "unlimited"... FreeBSD's malloc() can be tuned in several ways, so it is possible to force use of the sbrk()-only strategy (as of 7.1), which would exactly match the behaviour of the old malloc().

It seems to me that the most practical way forward is to institute a policy that tests which want to probe out-of-memory behaviour must ensure that appropriate resource limits are in place; if they can't (for example because the platform running the tests doesn't support getrlimit()/setrlimit()), the test should be skipped.

As Mark Dickinson has suggested a patch for issue 3862 which should work around the issue with test_array on 64-bit platforms, I think we can move forward for the time being.

Cheers,
Andrew.
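The skip-unless-limited policy Andrew proposes might look something like this in a test. This is a sketch under names of my own invention, not actual test-suite code:

```python
import unittest

try:
    import resource          # POSIX-only; absent e.g. on Windows
except ImportError:
    resource = None

class OutOfMemoryTest(unittest.TestCase):
    """Hypothetical test following the proposed policy: only probe
    out-of-memory behaviour when a resource limit is in force."""

    @unittest.skipUnless(resource is not None,
                         "platform lacks getrlimit()/setrlimit()")
    def test_allocation_fails_cleanly(self):
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        if soft == resource.RLIM_INFINITY:
            # No address-space limit ("ulimit -v unlimited"): on an
            # overcommitting kernel the process could simply be
            # killed, so skip rather than take the test run down.
            self.skipTest("no address-space limit set")
        with self.assertRaises(MemoryError):
            bytearray(soft + 1)   # cannot possibly fit in the limit
```

Skipped tests count as successes, so an unconfigured machine passes cleanly instead of being killed mid-run.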
It seems to me that the most practical way forward is to just institute a policy that tests that want to try and test out of memory behaviour must ensure that appropriate resource limits are in place
IMO, there shouldn't be any tests in the test suite that rely on exhaustion of all available memory. The MemoryError tests should all deal with overflow situations only.

If stress-testing is desired, it should be done with platform support, i.e. with a malloc implementation that randomly fails. OTOH, I would hope that the static-analysis tools that Python gets run through find failures to properly check for NULL results much better than stress-testing would.

Regards,
Martin
participants (6)
- "Martin v. Löwis"
- Alex Martelli
- Andrew MacIntyre
- James Y Knight
- Jon Ribbens
- Leonardo Santagada