[Python-Dev] test_subprocess and sparc buildbots

Alexandre Vassalotti alexandre at peadrop.com
Wed Dec 31 03:37:01 CET 2008


Here is what I found just by analyzing the logs. It seems the first
failures appeared after this change:

http://svn.python.org/view/python/branches/release30-maint/Objects/object.c?rev=67888&view=diff&r1=67888&r2=67887&p1=python/branches/release30-maint/Objects/object.c&p2=/python/branches/release30-maint/Objects/object.c

The logs of the failing test runs all show the same error message:

[31481 refs]
* ob
object  : <refcnt 0 at 0x3a97728>
type    : str
refcount: 0
address : 0x3a97728
* op->_ob_prev->_ob_next
object  : <refcnt 0 at 0x3a97728>
type    : str
refcount: 0
address : 0x3a97728
* op->_ob_next->_ob_prev
object  : [31776 refs]

This is the output of _Py_ForgetReference (which calls _PyObject_Dump)
called either from _PyUnicode_New or unicode_subtype_new. In both
cases, this implies PyObject_MALLOC returned NULL when allocating the
internal array of a str object. However, I have no idea why malloc()
is failing there.

By counting the number of [reftotal] printed in the log, I found that
the failing test could be one of the following: test_invalid_args,
test_invalid_bufsize, test_list2cmdline, test_no_leaking. Looking at
the tests, it seems only test_no_leaking could be problematic:

* test_list2cmdline checks that the subprocess.list2cmdline function
  works correctly; only Python code is involved here;
* test_invalid_args checks that using an option unsupported by a
  platform raises an exception; only Python code is involved here;
* test_invalid_bufsize only checks that Popen rejects a non-integer
  bufsize; only Python code is involved here.
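
For reference, the counting step can be sketched roughly as follows. This
is only an illustration: it assumes each interpreter in a debug build
prints a "[N refs]" marker when it exits, and the helper name
count_reftotals and the sample text are made up for the example. Because
the marker can end up fused with other output (as in the dump above), the
pattern is searched anywhere in a line rather than anchored to its start.

```python
import re

# Assumed marker format: "[<number> refs]", printed by a debug build
# when an interpreter exits.
REFS_RE = re.compile(r"\[(\d+) refs\]")


def count_reftotals(log_text):
    """Return how many "[N refs]" markers appear in log_text."""
    return len(REFS_RE.findall(log_text))


# Shortened, illustrative excerpt in the same shape as the buildbot log.
sample_log = """\
[31481 refs]
* ob
object  : <refcnt 0 at 0x3a97728>
* op->_ob_next->_ob_prev
object  : [31776 refs]
"""

print(count_reftotals(sample_log))
```

Comparing that count with the number of child interpreters each test
spawns is what narrows the failure down to a handful of candidate tests.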

And unsurprisingly, that is the failing test:

test test_subprocess failed -- Traceback (most recent call last):
  File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/test/test_subprocess.py", line 423, in test_no_leaking
    data = p.communicate(b"lime")[0]
  File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/subprocess.py", line 671, in communicate
    return self._communicate(input)
  File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/subprocess.py", line 1171, in _communicate
    bytes_written = os.write(self.stdin.fileno(), chunk)
OSError: [Errno 32] Broken pipe

It seems one of the spawned processes runs out of memory while
allocating a new PyUnicode object. I believe we don't see the usual
MemoryError because the parent process captures the stderr and stdout
of its children, so the only visible symptom is the broken pipe.
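
A minimal sketch of that mechanism, assuming a POSIX system: the child
below is a hypothetical stand-in that writes a message to stderr and exits
without reading stdin. It does not reproduce the actual out-of-memory
condition. The parent then writes to the child's stdin with os.write, as
_communicate does in the traceback, and gets EPIPE, while the child's own
message stays hidden in the captured stderr pipe.

```python
import errno
import os
import subprocess
import sys

# Hypothetical child: reports a failure on stderr and exits early,
# never reading from stdin.
CHILD_CODE = "import sys; sys.stderr.write('simulated failure'); sys.exit(1)"


def write_to_dead_child():
    """Return (errno from the write or None, captured child stderr)."""
    p = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    p.wait()  # the child has exited, so nothing reads its stdin anymore
    try:
        os.write(p.stdin.fileno(), b"lime")
        write_errno = None
    except OSError as exc:
        write_errno = exc.errno
    captured = p.stderr.read()  # the child's message only shows up here
    for stream in (p.stdin, p.stdout, p.stderr):
        stream.close()
    return write_errno, captured


if __name__ == "__main__":
    err, captured = write_to_dead_child()
    print(err == errno.EPIPE, captured)
```

Unless the parent prints the captured stderr, the test output contains
only the OSError from the write.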

Also, only klose-*-sparc buildbots are failing this way; loewis-sun is
failing too but for a different reason. So, how much memory is
available on this machine (or actually, on this virtual machine)?

Now, I wonder why manipulating the GIL caused the bug to appear in
3.0, but not in 2.x. Maybe it is related to the new I/O library in
Python 3.0.

Regards,
-- Alexandre

On Tue, Dec 30, 2008 at 4:20 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Does anyone have local access to a sparc machine to try to track down
> the ongoing buildbot failures in test_subprocess?
>
> (I think the problem is specific to 3.x builds on sparc machines, but I
> haven't checked the buildbots all that closely - that assessment is just
> based on what I recall of the buildbot failure emails).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---------------------------------------------------------------
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/alexandre%40peadrop.com
>

