XP buildbot problem cloning from hg.python.org
Starting yesterday, my XP buildbot began failing to execute clone operations against hg.python.org. There's not a lot of data being given aside from a transaction abort message (and my buildbot log showing the hg command exiting), and I'm wondering if something may be amiss on the server or its configuration?
Note that this is a full clone (which for some reason the Windows buildbots seem to fall back on with some frequency) and can take quite a while. My Windows 7 buildbot is ok so far but it's still doing incremental pulls over the same time period.
I've got two separate Internet connections here and have tried routing over both so I don't think it's a network issue. I've completely flushed the local build trees and rebooted the buildbot.
Is there anything that might be available on the server to see if there are errors being logged? Or anything that could have changed configuration-wise recently (maybe timeout related or something)? I'm running a bit low on items to try to change or reset on the buildbot side.
Thanks.
-- David
Is this using HTTPS or SSH?
On Oct 24, 2014, at 11:47 PM, David Bolen wrote: [...]
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/donald%40stufft.io
Donald Stufft
Is this using HTTPS or SSH?
Um, good question - whatever the buildbot build process uses. Looking at the slave log on buildbot.python.org (I don't get the hg output locally), it appears to be http (it's cloning http://hg.python.org/cpython) - though I thought I saw it using https (port 443) in some traffic monitoring I was doing, so maybe it gets redirected?
Oh yeah, the log also shows "real URL is https://hg.python.org/cpython" as the first output from hg.
-- David
On Fri, 24 Oct 2014 23:47:05 -0400
David Bolen
Starting yesterday, my XP buildbot began failing to execute clone operations against hg.python.org. There's not a lot of data being given aside from a transaction abort message (and my buildbot log showing the hg command exiting), and I'm wondering if something may be amiss on the server or its configuration?
Have you tried running the hg clone manually from the buildbot? You could try to add --debug to get more info where the thing breaks. Regards Antoine.
Antoine Pitrou
Have you tried running the hg clone manually from the buildbot? You could try to add --debug to get more info where the thing breaks.
Yes, I had but pretty much got the same output as the buildbot slave. But I just tried --traceback and it's definitely complaining about the connection being terminated. Regular test:
    hg clone --verbose --noupdate http://hg.python.org/cpython test
    real URL is https://hg.python.org/cpython
    requesting all changes
    adding changesets
    adding manifests
    transaction abort!
    rollback completed
    abort: connection ended unexpectedly
Traceback:
    hg clone --traceback --verbose --noupdate http://hg.python.org/cpython test
    real URL is https://hg.python.org/cpython
    requesting all changes
    adding changesets
    adding manifests
    transaction abort!
    rollback completed
    Traceback (most recent call last):
      File "mercurial\dispatch.pyc", line 54, in _runcatch
      File "mercurial\dispatch.pyc", line 490, in _dispatch
      File "mercurial\dispatch.pyc", line 351, in runcommand
      File "mercurial\dispatch.pyc", line 541, in _runcommand
      File "mercurial\dispatch.pyc", line 495, in checkargs
      File "mercurial\dispatch.pyc", line 488, in <lambda>
      File "mercurial\util.pyc", line 420, in check
      File "mercurial\commands.pyc", line 725, in clone
      File "mercurial\hg.pyc", line 334, in clone
      File "mercurial\localrepo.pyc", line 1853, in clone
      File "mercurial\localrepo.pyc", line 1206, in pull
      File "mercurial\localrepo.pyc", line 1695, in addchangegroup
      File "mercurial\revlog.pyc", line 1239, in addgroup
      File "mercurial\changegroup.pyc", line 31, in chunkiter
      File "mercurial\changegroup.pyc", line 20, in getchunk
      File "mercurial\util.pyc", line 924, in read
      File "mercurial\httprepo.pyc", line 22, in zgenerator
    IOError: [Errno None] connection ended unexpectedly
    abort: connection ended unexpectedly
I also stuck on --debug which generates a metric ton of output, but the final portion is:
    hg clone --debug --traceback --verbose --noupdate http://hg.python.org/cpython test
    (...)
    manifests: 5271/93170 chunks (5.66%)
    manifests: 5272/93170 chunks (5.66%)
    manifests: 5273/93170 chunks (5.66%)
    manifests: 5274/93170 chunks (5.66%)
    manifests: 5275/93170 chunks (5.66%)
    manifests: 5276/93170 chunks (5.66%)
    manifests: 5277/93170 chunks (5.66%)
    manifests: 5278/93170 chunks (5.66%)
    transaction abort!
    rollback completed
    Traceback (most recent call last):
      File "mercurial\dispatch.pyc", line 54, in _runcatch
      File "mercurial\dispatch.pyc", line 490, in _dispatch
      File "mercurial\dispatch.pyc", line 351, in runcommand
      File "mercurial\dispatch.pyc", line 541, in _runcommand
      File "mercurial\dispatch.pyc", line 495, in checkargs
      File "mercurial\dispatch.pyc", line 488, in <lambda>
      File "mercurial\util.pyc", line 420, in check
      File "mercurial\commands.pyc", line 725, in clone
      File "mercurial\hg.pyc", line 334, in clone
      File "mercurial\localrepo.pyc", line 1853, in clone
      File "mercurial\localrepo.pyc", line 1206, in pull
      File "mercurial\localrepo.pyc", line 1695, in addchangegroup
      File "mercurial\revlog.pyc", line 1239, in addgroup
      File "mercurial\changegroup.pyc", line 31, in chunkiter
      File "mercurial\changegroup.pyc", line 20, in getchunk
      File "mercurial\util.pyc", line 924, in read
      File "mercurial\httprepo.pyc", line 22, in zgenerator
    IOError: [Errno None] connection ended unexpectedly
    abort: connection ended unexpectedly
which appears to die mid-stream while receiving the manifests.
So I'm sort of hoping there might be some record server-side as to why things are falling apart mid-way.
-- David
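When comparing several failed runs, it can help to pull the last progress line out of a captured --debug log to see whether the transfer always stops at the same point. The following is only a sketch: `last_progress` and `PROGRESS_RE` are hypothetical names, and the pattern assumes the "phase: done/total chunks" layout shown in the output above.

```python
import re

# Matches progress lines such as "manifests: 5278/93170 chunks (5.66%)".
# The layout is taken from the log excerpt in this thread (an assumption
# about other hg versions, which may format progress differently).
PROGRESS_RE = re.compile(r"^(?P<phase>\w+): (?P<done>\d+)/(?P<total>\d+) chunks")


def last_progress(log_text):
    """Return (phase, done, total) for the last progress line, or None."""
    last = None
    for line in log_text.splitlines():
        match = PROGRESS_RE.match(line.strip())
        if match:
            last = (
                match.group("phase"),
                int(match.group("done")),
                int(match.group("total")),
            )
    return last


# Tail of the log quoted in the message above.
sample = """\
manifests: 5277/93170 chunks (5.66%)
manifests: 5278/93170 chunks (5.66%)
transaction abort!
rollback completed
abort: connection ended unexpectedly
"""

phase, done, total = last_progress(sample)
print("%s stopped at %d of %d chunks" % (phase, done, total))
```

If the stopping point moves around between runs, that would point more toward a timeout or connection drop than toward one specific bad chunk.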
David Bolen
which appears to die mid-stream while receiving the manifests.
So I'm sort of hoping there might be some record server-side as to why things are falling apart mid-way.
Just to follow up to myself, I get the same error trying to do a clone from my own personal XP machine rather than the buildbot (which is a VM). I've had the issue with hg 1.6.2, 2.5.2 and 3.1.2.
However, the same clones complete successfully under OSX and Linux.
So that's sort of strange.
-- David
I was seeing this recently and had to run recover on my repo (not sure what the command line is for that - TortoiseHg had a menu). YMMV, but the symptoms sound the same.
Cheers,
Steve
Top-posted from my Windows Phone
________________________________
From: David Bolen <db3l.net@gmail.com>
Sent: 10/24/2014 22:01
To: python-dev@python.org
Subject: Re: [Python-Dev] XP buildbot problem cloning from hg.python.org
[...]
Do you mean your local repo? If so, I don't have a local repo at this
point - the failure is during the first clone.
-- David
On Sat, Oct 25, 2014 at 1:19 AM, Steve Dower
I was seeing this recently and had to run recover on my repo (not sure what the command line is for that - TortoiseHg had a menu). YMMV, but the symptoms sound the same.
Cheers, Steve
What version of OpenSSL is it using?
On Oct 25, 2014, at 1:00 AM, David Bolen wrote: [...]
Donald Stufft
What version of OpenSSL is it using?
I'm using the pre-built Windows Mercurial installer, but if I unpack the included library.zip, the SSLEAY32.DLL shows version 0.9.8r.
This is from the 3.1.2 install I just did a few hours ago. It appears that hg 2.5.2 on my other XP box also has 0.9.8r. The prior buildbot version (1.6.2) looks like it had 0.9.8o.
I also got around to trying a manual clone on the Windows 7 buildbot, and it worked fine, even with the older hg 1.6.2.
So it seems to correlate with XP more than anything else at the moment.
-- David
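For a Python interpreter (as opposed to the DLL bundled with the Windows hg installer), the linked OpenSSL build can be read from the standard `ssl` module. This is a minimal sketch assuming Python 3.3 or later; `openssl_summary` is a hypothetical helper name, and the result describes the interpreter running the script, not the Python embedded in a packaged Mercurial.

```python
import ssl


def openssl_summary():
    """Describe the OpenSSL build this interpreter's ssl module is linked against."""
    return {
        "version": ssl.OPENSSL_VERSION,            # human-readable version string
        "version_info": ssl.OPENSSL_VERSION_INFO,  # tuple of five integers
        "has_ecdh": ssl.HAS_ECDH,                  # elliptic-curve DH key exchange available
        "has_sni": ssl.HAS_SNI,                    # Server Name Indication available
    }


for key, value in openssl_summary().items():
    print("%s: %r" % (key, value))
```

Running this on a working and a failing machine gives a quick side-by-side of the TLS libraries involved, at least for source installs of hg that use the system Python.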
I have an idea, can you run https://bpaste.net/show/c5d7cd102f5b and tell me what it outputs? Both on a machine that works and one that doesn’t.
On Oct 25, 2014, at 2:14 AM, David Bolen wrote: [...]
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Donald Stufft
I have an idea, can you run https://bpaste.net/show/c5d7cd102f5b and tell me what it outputs? Both on a machine that works and one that doesn’t.
All but Linux (so XP/7 buildbots, XP standalone, OSX) return:
    ('DHE-RSA-AES128-SHA', 'TLSv1/SSLv3', 128)
My Linux (Ubuntu 12.04) returns:
    ('ECDHE-RSA-AES128-SHA', 'TLSv1/SSLv3', 128)
The script was run under a default Python on each box (2.6 on Windows, 2.7 on OSX and Linux). I tried 2.6 through 3.1 on my standalone XP with no change, so I don't think it differs by Python version. It's not precisely the same as running hg, since it has its own embedded Python under Windows, but I installed a source install on my XP box under 2.7 and it fails a clone the same way.
In new news though, I just got the same failure on the Win7 buildbot in a clone test. In repeated attempts, that's the only one so far.
I also realized that one shared feature is that the XP boxes were using IPv4 while the other boxes were all IPv6 (an HE tunnel on my side). Though my earlier Win7 failure was also IPv6. I manually forced the Win7 box to use IPv4, but didn't see much difference. It certainly didn't start failing like the XP boxes.
Anecdotally, the failing XP attempts appear to be running slower in general (with lower transfer rates as monitored by my router). I have had slow clones work on other boxes, so that's not automatically bad. But I wonder if it's still some sort of timeout somewhere.
I don't think I currently have an active ssh account, but if there were a way to test a clone over ssh rather than http perhaps that would be a useful data point, in terms of eliminating some middlemen processing.
-- David
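The contents of the pasted script are not shown in the thread, but the tuples reported above have the shape returned by `ssl.SSLSocket.cipher()`: cipher name, protocol, and secret bits. A comparable probe might look like the sketch below, written for Python 3. It is a guess at an equivalent check rather than the original script; `negotiated_cipher` and `describe` are hypothetical names.

```python
import socket
import ssl


def negotiated_cipher(host, port=443, timeout=10):
    """Complete a TLS handshake and return (cipher name, protocol, secret bits)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.cipher()


def describe(cipher):
    """Format a cipher() tuple as a single readable line."""
    name, protocol, bits = cipher
    return "%s over %s (%d-bit)" % (name, protocol, bits)


# Formatting the tuple reported from the XP machines in the message above:
print(describe(("DHE-RSA-AES128-SHA", "TLSv1/SSLv3", 128)))

# To query a live server (needs network access):
#     print(describe(negotiated_cipher("hg.python.org")))
```

The only difference in the reported results is the key exchange (DHE versus ECDHE), which depends on what the client's OpenSSL build offers during the handshake.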
As another data point, I've tried cloning randomly selected other repositories from hg.python.org, and smaller repositories (distutils2, peps, jython to name a few) are all working fine under XP, even though with jython for example, the clone takes longer in terms of wall time than I'll often see cpython fail.(*)
A test of what I presumed was a more comparably sized repository (features/cdecimal) dies like cpython.
-- David
(*) Overall clone time is probably unrelated anyway since the XP buildbot traditionally needed 10+min for clones in the past (such as when the new build script changes were in place and every test used a clone) and was working fine with that.
Very interesting! I had been doing some housekeeping on some of my older OS X build systems over the past few days and I've run into the same problem. In particular, I am seeing this failure on an OS X 10.5.8 system (running in a Fusion VM) which I've used for years and from which I have regularly cloned repos from hg.python.org.
I spent some time yesterday trying to isolate it. I came to the conclusion that it was independent of the version of OpenSSL (identical failures occurred with the system's ancient Apple 0.9.7 as well as a newly built 1.0.1j) and independent of the version of hg (at least with two data points, current and a year-old version) and seemingly independent of the network connection. I was not able to reproduce the failure on the host OS X system (10.10) and I didn't have problems a few days earlier with various other OS X releases (10.6.x through 10.9.x) also running in VMs on the same host.
I stumbled across a workaround for the problem as I was experiencing it: adding --uncompressed to hg clone eliminated failures. You can get more info on the hg failures by adding --traceback and --debugger to the clone command. After spending way too much time on the issue, I was not in the mood to spend more time isolating the problem after finding a workaround but if others are also seeing it, it might be worth doing. Sigh.
    $ hg --version
    Mercurial Distributed SCM (version 3.1.2)
    $ hg clone -U http://hg.python.org/cpython cpython
    real URL is https://hg.python.org/cpython
    requesting all changes
    adding changesets
    adding manifests
    transaction abort!
    rollback completed
    abort: connection ended unexpectedly
    $ hg clone --uncompressed -U https://hg.python.org/cpython cpython
    streaming all changes
    10404 files to transfer, 248 MB of data
    transferred 248 MB in 44.4 seconds (5.58 MB/sec)
-- Ned Deily, nad@acm.org
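The workaround above could be scripted as a retry: attempt a normal clone and, if hg exits with a non-zero status, try again with --uncompressed. This is only a sketch and not part of any buildbot configuration. `clone_with_fallback` and its `run` parameter are hypothetical names, and the code assumes a failed attempt leaves no destination directory behind.

```python
import subprocess


def clone_with_fallback(url, dest, run=subprocess.call):
    """Clone url into dest, retrying with --uncompressed if the first try fails.

    Returns the argument list of the attempt that succeeded, or None if
    both attempts failed. `run` takes an argument list and returns the
    process exit status; it defaults to subprocess.call.
    """
    base = ["hg", "clone", "--noupdate"]
    # First a regular (compressed) clone, then the workaround from the thread.
    for extra in ([], ["--uncompressed"]):
        argv = base + extra + [url, dest]
        if run(argv) == 0:
            return argv
    return None


# Example call (needs hg on PATH and network access):
#     clone_with_fallback("https://hg.python.org/cpython", "cpython")
```

Passing the runner in as a parameter keeps the retry logic separate from process execution, so the fallback order can be exercised without a real repository.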
Update: after consulting with Donald on IRC, it appears that the problem was on the python.org end and is now fixed. David, is it now working again for you? -- Ned Deily, nad@acm.org
Ned Deily
Update: after consulting with Donald on IRC, it appears that the problem was on the python.org end and is now fixed. David, is it now working again for you?
Sorry for the delay - yes, it appears to be working again for me as well. And it looks like clones during the buildbot tests were working again as of tests yesterday. -- David
On 26.10.2014 00:14, Ned Deily wrote: [...] I stumbled across a workaround for the problem as I was experiencing it: adding --uncompressed to hg clone eliminated failures. [...]
If compression is causing the problem, perhaps there's an incompatibility between the zlib versions in use on the host and on your client system. hg.python.org was recently updated to a new Ubuntu version.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Oct 26 2014)
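On the client side, the zlib build seen by a Python interpreter can be inspected and given a quick streaming round-trip. The sketch below assumes Python 3.3 or later; `zlib_versions` and `stream_roundtrip` are hypothetical helper names. It only exercises the local library, so it cannot show what the server sends, and the thread does not confirm that zlib was the cause.

```python
import zlib


def zlib_versions():
    """Report the zlib version Python was built against and the one loaded now."""
    return {
        "compiled": zlib.ZLIB_VERSION,
        "runtime": zlib.ZLIB_RUNTIME_VERSION,
    }


def stream_roundtrip(payload, chunk_size=4096):
    """Compress payload, then decompress it incrementally in small chunks."""
    compressed = zlib.compress(payload)
    decompressor = zlib.decompressobj()
    pieces = []
    for start in range(0, len(compressed), chunk_size):
        pieces.append(decompressor.decompress(compressed[start:start + chunk_size]))
    pieces.append(decompressor.flush())
    return b"".join(pieces) == payload


print(zlib_versions())
print(stream_roundtrip(b"manifest chunk " * 10000))
```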
participants (6)
- Antoine Pitrou
- David Bolen
- Donald Stufft
- M.-A. Lemburg
- Ned Deily
- Steve Dower