Travis CI: macOS is now blocking -- remove macOS from Travis CI?
Hi,
Since today, the macOS task of a Travis CI pull-request validation job seems to hang the whole build.
Don't try to cancel the macOS job, or the whole build will be marked as failed! ... even if macOS is in the "Allowed Failure" section. I don't know the best way to "repair" such a build. I use "Restart build", which restarts all tasks, even the completed *and successful* Linux and doc tasks.
I have PRs that have been waiting on Travis CI for more than 2 hours. The macOS task is shown as "queued".
Yesterday, it was possible to merge a PR even if the macOS job was still queued (not started).
I never wait for macOS since, as I wrote, it can take longer than 1 hour. Moreover, macOS failures are not reported in the GitHub UI :-( (Hmm, in fact, I'm not sure about that.)
Maybe we should remove the pre-commit macOS task from the Travis CI config and focus on the post-commit macOS buildbots? If we remove it, should we remove it from the 2.7, 3.6 and master branches?
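For context, the "Allowed Failure" mechanism lives in `.travis.yml`. A minimal sketch of the idea (the entries here are illustrative; CPython's actual config may differ) looks like:

```yaml
# Hypothetical sketch of a Travis CI build matrix where the macOS job
# is allowed to fail. Not CPython's actual .travis.yml.
matrix:
  include:
    - os: linux
      language: c
      compiler: gcc
    - os: osx
      language: c
      compiler: clang
  allow_failures:
    # Jobs matching this entry may fail without failing the build.
    - os: osx
  # Report the build result as soon as the required jobs finish,
  # without waiting for the allow_failures jobs.
  fast_finish: true
```

Note that `fast_finish: true` is the Travis CI option meant to produce exactly the behavior discussed here: the build result is reported once the mandatory jobs complete, even if an allowed-failure job is still queued or running.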
We have 3 macOS buildbots:
- x86 Tiger 3.x
- x86-64 El Capitan 3.x
- x86-64 Sierra 3.x
All three are currently green ;-)
In the last 3 months, the macOS task of Travis CI has caused multiple issues :-/
Victor
On 01/09/2017 at 19:15, Victor Stinner wrote:
Yesterday, it was possible to merge a PR even if the macOS job was still queued (not started).
It's still possible today.
Maybe we should remove the pre-commit macOS task from the Travis CI config to focus on post-commit macOS buildbots?
Even though macOS is slow on Travis CI, I prefer to get a pre-merge failure rather than have to watch the buildbots and try to merge follow-up PRs until I manage to fix the issue.
In cases where I have no reason to suspect platform-specific issues, I simply don't wait for the macOS result.
Regards
Antoine.
On Sep 1, 2017 at 7:24 PM, "Antoine Pitrou" antoine@python.org wrote:
On 01/09/2017 at 19:15, Victor Stinner wrote:
Yesterday, it was possible to merge a PR even if the macOS job was still queued (not started).
It's still possible today.
Ah? The merge button was disabled even though AppVeyor and the 2 mandatory Travis CI checks had already passed. I will check again.
Victor
Hi,
I was bitten by the issue again on https://github.com/python/cpython/pull/3350
After restarting the Travis CI build twice (first by me, then by Zach), I was able to merge it. But it's painful to have to restart a whole build, and it wastes Travis CI resources :-(
So I just proposed to drop the macOS job: https://bugs.python.org/issue31355
Please read the issue for the full rationale.
Victor
2017-09-01 19:15 GMT+02:00 Victor Stinner victor.stinner@gmail.com:
Hi,
The macOS job was removed from Travis CI at the beginning of the CPython sprint two weeks ago. Since the macOS build was removed, Travis CI annoys me less: it seems more stable.
Are you OK with not adding the macOS job back to Travis CI?
Again, my rationale is that we already have 3 macOS buildbots and I look closely at all buildbot failures. I try to keep track of *all* failures, even random ones. A recent macOS example: https://bugs.python.org/issue31510
Sadly, the remaining random failures are the rarest and the most difficult to reproduce. (I have fixed a lot of them in recent months.)
Victor
2017-09-06 1:30 GMT+02:00 Victor Stinner victor.stinner@gmail.com:
On Sep 19, 2017, at 15:32, Victor Stinner victor.stinner@gmail.com wrote:
If the macOS tests aren’t stable, then yes, removing them is better than frustrating developers who can’t reproduce CI failures even on the CI machines, let alone on their own development boxes.
I forget though, was it a problem with macOS CI stability or general throughput? I thought they just couldn’t keep up with the workload, in which case it seems like we should be able to throw more resources at it, right?
-Barry
On Tue, 19 Sep 2017 at 15:04 Barry Warsaw barry@python.org wrote:
I forget though, was it a problem with macOS CI stability or general throughput? I thought they just couldn’t keep up with the workload, in which case it seems like we should be able to throw more resources at it, right?
If it is a Travis issue, then there are no more resources to throw at it from Travis's side: what they already provide us is rather large, and paying out of pocket is rather costly. The only other option is to find another CI provider that has macOS support and use them just for that platform.
If you find a macOS CI platform with more capacity, please let me know :-)
Travis has been totally underwater of late, but I don't know of any alternatives, probably because operating a fleet of macOS builders is a giant pain. You need Apple hardware, and it turns out you can purchase either a trash-can Mac Pro or a Mac mini, and neither of those is really designed for a server farm.
If anyone here can magically whisper in Tim Cook's ear, can you ask him to license macOS to AWS or Google Cloud or something?
:-(, Alex
python-committers mailing list python-committers@python.org https://mail.python.org/mailman/listinfo/python-committers Code of Conduct: https://www.python.org/psf/codeofconduct/
-- "I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero GPG Key fingerprint: D1B3 ADC0 E023 8CA6
On Sep 19, 2017, at 19:33, Alex Gaynor alex.gaynor@gmail.com wrote:
If you find a macOS CI platform with more capacity, please let me know :-)
Travis has been totally underwater of late, but I don't know of any alternatives; probably because operating a fleet of macOS builders is a giant pain. You need Apple hardware, and it turns out you can either purchase a trashcan or a mac mini, and neither of those is really designed for a server farm.
If anyone here can magically whisper in Tim Cook's ear, can you ask him to license macOS to AWS or Google Cloud or something?
Brett will love my musings: one could imagine a fleet of hackintoshes talking to a flexible CI runner infrastructure as is available on some alternative hosting platforms. Not that I, ahem, know anything about that.
Cheers, -Barry
On Sep 20, 2017 at 00:03, "Barry Warsaw" barry@python.org wrote:
I forget though, was it a problem with macOS CI stability or general throughput? I thought they just couldn’t keep up with the workload, in which case it seems like we should be able to throw more resources at it, right?
There were multiple issues.
It was not uncommon for the macOS task to take 30 minutes or 1 hour, if not longer.
The macOS job was not mandatory, so its failures were not reported to GitHub. Moreover, since the job was much slower than the other pre-commit CIs, I never looked at it.
Sometimes, the optional macOS job was queued but still blocked a PR from being merged. It's a Travis CI bug, and I don't want to investigate or report it, for the other reasons given above.
Python tests are very stable on macOS (on the buildbots). So yes, it's an issue specific to Travis.
Victor
On 20 September 2017 at 12:04, Victor Stinner victor.stinner@gmail.com wrote:
Python tests are very stable on macOS (on buildbots). So yes, it's an issue specific to Travis.
Although as Alex explains, that isn't really Travis CI's *fault* - it's an artifact of the licensing design for macOS being generally hostile to the "dynamic worker pool" model typically used for pre-merge CI infrastructure management.
macOS is much more amenable to the post-commit model we use for the buildbot fleet, since we're not trying to manage an elastic pool of machines there - we have a static set of machines that work through the merged commits.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
For the record: https://blog.travis-ci.com/2017-09-22-macos-update
Regards
Antoine.
On 01/09/2017 at 19:15, Victor Stinner wrote:
Ok. I closed https://bugs.python.org/issue31355
Victor
2017-09-23 22:58 GMT+02:00 Antoine Pitrou antoine@python.org:
For the record: https://blog.travis-ci.com/2017-09-22-macos-update
participants (6)
- Alex Gaynor
- Antoine Pitrou
- Barry Warsaw
- Brett Cannon
- Nick Coghlan
- Victor Stinner