
Hi there,

In my travails to get py-lmdb (github.com/dw/py-lmdb) into a state I can depend on, I've been exploring continuous integration options since mid last year, in the hope of finding the simplest option that allows the package to be tested on all sensible versions of Python that I might bump into in a commercial environment. This seemingly simple task currently yields one solution - Travis CI - except that by default Travis supports neither Windows nor any version of Python older than about 3 months, without serious manual configuration and maintenance.

The library just about has a working Travis CI setup at this point, although it is incredibly ugly. Testing against all recent versions of Python (by "recent" I mean "anything I've seen and might reasonably expect to bump into while consulting", i.e. since 2.5) requires around 14 Debian packages to be installed, followed by installation of compatible versions of easy_install and pip for each interesting Python version.

The result just about works, but this week I'm again finding that it doesn't quite go far enough, as I begin working on a Windows binary distribution for my package. Windows is almost the same problem all over again - testing supported/sensible extension module builds requires 14 binary installations of Python (one for each of 32 bit and 64 bit), 3 distinct versions of Microsoft Visual Studio (that I'm aware of so far), and a compatible version of Windows to run it all on. So the work done to support CI on Travis more or less isn't enough. It seems the best option now is a custom Jenkins setup, and acquiring licenses for all the Microsoft software necessary to build the package.

You may wonder why one might care about so many old Python versions in weird and exotic environments - to which I can't really give a better answer than that I've lost so much time working with packages that suffer from platform/version compatibility issues that I'd like to avoid it in my own code. While it isn't so true in web development, there are pretty huge installations of Python around the world, many of which aren't anywhere near migrating to 2.6, never mind 3.4. This can happen for different reasons, from maintaining private forks of Python (perhaps in a post-audited state, where another audit simply isn't in the budget), through to having thousands of customers who have written automation scripts for their applications using an older dialect. In any case it's enough to note that these installations do and will continue to exist, rather than to question why they exist.

So in the coming months I'll probably end up configuring Jenkins, porting the Travis CI scripts over to a private Ubuntu VM configured as a Jenkins runner, and setting up a new Windows VM as a second runner. But it seems entirely wasteful that so much effort should be made for just one project. And that leads to thinking about a Python-specific CI service, one that could support builds similar to Travis CI, except for all configurations of Python still in active use.

One aspect of such a service that is particularly interesting is the ability to centralize management of the required licenses - the PSF has a much greater reputation, and is much more likely to be granted a blanket license covering a CI service for all open source Python code, than each individual project is by applying individually.
Licensing aside, having a zero-cost option for any Python project to be tested would improve the quality of our ecosystem overall, especially in the middle of the largest transition to date - the 3.x series.

Following on from the ecosystem improvement angle, there might be no reason a Python-specific CI service could not automatically download, build and py.test every new package entering PyPI, producing as a side effect scoreboards similar to the Python 3 Wall of Superpowers (http://python3wos.appspot.com/). This would make a sound base for building future improvements for PyPI - for example a lint service, verifying "setup.py" is not writing to /etc/passwd, or that uploaded eggs aren't installing a new SVCHOST.EXE to %SystemDir%.

It could even make sense to have the hypothetical service produce signed, "golden" artefacts for upload to PyPI - binaries built from source in a trusted environment, free of any Bitcoin ransomware surreptitiously inserting itself into the bdist_egg ZIP file.

There are many problems with providing a CI service - for example, the handling of third party commercial libraries required during the build process. An example of this might be producing a build of cx_Oracle, where Oracle's proprietary SDK is a prerequisite. Still, it would not require many resources, or even very much code, to provide a service that could minimally cover at least basic needs for purely open source software, and so the idea appeals to me.

Any thoughts?

David
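To make the "test everything entering PyPI" idea above a little more concrete, here is a minimal sketch of how such a service might discover work, assuming PyPI's existing XML-RPC changelog() method; enqueue_build() is a hypothetical stand-in for whatever queue the farm would actually use:

    # Sketch only: poll PyPI for recently-changed packages and queue each
    # (name, version) for a test run.  Assumes the XML-RPC changelog(since)
    # method exposed by PyPI; enqueue_build() is hypothetical.
    import time
    import xmlrpclib  # xmlrpc.client on Python 3

    def watch_pypi(enqueue_build, poll_interval=300):
        client = xmlrpclib.ServerProxy('https://pypi.python.org/pypi')
        since = int(time.time())
        seen = set()
        while True:
            # changelog() yields (name, version, timestamp, action) tuples.
            for name, version, ts, action in client.changelog(since):
                since = max(since, ts)
                if version and (name, version) not in seen:
                    seen.add((name, version))
                    enqueue_build(name, version)
            time.sleep(poll_interval)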

On Apr 18, 2014, at 2:04 PM, David Wilson <dw+python-ideas@hmmz.org> wrote:
Travis comes with 2.6, 2.7, 3.2, and 3.3 as well as PyPy 2.2.1 out of the box, and I’ve got a PR merged there which is waiting on a deploy which will expand that out to also include 3.4, as well as making a command line available that will build 2.4, 2.5, Jython, and others (basically the latest versions of the above are installed, and you can use ``python-build`` to build anything located here: https://github.com/yyuu/pyenv/tree/master/plugins/python-build/share/python-...). And honestly, with the new PR I’ve laid out, adding any older versions to be supported out of the box in Travis is simple; the biggest barrier is likely convincing the Travis folks they are needed (for example, if you’re distributing on PyPI, the number of 2.5 or 3.1 users is *tiny*: https://caremad.io/blog/a-look-at-pypi-downloads/).
The result just about works, but this week I'm again finding that it doesn't quite go far enough, as I begin working on a Windows binary distribution for my package.
Windows is almost the same problem all over again - testing supported/sensible extension module builds requires 14 binary installations of Python (one for each of 32 bit and 64 bit), 3 distinct versions of Microsoft Visual Studio (that I'm aware of so far), and a compatible version of Windows to run it all on.
As far as I’m aware the Travis-CI folks are working on Windows support but there is currently no ETA on it as they have to figure out licensing concerns and the like.
So the work done to support CI on Travis more or less isn't enough. It seems the best option now is a custom Jenkins setup, and acquiring licenses for all the Microsoft software necessary to build the package.
One thing to look out for here is that you can’t easily test arbitrary PRs with this solution without whitelisting contributors, because Travis runs a PR test in an isolated VM instance, whereas Jenkins runs it on a shared machine, and that shared machine has, through the Jenkins API, more or less full control of the entire Jenkins cluster.
I’ve thought about working on something like this for PyPI; one of the big barriers there is that there is no central way to define what versions you support, what OSs, what required _external_ dependencies you have, or even what your test runner is (besides ``setup.py test``, which is far from standard or ubiquitous).
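For illustration, the closest thing a package can declare today is purely advisory: trove classifiers plus the loosely-specified ``setup.py test`` hook. A hypothetical example (package name and test suite invented), showing how little of the information above is machine-usable:

    # Hypothetical setup.py: classifiers describe supported versions and OSs,
    # but nothing here is enforced or standardised enough for a CI service
    # to rely on - which is the gap being described.
    from setuptools import setup

    setup(
        name='example-pkg',
        version='0.1',
        classifiers=[
            'Programming Language :: Python :: 2.6',
            'Programming Language :: Python :: 2.7',
            'Programming Language :: Python :: 3.3',
            'Operating System :: POSIX :: Linux',
            'Operating System :: Microsoft :: Windows',
        ],
        tests_require=['pytest'],
        test_suite='example_pkg.tests',
    )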
It could even make sense to have the hypothetical service produce signed, "golden" artefacts for upload to PyPI - binaries built from source in a trusted environment, free of any Bitcoin ransom-ware surreptitiously inserting itself into the bdist_egg ZIP file.
A build farm for Wheels is something else that I’d love to do.
There are many problems with providing a CI service, for example, the handling of third party commercial libraries required during the build process. An example of this might be producing a build of cx_Oracle, where Oracle's proprietary SDK is a prerequisite.
Still, it would not require many resources, or even very much code, to provide a service that could minimally cover at least basic needs for purely open source software, and so the idea appeals to me.
I think you’re vastly underestimating the complexity in machine resources, code, and time.

For security purposes you are more or less required to use throwaway VMs which get destroyed after a single use. The sanest way of doing this is to manage a pool of VM workers, pull one off the pool, run tasks on it, and then destroy it and let the pool spawn a new one. This puts a bottleneck on how many tests you can run, both in how large your pool is and in how fast your VMs boot. Look at a project like https://travis-ci.org/pyca/cryptography which builds 40 different configurations for each “build”, each taking about 8 minutes. So if you have a pool of 5 VMs, you can run 5 tests at a time, which splits that 40 into 5 chunks of 8 builds each, and you get roughly 64 minutes to process 40 builds; plus we’ll say our machines boot in roughly 60s (a little fast for cloud servers, but not unreasonable), so there’s an additional 4-5 minutes just in booting. So roughly an hour and 10 minutes for a single test run with 5 VMs, for just a single project. (In reality Cryptography has more than 40 builds, because it uses Travis and Jenkins together to handle things Travis doesn’t.)

So the machine cost is high. You’re probably looking at, let’s just say, a worker pool of 20 (though I think to actually replace Travis CI it’d need to be much higher) of roughly 4GB machines (last I recall Travis machines were 3 gig and change), which comes out to roughly 1k to 2k in server costs just for that pool. Add in whatever support machines you would need (queues, web nodes, database servers, etc.) and you’re probably looking in the 2-4k range just for the servers once all was said and done.

I believe the code cost would also be fairly high. There isn’t really an off the shelf solution that is going to work for this. You’ve mentioned Jenkins, however Jenkins does not scale well at all (look at the OpenStack infra for all the things they’ve invented around Jenkins to try and scale it). There is a good chance some things could be reused from OpenStack, but I’m fairly sure some of it is specific enough to OpenStack that it’d still require a decent amount of coding to make it generic enough to work.

I also think the time investment would be high, outside of the effort to build it. I hang out in the #openstack-infra channel and I can see how much effort goes into maintaining that infra (and I’m sure it’s similar for travis-ci). This means it’s likely not going to be something we can just set up and forget about; it will require an active infrastructure team to handle it. Now Python happens to have one of those, but we’re mostly volunteers with part time for on-call stuff (who also volunteer for other stuff too!) and this would be a significant increase, I believe, in that workload.
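A quick sanity check of that arithmetic; the inputs are just the figures quoted above, nothing newly measured:

    # Back-of-the-envelope check of the worker-pool maths above.
    import math

    configs      = 40      # build configurations per "build"
    minutes_each = 8       # per-configuration runtime
    pool_size    = 5       # concurrent throwaway VMs
    boot_minutes = 1       # optimistic cloud VM boot time

    waves = int(math.ceil(configs / float(pool_size)))   # 8 waves of 5 VMs
    wall  = waves * (minutes_each + boot_minutes)        # boot cost paid per wave
    print("%d waves, roughly %d minutes wall-clock" % (waves, wall))
    # -> 8 waves, ~72 minutes: a bit over an hour, as described.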
It’s my personal opinion that a “test during development” CI system like Travis is not something that is very useful to make Python-specific in the long run. Travis is open source, and they are incredibly open to things that will make Python a more first-class citizen there (disclaimer: I’m friends with them, and I helped add the Python support to start with and then improve it over time). One obvious flaw in this is that Travis only supports GitHub, while there are others on Bitbucket or even their own hosting, and for those people this idea might be much more attractive.

I do think there is maybe room for a “release testing” system, and I definitely think there is room for a build farm (which is a much smaller scale problem, since the whole of PyPI pushes releases far less often than people push commits to branches or make PRs or what have you).

Just my 2 cents!

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 18 April 2014 15:07, Donald Stufft <donald@stufft.io> wrote:
And it has the added downside of only working with GitHub (alas, the demise of Shining Panda CI rather narrowed the field of non-GitHub-reliant CI options).

Anyway, building out CPython's continuous integration infrastructure is indeed on my todo list, but that will be a long term project, and will focus on CPython first (the new core-workflow list will be the home of any discussions about that, although it may bleed over into the infrastructure SIG at times). Doing CI well is a genuinely hard problem, and that applies even when you trust the software being tested. Testing untrusted software ups the difficulty considerably.

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi Donald! Thanks for replying,
Travis can work well, but the effort involved in maintaining it has, at least for me, been unjustifiably high: in total I've spent close to a man-week in the past 6 months trying various configurations and fixing things up once a month, before resorting to what I have now. In all, I have probably spent 10x the time maintaining Travis as I have actually writing comprehensive tests. It cannot reasonably be expected that project maintainers should pay similarly, or ideally even need to be aware their package is undergoing testing someplace.

My biggest gripe with Travis, though, is that they can and do remove things, breaking what for me seems trivial functionality. This is simply the nature of a one-size-fits-all service. From what I gather, they previously removed old Python releases from their base image to reduce its size.

Windows licensing is hard, but given the scope of either Travis or the PSF, I'd be surprised if there wasn't some person or group at Microsoft to solve it for us, especially considering the new climate under Nadella. I know at least that they offer free Azure time to open source projects, even something like this could be used.
They are curious numbers, perhaps indicative of over-contention on extra-wimpy cloud VMs. On uncontended spinning rust with 2GB RAM, my measly Core 2 Duo gets:

  Clone: 4.7s
  dev_requirements.txt + test vectors + pip install -e: 24.6s
  py.test: 2m10s

For a package on the opposite end of the spectrum, in terms of test comprehensiveness and size, py-lmdb:

  Clone: 0.99s
  pip install -e: 2.646s
  py.test: 0.996s
It's easy to beat 60 seconds for a farm: resuming a pre-booted 2GB VM from a cached snapshot with Qemu/KVM takes between 1.5s (465MB dirty) and 3s (2GB dirty), again on a wimpy Core 2 Duo. Actually it's surprising the numbers are so high; Qemu's I/O code seems not the most efficient. The dirty numbers are interesting because tooling can be warmed into the VM page cache prior to snapshot. Since Python has had solid side-by-side support for over 10 years, only one per-OS base image need exist, with all versions installed and cache-warm prior to test (about 1GB total if you include standard libraries). Just to be clear, this comes entirely free as part of the cost of resuming a snapshot. Further assuming local mirrors of PyPI and source repositories, it's easy to see how a specialized farm can vastly improve efficiency compared to a general solution.
40*2m40s works out to about 1h45m of wimpy CPU time for a complete cycle, but this assumes the results of all 40 workers are needed to gauge project health. In reality there are perhaps 3-6 configurations needing priority for prompt feedback (say CPyMac, CPyLinux, CPyWin, PyPyMac, PyPyLinux, PyPyWin). Ideally py-lmdb has about 32 configurations to test, including Windows. Using the earlier numbers that's 32*7.5s, or about 4 minutes of wimpy CPU time, though 45 seconds is sufficient to test the main configurations. It doesn't matter if Python 2.5 breaks and it's not obvious for 24 hours, since in this case releases are only made at most once per month.

Assuming the average PyPI package lies somewhere between Cryptography and py-lmdb, and there are about 42k packages, roughly averaging this out gives (7.5s*32 + 2m40s*40)/(32+40) = 1m32s, or about 1088 wimpy CPU-hours to completely run one configuration of each package in PyPI. Now instead of wimpy CPUs we have an 8-core Xeon; pessimistically assuming the Xeon is only 20% faster per core, that gives 108.88 beefy 8-core-machine hours to rebuild PyPI once. Assuming 3% of packages are updated in a day (which seems high), that's about 3h15m of 8-core machine time to test each updated package from a fresh checkout in a fresh VM against a single Python version - which, as we've seen, is the worst case behaviour, since side-by-side versions are possible.

In summary, one 8-core machine might suffice (allowing for napkin math) to retest 3% of PyPI in one config at least 7 times a day, or, assuming each package has 6 primary configurations, to run one "all important configs" cycle once per day.
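For the curious, the same napkin math as a few lines of Python; every input is a guess carried over from the paragraph above, not a measurement:

    # The napkin math above, spelled out.
    lmdb_run, lmdb_cfgs     = 7.5, 32      # seconds per config, config count
    crypto_run, crypto_cfgs = 160.0, 40    # 2m40s per config, config count
    packages     = 42000
    cores        = 8
    xeon_speedup = 1.2                     # "only 20% faster" per core
    daily_churn  = 0.03                    # assume 3% of PyPI updated per day

    avg_run     = (lmdb_run * lmdb_cfgs + crypto_run * crypto_cfgs) / (lmdb_cfgs + crypto_cfgs)
    wimpy_hours = packages * avg_run / 3600.0
    xeon_hours  = wimpy_hours / (cores * xeon_speedup)
    print(avg_run, wimpy_hours, xeon_hours, xeon_hours * daily_churn)
    # -> ~92s average, ~1076 wimpy CPU-hours, ~112 hours on the Xeon,
    #    ~3.4 hours/day for the 3% churn - close to the rounded figures above.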
If one build per 4 hours sufficed for most projects, $1k/month seems like a good cap: a VM with comparable specs to the above scenario, GCE's n1-standard-8, costs around $275/month to run 24/7 - and that assumes the project couldn't find a sponsor for multiple such machines, which I suspect would be quite easy. The above estimates are a little optimistic: in addition to 2-4GB of guest RAM per core, the host would need at least 8-32GB more to keep hot parts of the base image filesystems cached in order to achieve the time estimates. However, I've never seen any Python extension needing 4GB to build. Regarding supplementary services, a farm produces logs and perhaps assets, for which a static file bucket suffices, and consumes jobs from a queue, which wouldn't require more beef than an SQLite database - and I'd be impressed if 40 jobs * 45k packages would fill 2GB.
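As a rough illustration of how little that queue needs to be, here is a minimal SQLite-backed sketch; the table layout and function names are invented for the example:

    # Minimal SQLite job queue sketch - nothing here is prescriptive.
    import sqlite3

    db = sqlite3.connect('queue.db')
    db.execute("""CREATE TABLE IF NOT EXISTS jobs (
                      id INTEGER PRIMARY KEY,
                      package TEXT, version TEXT, config TEXT,
                      state TEXT NOT NULL DEFAULT 'pending')""")

    def enqueue(package, version, config):
        with db:
            db.execute("INSERT INTO jobs (package, version, config) VALUES (?, ?, ?)",
                       (package, version, config))

    def claim_next():
        # Pop the oldest pending job and mark it running (a single dispatcher
        # process is assumed, so no fancy locking is needed).
        with db:
            row = db.execute("SELECT id, package, version, config FROM jobs "
                             "WHERE state = 'pending' ORDER BY id LIMIT 1").fetchone()
            if row:
                db.execute("UPDATE jobs SET state = 'running' WHERE id = ?", (row[0],))
            return row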
It's easy to overdesign and overspec these things, but the code and infrastructure involved is fairly minimal, especially to produce something basic that just ensures, e.g., recently published or manually-triggered-via-webhook packages get queued and tested. The most complex aspect is likely maintaining reproducible base images, versioning them and preventing breakage for users during updates, but that is almost a planning problem rather than an implementation problem. Having said that, I can imagine a useful versioning policy as simple as two paragraphs, and an image/snapshot builder as simple as two shell scripts. Integrating complex "best practices" off-the-shelf components is an example of where simple projects often explode. Qemu is literally a self-contained command-line tool, no more complex to manage the execution of than, say, wget. With a suitably prepped snapshot, all that's required is to run qemu, allow it 10 minutes to complete, and read the job results via a redirected port.
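To give a feel for how small that worker loop could be, a sketch in Python follows. The image path, snapshot name and guest-side result port are invented; a real setup would bake a test runner into the snapshot so the guest starts executing its queued job as soon as it resumes:

    # Sketch of the "just run qemu and read a port" worker described above.
    import socket
    import subprocess
    import time

    def run_job(job_id, timeout=600):
        vm = subprocess.Popen([
            'qemu-system-x86_64', '-enable-kvm', '-m', '2048',
            '-drive', 'file=base-image.qcow2,if=virtio',
            '-loadvm', 'warm',   # resume the pre-booted, cache-warm snapshot
            '-net', 'nic',
            '-net', 'user,hostfwd=tcp:127.0.0.1:9000-:9000',
        ])
        try:
            deadline = time.time() + timeout
            while time.time() < deadline:
                try:
                    # The guest is assumed to publish its result log on port
                    # 9000 once the test run has finished.
                    sock = socket.create_connection(('127.0.0.1', 9000), timeout=5)
                    return sock.makefile().read()
                except socket.error:
                    time.sleep(5)
            raise RuntimeError('job %r timed out' % job_id)
        finally:
            vm.kill()  # a real farm would also discard the guest's disk writes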
That's very true, though looking at the lack of a straightforward Python solution at the ecosystem level, it also seems quite a small cost.
It's not clear what scope would work best for a Python-specific system. For my needs, I'm only interested in removing the effort required to ensure my packages get good, reasonably timely coverage, without worrying about things like Windows licenses, or my build exploding monthly due to base image changes sacrificing Python-related functionality for something else. The size and complexity of the service could creep up massively, especially if attempting to compete with e.g. Travis' 5-minutes-or-less turnaround time, but at least for me those fast turnaround times aren't particularly useful; I just need something low effort that works.

Regarding the value of a Python-specific system, almost identical infrastructure could be used for your wheel farm, or for running any kind of "ecosystem lint", like detecting setup.py /etc/passwd writes and suchlike. Granted, though, these could also be solved independently.

Finally it's not obvious to me that this is absolutely a good idea, however I'm willing to argue for it for as long as I'm procrastinating on setting up my own Jenkins install :-P

David

On 19 Apr 2014 18:52, "David Wilson" <dw+python-ideas@hmmz.org> wrote:
Finally it's not obvious to me that this is absolutely a good idea,
however I'm willing to argue for it for as long as I'm procrastinating on setting up my own Jenkins install :-P I actually think it's a good idea, I'd just like to approach the idea indirectly by building out CPython's test infrastructure further first, before we try to boil the ocean that is PyPI :) Cheers, Nick.

On Apr 19, 2014, at 7:56 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
As you can probably guess from my posts, I don’t think trying to replace Travis-CI is a good idea. What they do is *a lot* of work and maintenance and takes a lot of capacity, even if we restrict it to just Python projects. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Apr 19, 2014, at 6:51 PM, David Wilson <dw+python-ideas@hmmz.org> wrote:
I’ve not had anywhere near that kind of experience trying to get anything set up on Travis. The one exception has been when I’ve attempted to use their OSX builders, but a big portion of that is because those builders do not have Python configured by default. That’s not to say you didn’t have that experience, but generally the worst I’ve had to do is install a Python other than what they’ve provided, which has been trivially easy to do as well (see: https://github.com/pypa/pip/blob/develop/.travis.yml#L18-L23). Looking over the history of the pip .travis.yml, the only times it really changed were to add versions or because we were trying out some new thing (like using pytest-xdist).

For the record, they’ve never supported 2.4, and I recommended they drop 2.5 and 3.1 because the incoming traffic on PyPI doesn’t really justify keeping them in the base image - a larger base image takes longer to boot, which slows down the entire build queue. For projects that want to keep them, it’s trivial to add them back in a way similar to how pip has added 3.4 support.
Windows licensing is hard, but given the scope of either Travis or the PSF, I'd be surprised if there wasn't some person or group at Microsoft to solve it for us, especially considering the new climate under Nadella. I know at least that they offer free Azure time to open source projects, even something like this could be used.
Sure, licensing is a solvable problem, but it’s still a problem :) There are other problems too of course, such as handling fundamental differences in how the two platforms interact (easy to SSH into a *nix box, not so much a Windows box).
I believe that Travis is using OpenVM VMs which may or may not be on the greatest hardware. A different VM provider would probably give better performance.
All of those things require even more ongoing support since we’d have to support the host machines as well and not just reuse some providers VM/cloud images.
Further assuming local mirrors of PyPI and source repositories, it's easy to see how a specialized farm can vastly improve efficiency compared to a general solution.
As far as I’m aware Travis has been willing to setup a PyPI mirror, it’s mostly been nobody has helped them set one up :)
Going back to cryptography: those 40 test runs are *just* the ones that can run on Travis. There are an additional 54 runs for Windows, OSX, FreeBSD, and other Linux installs. The cryptography project runs roughly 15 of those builds almost every day, and goes upwards of 30 builds a day on others. So that’s 1410-2820 total runs, and even with a 2m40s run you’re looking at that single project taking 60-125 hours of CPU build time a day. Now this project is a fairly intensive one, but it’s useful to look at worst case uses in order to determine scaling.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Donald Stufft, 20.04.2014 03:34:
I'm not an overly happy Travis user myself, but I have to concur that the maintenance hasn't been a burden at all for me, in any of the three Python projects that I let them test. I'm also not sure a Python-specific testing service would end up being successful. It's a lot of work to set up (snakebite?), and then to maintain and keep running, keep safe and reliable, etc. Anyone who considers giving it a try should assume it will become a full time job for at least a couple of months, and only then take the decision. Stefan

[ munch - great conversation! ] On Sat, Apr 19, 2014 at 11:51:10PM +0100, David Wilson wrote:
We've been using Jenkins connected to GitHub on our khmer project. So far it's been rather fragile, so Michael Crusoe (developer guy) is thinking about things like Travis. hmm. :)
I am happy to broker introductions at MS if someone is committed to doing the work :) Note also that Mozilla and Numpy folk might have useful thoughts in this direction -- Mozilla does some insane number of builds per release (due to all the cell phones they need to support!) and the various Numpy folk mentioned to me that they were talking about a build farm.
I and others have tried various angles on this in the past and indeed have fallen down in precisely this way... maintenance is just a bear. cheers, --titus -- C. Titus Brown, ctb@msu.edu

On 20 Apr 2014 00:02, "C. Titus Brown" <ctb@msu.edu> wrote:
Steve Dower (from MS) and the other PTVS folks have been doing a fair bit of work around helping out with CPython, and Steve has been looking at ways to simplify the CPython installer build process. If you have additional points of contact within MS, that could be a good thing :)
Yeah, this is part of why a key goal for me this year is getting some *paid* developer time going into the CPython workflow tools, as well as more exploration of cases where PaaS usage may lower maintenance burdens for some of our hosted services. If we can get both the CPython workflow tools and PyPI into a good state, *then* I think it may make sense to look at more ambitious things. Cheers, Nick.
