Docker, development, buildout, virtualenv, local/global install
Hi,

Buzzword bingo in the subject...

Situation: I'm experimenting with docker, mostly in combination with buildout. But it also applies to pip/virtualenv.

I build a docker image with a Dockerfile: install some .deb packages, add the current directory as /code/, run buildout (or pip), ready. Works fine.

Now local development: it is normal to mount the current directory as /code/, so that it is now overlaid over the /code/ that was originally added to the image.

This means that anything done inside /code/ is effectively discarded in development. So a "bin/buildout" run has to be done again, because the bin/, parts/, eggs/ etc. directories are gone.

Same problem with a virtualenv. *Not*, though, when you run pip directly and let it install packages globally! Those are installed outside of /code, in /usr/local/somewhere.

A comment and a question:

- Comment: "everybody" uses virtualenv, but be aware that it is apparently normal *not* to use virtualenv when building docker images.

- Question: buildout, like virtualenv+pip, installs everything in the current directory. Would an option to install it globally instead make sense? I don't know if it is possible.

Reinout

--
Reinout van Rees http://reinout.vanrees.org/ reinout@vanrees.org
http://www.nelen-schuurmans.nl/
"Learning history by destroying artifacts is a time-honored atrocity"
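[The setup described above could be sketched roughly like this; base image, package names, and commands are illustrative, not taken from the original message:]

```dockerfile
FROM ubuntu:16.04

# System-level dependencies (illustrative package names).
RUN apt-get update && apt-get install -y python2.7 python-dev

# Bake the project into the image and run the install step there.
ADD . /code/
WORKDIR /code
RUN python bootstrap.py && bin/buildout
```

Running something like `docker run -v $(pwd):/code myimage` for development then shadows the baked-in /code/, including the bin/, parts/, and eggs/ directories that buildout created at image-build time.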
On Wed, Jun 15, 2016 at 12:07 PM, Reinout van Rees
Now local development: it is normal to mount the current directory as /code/, so that now is overlayed over the originally-added-to-the-docker /code/.
This means that anything done inside /code/ is effectively discarded in development. So a "bin/buildout" run has to be done again, because the bin/, parts/, eggs/ etc directories are gone.
Think of it like this: if you don't mount ./ as /code in the container, then you practically have to rebuild the image all the time. This is fine for toy projects, but bigger ones have lots of files (eg: templates, images, css, tons of 1-line js modules etc). That means you get a big context that is slow to send (exacerbated in situations with remote docker daemons, like the osx/windows docker clients).

Another compounding problem: if you rebuild the image for every code change, the new context will invalidate the docker image cache - everything will be slow, all the time. Ideally you'd have a stack of images: a base image that takes the big context (with code/assets), plus smaller images that build from that but have very small contexts. I wrote some ideas about that here https://blog.ionelmc.ro/2016/05/07/dealing-with-docker/#optimizing-the-build... if you have the patience (it's a bit of a long read).

Basically what I'm saying is that you've got no choice but to mount stuff in a development scenario :-)

Regarding the buildout problem, I suspect a src-layout can solve it:

- you only mount ./src inside the container
- you can happily run buildout inside the container (assuming it doesn't touch any of the code in src)

I suspect you don't need to run buildout for every code change, so it's fine to stick that into the image building.

Thanks,
-- Ionel Cristian Mărieș, http://blog.ionelmc.ro
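[The src-layout idea above might look like this in a docker-compose file; the service and image names are invented for illustration:]

```yaml
# docker-compose.yml (hypothetical service/image names)
version: "2"
services:
  app:
    image: myproject
    volumes:
      # Mount only the source packages; the bin/, parts/ and eggs/
      # directories buildout created inside the image stay intact.
      - ./src:/code/src
```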
Op 15-06-16 om 11:31 schreef Ionel Cristian Mărieș via Distutils-SIG:
... I wrote some ideas about that here https://blog.ionelmc.ro/2016/05/07/dealing-with-docker/#optimizing-the-build... if you have patience (it's a bit too long read).
I definitely have the patience for such a long read. Reading it definitely also made sure I won't ever see Docker as a potential magic bullet :-)
Basically what I'm saying is that you got no choice but to mount stuff in a development scenario :-)
Regarding the buildout problem, I suspect a src-layout can solve the problem:
* you only mount ./src inside container * you can happily run buildout inside the container (assuming it don't touch any of the code in src)
src/* for the mr.developer checkouts and my_package_dir/ for the actual code. And take care when you're downloading/developing javascript stuff, too, as that might also be in another directory. Ok... I'll have to watch out.

Good to have a potential working solution, though.

Reinout
On Wed, Jun 15, 2016 at 5:07 AM, Reinout van Rees
So, a number of comments. I'm not going to address your points directly.

1. Creating production docker images works best when all you're doing is installing a bunch of binary artifacts (e.g. .debs, eggs, wheels). If you actually build programs as part of image building, then your image contains build tools, leading to image bloat and potentially security problems, as the development tools provide a greater attack surface. You can uninstall the development tools after building, but that does nothing to reduce the bloat -- it just increases it, and increases build time.

2. #1 is a shame, as it dilutes the simplicity of using docker. :(

3. IMO, you're better off separating development and image-building workflows, with development feeding image building. Unfortunately, AFAIK, there aren't standard docker workflows for this, although I haven't been paying close attention to this lately. There are standard docker images for building system packages (debs, rpms), but, for myself, part of the benefit of docker is not having to build system packages anymore. (Building system packages, especially applications, for use in docker could be a lot easier than normal, as you could be a lot sloppier about the way packages are built, and thus have simpler packaging files...)

4. I've tried to come up with good buildout-based workflows for building docker images in the past. One approach was to build in a dev environment and then, when building the production images, use eggs built in development or on a CI server. At the time my attempts were stymied by the fact that setuptools prefers source distributions over eggs, so if it saw the download cache, it would try to build from source. Looking back, I'm not sure why this wasn't surmountable -- probably just lack of priority and attention at the time.

5. As far as setting up a build environment goes, just mounting a host volume is simple, works well, and even works with docker-machine. It's a little icky to have linux build artifacts in my Mac directories, but it's a minor annoyance. (I've also created images with ssh servers that I could log into and do development in. Emacs makes this easy for me and provides all the development capabilities I normally use, so this setup works well for me, but probably doesn't work for non-emacs users. :) )

Jim

--
Jim Fulton http://jimfulton.info
On Jun 15, 2016, at 7:53 AM, Jim Fulton
wrote: If you actually build programs as part of image building, then your image contains build tools, leading to image bloat and potentially security problems as the development tools provide a greater attack surface.
This isn't strictly true; the layering in Docker works on a per-RUN-command basis. So if you compose a single command that installs the build tools, builds the thing, installs the thing, and uninstalls the build tools (and cleans up any cache), then that's roughly equivalent to installing a single binary (except, of course, in the time it takes).

— Donald Stufft
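[The single-RUN trick could be sketched like this; the base image and package names are illustrative, and it assumes pip is already available in the base image:]

```dockerfile
FROM some-base-with-python-and-pip

# One layer: install build tools, build/install the package, then
# remove the tools, so they never persist in the final image's layers.
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc python-dev \
    && pip install psycopg2 \
    && apt-get purge -y gcc python-dev \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/*
```

Splitting those steps across several RUN commands would defeat the purpose: the layer that installed gcc would remain part of the image even after a later layer removed it.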
On Wed, Jun 15, 2016 at 7:57 AM, Donald Stufft
This isn't strictly true; the layering in Docker works on a per-RUN-command basis. So if you compose a single command that installs the build tools, builds the thing, installs the thing, and uninstalls the build tools (and cleans up any cache), then that's roughly equivalent to installing a single binary (except, of course, in the time it takes).
OK, fair enough. People would typically start from an image that had the build tools installed already. But as you point out, you could have a single step that installed the build tools, built, and then uninstalled the build tools. You'd avoid the bloat, but have extremely long build times.

Jim
Op 15-06-16 om 13:53 schreef Jim Fulton:
1. Creating production docker images works best when all you're doing is installing a bunch of binary artifacts (e.g. .debs, eggs, wheels).
That's where pip, and its "everything is in requirements.txt anyway" approach, has an advantage. The buildouts I use always have most/all dependencies mentioned in the setup.py, with version pins in the buildout config. So I need the setup.py, and thus my full source code, as the setup.py won't work otherwise.

I *do* like to specify all "install_requires" items in my setup.py, so this is something I need to think about.
3. IMO, you're better off separating development and image-building workflows, with development feeding image building. Unfortunately, AFAIK, there aren't standard docker workflows for this, although I haven't been paying close attention to this lately.
docker-compose helps a little bit here, in that you can "compose" your docker setup into several scenarios. It *does* mean you have to watch carefully what you're doing, as one of the goals of docker, as far as I see it, is to have a large measure of "production/development parity", as per the "12 factor app" manifesto (http://12factor.net/).
5. As far as setting up a build environment goes, just mounting a host volume is simple and works well and even works with docker-machine. It's a little icky to have linux build artifacts in my Mac directories, but it's a minor annoyance. (I've also created images with ssh servers that I could log into and do development in. Emacs makes this easy for me and provides all the development capabilities I normally use, so this setup works well for me, but probably doesn't work for non-emacs users. :) )
I, like any deranged (and effective) human being, have memorized lots of emacs shortcuts and will use it till the day I die. Sadly I have colleagues that are a little bit more sane :-)

Reinout
On Wed, Jun 15, 2016 at 5:39 PM, Reinout van Rees
That's where pip, and its "everything is in requirements.txt anyway" approach, has an advantage. The buildouts I use always have most/all dependencies mentioned in the setup.py, with version pins in the buildout config. So I need the setup.py, and thus my full source code, as the setup.py won't work otherwise.
No. If you had a built version of your project, the requirement information in the setup.py would still be available and used by buildout. How you specify requirements is beside the point. Whether you use buildout or pip, you need built versions of your non-pure-Python dependencies to be available and used.
I *do* like to specify all "install_requires" items in my setup.py, so this is something I need to think about.
I don't see how this is a problem. If your project is pure Python, you don't need build tools to build it, so building it as part of building a docker image isn't a problem. If it isn't pure Python, then you'll need to have a built version available, and the metadata in that built version will provide the necessary requirements information.

I'm not trying here to plug buildout vs pip. I'm just saying that the buildout vs pip (or both) decision is irrelevant to building a docker image or a docker workflow.
3. IMO, you're better off separating development and image-building workflows, with development feeding image building. Unfortunately, AFAIK, there aren't standard docker workflows for this, although I haven't been paying close attention to this lately.
docker-compose helps a little bit here in that you can "compose" your docker into several scenarios. It *does* mean you have to watch carefully what you're doing, as one of the goals of docker, as far as I see it, is to have a large measure of "production/development parity" as per the "12 factor app" manifesto (http://12factor.net/)
I think this only helps to the extent that it encourages you to decompose applications into lots of separately deployed bits.

I'm a fan of docker, but it seems to me that the build workflow is an unsolved problem if you need build tools that you don't want included at run time.

Jim
On 16 June 2016 at 05:01, Jim Fulton
I'm a fan of docker, but it seems to me that the build workflow is an unsolved problem if you need build tools that you don't want included at run time.
For OpenShift's source-to-image (which combines builder images with a git repo to create your deployment image [1]), they tackled that problem by setting up a multi-step build pipeline: the first builder image creates the binary artifact (they use a WAR file in their example, but it can be an arbitrary tarball, since you're not doing anything with it except handing it to the next stage of the pipeline), and then a second builder image provides the required runtime environment and unpacks the tarball into the right place.

For pure Python (Ruby, JavaScript, etc.) projects you'll often be able to get away without the separate builder image, but I suspect that kind of multi-step model would be a better fit for the way buildout works.

Cheers, Nick.

[1] https://github.com/openshift/source-to-image

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
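[The second stage of that multi-step model might look like the following hypothetical runtime Dockerfile; the artifact name, base image, and entry point are all invented for illustration:]

```dockerfile
# Hypothetical Dockerfile for the runtime image. app.tar.gz is
# assumed to have been produced by a separate builder image in an
# earlier pipeline step; no build tools are installed here.
FROM python:2.7-slim

# ADD auto-extracts local tarballs into the target directory.
ADD app.tar.gz /app/

WORKDIR /app
CMD ["bin/myapp"]
```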
On Thu, Jun 16, 2016 at 8:47 PM, Nick Coghlan
For OpenShift's source-to-image (which combines builder images with a git repo to create your deployment image [1]), they tackled that problem by setting up a multi-step build pipeline where the first builder image created the binary artifact (they use a WAR file in their example, but it can be an arbitrary tarball since you're not doing anything with it except handing it to the next stage of the pipeline), and then a second builder image that provides the required runtime environment and also unpacks the tarball into the right place.
For pure Python (Ruby, JavaScript, etc) projects you'll often be able to get away without the separate builder image, but I suspect that kind of multi-step model would be a better fit for the way buildout works.
Or pip. I don't think this is really about buildout or pip. I agree that a multi-step process is needed if builds are required. I suggested that in my original response.

When I said this was an unsolved problem, I didn't mean that there weren't solutions, but that there didn't seem to be widely accepted workflows (or at least docs) for this within the docker community. The common assumption is that images are defined by a single build file, to the point that images are often "published" as git repositories with build files, rather than through a docker repository. <shrug>

Anyway, getting back to Reinout's original question, I don't think docker build should be used for development. The build should assemble (again, using buildout, pip, or whatever) existing resources without actually building anything. (Of course, docker can be used for development, and docker build could and probably should be used to create build environments.)

Jim
For a lot of good general information on these subjects, I recommend
Glyph's talk at pycon this year:
https://www.youtube.com/watch?v=5BqAeN-F9Qs
One point that's discussed is why you definitely should use virtualenv
inside your containers :-)
-n
Op 15-06-16 om 19:38 schreef Nathaniel Smith:
For a lot of good general information on these subjects, I recommend Glyph's talk at pycon this year: https://www.youtube.com/watch?v=5BqAeN-F9Qs
That's the most valuable youtube video I've seen this year!
One point that's discussed is why you definitely should use virtualenv inside your containers :-)
It is a very valid point, and one that seems to point solidly at wheels. Because if you use a virtualenv, you don't have access to system python packages. And that's one of the main reasons why the company I work for (still) uses buildout: buildout has the "syseggrecipe" package that makes it easy to use specific system packages. So... numpy, psycopg2, matplotlib, mapnik, scipy, hard-to-install packages like that.

So once you use a virtualenv, you're either in a hell of compiling all those hard-to-compile packages from scratch, or you're using binary wheels.

Once you've chucked everything into a wheel, there's suddenly a bit less of a necessity for docker. At least, that might be the case. On the other hand, I could imagine having a base 16.04 docker image that we build for, instead of a manylinux1 machine that we build for.

Aaargh, that Glyph talk gave me a *lot* of food for thought. And that's a good thing.

Reinout
On Wed, Jun 15, 2016 at 3:52 PM, Reinout van Rees
It is a very valid point that seems to point solidly at wheels. As, if you use a virtualenv, you don't have access to system python packages. And that's one of the main reasons why the company I work for (still) uses buildout. Buildout has the "syseggrecipe" package that makes it easy to use specific system packages. So.... numpy, psycopg2, matplotlib, mapnik, scipy, hard-to-install packages like that.
So once you use a virtualenv, you're either in a hell because you're compiling all those hard-to-compile packages from scratch, or you're using binary wheels.
Of your list above, I know numpy and scipy now ship manylinux1 wheels, so you don't need to compile them. matplotlib has had manylinux1 wheels posted to their mailing list for testing for a few weeks now -- I suspect they'll upload them for real any day now. Maybe you should nag the psycopg2 and mapnik developers to get with the program (or offer to help) ;-).
Once you've chucked everything into a wheel, there's suddenly a bit less of a necessity for docker. At least, that might be the case.
On the other hand, I could imagine having a base 16.04 docker that we build to instead of a manylinux1 machine that we build to.
Using a manylinux1 image for building is necessary if you want to distribute wheels on PyPI, but if you're just building wheels to deploy to your own controlled environment then it's unnecessary. You can install manylinux1 wheels into a 16.04 docker together with your home-built 16.04 wheels. -n -- Nathaniel J. Smith -- https://vorpus.org
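[A sketch of that deploy-to-your-own-environment approach; the commands are illustrative (package names, paths, and the assumption that the packages build cleanly with pip wheel are all for illustration):]

```shell
# Build wheels for the tricky packages inside a 16.04 container,
# collecting them on the host via a mounted wheelhouse directory.
docker run --rm -v "$(pwd)/wheelhouse":/wheelhouse ubuntu:16.04 \
    sh -c 'apt-get update \
           && apt-get install -y python-pip build-essential python-dev \
           && pip wheel --wheel-dir /wheelhouse psycopg2'

# In the deployment image, install the home-built 16.04 wheels
# alongside manylinux1 wheels pulled straight from PyPI.
pip install --find-links wheelhouse/ numpy scipy psycopg2
```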
On Thu, Jun 16, 2016 at 1:52 AM, Reinout van Rees
Aaargh, that Gliph talk gave me a *lot* of food for thought. And that's a good thing.
His main argument is that not using a virtualenv can lead to version conflicts. Eg: an app you installed with apt will probably break if it depends on packages installed with pip. Not to mention the metadata and uninstall mess that comes with that.

A sound argument. However, none of that applies to a Docker image: you'd have a single set of dependencies, so what's the point of using unnecessary tools and abstractions? :-)

Thanks,
On Jun 15, 2016, at 7:54 PM, Ionel Cristian Mărieș via Distutils-SIG
wrote: A sound argument. However, none of that applies to a Docker image. You'd have a single set of dependencies, so what's the point of using unnecessary tools and abstractions. :-)
Of course it still applies to Docker. You still have an operating system inside that container, and unless you install zero Python-using packages from the system, all of that can still conflict with your own application's dependencies.
On Thu, Jun 16, 2016 at 2:57 AM, Donald Stufft
Of course It still applies to Docker. You still have an operating system inside that container and unless you install zero Python using packages from the system then all of that can still conflict with your own application’s dependencies.
You're correct, theoretically. But in reality it is best not to stick a dozen services or apps in a single docker image. What's the point of using docker if you have a single container with everything in it? Eg: something smells if you need to run a supervisor inside the container.

If we're talking about python packages managed by the os package manager... there are very few situations where it makes sense to use them (eg: pysvn is especially hard to build), but other than that? Plus it's easier to cache some wheels than to cache some apt packages when building images. Way easier.

Thanks,
On Jun 15, 2016, at 8:10 PM, Ionel Cristian Mărieș
wrote: You're correct, theoretically. But in reality is best to not stick a dozen services or apps in a single docker image.
It's not really about services so much as about things that you might need *for the service*. For instance, maybe you want node installed so you can build some static files using node.js tooling, but node requires gyp, which requires Python. Now you install a new version of setuptools that breaks the OS-installed gyp, and suddenly you can't build your static files anymore.

A virtual environment costs very little and protects you from these things. Of course, if you're not using the system Python (for instance, the official Python images install their own Python into /usr/local), then there's no reason to use a virtual environment, because you're already isolated from the system.
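[In Dockerfile terms, that cheap insurance is only a couple of lines; the paths and the "my-app" package name are hypothetical:]

```dockerfile
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y python-virtualenv

# Keep the app's dependencies out of the system site-packages,
# so OS-level Python tools (apt, gyp, etc.) are left untouched.
RUN virtualenv /opt/venv
RUN /opt/venv/bin/pip install my-app  # hypothetical package name

CMD ["/opt/venv/bin/my-app"]
```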
On Thu, Jun 16, 2016 at 3:29 AM, Donald Stufft
Now you install a new version of setuptools that breaks the OS installed gyp and suddenly now you can’t build your static files anymore.
gyp or node-gyp doesn't depend on python-setuptools, at least not on Ubuntu. Are you referring to some concrete pkg_resources issue, assuming gyp uses the entrypoints api and that api changed in some setuptools version?
On Thu, Jun 16, 2016 at 3:33 AM, Nathaniel Smith
Using a virtualenv is cheap
Virtualenvs have broken down on me and shown mind-boggling bugs one too many times for me to consider them "cheap" to use. They make things weird (ever tried to load debug symbols for a python from a virtualenv in gdb?). I'm not saying they aren't useful.

Thanks,
On Wed, Jun 15, 2016 at 5:10 PM, Ionel Cristian Mărieș
You're correct, theoretically. But in reality it is best not to stick a dozen services or apps in a single docker image. What's the point of using docker if you have a single container with everything in it? Eg: something smells if you need to run a supervisor inside the container.
If we're talking about python packages managed by the os package manager ... there are very few situations where it makes sense to use them (eg: pysvn is especially hard to build), but other than that? Plus it's easier to cache some wheels than to cache some apt packages when building images, way easier.
The problem is that the bits of the OS that you use inside the container might themselves be written in Python. Probably the most obvious example is that on Fedora, if you want to install a system package, then the apt equivalent (dnf) is written in Python, so a sudo pip install could break your package manager. Debian-derived distros also use Python for core system stuff, and might use it more in the future.

So sure, you might get away with this, depending on the details of your container and your base image, and if you religiously follow other best practices for containerization and... -- but why risk it? Using a virtualenv is cheap and means you don't have to know or care about these potential problems, so it's what we should recommend as best practice.

-n
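[A quick way to check, from inside any container or shell, whether the interpreter you're running is isolated from the system installation; it covers both virtualenv (which sets `sys.real_prefix`) and the stdlib venv module (which makes `sys.base_prefix` differ from `sys.prefix`):]

```python
import sys

def in_isolated_env():
    """Return True when running inside a virtualenv or a stdlib venv.

    Classic virtualenv sets sys.real_prefix; the stdlib venv module
    instead makes sys.base_prefix differ from sys.prefix.
    """
    return (
        hasattr(sys, "real_prefix")
        or getattr(sys, "base_prefix", sys.prefix) != sys.prefix
    )

print(in_isolated_env())
```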
On Wed, Jun 15, 2016 at 2:07 AM, Reinout van Rees
Situation: I'm experimenting with docker, mostly in combination with buildout. But it also applies to pip/virtualenv. ... Now local development: it is normal to mount the current directory as /code/, so that now is overlayed over the originally-added-to-the-docker /code/.
This means that anything done inside /code/ is effectively discarded in development. So a "bin/buildout" run has to be done again, because the bin/, parts/, eggs/ etc directories are gone.
In my experience, the only point of friction is the code being worked on, which I pip-install in a virtualenv inside the container in "editable" mode. This lets code changes on your local machine be reflected immediately in the applications running inside the container. I asked for advice on this scenario here: https://mail.python.org/pipermail/distutils-sig/2016-March/028478.html

If your virtualenv is outside of /code/, then your third-party dependencies shouldn't need to be reinstalled. Only the "editable" installs should need to be reinstalled after mounting your directory, which is the scenario I asked about above.

--Chris
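[That layout might be sketched like this, run inside the container; the /venv path and requirements file are illustrative. Because /venv sits outside the mounted /code volume, the third-party installs survive the mount and only the cheap editable install has to be repeated:]

```shell
# /venv lives outside the mounted /code volume.
virtualenv /venv

# Third-party dependencies: installed at image-build time, untouched
# by the volume mount.
/venv/bin/pip install -r /code/requirements.txt

# The project itself: an editable install, cheap to redo after mounting.
/venv/bin/pip install -e /code
```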
participants (7)

- Chris Jerdonek
- Donald Stufft
- Ionel Cristian Mărieș
- Jim Fulton
- Nathaniel Smith
- Nick Coghlan
- Reinout van Rees