New buildout options: checksums and allow-omitted-checksums
Hi,

two weeks ago I asked about your opinions on a buildout option that enforces specifying (MD5) checksums for all files downloaded through buildout's download utility API. I've been discussing the subject with Christian Theune in the meantime and would like to describe a more concrete proposal now that deals with the questions raised in my previous post:

In analogy with version pinning for eggs, two new options could be introduced to the buildout section:

- "checksums": This option would name a config section that contains checksums for any number of resources by URL. I suggest a default value of "checksum" for it. Listing some URL with an empty checksum would explicitly express that the checksum for this resource should never be checked. I'm not sure how to structure the contents of the checksums section: URLs are not valid config keys in general (they can contain "=" characters) but I'd still like to be able to rely on the existing mechanism for overriding single options to override single checksums. Mapping arbitrary keys to whitespace-separated pairs of URL and checksum would work but sounds inelegant.

- "allow-omitted-checksums": This option would specify whether resources should be downloaded that are not listed in the checksums section. I'd like to use False as this option's default value, meaning that buildout should raise a UserError if a resource is omitted from the checksums section. (Intentionally empty checksums would still be allowed.) In fact, I'm not completely happy about inventing an option with negative semantics but doing it this way is at least consistent with "allow-picked-versions".

I'd like to hear other people's opinion on both the general idea and the details. Unless the whole thing gets shot down, I plan to start implementing it on a branch of zc.buildout next week.

Thank you.

-- Thomas
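The decision logic proposed above might be sketched roughly as follows. This is only an illustration under stated assumptions, not buildout code: the function name `expected_checksum`, the dictionary representation of the checksums section, and the local `UserError` class (a stand-in for buildout's own exception) are all hypothetical.

```python
class UserError(Exception):
    """Stand-in for buildout's UserError so the sketch runs on its own."""


def expected_checksum(checksums, url, allow_omitted=False):
    """Return the checksum to verify for url, or None to skip verification.

    checksums is assumed to map URL -> checksum string, where an empty
    string means "never check this resource".
    """
    if url in checksums:
        # A listed entry with an empty value is an explicit opt-out.
        return checksums[url] or None
    if allow_omitted:
        # Unlisted resources are tolerated and downloaded unchecked.
        return None
    raise UserError("No checksum listed for %s" % url)
```

With `allow_omitted` left at its default of False, only resources that are listed (with or without a checksum) can be downloaded, which mirrors how "allow-picked-versions" treats unpinned eggs.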
Hi Thomas,

I like your idea in general. I'd like to point to the following suggestion with patch+test (though it might need some cleanup) that is not exactly related to what you're proposing, but has to do with the same thing (relationship between files and checksums):

https://bugs.launchpad.net/zc.buildout/+bug/692600

It suggests (and implements) automatically redownloading files when checksums don't match (which could happen when you update your config file with a new checksum for a file that has changed upstream).

Other comments below:

On Thu, Mar 17, 2011 at 16:55, Thomas Lotze <thomas@thomas-lotze.de> wrote:
Hi,
[...]
- "checksums": This option would name a config section that contains checksums for any number of resources by URL. I suggest a default value of "checksum" for it.
Won't 'checksums' (plural) as the value be better? It would keep with the tradition of matching the name of the buildout option and the name of the section by default.
Listing some URL with an empty checksum would explicitly express that the checksum for this resource should never be checked. I'm not sure how to structure the contents of the checksums section: URLs are not valid config keys in general (they can contain "=" characters) but I'd still like to be able to rely on the existing mechanism for overriding single options to override single checksums. Mapping arbitrary keys to whitespace-separated pairs of URL and checksum would work but sounds inelegant.
I suggest mapping the checksums (which are valid keys) to URLs, and having a special key with line-break separated values for explicitly not checking:

    [checksums]
    080d2c6a849ebd6b7fd49049c21b910a = http://example.com/foo/bar.tgz
    no-check =
        http://example.com/foo/baz.tgz
        http://example.com/foo/fred.tgz

This will not be so elegant when you want to extend another configuration and override some decisions, but it works somewhat:

    [buildout]
    extends = config-file-above.cfg

    [checksums]
    080d2c6a849ebd6b7fd49049c21b910a =
    no-check += http://example.com/foo/bar.tgz
[...]
Cheers, Leo
Leonardo Rochael Almeida wrote:
I like your idea in general. I'd like to point to the following suggestion with patch+test (though it might need some cleanup) that is not exactly related to what you're proposing, but has to do with the same thing (relationship between files and checksums):
https://bugs.launchpad.net/zc.buildout/+bug/692600
It suggests (and implements) automatically redownloading files when checksums don't match (which could happen when you update your config file with a new checksum for a file that has changed upstream).
Thank you for the pointer, I'll look into it. Fixing this along with other download-related things before the next buildout release would be a good thing, I think.
- "checksums": This option would name a config section that contains checksums for any number of resources by URL. I suggest a default value of "checksum" for it.
Won't 'checksums' (plural) as the value be better? It would keep with the tradition of matching the name of the buildout option and the name of the section by default.
Sure, I meant the section name to read "checksums" but a typo crept in.
I suggest mapping the checksums (which are valid keys) to URLs, and having a special key with line-break separated values for explicitly not checking:
This would be ugly in my humble opinion because it reverses the meaning of keys and values and makes the configuration look backwards for a purely technical reason. Also, the following:
This will not be so elegant when you want to extend another configuration and override some decisions, but it works somewhat:
    [buildout]
    extends = config-file-above.cfg

    [checksums]
    080d2c6a849ebd6b7fd49049c21b910a =
    no-check += http://example.com/foo/bar.tgz
becomes even worse when you want to update a checksum specified in a configuration file that is being extended: you'd have to keep track of two checksums for each resource, one to invalidate the old entry and one to specify the new one.

Thank you for your input, though!

-- Thomas
On Thu, Mar 17, 2011 at 04:55:05PM +0100, Thomas Lotze wrote:
two weeks ago I asked about your opinions on a buildout option that enforces specifying (MD5) checksums for all files downloaded through buildout's download utility API.
Please don't hardcode the checksum algorithm to MD5. Security researchers have been telling everyone to stop using MD5 (and SHA1) for a while now.

Marius Gedminas
-- How much net work could a network work, if a network could net work?
Marius Gedminas wrote:
Please don't hardcode the checksum algorithm to MD5. Security researchers have been telling everyone to stop using MD5 (and SHA1) for a while now.
Good point. All this talking about MD5 specifically has been due to the fact that this is what used to be used by the download API and the gocept.download recipe so far. To take up the idea I posted a few minutes ago, one might specify checksums like this:

    [checksums]
    foo = http://example.org/foo.tgz algorithm:checksum-value

Since the checksum would be evaluated by the download API itself, many checksum algorithms could be used since adding another algorithm in this one place would add it consistently to all pieces of buildout and recipes that download things.

-- Thomas
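As a rough illustration of why a single evaluation point makes further algorithms cheap, the check for an "algorithm:checksum-value" entry could look like the sketch below. It assumes Python's standard hashlib module does the hashing; the function name `matches` is hypothetical and not part of any buildout API.

```python
import hashlib


def matches(data, spec):
    """Check downloaded bytes against a spec of the form 'algorithm:hexdigest'."""
    algorithm, _, expected = spec.partition(":")
    # hashlib.new raises ValueError for algorithm names it does not know,
    # so unsupported algorithms fail loudly instead of passing silently.
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected.lower()
```

Any algorithm name hashlib understands (for example "sha256") would then work for every recipe that downloads through the same code path.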
On Fri, Mar 18, 2011 at 9:43 AM, Thomas Lotze <thomas@thomas-lotze.de> wrote:
Marius Gedminas wrote:
Please don't hardcode the checksum algorithm to MD5. Security researchers have been telling everyone to stop using MD5 (and SHA1) for a while now.
Good point. All this talking about MD5 specifically has been due to the fact that this is what used to be used by the download API and the gocept.download recipe so far. To take up the idea I posted a few minutes ago, one might specify checksums like this:

    [checksums]
    foo = http://example.org/foo.tgz algorithm:checksum-value
+1 -- Benji York
Thomas Lotze wrote:
Good point. All this talking about MD5 specifically has been due to the fact that this is what used to be used by the download API and the gocept.download recipe so far. To take up the idea I posted a few minutes ago, one might specify checksums like this:

    [checksums]
    foo = http://example.org/foo.tgz algorithm:checksum-value
After some further offline discussion, I'd like to suggest using MD5 as the default algorithm, though. It has been the algorithm of choice in buildout and recipes, and allowing the md5: prefix to be omitted in what's very likely the majority of cases sounds like a good bargain. I agree that the cleanest solution would be not having a default algorithm at all but this may just be one instance where practicality beats purity.

-- Thomas
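A parser for such entries with an optional algorithm prefix might be sketched like this. The names `parse_entry` and `DEFAULT_ALGORITHM` are hypothetical, and treating a value that contains only a URL as "explicitly unchecked" is an assumption carried over from the empty-checksum rule in the original proposal.

```python
DEFAULT_ALGORITHM = "md5"


def parse_entry(value):
    """Split 'url [algorithm:]checksum' into (url, algorithm, checksum)."""
    parts = value.split()
    url = parts[0]
    if len(parts) == 1:
        # Assumption: a URL without a checksum is explicitly unchecked.
        return url, None, ""
    algorithm, sep, checksum = parts[1].rpartition(":")
    if not sep:
        # No prefix given, so fall back to the default algorithm.
        algorithm = DEFAULT_ALGORITHM
    return url, algorithm, checksum
```

The prefixed form stays available for every entry, so a default of MD5 does not lock the syntax to MD5.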
Thomas Lotze wrote:
After some further offline discussion, I'd like to suggest using MD5 as the default algorithm, though.
Warnings against using md5 are mainly about cryptographic security, aren't they? For just detecting accidental corruption it should still be good enough. -- Greg
On Tue, Mar 22, 2011 at 12:51:35PM +1300, Greg Ewing wrote:
Thomas Lotze wrote:
After some further offline discussion, I'd like to suggest using MD5 as the default algorithm, though.
Warnings against using md5 are mainly about cryptographic security, aren't they? For just detecting accidental corruption it should still be good enough.
Yes, that's my understanding too. (My only point for raising this was to consider a future-proof syntax for specifying the checksums, so that we're not locked in the past when the world moves on.)

Marius Gedminas
-- At most companies, programmers aren't trusted with words that a user might actually see (and for good reason, much of the time). -- Joel Spolsky
Thomas Lotze wrote:
In analogy with version pinning for eggs, two new options could be introduced to the buildout section:
- "checksums": This option would name a config section that contains checksums for any number of resources by URL. I suggest a default value of "checksum" for it. Listing some URL with an empty checksum would explicitly express that the checksum for this resource should never be checked. I'm not sure how to structure the contents of the checksums section: URLs are not valid config keys in general (they can contain "=" characters) but I'd still like to be able to rely on the existing mechanism for overriding single options to override single checksums. Mapping arbitrary keys to whitespace-separated pairs of URL and checksum would work but sounds inelegant.
OTOH, thinking further about a line format like "name = url md5sum", I find more advantages than just the fact that it would be syntactically valid: Recipes such as zc.recipe.cmmi or gocept.download could reference the resource name instead of the url, and in analogy to zc.recipe.egg, they might even infer the resource name from the section name:

    [checksums]
    foo = http://example.org/foo.tgz 96772abbcb3331f63d05ffba40b7b523
    bar = http://example.org/bar.tgz 64d714a998cab0c45c48307698317a0f
    baz = http://example.org/baz.tgz 22bfb8c1dd94b5f3813a2b25da67463f

    [install-foo-by-url]
    recipe = zc.recipe.cmmi
    url = http://example.org/foo.tgz

    [install-bar-by-name]
    recipe = zc.recipe.cmmi
    source = bar

    [baz]
    recipe = zc.recipe.cmmi

It's possibly worth some amount of bike-shedding what the concept of the resource should really be called: For a cmmi recipe, "source" seems to work best as a key while a more general download recipe might indeed use the word "resource", and there's also the question of whether and how to reference extended configuration files by resource name. But my first question is whether this whole idea makes sense to other people at all or whether it adds more abstraction than it is worth.

-- Thomas
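The three ways of referencing a resource in that example (by URL, by name via "source", and by section name) could be resolved roughly as follows. This is a sketch only: the helper names `parse_resources` and `resolve`, the plain-dict representation of config sections, and the lookup order are assumptions, and the "source" key is just the candidate name floated in the message above.

```python
def parse_resources(section):
    """Map resource name -> (url, checksum) from 'name = url checksum' lines."""
    resources = {}
    for name, value in section.items():
        parts = value.split()
        checksum = parts[1] if len(parts) > 1 else ""
        resources[name] = (parts[0], checksum)
    return resources


def resolve(section_name, options, resources):
    """Return (url, checksum) for a recipe section.

    Assumed order: explicit url, then source = name, then the section name.
    """
    if "url" in options:
        by_url = {url: (url, checksum) for url, checksum in resources.values()}
        # Assumption: an unlisted URL simply has no checksum to check against.
        return by_url.get(options["url"], (options["url"], None))
    name = options.get("source", section_name)
    return resources[name]
```

A recipe section named after a resource, such as [baz] in the example, would then need no url or source option at all.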
participants (5)
- Benji York
- Greg Ewing
- Leonardo Rochael Almeida
- Marius Gedminas
- Thomas Lotze