Re: [Python-Dev] PEP 376 : Changing the .egg-info structure
At 12:21 AM 5/15/2009 +0200, Tarek Ziadé wrote:
Hello
I'm proposing this PEP, which has been discussed in Distutils-SIG, for inclusion in Python 2.7 and 3.2
http://www.python.org/dev/peps/pep-0376/
Please comment !
I'd like to reiterate my suggestion that the uninstall record include size and checksum information, ala PEP 262's "FILES" section. This would allow the uninstall function to validate whether a file has been modified, and thus prevent uninstalling a locally-modified file, or a file installed in some other way.
It may also be that providing an uninstall API that simply yields files to be uninstalled, with data about their existence/modification status, would be more useful than a blind uninstall operation with a filter function.
Also, the PEP doesn't document what happens if a single file was installed by more than one package. Ideally, a file with identical size/checksum that belongs to more than one project should be silently left alone, and a file installed by more than one project with *different* size/checksum should be warned about and left alone.
Next, the doc for the metadata API functions seems quite sparse. ISTR that I've previously commented on such issues as case- and punctuation-insensitivity of project names, and '/' separation in egg_info subpaths, but these don't seem to have been incorporated into the current version of the PEP. These are important considerations in general, btw, because project name and version canonicalization and escaping are an important part of both generating and parsing .egg-info filenames. At minimum, the relevant setuptools docs that define these standards should be cited.
Finally, the "Definitions" section also claims that a project installs one or more packages, but a project may not contain *any* packages; it may have a standalone module, or just a script, data, or metadata.
On [20090515 06:59], P.J. Eby (pje@telecommunity.com) wrote:
I'd like to reiterate my suggestion that the uninstall record include size and checksum information, ala PEP 262's "FILES" section. This would allow the uninstall function to validate whether a file has been modified, and thus prevent uninstalling a locally-modified file, or a file installed in some other way.
Agreed. Within FreeBSD's ports the installed package registration gets an MD5 hash per file recorded. Size is less interesting though, since essentially this information is encapsulated within the hash. Remove one byte from the file and your hash is already different. And the chance of a collision for this kind of registration is sufficiently small not to need the size information. And if you're worried about the MD5 collision space, which for this use case ought to be large enough, you could always settle for SHA1.
--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
What's one man's poison, is another's meat or drink...
On Fri, May 15, 2009 at 8:32 AM, Jeroen Ruigrok van der Werven <asmodai@in-nomine.org> wrote:
Agreed. Within FreeBSD's ports the installed package registration gets an MD5 hash per file recorded. Size is less interesting though, since essentially this information is encapsulated within the hash. Remove one byte from the file and your hash is already different. And the chance of a collision for this kind of registration is sufficiently small not to need the size information.
Size is nice because it's much cheaper to check. I don't know if mass uninstalls will be so common that this is actually something we have to worry about, though. Cheers, Dirkjan
Yes, I don't think it's worth optimizing the install/uninstall code in Python. In the whole PEP 376 proposal, the only part that will need care will be the code that browses sys.path.
On Fri, May 15, 2009 at 9:50 AM, Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
On Fri, May 15, 2009 at 8:32 AM, Jeroen Ruigrok van der Werven <asmodai@in-nomine.org> wrote:
Agreed. Within FreeBSD's ports the installed package registration gets an MD5 hash per file recorded. Size is less interesting though, since essentially this information is encapsulated within the hash. Remove one byte from the file and your hash is already different. And the chance of a collision for this kind of registration is sufficiently small not to need the size information.
Size is nice because it's much cheaper to check. I don't know if mass uninstalls will be so common that this is actually something we have to worry about, though.
Cheers,
Dirkjan
-- Tarek Ziadé | http://ziade.org
At 08:32 AM 5/15/2009 +0200, Jeroen Ruigrok van der Werven wrote:
Agreed. Within FreeBSD's ports the installed package registration gets an MD5 hash per file recorded. Size is less interesting though, since essentially this information is encapsulated within the hash. Remove one byte from the file and your hash is already different.
Which also means that in that case you can skip computing the MD5. The size allows you to easily notice an overwrite/corruption without further processing.
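The size-first shortcut described here can be sketched as follows. This is only an illustration: the function name `file_unchanged` and the idea that the record stores a byte size and an MD5 hex digest per file are assumptions, not something PEP 376 defines.

```python
import hashlib
import os


def file_unchanged(path, recorded_size, recorded_md5):
    """Return True if `path` still matches the recorded size and MD5 digest.

    Hypothetical helper: the record layout (size, md5 hex digest) is assumed.
    """
    try:
        # Cheap check first: a size mismatch proves the file changed,
        # so the hash never needs to be computed in that case.
        if os.path.getsize(path) != recorded_size:
            return False
    except OSError:
        # Missing or unreadable file: treat as not matching the record.
        return False

    # Sizes match, so only the checksum can confirm the content is intact.
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == recorded_md5
```

When the sizes are equal the hash is still computed, so the shortcut only saves work for files that were truncated, extended, or replaced by something of a different length.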
At 13:52 -0400 05/15/2009, P.J. Eby wrote:
At 08:32 AM 5/15/2009 +0200, Jeroen Ruigrok van der Werven wrote:
Agreed. Within FreeBSD's ports the installed package registration gets an MD5 hash per file recorded. Size is less interesting though, since essentially this information is encapsulated within the hash. Remove one byte from the file and your hash is already different.
Which also means that in that case you can skip computing the MD5. The size allows you to easily notice an overwrite/corruption without further processing.
In most cases the files will actually match, so the sizes and dates will be the same and the checksum must be computed to verify the match. RPM does this when asked to Verify a package. It is faster than Removing a package, and Verifying all installed packages takes a reasonable amount of time. I don't think Python would be any worse at verifying its own packages, and it would normally have less data to verify, so it should be fast enough.
--
____________________________________________________________________
TonyN.:' <mailto:tonynelson@georgeanelson.com>
      ' <http://www.georgeanelson.com/>
2009/5/15 P.J. Eby <pje@telecommunity.com>:
At 12:21 AM 5/15/2009 +0200, Tarek Ziadé wrote:
Hello
I'm proposing this PEP, which has been discussed in Distutils-SIG, for inclusion in Python 2.7 and 3.2
http://www.python.org/dev/peps/pep-0376/
Please comment !
I'd like to reiterate my suggestion that the uninstall record include size and checksum information, ala PEP 262's "FILES" section. This would allow the uninstall function to validate whether a file has been modified, and thus prevent uninstalling a locally-modified file, or a file installed in some other way.
good point, I'll re-work that part
It may also be that providing an uninstall API that simply yields files to be uninstalled, with data about their existence/modification status, would be more useful than a blind uninstall operation with a filter function.
Sure we could have it in that shape, I'll work on this as well.
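One possible shape for such a yielding API is sketched below. Everything here is an assumption made for illustration: the name `iter_uninstall_candidates`, the `FileStatus` tuple, and the record being a sequence of `(path, size, md5)` entries are hypothetical and not taken from the PEP.

```python
import hashlib
import os
from collections import namedtuple

# Hypothetical status tuple handed to the caller for each recorded file.
FileStatus = namedtuple("FileStatus", "path exists modified")


def _md5_of(path):
    """Return the MD5 hex digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_uninstall_candidates(record):
    """Yield a FileStatus for each (path, size, md5) entry in `record`.

    Nothing is deleted here; the caller decides what to do with each file.
    """
    for path, size, md5 in record:
        if not os.path.isfile(path):
            yield FileStatus(path, exists=False, modified=False)
            continue
        modified = os.path.getsize(path) != size or _md5_of(path) != md5
        yield FileStatus(path, exists=True, modified=modified)
```

A caller could then write `for status in iter_uninstall_candidates(record): ...` and remove only the entries where `status.exists and not status.modified`, warning about the rest. That keeps the policy (delete, skip, warn) out of the library function.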
Also, the PEP doesn't document what happens if a single file was installed by more than one package.
It does: "...as long as they are not mentioned in another RECORD file..."
Ideally, a file with identical size/checksum that belongs to more than one project should be silently left alone, and a file installed by more than one project with *different* size/checksum should be warned about and left alone.
I think the path is the info that should be looked at. And a warning could be raised like you said if a file was manually modified. But I don't think you want to leave alone a file with identical size/checksum that belongs to more than one project when it's not the same absolute path.
Here's an example of why: if two different packages include the "feedparser.py" module (from the FeedParser project) for convenience, and if you remove one package, you *do* want to remove its "feedparser.py" module even if it exists in the other project.
So it's rather a matter of changing the PEP text like this: "...as long as they are not mentioned in another RECORD file, with the same size/checksum..."
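Combining the two positions in this exchange, the decision for a single file might look like the sketch below. It assumes, purely for illustration, that each other project's RECORD has been loaded into a dict mapping absolute path to a `(size, checksum)` pair; the function name and the three return values are hypothetical.

```python
def uninstall_action(path, size, checksum, other_records):
    """Decide what to do with one file of the project being removed.

    `other_records` is an iterable of dicts (one per other installed
    project) mapping absolute path -> (size, checksum). Assumed layout.

    Returns "remove", "keep" (shared, identical) or "warn" (shared, differs).
    """
    shared_identical = False
    for record in other_records:
        if path not in record:
            # Same content under a different absolute path does not count:
            # this is the bundled feedparser.py case.
            continue
        if record[path] != (size, checksum):
            return "warn"  # same path claimed with different content
        shared_identical = True
    return "keep" if shared_identical else "remove"
```

Because the lookup is keyed on the absolute path, two projects that each bundle their own copy of a module never block each other's uninstall, while a file genuinely shared at one path is left in place.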
Next, the doc for the metadata API functions seems quite sparse. ISTR that I've previously commented on such issues as case- and punctuation-insensitivity of project names, and '/' separation in egg_info subpaths, but these don't seem to have been incorporated into the current version of the PEP.
These are important considerations in general, btw, because project name and version canonicalization and escaping are an important part of both generating and parsing .egg-info filenames. At minimum, the relevant setuptools docs that define these standards should be cited.
I'll add more info on that part accordingly then.
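To make the canonicalization concern concrete, here is a standalone sketch of the kind of escaping involved. The helper names and exact rules are assumptions modelled loosely on the setuptools conventions referred to above (runs of punctuation collapse to "-", then "-" becomes "_" in filenames, and names compare case-insensitively); the setuptools documentation remains the authority, and this is not its code.

```python
import re


def escape_name(name):
    """Collapse every run of characters other than letters, digits and '.'
    into a single '-'. Assumed rule, for illustration only."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def escape_version(version):
    """Same idea for versions, with spaces turned into '.' first (assumed)."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", version.replace(" ", "."))


def egg_info_filename(name, version):
    """Build an .egg-info filename; '-' is reserved as the field separator,
    so any '-' inside a field is written as '_'."""
    def to_filename(part):
        return part.replace("-", "_")

    return "%s-%s.egg-info" % (
        to_filename(escape_name(name)),
        to_filename(escape_version(version)),
    )


def same_project(a, b):
    """Compare project names ignoring case and punctuation differences."""
    return escape_name(a).lower() == escape_name(b).lower()
```

Without rules like these, a metadata API cannot reliably map a requested project name back to the directory that was written at install time, since the same project could plausibly be spelled with different case or separators.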
Finally, the "Definitions" section also claims that a project installs one or more packages, but a project may not contain *any* packages; it may have a standalone module, or just a script, data, or metadata.
ok
Thanks for the feedback
-- Tarek Ziadé | http://ziade.org
participants (5)
- Dirkjan Ochtman
- Jeroen Ruigrok van der Werven
- P.J. Eby
- Tarek Ziadé
- Tony Nelson