Package Meta-information Patch

Hi All,

I finally got to delve into Distutils 0.1.2 over the long weekend, and I think it is a very nice package. The architecture is very clean and extensible, and the code is consistent, well commented, and well organized. Thanks, Greg!

I've enclosed a patch which addresses a particular concern of mine: package meta-information. At the end of the install command, it creates a package information file in <install_py.install_dir>/_pkginfo named after the package (it also creates the "_pkginfo" directory, if necessary). The file contains Python variable definitions for the package name, version number, list of files installed, dependencies, and compatible versions (although the latter two are always empty at this time).

The module that is used to record and obtain package information can be overridden on the command line (install_info --pkg-module <your-module>). Also, if a global "pkginfo" module exists, it will be used instead of the default distutils.pkginfo module (whose behavior is described above). I envision the installer using whatever package information system is most appropriate for the system (RPMs for Linux, Registry files for Windows[?]).

Meta-information facilitates uninstall, dependency checking, automatic downloading of dependencies, and system cataloging. I'd be interested in contributing some of these features to distutils, and welcome any discussion on the subject, particularly since it is relevant to my day job.

=============================================================================
michaelMuller = mmuller@enduden.com | http://www.cloud9.net/~proteus
-----------------------------------------------------------------------------
There is no concept that is more demeaning to the human spirit than the
notion that our freedom must be limited in the interests of our own
protection.
=============================================================================

On 28 December 1999, Michael Muller said:
Hmmm... interesting idea. I mean, we've known all along that some sort of "package metainfo database" (metadatabase? ugh) is going to be needed for exactly the reasons you listed (uninstall, dependency analysis, and system cataloging). I have not spent a lot of time thinking about it, but I think I was stuck in a "database must be one big file" rut, with all the attendant problems of performance, concurrent access, etc. About as far as I got was thinking "text files suck for size and performance, and DB or dbm files might not be portable enough".

But I think I like your approach -- at least part of it. Specifically, I think I like the notion of spreading the "metainfo database" across many files in many directories. To find information about all module distributions installed, you troll sys.path, looking for a "_pkginfo" subdirectory in each entry, and then look at the files installed there. At least, that's the understanding I get from reading your message and a cursory scan of the patch -- am I right?

This pretty much solves the practical side of "what to do about concurrent access" -- in practice, it's not going to happen much, so don't get too worried about it. It doesn't sound very good for performance, unless all you want is a list of packages installed -- that should be pretty fast (you can get everything you need from a succession of os.listdir() calls).

What I'm a little leery about is using Python code as a data format. It's attractive because we all know the syntax and don't have to write a parser. But using a general-purpose language for *such* a specific, tightly-targeted task seems ... I dunno ... overkill-ish. And I wonder if there are security holes lurking in the concept of using code for system catalog data. Does anyone else share my reservations (which are vague, ill-defined, and more superstitious than anything else)? Conversely, does anyone think that Python code is absolutely the right way to store module distribution metadata?

Thanks again for the patch -- I think it should find its way into Distutils 0.2 after the SIG has thrashed through some of the issues it raises.

        Greg
--
Greg Ward - software developer    gward@cnri.reston.va.us
Corporation for National Research Initiatives
1895 Preston White Drive      voice: +1-703-620-8990
Reston, Virginia, USA 20191-5434     fax: +1-703-620-0913
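
The lookup Greg describes (walk sys.path, look for a "_pkginfo" subdirectory in each entry, list what is there) might be sketched roughly as follows. This is only an illustration in present-day Python 3, not code from the patch: the function name find_installed and its parameters are hypothetical, and the "first entry on the path wins" rule is an assumption the thread does not settle.

```python
import os
import sys


def find_installed(search_path=None, dirname="_pkginfo"):
    """Return {package_name: metadata_file_path} found along search_path."""
    if search_path is None:
        search_path = sys.path
    found = {}
    for entry in search_path:
        # An empty sys.path entry means the current directory.
        info_dir = os.path.join(entry or os.curdir, dirname)
        if not os.path.isdir(info_dir):
            continue
        for name in sorted(os.listdir(info_dir)):
            # Assumption: the first match along the path wins,
            # mirroring how imports are resolved.
            found.setdefault(name, os.path.join(info_dir, name))
    return found
```

Listing installed packages this way needs nothing beyond os.listdir(), which is why it should stay cheap; reading each file's contents is where the per-package cost would appear.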

Greg Ward writes:
About as far as I got was thinking "text files suck for size and performance, and DB or dbm files might not be portable enough".
They're not *bad* for size any more than page alignment in a dbm database. ;-)
Tools that need faster access during an operation can build temporary databases to accelerate operation as needed, so I don't see any problems here. I wouldn't expect that to be needed too often.
Yes. This stuff should not require any exec or eval. It might be reasonable to use something like the .ini format; this can be handled using ConfigParser. This way we still don't need to write a parser. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> Corporation for National Research Initiatives
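
As a rough illustration of the .ini suggestion, here is how such a file could be read without exec or eval. The sketch uses Python 3, where the module is spelled configparser; the section name "info", the field names, and the helper load_pkginfo are all assumptions made for the example, since the thread has not fixed a layout.

```python
import configparser

# Hypothetical metadata file contents; the layout is an assumption.
SAMPLE = """\
[info]
name = example
version = 1.0
"""


def load_pkginfo(text):
    """Parse metadata text purely as data; nothing in it is executed."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    info = parser["info"]
    # Every value comes back as a plain string.
    return {"name": info["name"], "version": info["version"]}
```

Calling load_pkginfo(SAMPLE) yields a dictionary of strings, so any typing (version tuples, lists of files) would have to be layered on top by the caller.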

On Mon, 17 Jan 2000, Fred L. Drake, Jr. wrote:
I second Fred here... use a format compatible with ConfigParser. Simple, clean, and easily handled. DBM or central databases are probably a bit bogus. What's the speed for? If you want to *locate* the _pkginfo files, then just append a pathname to a central file. Let the tool go and see if it is still there. Or the tool can be invoked with "do a filetree walk -- the log file may be out of sync." KISS :-) Cheers, -g -- Greg Stein, http://www.lyra.org/
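
The "append a pathname to a central file" idea could look something like the sketch below. It is an assumption-laden illustration, not a proposed interface: the function names, the one-path-per-line format, and the policy of silently skipping stale lines are all made up for the example.

```python
import os


def register(index_path, pkginfo_path):
    """Append the location of one metadata file to the central index."""
    with open(index_path, "a") as index:
        index.write(pkginfo_path + "\n")


def live_entries(index_path):
    """Return the indexed paths that still exist on disk."""
    if not os.path.exists(index_path):
        return []
    with open(index_path) as index:
        paths = [line.strip() for line in index if line.strip()]
    # The index is only a hint: a listed file may have been removed
    # since it was registered, so each path is checked before use.
    return [p for p in paths if os.path.isfile(p)]
```

Because the index is never trusted on its own, a tool could fall back to a full file-tree walk whenever it suspects the log is out of sync.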

Greg Ward wrote: [snip]
Actually, I was kind of stuck on the problem of what to do about multiple locations in sys.path, but now that you mention it, this sounds like a good approach :-).
I'm a little leery about this also - that's partly why I isolated this code within the pkginfo module. You should really be able to use whatever meta-info repository that you want, providing that only _one_ is used on a particular system. The reason I went with the Python code approach is that it was extremely easy to write and debug (it's always easier to see what's going on when the data itself is readable), and because it was the only format that I was _absolutely certain_ would be available on every platform that would be using distutils. Also, there's the fact that with Python code, the interpreter is written in C - it will run significantly faster than any code that could be written in Python. ============================================================================= michaelMuller = mmuller@enduden.com | http://www.cloud9.net/~proteus ----------------------------------------------------------------------------- Mantra for the 60's: Tune in, turn on, drop out Mantra for the 90's: Turn on, jack in, jerk off =============================================================================

Michael Muller writes:
No, it's fair to assume that the entire standard library is available. That makes available ConfigParser, rfc822, shlib, and xmllib. Writing a parser *is* an option, because it can be incorporated into distutils, but I don't see any need to do so with so many already-debugged modules available.
And the first malevolent packager to add a "while 1" loop to his package metadata breaks the whole system. If performance is really an issue, C and Java implementations can be built as needed. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> Corporation for National Research Initiatives
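
The concern can be made concrete with a small, hedged demonstration in Python 3. A harmless side effect (reading the current directory) stands in for the "while 1" loop so that the example terminates; the names HOSTILE, load_by_exec, and load_as_data are hypothetical.

```python
import configparser

# Metadata that smuggles in ordinary Python statements.
HOSTILE = "name = 'example'\nimport os\nside_effect = os.getcwd()\n"


def load_by_exec(text):
    """Code-as-data: every statement in the file is run."""
    namespace = {}
    exec(text, namespace)
    return namespace


def load_as_data(text):
    """Data-only parsing: malformed input is rejected, never executed."""
    parser = configparser.ConfigParser()
    parser.read_string(text)  # raises a configparser.Error subclass here
    return parser
```

With load_by_exec(), the embedded import and call run as a side effect of merely reading the catalog. With load_as_data(), the same text fails to parse and raises an exception the package manager can catch and report.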

"Fred L. Drake, Jr." wrote:
You are correct (with the exception of "shlib" - which I seem to be unable to import; new module?). However, none of these was as easy to use as this:

    execfile(os.path.join(default_location, name), globals)

Writing the file was slightly more difficult, it took a whopping 9 lines of code.
Maybe I'm missing something here, but I fail to see how this possibility is any more threatening than that of a malevolent packager installing viral code on your system. In the system that I have submitted, the packager does not directly provide the meta-database file (sorry :-). The manipulation of these files is performed through the pkginfo module, which can encapsulate any kind of information repository that you like. Nonetheless, as I said, I'm not a particularly strong proponent of using python code to store this information, myself. If everybody thinks that this approach sucks, that's fine. I'm much more concerned with the content and the interface to this system. ============================================================================= michaelMuller = mmuller@enduden.com | http://www.cloud9.net/~proteus ----------------------------------------------------------------------------- Society in every state is a blessing, but government even in its best state is but a necessary evil; in its worst state an intolerable one... - Thomas Paine =============================================================================
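
For readers who have not seen the code-as-data approach, a minimal Python 3 sketch of the round trip follows. It is not the code from the patch: execfile() no longer exists in Python 3, so exec() on the file contents stands in for it, and the function names and field names are assumptions.

```python
def write_pkginfo(path, **fields):
    """Write each field as a Python assignment, using repr() for the value."""
    with open(path, "w") as out:
        for key in sorted(fields):
            out.write("%s = %r\n" % (key, fields[key]))


def read_pkginfo(path):
    """Execute the file and return the variables it defined."""
    namespace = {}
    with open(path) as src:
        exec(src.read(), namespace)  # Python 3 stand-in for execfile()
    # exec() adds the builtins to the namespace; drop them again.
    namespace.pop("__builtins__", None)
    return namespace
```

The appeal is visible here: lists and nested structures survive the round trip with no parsing code at all, at the price of running whatever the file contains.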

Michael Muller writes:
shlib was in 1.5.2, I think, but it is fairly new. Perhaps the ease of use indicates that the interface to modules like ConfigParser needs to be enhanced. You're right: it *should* be easy!
Even if buggy code is installed, the package manager itself should remain usable so that it can be removed easily.
That does help; my misunderstanding. But I still think a non-code syntax is preferable. Something like an .ini file is very readable and is already familiar to sys-admin types; Python syntax is almost there, but not quite. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> Corporation for National Research Initiatives

Fred L. Drake, Jr. writes:
shlib was in 1.5.2, I think, but it is fairly new.
I checked up on this; I was thinking of shlex, which was new in 1.5.2. It was contributed by Eric Raymond, and provides support for parsing shell-like syntaxes. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> Corporation for National Research Initiatives
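
For reference, shlex splits a string the way a POSIX shell would, honoring quotes. The short sketch below uses shlex.split() from current Python 3, which may not match what the 1.5.2 module offered; the sample command line is made up.

```python
import shlex

# A made-up command line with a quoted argument containing a space.
line = 'install --prefix "/opt/my python" --force'

# Quotes group words into a single token and are then stripped.
tokens = shlex.split(line)
```

Here tokens comes out as four items, with "/opt/my python" kept together as one argument.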

On Tue, 18 Jan 2000, Michael Muller wrote:
I use:

    parser = ConfigParser.ConfigParser()
    parser.read(os.path.join(default_location, name))

The biggest difference between the ConfigParser and the execfile() approach is that the latter can easily create complicated structures. For the ConfigParser, you may need to do something like:

    files = map(string.strip, string.split(parser.get('info', 'files'), ','))

for each "structured" entry.
Writing the file was slightly more difficult, it took a whopping 9 lines of code.
Writing a .ini might be a bit longer, but it mostly depends on the input structures and the number of output sections. Cheers, -g -- Greg Stein, http://www.lyra.org/
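
To give a feel for the size of the writing side, here is a hedged Python 3 sketch that writes an .ini-style record and reads the file list back. The section name "info" and the comma-separated "files" field follow the snippet in the message above; the function names are hypothetical, and a file name containing a comma would break this naive encoding.

```python
import configparser
import io


def dump_pkginfo(name, version, files):
    """Serialize one package record as .ini text."""
    parser = configparser.ConfigParser()
    parser["info"] = {
        "name": name,
        "version": version,
        # Lists have no native form in .ini, so join them by hand.
        "files": ", ".join(files),
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def load_files(text):
    """Recover the list of files from .ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = parser.get("info", "files")
    return [f.strip() for f in raw.split(",") if f.strip()]
```

The writer is about as long as the nine lines mentioned for the Python-code format; the extra work sits in encoding and decoding each "structured" entry.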

Greg Stein wrote:
This has been one of my main objections to using ConfigParser on the occasions when I've needed such a thing. However, as you've indicated, it is easily overcome. If everyone is good with using ConfigParser, I'd be happy to submit another patch. ============================================================================= michaelMuller = mmuller@enduden.com | http://www.cloud9.net/~proteus ----------------------------------------------------------------------------- Mantra for the 60's: Tune in, turn on, drop out Mantra for the 90's: Turn on, jack in, jerk off =============================================================================
