RFC: PEP243: Module Repository Upload Mechanism

Included below is the version of PEP243 after its initial round of review. I welcome any feedback.

Thanks,
Sean

============================================================================
PEP: 243
Title: Module Repository Upload Mechanism
Version: $Revision$
Author: jafo-pep@tummy.com (Sean Reifschneider)
Status: Draft
Type: Standards Track
Created: 18-Mar-2001
Python-Version: 2.1
Post-History:
Discussions-To: distutils-sig@python.org

Abstract

    For a module repository system (such as Perl's CPAN) to be successful, it must be as easy as possible for module authors to submit their work. An obvious place for this submission to happen is in the Distutils tools after the distribution archive has been successfully created. For example, after a module author has tested their software (verifying the results of "setup.py sdist"), they might type "setup.py sdist --submit". This would flag Distutils to submit the source distribution to the archive server for inclusion and distribution to the mirrors.

    This PEP only deals with the mechanism for submitting the software distributions to the archive, and does not deal with the actual archive/catalog server.

Upload Process

    The upload will include the Distutils "PKG-INFO" meta-data information (as specified in PEP-241 [1]), the actual software distribution, and other optional information. This information will be uploaded as a multi-part form encoded the same as a regular HTML file upload request. This form is posted using ENCTYPE="multipart/form-data" encoding [RFC1867].

    The upload will be made to the host "modules.python.org" on port 80/tcp (POST http://modules.python.org:80/swalowpost.cgi). The form will consist of the following fields:

    distribution -- The file containing the module software (for example, a .tar.gz or .zip file).

    distmd5sum -- The MD5 hash of the uploaded distribution, encoded in ASCII representing the hexadecimal representation of the digest ("for byte in digest: s = s + ('%02x' % ord(byte))").
    pkginfo (optional) -- The file containing the distribution meta-data (as specified in PEP-241 [1]). Note that if this is not included, the distribution file is expected to be in .tar format (gzipped and bzipped compressed archives are allowed) or .zip format, with a "PKG-INFO" file in the top-level directory it extracts ("package-1.00/PKG-INFO").

    infomd5sum (required if pkginfo field is present) -- The MD5 hash of the uploaded meta-data, encoded in ASCII representing the hexadecimal representation of the digest ("for byte in digest: s = s + ('%02x' % ord(byte))").

    platform (optional) -- A string representing the target platform for this distribution. This is only for binary distributions. It is encoded as "<os_name>-<os_version>-<platform architecture>-<python version>".

    signature (optional) -- An OpenPGP-compatible signature [RFC2440] of the uploaded distribution as signed by the author. This may be used by the cataloging system to automate acceptance of uploads.

    protocol_version -- A string indicating the protocol version that the client supports. This document describes protocol version "1".

Return Data

    The status of the upload will be reported using HTTP non-standard ("X-*") headers. The "X-Swalow-Status" header may have the following values:

    SUCCESS -- Indicates that the upload has succeeded.

    FAILURE -- The upload is, for some reason, unable to be processed.

    TRYAGAIN -- The server is unable to accept the upload at this time, but the client should try again at a later time. Potential causes of this are resource shortages on the server, administrative down-time, etc.

    Optionally, there may be a "X-Swalow-Reason" header which includes a human-readable string which provides more detailed information about the "X-Swalow-Status".

    If there is no "X-Swalow-Status" header, or it does not contain one of the three strings above, it should be treated as a temporary failure.
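As a rough illustration of the Return Data rules, here is a minimal sketch in modern Python 3 (not the 2.1-era code a Distutils patch would have used). The function name `classify_upload_status` is hypothetical, and mapping a missing or unrecognised status to `TRYAGAIN` is an assumption about how a client might represent "temporary failure".

```python
def classify_upload_status(headers):
    """Return (status, reason) for a mapping of HTTP response headers."""
    known = ("SUCCESS", "FAILURE", "TRYAGAIN")
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    status = lowered.get("x-swalow-status", "").strip()
    reason = lowered.get("x-swalow-reason")  # optional, may be None
    if status not in known:
        # Assumption: a missing or unknown status is reported as TRYAGAIN,
        # since the text says to treat it as a temporary failure.
        status = "TRYAGAIN"
    return status, reason
```

A client could then tell the user whether retrying later is worthwhile (`TRYAGAIN`) or pointless (`FAILURE`) without retrying automatically.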
    Example:

        >>> import urllib
        >>> f = urllib.urlopen('http://modules.python.org:80/swalowpost.cgi')
        >>> s = f.headers['x-swalow-status']
        >>> s = s + ': ' + f.headers.get('x-swalow-reason', '<None>')
        >>> print s
        FAILURE: Required field "distribution" missing.

Sample Form

    The upload client must submit the page in the same form as Netscape Navigator version 4.76 for Linux produces when presented with the following form:

        <H1>Upload file</H1>
        <FORM NAME="fileupload" METHOD="POST" ACTION="swalowpost.cgi"
              ENCTYPE="multipart/form-data">
        <INPUT TYPE="file" NAME="distribution"><BR>
        <INPUT TYPE="text" NAME="distmd5sum"><BR>
        <INPUT TYPE="file" NAME="pkginfo"><BR>
        <INPUT TYPE="text" NAME="infomd5sum"><BR>
        <INPUT TYPE="text" NAME="platform"><BR>
        <INPUT TYPE="text" NAME="signature"><BR>
        <INPUT TYPE="hidden" NAME="protocol_version" VALUE="1"><BR>
        <INPUT TYPE="SUBMIT" VALUE="Upload">
        </FORM>

Platforms

    The following are valid os names:

        aix beos debian dos freebsd hpux mac macos mandrake netbsd openbsd qnx redhat solaris suse windows yellowdog

    The above include a number of different types of distributions of Linux. Because of versioning issues these must be split out, and it is expected that when it makes sense for one system to use distributions made on other similar systems, the download client will make the distinction.

    Version is the official version string specified by the vendor for the particular release. For example, "2000" and "nt" (Windows), "9.04" (HP-UX), "7.0" (RedHat, Mandrake).

    The following are valid architectures:

        alpha hppa ix86 powerpc sparc ultrasparc

Status

    I currently have a proof-of-concept client and server implemented. I plan to have the Distutils patches ready for the 2.1 release. Combined with Andrew's PEP-241 [1] for specifying distribution meta-data, I hope to have a platform which will allow us to gather real-world data for finalizing the catalog system for the 2.2 release.
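To make the "Upload Process" and "Sample Form" sections more concrete, the following is a minimal sketch, in modern Python 3, of how a client might build the multipart/form-data body by hand. The helper name `encode_upload`, the ASCII-only filename, and the `application/octet-stream` content type are assumptions made for illustration; the optional pkginfo, infomd5sum and signature fields are left out to keep it short, and nothing here is the actual Distutils patch.

```python
import hashlib
import uuid


def encode_upload(distribution_name, distribution_bytes, platform=None):
    """Return (content_type, body) for an upload form with the PEP's field names."""
    boundary = uuid.uuid4().hex  # random string unlikely to occur in the payload
    text_fields = {
        "distmd5sum": hashlib.md5(distribution_bytes).hexdigest(),
        "protocol_version": "1",
    }
    if platform is not None:
        text_fields["platform"] = platform

    parts = []
    # The file field carries a filename and a generic binary content type.
    file_header = (
        "--%s\r\n"
        'Content-Disposition: form-data; name="distribution"; filename="%s"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
        % (boundary, distribution_name)
    )
    parts.append(file_header.encode("ascii") + distribution_bytes + b"\r\n")
    # Plain text fields only need a name and a value.
    for name, value in text_fields.items():
        part = '--%s\r\nContent-Disposition: form-data; name="%s"\r\n\r\n%s\r\n' % (
            boundary, name, value)
        parts.append(part.encode("ascii"))
    # The closing delimiter ends the form.
    parts.append(("--%s--\r\n" % boundary).encode("ascii"))
    content_type = "multipart/form-data; boundary=%s" % boundary
    return content_type, b"".join(parts)
```

The returned `content_type` would go in the request's Content-Type header and `body` in the POST payload; any HTTP client could then send it to the upload URL.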
References

    [1] Metadata for Python Software Package, Kuchling,
        http://python.sourceforge.net/peps/pep-0241.html

    [RFC1867] Form-based File Upload in HTML
        http://www.faqs.org/rfcs/rfc1867.html

    [RFC2440] OpenPGP Message Format
        http://www.faqs.org/rfcs/rfc2440.html

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

--
A smart terminal is not a smart*ass* terminal, but rather a terminal you can educate. -- Rob Pike
Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com>
tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python

Sean Reifschneider <jafo@tummy.com> writes: [...]
Platforms
The following are valid os names:
aix beos debian dos freebsd hpux mac macos mandrake netbsd openbsd qnx redhat solaris suse windows yellowdog
Shouldn't there be separate values for packages for Debian GNU/Linux and Debian GNU/Hurd? What's the difference between "mac" and "macos"? Other operating systems referred to in the README with Python 2.1b2 are: BSDI, DEC Unix, DEC Ultrix, Minix, SCO, SunOS 4, NeXTSTEP, Irix, OS/2, Monterey, Reliant UNIX, Mac OS X. Other OS's for which Python is available, that I know of, are Palm, WinCE and OS/400. [...]
The following are valid architectures:
alpha hppa ix86 powerpc sparc ultrasparc
Other common architectures are arm, ia64, m68k, mips, mipsel, s390 (or whatever IBM are calling it now) and as400 (or whatever IBM are calling _it_ now), although the AS/400 is arguably using the PowerPC architecture. Python should be available for all of these. -- Carey Evans http://home.clear.net.nz/pages/c.evans/ "Quiet, you'll miss the humorous conclusion."

On Sun, Mar 25, 2001 at 01:52:35PM +1200, Carey Evans wrote:
Shouldn't there be separate values for packages for Debian GNU/Linux and Debian GNU/Hurd?
I've added debian_hurd
What's the difference between "mac" and "macos"?
Removed "mac".
BSDI, DEC Unix, DEC Ultrix, Minix, SCO, SunOS 4, NeXTSTEP, Irix, OS/2, Monterey, Reliant UNIX, Mac OS X.
Added all these.
Other OS's for which Python is available, that I know of, are Palm, WinCE and OS/400.
Added os400 and palm, wince is OS "Windows" version "CE", isn't it?
Other common architectures are arm, ia64, m68k, mips, mipsel, s390 (or whatever IBM are calling it now) and as400 (or whatever IBM are calling _it_ now), although the AS/400 is arguably using the PowerPC architecture. Python should be available for all of these.
Added all these. Thanks, Sean -- Think. Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com> tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python

Sean Reifschneider <jafo@tummy.com> writes: [...]
Other OS's for which Python is available, that I know of, are Palm, WinCE and OS/400.
Added os400 and palm, wince is OS "Windows" version "CE", isn't it?
(BTW, thanks for all the hard work on this.) Yes, I guess WinCE could be considered a version of windows, though it's had its own versions, too. In that case, wouldn't Debian GNU/Hurd 2.2 be OS "debian", version "hurd_2.2"? In fact, all the different Linux distributions are less different than Windows 98, Windows 2000 and WinCE, so couldn't they all be different versions of OS "linux"? It seems unfair to lump three different Windows code bases together, but give a particular selection of Linux distributions OS status. I'm not saying it should necessarily be different or not, just interested in arguments for and against.
Other common architectures are arm, ia64, m68k, mips, mipsel, s390 (or whatever IBM are calling it now) and as400 (or whatever IBM are calling _it_ now), although the AS/400 is arguably using the PowerPC architecture. Python should be available for all of these.
Added all these.
Can I be a nuisance and say that after arguing with myself a bit today, I've decided that AS/400 doesn't deserve its own architecture value? Linux running in a logical partition, and AIX binaries running in PASE will both be "powerpc", so it should just use that. -- Carey Evans http://home.clear.net.nz/pages/c.evans/ "Quiet, you'll miss the humorous conclusion."

On Sun, Mar 25, 2001 at 11:09:49PM +1200, Carey Evans wrote:
Yes, I guess WinCE could be considered a version of windows, though it's had its own versions, too.
Yeah, I was thinking the version would be something like "CE_1.00" or something (I know not the versions of CE, but remember "NT4" and the like).
all be different versions of OS "linux"? It seems unfair to lump three different Windows code bases together, but give a particular selection of Linux distributions OS status.
Well, the reason I did it that way was because RedHat 7.0 is quite a bit different than, say, Mandrake 7.0. So you think OS of "Linux" and the version would be "redhat_7.0"? I could dig that. Good idea. I wasn't really happy with having a butt-load of Linux names as the OS names...
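To see what that would look like in the platform field, here is a minimal sketch in modern Python 3 of composing "<os_name>-<os_version>-<platform architecture>-<python version>" with the distribution folded into the version. The function name `platform_string` is hypothetical, and rejecting components that contain "-" is an assumption (the draft does not say how a separator inside a component should be handled).

```python
def platform_string(os_name, os_version, architecture, python_version):
    """Join the four platform components with '-' as the PEP's encoding describes."""
    fields = (os_name, os_version, architecture, python_version)
    for field in fields:
        # Assumption: '-' is the separator, so it may not appear in a component.
        if not field or "-" in field:
            raise ValueError("bad platform component: %r" % (field,))
    return "-".join(fields)


# Distribution name and release live in the version field, e.g. "redhat_7.0".
print(platform_string("linux", "redhat_7.0", "ix86", "2.1"))
# prints: linux-redhat_7.0-ix86-2.1
```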
Can I be a nuisance and say that after arguing with myself a bit today, I've decided that AS/400 doesn't deserve its own architecture value? Linux running in a logical partition, and AIX binaries running in PASE will both be "powerpc", so it should just use that.
Hmm. Ok. ;-) Sean -- Put out fires during the daytime. Do your real work at night. Sleep is just an addiction. -- Dieter Muller Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com> tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python

Sean Reifschneider <jafo@tummy.com> writes: [...]
Well, the reason I did it that way was because RedHat 7.0 is quite a bit different than, say, Mandrake 7.0. So you think OS of "Linux" and the version would be "redhat_7.0"? I could dig that. Good idea. I wasn't really happy with having a butt-load of Linux names as the OS names...
This looks good. New operating systems probably happen a little less often than new Linux distributions... This would also give Hurd a proper place in the operating system field, which should keep the GNU fanatics happy. <wink> The only thing I still wonder about, given that, is whether the *BSD operating systems are different enough to get separate entries. I don't even know enough about BSD to offer an _uninformed_ opinion on that. -- Carey Evans http://home.clear.net.nz/pages/c.evans/ "Quiet, you'll miss the humorous conclusion."

On Sun, 25 Mar 2001, Sean Reifschneider wrote: [...]
Well, the reason I did it that way was because RedHat 7.0 is quite a bit different than, say, Mandrake 7.0. So you think OS of "Linux" and the version would be "redhat_7.0"? I could dig that. Good idea. I wasn't really happy with having a butt-load of Linux names as the OS names... [...]
In that case, can I just ask again why you're not using the os.name values? The two sets of categories seem to coincide. If there are missing values from those allowed os.name ATM, why not add them? John
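For anyone comparing the two sets of names, a small sketch (modern Python 3; the `platform` module used here did not exist in Python 2.1) that collects what the standard library reports on the local machine. The function name `local_platform_report` is hypothetical. Note that `os.name` is a coarse value such as 'posix' or 'nt', so how well it lines up with the PEP's os names is something to check rather than assume.

```python
import os
import platform
import sys


def local_platform_report():
    """Collect the interpreter's own idea of the OS and architecture."""
    return {
        "os.name": os.name,                      # coarse family, e.g. 'posix' or 'nt'
        "sys.platform": sys.platform,            # finer-grained build platform string
        "platform.system()": platform.system(),  # OS name as reported by the system
        "platform.machine()": platform.machine(),  # hardware architecture, may be ''
    }


for key, value in local_platform_report().items():
    print("%-20s %r" % (key, value))
```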

Sean Reifschneider wrote:
Included below is the version of PEP243 after its initial round of review. I welcome any feedback.
Looks good! I have just a few more comments.
The upload will be made to the host "modules.python.org" on port 80/tcp (POST http://modules.python.org:80/swalowpost.cgi). The form will consist of the following fields:
distmd5sum -- The MD5 hash of the uploaded distribution, encoded in ASCII representing the hexadecimal representation of the digest ("for byte in digest: s = s + ('%02x' % ord(byte))").
How necessary is this? How likely is file corruption over HTTP?
protocol_version -- A string indicating the protocol version that the client supports. This document describes protocol version "1".
Couldn't this be handled by defining new endpoints for new protocols? When we rev this protocol, just change the upload URL.
Return Data
The status of the upload will be reported using HTTP non-standard ("X-*") headers. The "X-Swalow-Status" header may have the following values: TRYAGAIN -- The server is unable to accept the upload at this time, but the client should try again at a later time. Potential causes of this are resource shortages on the server, administrative down-time, etc.
How should the distutils handle this response? My guess is that it would be impractical for the distutils to automatically wait a while and retry. Probably it's better to not try and automate this. Let's keep things simple to start with. -Amos -- Amos Latteier mailto:amos@digicool.com Digital Creations http://www.digicool.com

On Sat, Mar 24, 2001 at 06:34:20PM -0800, Amos Latteier wrote:
How necessary is this? How likely is file corruption over HTTP?
It's not that necessary, but it's easy enough to implement, so why not? It can detect corruption and incomplete uploads, the latter being my bigger concern.
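A minimal sketch of that check in modern Python 3, where `hashlib.md5(...).hexdigest()` yields the same lowercase hex string as the loop quoted in the PEP. The function name `distribution_matches` is hypothetical, and lower-casing and stripping the submitted value is an assumption about how lenient a server might be.

```python
import hashlib


def distribution_matches(data, claimed_md5sum):
    """Return True if the received bytes hash to the submitted distmd5sum."""
    actual = hashlib.md5(data).hexdigest()
    # Assumption: tolerate surrounding whitespace and upper-case hex digits.
    return actual == claimed_md5sum.strip().lower()


full = b"x" * 1000
checksum = hashlib.md5(full).hexdigest()
print(distribution_matches(full, checksum))        # complete upload
print(distribution_matches(full[:500], checksum))  # truncated upload
```

The second call models an incomplete upload: the truncated bytes no longer hash to the value the client computed before sending.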
Couldn't this be handled by defining new endpoints for new protocols? When we rev this protocol, just change the upload URL.
Yeah, but do we want to *HAVE* to change the endpoint? Mostly I'm thinking more of small changes as opposed to a complete re-writing. Sure, some of this may be able to be auto-detected (is the new field there?), but in general I think it's a good thing for the server and client to be sure they're talking the same language.
How should the distutils handle this response? My guess is that it would be impractical for the distutils to automatically wait a while and
The idea is to differentiate between temporary failures (disc space not available, server shut down for maintenance) versus something that will *NEVER* work (say, because of a client code bug). I don't expect that distutils will auto-retry it, because those sorts of failures are likely to require hours to work out. As a user, it would be nice to know that no matter how many times I retry this over the next several days, it will never work...

Sean
--
Linux: When you need to run like a greased weasel. -- Sean Reifschneider, 1998
Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com>
tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python

On Sat, 24 Mar 2001 16:17:35 -0700 Sean Reifschneider <jafo@tummy.com> wrote:
Included below is the version of PEP243 after its initial round of review. I welcome any feedback.
I've been looking at this PEP more closely, now that I'm thinking about actually implementing it with my prototype catalog server. One question I have is how does the catalog verify who is uploading the package? It seems that the only facility is via a pgp signature. However, this signature seems to verify the author, not the uploader. Plus it's optional.
signature (optional) -- An OpenPGP-compatible signature [RFC2440] of the uploaded distribution as signed by the author. This may be used by the cataloging system to automate acceptance of uploads.
This means that the author must have some flavor of pgp and must have signed the package before you upload it. Otherwise the catalog has no way to associate a package with an individual except the author in the PKG-INFO file. This is problematic from a security point of view. For example, I can put Guido down as the author of my malicious package.

In my prototype server folks who upload packages are verified by email address. (Actually this is not implemented yet, but will be soon. To get privileges to upload you will have to provide an email address, and a password will be sent to it.) So this way you can know the email address of the person who uploaded the package. Of course, you can also use pgp signatures to verify the author of the package, if there is a signature available.

I like this system because it is lightweight, and doesn't require much overhead for the author or uploader. It provides the downloader with some measure of information about what they're downloading. And it allows you to provide additional security information (pgp signatures) if you wish.

If folks have other ideas about how to handle security I'd love to hear about them. I'm no security expert.

In sum, I'd like to see the PEP address the issue of identifying the uploader (who may or may not be the author) of the package.

-Amos

On Sun, Mar 25, 2001 at 02:45:02PM -0500, Amos Latteier wrote:
One question I have is how does the catalog verify who is uploading the package? It seems that the only facility is via a pgp signature. However, this signature seems to verify the author, not the uploader. Plus it's optional.
Well, the authorization is more of a policy decision, IMHO... For example, one could send e-mail to the maintainer listed in the meta-information requesting that they approve the upload. Or, one could require manual verification if the signature doesn't match an "approved" signer for automatic processing... If you believe that having an e-mail address is enough to discourage tampering and allow automatic posting of the uploaded binaries to the repository, I think you're due for a big surprise... ;-/
I like this system because it is light weight, and doesn't require much overhead for the author or uploader. It provides the downloader with some measure of information
Interesting... I dislike it because it provides the downloader with a false sense of security... Just because you have a hotmail account that someone has once logged in to, I don't think that's enough that someone should believe it's not malicious code...

Currently, I'm planning on using a manual process for verification, to figure out what really works. It constantly surprises me that people will use packages uploaded to the redhat contrib site (for example), but they do and there are surprisingly few problems with it.

Maybe I'll have the uploaded packages come in as "unverified" and once there's been some sort of verification that the author or maintainer knows of it, or something along those lines, it will be moved to "verified"?

I agree that some sort of verification would be nice. I'm open to suggestions though.

Sean
--
Having been an entrepreneur, I value being a wage-slave in new ways. I also more fully understand why I hate it. -- Evelyn Mitchell, 1999
Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com>
tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python
participants (4)
- Amos Latteier
- Carey Evans
- John J. Lee
- Sean Reifschneider