Edits to Metadata 1.2 to add extras (optional dependencies)

I've drafted some edits to Metadata 1.2 with valuable feedback from distutils-sig (special thanks to Erik Bray), which seems to have no more comments on the issue after about 6 weeks. Let me know if you have an opinion, or if you will have one during some bounded time in the future.
Metadata 1.2 (PEP 345), a non-final PEP that has been adopted by approximately 10 of the latest sdists from PyPI, cannot represent the setuptools "extras" (optional dependencies) feature. This is a problem because about 1600+ (10%) of the packages hosted on PyPI define "extras", as measured in May of this year.
The edit implements the extras feature by adding a new condition "extra == 'name'" to the Metadata 1.2 environment markers. Requirements with this marker are only installed when the named optional feature is requested. Valid extras for a package must be declared with Provides-Extra: name. It also adds Setup-Requires-Dist as a way to specify requirements needed during an install as opposed to during runtime.
Abbreviated highlights:
Setup-Requires-Dist (multiple use)
    Like Requires-Dist, but names dependencies needed while the distribution's distutils / packaging `setup.py` / `setup.cfg` is run.
Provides-Extra (multiple use)
    A string containing the name of an optional feature. Examples:
        Requires-Dist: reportlab; extra == 'pdf'
        Requires-Dist: nose; extra == 'test'
        Requires-Dist: sphinx; extra == 'doc'
(full changeset on https://bitbucket.org/dholth/python-peps/changeset/537e83bd4068)
Thanks, Daniel Holth
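The selection rule described above (a requirement carrying `extra == 'name'` is installed only when that extra is requested, and requested extras must be declared with Provides-Extra) can be sketched in Python. This is a minimal illustration that assumes the marker is always exactly `extra == '...'`; the helper name `requirements_for` and the unconditional `six` dependency are hypothetical, and a real installer would evaluate the full marker grammar.

```python
import re

# Hypothetical helper: recognises only the marker form `extra == 'name'`.
_EXTRA_MARKER = re.compile(r"""^extra\s*==\s*(['"])(?P<name>[^'"]+)\1$""")


def requirements_for(requires_dist, provides_extra, requested):
    """Return the requirements to install for the requested extras."""
    unknown = set(requested) - set(provides_extra)
    if unknown:
        # Extras must be declared with Provides-Extra before they can be used.
        raise ValueError(f"undeclared extras: {sorted(unknown)}")
    selected = []
    for line in requires_dist:
        req, _, marker = (part.strip() for part in line.partition(";"))
        if not marker:
            selected.append(req)  # no marker: always installed
            continue
        match = _EXTRA_MARKER.match(marker)
        if match is None:
            raise ValueError(f"marker not handled by this sketch: {marker!r}")
        if match.group("name") in requested:
            selected.append(req)
    return selected


requires = [
    "six",
    "reportlab; extra == 'pdf'",
    "nose; extra == 'test'",
    "sphinx; extra == 'doc'",
]
provides = ["pdf", "test", "doc"]
print(requirements_for(requires, provides, {"pdf"}))  # ['six', 'reportlab']
```

Asking for an extra that was never declared (say `{"gui"}`) raises `ValueError`, which is the check that Provides-Extra makes possible.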

On 27.08.12 16:56, Daniel Holth wrote:
You can't add new fields to the format after the fact, unless the format had provided for such additions (which it does not - there is no mention of custom fields anywhere, and no elaboration on how "unknown" fields should be processed). So if you want to add new fields, you need to create a new version of the metadata. Prepare for a ten-year period of acceptance - so it would be good to be sure that no further additions are desired within the next ten years before seeking approval for the PEP. Regards, Martin

On Mon, Aug 27, 2012 at 4:29 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I don't know of a tool that doesn't reliably ignore extra fields, but I will put you down as being in favor of an X- fields paragraph:
Extensions (X- Fields)
::::::::::::::::::::::
Metadata files can contain fields that are not part of the specification, called *extensions*. These fields start with `X-`.

Daniel Holth wrote:
See RFC 6648 for why such X-fields may not be a good idea: http://tools.ietf.org/html/rfc6648

Petri Lehtinen writes:
But note that the RFC also says that the preferred solution to the problem that X-fields are intended to solve is an easily accessible name registry and a simple registration procedure. If Martin's "be prepared for a ten-year period to acceptance" is serious, what should be done about such a registry?

I'm happy for PyPI to host such a registry. A specification for the registry should be part of the PEP for the 1.3 format, but I would propose this structure (without having researched in detail what other registries feature, but with a rough idea what IANA registries typically include):
- name of metadata field
- name of registrant (individual or PyPI package)
- contact email address (published)
- expiration date; by default, extensions expire 1 month after their registration, unless renewed; maximum expiration time is 5 years
- English description of the field
- regular expression to validate the field
Deleting undesired extensions would not be possible; instead, one would have to create another extension if the syntax or semantics changes.
Regards, Martin
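A rough Python sketch of what one record in the proposed registry might look like. The class and method names are hypothetical, "1 month" is approximated as 30 days and "5 years" as 5 × 365 days; only the listed fields and the default and maximum expiry rules come from the proposal above.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Terms from the proposal, approximated in days.
DEFAULT_TERM = timedelta(days=30)
MAX_TERM = timedelta(days=5 * 365)


@dataclass
class ExtensionRegistration:
    """Hypothetical registry record with the fields proposed above."""
    field_name: str        # name of the metadata field
    registrant: str        # individual or PyPI package
    contact_email: str     # published
    description: str       # English description of the field
    value_pattern: str     # regular expression validating the value
    registered: date
    expires: Optional[date] = None

    def __post_init__(self):
        # By default an extension expires one term after registration.
        if self.expires is None:
            self.expires = self.registered + DEFAULT_TERM

    def renew(self, today, until):
        """Extend the registration, at most MAX_TERM beyond today."""
        if until - today > MAX_TERM:
            raise ValueError("expiration may be at most 5 years ahead")
        self.expires = until

    def is_active(self, today):
        return today <= self.expires

    def validate(self, value):
        return re.fullmatch(self.value_pattern, value) is not None


reg = ExtensionRegistration(
    field_name="Requires-Unicode-Version",
    registrant="example-project",
    contact_email="maintainer@example.org",
    description="Minimum Unicode database version the package needs.",
    value_pattern=r"\d+\.\d+(\.\d+)?",
    registered=date(2012, 8, 28),
)
print(reg.expires)                       # 2012-09-27
print(reg.is_active(date(2012, 10, 1)))  # False
print(reg.validate("6.1.0"))             # True
```

There is deliberately no delete method, matching the rule that extensions cannot be deleted, only left to expire or superseded by a new one.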

On Tuesday, August 28, 2012 at 10:43 AM, "Martin v. Löwis" wrote:
PyPI packages themselves could serve as a registry, but I like the idea of a separate registry better in many ways because it lets you divorce the namespace from the package. The question being: would this be an x-registered-name type system or a registered-namespace-* type system? It occurs to me one problem with arbitrary namespaces is that there is an unintended collision problem. E.g. you have the foo-bar namespace and the foo namespace; what happens if you have a test key inside of foo-bar and a bar-test key inside of the foo namespace? They'll both end up being foo-bar-test. This makes me think that we need a separate registry, and that if we go the namespace route it should be limited to alphanumerics only so that you don't have the foo/foo-bar collision problem.
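The collision can be reproduced in a few lines of Python. The sketch below assumes namespace and key are joined with a single hyphen; `join_key`, `checked_join` and `split_key` are hypothetical helpers showing why alphanumeric-only namespaces make the split unambiguous.

```python
import re


# Hypothetical helper: namespace and key joined by a single hyphen.
def join_key(namespace, key):
    return f"{namespace}-{key}"


print(join_key("foo", "bar-test"))  # foo-bar-test
print(join_key("foo-bar", "test"))  # foo-bar-test (the same key: a collision)

# If namespaces may only contain alphanumerics, the first hyphen always ends
# the namespace, so every joined key splits back in exactly one way.
_NAMESPACE = re.compile(r"[A-Za-z0-9]+")


def checked_join(namespace, key):
    if not _NAMESPACE.fullmatch(namespace):
        raise ValueError(f"namespace must be alphanumeric: {namespace!r}")
    return join_key(namespace, key)


def split_key(full_key):
    namespace, _, key = full_key.partition("-")
    if not _NAMESPACE.fullmatch(namespace) or not key:
        raise ValueError(f"not a namespaced key: {full_key!r}")
    return namespace, key


print(split_key(checked_join("foo", "bar-test")))  # ('foo', 'bar-test')
```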
What happens when it expires? Is that name freed up for future use? I think that freeing up the name is likely to be a bad idea since we can't go backwards in time (as you alluded to later about not deleting them), so what does expiration do?

On Wed, Aug 29, 2012 at 12:53 AM, Donald Stufft <donald.stufft@gmail.com> wrote:
Please, don't. The software and infrastructure to run PyPI exists. Some level of namespacing makes sense to separate out extension management to different groups of people, but creating a whole management application just for this would be serious overkill. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tuesday, August 28, 2012 at 11:05 AM, Nick Coghlan wrote:
How do you deal with a PyPI package foo which wants a bar-test value (foo-bar-test), and a PyPI package foo-bar with a value test (foo-bar-test). PyPI packages allow too much in the way of names to be able to fully namespace it without collisions.

On 28.08.12 16:53, Donald Stufft wrote:
Maybe I didn't express myself clearly - this is exactly what I proposed. The registry would be implemented in the same software as PyPI, and run on the same machine, and (perhaps) have pypi.python.org as its domain name, but otherwise would be decoupled from Python packages.
What happens when it expires? Is that name freed up for future use?
Yes, exactly.
Why would it require going backwards in time? Existing usages of the extension just become invalid, e.g. with the consequence that you can't upload the package to PyPI anymore unless you remove the extension, or re-register it. If the extension is in active use, somebody certainly will make sure it stays registered. Expiration is to free up names that are not in active use, but are otherwise reasonable names for metadata fields (say, Requires-Unicode-Version). Regards, Martin

On Tuesday, August 28, 2012 at 11:07 AM, "Martin v. Löwis" wrote:
What do you do with packages that have already been uploaded with requires-unicode-version once it expires? If the point of a registry is to remove ambiguity from what any particular key means, won't expiring and allowing reregistration of an in use name (even if it's no longer being uploaded, but is still available inside of a package) reintroduce that same ambiguity? How will we know that requires-unicode-version from a package uploaded a year ago and has since expired is different than requires-unicode-version from a package uploaded yesterday and has been reregistered?

On Tue, 28 Aug 2012 11:17:08 -0400, Donald Stufft <donald.stufft@gmail.com> wrote:
Ah, that's a better phrasing of the same concern I had but couldn't figure out how to articulate. I don't recall any RFC registries that have expiration dates for entries. Are there any? RFC registries usually have an organization vetting the entries, whereas it seems like we want this to be an open registry. Note that the MIME-type specification allows for "vendor types", which is a namespace mechanism and allows delegation of vetting authority. That sounds more like Nick's proposal. (I'm sure there is some way to solve the ambiguity issue.) We could still have a (vetted) registry for "official" names, if we wanted. That would follow the MIME model. Or we can still have a separate registry, but only "qualified" (namespaced) names are open for anyone to register, without any expiration dates. -- R. David Murray If you like the work I do for Python, you can enable me to spend more time doing it by supporting me here: http://gittip.com/bitdancer

On 28.08.12 17:38, R. David Murray wrote:
I don't recall any RFC registries that have expiration dates for entries. Are there any?
The RFC database itself has expiration dates on specifications, namely on I-D documents (internet drafts). They expire 6 months after their initial publication, unless renewed. For number assignments, the risk is that it will eventually run out of numbers, in which case the protocol gets redesigned to increase the number space. For name assignments, the risk is that many similar-sounding elements become used, and people accept that as a trade-off for the problems you see in my expiration proposal. The most popular name registry that does have expiration (despite being hierarchical) is the DNS: you have to renew your names yearly in most TLDs. People apparently accept the risk of confusion when a domain expires and gets reused by someone else (and yes, the DNS *is* an "RFC registry" :-)
RFC registries usually have an organization vetting the entries, whereas it seems like we want this to be an open registry.
It very much depends. If you browse over the IANA registries, you find that many parameter spaces require "IETF consensus", so they can be extended only by RFC (similar to the status quo in metadata). There are IANA registries that are open (e.g. SNMP, or MIME); things are assigned in a first-come first-served manner (e.g. try to find out what 1.3.6.1.4.1.18832.11.3 is :-)
I don't consider it an absolute necessity that there is an expiration. I do consider it a flaw in (some) IANA name registrations that there is no expiration to them; I can report that people regularly want to claim some PyPI package name on the basis that the original owner didn't ever release any software under that name. Regards, Martin

On Tue, Aug 28, 2012 at 06:36:40PM +0200, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Does that expiration mean something? The draft for Web Proxy Autodiscovery Protocol[1] expired in 1999 but still is widely implemented and used. 1. https://en.wikipedia.org/wiki/Web_Proxy_Autodiscovery_Protocol Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Am 28.08.12 19:15, schrieb Oleg Broytman:
It's explained in RFC 2026. An internet draft is not an internet standard, it may get changed at any time. An I-D which is expired and still used has the same relevance as a proprietary standard; it has nothing to do with the internet standards process. Whether this has any practical consequence depends on the market, of course. Customers that insist on standards compliance will look for RFC compliance, but typically not for I-D compliance. If the field of standardization is of relevance for such users, they will eventually ask for an RFC to be issued, which then may or may not be compatible with a long-standing proprietary standard. Regards, Martin

On Tue, Aug 28, 2012 at 08:19:08PM +0200, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I see. Thank you! Oleg.

After this discussion it seemed wiser to submit my proposed 1.2 edits as Metadata 1.3, adding Provides-Extra, Setup-Requires-Dist, and Extension (with no defined registration procedure). This version is sure to be exciting as it also specifies that the values are UTF-8 with tolerant decoding and re-defines environment markers in terms of the ast module (is there a better way to specify a subset of Python?). The proposed Metadata 1.3 is at https://bitbucket.org/dholth/python-peps/changeset/8fa1de7478e95b5ef3a18c327... Thanks, Daniel Holth
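Defining markers in terms of the ast module could look roughly like the following sketch: parse the marker as a Python expression, then walk the tree and accept only a whitelisted subset (`and`, `or`, `==`, `!=`, `in`, `not in`, names, dotted names such as `sys.platform`, and string literals). The whitelist, the function names and the environment dictionary are assumptions for illustration, not the draft's text; the sketch needs Python 3.8+, where string literals parse as `ast.Constant`.

```python
import ast
import operator

# Hypothetical whitelist of comparison operators allowed in markers.
_COMPARATORS = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.In: lambda left, right: left in right,
    ast.NotIn: lambda left, right: left not in right,
}


def evaluate_marker(marker, environment):
    """Evaluate a marker string against a dict of environment values."""
    tree = ast.parse(marker, mode="eval")  # raises SyntaxError on bad input
    return _eval(tree.body, environment)


def _lookup(name, env):
    if name not in env:
        raise ValueError(f"unknown marker variable: {name}")
    return env[name]


def _eval(node, env):
    if isinstance(node, ast.BoolOp):
        values = [_eval(value, env) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare):
        left = _eval(node.left, env)
        for op, right_node in zip(node.ops, node.comparators):
            compare = _COMPARATORS.get(type(op))
            if compare is None:
                raise ValueError(f"operator not allowed: {type(op).__name__}")
            right = _eval(right_node, env)
            if not compare(left, right):
                return False
            left = right
        return True
    if isinstance(node, ast.Name):
        return _lookup(node.id, env)
    if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
        return _lookup(f"{node.value.id}.{node.attr}", env)  # e.g. sys.platform
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    # Calls, subscripts, arithmetic, numbers, etc. are all rejected.
    raise ValueError(f"not allowed in markers: {type(node).__name__}")


env = {"python_version": "2.7", "sys.platform": "linux2", "extra": "pdf"}
print(evaluate_marker("python_version == '2.7' and extra == 'pdf'", env))  # True
print(evaluate_marker("sys.platform in 'win32 cygwin'", env))             # False
```

Walking the tree instead of calling `eval()` is the point of the design: anything outside the whitelist, such as `__import__('os')`, is rejected before it can run.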

On 31.08.12 05:16, Daniel Holth wrote:
Thanks for doing this. A few comments:
1. -1 on "tolerant decoding". I think the format should clearly specify what fields are text (I think most of them are), and mandate that they be in UTF-8. If there is a need for binary data, they should be specified to be in base64 encoding (but I don't think any of the fields really are binary data).
2. The extensions section should discuss order. E.g. is it ok to write
    Chili-Type: Poblano
    Extension: Chili
    Platform: Basmati
    Extension: Garlic
    Chili-Heat: Mild
    Garlic-Size: 1tsp
3. There should be a specification of how collisions between extension fields and standard fields are resolved. E.g. if I have
    Extension: Home
    Home-page: http://www.python.org
is Home-page the extension field or the PEP 345 field? There are several ways to resolve this; I suggest giving precedence to the standard field (unless you specify that extensions must follow all standard fields, in which case you can drop the extension prefix from the extension keys).
4. There needs to be a discussion of the meta-syntax. PEP 314 still mentioned that this is RFC 822; PEP 345 dropped that and didn't say anything about the syntax of fields (i.e. not even that they are key-value, that the colon is a separator, that the keys are case-insensitive, etc).
Regards, Martin

On Friday, August 31, 2012 at 6:48 AM, "Martin v. Löwis" wrote:
Unless I'm mistaken (which I may be!) I believe that a / can be used as the separator between the namespace and the "real" key.
    Home-page: http://www.python.org
    Extension: Home
    Home/other-thing: Foo
Doing this, is the "Extension" field required?

On Aug 31, 2012, at 6:54 AM, Donald Stufft <donald.stufft@gmail.com> wrote:
Not bad.
Doing this is the "Extension" field required?
Yes, it is required. A simple lookup for data['extension'] tells you what to expect.

On Fri, 31 Aug 2012 07:01:17 -0400, Daniel Holth <dholth@gmail.com> wrote:
It also allows for typo detection, which automatically interpreting prefix strings as extension names would not. -- R. David Murray
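The cross-check that the explicit Extension field enables, including the typo detection mentioned above, can be sketched as follows. It assumes "/" as the separator, case-insensitive field names, and that field order does not matter; `group_extensions` is a hypothetical helper, and its input is a list of (name, value) pairs as an RFC 822 style parser would yield them.

```python
def group_extensions(fields):
    """Split (name, value) pairs into standard fields and per-extension lists.

    Hypothetical rules: extension fields look like "Namespace/Key", and each
    namespace must be declared by an "Extension:" field somewhere in the file.
    """
    # Collect declarations first, so ordering within the file does not matter.
    declared = {value.strip().lower() for name, value in fields
                if name.lower() == "extension"}
    standard, extensions = [], {ns: [] for ns in declared}
    for name, value in fields:
        if name.lower() == "extension":
            continue
        namespace, sep, key = name.partition("/")
        if not sep:
            standard.append((name, value))  # no "/": a standard field
        elif namespace.lower() in declared:
            extensions[namespace.lower()].append((key, value))
        else:
            # An undeclared prefix is most likely a typo, e.g. "Chilli/Heat".
            raise ValueError(f"undeclared extension namespace in {name!r}")
    return standard, extensions


fields = [
    ("Home-page", "http://www.python.org"),
    ("Extension", "Home"),
    ("Home/other-thing", "Foo"),
]
standard, ext = group_extensions(fields)
print(standard)  # [('Home-page', 'http://www.python.org')]
print(ext)       # {'home': [('other-thing', 'Foo')]}
```

Because "Home-page" contains no "/", the collision from the earlier Home example cannot arise in this scheme.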

On Fri, Aug 31, 2012 at 10:41 PM, R. David Murray <rdmurray@bitdance.com> wrote:
It also allows for typo detection, which automatically interpreting prefix strings as extension names would not.
+1 on retaining the explicit extension field, mainly for the cross-validation benefits (including easily checking which extension syntax is used by a module). However, also +1 on using "/" as the extension separator to avoid ambiguity in field names, as well as restoring the explicit requirement that metadata entries use valid RFC 822 metasyntax. If the precise rules can be articulated as a 3.3 email module policy, so much the better.
I've now pushed Daniel's latest draft as PEP 426. I added the following section on "Metadata Files", which restores some background info on the overall file format that went AWOL in v1.2:
-----------------------------------------------------------------------
Metadata Files
==============

The syntax defined in this PEP is for use with Python distribution metadata files. This file format is a single set of RFC-822 headers parseable by the ``rfc822`` or ``email`` modules. The field names listed in the `Fields`_ section are used as the header names.

There are two standard locations for these metadata files:

* the ``PKG-INFO`` file included in the base directory of Python source distribution archives (as created by the distutils ``sdist`` command)
* the ``dist-info/METADATA`` files in a Python installation database, as described in PEP 376.

Other tools involved in Python distribution may choose to record this metadata in additional tool-specific locations (e.g. as part of a binary distribution archive format).
-----------------------------------------------------------------------
As far as I know, the sdist archive format isn't actually defined anywhere beyond "archives like those created by the distutils sdist command".
Cheers, Nick.
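For the well-behaved case (ASCII values, no multi-line fields), the "parseable by the email module" claim in the section above can be checked with the standard library's `email.parser.HeaderParser` on Python 3. The field values below are made up for illustration; the UTF-8 and line-break caveats raised later in the thread are not handled here.

```python
from email.parser import HeaderParser

# Hypothetical PKG-INFO contents using fields discussed in this thread.
PKG_INFO = """\
Metadata-Version: 1.3
Name: example-dist
Version: 1.0
Provides-Extra: pdf
Provides-Extra: test
Requires-Dist: reportlab; extra == 'pdf'
Requires-Dist: nose; extra == 'test'
"""

# HeaderParser parses only the headers; its default policy is compat32.
msg = HeaderParser().parsestr(PKG_INFO)

print(msg["Name"])                    # example-dist
print(msg.get_all("Provides-Extra"))  # ['pdf', 'test']
# Field names are matched case-insensitively, as in RFC 822.
print(msg.get_all("requires-dist"))
```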

On 31.08.12 15:57, Nick Coghlan wrote:
Unfortunately, this conflicts with the desire to use UTF-8 in attribute values - RFC 822 (and also 2822) don't support this, but require the use of MIME instead (Q or B encoding). RFC 2822 also has a continuation line semantics which traditionally conflicts with the metadata; in particular, line breaks cannot be represented (but are interpreted as continuation lines instead). OTOH, several of the metadata fields do require line breaks, in particular those formatted as ReST. Regards, Martin

Some edits to include / and remove rfc822 again. What is the right email.policy.Policy()? https://bitbucket.org/dholth/python-peps/changeset/8ec6dd453ccbde6d34c63d2d2...

On Fri, 31 Aug 2012 12:18:05 -0400, Daniel Holth <dholth@gmail.com> wrote:
Some edits to include / and remove rfc822 again. What is the right email.policy.Policy()?
When I discussed using email to parse metadata with Tarek a long time ago, I thought he was going to move to using a delimiter-substitution algorithm to encode and recover the line breaks. Perhaps that discussion wasn't in this same context, but I thought it was. If you did that, then 'SMTP' would be the correct policy for RFC2822/5322. But that isn't really going to work for this use case, even with the above hack. As Martin pointed out, RFC2822 does not allow utf-8 in the values. RFC 5335, which is Experimental, does. A medium term goal of the email package is to support that RFC, so this might be a motivation to move that higher in my feature priority list. (Support mostly involves switches to allow unicode/utf8 to be *written*; the parsing side works already, though it is not thoroughly tested.) However, all that aside, to answer your question you are really going to want to define a custom policy that derives from email.policy.Policy. Especially if you want to not follow the email RFCs and do want to assign meaning to the line separators. You can do that with a custom policy and thus still be able to use the email parsing infrastructure to read and write the files. I'll be glad to help out with creating the custom policy once we've reached that stage of the process. -- R. David Murray

On Fri, Aug 31, 2012 at 12:53 PM, R. David Murray <rdmurray@bitdance.com> wrote:
Thanks. For the time being I am happily using the surrogateescape/bytesgenerator hack and it preserves UTF-8 and linebreaks. I don't have a strong opinion about the line continuation policy; I do not have code that relies on parsing the long description from PKG-INFO files.

"Martin v. Löwis" writes:
This can be achieved simply by extending the set of characters permitted, as MIME did for message bodies. I'd be cautious about RFC 5335, not just because it's experimental, but because there may be other requirements we don't want to mess with. (If RDM says otherwise, listen to him. I just know the RFC exists.)
Of course line breaks can be represented, without any further change to RFC 2822. Just use Unicode LINE SEPARATOR. You could even do it within ASCII by adhering strictly to RFC 2822 syntax which interprets continuation lines by removing exactly the CRLF pair. Just use ASCII TAB as the field separator. There's a final dodge that occurs to me: the semantics you're talking about are *lexical* semantics in the RFC 2822 context (line unfolding and RFC 2047 decoding). We could possibly in the context of the email module treat Metadata as an intermediate post-lexical-decoding pre-syntactic-analysis representation. I don't know if that makes sense in the context of using email module facilities to parse Metadata. Steve
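The ASCII variant of the delimiter-substitution idea (fold each line break as CRLF followed by TAB, let RFC 2822 unfolding delete only the CRLF, then turn each TAB back into a newline) can be sketched in plain Python. The function names are hypothetical; the sketch assumes the value contains no TAB characters (as argued for ReST), and it is not tied to any particular email policy.

```python
CRLF = "\r\n"


def fold_multiline(name, text):
    """Serialise a multi-line value: each line break becomes CRLF + TAB."""
    if "\t" in text:
        raise ValueError("value must not contain TAB characters")
    return f"{name}: " + (CRLF + "\t").join(text.split("\n")) + CRLF


def unfold_and_restore(raw_value):
    """Unfold as RFC 2822 does (delete only CRLF), then map TAB to newline."""
    return raw_value.replace(CRLF, "").replace("\t", "\n")


description = "Title\n=====\n\nSome *ReST* text."
header = fold_multiline("Description", description)
raw_value = header[len("Description: "):-len(CRLF)]
print(unfold_and_restore(raw_value) == description)  # True
```

Every physical line after the first starts with whitespace, so a conforming reader sees one folded field, while the surviving TABs carry the original line structure.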

On Sat, 01 Sep 2012 13:55:11 +0900, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
That is essentially what that RFC does. I haven't gone through it with a fine-tooth comb yet, but that's why I say the parsing side mostly works already: we allow unicode characters anywhere non-special-characters are allowed during parsing. The only issue is that we encode non-ASCII using the normal rules during serialization, so we need a new policy control to disable that. I'm thinking it will be an easy addition...the hard part for RFC5335 is doing that fine-tooth read and adding appropriate tests. Alternatively, as Donald pointed out, you can use the Binary mode, where the utf-8 bytes just go along for the ride. In the context of the metadata, I think that should produce the desired results, since there should be no need to re-wrap metadata lines. It will also preserve the line endings *if* you don't use the new policies. But that is why I would prefer to use explicit RFC5335 support...I'd like the email backward compatibility policy to go away some day :) (On the gripping hand, it will always be possible to recreate it as a custom policy.)
Yes, that is what I was talking to Tarek about. And since ReST source shouldn't contain tabs, a tab would probably work as the separator, if for some reason you didn't want to use LINE SEPARATOR.
The policy has hooks that support this. A policy gets handed the source line complete with the line breaks, determines what gets stored in the model, and also gets to control what gets handed back to the application when a header is retrieved from the model. The policy can also control the header folding during serialization. So preserving line separators using a custom policy is not only possible, but should be fairly easy. --David

On 31.08.12 12:54, Donald Stufft wrote:
What do you mean by "can be"? In RFC 822, a slash can be in a field-name, yes, but the PEPs recently became silent on the meta-syntax.
Well, in my example it would then be
    Home-page: http://www.python.org
    Home/page: Foo
I don't think the Extension field is necessary if there is a promise that standard fields won't ever include slashes. Regards, Martin

On Aug 31, 2012, at 6:48 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Ok. If you want you can check the version to decide how strict you want to be.
Ordering doesn't matter and collisions with existing tags are not allowed.
I think the new profile support for email Parser will handle this perfectly.
Regards, Martin

On 28.08.12 17:17, Donald Stufft wrote:
What do you do with packages that have already been uploaded with requires-unicode-version once it expires?
Who is "I" in this case? The PyPI installation? Mark the keys in the database as expired, and stop displaying them. If the key is restored, and the values are still syntactically correct, restore the values. Or is "I" software which downloads packages? Continue doing what it always does for invalid meta-data: I recommend to issue a warning; aborting the setup could also work.
No: if nobody renews the old registration, it's because the extension is not in use. So the case you are constructing won't happen in practice.
If the packages that were uploaded a year ago are still in active use, somebody will renew the registration. So the case won't happen. If nobody cares about the specific field, it may break, which is then well-deserved. Regards, Martin

On Tue, 28 Aug 2012 18:08:51 +0200, "Martin v. Löwis" <martin@v.loewis.de> wrote:
The problem Donald is asking about is: the old registration expires, and a *new* registration is entered with a different meaning, but packages still exist on PyPI that have the key with the old meaning. That seems likely to happen in practice. Or if it doesn't, then allowing for the recycling of names probably isn't important. -- R. David Murray

On 28.08.12 18:27, R. David Murray wrote:
Let me retry answering the question: Expiration *is* important in the case the key was just registered and never used, because it may be a good name for something, but can't be used because it is reserved for a use case that has no users. If the key is *widely* used, the scenario you assume is *not* likely in practice - either the original registrant will renew the registration before it expires, or somebody else will reregister it after it expires. There is also the case of a key that is used in a few packages (one or two packages seems a likely case - namely packages produced by the original registrant for the purpose of testing). Assuming the registrant then loses interest, and nobody else starts using the keys (i.e. they are not widely used), then these packages will break (in a mode that can be painted in different colors). This may happen, but I don't consider it a problem. If the original author finds the package broken, he will have to release a new version without these keys, or re-register them under a new name (since his original name is now taken by somebody else - who hopefully can attract more users with his definition of the key). There is also the potential risk of key-jacking, which can be resolved administratively (by revoking the abusive registration). Regards, Martin

On Tue, 28 Aug 2012 18:47:16 +0200, "Martin v. Löwis" <martin@v.loewis.de> wrote:
OK, I understand your logic now. Yes that does make sense to me. There are tradeoffs to be made, and this seems like a reasonable tradeoff given the goals articulated so far. -- R. David Murray

On Tue, Aug 28, 2012 at 6:29 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I agree with this point - the main reason the metadata PEP is still lingering at Accepted rather than Final is the tangled relationship between distutils and other projects that led to the complete distutils feature freeze. Until distutils2 makes it into the standard library as the packaging module, the standard library is going to be stuck at v1.1 of the metadata format.
However, this point I really don't agree with. The packaging ecosystem is currently evolving outside the standard library, but the standardisation process for the data interchange formats still falls under the authority of python-dev and the PEP process. If there are things missing from v1.2 of the metadata spec, then define v1.3 to address those known problems. Don't overengineer it in an attempt to anticipate every possible need that might come in the next decade. Tools outside the standard library are then free to adopt the new standard, even while the stdlib itself continues to lag behind. When the packaging module is finally added (hopefully 3.4, even if that means we have to temporarily cull the entire compiler subpackage), it will handle the most recent accepted version of the metadata format (as well as any previous versions). If more holes reveal themselves in the next 18 months, then it's OK if v1.4 is created when it becomes clear that it's necessary. At the very least, something v1.3 should make explicit is that custom metadata should NOT be put into the .dist-info/METADATA (PEP 376 location, PKG-INFO, in distutils terms) file. Instead, that data should be placed in a *separate* file in the .dist-info directory. Something that *may* be appropriate is a new field in METADATA that explicitly calls out such custom metadata files by naming the PyPI distribution that is the authority for the relevant format (e.g. "Custom-Metadata: wheel" to indicate that 'wheel' defined metadata is present) Cheers, Nick.

On Tue, Aug 28, 2012 at 9:04 PM, Daniel Holth <dholth@gmail.com> wrote:
Setuptools just uses path.exists() when it needs a particular file and will not bother parsing PKG-INFO at all if it can help it. The metadata edits for 1.2 fold some of those files into metadata.
You can't use path.exists() on metadata published by a webservice (or still inside a zipfile), but you can download or read the main metadata file. Still, I don't really care whether or not such a field indicating the presence of custom metadata is added, I'm mainly registering a strong -1 on allowing extension fields (in the form of X- headers or CSS style prefixed headers) in the metadata file itself. Cheers, Nick.

I personally think that at a minimum we should have X-Fields that get moved into the normal METADATA file, and personally I would prefer to just drop the X- prefix completely. I think any spec which doesn't include first class support for extending it with new metadata is going to essentially kick the can down the road and solve the problems of today without leaving room to solve the problems of tomorrow. I know that distutils2 has requires-dist, but for the sake of argument pretend it doesn't. If there is first class support for extending the metadata with new fields, a project could come along, and add a requires-dist (or x-requires-dist) concept to metadata. Tools that understand it would see that data and be able to act on it, tools that don't understand it would simply write it to the METADATA file in case in the future a tool that does understand it needs to act on it. Essentially first class support for extending the metadata outside of a PEP process means that outside of the stdlib people can experiment and try new things, existing tools will continue to work and just ignore that extra data (but leave it intact), new tools will be able to utilize it to do something useful. Ideally as a new concept is tested externally and begins to gain acceptance a new metadata version could be created that standardizes that field as part of the spec instead of an extension.
On Tuesday, August 28, 2012 at 7:45 AM, Nick Coghlan wrote:

On Tue, Aug 28, 2012 at 8:07 AM, Donald Stufft <donald.stufft@gmail.com> wrote:
That is my preference as well. The standard library basically ignores every metadata field or metadata file inside or outside of metadata currently, so where is the harm in changing the official document to read "you may add new metadata fields to metadata" with an updated standard library that only ignores some of the metadata in metadata instead of all of it? The community is small enough to handle it.

On Tue, Aug 28, 2012 at 10:28 PM, Daniel Holth <dholth@gmail.com> wrote:
I will campaign ardently against any such proposal. Any extension field must be clearly traceable to an authority that gets to define what it means to avoid a repeat of the setuptools debacle. Namespaces are a honkin' great idea, let's do more of those :P Cheers, Nick.

On 28.08.12 14:28, Daniel Holth wrote:
The problem with that (and the reason to introduce the X- prefix in RFC 822) is that allowing arbitrary additions will make evolution difficult: if you want to standardize a certain field at some point, you either need to pick a name that is unused in all implementations (which you never can be really certain about), or you break some existing tool by making the addition (unless the addition happens to have the exact same syntax and semantics as the prior use). Regards, Martin

On Tue, Aug 28, 2012 at 10:07 PM, Donald Stufft <donald.stufft@gmail.com> wrote:
Hell no. We've been down this road with setuptools and it *sucks*. Everybody gets confused, because you can't tell just by looking at a metadata file what's part of the standard and what's been added just because a tool developer thought it was a good idea without being able to obtain broad consensus (perhaps because others couldn't see the point until the extension had been field tested for an extended period). Almost *nobody* reads metadata specs other than the people that helped write them. Everyone else copies a file that works, and tweaks it to suit, or they use a tool that generates the metadata for them based on some other interface. The least-awful widespread extension approach I'm aware of is CSS vendor prefixes. X- headers suck because they only give you two namespaces - the "standard" namespace and the "extension" namespace. That means everyone is quickly forced back into seeking agreement and consensus to avoid naming conflicts for extension fields. However, I'm open to the idea of a properly namespaced extension mechanism, which is exactly why I suggested separate files flagged in the main metadata with the PyPI project that defines the format of those extensions. I'm also open to the idea of extensions appearing in [PyPI distribution] prefixed sections after the standard metadata so, for example, there could be a [wheel] section in METADATA rather than a separate WHEEL file. We already have a namespace registry in the form of PyPI, so there's no reason to invent a new one, and allowing *any* PyPI distribution to add custom metadata fields without name conflicts would allow easy experimentation while still making it clear which fields are defined in PEPs and which are defined by particular projects.
Agreed, and this is the kind of thing a v1.3 metadata PEP could define. It just needs to be properly namespaced, and the obvious namespacing mechanism is PyPI project names. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Aug 28, 2012 at 8:28 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Wheel deals with this somewhat by including a `Packager: bdist_wheel` line in WHEEL so that you can deal with packager-specific bugs. Bento uses indentation so you can have sections:

    Key: value
        Indented Key: value

How about this: extensions are fields that start with a PyPI-registered name followed by a hyphen. A file that contains extension fields declares them with `Extension: name`:

    Extension: pypiname
    pypiname-Field: value
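Daniel's proposal can be illustrated with a short Python sketch. It assumes the file is parsed with the standard-library `email` parser and that a field counts as an extension field only when its name starts with a prefix declared in an `Extension:` line. The helper name `group_extension_fields` is hypothetical.

```python
from email.parser import HeaderParser


def group_extension_fields(text):
    """Split METADATA-style text into standard and per-extension fields.

    Hypothetical helper: a field belongs to an extension only if its name
    starts with "<declared-name>-" for a name declared via "Extension:".
    """
    msg = HeaderParser().parsestr(text)
    declared = [name.lower() for name in msg.get_all("Extension", [])]
    standard = {}
    extensions = {name: {} for name in declared}
    for key, value in msg.items():
        if key.lower() == "extension":
            continue  # the declaration itself is not data
        for name in declared:
            prefix = name + "-"
            if key.lower().startswith(prefix):
                extensions[name][key[len(prefix):]] = value
                break
        else:
            # No declared prefix matched: treat it as a standard field.
            standard.setdefault(key, []).append(value)
    return standard, extensions


text = (
    "Metadata-Version: 1.3\n"
    "Name: example\n"
    "Extension: pypiname\n"
    "pypiname-Field: value\n"
)
standard, extensions = group_extension_fields(text)
print(extensions)  # {'pypiname': {'Field': 'value'}}
```

Because the check is a plain prefix match, overlapping declarations such as `foo` and `foo-bar` are matched in declaration order. The collision problem raised later in the thread applies directly to this scheme.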

On Tue, Aug 28, 2012 at 10:57 PM, Daniel Holth <dholth@gmail.com> wrote:
The repetition seems rather annoying. Compare the two section-based variants I just posted to:

    Extension: wheel
    wheel-Version: 0.9
    wheel-Packager: bdist_wheel-0.1
    wheel-Root-Is-Purelib: true

It does have the advantage that tools for manipulating the format can remain dumber, but that doesn't seem like *that* much of an advantage, especially since any such benefit could be eliminated completely by just switching to a completely standard ConfigParser format by putting the PEP-defined settings into a [python] section. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Aug 28, 2012 at 9:09 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Wheel is a little different because once it's installed it is no longer a wheel, but it makes a decent example. That's not even repetition, it's just longer tag names. Repetition is having one Classifier: line for every trove classifier. It would be quite inconvenient to change the parser for PKG-INFO. It's a win to keep the file flat.

On Tue, Aug 28, 2012 at 11:20 PM, Daniel Holth <dholth@gmail.com> wrote:
Cool, it's the namespace I care about. Every piece of extended metadata must have an authority who gets to define what it means. If that means people register a "virtual" PyPI project just to reserve an extension namespace, I'm fine with that. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tuesday, August 28, 2012 at 9:09 AM, Nick Coghlan wrote:
To be more specific, there is setup.cfg (which I dislike for other reasons), and then there is METADATA. setup.cfg is an ini file but METADATA is a simple key: value file with a flat namespace so any namespacing you want to do in METADATA needs to be done at the key level. You could translate: [setuptools] requires-dist=foo in a setup.cfg into setuptools-requires-dist: foo in METADATA, but I'm not sure if that would be beneficial or not.

On Tue, Aug 28, 2012 at 11:23 PM, Donald Stufft <donald.stufft@gmail.com> wrote:
We're talking about the format for v1.3 of the metadata. That format is not defined yet, so it's not obligatory for it to remain a flat key-value store. However, there are advantages to keeping it as such, so I'm fine with Daniel's suggested approach. The only thing I really care about is the namespacing, for the same reasons the IETF wrote RFC 6648, as Petri linked earlier [1]. Establishing proper name registration rules can categorically eliminate a bunch of problems further down the line (such as the past confusion between which metadata entries were defined by PEPs and which were setuptools-specific extensions that other tools might not understand). With PyPI-based namespacing we get clear orthogonal naming with clear lines of authority:

1. PEPs continue to define the core metadata used by PyPI, the standard library (once we get updated packaging support in place) and most other tools.
2. Any members of the community with a specific interest can register a PyPI project to define additional metadata without risking naming conflicts. This need may arise in the context of a specific project, and thus use that project's name, or else it may be a project registered for the express purpose of being a metadata namespace, and not actually correspond to any installable module.

The main point is to take advantage of an existing automated Python-specific name and resource registry to avoid naming conflicts without Java-style reverse-DNS-based clutter, and without python-dev having to explicitly approve each and every metadata extension.

[1] https://tools.ietf.org/html/rfc6648#section-4

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tuesday, August 28, 2012 at 9:41 AM, Nick Coghlan wrote:
I'm happy with any form of namespace, to be quite honest. I have a bit of a preference for no namespace (or a flat one), but I'm perfectly fine with a PyPI-based namespace. The important part is a defined way to extend the data so that, even when tools don't understand the extended data, they can losslessly move it around from setup.cfg/setup.py/whatever to METADATA and any other format, even if they themselves don't utilize it, leaving it intact for tools that _do_ utilize it.

On Tue, Aug 28, 2012 at 10:33 PM, Daniel Holth <dholth@gmail.com> wrote:
Right, but the problem with that is it's defining a couple of *new* namespaces to manage: - the filenames within dist_info (although uppercasing a PyPI project name is pretty safe) - the "Packager" field (bdist_wheel is a distutils command rather than a PyPI project) By using PyPI distribution names to indicate custom sections in the main metadata file, we would get to exploit an existing registry that enforces uniqueness without imposing significant overhead.
Yes, the main metadata file could definitely go that way. The three main ways I can see an extensible metadata format working are:

1. The way wheel currently works (separate WHEEL file, naming conflicts resolved largely by first-in-first-served with no official registry, no obvious indication which project defines the format)

2. PyPI as extension registry, with an ini-file inspired section syntax inside dist-info/METADATA

    <standard metadata must appear first>
    [wheel]
    Version: 0.9
    Packager: bdist_wheel-0.1
    Root-Is-Purelib: true

3. PyPI as extension registry, with an indented section syntax inside dist-info/METADATA

    <custom metadata sections may appear anywhere in the file>
    Extended-Metadata: wheel
        Version: 0.9
        Packager: bdist_wheel-0.1
        Root-Is-Purelib: true

My preference is currently for the ini-style variant, but I could definitely live with the indented approach. Either way, any project registered on PyPI would be free to add their own extensions without fear of naming conflicts or any doubts about the relevant authority for the meaning of the fields. Standard tools could just treat those sections as opaque blocks of text to be preserved verbatim, or else they could be constrained so that the standard tools could pick out the individual key:value pairs. Namespacing an extension mechanism based on PyPI distribution names should be pretty straightforward and it will mean that a lot of problems that can otherwise arise with extensible metadata systems should simply never come up. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
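Nick's ini-style variant (option 2) can be parsed with two stdlib parsers, one for the header block and one for the sections. This is a minimal sketch that assumes the layout described above: all standard fields come first, the first line beginning with `[` opens the extension sections, and no message body follows. The function name `split_metadata` is hypothetical.

```python
import configparser
from email.parser import HeaderParser


def split_metadata(text):
    """Parse standard headers followed by optional [project] sections.

    Assumes the hypothetical ini-style layout: all standard fields come
    first, and no message body follows the sections.
    """
    lines = text.splitlines(keepends=True)
    # The first line starting with "[" opens the first extension section.
    cut = next((i for i, line in enumerate(lines) if line.startswith("[")), len(lines))
    standard = HeaderParser().parsestr("".join(lines[:cut]))
    sections = configparser.ConfigParser(interpolation=None)
    sections.optionxform = str  # keep field names as written
    sections.read_string("".join(lines[cut:]))
    return standard, sections


text = (
    "Metadata-Version: 1.3\n"
    "Name: example\n"
    "[wheel]\n"
    "Version: 0.9\n"
    "Packager: bdist_wheel-0.1\n"
    "Root-Is-Purelib: true\n"
)
standard, sections = split_metadata(text)
print(standard["Name"], sections["wheel"]["Packager"])  # example bdist_wheel-0.1
```

A tool that does not understand `[wheel]` can still copy the section text verbatim. The "opaque block" option Nick mentions amounts to skipping the `configparser` step.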

On Tuesday, August 28, 2012 at 8:28 AM, Nick Coghlan wrote:
The biggest reason I have against namespacing them is that leaving them un-namespaced makes moving from experimental to standard easier, but I'm ok with some form of a namespace. The biggest reason I see against using PyPI names as the namespace is that it needlessly ties a piece of data to the original creator. Similarly, right now you could write a less hacky setuptools, but in order to do so you need to continue to use the setuptools package name (see distribute). Using PyPI names means that in the requires-dist example it would be something like setuptools-requires-dist, and even if I make my own tool that supports the same concept as setuptools's requires-dist I would need to use setuptools-requires-dist. The concept of metadata, I think, should be divorced from specific implementations. Obviously there are going to be some implementation-specific issues, but I think it's much cleaner to have an x-requires-dist that any implementation can use than to have whoever-invented-it-first-requires-dist or a twenty-different-forms-of-requires-dist.

Maybe I misphrased. By "accepted" I meant "widely implemented". From the day this gets published until it is really usable, I still believe 10 years is realistic. For example, setuptools doesn't implement Meta-data 1.2, and nearly nobody uses it, 8 years after it was written.
The problem is that flooding people with specifications is a guarantee that they will not get implemented. So we can have one metadata specification every ten years; if we have more, none of them will be implemented (except in the tool of the author of the PEP). Regards, Martin

On Tue, Aug 28, 2012 at 10:47 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Why not? You get the feature in the tool, and you don't get it elsewhere, but the other implementation can still parse what it understands. The tool author promotes his tool for this reason. The extension format is intentionally ugly so that people will standardize eventually, if only for aesthetic reasons. Yes, you have to support popular extensions forever; it's a messy world we live in. Two tools that implement Metadata 1.2+ are wheel and distribute.
    Extension: distribute
    Distribute-Provides-Extra: foo

(Just require un-hyphenated names in Extension:, or map them to underscore _ if you must.)

-1 on doing anything but mapping them to package names.

I can't provide a regex to strictly validate `Requires-Dist: foo ; condition` because condition is an impoverished subset of the Python language filtered with the wonderful ast module. Are you really willing to unpack and validate PKG-INFO on every archive that is uploaded to pypi?
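Daniel describes the marker language as a subset of Python filtered with the `ast` module. The following is a minimal sketch of that filtering approach, not the PEP's normative grammar. It assumes Python 3.8+ (string literals parse as `ast.Constant`), a hypothetical whitelist of node types, and an environment dict keyed by marker names such as `extra` and `sys.platform`. All function names are illustrative.

```python
import ast
import operator

# Hypothetical whitelist of syntax allowed in a marker (Python 3.8+ for
# ast.Constant). Anything else, such as calls or subscripts, is rejected.
_ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.Compare,
    ast.Eq, ast.NotEq, ast.In, ast.NotIn,
    ast.Name, ast.Attribute, ast.Load, ast.Constant,
)

_COMPARISONS = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.In: lambda left, right: left in right,
    ast.NotIn: lambda left, right: left not in right,
}


def _evaluate(node, environment):
    if isinstance(node, ast.BoolOp):
        results = [_evaluate(value, environment) for value in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.Compare):
        left = _evaluate(node.left, environment)
        for op, comparator in zip(node.ops, node.comparators):
            right = _evaluate(comparator, environment)
            if not _COMPARISONS[type(op)](left, right):
                return False
            left = right
        return True
    if isinstance(node, ast.Name):
        return environment[node.id]
    if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
        # Dotted marker names such as sys.platform are looked up as one key.
        return environment[f"{node.value.id}.{node.attr}"]
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    raise ValueError(f"unsupported marker expression: {ast.dump(node)}")


def evaluate_marker(marker, environment):
    """Reject any syntax outside the whitelist, then evaluate the marker."""
    tree = ast.parse(marker.strip(), mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"disallowed syntax in marker: {type(node).__name__}")
    return _evaluate(tree.body, environment)


def split_requirement(value):
    """Split "reportlab; extra == 'pdf'" into (requirement, marker or None)."""
    requirement, sep, marker = value.partition(";")
    return requirement.strip(), (marker.strip() if sep else None)


requirement, marker = split_requirement("reportlab; extra == 'pdf'")
print(requirement, evaluate_marker(marker, {"extra": "pdf"}))  # reportlab True
```

The whitelist check runs over the whole tree before anything is evaluated, so an input like `__import__('os')` is rejected without being executed. That is the property that makes this safer than calling `eval`.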

Am 28.08.12 17:30, schrieb Daniel Holth:
Are you really willing to unpack and validate PKG-INFO on every archive that is uploaded to pypi?
Users should run the "register" command, which will provide the metadata information. Also, the UI needs to be extended to allow users to fill out and edit metadata information interactively, or upload it. And yes, PyPI already extracts (but currently doesn't further process) PKG-INFO from every archive that is uploaded. So PyPI absolutely needs to "know" about the meta-data. Regards, Martin

On Mon, Aug 27, 2012 at 10:56 AM, Daniel Holth <dholth@gmail.com> wrote:
Somehow I completely overlooked this thread until now. Thanks Daniel for getting the ball rolling on this. There have already been many bytes spilled on metadata extensions, and although I agree it would be enormously useful to build an extension mechanism into the metadata format, I don't have much riding on that, or much more to add that hasn't been said. There hasn't been much said about Setup-Requires-Dist, so I'm guessing it's uncontroversial. But since that's sort of my hobbyhorse I thought I would make a comment on it. The thing I love about the Setup-Requires-Dist feature is that, if properly supported by different installers, it can free those installers from a fair bit of responsibility. For example, it greatly simplifies the thorny issue of "compilers". The existing compiler support in distutils, while not without its problems, does work in most cases for building common C-extensions. distutils2 has already made some progress on cleaning up the interface for compilers, and making it easier to register new compiler classes that can be imported from an arbitrary package. This allows projects with special needs (such as Fortran compiler support) to ship their own compiler class with the project. Or if there's a good enough third-party package that provides Fortran compiler support, projects may use it in their build process. Support for Setup-Requires-Dist ensures that a third-party compiler package can be made available at build-time. What's great about this is that even if the stdlib still includes a build system, it doesn't necessarily have to anticipate every possible need for building every kind of project (it should, at a minimum, be able to build pure-Python projects). If someone wants to add MSVC2012 support they can do that as a third-party package. One could even create "compilers" for other build systems like waf, or even provide an entry-point to meta-build systems like bento. Am I making sense? Erik

Add to metadata 1.3: Description-File: README(\..+)? Meaning the description should be read from a file in the same directory as PKG-INFO or METADATA (including in the .dist-info directories) and we strongly recommend it be named as README.* and be utf-8 encoded text. Description: is the only multi-line field in the metadata. It is almost never needed at runtime. It would be great for performance and simplify the parser to just put it in another file. Mutually exclusive with Description. May beg for a Summary: tag with a one-line description.
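A reader for the proposed field might look like the following minimal sketch. It uses the `README(\..+)?` pattern from the proposal and treats UTF-8 as mandatory rather than merely recommended. The extra rule rejecting path separators is an added assumption, not part of the proposal, and the names `DESCRIPTION_FILE` and `read_description` are hypothetical.

```python
import re
from pathlib import Path, PurePath

# Pattern from the proposal: "README", optionally followed by a suffix.
DESCRIPTION_FILE = re.compile(r"README(\..+)?")


def read_description(metadata_dir, filename):
    """Read a Description-File that sits next to PKG-INFO or METADATA."""
    # Extra assumption: reject path separators so the name cannot leave the
    # metadata directory (".+" alone would accept "README.x/../other").
    if PurePath(filename).name != filename or not DESCRIPTION_FILE.fullmatch(filename):
        raise ValueError(f"unexpected description file name: {filename!r}")
    # This sketch requires UTF-8 rather than just recommending it.
    return (Path(metadata_dir) / filename).read_text(encoding="utf-8")
```

Keeping the description out of the main file means a dependency resolver that only parses METADATA never has to read or decode it.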

On Fri, Sep 14, 2012 at 12:30 PM, Daniel Holth <dholth@gmail.com> wrote:
Can we make Description-File multiple-use? The meaning of this would be that the Description is formed from concatenating each Description-File in order. That raises the question: Is ordering guaranteed for multiple-use fields? I ask because distutils2 supports exactly such a feature, and I've found it useful. For example, if I have a README.rst and a CHANGELOG.rst I can specify:

    description-file = README.rst CHANGELOG.rst

Then the full description contains my readme and my changelog, which look nice together on PyPI but which I prefer to keep as separate files in the source. My only other concern is that if the value of this field can theoretically be arbitrary, it could conflict with other .dist-info files. Does the .dist-info format allow subdirectories? Placing description-files in a subdirectory of .dist-info could be a reasonable workaround. Erik
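Erik's multiple-use variant depends on field order. The following is a minimal sketch, assuming the stdlib `email` parser is used. That parser does keep header order in practice, but no metadata spec is assumed to guarantee it. The blank-line separator between files and the helper name `build_description` are illustrative choices.

```python
from email.parser import HeaderParser
from pathlib import Path


def build_description(metadata_text, base_dir, separator="\n"):
    """Concatenate every Description-File in the order the fields appear.

    Relies on the stdlib email parser keeping header order, which it does in
    practice; no metadata spec is assumed to guarantee it.
    """
    msg = HeaderParser().parsestr(metadata_text)
    names = msg.get_all("Description-File", [])
    parts = [Path(base_dir, name).read_text(encoding="utf-8") for name in names]
    return separator.join(parts)
```

With a README.rst containing `Readme\n` and a CHANGELOG.rst containing `Changes\n`, listing them in that order yields `Readme\n\nChanges\n`. Swapping the two fields swaps the output.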

On Fri, Sep 14, 2012 at 1:43 PM, Erik Bray <erik.m.bray@gmail.com> wrote:
The .dist-info design asks for every metadata file (the one in all caps, not any of the other metadata in .dist-info) to be parsed for many packaging operations that do not require the description, such as resolving the dependency graph of a package. Description-File would give an installer the option to pull Description: out into Description-File:. I would expect the concatenation to happen before this point. I would like to forbid subdirectories in .dist-info but I think they are allowed. The order of multi-use fields is probably preserved. I don't think it is required to be by any spec.

On Fri, Sep 14, 2012 at 1:57 PM, Daniel Holth <dholth@gmail.com> wrote:
I understand now. In this case why even allow flexibility in the description file name? Just make it description.txt, and the Description-File field just some boolean indicator of whether or not a description file exists? Erik

On Fri, Sep 14, 2012 at 2:03 PM, Donald Stufft <donald.stufft@gmail.com> wrote:
OK. In practice, Description: is the only field that is likely to be hundreds of lines long which seems wasteful. The parser complexity is a non-issue. (I know the PEP says Description: should not be the instruction manual, but it is wrong because that is the way to get useful data up on a package's pypi page.)

On 14 September 2012 17:30, Daniel Holth <dholth@gmail.com> wrote:
we strongly recommend it be named as README.* and be utf-8 encoded text.
I'd very strongly recommend that the encoding should be mandated. I'm happy with UTF-8 (although I expect a lot of "accidental" latin-1 files, particularly from Windows users) but I think it would be a mistake to have the specification leaving the encoding of *any* string data unclear. Without the encoding being specified, either in the metadata itself ("Description-File-Encoding: utf8"? Please, no) or by the specification mandating it, how is a program expected to read the description? Other than this point, I have no opinion on the proposal. Paul.

Am 27.08.12 16:56, schrieb Daniel Holth:
You can't add new fields to the format after the fact, unless the format had provided for such additions (which it does not - there is no mention of custom fields anywhere, and no elaboration on how "unknown" fields should be processed). So if you want to add new fields, you need to create a new version of the metadata. Prepare for a ten-year period of acceptance - so it would be good to be sure that no further additions are desired within the next ten years before seeking approval for the PEP. Regards, Martin

On Mon, Aug 27, 2012 at 4:29 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I don't know of a tool that doesn't reliably ignore extra fields, but I will put you down as being in favor of an X- fields paragraph:

Extensions (X- Fields)
::::::::::::::::::::::

Metadata files can contain fields that are not part of the specification, called *extensions*. These fields start with `X-`.

Daniel Holth wrote:
See RFC 6648 for why such X-fields may not be a good idea: http://tools.ietf.org/html/rfc6648

Petri Lehtinen writes:
But note that the RFC also says that the preferred solution to the problem that X-fields are intended to solve is an easily accessible name registry and a simple registration procedure. If Martin's "be prepared for a ten-year period to acceptance" is serious, what should be done about such a registry?

I'm happy for PyPI to host such a registry. A specification for the registry should be part of the PEP for the 1.3 format, but I would propose this structure (without having researched in detail what other registries feature, but with a rough idea what IANA registries typically include):

- name of metadata field
- name of registrant (individual or PyPI package)
- contact email address (published)
- expiration date; by default, extensions expire 1 month after their registration, unless renewed; maximum expiration time is 5 years
- English description of the field
- regular expression to validate the field

Deleting undesired extensions would not be possible; instead, one would have to create another extension if the syntax or semantics change. Regards, Martin
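Martin's proposed registry entry can be modeled as a small record. This is a minimal sketch: the class and attribute names are hypothetical, and "1 month" and "5 years" are approximated as 30 and 5 × 365 days. Measuring the 5-year cap from the registration or renewal date is also an assumption.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Terms from Martin's proposal, approximated in days.
DEFAULT_TERM = timedelta(days=30)   # "expire 1 month after their registration"
MAX_TERM = timedelta(days=5 * 365)  # "maximum expiration time is 5 years"


@dataclass
class ExtensionRegistration:
    """One hypothetical registry entry; attribute names are illustrative."""
    field_name: str
    registrant: str       # individual or PyPI package
    contact_email: str    # published
    description: str      # English description of the field
    value_pattern: str    # regular expression that valid values must match
    registered: date
    expires: Optional[date] = None

    def __post_init__(self):
        if self.expires is None:
            self.expires = self.registered + DEFAULT_TERM
        if self.expires - self.registered > MAX_TERM:
            raise ValueError("expiration exceeds the 5-year maximum")

    def is_active(self, today):
        return today <= self.expires

    def renew(self, today, term=DEFAULT_TERM):
        if term > MAX_TERM:
            raise ValueError("renewal term exceeds the 5-year maximum")
        self.expires = today + term

    def validate(self, value):
        return re.fullmatch(self.value_pattern, value) is not None
```

There is deliberately no delete operation, matching the proposal. The sketch does not address what happens to already-uploaded packages once an entry expires; that question is taken up further down the thread.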

On Tuesday, August 28, 2012 at 10:43 AM, "Martin v. Löwis" wrote:
The PyPI package index itself could serve as a registry, but I like the idea of a separate registry better in many ways because it lets you divorce the namespace from the package. The question being: would this be an x-registered-name type system or a registered-namespace-* type system? It occurs to me that one problem with arbitrary namespaces is an unintended collision problem. E.g. you have the foo-bar namespace and the foo namespace; what happens if you have a test key inside of foo-bar and a bar-test key inside of the foo namespace? They'll both end up being foo-bar-test. This makes me think that we need a separate registry and that if we go the namespace route it should be limited to alphanumerics only so that you don't have the foo/foo-bar collision problem.
What happens when it expires? Is that name freed up for future use? I think that freeing up the name is likely to be a bad idea since we can't go backwards in time (as you alluded to later about not deleting them), so what does expiration do?

On Wed, Aug 29, 2012 at 12:53 AM, Donald Stufft <donald.stufft@gmail.com> wrote:
Please, don't. The software and infrastructure to run PyPI exists. Some level of namespacing makes sense to separate out extension management to different groups of people, but creating a whole management application just for this would be serious overkill. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tuesday, August 28, 2012 at 11:05 AM, Nick Coghlan wrote:
How do you deal with a PyPI package foo which wants a bar-test value (foo-bar-test), and a PyPI package foo-bar with a value test (foo-bar-test). PyPI packages allow too much in the way of names to be able to fully namespace it without collisions.
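Donald's collision can be shown in a few lines. The helpers below are hypothetical. They compare the hyphen-joined key with Donald's alphanumeric-only restriction and with the `/` separator that comes up later in the thread. The `/` form stays unambiguous only if namespaces can never contain `/`, which is assumed here for PyPI project names.

```python
import re


def hyphen_key(namespace, field):
    """Flat key as in "pypiname-Field"."""
    return f"{namespace}-{field}"


def slash_key(namespace, field):
    """Alternative key using "/" as the namespace separator."""
    return f"{namespace}/{field}"


def is_alnum_namespace(namespace):
    """Donald's suggested restriction: namespaces are alphanumeric only."""
    return re.fullmatch(r"[A-Za-z0-9]+", namespace) is not None


# Package "foo" with field "bar-test" vs. package "foo-bar" with field "test":
print(hyphen_key("foo", "bar-test") == hyphen_key("foo-bar", "test"))  # True
print(slash_key("foo", "bar-test") == slash_key("foo-bar", "test"))    # False
```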

Am 28.08.12 16:53, schrieb Donald Stufft:
Maybe I didn't express myself clearly - this is exactly what I proposed. The registry would be implemented in the same software as PyPI, and run on the same machine, and (perhaps) have pypi.python.org as its domain name, but otherwise would be decoupled from Python packages.
What happens when it expires? Is that name freed up for future use?
Yes, exactly.
Why would it require going backwards in time? Existing usages of the extension just become invalid, e.g. with the consequence that you can't upload the package to PyPI anymore unless you remove the extension, or re-register it. If the extension is in active use, somebody certainly will make sure it stays registered. Expiration is to free up names that are not in active use, but are otherwise reasonable names for metadata fields (say, Requires-Unicode-Version). Regards, Martin

On Tuesday, August 28, 2012 at 11:07 AM, "Martin v. Löwis" wrote:
What do you do with packages that have already been uploaded with requires-unicode-version once it expires? If the point of a registry is to remove ambiguity from what any particular key means, won't expiring and allowing reregistration of an in use name (even if it's no longer being uploaded, but is still available inside of a package) reintroduce that same ambiguity? How will we know that requires-unicode-version from a package uploaded a year ago and has since expired is different than requires-unicode-version from a package uploaded yesterday and has been reregistered?

On Tue, 28 Aug 2012 11:17:08 -0400, Donald Stufft <donald.stufft@gmail.com> wrote:
Ah, that's a better phrasing of the same concern I had but couldn't figure out how to articulate. I don't recall any RFC registries that have expiration dates for entries. Are there any? RFC registries usually have an organization vetting the entries, whereas it seems like we want this to be an open registry. Note that the MIME-type specification allows for "vendor types", which is a namespace mechanism and allows delegation of vetting authority. That sounds more like Nick's proposal. (I'm sure there is some way to solve the ambiguity issue.) We could still have a (vetted) registry for "official" names, if we wanted. That would follow the MIME model. Or we can still have a separate registry, but only "qualified" (namespaced) names are open for anyone to register, without any expiration dates. -- R. David Murray If you like the work I do for Python, you can enable me to spend more time doing it by supporting me here: http://gittip.com/bitdancer

Am 28.08.12 17:38, schrieb R. David Murray:
I don't recall any RFC registries that have expiration dates for entries. Are there any?
The RFC database itself has expiration dates on specifications, namely on I-D documents (internet drafts). They expire 6 months after their initial publication, unless renewed. For number assignments, the risk is that it will eventually run out of numbers, in which case the protocol gets redesigned to increase the number space. For name assignments, the risk is that many similar-sounding elements become used, and people accept that as a trade-off for the problems you see in my expiration proposal. The most popular name registry that does have expiration (despite being hierarchical) is the DNS: you have to renew your names yearly in most TLDs. People apparently accept the risk of confusion when a domain expires and gets reused by someone else (and yes, the DNS *is* an "RFC registry" :-)
RFC registries usually have an organization vetting the entries, whereas it seems like we want this to be an open registry.
It very much depends. If you browse over the IANA registries, you find that many parameter spaces require "IETF consensus", so they can be extended only by RFC (similar to the status quo in metadata). There are IANA registries that are open (e.g. SNMP, or MIME); things are assigned in a first-come first-served manner (e.g. try to find out what 1.3.6.1.4.1.18832.11.3 is :-)
I don't consider it an absolute necessity that there is an expiration. I do consider it a flaw in (some) IANA name registrations that there is no expiration to them; I can report that people regularly want to claim some PyPI package name on the basis that the original owner didn't ever release any software under that name. Regards, Martin

On Tue, Aug 28, 2012 at 06:36:40PM +0200, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Does that expiration mean something? The draft for Web Proxy Autodiscovery Protocol[1] expired in 1999 but still is widely implemented and used. 1. https://en.wikipedia.org/wiki/Web_Proxy_Autodiscovery_Protocol Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Am 28.08.12 19:15, schrieb Oleg Broytman:
It's explained in RFC 2026. An internet draft is not an internet standard, it may get changed at any time. An I-D which is expired and still used has the same relevance as a proprietary standard; it has nothing to do with the internet standards process. Whether this has any practical consequence depends on the market, of course. Customers that insist on standards compliance will look for RFC compliance, but typically not for I-D compliance. If the field of standardization is of relevance for such users, they will eventually ask for an RFC to be issued, which then may or may not be compatible with a long-standing proprietary standard. Regards, Martin

On Tue, Aug 28, 2012 at 08:19:08PM +0200, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I see. Thank you! Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

After this discussion it seemed wiser to submit my proposed 1.2 edits as Metadata 1.3, adding Provides-Extra, Setup-Requires-Dist, and Extension (with no defined registration procedure). This version is sure to be exciting as it also specifies that the values are UTF-8 with tolerant decoding and re-defines environment markers in terms of the ast module (is there a better way to specify a subset of Python?). The proposed Metadata 1.3 is at https://bitbucket.org/dholth/python-peps/changeset/8fa1de7478e95b5ef3a18c327... Thanks, Daniel Holth

Am 31.08.12 05:16, schrieb Daniel Holth:
Thanks for doing this. A few comments:

1. -1 on "tolerant decoding". I think the format should clearly specify what fields are text (I think most of them are), and mandate that they be in UTF-8. If there is a need for binary data, they should be specified to be in base64 encoding (but I don't think any of the fields really are binary data).

2. The extensions section should discuss order. E.g. is it ok to write

    Chili-Type: Poblano
    Extension: Chili
    Platform: Basmati
    Extension: Garlic
    Chili-Heat: Mild
    Garlic-Size: 1tsp

3. There should be a specification of how collisions between extension fields and standard fields are resolved. E.g. if I have

    Extension: Home
    Home-page: http://www.python.org

is Home-page the extension field or the PEP 345 field? There are several ways to resolve this; I suggest giving precedence to the standard field (unless you specify that extensions must follow all standard fields, in which case you can drop the extension prefix from the extension keys).

4. There needs to be a discussion of the meta-syntax. PEP 314 still mentioned that this is RFC 822; PEP 345 dropped that and didn't say anything about the syntax of fields (i.e. not even that they are key-value, that the colon is a separator, that the keys are case-insensitive, etc).

Regards, Martin
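Martin's suggested rule for point 3, giving precedence to the standard field, can be sketched as a lookup that checks the known field names before any extension prefix. The set below lists the PEP 345 field names in lower case, since keys are compared case-insensitively here. Collecting the declared extensions up front makes the result independent of where the `Extension:` lines appear (point 2), but that choice is an assumption of this sketch, not part of the proposal. The function name `classify_field` is hypothetical.

```python
# PEP 345 field names, lower-cased for case-insensitive comparison.
STANDARD_FIELDS = {
    "metadata-version", "name", "version", "platform", "supported-platform",
    "summary", "description", "keywords", "home-page", "download-url",
    "author", "author-email", "maintainer", "maintainer-email", "license",
    "classifier", "requires-dist", "provides-dist", "obsoletes-dist",
    "requires-python", "requires-external", "project-url",
}


def classify_field(key, declared_extensions):
    """Return (owner, field) for a metadata key.

    Standard fields win over a declared extension prefix; otherwise the
    first declared extension whose "<name>-" prefix matches owns the key.
    """
    lower = key.lower()
    if lower in STANDARD_FIELDS:
        return "standard", key
    for extension in declared_extensions:
        prefix = extension.lower() + "-"
        if lower.startswith(prefix):
            return extension, key[len(prefix):]
    return "unknown", key
```

Under this rule `Home-page` always means the PEP 345 field. An extension named `Home` can never define a field called `page`.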

On Friday, August 31, 2012 at 6:48 AM, "Martin v. Löwis" wrote:
Unless I'm mistaken (which I may be!), I believe that a / can be used as the separator between the namespace and the "real" key:

    Home-page: http://www.python.org
    Extension: Home
    Home/other-thing: Foo

Doing this, is the "Extension" field required?

On Aug 31, 2012, at 6:54 AM, Donald Stufft <donald.stufft@gmail.com> wrote:
Not bad.
Doing this is the "Extension" field required?
Yes, it is required. A simple lookup for data['extension'] tells you what to expect.

On Fri, 31 Aug 2012 07:01:17 -0400, Daniel Holth <dholth@gmail.com> wrote:
It also allows for typo detection, which automatically interpreting prefix strings as extension names would not. -- R. David Murray If you like the work I do for Python, you can enable me to spend more time doing it by supporting me here: http://gittip.com/bitdancer
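Donald's `/` separator combined with a mandatory `Extension:` declaration gives the typo detection David describes. The following is a minimal sketch, assuming the stdlib `email` parser, which accepts `/` in field names. Any namespaced key whose namespace was never declared is reported instead of being silently accepted. The function name `parse_slash_extensions` is hypothetical.

```python
from email.parser import HeaderParser


def parse_slash_extensions(text):
    """Collect 'Namespace/field' entries and flag undeclared namespaces."""
    msg = HeaderParser().parsestr(text)
    declared = {name.lower() for name in msg.get_all("Extension", [])}
    extensions, undeclared = {}, []
    for key, value in msg.items():
        namespace, sep, field = key.partition("/")
        if not sep:
            continue  # no "/" in the key: a standard field
        if namespace.lower() in declared:
            extensions.setdefault(namespace.lower(), {})[field] = value
        else:
            undeclared.append(key)  # probably a typo in the namespace
    return extensions, undeclared


text = (
    "Home-page: http://www.python.org\n"
    "Extension: Home\n"
    "Home/other-thing: Foo\n"
    "Hmoe/typo: Bar\n"
)
print(parse_slash_extensions(text))
# ({'home': {'other-thing': 'Foo'}}, ['Hmoe/typo'])
```

Because `Home-page` contains no `/`, it can no longer be confused with a field of the `Home` extension. This also resolves Martin's collision example without a precedence rule.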

On Fri, Aug 31, 2012 at 10:41 PM, R. David Murray <rdmurray@bitdance.com> wrote:
It also allows for typo detection, which automatically interpreting prefix strings as extension names would not.
+1 on retaining the explicit extension field, mainly for the cross-validation benefits (including easily checking which extension syntax is used by a module). However, also +1 on using "/" as the extension separator to avoid ambiguity in field names, as well as restoring the explicit requirement that metadata entries use valid RFC 822 metasyntax. If the precise rules can be articulated as a 3.3 email module policy, so much the better.

I've now pushed Daniel's latest draft as PEP 426. I added the following section on "Metadata Files", which restores some background info on the overall file format that went AWOL in v1.2:

-----------------------------------------------------------------------
Metadata Files
==============

The syntax defined in this PEP is for use with Python distribution metadata files. This file format is a single set of RFC-822 headers parseable by the ``rfc822`` or ``email`` modules. The field names listed in the `Fields`_ section are used as the header names.

There are two standard locations for these metadata files:

* the ``PKG-INFO`` file included in the base directory of Python source distribution archives (as created by the distutils ``sdist`` command)
* the ``dist-info/METADATA`` files in a Python installation database, as described in PEP 376.

Other tools involved in Python distribution may choose to record this metadata in additional tool-specific locations (e.g. as part of a binary distribution archive format).
-----------------------------------------------------------------------

As far as I know, the sdist archive format isn't actually defined anywhere beyond "archives like those created by the distutils sdist command". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Am 31.08.12 15:57, schrieb Nick Coghlan:
Unfortunately, this conflicts with the desire to use UTF-8 in attribute values - RFC 822 (and also 2822) don't support this, but require the use of MIME instead (Q or B encoding). RFC 2822 also has continuation-line semantics which traditionally conflict with the metadata; in particular, line breaks cannot be represented (but are interpreted as continuation lines instead). OTOH, several of the metadata fields do require line breaks, in particular those formatted as ReST. Regards, Martin

Some edits to include / and remove rfc822 again. What is the right email.policy.Policy()? https://bitbucket.org/dholth/python-peps/changeset/8ec6dd453ccbde6d34c63d2d2...

On Fri, 31 Aug 2012 12:18:05 -0400, Daniel Holth <dholth@gmail.com> wrote:
Some edits to include / and remove rfc822 again. What is the right email.policy.Policy()?
When I discussed using email to parse metadata with Tarek a long time ago, I thought he was going to move to using a delimiter-substitution algorithm to encode and recover the line breaks. Perhaps that discussion wasn't in this same context, but I thought it was. If you did that, then 'SMTP' would be the correct policy for RFC2822/5322. But that isn't really going to work for this use case, even with the above hack. As Martin pointed out, RFC2822 does not allow utf-8 in the values. RFC 5335, which is Experimental, does. A medium-term goal of the email package is to support that RFC, so this might be a motivation to move that higher in my feature priority list. (Support mostly involves switches to allow unicode/utf8 to be *written*; the parsing side works already, though it is not thoroughly tested.) However, all that aside, to answer your question you are really going to want to define a custom policy that derives from email.policy.Policy. Especially if you want to not follow the email RFCs and do want to assign meaning to the line separators. You can do that with a custom policy and thus still be able to use the email parsing infrastructure to read and write the files. I'll be glad to help out with creating the custom policy once we've reached that stage of the process. -- R. David Murray If you like the work I do for Python, you can enable me to spend more time doing it by supporting me here: http://gittip.com/bitdancer

On Fri, Aug 31, 2012 at 12:53 PM, R. David Murray <rdmurray@bitdance.com> wrote:
Thanks. For the time being I am happily using the surrogateescape/bytesgenerator hack and it preserves UTF-8 and linebreaks. I don't have a strong opinion about the line continuation policy; I do not have code that relies on parsing the long description from PKG-INFO files.
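Daniel does not spell out the hack. One plausible reading, sketched here purely as an assumption: parse the PKG-INFO bytes with `email.parser.BytesParser` (which carries non-ASCII bytes internally as `surrogateescape` surrogates) and write them back with `email.generator.BytesGenerator` under the `compat32` policy, which turns those surrogates back into the original bytes. The helper names are hypothetical.

```python
import io
from email.generator import BytesGenerator
from email.message import Message
from email.parser import BytesParser
from email.policy import compat32


def parse_pkg_info(raw: bytes) -> Message:
    # Hypothetical helper: non-ASCII (e.g. UTF-8) bytes become surrogate escapes.
    return BytesParser(policy=compat32).parsebytes(raw)


def serialize_pkg_info(msg: Message) -> bytes:
    # Hypothetical helper: surrogates are re-encoded to the original bytes.
    out = io.BytesIO()
    # mangle_from_=False leaves body lines starting with "From " untouched.
    BytesGenerator(out, mangle_from_=False, policy=compat32).flatten(msg)
    return out.getvalue()


raw = (
    "Metadata-Version: 1.2\n"
    "Name: example\n"
    "Author: Martin v. Löwis\n"
    "Requires-Dist: reportlab; extra == 'pdf'\n"
    "Requires-Dist: nose; extra == 'test'\n"
    "\n"
    "Long description mentioning Löwis.\n"
).encode("utf-8")

msg = parse_pkg_info(raw)
print(msg.get_all("Requires-Dist"))
print(serialize_pkg_info(msg) == raw)  # True: the UTF-8 bytes survive unchanged
```

This preserves bytes rather than interpreting them, which is why it sidesteps the RFC 2822 encoding question instead of answering it.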

"Martin v. Löwis" writes:
This can be achieved simply by extending the set of characters permitted, as MIME did for message bodies. I'd be cautious about RFC 5335, not just because it's experimental, but because there may be other requirements we don't want to mess with. (If RDM says otherwise, listen to him. I just know the RFC exists.)
Of course line breaks can be represented, without any further change to RFC 2822. Just use Unicode LINE SEPARATOR. You could even do it within ASCII by adhering strictly to RFC 2822 syntax which interprets continuation lines by removing exactly the CRLF pair. Just use ASCII TAB as the field separator. There's a final dodge that occurs to me: the semantics you're talking about are *lexical* semantics in the RFC 2822 context (line unfolding and RFC 2047 decoding). We could possibly in the context of the email module treat Metadata as an intermediate post-lexical-decoding pre-syntactic-analysis representation. I don't know if that makes sense in the context of using email module facilities to parse Metadata. Steve
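Stephen's ASCII-only alternative relies on RFC 2822 unfolding, which deletes only the CRLF and keeps the whitespace after it. A minimal sketch, assuming (as is pointed out later in the thread) that ReST source never contains literal tabs, so after unfolding every TAB can only mark an original line break. The function names are hypothetical.

```python
def fold_multiline(name: str, text: str) -> str:
    """Hypothetical: write a multi-line value as one field; each break becomes CRLF + TAB."""
    return f"{name}: " + text.replace("\n", "\r\n\t") + "\r\n"


def unfold(field_value: str) -> str:
    """RFC 2822 unfolding: delete each CRLF, keep the whitespace that follows it."""
    return field_value.replace("\r\n", "")


def recover_line_breaks(unfolded: str) -> str:
    """Assumes the original text had no tabs, so every TAB marks a line break."""
    return unfolded.replace("\t", "\n")


description = "First line.\nSecond line.\nThird line."
field = fold_multiline("Description", description)
# Strip the field name and the terminating CRLF before unfolding the value.
value = field[len("Description: "):-2]
print(repr(unfold(value)))  # 'First line.\tSecond line.\tThird line.'
print(recover_line_breaks(unfold(value)) == description)  # True
```

The design choice is that a conforming RFC 2822 reader still sees one well-formed field; only a metadata-aware reader performs the extra TAB-to-newline step.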

On Sat, 01 Sep 2012 13:55:11 +0900, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
That is essentially what that RFC does. I haven't gone through it with a fine-tooth comb yet, but that's why I say the parsing side mostly works already: we allow unicode characters anywhere non-special-characters are allowed during parsing. The only issue is that we encode non-ASCII using the normal rules during serialization, so we need a new policy control to disable that. I'm thinking it will be an easy addition...the hard part for RFC5335 is doing that fine-tooth read and adding appropriate tests. Alternatively, as Donald pointed out, you can use the Binary mode, where the utf-8 bytes just go along for the ride. In the context of the metadata, I think that should produce the desired results, since there should be no need to re-wrap metadata lines. It will also preserve the line endings *if* you don't use the new policies. But that is why I would prefer to use explicit RFC5335 support...I'd like the email backward compatibility policy to go away some day :) (On the gripping hand, it will always be possible to recreate it as a custom policy.)
Yes, that is what I was talking to Tarek about. And since ReST source shouldn't contain tabs, a tab would probably work as the separator, if for some reason you didn't want to use LINE SEPARATOR.
The policy has hooks that support this. A policy gets handed the source line complete with the line breaks, determines what gets stored in the model, and also gets to control what gets handed back to the application when a header is retrieved from the model. The policy can also control the header folding during serialization. So preserving line separators using a custom policy is not only possible, but should be fairly easy. --David
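The hooks David describes can be illustrated with a small subclass of `email.policy.Compat32`. This is only a sketch under assumptions: the class name `MetadataPolicy` is hypothetical, and continuation lines are assumed to be indented with eight spaces (what distutils' `rfc822_escape` writes for the Description field). The stored model keeps the on-disk form; `header_fetch_parse` hands the application real line breaks, `header_store_parse` converts them back, and `fold` writes the stored form verbatim so nothing is re-wrapped.

```python
import io
from email.generator import Generator
from email.message import Message
from email.parser import Parser
from email.policy import Compat32

INDENT = " " * 8  # assumed continuation prefix (what distutils writes)


class MetadataPolicy(Compat32):
    """Hypothetical policy that preserves line breaks inside field values."""

    def header_fetch_parse(self, name, value):
        # Model -> application: drop the continuation prefix, keep the breaks.
        return super().header_fetch_parse(name, value.replace("\n" + INDENT, "\n"))

    def header_store_parse(self, name, value):
        # Application -> model: re-indent so the stored form matches the file.
        return (name, value.replace("\n", "\n" + INDENT))

    def fold(self, name, value):
        # Model -> file: write the stored form as-is, with no re-wrapping.
        return f"{name}: {value}{self.linesep}"


POLICY = MetadataPolicy()


def dump(msg: Message) -> str:
    out = io.StringIO()
    Generator(out, mangle_from_=False, policy=POLICY).flatten(msg)
    return out.getvalue()


raw = "Name: example\nDescription: First line.\n        Second line.\n\n"
msg = Parser(policy=POLICY).parsestr(raw)
print(repr(msg["Description"]))  # 'First line.\nSecond line.'
print(dump(msg) == raw)          # True
```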

Am 31.08.12 12:54, schrieb Donald Stufft:
What do you mean by "can be"? In RFC 822, a slash can be in a field-name, yes, but the PEPs recently became silent on the meta-syntax.
Well, in my example it would then be

    Home-page: http://www.python.org
    Home/page: Foo

I don't think the Extension field is necessary if there is a promise that standard fields won't ever include slashes. Regards, Martin
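Martin's rule (standard field names never contain a slash, so any name with one is an extension) is easy to apply mechanically. A minimal sketch, assuming the part before the first slash is treated as the extension's namespace; `split_fields` is a hypothetical helper. The `email` parser accepts `/` in field names, so no custom parsing is needed.

```python
from email.parser import Parser
from email.policy import compat32


def split_fields(raw: str):
    """Hypothetical: separate standard fields from slash-named extension fields."""
    msg = Parser(policy=compat32).parsestr(raw)
    standard, extensions = [], {}
    for name, value in msg.items():
        namespace, slash, field = name.partition("/")
        if slash:
            extensions.setdefault(namespace, []).append((field, value))
        else:
            standard.append((name, value))
    return standard, extensions


raw = "Metadata-Version: 1.2\nHome-page: http://www.python.org\nHome/page: Foo\n\n"
standard, extensions = split_fields(raw)
print(extensions)  # {'Home': [('page', 'Foo')]}
```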

On Aug 31, 2012, at 6:48 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Ok. If you want you can check the version to decide how strict you want to be.
Ordering doesn't matter and collisions with existing tags are not allowed.
I think the new profile support for email Parser will handle this perfectly.
Regards, Martin

Am 28.08.12 17:17, schrieb Donald Stufft:
What do you do with packages that have already been uploaded with requires-unicode-version once it expires?
Who is "I" in this case? The PyPI installation? Mark the keys in the database as expired, and stop displaying them. If the key is restored, and the values are still syntactically correct, restore the values. Or is "I" software which downloads packages? Continue doing what it always does for invalid meta-data: I recommend to issue a warning; aborting the setup could also work.
No: if nobody renews the old registration, it's because the extension is not in use. So the case you are constructing won't happen in practice.
If the packages that were uploaded a year ago are still in active use, somebody will renew the registration. So the case won't happen. If nobody cares about the specific field, it may break, which is then well-deserved. Regards, Martin

On Tue, 28 Aug 2012 18:08:51 +0200, "Martin v. Löwis" <martin@v.loewis.de> wrote:
The problem Donald is asking about is: the old registration expires, and a *new* registration is entered with a different meaning, but packages still exist on PyPI that have the key with the old meaning. That seems likely to happen in practice. Or if it doesn't, then allowing for the recycling of names probably isn't important. -- R. David Murray If you like the work I do for Python, you can enable me to spend more time doing it by supporting me here: http://gittip.com/bitdancer

Am 28.08.12 18:27, schrieb R. David Murray:
Let me retry answering the question: Expiration *is* important in the case the key was just registered and never used, because it may be a good name for something, but can't be used because it is reserved for a use case that has no users. If the key is *widely* used, the scenario you assume is *not* likely in practice - either the original registrant will renew the registration before it expires, or somebody else will reregister it after it expires. There is also the case of a key that is used in a few packages (one or two packages seems a likely case - namely packages produced by the original registrant for the purpose of testing). Assuming the registrant then loses interest, and nobody else starts using the keys (i.e. they are not widely used), then these packages will break (in a mode that can be painted in different colors). This may happen, but I don't consider it a problem. If the original author finds the package broken, he will have to release a new version without these keys, or re-register them under a new name (since his original name is now taken by somebody else - who hopefully can attract more users with his definition of the key). There is also the potential risk of key-jacking, which can be resolved administratively (by revoking the abusive registration). Regards, Martin

On Tue, 28 Aug 2012 18:47:16 +0200, "Martin v. Löwis" <martin@v.loewis.de> wrote:
OK, I understand your logic now. Yes that does make sense to me. There are tradeoffs to be made, and this seems like a reasonable tradeoff given the goals articulated so far. -- R. David Murray If you like the work I do for Python, you can enable me to spend more time doing it by supporting me here: http://gittip.com/bitdancer

On Tue, Aug 28, 2012 at 6:29 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I agree with this point - the main reason the metadata PEP is still lingering at Accepted rather than Final is the tangled relationship between distutils and other projects that led to the complete distutils feature freeze. Until distutils2 makes it into the standard library as the packaging module, the standard library is going to be stuck at v1.1 of the metadata format.
However, this point I really don't agree with. The packaging ecosystem is currently evolving outside the standard library, but the standardisation process for the data interchange formats still falls under the authority of python-dev and the PEP process. If there are things missing from v1.2 of the metadata spec, then define v1.3 to address those known problems. Don't overengineer it in an attempt to anticipate every possible need that might come in the next decade. Tools outside the standard library are then free to adopt the new standard, even while the stdlib itself continues to lag behind. When the packaging module is finally added (hopefully 3.4, even if that means we have to temporarily cull the entire compiler subpackage), it will handle the most recent accepted version of the metadata format (as well as any previous versions). If more holes reveal themselves in the next 18 months, then it's OK if v1.4 is created when it becomes clear that it's necessary. At the very least, something v1.3 should make explicit is that custom metadata should NOT be put into the .dist-info/METADATA (PEP 376 location, PKG-INFO, in distutils terms) file. Instead, that data should be placed in a *separate* file in the .dist-info directory. Something that *may* be appropriate is a new field in METADATA that explicitly calls out such custom metadata files by naming the PyPI distribution that is the authority for the relevant format (e.g. "Custom-Metadata: wheel" to indicate that 'wheel' defined metadata is present) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Aug 28, 2012 at 9:04 PM, Daniel Holth <dholth@gmail.com> wrote:
Setuptools just uses path.exists() when it needs a particular file and will not bother parsing pkg-info at all if it can help it. The metadata edits for 1.2 fold some of those files into metadata.
You can't use path.exists() on metadata published by a webservice (or still inside a zipfile), but you can download or read the main metadata file. Still, I don't really care whether or not such a field indicating the presence of custom metadata is added, I'm mainly registering a strong -1 on allowing extension fields (in the form of X- headers or CSS style prefixed headers) in the metadata file itself. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I personally think that at a minimum we should have X-Fields that get moved into the normal METADATA file, and personally I would prefer to just drop the X- prefix completely. I think any spec which doesn't include first class support for extending it with new metadata is going to essentially kick the can down the road and solve the problems of today without leaving room to solve the problems of tomorrow. I know that distutils2 has requires-dist, but for the sake of argument pretend it doesn't. If there is first class support for extending the metadata with new fields, a project could come along, and add a requires-dist (or x-requires-dist) concept to metadata. Tools that understand it would see that data and be able to act on it, tools that don't understand it would simply write it to the METADATA file in case in the future a tool that does understand it needs to act on it. Essentially first class support for extending the metadata outside of a PEP process means that outside of the stdlib people can experiment and try new things, existing tools will continue to work and just ignore that extra data (but leave it intact), new tools will be able to utilize it to do something useful. Ideally as a new concept is tested externally and begins to gain acceptance a new metadata version could be created that standardizes that field as part of the spec instead of an extension. On Tuesday, August 28, 2012 at 7:45 AM, Nick Coghlan wrote:

On Tue, Aug 28, 2012 at 8:07 AM, Donald Stufft <donald.stufft@gmail.com> wrote:
That is my preference as well. The standard library basically ignores every metadata field or metadata file inside or outside of metadata currently, so where is the harm changing the official document to read "you may add new metadata fields to metadata" with an updated standard library that only ignores some of the metadata in metadata instead of all of it. The community is small enough to handle it.

On Tue, Aug 28, 2012 at 10:28 PM, Daniel Holth <dholth@gmail.com> wrote:
I will campaign ardently against any such proposal. Any extension field must be clearly traceable to an authority that gets to define what it means to avoid a repeat of the setuptools debacle. Namespaces are a honkin' great idea, let's do more of those :P Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Am 28.08.12 14:28, schrieb Daniel Holth:
The problem with that (and the reason to introduce the X- prefix in RFC 822) is that allowing arbitrary additions will make evolution difficult: if you want to standardize a certain field at some point, you either need to pick a name that is unused in all implementations (which you never can be really certain about), or you break some existing tool by making the addition (unless the addition happens to have the exact same syntax and semantics as the prior use). Regards, Martin

On Tue, Aug 28, 2012 at 10:07 PM, Donald Stufft <donald.stufft@gmail.com> wrote:
Hell no. We've been down this road with setuptools and it *sucks*. Everybody gets confused, because you can't tell just by looking at a metadata file what's part of the standard and what's been added just because a tool developer thought it was a good idea without being able to obtain broad consensus (perhaps because others couldn't see the point until the extension had been field tested for an extended period). Almost *nobody* reads metadata specs other than the people that helped write them. Everyone else copies a file that works, and tweaks it to suit, or they use a tool that generates the metadata for them based on some other interface. The least-awful widespread extension approach I'm aware of is CSS vendor prefixes. X- headers suck because they only give you two namespaces - the "standard" namespace and the "extension" namespace. That means everyone is quickly forced back into seeking agreement and consensus to avoid naming conflicts for extension fields. However, I'm open to the idea of a properly namespaced extension mechanism, which is exactly why I suggested separate files flagged in the main metadata with the PyPI project that defines the format of those extensions. I'm also open to the idea of extensions appearing in [PyPI distribution] prefixed sections after the standard metadata so, for example, there could be a [wheel] section in METADATA rather than a separate WHEEL file. We already have a namespace registry in the form of PyPI, so there's no reason to invent a new one, and allowing *any* PyPI distribution to add custom metadata fields without name conflicts would allow easy experimentation while still making it clear which fields are defined in PEPs and which are defined by particular projects.
Agreed, and this is the kind of thing a v1.3 metadata PEP could define. It just needs to be properly namespaced, and the obvious namespacing mechanism is PyPI project names. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Aug 28, 2012 at 8:28 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Wheel deals with this somewhat by including a Packager: bdist_wheel line in WHEEL so that you can deal with packager-specific bugs. Bento uses indentation so you can have sections:

    Key: value
        Indented Key: value

How about: Extensions are fields that start with a pypi-registered name followed by a hyphen. A file that contains extension fields declares them with Extension: name:

    Extension: pypiname
    pypiname-Field: value

On Tue, Aug 28, 2012 at 10:57 PM, Daniel Holth <dholth@gmail.com> wrote:
The repetition seems rather annoying. Compare the two section based variants I just posted to:

    Extension: wheel
    wheel-Version: 0.9
    wheel-Packager: bdist_wheel-0.1
    wheel-Root-Is-Purelib: true

It does have the advantage that tools for manipulating the format can remain dumber, but that doesn't seem like *that* much of an advantage, especially since any such benefit could be eliminated completely by just switching to a completely standard ConfigParser format by putting the PEP defined settings into a [python] section. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
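A minimal sketch of how a tool might read the flat, prefixed form: only prefixes declared with an `Extension:` field are treated as namespaces, so a standard name such as `Metadata-Version` is not mistaken for an extension unless someone declares `Metadata` (the collision that un-hyphenated extension names would avoid). This assumes exact-case prefix matching and is not taken from any spec; `group_extensions` is hypothetical.

```python
from email.parser import Parser
from email.policy import compat32


def group_extensions(raw: str):
    """Hypothetical: split METADATA into core fields and declared-extension fields."""
    msg = Parser(policy=compat32).parsestr(raw)
    declared = msg.get_all("Extension", [])
    core, extensions = [], {name: {} for name in declared}
    for name, value in msg.items():
        if name.lower() == "extension":
            continue  # skip the declarations themselves
        prefix, hyphen, field = name.partition("-")
        if hyphen and prefix in extensions:
            extensions[prefix][field] = value
        else:
            core.append((name, value))
    return core, extensions


raw = (
    "Name: example\n"
    "Extension: wheel\n"
    "wheel-Version: 0.9\n"
    "wheel-Packager: bdist_wheel-0.1\n"
    "wheel-Root-Is-Purelib: true\n"
    "\n"
)
core, extensions = group_extensions(raw)
print(extensions["wheel"])  # {'Version': '0.9', 'Packager': 'bdist_wheel-0.1', 'Root-Is-Purelib': 'true'}
```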

On Tue, Aug 28, 2012 at 9:09 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Wheel is a little different because once it's installed it is no longer a wheel, but it makes a decent example. That's not even repetition, it's just longer tag names. Repetition is having one Classifier: line for every trove classifier. It would be quite inconvenient to change the parser for PKG-INFO. It's a win to keep the file flat.

On Tue, Aug 28, 2012 at 11:20 PM, Daniel Holth <dholth@gmail.com> wrote:
Cool, it's the namespace I care about. Every piece of extended metadata must have an authority who gets to define what it means. If that means people register a "virtual" PyPI project just to reserve an extension namespace, I'm fine with that. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tuesday, August 28, 2012 at 9:09 AM, Nick Coghlan wrote:
To be more specific, there is setup.cfg (which I dislike for other reasons), and then there is METADATA. setup.cfg is an ini file but METADATA is a simple key: value file with a flat namespace so any namespacing you want to do in METADATA needs to be done at the key level. You could translate: [setuptools] requires-dist=foo in a setup.cfg into setuptools-requires-dist: foo in METADATA, but I'm not sure if that would be beneficial or not.
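Donald's translation can be sketched with the standard `configparser` module. Assumptions: the section name is used as the namespace prefix, keys come out lower-cased (configparser's default), and `cfg_section_to_fields` is a hypothetical helper, not something distutils2 provides.

```python
import configparser


def cfg_section_to_fields(cfg_text: str, section: str):
    """Hypothetical: turn one setup.cfg section into prefixed METADATA fields."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return [(f"{section}-{key}", value) for key, value in parser.items(section)]


cfg = "[setuptools]\nrequires-dist=foo\n"
for name, value in cfg_section_to_fields(cfg, "setuptools"):
    print(f"{name}: {value}")  # setuptools-requires-dist: foo
```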

On Tue, Aug 28, 2012 at 11:23 PM, Donald Stufft <donald.stufft@gmail.com> wrote:
We're talking about the format for v1.3 of the metadata. That format is not defined yet, so it's not obligatory for it to remain a flat key value store. However, there are advantages to keeping it as such, so I'm fine with Daniel's suggested approach. The only thing I really care about is the namespacing, for the same reasons the IETF wrote RFC 6648, as Petri linked earlier [1]. Establishing proper name registration rules can categorically eliminate a bunch of problems further down the line (such as the past confusion between which metadata entries were defined by PEPs and which were setuptools-specific extensions that other tools might not understand). With PyPI based namespacing we get clear orthogonal naming with clear lines of authority: 1. PEPs continue to define the core metadata used by PyPI, the standard library (once we get updated packaging support in place) and most other tools 2. Any members of the community with a specific interest can register a PyPI project to define additional metadata without risking naming conflicts. This need may arise in the context of a specific project, and thus use that project's name, or else it may be a project registered for the express purpose of being a metadata namespace, and not actually correspond to any installable module. The main point is to take advantage of an existing automated Python-specific name and resource registry to avoid naming conflicts without Java-style reverse DNS based clutter, and without python-dev having to explicitly approve each and every metadata extension. Cheers, Nick. [1] https://tools.ietf.org/html/rfc6648#section-4 -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tuesday, August 28, 2012 at 9:41 AM, Nick Coghlan wrote:
I'm happy with any form of a namespace to be quite honest. I have a bit of a preference for no or flat namespace but i'm perfectly fine with a PyPI based namespace. The important part is a defined way to extend the data that even when tools don't understand the extended data they can losslessly move it around from setup.cfg/setup.py/whatever to METADATA and any other format, even if they themselves don't utilize it, leaving it intact for tools that _do_ utilize it.
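The property Donald cares about (carrying fields a tool does not understand through unchanged) falls out naturally if the tool edits the parsed message in place instead of rebuilding it from the fields it knows. A minimal sketch using the standard `email` package with the `compat32` policy; `set_version` is a hypothetical helper and `wheel-Packager` merely stands in for any unknown field.

```python
import io
from email.generator import Generator
from email.parser import Parser
from email.policy import compat32


def set_version(raw: str, new_version: str) -> str:
    """Hypothetical: change only Version; every other field is written back."""
    msg = Parser(policy=compat32).parsestr(raw)
    msg.replace_header("Version", new_version)  # keeps the field's position
    out = io.StringIO()
    Generator(out, mangle_from_=False, policy=compat32).flatten(msg)
    return out.getvalue()


raw = "Name: example\nVersion: 1.0\nwheel-Packager: bdist_wheel-0.1\n\n"
print(set_version(raw, "1.1"))
```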

On Tue, Aug 28, 2012 at 10:33 PM, Daniel Holth <dholth@gmail.com> wrote:
Right, but the problem with that is it's defining a couple of *new* namespaces to manage: - the filenames within dist_info (although uppercasing a PyPI project name is pretty safe) - the "Packager" field (bdist_wheel is a distutils command rather than a PyPI project) By using PyPI distribution names to indicate custom sections in the main metadata file, we would get to exploit an existing registry that enforces uniqueness without imposing significant overhead.
Yes, the main metadata file could definitely go that way. The three main ways I can see an extensible metadata format working are:

1. The way wheel currently works (separate WHEEL file, naming conflicts resolved largely by first-in-first-served with no official registry, no obvious indication which project defines the format)

2. PyPI as extension registry, with an ini-file inspired section syntax inside dist-info/METADATA

    <standard metadata must appear first>
    [wheel]
    Version: 0.9
    Packager: bdist_wheel-0.1
    Root-Is-Purelib: true

3. PyPI as extension registry, with an indented section syntax inside dist-info/METADATA

    <custom metadata sections may appear anywhere in the file>
    Extended-Metadata: wheel
        Version: 0.9
        Packager: bdist_wheel-0.1
        Root-Is-Purelib: true

My preference is currently for the ini-style variant, but I could definitely live with the indented approach. Either way, any project registered on PyPI would be free to add their own extensions without fear of naming conflicts or any doubts about the relevant authority for the meaning of the fields. Standard tools could just treat those sections as opaque blocks of text to be preserved verbatim, or else they could be constrained so that the standard tools could pick out the individual key:value pairs. Namespacing an extension mechanism based on PyPI distribution names should be pretty straightforward and it will mean that a lot of problems that can otherwise arise with extensible metadata systems should simply never come up. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
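The ini-style variant (standard metadata first, then `[project]` sections) can be read by splitting the file at the first section header and handing the two halves to the `email` and `configparser` modules. This is a sketch of a proposal, not of any implemented format; `read_sectioned_metadata` is hypothetical, and it assumes a section header is an unindented line of the form `[name]`.

```python
import configparser
import re
from email.parser import Parser
from email.policy import compat32

# Assumed: a section header is a whole, unindented line such as "[wheel]".
SECTION_HEADER = re.compile(r"^\[[^\]\n]+\]$", re.MULTILINE)


def read_sectioned_metadata(text: str):
    """Hypothetical: parse standard fields, then PyPI-named ini sections."""
    match = SECTION_HEADER.search(text)
    split_at = match.start() if match else len(text)
    core = Parser(policy=compat32).parsestr(text[:split_at])
    sections = configparser.ConfigParser(interpolation=None)
    sections.optionxform = str  # keep field-name case, e.g. "Root-Is-Purelib"
    sections.read_string(text[split_at:])
    extensions = {name: dict(sections[name]) for name in sections.sections()}
    return core, extensions


text = (
    "Metadata-Version: 1.2\n"
    "Name: example\n"
    "[wheel]\n"
    "Version: 0.9\n"
    "Packager: bdist_wheel-0.1\n"
    "Root-Is-Purelib: true\n"
)
core, extensions = read_sectioned_metadata(text)
print(core["Name"], extensions["wheel"]["Root-Is-Purelib"])  # example true
```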

On Tuesday, August 28, 2012 at 8:28 AM, Nick Coghlan wrote:
The biggest reason I have against namespacing them is that leaving them un-namespaced makes moving from experimental to standard easier, but I'm ok with some form of a namespace. The biggest reason I see against using PyPI names as the namespace is it needlessly ties a piece of data to the original creator. Similar to how right now you could write a less hacky setuptools, but in order to do so you need to continue to use the setuptools package name (see distribute). Using PyPI names means that in the requires-dist example it would be something like setuptools-requires-dist, and even if I make my own tool that supports the same concept as setuptools's requires-dist I would need to use setuptools-requires-dist. The concept of metadata I think should be divorced from specific implementations. Obviously there are going to be some implementation specific issues but I think it's much cleaner to have a x-requires-dist that any implementation can use than to have whoever-invented-it-first-requires-dist or a twenty-different-forms-of-requires-dist.

Maybe I misphrased. By "accepted" I meant "widely implemented". From the day this gets published until it is really usable, I still believe 10 years is realistic. For example, setuptools doesn't implement Meta-data 1.2, and nearly nobody uses it, 8 years after it was written.
The problem is that flooding people with specifications is a guarantee that they will not get implemented. So we can have one metadata specification every ten years; if we have more, none of them will be implemented (except in the tool of the author of the PEP). Regards, Martin

On Tue, Aug 28, 2012 at 10:47 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Why not? You get the feature in the tool, and you don't get it elsewhere, but the other implementation can still parse what it understands. The tool author promotes his tool for this reason. The extension format is intentionally ugly so that people will standardize eventually if only for aesthetic reasons. Yes, you have to support popular extensions forever, it's a messy world we live in. Two tools that implement Metadata 1.2+ are called wheel and distribute.
    Extension: distribute
    Distribute-Provides-Extra: foo

(Just require un-hyphenated names in Extension: or map them to underscore _ if you must.)

-1 on doing anything but mapping them to package names.

I can't provide a regex to strictly validate "Requires-Dist: foo ; condition" because condition is an impoverished subset of the Python language filtered with the wonderful ast module. Are you really willing to unpack and validate PKG-INFO on every archive that is uploaded to PyPI?
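Daniel's point about validation can be made concrete: instead of a regex, a checker can parse the condition with `ast.parse(..., mode="eval")` and walk the tree, rejecting every node type outside a whitelist. A minimal sketch, assuming the operator set PEP 345 allows (`==`, `!=`, `in`, `not in`, `and`, `or`) and a reduced, hypothetical set of marker names; it is not the validator that wheel, distribute, or PyPI actually uses.

```python
import ast

# Hypothetical, reduced whitelist; PEP 345 also allows dotted names such as
# os.name and sys.platform, which this sketch leaves out.
ALLOWED_NAMES = {"python_version", "python_full_version", "extra"}


def _compare(op, left, right):
    if isinstance(op, ast.Eq):
        return left == right
    if isinstance(op, ast.NotEq):
        return left != right
    if isinstance(op, ast.In):
        return left in right
    if isinstance(op, ast.NotIn):
        return left not in right
    raise ValueError(f"operator not allowed in marker: {type(op).__name__}")


def _evaluate(node, env):
    """Walk the parsed marker, rejecting any node outside the whitelist."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, env)
    if isinstance(node, ast.BoolOp):
        # Evaluate every operand so disallowed syntax anywhere is rejected.
        results = [_evaluate(value, env) for value in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.Compare):
        operands = [_evaluate(node.left, env)]
        operands += [_evaluate(c, env) for c in node.comparators]
        results = [_compare(op, a, b)
                   for op, a, b in zip(node.ops, operands, operands[1:])]
        return all(results)
    if isinstance(node, ast.Name) and node.id in ALLOWED_NAMES:
        return env[node.id]  # the caller must supply every allowed name
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    raise ValueError(f"syntax not allowed in marker: {type(node).__name__}")


def requirement_applies(value, env):
    """Split a 'name; condition' Requires-Dist value and evaluate the condition."""
    name, semicolon, marker = value.partition(";")
    if not semicolon:
        return name.strip(), True  # no condition: always required
    tree = ast.parse(marker.strip(), mode="eval")
    return name.strip(), bool(_evaluate(tree, env))


env = {"python_version": "2.7", "python_full_version": "2.7.3", "extra": "pdf"}
print(requirement_applies("reportlab; extra == 'pdf'", env))  # ('reportlab', True)
print(requirement_applies("nose; extra == 'test'", env))      # ('nose', False)
```

Rejecting unknown node types, rather than trying to spot dangerous ones, means anything like a function call is refused by default.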

Am 28.08.12 17:30, schrieb Daniel Holth:
Are you really willing to unpack and validate PKG-INFO on every archive that is uploaded to pypi?
Users should run the "register" command, which will provide the metadata information. Also, the UI needs to be extended to allow to fill out and edit metadata information interactively, or upload it. And yes, PyPI already extracts (but currently doesn't further process) PKG-INFO from every archive that is uploaded. So PyPI absolutely needs to "know" about the meta-data. Regards, Martin

On Mon, Aug 27, 2012 at 10:56 AM, Daniel Holth <dholth@gmail.com> wrote:
Somehow I completely overlooked this thread until now. Thanks Daniel for getting the ball rolling on this. There have already been many bytes spilled on metadata extensions, and although I agree it would be enormously useful to build an extension mechanism into the metadata format, I don't have much riding on that, or much more to add that hasn't been said. There hasn't been much said about Setup-Requires-Dist, so I'm guessing it's uncontroversial. But since that's sort of my hobbyhorse I thought I would make a comment on it. The thing I love about the Setup-Requires-Dist feature is that, if properly supported by different installers, it can free those installers from a fair bit of responsibility. For example, it greatly simplifies the thorny issue of "compilers". The existing compiler support in distutils, while not without its problems, does work in most cases for building common C-extensions. distutils2 has already made some progress on cleaning up the interface for compilers, and making it easier to register new compiler classes that can be imported from an arbitrary package. This allows projects with special needs (such as Fortran compiler support) to ship their own compiler class with the project. Or if there's a good enough third-party package that provides Fortran compiler support, projects may use it in their build process. Support for Setup-Requires-Dist ensures that a third-party compiler package can be made available at build-time. What's great about this, is that even if the stdlib still includes a build system, it doesn't necessarily have to anticipate every possible need for building every kind of project (it should, at a minimum, be able to build pure-Python projects). If someone wants to add MSVC2012 support they can do that as a third-party package. One could even create "compilers" for other build systems like waf, or even provide an entry-point to meta-build systems like bento. Am I making sense? Erik

Add to metadata 1.3: Description-File: README(\..+)? Meaning the description should be read from a file in the same directory as PKG-INFO or METADATA (including in the .dist-info directories) and we strongly recommend it be named as README.* and be utf-8 encoded text. Description: is the only multi-line field in the metadata. It is almost never needed at runtime. It would be great for performance and simplify the parser to just put it in another file. Mutually exclusive with Description. May beg for a Summary: tag with a one-line description.
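A minimal sketch of the two rules in this proposal: the value must match `README(\..+)?`, and `Description-File` is mutually exclusive with `Description`. It assumes the pattern is enforced rather than merely recommended; `description_source` is a hypothetical helper, and reading the file itself is left to the caller.

```python
import re
from email.message import Message
from email.parser import Parser
from email.policy import compat32

# The pattern from the proposal, matched against the whole field value.
DESCRIPTION_FILE = re.compile(r"README(\..+)?")


def description_source(msg: Message):
    """Hypothetical: return ("inline", text) or ("file", name) for parsed metadata."""
    inline = msg.get("Description")
    filename = msg.get("Description-File")
    if inline is not None and filename is not None:
        raise ValueError("Description and Description-File are mutually exclusive")
    if filename is None:
        return "inline", inline
    if not DESCRIPTION_FILE.fullmatch(filename):
        raise ValueError(f"unexpected Description-File value: {filename!r}")
    # The named file sits in the same directory as PKG-INFO / METADATA.
    return "file", filename


msg = Parser(policy=compat32).parsestr("Name: example\nDescription-File: README.rst\n\n")
print(description_source(msg))  # ('file', 'README.rst')
```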

On Fri, Sep 14, 2012 at 12:30 PM, Daniel Holth <dholth@gmail.com> wrote:
Can we make Description-File multiple-use? The meaning of this would be that the Description is formed from concatenating each Description-File in order. That raises the question: Is ordering guaranteed for multiple-use fields? I ask, because distutils2 supports exactly such a feature, and I've found it useful. For example, if I have a README.rst and a CHANGELOG.rst I can specify: description-file = README.rst CHANGELOG.rst Then the full description, contains my readme and my changelog, which look nice together on PyPI, but I prefer to keep as separate files in the source. My only other concern is that if the value of this field can theoretically be arbitrary, it could conflict with other .dist-info files. Does the .dist-info format allow subdirectories? Placing description-files in a subdirectory of .dist-info could be a reasonable workaround. Erik
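Erik's multiple-use variant amounts to reading each `Description-File` in field order and joining the results, which would also mean relaxing the README-only name pattern (his example includes CHANGELOG.rst). A sketch assuming that the `email` package's `get_all()` order is the intended order (the thread notes no spec guarantees this), that the files sit next to the metadata file, and that they must be strict UTF-8. Joining with a newline is an arbitrary choice, and `combined_description` is hypothetical.

```python
import tempfile
from email.message import Message
from email.parser import Parser
from email.policy import compat32
from pathlib import Path


def combined_description(msg: Message, metadata_dir: Path) -> str:
    """Hypothetical: concatenate every Description-File in the order listed."""
    parts = []
    for name in msg.get_all("Description-File", []):
        # read_text() decodes strictly, so a non-UTF-8 file raises an error.
        parts.append((metadata_dir / name).read_text(encoding="utf-8"))
    return "\n".join(parts)


# Usage: two files next to a metadata file that lists both, README first.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "README.rst").write_text("Readme\n", encoding="utf-8")
    (folder / "CHANGELOG.rst").write_text("Changes\n", encoding="utf-8")
    msg = Parser(policy=compat32).parsestr(
        "Description-File: README.rst\nDescription-File: CHANGELOG.rst\n\n"
    )
    print(combined_description(msg, folder))
```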

On Fri, Sep 14, 2012 at 1:43 PM, Erik Bray <erik.m.bray@gmail.com> wrote:
The .dist-info design asks for every metadata file (the one in all caps, not any of the other metadata in .dist-info) to be parsed for many packaging operations that do not require the description, such as resolving the dependency graph of a package. Description-File would give an installer the option to pull Description: out into Description-File:. I would expect the concatenation to happen before this point. I would like to forbid subdirectories in .dist-info but I think they are allowed. The order of multi-use fields is probably preserved. I don't think it is required to be by any spec.

On Fri, Sep 14, 2012 at 1:57 PM, Daniel Holth <dholth@gmail.com> wrote:
I understand now. In this case why even allow flexibility in the description file name? Just make it description.txt, and the Description-File field just some boolean indicator of whether or not a description file exists? Erik

On Fri, Sep 14, 2012 at 2:03 PM, Donald Stufft <donald.stufft@gmail.com> wrote:
OK. In practice, Description: is the only field that is likely to be hundreds of lines long which seems wasteful. The parser complexity is a non-issue. (I know the PEP says Description: should not be the instruction manual, but it is wrong because that is the way to get useful data up on a package's pypi page.)

On 14 September 2012 17:30, Daniel Holth <dholth@gmail.com> wrote:
we strongly recommend it be named as README.* and be utf-8 encoded text.
I'd very strongly recommend that the encoding should be mandated. I'm happy with UTF-8 (although I expect a lot of "accidental" latin-1 files, particularly from Windows users) but I think it would be a mistake to have the specification leaving the encoding of *any* string data unclear. Without the encoding being specified, either in the metadata itself ("Description-File-Encoding: utf8"? Please, no) or by the specification mandating it, how is a program expected to read the description? Other than this point, I have no opinion on the proposal. Paul.
participants (11)
- "Martin v. Löwis"
- Daniel Holth
- Donald Stufft
- Eric Snow
- Erik Bray
- Nick Coghlan
- Oleg Broytman
- Paul Moore
- Petri Lehtinen
- R. David Murray
- Stephen J. Turnbull