From richard at python.org  Fri Mar  1 00:21:59 2013
From: richard at python.org (Richard Jones)
Date: Fri, 1 Mar 2013 10:21:59 +1100
Subject: [Catalog-sig] remove historic download/homepage links for a
	project
In-Reply-To: <kgo39d$nao$1@ger.gmane.org>
References: <512E3588.4020305@egenix.com>
	<CAL0kPAW3tRmcO5b-HDfYttS6S9Xk04L-rJ5=_6_AAMEdzp3Lqg@mail.gmail.com>
	<20130227183754.GR9677@merlinux.eu>
	<CAKgW=6JZjPpcHrdjxn-z9vYVAiZit=xY3uv0sNjRhb-nVjuBnQ@mail.gmail.com>
	<512E6361.1030108@inaugust.com>
	<CAL0kPAUnn91timF6y88UAxNVT6UusGx5xsgT_2XUiPk3TTuhVA@mail.gmail.com>
	<CALeMXf6KJnWqE-bGsvUN4Qt+=z3fXQ4-xvC23Ckt9FYZ0xtJyA@mail.gmail.com>
	<CAKgW=6K=rp8mx+oihGNCKSvL6Fy-3QC9Kr-m6gWkPf8ZBguOUw@mail.gmail.com>
	<20130228092835.GX9677@merlinux.eu> <kgnjkq$gpj$2@ger.gmane.org>
	<20130228134100.GZ9677@merlinux.eu>
	<3065EDAA-8BCE-4D5F-A59F-D0D4F2B33B25@mac.com>
	<kgo39d$nao$1@ger.gmane.org>
Message-ID: <CAHrZfZBn_xJ9BKdcfppCgsq-GrnkL67FNrfxZAtWK5K=_Tg1Nw@mail.gmail.com>

On 1 March 2013 04:10, Tres Seaver <tseaver at palladion.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 02/28/2013 11:27 AM, Ronald Oussoren wrote:
>
>> But necessary to have. Or am I the only one that accidentally released a
>> version that had serious bugs?
>
> Nope.  The way to address such a version is to release a new, fixed
> version (preferably one with a suitably-PEP-compliant version which
> indicates the version being corrected).  The only legitimate reason to
> yank a release is that you are under legal compulsion to do so (a
> takedown notice or equivalent), or you discover that the version released
> has been trojaned in some way.

You may have listed the only reason *you will allow* but the owner of
the package can do whatever they want. You're correct that once the
package is "out in the wild" you can't get all those copies back, but
they can (for whatever reason they have and no, I'm not going to
needlessly speculate) remove it from PyPI. You have no legal or moral
right to compel them to do otherwise.


    Richard

From pje at telecommunity.com  Fri Mar  1 00:31:17 2013
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 28 Feb 2013 18:31:17 -0500
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <528718A2FA614C0288E562FEED8F85A4@gmail.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512E28CB.9080907@egenix.com>
	<2C0A235BC980420C8632A75D39953B1A@gmail.com>
	<512E3588.4020305@egenix.com>
	<CAL0kPAW3tRmcO5b-HDfYttS6S9Xk04L-rJ5=_6_AAMEdzp3Lqg@mail.gmail.com>
	<20130227183754.GR9677@merlinux.eu>
	<CAKgW=6JZjPpcHrdjxn-z9vYVAiZit=xY3uv0sNjRhb-nVjuBnQ@mail.gmail.com>
	<512E6361.1030108@inaugust.com>
	<CAL0kPAUnn91timF6y88UAxNVT6UusGx5xsgT_2XUiPk3TTuhVA@mail.gmail.com>
	<20130228090034.GV9677@merlinux.eu>
	<CADiSq7cGVZB8Uzpp9XM8WG5T5Fven1Eo=WSrWaQctNBz36hcaw@mail.gmail.com>
	<CALeMXf6KdR4tEDtj+oQMHyfG40g4XrkMi8PxNY=3qfLPeOos_g@mail.gmail.com>
	<528718A2FA614C0288E562FEED8F85A4@gmail.com>
Message-ID: <CALeMXf45kcCbKLbhY3ZKrtSmZir6hVJQEcy=xwQ5m5JM_BkjLQ@mail.gmail.com>

On Thu, Feb 28, 2013 at 5:00 PM, Donald Stufft <donald.stufft at gmail.com> wrote:
> SSL checking on upload should be possible, do you want
> a patch?

If it uses the 'requests' library, yes, I'll accept one.  But I don't
want to do any direct implementation of SSL cert checking in
setuptools, at least in the short run (next few weeks), because:

1. I don't consider myself qualified as yet to write a correct patch
or even verify that a contributed patch is correct/safe, and

2. There is a licensing issue with including the Mozilla root
certificate set in setuptools under its current license, and I'm not
100% certain I can *change* the license.  (I *could* potentially use a
platform-provided cert set, but that's not really an option on Windows
unless you have Windows expertise above my paygrade for pulling that
stuff out of the registry.)

So, by delegating to the requests library, I can bypass both of those
issues in the short term.  In the longer term (>1 month from now),
more integrated solutions may be more feasible.  Using "requests" is
the best I think I can reasonably achieve by PyCon, but I *will* be
publicizing a set of instructions for how to "safely" download
setuptools and requests (via https in a browser to prevent MITM
attacks), as well as how to configure easy_install for more secure
default settings.  (And easy_install will always use "requests" if
present, unless specifically asked not to with a --no-ssl-verify
option.)
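
A minimal sketch of the selection rule described above: prefer "requests" when it is importable, and fall back to the plain standard-library path otherwise or when the user opts out. The function name pick_transport and its return values are hypothetical and only illustrate the rule; this is not the actual easy_install code.

```python
import importlib.util


def pick_transport(no_ssl_verify=False):
    """Return the name of the download backend to use.

    Hypothetical helper: the names "requests" and "urllib" are labels
    for this sketch, not real easy_install internals.
    """
    if no_ssl_verify:
        # Explicit opt-out (the --no-ssl-verify case): use the plain
        # standard-library path, without certificate checks.
        return "urllib"
    if importlib.util.find_spec("requests") is not None:
        # requests verifies server certificates by default.
        return "requests"
    # requests is not installed, so there is nothing to delegate to.
    return "urllib"
```

With this shape, the verification policy lives in one place, and the opt-out flag always wins over the presence of the library.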

From donald.stufft at gmail.com  Fri Mar  1 00:36:02 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Thu, 28 Feb 2013 18:36:02 -0500
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <CALeMXf45kcCbKLbhY3ZKrtSmZir6hVJQEcy=xwQ5m5JM_BkjLQ@mail.gmail.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512E28CB.9080907@egenix.com>
	<2C0A235BC980420C8632A75D39953B1A@gmail.com>
	<512E3588.4020305@egenix.com>
	<CAL0kPAW3tRmcO5b-HDfYttS6S9Xk04L-rJ5=_6_AAMEdzp3Lqg@mail.gmail.com>
	<20130227183754.GR9677@merlinux.eu>
	<CAKgW=6JZjPpcHrdjxn-z9vYVAiZit=xY3uv0sNjRhb-nVjuBnQ@mail.gmail.com>
	<512E6361.1030108@inaugust.com>
	<CAL0kPAUnn91timF6y88UAxNVT6UusGx5xsgT_2XUiPk3TTuhVA@mail.gmail.com>
	<20130228090034.GV9677@merlinux.eu>
	<CADiSq7cGVZB8Uzpp9XM8WG5T5Fven1Eo=WSrWaQctNBz36hcaw@mail.gmail.com>
	<CALeMXf6KdR4tEDtj+oQMHyfG40g4XrkMi8PxNY=3qfLPeOos_g@mail.gmail.com>
	<528718A2FA614C0288E562FEED8F85A4@gmail.com>
	<CALeMXf45kcCbKLbhY3ZKrtSmZir6hVJQEcy=xwQ5m5JM_BkjLQ@mail.gmail.com>
Message-ID: <A2039517F5BC443AA3E5F835D7FA41F4@gmail.com>

On Thursday, February 28, 2013 at 6:31 PM, PJ Eby wrote:
> On Thu, Feb 28, 2013 at 5:00 PM, Donald Stufft <donald.stufft at gmail.com> wrote:
> > SSL checking on upload should be possible, do you want
> > a patch?
> > 
> 
> 
> If it uses the 'requests' library, yes, I'll accept one. But I don't
> want to do any direct implementation of SSL cert checking in
> setuptools, at least in the short run (next few weeks), because:
> 
> 

Does setuptools support Python 3? (Or do you want it to?)
> 
> 1. I don't consider myself qualified as yet to write a correct patch
> or even verify that a contributed patch is correct/safe, and
> 
> 

There are existing implementations out there that add cert checking
to urllib, and they're fairly short.
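
For a sense of scale, here is a minimal sketch of certificate checking on top of urllib. It assumes the modern Python 3 standard library (ssl.create_default_context), which is not what the implementations mentioned above used at the time. The helper name make_verifying_opener is made up for illustration.

```python
import ssl
import urllib.request


def make_verifying_opener(cafile=None):
    """Build a urllib opener that verifies HTTPS certificates.

    cafile is an optional path to a CA bundle; when it is None the
    platform's default trust store is used.
    """
    # The default context requires a valid certificate chain and
    # checks that the certificate matches the requested hostname.
    context = ssl.create_default_context(cafile=cafile)
    handler = urllib.request.HTTPSHandler(context=context)
    return context, urllib.request.build_opener(handler)
```

Calling opener.open(url) on the returned opener then raises an error for a certificate that fails verification instead of silently accepting it.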
> 
> 2. There is a licensing issue with including the Mozilla root
> certificate set in setuptools under its current license, and I'm not
> 100% certain I can *change* the license. (I *could* potentially use a
> platform-provided cert set, but that's not really an option on Windows
> unless you have Windows expertise above my paygrade for pulling that
> stuff out of the registry.)
> 
> 

There shouldn't be any issue: the PSF license is very liberal, and the MPL
works on a per-file (as opposed to a per-project) basis. So if you
include the cert bundle, that particular file is MPL-licensed while
setuptools itself remains PSF.
> 
> So, by delegating to the requests library, I can bypass both of those
> issues in the short term. In the longer term (>1 month from now),
> more integrated solutions may be more feasible. Using "requests" is
> the best I think I can reasonably achieve by PyCon, but I *will* be
> publicizing a set of instructions for how to "safely" download
> setuptools and requests (via https in a browser to prevent MITM
> attacks), as well as how to configure easy_install for more secure
> default settings. (And easy_install will always use "requests" if
> present, unless specifically asked not to with a --no-ssl-verify
> option.)
> 
> 



From donald.stufft at gmail.com  Fri Mar  1 01:13:00 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Thu, 28 Feb 2013 19:13:00 -0500
Subject: [Catalog-sig] Pypi cdn for hosted packages
In-Reply-To: <73b43f9b-ed6d-40aa-ad17-40e1992dd295@email.android.com>
References: <23CB2462-E646-4F51-B3BF-110FA3FB2F21@gmail.com>
	<CAPDm-FgJBSH9rdxQE5D8qw_=uO2XajVhPqtHty3TA8g3gbheUA@mail.gmail.com>
	<73b43f9b-ed6d-40aa-ad17-40e1992dd295@email.android.com>
Message-ID: <A2875604120D428982E345E35F366447@gmail.com>

On Thursday, February 28, 2013 at 10:13 AM, Noah Kantrowitz wrote:
> Responding from my phone quickly before this gets any further; will write more later. The plan is to have PyPI move package download links to a new hostname (probably pypi-download.python.org) and then put that behind Fastly. This sidesteps 100% of the issues with dynamic pages, etc. The simple index will be handled secondarily.
Just an aside, can we use a pythonhosted.org domain, like
https://packages.pythonhosted.org/ or something?

That will prevent GIFAR-like attacks, where someone finds a way
to create a file that looks like a valid file to PyPI but
that browsers will interpret as something executable. Or rather,
it prevents such a file from being used to attack *.python.org.


From tjreedy at udel.edu  Fri Mar  1 02:31:05 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 28 Feb 2013 20:31:05 -0500
Subject: [Catalog-sig] PyPI terms
In-Reply-To: <D07E42FB-A7A9-4477-9E82-2BDEE066EA2A@coderanger.net>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512E28CB.9080907@egenix.com>
	<c5b80cd2-e956-49db-942d-ee5f8a08d00d@email.android.com>
	<512E422C.3070001@egenix.com>
	<4A372726-1248-4E43-AC00-863DA153D42C@coderanger.net>
	<512F2FE9.9080001@egenix.com>
	<3D8F33FF-A4FA-45B9-8AF5-97DA91876C1E@coderanger.net>
	<512F9E8A.1010707@egenix.com>
	<D07E42FB-A7A9-4477-9E82-2BDEE066EA2A@coderanger.net>
Message-ID: <kgp0ct$foe$1@ger.gmane.org>

On 2/28/2013 1:19 PM, Noah Kantrowitz wrote:

> Because I happen to have YouTube open anyway:
>
> """ For clarity, you retain all of your ownership rights in your
> Content. However, by submitting Content to YouTube, you hereby grant
> YouTube a worldwide, non-exclusive, royalty-free, sublicenseable and
> transferable license to use, reproduce, distribute, prepare
> derivative works of, display, and perform the Content in connection
> with the Service and YouTube's (and its successors' and affiliates')
> business, including without limitation for promoting and
> redistributing part or all of the Service (and derivative works
> thereof) in any media formats and through any media channels. You
> also hereby grant each user of the Service a non-exclusive license to
> access your Content through the Service, and to use, reproduce,
> distribute, display and perform such Content as permitted through the
> functionality of the Service and under these Terms of Service. The
> above licenses granted by you in video Content you submit to the
> Service terminate within a commercially reasonable time after you
> remove or delete your videos from the Service. You understand and
> agree, however, that YouTube may retain, but not display, distribute,
> or perform, server copies of your videos that have been removed or
> deleted. The above licenses granted by you in user comments you
> submit are perpetual and irrevocable. """
>
> Slightly different wording,

Noah, I understand that you desperately do not want to admit that the 
PSF requirement for uploading to its servers is unusually broad, because 
you do not want to admit that rational people might have a reason to not 
upload, but there it is.

1. The uploader only authorizes distribution via the YouTube 
infrastructure. Indeed, Google wants that limitation because it wants to 
be the one that monetizes distribution. So it only streams videos (free 
ones, anyway) and does *not* download. Anyone who subverts this and 
captures the stream as a download has no rights to it.

2. The uploader can terminate the license with Google. Because of #1, 
such termination stops anyone from legal distribution.

Note: Flickr gives uploaders the choice of whether images can be 
downloaded or only embedded in a flickr web page. It also lets uploaders 
set the license that applies to flickr users. And it allows deletion of 
images.

> only the license to comments is irrevocable,

Irrelevant to this discussion.

 > for videos they just promise to stop distributing

This is the important point.

> but not actually remove your content.

This is a mostly irrelevant practical issue. Finding and scrubbing every 
backup copy is difficult and expensive, especially for disk-image 
backups or serial tape media (if indeed they still use such) or backups 
stuck down in a deep salt mine. Any repository that does backups has to 
have this proviso. (I am sure, for instance, that Flickr does now.)


My take on the current license is this: the original upload license was 
rather minimal. The lawyer decided it was insufficient. Rather than 
craft a broader license with the absolute minimum rights grant 
necessary, the lawyer took the easy, quick, and cheap-for-psf route of a 
maximal rights grant. That is okay with me as long as it is not 
misrepresented and as long as people do not try to bludgeon me or 
anyone else into signing something we do not agree to.

Note: when I contribute text and code to the CPython repository, I also 
give up all control. I know and accept that, and even want that, because 
it also means that I can re-write *other* people's text and code. But 
people may reasonably want to keep more control over their independent 
sole-author work.

-- 
Terry Jan Reedy


From tseaver at palladion.com  Fri Mar  1 04:08:34 2013
From: tseaver at palladion.com (Tres Seaver)
Date: Thu, 28 Feb 2013 22:08:34 -0500
Subject: [Catalog-sig] remove historic download/homepage links for a
	project
In-Reply-To: <CAHrZfZBn_xJ9BKdcfppCgsq-GrnkL67FNrfxZAtWK5K=_Tg1Nw@mail.gmail.com>
References: <512E3588.4020305@egenix.com>
	<CAL0kPAW3tRmcO5b-HDfYttS6S9Xk04L-rJ5=_6_AAMEdzp3Lqg@mail.gmail.com>
	<20130227183754.GR9677@merlinux.eu>
	<CAKgW=6JZjPpcHrdjxn-z9vYVAiZit=xY3uv0sNjRhb-nVjuBnQ@mail.gmail.com>
	<512E6361.1030108@inaugust.com>
	<CAL0kPAUnn91timF6y88UAxNVT6UusGx5xsgT_2XUiPk3TTuhVA@mail.gmail.com>
	<CALeMXf6KJnWqE-bGsvUN4Qt+=z3fXQ4-xvC23Ckt9FYZ0xtJyA@mail.gmail.com>
	<CAKgW=6K=rp8mx+oihGNCKSvL6Fy-3QC9Kr-m6gWkPf8ZBguOUw@mail.gmail.com>
	<20130228092835.GX9677@merlinux.eu> <kgnjkq$gpj$2@ger.gmane.org>
	<20130228134100.GZ9677@merlinux.eu>
	<3065EDAA-8BCE-4D5F-A59F-D0D4F2B33B25@mac.com>
	<kgo39d$nao$1@ger.gmane.org>
	<CAHrZfZBn_xJ9BKdcfppCgsq-GrnkL67FNrfxZAtWK5K=_Tg1Nw@mail.gmail.com>
Message-ID: <kgp6ae$vkg$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/28/2013 06:21 PM, Richard Jones wrote:
> On 1 March 2013 04:10, Tres Seaver <tseaver at palladion.com> wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>> 
>> On 02/28/2013 11:27 AM, Ronald Oussoren wrote:
>> 
>>> But necessary to have. Or am I the only one that accidentally released
>>> a version that had serious bugs?
>> 
>> Nope.  The way to address such a version is to release a new, fixed 
>> version (preferably one with a suitably-PEP-compliant version which 
>> indicates the version being corrected).  The only legitimate reason
>> to yank a release is that you are under legal compulsion to do so
>> (a takedown notice or equivalent), or you discover that the version
>> released has been trojaned in some way.
> 
> You may have listed the only reason *you will allow* but the owner of 
> the package can do whatever they want. You're correct that once the 
> package is "out in the wild" you can't get all those copies back, but 
> they can (for whatever reason they have and no, I'm not going to 
> needlessly speculate) remove it from PyPI. You have no legal or moral 
> right to compel them to do otherwise.

I wasn't claiming any right:  I was arguing that anybody who shares
software with the community does the community a disservice by removing a
release because it "has serious bugs."  Brown-bag releases happen:  an
open source community repairs the damage from them by making new
releases, not by covering them up.


Tres.
- -- 
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iEYEARECAAYFAlEwG7IACgkQ+gerLs4ltQ6RCACggZ38+vBTCXGlnwtm/mrmvkCp
370An1S6hQJkmJBVFQ5dkO+XeElkUPuj
=zjAd
-----END PGP SIGNATURE-----


From regebro at gmail.com  Fri Mar  1 04:37:14 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 1 Mar 2013 04:37:14 +0100
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <20130228195242.GA9677@merlinux.eu>
References: <20130227183754.GR9677@merlinux.eu>
	<CAKgW=6JZjPpcHrdjxn-z9vYVAiZit=xY3uv0sNjRhb-nVjuBnQ@mail.gmail.com>
	<512E6361.1030108@inaugust.com> <20130227201642.GT9677@merlinux.eu>
	<AD6EC359AF524D739BC958D63620D93F@gmail.com>
	<CADiSq7d7YMbJzRhUhHdfNexWQF6eCavpCSYMBorCodO2NVP0zg@mail.gmail.com>
	<674B990052E24AB58FF9614CCD7A9DC2@gmail.com>
	<CADiSq7c9Eu=uoNBk-Zx9kFSzKb0q4NVPZ=0AT1Vz0+4pir5v6g@mail.gmail.com>
	<CAL0kPAUsPa13qm2_jBCY=QbfhnJtouYw7uOA153-O9Q7_TvFvg@mail.gmail.com>
	<CAL0kPAXEY=XDyiqGWn2uXZv5jxvP0LXYS-737pmbgN1UEu1+KQ@mail.gmail.com>
	<20130228195242.GA9677@merlinux.eu>
Message-ID: <CAL0kPAUxU6N-Q8YKFZ-5HakS6bE-hLq9ovrFX4haO8=jZKNzQA@mail.gmail.com>

On Thu, Feb 28, 2013 at 8:52 PM, holger krekel <holger at merlinux.eu> wrote:
> There are also packages which have some (older) release files on pypi
> and newer ones outside (e.g. "lockfile" with 78256 downloads from
> code.google.com).  You didn't include such in your 2651 emails, or did you?

No, I didn't, I assumed they would be quite few.
Possibly a better algorithm is to check if the last release has files on PyPI.
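
A rough sketch of that check, assuming each project's releases are available as a list of (version, file URLs) pairs ordered oldest to newest. The data layout, the host set, and the function name are all assumptions for illustration, not how the actual script works.

```python
from urllib.parse import urlparse

# Assumption: files hosted on PyPI itself are served from this host.
PYPI_HOSTS = {"pypi.python.org"}


def latest_release_hosted_on_pypi(releases):
    """Return True if the newest release has at least one file on PyPI.

    releases: list of (version, [file_url, ...]) tuples, ordered from
    oldest to newest (a hypothetical layout chosen for this sketch).
    """
    if not releases:
        return False
    # Only the newest release matters; older ones hosted on PyPI do not
    # help users who want the current version.
    _version, urls = releases[-1]
    return any(urlparse(url).hostname in PYPI_HOSTS for url in urls)
```

A project like the "lockfile" case above, with old files on PyPI and the newest ones hosted elsewhere, would return False here and so would still be flagged.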

//Lennart

From ronaldoussoren at mac.com  Fri Mar  1 08:09:52 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 1 Mar 2013 08:09:52 +0100
Subject: [Catalog-sig] remove historic download/homepage links for
	a	project
In-Reply-To: <kgp6ae$vkg$1@ger.gmane.org>
References: <512E3588.4020305@egenix.com>
	<CAL0kPAW3tRmcO5b-HDfYttS6S9Xk04L-rJ5=_6_AAMEdzp3Lqg@mail.gmail.com>
	<20130227183754.GR9677@merlinux.eu>
	<CAKgW=6JZjPpcHrdjxn-z9vYVAiZit=xY3uv0sNjRhb-nVjuBnQ@mail.gmail.com>
	<512E6361.1030108@inaugust.com>
	<CAL0kPAUnn91timF6y88UAxNVT6UusGx5xsgT_2XUiPk3TTuhVA@mail.gmail.com>
	<CALeMXf6KJnWqE-bGsvUN4Qt+=z3fXQ4-xvC23Ckt9FYZ0xtJyA@mail.gmail.com>
	<CAKgW=6K=rp8mx+oihGNCKSvL6Fy-3QC9Kr-m6gWkPf8ZBguOUw@mail.gmail.com>
	<20130228092835.GX9677@merlinux.eu> <kgnjkq$gpj$2@ger.gmane.org>
	<20130228134100.GZ9677@merlinux.eu>
	<3065EDAA-8BCE-4D5F-A59F-D0D4F2B33B25@mac.com>
	<kgo39d$nao$1@ger.gmane.org>
	<CAHrZfZBn_xJ9BKdcfppCgsq-GrnkL67FNrfxZAtWK5K=_Tg1Nw@mail.gmail.com>
	<kgp6ae$vkg$1@ger.gmane.org>
Message-ID: <F92342D5-2BDD-49AC-B508-9F032B8D45A4@mac.com>


On 1 Mar, 2013, at 4:08, Tres Seaver <tseaver at palladion.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 02/28/2013 06:21 PM, Richard Jones wrote:
>> On 1 March 2013 04:10, Tres Seaver <tseaver at palladion.com> wrote:
>>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>>> 
>>> On 02/28/2013 11:27 AM, Ronald Oussoren wrote:
>>> 
>>>> But necessary to have. Or am I the only one that accidentally released
>>>> a version that had serious bugs?
>>> 
>>> Nope.  The way to address such a version is to release a new, fixed 
>>> version (preferably one with a suitably-PEP-compliant version which 
>>> indicates the version being corrected).  The only legitimate reason
>>> to yank a release is that you are under legal compulsion to do so
>>> (a takedown notice or equivalent), or you discover that the version
>>> released has been trojaned in some way.
>> 
>> You may have listed the only reason *you will allow* but the owner of 
>> the package can do whatever they want. You're correct that once the 
>> package is "out in the wild" you can't get all those copies back, but 
>> they can (for whatever reason they have and no, I'm not going to 
>> needlessly speculate) remove it from PyPI. You have no legal or moral 
>> right to compel them to do otherwise.
> 
> I wasn't claiming any right:  I was arguing that anybody who shares
> software with the community does the community a disservice by removing a
> release because it "has serious bugs."  Brown-bag releases happen:  an
> open source community repairs the damage from them by making new
> releases, not by covering them up.

I luckily haven't run into this with software I release on PyPI yet, but sometimes
pulling back an update while working on a fix is the responsible thing to do. 

<snark>
You must be living in a different community than I do; I usually get to fix
my own bugs.
</snark>

Ronald

> 
> 
> Tres.
> - -- 
> ===================================================================
> Tres Seaver          +1 540-429-0999          tseaver at palladion.com
> Palladion Software   "Excellence by Design"    http://palladion.com
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with undefined - http://www.enigmail.net/
> 
> iEYEARECAAYFAlEwG7IACgkQ+gerLs4ltQ6RCACggZ38+vBTCXGlnwtm/mrmvkCp
> 370An1S6hQJkmJBVFQ5dkO+XeElkUPuj
> =zjAd
> -----END PGP SIGNATURE-----
> 
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig


From regebro at gmail.com  Fri Mar  1 08:35:23 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 1 Mar 2013 08:35:23 +0100
Subject: [Catalog-sig] remove historic download/homepage links for a
	project
In-Reply-To: <F92342D5-2BDD-49AC-B508-9F032B8D45A4@mac.com>
References: <512E3588.4020305@egenix.com>
	<CAL0kPAW3tRmcO5b-HDfYttS6S9Xk04L-rJ5=_6_AAMEdzp3Lqg@mail.gmail.com>
	<20130227183754.GR9677@merlinux.eu>
	<CAKgW=6JZjPpcHrdjxn-z9vYVAiZit=xY3uv0sNjRhb-nVjuBnQ@mail.gmail.com>
	<512E6361.1030108@inaugust.com>
	<CAL0kPAUnn91timF6y88UAxNVT6UusGx5xsgT_2XUiPk3TTuhVA@mail.gmail.com>
	<CALeMXf6KJnWqE-bGsvUN4Qt+=z3fXQ4-xvC23Ckt9FYZ0xtJyA@mail.gmail.com>
	<CAKgW=6K=rp8mx+oihGNCKSvL6Fy-3QC9Kr-m6gWkPf8ZBguOUw@mail.gmail.com>
	<20130228092835.GX9677@merlinux.eu> <kgnjkq$gpj$2@ger.gmane.org>
	<20130228134100.GZ9677@merlinux.eu>
	<3065EDAA-8BCE-4D5F-A59F-D0D4F2B33B25@mac.com>
	<kgo39d$nao$1@ger.gmane.org>
	<CAHrZfZBn_xJ9BKdcfppCgsq-GrnkL67FNrfxZAtWK5K=_Tg1Nw@mail.gmail.com>
	<kgp6ae$vkg$1@ger.gmane.org>
	<F92342D5-2BDD-49AC-B508-9F032B8D45A4@mac.com>
Message-ID: <CAL0kPAXEmPGdwT-av1AkqprgTXjF5DaxZdAkiQ8aANuidkMfkg@mail.gmail.com>

On Fri, Mar 1, 2013 at 8:09 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
> I luckily haven't run into this with software I release on PyPI yet, but sometimes
> pulling back an update while working on a fix is the responsible thing to do.

If the bug leads to data loss or security holes, I agree.

//Lennart

From reinout at vanrees.org  Fri Mar  1 10:02:46 2013
From: reinout at vanrees.org (Reinout van Rees)
Date: Fri, 01 Mar 2013 10:02:46 +0100
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <20130228200848.GB9677@merlinux.eu>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
Message-ID: <kgpqrj$2pu$1@ger.gmane.org>

On 28-02-13 21:08, holger krekel wrote:
>> I have seen that position in this discussion ("I have to upload 120
>> files per release, so I won't do that", for instance).

> haven't seen that.

Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:

"""
However, taking our egenix-mx-base package as example, we have
120 distribution files for every single release. Uploading those
to PyPI would not only take long, but also ...
"""



Reinout

-- 
Reinout van Rees                    http://reinout.vanrees.org/
reinout at vanrees.org             http://www.nelen-schuurmans.nl/
"If you're not sure what to do, make something. -- Paul Graham"


From holger at merlinux.eu  Fri Mar  1 10:20:09 2013
From: holger at merlinux.eu (holger krekel)
Date: Fri, 1 Mar 2013 09:20:09 +0000
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <kgpqrj$2pu$1@ger.gmane.org>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org>
Message-ID: <20130301092009.GD9677@merlinux.eu>

On Fri, Mar 01, 2013 at 10:02 +0100, Reinout van Rees wrote:
> On 28-02-13 21:08, holger krekel wrote:
> >>I have seen that position in this discussion ("I have to upload 120
> >>files per release, so I won't do that", for instance).
> 
> >haven't seen that.
> 
> Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:
> 
> """
> However, taking our egenix-mx-base package as example, we have
> 120 distribution files for every single release. Uploading those
> to PyPI would not only take long, but also ...
> """

Ah ok, thanks.  I didn't interpret Marc-Andre's post as claiming that
downloads/homepage crawling is a good idea, though.  Just that there
have been reasons not to upload things, and those need to be addressed,
especially the need for enough storage space.

best,
holger

> 
> 
> Reinout
> 
> -- 
> Reinout van Rees                    http://reinout.vanrees.org/
> reinout at vanrees.org             http://www.nelen-schuurmans.nl/
> "If you're not sure what to do, make something. -- Paul Graham"
> 

From mal at egenix.com  Fri Mar  1 10:24:53 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 01 Mar 2013 10:24:53 +0100
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <kgpqrj$2pu$1@ger.gmane.org>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org>
Message-ID: <513073E5.20900@egenix.com>

On 01.03.2013 10:02, Reinout van Rees wrote:
> On 28-02-13 21:08, holger krekel wrote:
>>> I have seen that position in this discussion ("I have to upload 120
>>> files per release, so I won't do that", for instance).
> 
>> haven't seen that.
> 
> Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:
> 
> """
> However, taking our egenix-mx-base package as example, we have
> 120 distribution files for every single release. Uploading those
> to PyPI would not only take long, but also ...
> """

Correct, with a total of over 100MB per release. However, the above
quote is slightly incorrect: I did not say "I won't do that", just
that there are issues with doing this:

* It currently takes too long uploading that many files to
  PyPI. This causes a problem, since in order to start the upload,
  we have to register the release on PyPI, which tools will then
  immediately find. However, during the upload time, they won't
  necessarily find the right files to download and then fail.

  The proposed pull mechanism (see
  http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
  would work around this problem: tools would simply go to
  our servers in case they can't find the files on PyPI.

* PyPI doesn't allow us to upload two egg files with the same
  name: we have to provide egg files for UCS2 Python builds and
  UCS4 Python builds, since easy_install/setuptools/pip don't
  differentiate between the two variants. This is the main
  reason why we're hosting our own PyPI-style indexes, one for
  UCS2 and the other for UCS4 builds:
  https://downloads.egenix.com/python/index/ucs2/
  https://downloads.egenix.com/python/index/ucs4/

* I'm not sure whether we want to import our crypto packages
  to the US, so for a subset of the files, we'd probably
  continue to use our servers in Germany.

  Again, with the above proposal, this shouldn't be a problem.

* The PyPI terms are a bummer for us, but this can be fixed,
  I guess.

If we can resolve the issues, we'd have no problem having the
files mirrored on PyPI.
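
To illustrate the UCS2/UCS4 point: an installer could pick the matching index from the interpreter's build. This is only a sketch, and the function name index_for_build is made up. It relies on sys.maxunicode being 0xFFFF on narrow (UCS2) builds, and on Python 3.3 and later the distinction no longer exists, so the value is always 0x10FFFF.

```python
import sys

# The two index URLs are the ones listed above.
UCS2_INDEX = "https://downloads.egenix.com/python/index/ucs2/"
UCS4_INDEX = "https://downloads.egenix.com/python/index/ucs4/"


def index_for_build(maxunicode=sys.maxunicode):
    """Return the index URL matching the interpreter's Unicode build.

    Narrow (UCS2) builds report sys.maxunicode == 0xFFFF; wide (UCS4)
    builds report 0x10FFFF.
    """
    return UCS2_INDEX if maxunicode == 0xFFFF else UCS4_INDEX
```

The result could then be passed to the installer as its index URL, for example through easy_install's --index-url option.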

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 01 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From holger at merlinux.eu  Fri Mar  1 10:46:55 2013
From: holger at merlinux.eu (holger krekel)
Date: Fri, 1 Mar 2013 09:46:55 +0000
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <513073E5.20900@egenix.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
Message-ID: <20130301094655.GE9677@merlinux.eu>

On Fri, Mar 01, 2013 at 10:24 +0100, M.-A. Lemburg wrote:
> On 01.03.2013 10:02, Reinout van Rees wrote:
> > On 28-02-13 21:08, holger krekel wrote:
> >>> I have seen that position in this discussion ("I have to upload 120
> >>> files per release, so I won't do that", for instance).
> > 
> >> haven't seen that.
> > 
> > Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:
> > 
> > """
> > However, taking our egenix-mx-base package as example, we have
> > 120 distribution files for every single release. Uploading those
> > to PyPI would not only take long, but also ...
> > """
> 
> Correct, with a total of over 100MB per release. However, the above
> quote is slightly incorrect: I did not say "I won't do that", just
> that there are issues with doing this:
> 
> * It currently takes too long uploading that many files to
>   PyPI. This causes a problem, since in order to start the upload,
>   we have to register the release on PyPI, which tools will then
>   immediately find. However, during the upload time, they won't
>   necessarily find the right files to download and then fail.

You can actually skip the register step and upload directly; it will
create the release metadata on the fly.  Not sure if it's complete,
but you can then do a "register" to update it if needed.

best,
holger

>   The proposed pull mechanism (see
>   http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>   would work around this problem: tools would simply go to
>   our servers in case they can't find the files on PyPI.
> 
> * PyPI doesn't allow us to upload two egg files with the same
>   name: we have to provide egg files for UCS2 Python builds and
>   UCS4 Python builds, since easy_install/setuptools/pip don't
>   differentiate between the two variants. This is the main
>   reason why we're hosting our own PyPI-style indexes, one for
>   UCS2 and the other for UCS4 builds:
>   https://downloads.egenix.com/python/index/ucs2/
>   https://downloads.egenix.com/python/index/ucs4/
> 
> * I'm not sure whether we want to import our crypto packages
>   to the US, so for a subset of the files, we'd probably
>   continue to use our servers in Germany.
> 
>   Again, with the above proposal, this shouldn't be a problem.
> 
> * The PyPI terms are a bummer for us, but this can be fixed,
>   I guess.
> 
> If we can resolve the issues, we'd have no problem having the
> files mirrored on PyPI.
> 
> -- 
> Marc-Andre Lemburg
> eGenix.com
> 
> Professional Python Services directly from the Source  (#1, Mar 01 2013)
> >>> Python Projects, Consulting and Support ...   http://www.egenix.com/
> >>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
> >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
> ________________________________________________________________________
> 
> ::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::
> 
>    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>            Registered at Amtsgericht Duesseldorf: HRB 46611
>                http://www.egenix.com/company/contact/
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
> 

From richard at python.org  Fri Mar  1 10:53:11 2013
From: richard at python.org (Richard Jones)
Date: Fri, 1 Mar 2013 20:53:11 +1100
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <513073E5.20900@egenix.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
Message-ID: <CAHrZfZBh3b=joi7hDz=0gONeaB6jS589gtx59D8iuvmb+pY0VQ@mail.gmail.com>

On 1 March 2013 20:24, M.-A. Lemburg <mal at egenix.com> wrote:
> * PyPI doesn't allow us to upload two egg files with the same
>   name: we have to provide egg files for UCS2 Python builds and
>   UCS4 Python builds, since easy_install/setuptools/pip don't
>   differentiate between the two variants. This is the main
>   reason why we're hosting our own PyPI-style indexes, one for
>   UCS2 and the other for UCS4 builds:
>   https://downloads.egenix.com/python/index/ucs2/
>   https://downloads.egenix.com/python/index/ucs4/

Hm, that's a tricky one. I've assumed that the filename encodes all of
the relevant build information. Perhaps that should be addressed
(otherwise pity the poor user who downloads one or the other
incorrectly and then runs into issues that are probably quite
perplexing.)


    Richard
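
The UCS2/UCS4 distinction discussed above can be illustrated with a minimal
sketch. It assumes a narrow (UCS2) build is recognised by sys.maxunicode
being 0xFFFF, which holds for Python 2 and for Python 3 before 3.3; later
versions always report 0x10FFFF. The helper names are hypothetical, and the
two index URLs are the ones quoted in the message. This is not how
easy_install, setuptools or pip select files.

```python
import sys

# The two eGenix indexes quoted in the thread, keyed by build variant.
EGENIX_INDEXES = {
    "ucs2": "https://downloads.egenix.com/python/index/ucs2/",
    "ucs4": "https://downloads.egenix.com/python/index/ucs4/",
}


def unicode_build_tag(maxunicode=None):
    """Return "ucs2" for a narrow Unicode build and "ucs4" otherwise.

    Hypothetical helper: narrow builds report sys.maxunicode == 0xFFFF.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "ucs2" if maxunicode == 0xFFFF else "ucs4"


def index_for_interpreter(maxunicode=None):
    """Pick the index URL matching the interpreter's Unicode build."""
    return EGENIX_INDEXES[unicode_build_tag(maxunicode)]
```

A filename that carried such a tag would let a tool reject the wrong
variant before downloading it, which is the gap Richard describes.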

From holger at merlinux.eu  Fri Mar  1 11:19:56 2013
From: holger at merlinux.eu (holger krekel)
Date: Fri, 1 Mar 2013 10:19:56 +0000
Subject: [Catalog-sig] homepage/download metadata cleaning
Message-ID: <20130301101956.GH9677@merlinux.eu>

Hi Richard, all,

somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
script which takes a project name as an argument and then goes to 
pypi.python.org and removes all homepage/download metadata entries for 
this project.  This sanitizes/speeds up installation because
pip/easy_install don't need to crawl them anymore.  I just did this for
three of my projects, (pytest, tox and py) and it seems to work fine.

Now before i release this as a tool, i wonder: Is it a good idea to remove
download/homepage entries?  Is there any current machine use (other than
the dreaded crawling) for the homepage/download_url per-release metadata 
fields?

For humans the homepage link is nicely discoverable if the long-description
doesn't mention it prominently.  But i think there also is a "project url" 
or "bugtrack url" for a project so maybe those could be used to reference 
these important pages?  (i am a bit confused on the exact meaning of those
urls, btw).

Should we maybe stop advertising "homepage" and "download_url"
and instead see to extend project-url/bugtrackurl to be used
and shown nicely? The latter are independent of releases which i think
makes sense - what use are old probably unreachable/borked homepages
anyway.  And it's also not too bad having to go once to pypi.python.org
to set it, since it seldom changes.

best,
holger
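
The cleaning step described above might look roughly like the following
sketch. It is not holger's cleanpypi.py, which is not shown in this thread.
It assumes the per-release metadata is already available as a dict keyed by
the distutils field names "home_page" and "download_url", and it does no
network access at all. Blanking the fields with an empty string is also an
assumption; the function names are hypothetical.

```python
# Per-release metadata fields that installers may crawl.
URL_FIELDS = ("home_page", "download_url")


def strip_url_fields(release_metadata):
    """Return a copy of one release's metadata with the URL fields blanked."""
    cleaned = dict(release_metadata)
    for field in URL_FIELDS:
        if field in cleaned:
            cleaned[field] = ""
    return cleaned


def strip_all_releases(releases):
    """Apply strip_url_fields to a {version: metadata} mapping."""
    return {
        version: strip_url_fields(metadata)
        for version, metadata in releases.items()
    }
```

Working on copies keeps the original metadata around, so the old values
can be restored if removing them turns out to be a bad idea.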

From mal at egenix.com  Fri Mar  1 12:04:24 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 01 Mar 2013 12:04:24 +0100
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <20130301101956.GH9677@merlinux.eu>
References: <20130301101956.GH9677@merlinux.eu>
Message-ID: <51308B38.9030709@egenix.com>

On 01.03.2013 11:19, holger krekel wrote:
> Hi Richard, all,
> 
> somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
> script which takes a project name as an argument and then goes to 
> pypi.python.org and removes all homepage/download metadata entries for 
> this project.  This sanitizes/speeds up installation because
> pip/easy_install don't need to crawl them anymore.  I just did this for
> three of my projects, (pytest, tox and py) and it seems to work fine.

Does it also cleanup the links that PyPI adds to the /simple/ by
parsing the project description for links ?

I think those are far nastier than the homepage and download links,
which can be put to some good use to limit the external lookups
(see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)

See e.g. https://pypi.python.org/simple/zc.buildout/
for a good example of the mess this generates... even mailto links
get listed and "file:///" links open up the installers for all
kinds of nasty things (unless they explicitly protect against
following these).

> Now before i release this as a tool, i wonder: Is it a good idea to remove
> download/homepage entries?  Is there any current machine use (other than
> the dreaded crawling) for the homepage/download_url per-release metadata 
> fields?
> 
> For humans the homepage link is nicely discoverable if the long-description
> doesn't mention it prominently.  But i think there also is a "project url" 
> or "bugtrack url" for a project so maybe those could be used to reference 
> these important pages?  (i am a bit confused on the exact meaning of those
> urls, btw).
> 
> Should we maybe stop advertising "homepage" and "download_url"
> and instead see to extend project-url/bugtrackurl to be used
> and shown nicely? The latter are independent of releases which i think
> makes sense - what use are old probably unreachable/borked homepages
> anyway.  And it's also not too bad having to go once to pypi.python.org
> to set it, usually it seldomly changes.

I think it would be better to differentiate between showing the
fields on the project pages, where they provide useful resources
for people, and their use on the /simple/ index pages which are
meant for programs to parse.

IMO, the homepage and download links on the project pages are
indeed very useful for people. On the /simple/ index a homepage
link is probably not all that useful (provided a download link
is set). The download links serve the purpose of directing
tools to the right location, so those do belong on the /simple/
index listings. I'd completely remove the links parsed from
the descriptions, since those don't really provide a good
basis for crawling (the description is meant for humans to
parse, not programs).

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 01 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
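
The concern about mailto and "file:///" links on the /simple/ pages can be
made concrete with a small sketch of the protection mentioned above. It
assumes an installer keeps only absolute http and https links; the allowed
set and the function names are hypothetical, and this is not the actual
filtering done by pip or easy_install. Relative links would have to be
resolved against the page URL before this check, otherwise they are
rejected too.

```python
from urllib.parse import urlparse

# Schemes a cautious installer might be willing to follow (an assumption).
ALLOWED_SCHEMES = {"http", "https"}


def is_followable(href):
    """Return True only for absolute links with an allowed scheme."""
    return urlparse(href).scheme.lower() in ALLOWED_SCHEMES


def filter_links(hrefs):
    """Drop mailto:, file: and any other link with a disallowed scheme."""
    return [href for href in hrefs if is_followable(href)]
```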

From donald.stufft at gmail.com  Fri Mar  1 12:09:54 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Fri, 1 Mar 2013 06:09:54 -0500
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <51308B38.9030709@egenix.com>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
Message-ID: <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>

On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
> On 01.03.2013 11:19, holger krekel wrote:
> > Hi Richard, all,
> > 
> > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
> > script which takes a project name as an argument and then goes to 
> > pypi.python.org (http://pypi.python.org) and removes all homepage/download metadata entries for 
> > this project. This sanitizes/speeds up installation because
> > pip/easy_install don't need to crawl them anymore. I just did this for
> > three of my projects, (pytest, tox and py) and it seems to work fine.
> > 
> 
> 
> Does it also cleanup the links that PyPI adds to the /simple/ by
> parsing the project description for links ?
> 
> I think those are far nastier than the homepage and download links,
> which can be put to some good use to limit the external lookups
> (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
> 
> See e.g. https://pypi.python.org/simple/zc.buildout/
> for a good example of the mess this generates... even mailto links
> get listed and "file:///" links open up the installers for all
> kinds of nasty things (unless they explicitly protect against
> following these).
> 
> 

pip at least (and, I assume, the other tools) doesn't spider those links, but
it does consider them for download (e.g. if the link looks installable
it will be a candidate for installing, but pip won't fetch it and look for
more links, as it does for download_url/home_page).

I believe that's the way it's structured atm.
> 
> > Now before i release this as a tool, i wonder: Is it a good idea to remove
> > download/homepage entries? Is there any current machine use (other than
> > the dreaded crawling) for the homepage/download_url per-release metadata 
> > fields?
> > 
> > For humans the homepage link is nicely discoverable if the long-description
> > doesn't mention it prominently. But i think there also is a "project url" 
> > or "bugtrack url" for a project so maybe those could be used to reference 
> > these important pages? (i am a bit confused on the exact meaning of those
> > urls, btw).
> > 
> > Should we maybe stop advertising "homepage" and "download_url"
> > and instead see to extend project-url/bugtrackurl to be used
> > and shown nicely? The latter are independent of releases which i think
> > makes sense - what use are old probably unreachable/borked homepages
> > anyway. And it's also not too bad having to go once to pypi.python.org (http://pypi.python.org)
> > to set it, usually it seldomly changes.
> > 
> 
> 
> I think it would be better to differentiate between showing the
> fields on the project pages, where they provide useful resources
> for people, and their use on the /simple/ index pages which are
> meant for programs to parse.
> 
> IMO, the homepage and download links on the project pages are
> indeed very useful for people. On the /simple/ index a homepage
> link is probably not all that useful (provided a download link
> is set). The download links serve the purpose of directing
> tools to the right location, so those do belong on the /simple/
> index listings. I'd completely remove the links parsed from
> the descriptions, since those don't really provide a good
> basis for crawling (the description is meant for humans to
> parse, not programs).
> 



From donald.stufft at gmail.com  Fri Mar  1 12:10:30 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Fri, 1 Mar 2013 06:10:30 -0500
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <51308B38.9030709@egenix.com>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
Message-ID: <AAAFD81A821F48F4A0DFC42BBDF98C6C@gmail.com>

On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
> On 01.03.2013 11:19, holger krekel wrote:
> > Hi Richard, all,
> > 
> > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
> > script which takes a project name as an argument and then goes to 
> > pypi.python.org (http://pypi.python.org) and removes all homepage/download metadata entries for 
> > this project. This sanitizes/speeds up installation because
> > pip/easy_install don't need to crawl them anymore. I just did this for
> > three of my projects, (pytest, tox and py) and it seems to work fine.
> > 
> 
> 
> Does it also cleanup the links that PyPI adds to the /simple/ by
> parsing the project description for links ?
> 
> I think those are far nastier than the homepage and download links,
> which can be put to some good use to limit the external lookups
> (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
> 
> See e.g. https://pypi.python.org/simple/zc.buildout/
> for a good example of the mess this generates... even mailto links
> get listed and "file:///" links open up the installers for all
> kinds of nasty things (unless they explicitly protect against
> following these).
> 
> > Now before i release this as a tool, i wonder: Is it a good idea to remove
> > download/homepage entries? Is there any current machine use (other than
> > the dreaded crawling) for the homepage/download_url per-release metadata 
> > fields?
> > 
> > For humans the homepage link is nicely discoverable if the long-description
> > doesn't mention it prominently. But i think there also is a "project url" 
> > or "bugtrack url" for a project so maybe those could be used to reference 
> > these important pages? (i am a bit confused on the exact meaning of those
> > urls, btw).
> > 
> > Should we maybe stop advertising "homepage" and "download_url"
> > and instead see to extend project-url/bugtrackurl to be used
> > and shown nicely? The latter are independent of releases which i think
> > makes sense - what use are old probably unreachable/borked homepages
> > anyway. And it's also not too bad having to go once to pypi.python.org (http://pypi.python.org)
> > to set it, usually it seldomly changes.
> > 
> 
> 
> I think it would be better to differentiate between showing the
> fields on the project pages, where they provide useful resources
> for people, and their use on the /simple/ index pages which are
> meant for programs to parse.
> 
> IMO, the homepage and download links on the project pages are
> indeed very useful for people. On the /simple/ index a homepage
> link is probably not all that useful (provided a download link
> is set). The download links serve the purpose of directing
> tools to the right location, so those do belong on the /simple/
> index listings. I'd completely remove the links parsed from
> the descriptions, since those don't really provide a good
> basis for crawling (the description is meant for humans to
> parse, not programs).
> 
> 

I'd prefer these to eventually be replaced by the project-url metadata,
but that's not available yet, and at the moment they are useful.
> 



From holger at merlinux.eu  Fri Mar  1 12:17:07 2013
From: holger at merlinux.eu (holger krekel)
Date: Fri, 1 Mar 2013 11:17:07 +0000
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
	<71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
Message-ID: <20130301111707.GI9677@merlinux.eu>

On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote:
> On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
> > On 01.03.2013 11:19, holger krekel wrote:
> > > Hi Richard, all,
> > > 
> > > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
> > > script which takes a project name as an argument and then goes to 
> > > pypi.python.org (http://pypi.python.org) and removes all homepage/download metadata entries for 
> > > this project. This sanitizes/speeds up installation because
> > > pip/easy_install don't need to crawl them anymore. I just did this for
> > > three of my projects, (pytest, tox and py) and it seems to work fine.
> > > 
> > 
> > 
> > Does it also cleanup the links that PyPI adds to the /simple/ by
> > parsing the project description for links ?
> > 
> > I think those are far nastier than the homepage and download links,
> > which can be put to some good use to limit the external lookups
> > (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
> > 
> > See e.g. https://pypi.python.org/simple/zc.buildout/
> > for a good example of the mess this generates... even mailto links
> > get listed and "file:///" links open up the installers for all
> > kinds of nasty things (unless they explicitly protect against
> > following these).
> > 
> > 
> 
> pip at least (and, I assume, the other tools) doesn't spider those links, but
> it does consider them for download (e.g. if the link looks installable
> it will be a candidate for installing, but pip won't fetch it and look for
> more links, as it does for download_url/home_page).
> 
> I believe that's the way it's structured atm.

That's right. Even though the links extracted from the long-description
look ugly on a simple/PKGNAME page, neither pip nor easy_install does anything
with them unless the "href" ends in "#egg=PKGNAME-", in which case they are
taken as pointing to a development tarball (e.g. at github or bitbucket).
AFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as
an installation candidate, just the "#egg=PKGNAME" one.

best,
holger
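
The "#egg=PKGNAME-" rule described above can be sketched as a simple check
on the URL fragment. This is a simplification for illustration, not the
actual pip or easy_install logic: it only tests that the fragment starts
with "egg=" followed by the project name and a dash, so a project whose
name is a prefix of another (e.g. "pytest" and "pytest-xdist") would be
confused. The function name and the example URLs are hypothetical.

```python
from urllib.parse import urldefrag


def is_dev_candidate(href, pkgname):
    """Return True if href carries an "#egg=PKGNAME-..." fragment."""
    fragment = urldefrag(href).fragment
    prefix = "egg=" + pkgname.lower() + "-"
    return fragment.lower().startswith(prefix)
```

Under this rule a plain "PKGNAME-VER.tar.gz" link found in a description
is ignored, and only a link explicitly tagged with the egg fragment is
considered.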


> > 
> > > Now before i release this as a tool, i wonder: Is it a good idea to remove
> > > download/homepage entries? Is there any current machine use (other than
> > > the dreaded crawling) for the homepage/download_url per-release metadata 
> > > fields?
> > > 
> > > For humans the homepage link is nicely discoverable if the long-description
> > > doesn't mention it prominently. But i think there also is a "project url" 
> > > or "bugtrack url" for a project so maybe those could be used to reference 
> > > these important pages? (i am a bit confused on the exact meaning of those
> > > urls, btw).
> > > 
> > > Should we maybe stop advertising "homepage" and "download_url"
> > > and instead see to extend project-url/bugtrackurl to be used
> > > and shown nicely? The latter are independent of releases which i think
> > > makes sense - what use are old probably unreachable/borked homepages
> > > anyway. And it's also not too bad having to go once to pypi.python.org (http://pypi.python.org)
> > > to set it, usually it seldomly changes.
> > > 
> > 
> > 
> > I think it would be better to differentiate between showing the
> > fields on the project pages, where they provide useful resources
> > for people, and their use on the /simple/ index pages which are
> > meant for programs to parse.
> > 
> > IMO, the homepage and download links on the project pages are
> > indeed very useful for people. On the /simple/ index a homepage
> > link is probably not all that useful (provided a download link
> > is set). The download links serve the purpose of directing
> > tools to the right location, so those do belong on the /simple/
> > index listings. I'd completely remove the links parsed from
> > the descriptions, since those don't really provide a good
> > basis for crawling (the description is meant for humans to
> > parse, not programs).
> > 

From jnoller at gmail.com  Fri Mar  1 12:28:00 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 1 Mar 2013 06:28:00 -0500
Subject: [Catalog-sig] PyPI terms
In-Reply-To: <kgp0ct$foe$1@ger.gmane.org>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512E28CB.9080907@egenix.com>
	<c5b80cd2-e956-49db-942d-ee5f8a08d00d@email.android.com>
	<512E422C.3070001@egenix.com>
	<4A372726-1248-4E43-AC00-863DA153D42C@coderanger.net>
	<512F2FE9.9080001@egenix.com>
	<3D8F33FF-A4FA-45B9-8AF5-97DA91876C1E@coderanger.net>
	<512F9E8A.1010707@egenix.com>
	<D07E42FB-A7A9-4477-9E82-2BDEE066EA2A@coderanger.net>
	<kgp0ct$foe$1@ger.gmane.org>
Message-ID: <F29AA0D9-E18C-4F60-88D4-3DD6D30720D1@gmail.com>

Since we're hotly contesting the pypi terms of service - I thought I'd page Van, who is the chairman and I'm pretty sure drafted the terms of service for pypi for the foundation.

He should be able to bludgeon us all!

Jesse

On Feb 28, 2013, at 8:31 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> On 2/28/2013 1:19 PM, Noah Kantrowitz wrote:
> 
>> Because I happen to have YouTube open anyway:
>> 
>> """ For clarity, you retain all of your ownership rights in your
>> Content. However, by submitting Content to YouTube, you hereby grant
>> YouTube a worldwide, non-exclusive, royalty-free, sublicenseable and
>> transferable license to use, reproduce, distribute, prepare
>> derivative works of, display, and perform the Content in connection
>> with the Service and YouTube's (and its successors' and affiliates')
>> business, including without limitation for promoting and
>> redistributing part or all of the Service (and derivative works
>> thereof) in any media formats and through any media channels. You
>> also hereby grant each user of the Service a non-exclusive license to
>> access your Content through the Service, and to use, reproduce,
>> distribute, display and perform such Content as permitted through the
>> functionality of the Service and under these Terms of Service. The
>> above licenses granted by you in video Content you submit to the
>> Service terminate within a commercially reasonable time after you
>> remove or delete your videos from the Service. You understand and
>> agree, however, that YouTube may retain, but not display, distribute,
>> or perform, server copies of your videos that have been removed or
>> deleted. The above licenses granted by you in user comments you
>> submit are perpetual and irrevocable. """
>> 
>> Slightly different wording,
> 
> Noah, I understand that you desperately do not want to admit that the PSF requirement for uploading to its servers is unusually broad, because you do not want to admit that rational people might have a reason to not upload, but there it is.
> 
> 1. The uploader only authorizes distribution via the YouTube infrastructure. Indeed, Google wants that limitation because it wants to be the one that monetizes distribution. So it only streams videos (free ones, anyway) and does *not* download. Anyone who subverts this and captures the stream as a download has no rights to it.
> 
> 2. The uploader can terminate the license with Google. Because of #1, such termination stops anyone from legal distribution.
> 
> Note: Flickr gives uploaders the choice of whether images can be downloaded or only embedded in a flickr web page. It also lets uploaders set the license that applies to flickr users. And it allows deletion of images.
> 
>> only the license to comments is irrevocable,
> 
> Irrelevant to this discussion.
> 
> > for videos they just promise to stop distributing
> 
> This is the important point.
> 
>> but not actually remove your content.
> 
> This is a mostly irrelevant practical issue. Finding and scrubbing every backup copy is difficult and expensive, especially for disk-image backups or serial tape media (if indeed they still use such) or backups stuck down in a deep salt mine. Any repository that does backups has to have this proviso. (I am sure, for instance, that Flickr does now.)
> 
> 
> My take on the current license is this: the original upload license was rather minimal. The lawyer decided it was insufficient. Rather than craft a broader license with the absolute minimum rights grant necessary, the lawyer took the easy, quick, and cheap-for-psf route of a maximal rights grant. That is okay with me as long as it is not mis-represented and as long as people do not try to bludgeon me or anyone else into signing something we do not agree to.
> 
> Note: when I contribute text and code to the CPython repository, I also give up all control. I know and accept that, and even want that, because it also means that I can re-write *other* people's text and code. But people may reasonably want to keep more control over their independent sole-author work.
> 
> -- 
> Terry Jan Reedy
> 

From jnoller at gmail.com  Fri Mar  1 12:30:02 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 1 Mar 2013 06:30:02 -0500
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <513073E5.20900@egenix.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
Message-ID: <D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>

Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste?

We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology.

Jesse 

On Mar 1, 2013, at 4:24 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 01.03.2013 10:02, Reinout van Rees wrote:
>> On 28-02-13 21:08, holger krekel wrote:
>>>> I have seen that position in this discussion ("I have to upload 120
>>>>> files per release, so I won't do that", for instance).
>> 
>>> haven't seen that.
>> 
>> Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:
>> 
>> """
>> However, taking our egenix-mx-base package as example, we have
>> 120 distribution files for every single release. Uploading those
>> to PyPI would not only take long, but also ...
>> """
> 
> Correct, with a total of over 100MB per release. However, the above
> quote is slightly incorrect: I did not say "I won't do that", just
> that there are issues with doing this:
> 
> * It currently takes too long uploading that many files to
>  PyPI. This causes a problem, since in order to start the upload,
>  we have to register the release on PyPI, which tools will then
>  immediately find. However, during the upload time, they won't
>  necessarily find the right files to download and then fail.
> 
>  The proposed pull mechanism (see
>  http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>  would work around this problem: tools would simply go to
>  our servers in case they can't find the files on PyPI.
> 
> * PyPI doesn't allow us to upload two egg files with the same
>  name: we have to provide egg files for UCS2 Python builds and
>  UCS4 Python builds, since easy_install/setuptools/pip don't
>  differentiate between the two variants. This is the main
>  reason why we're hosting our own PyPI-style indexes, one for
>  UCS2 and the other for UCS4 builds:
>  https://downloads.egenix.com/python/index/ucs2/
>  https://downloads.egenix.com/python/index/ucs4/
> 
> * I'm not sure whether we want to import our crypto packages
>  to the US, so for a subset of the files, we'd probably
>  continue to use our servers in Germany.
> 
>  Again, with the above proposal, this shouldn't be a problem.
> 
> * The PyPI terms are a bummer for us, but this can be fixed,
>  I guess.
> 
> If we can resolve the issues, we'd have no problem having the
> files mirrored on PyPI.
> 

From mal at egenix.com  Fri Mar  1 12:47:22 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 01 Mar 2013 12:47:22 +0100
Subject: [Catalog-sig] PyPI terms (was: Deprecate External Links)
In-Reply-To: <D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
	<D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>
Message-ID: <5130954A.2050805@egenix.com>

On 01.03.2013 12:30, Jesse Noller wrote:
> Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste?
> 
> We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology.

I think we should move this discussion to the python-legal-sig list:

http://mail.python.org/mailman/listinfo/python-legal-sig

Let me know when you've subscribed and then we can hash things
out on that list. The catalog sig is not really the suitable
place for these discussions.

> Jesse 
> 
> On Mar 1, 2013, at 4:24 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:
> 
>> On 01.03.2013 10:02, Reinout van Rees wrote:
>>> On 28-02-13 21:08, holger krekel wrote:
>>>>> I have seen that position in this discussion ("I have to upload 120
>>>>> files per release, so I won't do that", for instance).
>>>
>>>> haven't seen that.
>>>
>>> Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:
>>>
>>> """
>>> However, taking our egenix-mx-base package as example, we have
>>> 120 distribution files for every single release. Uploading those
>>> to PyPI would not only take long, but also ...
>>> """
>>
>> Correct, with a total of over 100MB per release. However, the above
>> quote is slightly incorrect: I did not say "I won't do that", just
>> that there are issues with doing this:
>>
>> * It currently takes too long uploading that many files to
>>  PyPI. This causes a problem, since in order to start the upload,
>>  we have to register the release on PyPI, which tools will then
>>  immediately find. However, during the upload time, they won't
>>  necessarily find the right files to download and then fail.
>>
>>  The proposed pull mechanism (see
>>  http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>>  would work around this problem: tools would simply go to
>>  our servers in case they can't find the files on PyPI.
>>
>> * PyPI doesn't allow us to upload two egg files with the same
>>  name: we have to provide egg files for UCS2 Python builds and
>>  UCS4 Python builds, since easy_install/setuptools/pip don't
>>  differentiate between the two variants. This is the main
>>  reason why we're hosting our own PyPI-style indexes, one for
>>  UCS2 and the other for UCS4 builds:
>>  https://downloads.egenix.com/python/index/ucs2/
>>  https://downloads.egenix.com/python/index/ucs4/
>>
>> * I'm not sure whether we want to import our crypto packages
>>  to the US, so for a subset of the files, we'd probably
>>  continue to use our servers in Germany.
>>
>>  Again, with the above proposal, this shouldn't be a problem.
>>
>> * The PyPI terms are a bummer for us, but this can be fixed,
>>  I guess.
>>
>> If we can resolve the issues, we'd have no problem having the
>> files mirrored on PyPI.
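
A minimal sketch of the UCS2/UCS4 point in the list above, assuming the
installer is told explicitly which index to use. The two index URLs are the
ones quoted in the message; the helper names unicode_build() and index_url()
are hypothetical and not part of any eGenix, setuptools or pip API. The check
relies on sys.maxunicode, which is 0xFFFF on narrow (UCS2) builds and 0x10FFFF
on wide (UCS4) builds; Python 3.3 and later always report the wide value.

```python
import sys

# Index URLs exactly as given in the quoted message.
UCS2_INDEX = "https://downloads.egenix.com/python/index/ucs2/"
UCS4_INDEX = "https://downloads.egenix.com/python/index/ucs4/"


def unicode_build(maxunicode=None):
    """Return 'ucs2' for a narrow build and 'ucs4' for a wide build.

    Hypothetical helper: maxunicode defaults to the running interpreter's
    sys.maxunicode and can be overridden for testing.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow builds report 0xFFFF; wide builds report 0x10FFFF.
    return "ucs2" if maxunicode == 0xFFFF else "ucs4"


def index_url(maxunicode=None):
    """Hypothetical helper: pick the index matching the interpreter build."""
    if unicode_build(maxunicode) == "ucs2":
        return UCS2_INDEX
    return UCS4_INDEX


if __name__ == "__main__":
    # Print the variant and the index an installer should be pointed at.
    print(unicode_build(), index_url())
```

The resulting URL would then be passed to the installer by hand, for example
through the --index-url option of easy_install or pip, since the tools
themselves do not distinguish the two variants.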

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 01 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From jnoller at gmail.com  Fri Mar  1 13:18:24 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 1 Mar 2013 07:18:24 -0500
Subject: [Catalog-sig] PyPI terms (was: Deprecate External Links)
In-Reply-To: <5130954A.2050805@egenix.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
	<D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>
	<5130954A.2050805@egenix.com>
Message-ID: <DC9821A5-75C8-4523-81E0-9F9CAD139C34@gmail.com>

I am subscribed: I made the list. We're both board directors too. Changes to the ToS should come from legal counsel and the board.

On Mar 1, 2013, at 6:47 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 01.03.2013 12:30, Jesse Noller wrote:
>> Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste?
>> 
>> We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology.
> 
> I think we should move this discussion to the python-legal-sig list:
> 
> http://mail.python.org/mailman/listinfo/python-legal-sig
> 
> Let me know when you've subscribed and then we can hash things
> out on that list. The catalog sig is not really the suitable
> place for these discussions.

From mal at egenix.com  Fri Mar  1 13:20:55 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 01 Mar 2013 13:20:55 +0100
Subject: [Catalog-sig] PyPI terms
In-Reply-To: <DC9821A5-75C8-4523-81E0-9F9CAD139C34@gmail.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
	<D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>
	<5130954A.2050805@egenix.com>
	<DC9821A5-75C8-4523-81E0-9F9CAD139C34@gmail.com>
Message-ID: <51309D27.4000204@egenix.com>

On 01.03.2013 13:18, Jesse Noller wrote:
> I am subscribed: I made the list. We're both board directors too. Changes to the tos should come from legal counsel, and the board

Are Van and everyone else who is interested subscribed as well?

-- 
Marc-Andre Lemburg
eGenix.com

From jnoller at gmail.com  Fri Mar  1 13:23:53 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 1 Mar 2013 07:23:53 -0500
Subject: [Catalog-sig] PyPI terms (was: Deprecate External Links)
In-Reply-To: <5130954A.2050805@egenix.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
	<D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>
	<5130954A.2050805@egenix.com>
Message-ID: <2EC9E943-A57F-4444-956D-FA3AB7AE13AD@gmail.com>

Either the ToS are preventing PyPI tech, security and distribution enhancements, or they aren't.

If it's the latter, then we can stop trotting them out as a reason not to grossly improve our services and infrastructure.

On Mar 1, 2013, at 6:47 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 01.03.2013 12:30, Jesse Noller wrote:
>> Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste?
>> 
>> We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology.
> 
> I think we should move this discussion to the python-legal-sig list:
> 
> http://mail.python.org/mailman/listinfo/python-legal-sig
> 
> Let me know when you've subscribed and then we can hash things
> out on that list. The catalog sig is not really the suitable
> place for these discussions.

From jnoller at gmail.com  Fri Mar  1 13:26:35 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 1 Mar 2013 07:26:35 -0500
Subject: [Catalog-sig] PyPI terms
In-Reply-To: <51309D27.4000204@egenix.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
	<D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>
	<5130954A.2050805@egenix.com>
	<DC9821A5-75C8-4523-81E0-9F9CAD139C34@gmail.com>
	<51309D27.4000204@egenix.com>
Message-ID: <68D705AC-BBD0-415D-8AC3-8DEA93F02FE3@gmail.com>



On Mar 1, 2013, at 7:20 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 01.03.2013 13:18, Jesse Noller wrote:
>> I am subscribed: I made the list. We're both board directors too. Changes to the tos should come from legal counsel, and the board
> 
> Van and all others who are interested as well ?

I do not see a reason for Van to be subscribed unless he really wants more email. Actual issues need to be elevated to the board and addressed there.

From mal at egenix.com  Fri Mar  1 14:56:11 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 01 Mar 2013 14:56:11 +0100
Subject: [Catalog-sig] PyPI terms
In-Reply-To: <5130954A.2050805@egenix.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
	<D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>
	<5130954A.2050805@egenix.com>
Message-ID: <5130B37B.6050501@egenix.com>

On 01.03.2013 12:47, M.-A. Lemburg wrote:
> On 01.03.2013 12:30, Jesse Noller wrote:
>> Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste?
>>
>> We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology.
> 
> I think we should move this discussion to the python-legal-sig list:
> 
> http://mail.python.org/mailman/listinfo/python-legal-sig
> 
> Let me know when you've subscribed and then we can hash things
> out on that list. The catalog sig is not really the suitable
> place for these discussions.

I've kicked off the discussion on the other list. See you there.

-- 
Marc-Andre Lemburg
eGenix.com

From jnoller at gmail.com  Fri Mar  1 15:02:39 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 1 Mar 2013 09:02:39 -0500
Subject: [Catalog-sig] PyPI terms
In-Reply-To: <5130B37B.6050501@egenix.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com>
	<20130228094343.GY9677@merlinux.eu> <kgnk55$oek$1@ger.gmane.org>
	<20130228200848.GB9677@merlinux.eu> <kgpqrj$2pu$1@ger.gmane.org>
	<513073E5.20900@egenix.com>
	<D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>
	<5130954A.2050805@egenix.com> <5130B37B.6050501@egenix.com>
Message-ID: <CACQrdOny5KQJFC1mX-_EAdmTg7ZRae97W4c_ZFBLSBUtQ4ZgUg@mail.gmail.com>

Okie doke. So we can move on to putting up the CDN and deprecating external
links for now?


On Fri, Mar 1, 2013 at 8:56 AM, M.-A. Lemburg <mal at egenix.com> wrote:

> I've kicked off the discussion on the other list. See you there.

From mal at egenix.com  Fri Mar  1 15:11:12 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 01 Mar 2013 15:11:12 +0100
Subject: [Catalog-sig] PyPI terms
In-Reply-To: <CACQrdOny5KQJFC1mX-_EAdmTg7ZRae97W4c_ZFBLSBUtQ4ZgUg@mail.gmail.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
	<D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>
	<5130954A.2050805@egenix.com> <5130B37B.6050501@egenix.com>
	<CACQrdOny5KQJFC1mX-_EAdmTg7ZRae97W4c_ZFBLSBUtQ4ZgUg@mail.gmail.com>
Message-ID: <5130B700.9070705@egenix.com>

On 01.03.2013 15:02, Jesse Noller wrote:
> Okie doke. So we can move on to putting up the CDN and deprecating external
> links for now?

I don't think anyone is against putting up a CDN. It should meet
the same security requirements we have for the pypi server itself,
i.e. HTTPS all the way, proper certificates, operated by the PSF,
perhaps run on a different domain, and whatever other goodies
Donald can come up with ;-)

For the external links we need a migration path... that's in the works.

See http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal for
a proposal that allows migrating away from relying on external
hosts in a backwards compatible and secure way.

-- 
Marc-Andre Lemburg
eGenix.com

From van.lindberg at gmail.com  Fri Mar  1 15:37:49 2013
From: van.lindberg at gmail.com (VanL)
Date: Fri, 1 Mar 2013 08:37:49 -0600
Subject: [Catalog-sig] PyPI terms
In-Reply-To: <5130B37B.6050501@egenix.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
	<D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>
	<5130954A.2050805@egenix.com> <5130B37B.6050501@egenix.com>
Message-ID: <948E0503DDAA4FB496104EC483F993F4@gmail.com>

Please forward to catalog-sig if this gets bounced. I'm not on that list.


I drafted these terms of service. I know they are broad. They were made exactly as broad as was needed.

This was not a case of taking the cheap-and-easy route of a maximal rights grant (and besides, it would have been equally cheap for the PSF either way). Rather, we investigated and found out all the different ways that people were using PyPI. Of particular importance were these:

- Automated access from scripts (we can't pass through any license terms: there is no click-through or agreement to use)
- Automated mirroring, and re-mirroring of mirrors, without any agreement, both to public and private repositories (we need the right to distribute and to allow others to distribute; we needed to protect our downstream and make sure that their common use cases aren't infringing)

These terms were chosen so that our community would have the rights to do these very common things and not be infringing. The only way we could do this was by asking for a broader grant at the time of distribution.

Also, what no one gets is that *the license does not allow modification!* So you can distribute far and wide for any purpose - but you can only distribute what the original author uploaded without being liable for infringement.

People have also said that this overrules the licenses on their packages. That is not so! The licenses in this case run in parallel, and distribution needs to satisfy both licenses or it cannot be done at all.

This was the subject of a lot of thought and a lot of work that a lot of people have not even considered, and it was chosen very deliberately to protect our overall community. Because the protection of the community is a broad purpose, it needed some broad provisions - but it is as tightly crafted as I could get while still not making our known downstream uses infringing.

If it gets changed, it will be over my strenuous objections. 

Van


____________________________
Van Lindberg
van.lindberg at gmail.com


On Friday, March 1, 2013 at 7:56 AM, M.-A. Lemburg wrote:

> I've kicked off the discussion on the other list. See you there.

From holger at merlinux.eu  Fri Mar  1 16:19:24 2013
From: holger at merlinux.eu (holger krekel)
Date: Fri, 1 Mar 2013 15:19:24 +0000
Subject: [Catalog-sig] PyPI terms
In-Reply-To: <5130B700.9070705@egenix.com>
References: <20130228094343.GY9677@merlinux.eu> <kgnk55$oek$1@ger.gmane.org>
	<20130228200848.GB9677@merlinux.eu> <kgpqrj$2pu$1@ger.gmane.org>
	<513073E5.20900@egenix.com>
	<D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>
	<5130954A.2050805@egenix.com> <5130B37B.6050501@egenix.com>
	<CACQrdOny5KQJFC1mX-_EAdmTg7ZRae97W4c_ZFBLSBUtQ4ZgUg@mail.gmail.com>
	<5130B700.9070705@egenix.com>
Message-ID: <20130301151924.GK9677@merlinux.eu>

On Fri, Mar 01, 2013 at 15:11 +0100, M.-A. Lemburg wrote:
> On 01.03.2013 15:02, Jesse Noller wrote:
> > Okie doke. So we can move on to putting up the CDN and deprecating external
> > links for now?
> 
> I don't think anyone is against putting up a CDN. It should meet
> the same security requirements we have for the pypi server itself,
> ie. HTTPS all the way, proper certificates, operated by the PSF,
> perhaps run on a different domain, and whatever other goodies
> Donald can come up with ;-)
> 
> For the external links we need a migration path... that's in the works.
> 
> See http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal for
> a proposal that allows migrating away from relying on external
> hosts in a backwards compatible and secure way.

The page doesn't describe the current "scraping" situation accurately.
As mentioned in my last post, pip/easy_install do _not_ visit
all links found in simple/PKGNAME. Only the ones with rel="home_page" or
rel="download".  So the proposal effectively says to not visit
"homepage" links by default and use a special format for download ones.
The special format i am not sure about - i guess the SHA256 hash there
is to make sure the target content is the correct one, right?
What about abusing download_url some more and using a multi-line format like
this:

    HASH1 URL-TO-RELEASE-FILE1
    HASH2 URL-TO-RELEASE-FILE2

This way we can avoid any additional http-requests on the pip/easy_install
client side _and_ allow multiple release files.  The simple/PKGNAME metadata 
would contain all information that is needed (and we could probably introduce
a special syntax for #egg github/bitbucket-style tarballs). Those URLs would 
only be retrieved if the client-side installer determines it needs them because
of the user-required version.  You wouldn't need to create a special
"-download.html" file then, no additional http requests, and it's easy to 
create this format without much tool support.
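
A minimal sketch of how a client might read such a multi-line download_url
value, assuming one "HASH URL" pair per line and assuming the hash is a
SHA256 hex digest (as guessed above). The function names are hypothetical
and not part of pip, easy_install or PyPI:

```python
import hashlib


def parse_download_url(field):
    """Parse a hypothetical multi-line download_url value.

    Each non-empty line is expected to look like "HASH URL".
    Returns a list of (hash, url) tuples in the order given.
    """
    entries = []
    for line in field.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("expected 'HASH URL', got %r" % line)
        digest, url = parts
        entries.append((digest.lower(), url.strip()))
    return entries


def content_matches(data, expected_sha256):
    """Return True if the downloaded bytes match the advertised SHA256."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()
```

An installer would only fetch a URL from this list once it has decided that
the release file is the one it needs, and would then call content_matches()
on the downloaded bytes before installing.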

Can't incorporate this into the wiki right now myself and i'd probably 
like to structure the page differently.  The issue here really is the
(future) behaviour of easy_install and pip, not so much distutils or the
pypi server (apart from the worthwhile-to-consider idea of
pulling/caching things).

On a side note i'd prefer this to be a github/bitbucket project
where i can submit a pull request :)

best,
holger



From pje at telecommunity.com  Fri Mar  1 17:29:17 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 1 Mar 2013 11:29:17 -0500
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <20130301111707.GI9677@merlinux.eu>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
	<71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
	<20130301111707.GI9677@merlinux.eu>
Message-ID: <CALeMXf6cCHHrR7UvarVuB+zuTwU1xD8vj4XNyWc2wQtsyouFtw@mail.gmail.com>

On Fri, Mar 1, 2013 at 6:17 AM, holger krekel <holger at merlinux.eu> wrote:
> On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote:
>> On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
>> > On 01.03.2013 11:19, holger krekel wrote:
>> > > Hi Richard, all,
>> > >
>> > > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
>> > > script which takes a project name as an argument and then goes to
>> > > pypi.python.org (http://pypi.python.org) and removes all homepage/download metadata entries for
>> > > this project. This sanitizes/speeds up installation because
>> > > pip/easy_install don't need to crawl them anymore. I just did this for
>> > > three of my projects, (pytest, tox and py) and it seems to work fine.
>> > >
>> >
>> >
>> > Does it also cleanup the links that PyPI adds to the /simple/ by
>> > parsing the project description for links ?
>> >
>> > I think those are far nastier than the homepage and download links,
>> > which can be put to some good use to limit the external lookups
>> > (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>> >
>> > See e.g. https://pypi.python.org/simple/zc.buildout/
>> > for a good example of the mess this generates... even mailto links
>> > get listed and "file:///" links open up the installers for all
>> > kinds of nasty things (unless they explicitly protect against
>> > following these).
>> >
>> >
>>
>> pip at least, and I assume the other tools don't spider those links, but
>> they do consider them for download (e.g. if the link looks installable
>> it will be a candidate for installing, but  it won't fetch it, and look for
>> more links like it will for download_url/home_page).
>>
>> I believe that's the way it's structured atm.
>
> That's right. Even though the long-description extracted links
> look ugly on a simple/PKGNAME page, neither pip nor easy_install do anything
> with them except if the "href" ends in "#egg=PKGNAME-" in which case they are
> taken as pointing to a development tarball (e.g. at github or bitbucket).
> AFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as
> an installation candidate, just the "#egg=PKGNAME" one.

Both are considered "primary links".  A primary link is a link whose
filename portion matches one of the supported distutils or setuptools
file formats, or is marked with an #egg tag.  Primary links are
indexed as to project name and version, so that if that version/format
is chosen as the best candidate, it will be downloaded and installed.

Links marked with rel="homepage" or rel="download" are "secondary
links".  Secondary links are actively retrieved and scanned to look
for more primary links.  No further secondary links are scanned or
followed.  (Details of all of this can be found at:
http://peak.telecommunity.com/DevCenter/setuptools#making-your-package-available-for-easyinstall
)
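
The distinction can be sketched roughly as follows. This is an illustration
only, not the setuptools implementation: classify_link() is a hypothetical
helper, and DIST_SUFFIXES is an assumed, incomplete subset of the file
formats the real tools recognise.

```python
from urllib.parse import urlsplit

# Assumed, illustrative subset of distribution file suffixes; the real
# tools recognise more formats than are listed here.
DIST_SUFFIXES = (".egg", ".tar.gz", ".tgz", ".tar.bz2", ".zip")


def classify_link(href, rel=None):
    """Classify a link from a simple/PKGNAME page.

    Returns "primary" for direct download candidates, "secondary" for
    pages that would be fetched and scanned, and "ignored" otherwise.
    """
    parts = urlsplit(href)
    filename = parts.path.rsplit("/", 1)[-1].lower()
    # Primary: an #egg tag, or a filename in a known distribution format.
    if parts.fragment.startswith("egg=") or filename.endswith(DIST_SUFFIXES):
        return "primary"
    # Secondary: home page / download links, scanned for primary links.
    if rel in ("homepage", "download"):
        return "secondary"
    return "ignored"
```

The sketch checks for a primary link first, so a rel="download" link that
points straight at a tarball counts as primary; that ordering is a choice
made for this example.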

This basically means that MAL's proposal for a download.html file is
actually a bit moot: you can just stick direct "primary" download URLs
in your PyPI description field, and the tools will pick them up.  They
can even include #md5 info.  (See
http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
- item 4 mentions the description part.)
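
As a rough sketch of what the "#md5" check amounts to, assuming the link
carries the digest in a "#md5=<hexdigest>" fragment. split_md5() and
verify_download() are hypothetical names for this example, not functions
from setuptools:

```python
import hashlib
from urllib.parse import urldefrag


def split_md5(link):
    """Split a link into (url_without_fragment, md5_hexdigest_or_None)."""
    url, fragment = urldefrag(link)
    if fragment.startswith("md5="):
        return url, fragment[len("md5="):].lower()
    return url, None


def verify_download(link, data):
    """Check downloaded bytes against the link's #md5 fragment, if any."""
    _, expected = split_md5(link)
    if expected is None:
        return True  # no digest advertised, nothing to check against
    return hashlib.md5(data).hexdigest() == expected
```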

This means, by the way, that you could make an external link cleaner
which spiders the external pages and pulls the candidates onto the
description for that release, thereby keeping useful primary links and
getting rid of the secondary links used to fetch them.
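
The scanning half of such a cleaner could look something like the sketch
below. It assumes the external page has already been fetched and only
extracts candidate links from its HTML; the class and function names and
the DIST_SUFFIXES list are made up for illustration, and writing the
results back into the PyPI description is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumed, illustrative subset of distribution file suffixes.
DIST_SUFFIXES = (".egg", ".tar.gz", ".tgz", ".tar.bz2", ".zip")


class PrimaryLinkCollector(HTMLParser):
    """Collect absolute URLs of links that look like download candidates."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        url = urljoin(self.base_url, href)
        parts = urlsplit(url)
        filename = parts.path.rsplit("/", 1)[-1].lower()
        if parts.fragment.startswith("egg=") or filename.endswith(DIST_SUFFIXES):
            self.links.append(url)


def collect_primary_links(html, base_url):
    """Return the download-candidate links found in an HTML page."""
    collector = PrimaryLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```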

From pje at telecommunity.com  Fri Mar  1 17:37:56 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 1 Mar 2013 11:37:56 -0500
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <513073E5.20900@egenix.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
Message-ID: <CALeMXf4MaK_hr=WdVcSJkyon-gtRSEXezDW6PszFEME1CQT-Qw@mail.gmail.com>

On Fri, Mar 1, 2013 at 4:24 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 01.03.2013 10:02, Reinout van Rees wrote:
>> On 28-02-13 21:08, holger krekel wrote:
>>>> I have seen that position in this discussion ("I have to upload 120
>>>> >files per release, so I won't do that", for instance).
>>
>>> haven't seen that.
>>
>> Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:
>>
>> """
>> However, taking our egenix-mx-base package as example, we have
>> 120 distribution files for every single release. Uploading those
>> to PyPI would not only take long, but also ...
>> """
>
> Correct, with a total of over 100MB per release. However, the above
> quote is slightly incorrect: I did not say "I won't do that", just
> that there are issues with doing this:
>
> * It currently takes too long uploading that many files to
>   PyPI. This causes a problem, since in order to start the upload,
>   we have to register the release on PyPI, which tools will then
>   immediately find. However, during the upload time, they won't
>   necessarily find the right files to download and then fail.

Actually, easy_install doesn't pay any attention to what releases are
registered.  It just looks for primary and secondary links.  If there
are links for a version that it can use, it uses it.  If it does not
find links for a version, then that version does not exist, as far as
it is concerned.  So registering without files is not a problem.


>   The proposed pull mechanism (see
>   http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>   would work around this problem: tools would simply go to
>   our servers in case they can't find the files on PyPI.

That proposal is unnecessary, actually.  You could *right now* simply
place binary download links (with optional "#md5=...." verification)
in your package's description field, and the moment you registered the
package, existing tools would find those links and download them from
your site.  You could then remove your home page and download URLs
from the relevant fields, and place them also in the description.
(easy_install does not follow non-download links within the
description -- i.e., links that don't end in .egg, .tgz, etc. and
don't have an #egg tag.)


> * PyPI doesn't allow us to upload two egg files with the same
>   name: we have to provide egg files for UCS2 Python builds and
>   UCS4 Python builds, since easy_install/setuptools/pip don't
>   differentiate between the two variants.

They can if it's part of the platform string; the catch is that right
now it's not.  We'd have to go through an upgrade cycle of the tools
to support that.  I need to take a look at what PEP 427 is doing (and
you should too, if you haven't already) to get this part sorted out.

From mal at egenix.com  Fri Mar  1 17:50:18 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 01 Mar 2013 17:50:18 +0100
Subject: [Catalog-sig] [Python-legal-sig]  PyPI terms
In-Reply-To: <948E0503DDAA4FB496104EC483F993F4@gmail.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org> <20130228200848.GB9677@merlinux.eu>
	<kgpqrj$2pu$1@ger.gmane.org> <513073E5.20900@egenix.com>
	<D2A21436-17AD-4F54-9CCB-63E636DEAE9A@gmail.com>
	<5130954A.2050805@egenix.com> <5130B37B.6050501@egenix.com>
	<948E0503DDAA4FB496104EC483F993F4@gmail.com>
Message-ID: <5130DC4A.1030406@egenix.com>

Hi Van,

please read my long posting to the python-legal list. This explains the
concerns and makes suggestions on how to improve things in a way
that is compatible with what PyPI is and how it is used today:

http://mail.python.org/pipermail/python-legal-sig/2013-March/000000.html

PS: I'd prefer if you not cross-post to both lists and keep the
discussion to the legal list.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 01 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From mal at egenix.com  Fri Mar  1 20:31:28 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 01 Mar 2013 20:31:28 +0100
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <20130301111707.GI9677@merlinux.eu>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
	<71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
	<20130301111707.GI9677@merlinux.eu>
Message-ID: <51310210.5050203@egenix.com>

On 01.03.2013 12:17, holger krekel wrote:
> On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote:
>> On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
>>> On 01.03.2013 11:19, holger krekel wrote:
>>>> Hi Richard, all,
>>>>
>>>> somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
>>>> script which takes a project name as an argument and then goes to 
>>>> pypi.python.org (http://pypi.python.org) and removes all homepage/download metadata entries for 
>>>> this project. This sanitizes/speeds up installation because
>>>> pip/easy_install don't need to crawl them anymore. I just did this for
>>>> three of my projects, (pytest, tox and py) and it seems to work fine.
>>>>
>>>
>>>
>>> Does it also cleanup the links that PyPI adds to the /simple/ by
>>> parsing the project description for links ?
>>>
>>> I think those are far nastier than the homepage and download links,
>>> which can be put to some good use to limit the external lookups
>>> (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>>>
>>> See e.g. https://pypi.python.org/simple/zc.buildout/
>>> for a good example of the mess this generates... even mailto links
>>> get listed and "file:///" links open up the installers for all
>>> kinds of nasty things (unless they explicitly protect against
>>> following these).
>>>
>>>
>>
>> pip at least, and I assume the other tools don't spider those links, but
>> they do consider them for download (e.g. if the link looks installable
>> it will be a candidate for installing, but  it won't fetch it, and look for 
>> more links like it will for download_url/home_page).
>>
>> I believe that's the way it's structured atm.
> 
> That's right. Even though the long-description extracted links 
> look ugly on a simple/PKGNAME page, neither pip nor easy_install do anything
> with them except if the "href" ends in "#egg=PKGNAME-" in which case they are
> taken as pointing to a development tarball (e.g. at github or bitbucket).
> AFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as
> an installation candidate, just the "#egg=PKGNAME" one.

Hmm, then why not remove links that don't match the above from
the /simple/ index pages ?

Note that it's easily possible to make e.g. file:/// links
have a fragment that matches what you described, so I guess the
filters would have to be more careful about what to allow
(e.g. only http/ftp schemes, perhaps even only https schemes)
and what not.
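
A minimal sketch of such a scheme filter, assuming it would run wherever
the /simple/ links are generated. ALLOWED_SCHEMES and the function names
are hypothetical, and the exact set of allowed schemes is the open
question raised above:

```python
from urllib.parse import urlsplit

# Assumed whitelist; restricting this to just "https" is the stricter option.
ALLOWED_SCHEMES = frozenset(["http", "https", "ftp"])


def is_allowed_link(href, allowed=ALLOWED_SCHEMES):
    """Return True if the link uses one of the allowed URL schemes."""
    return urlsplit(href).scheme.lower() in allowed


def filter_links(hrefs, allowed=ALLOWED_SCHEMES):
    """Drop links such as file:, mailto: or javascript: from a list."""
    return [h for h in hrefs if is_allowed_link(h, allowed)]
```

Because the decision is made on the scheme alone, a file:/// link is
rejected even if its fragment is crafted to look like an #egg tag.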

BTW: Are those links also shown as-is on the description page ?
People could do nasty stuff by adding "javascript:" links which look
like normal links to the descriptions.

-- 
Marc-Andre Lemburg
eGenix.com


From dholth at gmail.com  Fri Mar  1 21:25:52 2013
From: dholth at gmail.com (Daniel Holth)
Date: Fri, 1 Mar 2013 15:25:52 -0500
Subject: [Catalog-sig] PEP 425 / 427 compatibility tags
Message-ID: <CAG8k2+7ur3k51pFvqPqb+QZa4c1qvokscFzU=w8b9VzKAMOh9A@mail.gmail.com>

On Fri, Mar 1, 2013 at 11:37 AM, PJ Eby <pje at telecommunity.com> wrote:
> On Fri, Mar 1, 2013 at 4:24 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 01.03.2013 10:02, Reinout van Rees wrote:
>>> On 28-02-13 21:08, holger krekel wrote:
>>>>> I have seen that position in this discussion ("I have to upload 120
>>>>> >files per release, so I won't do that", for instance).
>>>
>>>> haven't seen that.
>>>
>>> Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:
>>>
>>> """
>>> However, taking our egenix-mx-base package as example, we have
>>> 120 distribution files for every single release. Uploading those
>>> to PyPI would not only take long, but also ...
>>> """
>>
>> Correct, with a total of over 100MB per release. However, the above
>> quote is slightly incorrect: I did not say "I won't do that", just
>> that there are issues with doing this:
>>
>> * It currently takes too long uploading that many files to
>>   PyPI. This causes a problem, since in order to start the upload,
>>   we have to register the release on PyPI, which tools will then
>>   immediately find. However, during the upload time, they won't
>>   necessarily find the right files to download and then fail.
>
> Actually, easy_install doesn't pay any attention to what releases are
> registered.  It just looks for primary and secondary links.  If there
> are links for a version that it can use, it uses it.  If it does not
> find links for a version, then that version does not exist, as far as
> it is concerned.  So registering without files is not a problem.
>
>
>>   The proposed pull mechanism (see
>>   http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>>   would work around this problem: tools would simply go to
>>   our servers in case they can't find the files on PyPI.
>
> That proposal is unnecessary, actually.  You could *right now* simply
> place binary download links (with optional "#md5=...." verification)
> in your package's description field, and the moment you registered the
> package, existing tools would find those links and download them from
> your site.  You could then remove your home page and download URLs
> from the relevant fields, and place them also in the description.
> (easy_install does not follow non-download links within the
> description -- i.e., links that don't end in .egg, .tgz, etc. and
> don't have an #egg tag.)
>
>
>> * PyPI doesn't allow us to upload two egg files with the same
>>   name: we have to provide egg files for UCS2 Python builds and
>>   UCS4 Python builds, since easy_install/setuptools/pip don't
>>   differentiate between the two variants.
>
> They can if it's part of the platform string; the catch is that right
> now it's not.  We'd have to go through an upgrade cycle of the tools
> to support that.  I need to take a look at what PEP 427 is doing (and
> you should too, if you haven't already) to get this part sorted out.

The compatibility tags are specified in
http://www.python.org/dev/peps/pep-0425/ and are first used with PEP
427. The scheme defines a tag which is a combination of
implementation, abi, and platform tags, and an algorithm for choosing
the "most preferred" among the available builds for a particular
release of some distribution.

The ABI tags are basically abbreviated versions of the tags from
http://www.python.org/dev/peps/pep-3149/ and look like "cp32dmu" for a
debug, malloc, wide unicode build of CPython 3.2, or just "cp32" for a
Python 3.2 with none of those features compiled in. Your package would
probably use tags like "cp32-cp32mu-linux_x86_64".

Even though PEP 3149 is a Python 3.2 feature, the *PEP 425* ABI tags
are supposed to work in the same way with older versions of Python,
e.g. "py26u" for a Python 2.6 unicode build.
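
To illustrate the idea only (this is not the PEP 425 reference algorithm),
a tag can be split into its three parts, and an installer can pick the
"most preferred" build by walking its own ordered list of supported tags.
parse_tag() and pick_build() are hypothetical names, and the supported
list in the usage note is made up:

```python
def parse_tag(tag):
    """Split a compatibility tag into (python, abi, platform) parts.

    Raises ValueError if the tag does not have exactly three parts.
    """
    python, abi, platform = tag.split("-")
    return python, abi, platform


def pick_build(available, supported):
    """Pick the most preferred available build.

    `supported` is the installer's list of acceptable tags, most
    preferred first. Returns None if no available build is supported.
    """
    rank = {tag: i for i, tag in enumerate(supported)}
    candidates = [tag for tag in available if tag in rank]
    if not candidates:
        return None
    return min(candidates, key=rank.__getitem__)
```

For example, with supported = ["cp32-cp32mu-linux_x86_64", "py3-none-any"],
a release offering both builds would resolve to the first, more specific
one, while a release offering only an unlisted tag would yield None.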

From mal at egenix.com  Fri Mar  1 21:27:38 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 01 Mar 2013 21:27:38 +0100
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <CALeMXf6cCHHrR7UvarVuB+zuTwU1xD8vj4XNyWc2wQtsyouFtw@mail.gmail.com>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
	<71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
	<20130301111707.GI9677@merlinux.eu>
	<CALeMXf6cCHHrR7UvarVuB+zuTwU1xD8vj4XNyWc2wQtsyouFtw@mail.gmail.com>
Message-ID: <51310F3A.6010100@egenix.com>

Thanks for the feedback, Holger and Phillip. I'll bake this into
a version 0.2 of the proposal over the weekend.

On 01.03.2013 17:29, PJ Eby wrote:
> On Fri, Mar 1, 2013 at 6:17 AM, holger krekel <holger at merlinux.eu> wrote:
>> On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote:
>>> On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
>>>> On 01.03.2013 11:19, holger krekel wrote:
>>>>> Hi Richard, all,
>>>>>
>>>>> somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
>>>>> script which takes a project name as an argument and then goes to
>>>>> pypi.python.org (http://pypi.python.org) and removes all homepage/download metadata entries for
>>>>> this project. This sanitizes/speeds up installation because
>>>>> pip/easy_install don't need to crawl them anymore. I just did this for
>>>>> three of my projects, (pytest, tox and py) and it seems to work fine.
>>>>>
>>>>
>>>>
>>>> Does it also cleanup the links that PyPI adds to the /simple/ by
>>>> parsing the project description for links ?
>>>>
>>>> I think those are far nastier than the homepage and download links,
>>>> which can be put to some good use to limit the external lookups
>>>> (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>>>>
>>>> See e.g. https://pypi.python.org/simple/zc.buildout/
>>>> for a good example of the mess this generates... even mailto links
>>>> get listed and "file:///" links open up the installers for all
>>>> kinds of nasty things (unless they explicitly protect against
>>>> following these).
>>>>
>>>>
>>>
>>> pip at least, and I assume the other tools don't spider those links, but
>>> they do consider them for download (e.g. if the link looks installable
>>> it will be a candidate for installing, but  it won't fetch it, and look for
>>> more links like it will for download_url/home_page).
>>>
>>> I believe that's the way it's structured atm.
>>
>> That's right. Even though the long-description extracted links
>> look ugly on a simple/PKGNAME page, neither pip nor easy_install do anything
>> with them except if the "href" ends in "#egg=PKGNAME-" in which case they are
>> taken as pointing to a development tarball (e.g. at github or bitbucket).
>> AFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as
>> an installation candidate, just the "#egg=PKGNAME" one.
> 
> Both are considered "primary links".  A primary link is a link whose
> filename portion matches one of the supported distutils or setuptools
> file formats, or is marked with an #egg tag.  Primary links are
> indexed as to project name and version, so that if that version/format
> is chosen as the best candidate, it will be downloaded and installed.
> 
> Links marked with rel="homepage" or rel="download" are "secondary
> links".  Secondary links are actively retrieved and scanned to look
> for more primary links.  No further secondary links are scanned or
> followed.  (Details of all of this can be found at:
> http://peak.telecommunity.com/DevCenter/setuptools#making-your-package-available-for-easyinstall
> )
> 
> This basically means that MAL's proposal for a download.html file is
> actually a bit moot: you can just stick direct "primary" download URLs
> in your PyPI description field, and the tools will pick them up.  They
> can even include #md5 info.  (See
> http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
> - item 4 mentions the description part.)
> 
> This means, by the way, that you could make an external link cleaner
> which spiders the external pages and pulls the candidates onto the
> description for that release, thereby keeping useful primary links and
> getting rid of the secondary links used to fetch them.

-- 
Marc-Andre Lemburg
eGenix.com


From donald.stufft at gmail.com  Fri Mar  1 23:39:03 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Fri, 1 Mar 2013 17:39:03 -0500
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <51310210.5050203@egenix.com>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
	<71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
	<20130301111707.GI9677@merlinux.eu> <51310210.5050203@egenix.com>
Message-ID: <4BACDE7617A842EF9BC1E155D82CFBB9@gmail.com>

On Friday, March 1, 2013 at 2:31 PM, M.-A. Lemburg wrote:
> On 01.03.2013 12:17, holger krekel wrote:
> > On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote:
> > > On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
> > > > On 01.03.2013 11:19, holger krekel wrote:
> > > > > Hi Richard, all,
> > > > > 
> > > > > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
> > > > > script which takes a project name as an argument and then goes to 
> > > > > pypi.python.org (http://pypi.python.org) and removes all homepage/download metadata entries for 
> > > > > this project. This sanitizes/speeds up installation because
> > > > > pip/easy_install don't need to crawl them anymore. I just did this for
> > > > > three of my projects, (pytest, tox and py) and it seems to work fine.
> > > > > 
> > > > 
> > > > 
> > > > 
> > > > Does it also cleanup the links that PyPI adds to the /simple/ by
> > > > parsing the project description for links ?
> > > > 
> > > > I think those are far nastier than the homepage and download links,
> > > > which can be put to some good use to limit the external lookups
> > > > (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
> > > > 
> > > > See e.g. https://pypi.python.org/simple/zc.buildout/
> > > > for a good example of the mess this generates... even mailto links
> > > > get listed and "file:///" links open up the installers for all
> > > > kinds of nasty things (unless they explicitly protect against
> > > > following these).
> > > > 
> > > 
> > > 
> > > pip at least, and I assume the other tools don't spider those links, but
> > > they do consider them for download (e.g. if the link looks installable
> > > it will be a candidate for installing, but it won't fetch it, and look for 
> > > more links like it will for download_url/home_page).
> > > 
> > > I believe that's the way it's structured atm.
> > 
> > That's right. Even though the long-description extracted links 
> > look ugly on a simple/PKGNAME page, neither pip nor easy_install do anything
> > with them except if the "href" ends in "#egg=PKGNAME-" in which case they are
> > taken as pointing to a development tarball (e.g. at github or bitbucket).
> > AFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as
> > an installation candidate, just the "#egg=PKGNAME" one.
> > 
> 
> 
> Hmm, then why not remove links that don't match the above from
> the /simple/ index pages ?
> 
> Note that it's easily possible to make e.g. file:/// links
> have a fragment that matches what you described, so I guess the
> filters would have to be more careful about what to allow
> (e.g. only http/ftp schemes, perhaps even only https schemes)
> and what not.
> 
> BTW: Are those links also shown as-is on the description page ?
> People could do nasty stuff by adding "javascript:" links which look
> like normal links to the descriptions.
> 
> 

The descriptions don't allow javascript: URLs anymore (I reported that
ages ago and Richard fixed it). home_page and probably download_url
still do, though.

From pje at telecommunity.com  Fri Mar  1 23:42:34 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 1 Mar 2013 17:42:34 -0500
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <51310210.5050203@egenix.com>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
	<71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
	<20130301111707.GI9677@merlinux.eu> <51310210.5050203@egenix.com>
Message-ID: <CALeMXf5WwmBUN9nrSmj1-DCYy-RzYxhEcNfyVd8oTAsn+LFHGQ@mail.gmail.com>

On Fri, Mar 1, 2013 at 2:31 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> Hmm, then why not remove links that don't match the above from
> the /simple/ index pages ?

PyPI provides the links uninterpreted since the tools' interpretations
have evolved over time.


> Note that it's easily possible to make e.g. file:/// links
> have a fragment that matches what you described, so I guess the
> filters would have to be more careful about what to allow
> (e.g. only http/ftp schemes, perhaps even only https schemes)
> and what not.

file:// URLs are an intentionally supported feature of easy_install;
many users have local NFS-based or other shared repositories.  But
yes, it certainly would be reasonable to not include links to them on
PyPI.  ;-)


> BTW: Are those links also shown as-is on the description page ?
> People could do nasty stuff by adding "javascript:" links which look
> like normal links to the descriptions.

That's true, but is unrelated to the tools, since the tools can't
process javascript links.

It would probably be best, though, if PyPI filtered such URLs to
prevent script injection/CSRF attacks on logged-in PyPI users browsing
project descriptions.  I don't know if it already does this or not,
since I've never tried to inject a CSRF attack on PyPI.  ;-)

(I guess technically it would be a same-site request forgery rather
than a cross-site one, but you know what I mean.)
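
As a rough illustration of the scheme filtering discussed above, a
minimal Python sketch might look like the following. The allow-list and
the function names are assumptions made for this example; this is not
PyPI's actual code.

```python
from urllib.parse import urlsplit

# Assumption: the schemes an index would be willing to publish.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def is_publishable_link(url, allowed=ALLOWED_SCHEMES):
    """Return True if the URL's scheme is in the allow-list.

    Relative URLs have an empty scheme and are rejected here; a real
    filter would first resolve them against the page URL.
    """
    scheme = urlsplit(url.strip()).scheme.lower()
    return scheme in allowed


def filter_links(urls):
    """Keep only the links whose scheme is allowed."""
    return [u for u in urls if is_publishable_link(u)]
```

With this allow-list, both file:// and javascript: links are dropped
from the published page, while a tool that wants file:// support can
still accept such links from other sources.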

From regebro at gmail.com  Fri Mar  1 23:50:10 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 1 Mar 2013 23:50:10 +0100
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <51310210.5050203@egenix.com>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
	<71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
	<20130301111707.GI9677@merlinux.eu> <51310210.5050203@egenix.com>
Message-ID: <CAL0kPAW2t31UK1nvRvESkZyn7H=xG_0JnsRn4QRH4oyri1hLqw@mail.gmail.com>

On Fri, Mar 1, 2013 at 8:31 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> Hmm, then why not remove links that don't match the above from
> the /simple/ index pages ?

I think we can do that, but if we *start* with that, we will just
suddenly, with no warning, break everything.
It's better if the installation tools can first warn, then remove
their support for this, and *then* we remove these links from
/simple/.

That way we break things gradually, with warnings so that package
managers can react and adapt.

From mal at egenix.com  Fri Mar  1 23:54:41 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 01 Mar 2013 23:54:41 +0100
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <CAL0kPAW2t31UK1nvRvESkZyn7H=xG_0JnsRn4QRH4oyri1hLqw@mail.gmail.com>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
	<71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
	<20130301111707.GI9677@merlinux.eu> <51310210.5050203@egenix.com>
	<CAL0kPAW2t31UK1nvRvESkZyn7H=xG_0JnsRn4QRH4oyri1hLqw@mail.gmail.com>
Message-ID: <513131B1.7090507@egenix.com>

On 01.03.2013 23:50, Lennart Regebro wrote:
> On Fri, Mar 1, 2013 at 8:31 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> Hmm, then why not remove links that don't match the above from
>> the /simple/ index pages ?
> 
> I think we can do that, but if we *start* with that, we will just
> suddenly, with no warning, break everything.
> It's better if the installation tools can first warn, then remove
> their support for this, and *then* we remove these links from
> /simple/.
> 
> That way we break things gradually, with warnings so that package
> managers can react and adapt.

As I understood Holger and Phillip, those links are not used by
the existing package managers. If there are no users, then nothing
should break, right ?

Of course, breaking things is a bad idea and I don't want to
push for that (migration is much better), I just wondered whether
this would be a low hanging fruit to clean up the /simple/ index
pages a bit.

Is there a tool that scans those non-distribution file
links from the package descriptions ?

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 01 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From holger at merlinux.eu  Sat Mar  2 00:02:20 2013
From: holger at merlinux.eu (holger krekel)
Date: Fri, 1 Mar 2013 23:02:20 +0000
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <CAL0kPAW2t31UK1nvRvESkZyn7H=xG_0JnsRn4QRH4oyri1hLqw@mail.gmail.com>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
	<71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
	<20130301111707.GI9677@merlinux.eu> <51310210.5050203@egenix.com>
	<CAL0kPAW2t31UK1nvRvESkZyn7H=xG_0JnsRn4QRH4oyri1hLqw@mail.gmail.com>
Message-ID: <20130301230220.GM9677@merlinux.eu>

On Fri, Mar 01, 2013 at 23:50 +0100, Lennart Regebro wrote:
> On Fri, Mar 1, 2013 at 8:31 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> > Hmm, then why not remove links that don't match the above from
> > the /simple/ index pages ?
> 
> I think we can do that, but if we *start* with that, we will just
> suddenly, with no warning, break everything.
> It's better if the installation tools can first warn, then remove
> their support for this, and *then* we remove these links from
> /simple/.

I think Marc-Andre was just referring to the superfluous links
from the long-description, namely all links which don't match
the #egg format and don't have a rel of download/homepage.

Phillip clarified that pypi served all long-description links at the
time to leave it to the tools to interpret them.  The interpretation is
now pretty clear and so pypi doesn't need to provide them.  It shouldn't
break either pip or easy_install to remove those unused long-description
links.
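
To make the distinction concrete, here is a minimal sketch of how the
links on an index page could be split into "used" and "unused". The
list of distribution file suffixes and all names are assumptions for
this example; the real rules live in pip and easy_install and are more
involved.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumption: suffixes treated as distribution files in this sketch.
DIST_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")
SPECIAL_RELS = {"download", "homepage"}


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from the anchors of an index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href:
            self.links.append((href, attrs.get("rel") or ""))


def is_used(href, rel):
    """A link counts as used if a tool would plausibly act on it."""
    parts = urlsplit(href)
    if rel in SPECIAL_RELS:
        return True  # scraped as a home page or download page
    if parts.fragment.startswith("egg="):
        return True  # explicit #egg=name-version marker
    return parts.path.lower().endswith(DIST_SUFFIXES)


def split_links(html):
    """Return (used, unused) lists of hrefs found in the page."""
    collector = LinkCollector()
    collector.feed(html)
    used = [h for h, r in collector.links if is_used(h, r)]
    unused = [h for h, r in collector.links if not is_used(h, r)]
    return used, unused
```

Only the second list would be a candidate for removal from /simple/.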

> That way we break things gradually, with warnings so that package
> managers can react and adapt.

I generally agree to this strategy but would add that we should
also consider the life of system admins or other package installers
who may not be able to get maintainers to make new releases.
For me this mainly means to aim for changing defaults in pip and
easy_install but not to remove crawling abilities completely for
the time being.

best,
holger


From pje at telecommunity.com  Sat Mar  2 06:08:47 2013
From: pje at telecommunity.com (PJ Eby)
Date: Sat, 2 Mar 2013 00:08:47 -0500
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <20130301230220.GM9677@merlinux.eu>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
	<71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
	<20130301111707.GI9677@merlinux.eu> <51310210.5050203@egenix.com>
	<CAL0kPAW2t31UK1nvRvESkZyn7H=xG_0JnsRn4QRH4oyri1hLqw@mail.gmail.com>
	<20130301230220.GM9677@merlinux.eu>
Message-ID: <CALeMXf4VO4fEmrbbwtGpM4ip7GFNbNOgOtVOyR5m5Ly6PEdrNQ@mail.gmail.com>

On Fri, Mar 1, 2013 at 6:02 PM, holger krekel <holger at merlinux.eu> wrote:
> On Fri, Mar 01, 2013 at 23:50 +0100, Lennart Regebro wrote:
>> On Fri, Mar 1, 2013 at 8:31 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> > Hmm, then why not remove links that don't match the above from
>> > the /simple/ index pages ?
>>
>> I think we can do that, but if we *start* with that, we will just
>> suddenly, with no warning, break everything.
>> It's better if the installation tools can first warn, then remove
>> their support for this, and *then* we remove these links from
>> /simple/.
>
> I think Marc-Andre was just referring to the superfluous links
> from the long-description, namely all links which don't match
> the #egg format and don't have a rel of download/homepage.
>
> Phillip clarified that pypi served all long-description links at the
> time to leave it to the tools to interpret them.  The interpretation is
> now pretty clear and so pypi doesn't need to provide them.  It shouldn't
> break either pip or easy_install to remove those unused long-description
> links.

Provided, of course, that PyPI follows the *exact same* interpretation
of what is and isn't an unused link.  Since unused links do no harm,
there is correspondingly no benefit to writing code to remove them,
code that might itself introduce bugs.

To be clear, what I have proposed is simply removing the rel=""
attributes from the special links on hidden releases.  This will
prevent scraping of outdated home pages or download pages, but tools
will still be able to use a download or home page link that points to
an actual downloadable file or source checkout.

What would also be useful to have before that time would be a tool to
let people either update their description links with direct external
links, or optionally upload the contents of those links instead...
preferably offered via a couple of buttons in PyPI's UI, as well as a
standalone tool or setup.py command to initiate the process remotely
or as part of a release process.  (Preferably, these tools would be
offered to authors *before* the date when the rel="" attributes would
be pulled from PyPI, of course.)

(In principle, we could make it even easier by just automatically
scraping the links and adding them to the descriptions (or some new
PyPI field for "external download links") of such releases, but I
think some kind of affirmative consent is probably in order, just to
avoid ruffling any feathers.)

Anyway, if the direct external links carry #md5 hashes, they'll be
slightly more secure and the "expired domain supplying fake links"
issue won't apply.
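
To illustrate why a link carrying an #md5 fragment helps here: the
digest is published on PyPI itself, so a file served later from a
hijacked or expired domain would no longer match it. A minimal sketch
of such a check follows; the function names are hypothetical and this
is not the code that pip or easy_install actually use.

```python
import hashlib
from urllib.parse import urlsplit


def expected_md5(url):
    """Return the hex digest from a '#md5=<hex>' fragment, or None."""
    fragment = urlsplit(url).fragment
    if fragment.startswith("md5="):
        return fragment[len("md5="):].lower()
    return None


def matches_md5(url, payload):
    """Check downloaded bytes against the digest carried in the link."""
    expected = expected_md5(url)
    if expected is None:
        return False  # no digest in the link, so nothing to verify
    return hashlib.md5(payload).hexdigest() == expected
```

A link to a page of download links cannot be checked this way, which is
one reason direct links are preferable.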

The final step in the process would be to drop the rel="" attributes
from *all* releases, not just hidden ones.  At that point, it wouldn't
be possible to download from an external site unless the author has
provided a direct download link, rather than a link to a page
containing download links.

We could then look at uptake on the use of the pull-uploader, and
feedback from package authors, to see whether dropping the remaining
external links and serving everything from PyPI is a viable option.

From donald.stufft at gmail.com  Tue Mar  5 10:01:20 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Tue, 5 Mar 2013 04:01:20 -0500
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <B0852E0EADD8426CA197E0952A46CD1B@gmail.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org>
	<B0852E0EADD8426CA197E0952A46CD1B@gmail.com>
Message-ID: <774ED93EA7CF45BFB894B8BC47DB7F8B@gmail.com>

On Thursday, February 28, 2013 at 8:35 AM, Donald Stufft wrote:
> > 
> > 
> > 
> 
> https://crate.io/externally-hosted/ A list of things that have no files hosted on
> PyPI but have a release. This doesn't include projects that upload sometimes
> but not every time (for argparse, for example, the latest releases have not been
> uploaded to PyPI).

Sorted out a better way of seeing what would be affected by this change.

Here is a list of all versions that are currently installable via pip that
are not hosted on PyPI (and thus would be uninstallable if all external
links would be removed). This filters out projects that never existed
or are no longer installable due to issues with the external hosting.

I've also included the script I used to generate it.

https://gist.github.com/dstufft/5088915

From donald.stufft at gmail.com  Tue Mar  5 10:10:08 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Tue, 5 Mar 2013 04:10:08 -0500
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <774ED93EA7CF45BFB894B8BC47DB7F8B@gmail.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org>
	<B0852E0EADD8426CA197E0952A46CD1B@gmail.com>
	<774ED93EA7CF45BFB894B8BC47DB7F8B@gmail.com>
Message-ID: <7CA6F385DB0B49D598903CF34287F015@gmail.com>

On Tuesday, March 5, 2013 at 4:01 AM, Donald Stufft wrote:
> On Thursday, February 28, 2013 at 8:35 AM, Donald Stufft wrote:
> > > 
> > 
> > https://crate.io/externally-hosted/ A list of things that have no files hosted on
> > PyPI but have a release. This doesn't include projects that upload sometimes
> > but not every time (for argparse, for example, the latest releases have not been
> > uploaded to PyPI).
> > 
> 
> Sorted out a better way of seeing what would be affected by this change.
> 
> Here is a list of all versions that are currently installable via pip that
> are not hosted on PyPI (and thus would be uninstallable if all external
> links would be removed). This filters out projects that never existed
> or are no longer installable due to issues with the external hosting.
> 
> I've also included the script I used to generate it.
> 
> https://gist.github.com/dstufft/5088915 
Here are some numbers fetched from that data.

928 projects w/ 2750 total versions have versions not installable
directly from PyPI.

721 projects w/ 2543 total versions have versions not installable
directly from PyPI if we don't consider the `dev` version.

This change would affect 2-3% of the projects on PyPI, and
just from scanning down the list, some of these
appear to be merely a forgotten upload rather than a conscious
choice not to host their packages on PyPI (for example, Django
has only 1 version not installable directly from PyPI).

From donald.stufft at gmail.com  Tue Mar  5 10:19:49 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Tue, 5 Mar 2013 04:19:49 -0500
Subject: [Catalog-sig] Fw:  Deprecate External Links
In-Reply-To: <CD94D8AF6F21475EB3BB9495FB01FCD3@gmail.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org>
	<B0852E0EADD8426CA197E0952A46CD1B@gmail.com>
	<774ED93EA7CF45BFB894B8BC47DB7F8B@gmail.com>
	<5135B6E7.6010301@egenix.com>
	<CD94D8AF6F21475EB3BB9495FB01FCD3@gmail.com>
Message-ID: <A7F1F5B293E944EFA217251861EB64C6@gmail.com>

Forwarding this since I assume it was accidentally sent to only me,
and it's important to note that there is some sort of miscounting bug
going on.


Forwarded message:

> From: Donald Stufft <donald.stufft at gmail.com>
> To: M.-A. Lemburg <mal at egenix.com>
> Date: Tuesday, March 5, 2013 4:16:53 AM
> Subject: Re: [Catalog-sig] Deprecate External Links
> 
> On Tuesday, March 5, 2013 at 4:12 AM, M.-A. Lemburg wrote:
> > Perhaps I'm misunderstanding, but if the list contains packages that:
> > 
> > * are installable via pip
> > 
> > * are not hosted on PyPI
> > 
> > then why isn't e.g. egenix-mx-base included in that list ?
> Unsure, must be a bug in the script. I saw some BadStatusLine errors
> during the processing but I just assumed they were issues with the server
> pip was trying to fetch from. I'll see if I can't sort out the reason why
> egenix-mx-base doesn't show up.
> 


From chris at simplistix.co.uk  Tue Mar  5 10:51:41 2013
From: chris at simplistix.co.uk (Chris Withers)
Date: Tue, 05 Mar 2013 09:51:41 +0000
Subject: [Catalog-sig] revoked certificate error on chrome from PyPI?
Message-ID: <5135C02D.3080808@simplistix.co.uk>

Hi All,

When I go to PyPI on an older Chrome, I get a certificate revoked error 
and can't view the site.

What's going on here?

Works fine in newer chromes, but interested to know why older chrome 
sees a revoked cert...

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
             - http://www.simplistix.co.uk

From holger at merlinux.eu  Tue Mar  5 11:07:34 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 5 Mar 2013 10:07:34 +0000
Subject: [Catalog-sig] Fw:  Deprecate External Links
In-Reply-To: <A7F1F5B293E944EFA217251861EB64C6@gmail.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com>
	<512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu>
	<kgnk55$oek$1@ger.gmane.org>
	<B0852E0EADD8426CA197E0952A46CD1B@gmail.com>
	<774ED93EA7CF45BFB894B8BC47DB7F8B@gmail.com>
	<5135B6E7.6010301@egenix.com>
	<CD94D8AF6F21475EB3BB9495FB01FCD3@gmail.com>
	<A7F1F5B293E944EFA217251861EB64C6@gmail.com>
Message-ID: <20130305100734.GZ9677@merlinux.eu>

On Tue, Mar 05, 2013 at 04:19 -0500, Donald Stufft wrote:
> Forwarding this since I assume it was accidentally sent to only me,
> and it's important to note that there is some sort of miscounting bug
> going on.
> 
> 
> Forwarded message:
> 
> > From: Donald Stufft <donald.stufft at gmail.com>
> > To: M.-A. Lemburg <mal at egenix.com>
> > Date: Tuesday, March 5, 2013 4:16:53 AM
> > Subject: Re: [Catalog-sig] Deprecate External Links
> > 
> > On Tuesday, March 5, 2013 at 4:12 AM, M.-A. Lemburg wrote:
> > > Perhaps I'm misunderstanding, but if the list contains packages that:
> > > 
> > > * are installable via pip
> > > 
> > > * are not hosted on PyPI
> > > 
> > > then why isn't e.g. egenix-mx-base included in that list ?
> > Unsure, must be a bug in the script. I saw some BadStatusLine errors
> > during the processing but I just assumed they were issues with the server
> > pip was trying to fetch from. I'll see if I can't sort out the reason why
> > egenix-mx-base doesn't show up.

FYI "lockfile" is also not in your list, and it only had lockfile-0.2 at
PyPI; the rest up to 0.9.1 is all at code.google (latest is
lockfile-0.9.1.tar.gz).

best,
holger



From donald.stufft at gmail.com  Tue Mar  5 11:18:29 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Tue, 5 Mar 2013 05:18:29 -0500
Subject: [Catalog-sig] revoked certificate error on chrome from PyPI?
In-Reply-To: <5135C02D.3080808@simplistix.co.uk>
References: <5135C02D.3080808@simplistix.co.uk>
Message-ID: <1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com>

On Tuesday, March 5, 2013 at 4:51 AM, Chris Withers wrote:
> Hi All,
> 
> When I go to PyPI on an older Chrome, I get a certificate revoked error 
> and can't view the site.
> 
> What's going on here?
> 
> Works fine in newer chromes, but interested to know why older chrome 
> sees a revoked cert...
> 
> Chris
What version of Chrome? v25 sees http://d.stufft.io/image/1J3W01473s42 
> 
> -- 
> Simplistix - Content Management, Batch Processing & Python Consulting
> - http://www.simplistix.co.uk

From chris at simplistix.co.uk  Tue Mar  5 11:19:46 2013
From: chris at simplistix.co.uk (Chris Withers)
Date: Tue, 05 Mar 2013 10:19:46 +0000
Subject: [Catalog-sig] revoked certificate error on chrome from PyPI?
In-Reply-To: <1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com>
References: <5135C02D.3080808@simplistix.co.uk>
	<1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com>
Message-ID: <5135C6C2.7060907@simplistix.co.uk>

On 05/03/2013 10:18, Donald Stufft wrote:
> On Tuesday, March 5, 2013 at 4:51 AM, Chris Withers wrote:
>> When I go to PyPI on an older Chrome, I get a certificate revoked error
>> and can't view the site.
>>
> What version of Chrome? v25 sees http://d.stufft.io/image/1J3W01473s42

12.0.742.112.

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
             - http://www.simplistix.co.uk

From rasky at develer.com  Tue Mar  5 12:09:23 2013
From: rasky at develer.com (Giovanni Bajo)
Date: Tue, 5 Mar 2013 12:09:23 +0100
Subject: [Catalog-sig] revoked certificate error on chrome from PyPI?
In-Reply-To: <5135C6C2.7060907@simplistix.co.uk>
References: <5135C02D.3080808@simplistix.co.uk>
	<1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com>
	<5135C6C2.7060907@simplistix.co.uk>
Message-ID: <30D06076-2343-4172-B438-2831F137CC6E@develer.com>

Il giorno 05/mar/2013, alle ore 11:19, Chris Withers <chris at simplistix.co.uk> ha scritto:

> On 05/03/2013 10:18, Donald Stufft wrote:
>> On Tuesday, March 5, 2013 at 4:51 AM, Chris Withers wrote:
>>> When I go to PyPI on an older Chrome, I get a certificate revoked error
>>> and can't view the site.
>>> 
>> What version of Chrome? v25 sees http://d.stufft.io/image/1J3W01473s42
> 
> 12.0.742.112.


Do you see any specific error message? Can you attach a screenshot?
-- 
Giovanni Bajo   ::  rasky at develer.com
Develer S.r.l.  ::  http://www.develer.com

My Blog: http://giovanni.bajo.it







From chris at simplistix.co.uk  Tue Mar  5 12:10:18 2013
From: chris at simplistix.co.uk (Chris Withers)
Date: Tue, 05 Mar 2013 11:10:18 +0000
Subject: [Catalog-sig] revoked certificate error on chrome from PyPI?
In-Reply-To: <30D06076-2343-4172-B438-2831F137CC6E@develer.com>
References: <5135C02D.3080808@simplistix.co.uk>
	<1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com>
	<5135C6C2.7060907@simplistix.co.uk>
	<30D06076-2343-4172-B438-2831F137CC6E@develer.com>
Message-ID: <5135D29A.5050103@simplistix.co.uk>

On 05/03/2013 11:09, Giovanni Bajo wrote:
> Il giorno 05/mar/2013, alle ore 11:19, Chris Withers<chris at simplistix.co.uk>  ha scritto:
>
>> On 05/03/2013 10:18, Donald Stufft wrote:
>>> On Tuesday, March 5, 2013 at 4:51 AM, Chris Withers wrote:
>>>> When I go to PyPI on an older Chrome, I get a certificate revoked error
>>>> and can't view the site.
>>>>
>>> What version of Chrome? v25 sees http://d.stufft.io/image/1J3W01473s42
>>
>> 12.0.742.112.
>
> Do you see any specific error message? Can you attach a screenshot?

It's the standard "this certificate has been revoked" page from Chrome.

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
             - http://www.simplistix.co.uk

From mal at egenix.com  Tue Mar  5 12:28:29 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 05 Mar 2013 12:28:29 +0100
Subject: [Catalog-sig] revoked certificate error on chrome from PyPI?
In-Reply-To: <5135D29A.5050103@simplistix.co.uk>
References: <5135C02D.3080808@simplistix.co.uk>
	<1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com>
	<5135C6C2.7060907@simplistix.co.uk>
	<30D06076-2343-4172-B438-2831F137CC6E@develer.com>
	<5135D29A.5050103@simplistix.co.uk>
Message-ID: <5135D6DD.5030404@egenix.com>

On 05.03.2013 12:10, Chris Withers wrote:
> On 05/03/2013 11:09, Giovanni Bajo wrote:
>> Il giorno 05/mar/2013, alle ore 11:19, Chris Withers<chris at simplistix.co.uk>  ha scritto:
>>
>>> On 05/03/2013 10:18, Donald Stufft wrote:
>>>> On Tuesday, March 5, 2013 at 4:51 AM, Chris Withers wrote:
>>>>> When I go to PyPI on an older Chrome, I get a certificate revoked error
>>>>> and can't view the site.
>>>>>
>>>> What version of Chrome? v25 sees http://d.stufft.io/image/1J3W01473s42
>>>
>>> 12.0.742.112.
>>
>> Do you see any specific error message? Can you attach a screenshot?
> 
> It's the standard "this certificate has been revoked" page from Chrome.

Hmm...

wget http://crl.startssl.com/crt2-crl.crl
openssl crl -inform DER -in crt2-crl.crl -text | fgrep 013A4D

doesn't return anything (013A4D is the PyPI cert serial).

A bug in Chrome ?
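
The same check can be made a little stricter than a substring match,
since fgrep would also hit a longer serial that merely contains 013A4D.
A minimal sketch, assuming the text output of the openssl command above
has been captured as a string and that revoked entries appear on lines
of the form "Serial Number: <hex>"; the function names are made up for
illustration.

```python
import re

# Assumption: each revoked entry is printed as "Serial Number: <hex>".
SERIAL_RE = re.compile(r"Serial Number:\s*([0-9A-Fa-f]+)")


def revoked_serials(crl_text):
    """Return the revoked serial numbers found in the text, as integers."""
    return {int(match, 16) for match in SERIAL_RE.findall(crl_text)}


def is_revoked(crl_text, serial_hex):
    """Exact comparison, so case and leading zeros do not matter."""
    return int(serial_hex, 16) in revoked_serials(crl_text)
```

For the CRL above one would call is_revoked(text, "013A4D"), where text
is the captured openssl output.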

-- 
Marc-Andre Lemburg
eGenix.com


From chris at simplistix.co.uk  Tue Mar  5 12:31:23 2013
From: chris at simplistix.co.uk (Chris Withers)
Date: Tue, 05 Mar 2013 11:31:23 +0000
Subject: [Catalog-sig] revoked certificate error on chrome from PyPI?
In-Reply-To: <5135D6DD.5030404@egenix.com>
References: <5135C02D.3080808@simplistix.co.uk>
	<1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com>
	<5135C6C2.7060907@simplistix.co.uk>
	<30D06076-2343-4172-B438-2831F137CC6E@develer.com>
	<5135D29A.5050103@simplistix.co.uk> <5135D6DD.5030404@egenix.com>
Message-ID: <5135D78B.9000000@simplistix.co.uk>

On 05/03/2013 11:28, M.-A. Lemburg wrote:
> wget http://crl.startssl.com/crt2-crl.crl
> openssl crl -inform DER -in crt2-crl.crl -text | fgrep 013A4D
>
> doesn't return anything (013A4D is the PyPI cert serial).
>
> A bug in Chrome ?

Might be a bug in my head...

My machine's clock is currently deliberately set to 7 hrs in the past, 
while debugging some weird time-of-day test failures that CI has thrown up...

How would that cause the cert to appear revoked?

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
             - http://www.simplistix.co.uk

From rasky at develer.com  Tue Mar  5 12:37:51 2013
From: rasky at develer.com (Giovanni Bajo)
Date: Tue, 5 Mar 2013 12:37:51 +0100
Subject: [Catalog-sig] revoked certificate error on chrome from PyPI?
In-Reply-To: <5135D78B.9000000@simplistix.co.uk>
References: <5135C02D.3080808@simplistix.co.uk>
	<1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com>
	<5135C6C2.7060907@simplistix.co.uk>
	<30D06076-2343-4172-B438-2831F137CC6E@develer.com>
	<5135D29A.5050103@simplistix.co.uk> <5135D6DD.5030404@egenix.com>
	<5135D78B.9000000@simplistix.co.uk>
Message-ID: <ECC316A8-6FBB-4645-BD6F-43CD767CE890@develer.com>

Il giorno 05/mar/2013, alle ore 12:31, Chris Withers <chris at simplistix.co.uk> ha scritto:

> On 05/03/2013 11:28, M.-A. Lemburg wrote:
>> wget http://crl.startssl.com/crt2-crl.crl
>> openssl crl -inform DER -in crt2-crl.crl -text | fgrep 013A4D
>> 
>> doesn't return anything (013A4D is the PyPI cert serial).
>> 
>> A bug in Chrome ?
> 
> Might be a bug in my head...
> 
> My machine's clock is currently deliberately set to 7 hrs in the past, while debugging some weird time-of-day test failures that CI has thrown up...
> 
> How would that cause the cert to appear revoked?


It might confuse the CRL code within Chrome 12 due to a bug. I don't think we should worry much.
-- 
Giovanni Bajo   ::  rasky at develer.com
Develer S.r.l.  ::  http://www.develer.com

My Blog: http://giovanni.bajo.it





From chris at simplistix.co.uk  Tue Mar  5 12:41:09 2013
From: chris at simplistix.co.uk (Chris Withers)
Date: Tue, 05 Mar 2013 11:41:09 +0000
Subject: [Catalog-sig] revoked certificate error on chrome from PyPI?
In-Reply-To: <ECC316A8-6FBB-4645-BD6F-43CD767CE890@develer.com>
References: <5135C02D.3080808@simplistix.co.uk>
	<1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com>
	<5135C6C2.7060907@simplistix.co.uk>
	<30D06076-2343-4172-B438-2831F137CC6E@develer.com>
	<5135D29A.5050103@simplistix.co.uk> <5135D6DD.5030404@egenix.com>
	<5135D78B.9000000@simplistix.co.uk>
	<ECC316A8-6FBB-4645-BD6F-43CD767CE890@develer.com>
Message-ID: <5135D9D5.909@simplistix.co.uk>

On 05/03/2013 11:37, Giovanni Bajo wrote:
>> Might be a bug in my head...
>>
>> My machine's clock is currently deliberately set to 7 hrs in the past, while debugging some weird time-of-day test failures that CI has thrown up...
>>
>> How would that cause the cert to appear revoked?
>
> it might confuse the CRL code within Chrome 12 due to a bug. I don't think we should worry much.

Indeed, sorry for the noise. If I still see anything when I'm finished 
messing with my system time, I'll let you know...

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
             - http://www.simplistix.co.uk

From ct at gocept.com  Tue Mar  5 16:34:37 2013
From: ct at gocept.com (Christian Theune)
Date: Tue, 5 Mar 2013 16:34:37 +0100
Subject: [Catalog-sig] Inconsistency on f.pypi.python.org with
	Products.PluggableAuthService
Message-ID: <kh53a9$i3$1@ger.gmane.org>

Hi,

it seems my fight to keep f.pypi.python.org is at least keeping the 
pypi-mirrors.org page happy.

Unfortunately one of our users detected another inconsistency that the 
mirror script doesn't find or clean up by itself. I also don't know how 
to get this back in line.

If you compare those pages:

http://f.pypi.python.org/packages/source/P/Products.PluggableAuthService/
http://f.pypi.python.org/simple/Products.PluggableAuthService
http://pypi.python.org/simple/Products.PluggableAuthService

There's definitely something wrong.

Suggestions?

Christian

From donald.stufft at gmail.com  Tue Mar  5 17:08:44 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Tue, 5 Mar 2013 11:08:44 -0500
Subject: [Catalog-sig] Inconsistency on f.pypi.python.org with
 Products.PluggableAuthService
In-Reply-To: <kh53a9$i3$1@ger.gmane.org>
References: <kh53a9$i3$1@ger.gmane.org>
Message-ID: <E279882C05644A2F9DDB690C2FC20873@gmail.com>

On Tuesday, March 5, 2013 at 10:34 AM, Christian Theune wrote:
> Hi,
> 
> it seems my fight to keep f.pypi.python.org is at least keeping the pypi-mirrors.org page happy.
> 
> Unfortunately one of our users detected another inconsistency that the mirror script doesn't find or clean up by itself. I also don't know how to get this back in line.
> 
> If you compare those pages:
> 
> http://f.pypi.python.org/packages/source/P/Products.PluggableAuthService/
> http://f.pypi.python.org/simple/Products.PluggableAuthService
> http://pypi.python.org/simple/Products.PluggableAuthService
> 
> There's definitely something wrong. 
> 
> Suggestions?
Looks like when something gets deleted, the files don't get cleaned up
properly; look at:

    http://a.pypi.python.org/packages/source/P/Products.PluggableAuthService/ 
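
One way to spot such leftovers is to compare the file names linked from
a /simple/ page with the names in the matching /packages/ directory
listing. A minimal sketch, assuming both pages have already been
fetched as HTML strings and that file links on the /simple/ page
contain "/packages/" in their path (an assumption for this example);
all function names are hypothetical.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collect the href of every anchor on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def linked_filenames(html, must_contain=""):
    """Return the set of file names linked from the page.

    Links without a file name (parent directory, sort links, bare
    home pages) are skipped.
    """
    collector = HrefCollector()
    collector.feed(html)
    names = set()
    for href in collector.hrefs:
        path = urlsplit(href).path
        name = basename(path)
        if name and must_contain in path:
            names.add(name)
    return names


def compare(simple_html, directory_html):
    """Report files left on disk but not indexed, and the reverse."""
    indexed = linked_filenames(simple_html, must_contain="/packages/")
    on_disk = linked_filenames(directory_html)
    return {
        "stale": sorted(on_disk - indexed),
        "missing": sorted(indexed - on_disk),
    }
```

Anything reported as "stale" is a file still present in the directory
that the index no longer references.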
> 
> Christian 

From donald at stufft.io  Fri Mar  8 02:40:20 2013
From: donald at stufft.io (Donald Stufft)
Date: Thu, 7 Mar 2013 20:40:20 -0500
Subject: [Catalog-sig] Deprecation of External Urls, Statistics
Message-ID: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>

So I updated my script (had to remove eventlet) and I believe it's now accurate. The total run time was ~54 hours, so this is hardly scientific, but it should give a good idea of what sort of impact we are talking about.

This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI.

The results and script are available at: https://gist.github.com/dstufft/5088915

Some statistics:

    Projects affected (with dev): 2269
    Versions affected (with dev): 8006

    Projects affected (without dev): 1880
    Versions affected (without dev): 7586

These numbers assume all external URLs were immediately removed from PyPI, so they represent the total affected. This does not test whether the actual package is installable, just whether pip is able to locate a URL that it thinks represents a version for that project.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 841 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://mail.python.org/pipermail/catalog-sig/attachments/20130307/62ee272c/attachment.pgp>

From mal at egenix.com  Fri Mar  8 12:49:51 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 08 Mar 2013 12:49:51 +0100
Subject: [Catalog-sig] Deprecation of External Urls, Statistics
In-Reply-To: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
Message-ID: <5139D05F.6030404@egenix.com>

On 08.03.2013 02:40, Donald Stufft wrote:
> So I updated my script (had to remove eventlet) and I believe it's now accurate. The total time was ~54 hours so this is hardly scientific but it should give a good idea what sort of impact we are talking about.
> 
> This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI.
> 
> The results and script is available at: https://gist.github.com/dstufft/5088915
> 
> Some statistics:
> 
>     Projects affected (with dev): 2269
>     Versions affected (with dev): 8006
> 
>     Projects affected (without dev): 1880
>     Versions affected (without dev): 7586
> 
> These numbers are if all external urls were immediately removed from PyPI, so this would be the total affected. This does not test if the actual package is installable, just if pip is able to locate an url that it thinks represents a version for that project.

Thanks for running the test.

About 10% of all packages. The numbers are already impressive,
but if you factor in the popularity of some of those
packages, the situation becomes worse.

I'm beginning to wonder whether caching the external link content
on the PyPI CDN wouldn't be a better idea.

We'd have to make that legally waterproof and also have an opt-out
mechanism, but it would get us from here to there a lot faster.

Together with the added hash tag on the download file URLs (*),
this would solve the availability and the security aspects.
Instead of deprecating external links altogether, we could then
deprecate non-compliant download links and get an overall
very flexible system for Python package distribution.

(*) Yes, I know, I still have to deliver the updated proposal -
been working on getting our indexes ready to serve as example :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 07 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From christian at python.org  Fri Mar  8 13:15:23 2013
From: christian at python.org (Christian Heimes)
Date: Fri, 08 Mar 2013 13:15:23 +0100
Subject: [Catalog-sig] hash tags (was: Deprecation of External Urls,
	Statistics)
In-Reply-To: <5139D05F.6030404@egenix.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com>
Message-ID: <5139D65B.3070907@python.org>

Am 08.03.2013 12:49, schrieb M.-A. Lemburg:
> Together with the added hash tag on the download file URLs (*),
> this would solve the availability and the security aspects.
> Instead of deprecating external links altogether, we could then
> deprecate non-compliant download links and get an overall
> very flexible system for Python package distribution.
> 
> (*) Yes, I know, I still have to deliver the updated proposal -
> been working on getting our indexes ready to serve as example :-)

What does your proposal look like? I'd like to propose query-string-like
key/value pairs. Key/value pairs are more flexible and allow us to
add/remove information in the future.

I also propose that we add the file size in octets (bytes with 8 bits in
each byte) to the fragment identifier. File size validation guards against
e.g. length extension attacks. It is also useful to download tools. I know
that HTTP servers usually set a Content-Length header for static files,
but that header is set by the CDN, while the information in the fragment
identifier shall come from PyPI's internal database.

Example:

defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324
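
A minimal sketch of how a download tool might read such a fragment,
assuming the query-string-like key/value syntax proposed above. The
function name parse_fragment is hypothetical and not part of any
existing installer; it only uses the standard library URL parser.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_fragment(url):
    """Return the key/value pairs found after '#' in a download link."""
    fragment = urlsplit(url).fragment
    # parse_qsl splits on '&' and '=' just like it does for query strings.
    return dict(parse_qsl(fragment))


link = (
    "defusedxml-0.4.tar.gz"
    "#md5=09873c31ce773d48b8a4759571655a2c"
    "&sha1=33821e6891e3fc3829f5a238a93490f939533d62"
    "&octets=48324"
)
info = parse_fragment(link)
expected_size = int(info["octets"])  # compare against the downloaded length
```

Unknown keys simply end up in the dictionary, so a tool could ignore
the ones it does not understand.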

Christian

From mal at egenix.com  Fri Mar  8 13:50:33 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 08 Mar 2013 13:50:33 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <5139D65B.3070907@python.org>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
Message-ID: <5139DE99.9020005@egenix.com>

On 08.03.2013 13:15, Christian Heimes wrote:
> Am 08.03.2013 12:49, schrieb M.-A. Lemburg:
>> Together with the added hash tag on the download file URLs (*),
>> this would solve the availability and the security aspects.
>> Instead of deprecating external links altogether, we could then
>> deprecate non-compliant download links and get an overall
>> very flexible system for Python package distribution.
>>
>> (*) Yes, I know, I still have to deliver the updated proposal -
>> been working on getting our indexes ready to serve as example :-)
> 
> How does your proposal look like? 

Here's the first version with the basic idea:

http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal

After the feedback I got from Holger and Phillip, I'm currently
writing a new version, which drops some of the unneeded
requirements and spells out a few more things.

Here's a very short version...

Installers are modified:

* to only follow rel="download" links from the /simple/ index page,
  which have a hash tag (e.g. #md5=...)
* will only use the fetched download page if its contents match
  the hash tag
* scan that page for rel="download" links, which again have to
  have a hash tag to be taken into account
* only install files for which the hash tag matches the
  downloaded content

This should provide a good way to make sure that the downloaded
files are indeed under control of the package maintainer.
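
The check in the steps above can be sketched roughly as follows. This
is only an illustration under the assumption that the hash tag is an
"#md5=..." fragment; the function name is made up for this example and
no installer is claimed to work exactly this way.

```python
import hashlib
from urllib.parse import parse_qsl, urlsplit


def content_matches_hash_tag(url, content):
    """Return True only if url carries an md5 hash tag matching content.

    url is the link as found on the index or download page, content is
    the bytes fetched from it. Links without a hash tag are rejected.
    """
    tags = dict(parse_qsl(urlsplit(url).fragment))
    expected = tags.get("md5")
    if expected is None:
        return False  # no hash tag: the link is not taken into account
    return hashlib.md5(content).hexdigest() == expected.lower()
```

The same function would be applied twice: once to the fetched download
page and once to each file linked from it.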

So far the only practical problem I've found with the approach
is that the download page may not contain dynamic data, e.g.
a date or timestamp, since that causes the hash tag not to
verify.

The package maintainer will also have to reregister the
package whenever changes to the download page are made -
but that's actually intended :-)

> I like to propose query string-like
> key/value pairs. key/value pairs are more flexible and allow us to
> add/remove new information in the future.

Good idea. I'll add that as extension mechanism.

> I also propose that we add the file size in octets (bytes with 8bits in
> each byte) to the fragment identifier. File size validation prohibits
> e.g. length extension attacks. It is useful to download tools. I know
> that HTTP servers usually set a Content-Length header for static files.
> But the header is set by the CDN while the information in the fragment
> identifier shall come from PyPI's internal database.
> 
> Example:
> 
> defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324

Minor nit: s/octets/size

We could probably even add GPG sigs to the link.

The only problem with the extension mechanism is that the currently
available installers only support "#md5=...".

Perhaps there's some way to trick them into still working with
the query-style fragment links ?!

-- 
Marc-Andre Lemburg
eGenix.com


From jnoller at gmail.com  Fri Mar  8 14:07:51 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 8 Mar 2013 08:07:51 -0500
Subject: [Catalog-sig] Deprecation of External Urls, Statistics
In-Reply-To: <5139D05F.6030404@egenix.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com>
Message-ID: <396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com>

As long as external URLs are eventually removed completely, I'm okay with caching things.

On Mar 8, 2013, at 6:49 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 08.03.2013 02:40, Donald Stufft wrote:
>> So I updated my script (had to remove eventlet) and I believe it's now accurate. The total time was ~54 hours so this is hardly scientific but it should give a good idea what sort of impact we are talking about.
>> 
>> This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI.
>> 
>> The results and script is available at: https://gist.github.com/dstufft/5088915
>> 
>> Some statistics:
>> 
>>    Projects affected (with dev): 2269
>>    Versions affected (with dev): 8006
>> 
>>    Projects affected (without dev): 1880
>>    Versions affected (without dev): 7586
>> 
>> These numbers are if all external urls were immediately removed from PyPI, so this would be the total affected. This does not test if the actual package is installable, just if pip is able to locate an url that it thinks represents a version for that project.
> 
> Thanks for running the test.
> 
> About 10% of all packages. The numbers are already impressive,
> but if you factor in the popularity of some of those
> packages, the situation becomes worse.
> 
> I'm beginning to wonder whether caching the external link content
> on the PyPI CDN wouldn't be a better idea.
> 
> We'd have to make that legally waterproof and also have an opt-out
> mechanism, but it would get us from here to there a lot faster.
> 
> Together with the added hash tag on the download file URLs (*),
> this would solve the availability and the security aspects.
> Instead of deprecating external links altogether, we could then
> deprecate non-compliant download links and get an overall
> very flexible system for Python package distribution.
> 
> (*) Yes, I know, I still have to deliver the updated proposal -
> been working on getting our indexes ready to serve as example :-)
> 
> -- 
> Marc-Andre Lemburg
> eGenix.com
> 

From donald at stufft.io  Fri Mar  8 14:09:04 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 08:09:04 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <5139DE99.9020005@egenix.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
Message-ID: <F8202A62-0015-4CF9-BD11-80C1C80D239E@stufft.io>

Accidentally sent this to only MAL so resending!

On Mar 8, 2013, at 7:50 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 08.03.2013 13:15, Christian Heimes wrote:
>> Am 08.03.2013 12:49, schrieb M.-A. Lemburg:
>>> Together with the added hash tag on the download file URLs (*),
>>> this would solve the availability and the security aspects.
>>> Instead of deprecating external links altogether, we could then
>>> deprecate non-compliant download links and get an overall
>>> very flexible system for Python package distribution.
>>> 
>>> (*) Yes, I know, I still have to deliver the updated proposal -
>>> been working on getting our indexes ready to serve as example :-)
>> 
>> How does your proposal look like? 
> 
> Here's the first version with the basic idea:
> 
> http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal
> 
> After the feedback I got from Holger and Phillip, I'm currently
> writing a new version, which drops some of the unneeded
> requirements and spells out a few more things.
> 
> Here's a very short version...
> 
> Installers are modified:
> 
> * to only follow rel="download" links from the /simple/ index page,
>  which have a hash tag (e.g. #md5=...)

Sounds like a pretty serious break in backwards compat. Only 29 releases out of 144493 currently have a #md5= in their download_url. Either PyPI will be expected to download the URL and compute a hash (a DoS vector that will need to be coded carefully), or maintainers will have to add the hash themselves, which is error prone and likely to break in non-obvious ways for them.

While I'm obviously not against breaking backwards compatibility, I think if we're going to do that we might as well go whole hog and kill external links completely.

> * will only use the fetched download page if its contents match
>  the hash tag
> * scan that page for rel="download" links, which again have to
>  have a hash tag to be taken into account
> * only install files for which the hash tag matches the
>  downloaded content
> 
> This should provide a good way to make sure that the downloaded
> files are indeed under control of the package maintainer.
> 
> So far the only practical problem I've found with the approach
> is that the download page may not contain dynamic data, e.g.
> a date or timestamp, since that causes the hash tag not to
> verify.
> 
> The package maintainer will also have to reregister the
> package whenever changes to the download page are made -
> but that's actually intended :-)
> 
>> I like to propose query string-like
>> key/value pairs. key/value pairs are more flexible and allow us to
>> add/remove new information in the future.
> 
> Good idea. I'll add that as extension mechanism.
> 
>> I also propose that we add the file size in octets (bytes with 8bits in
>> each byte) to the fragment identifier. File size validation prohibits
>> e.g. length extension attacks. It is useful to download tools. I know
>> that HTTP servers usually set a Content-Length header for static files.
>> But the header is set by the CDN while the information in the fragment
>> identifier shall come from PyPI's internal database.
>> 
>> Example:
>> 
>> defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324
> 
> Minor nit: s/octets/size
> 
> We could probably even add GPG sigs to the link.
> 
> The only problem with the extension mechanism is that the currently
> available installers only support "#md5=...".

pip works just fine with any of the algorithms from hashlib. The installers all
also support #egg=, and there might be some others I can't recall offhand.
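
To illustrate the point (this is a rough sketch, not pip's actual
code), checking a single "name=hexdigest" fragment against whichever
hashlib algorithm it names could look like this. The function name and
the whitelist of algorithm names are assumptions for the example.

```python
import hashlib

# Assumed whitelist for this sketch; all of these exist in hashlib.
ALLOWED = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def fragment_matches(fragment, content):
    """Check content (bytes) against a fragment such as 'sha256=abc...'."""
    name, _, expected = fragment.partition("=")
    if name not in ALLOWED or not expected:
        return False
    # hashlib.new() looks the algorithm up by name.
    return hashlib.new(name, content).hexdigest() == expected.lower()
```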

> 
> Perhaps there's some way to trick them into still working with
> the query-style fragment links ?!
> 
> -- 
> Marc-Andre Lemburg
> eGenix.com
> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From donald at stufft.io  Fri Mar  8 14:13:25 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 08:13:25 -0500
Subject: [Catalog-sig] Deprecation of External Urls, Statistics
In-Reply-To: <396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com>
	<396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com>
Message-ID: <5A30A698-71A8-445D-9565-07D5769951BD@stufft.io>


On Mar 8, 2013, at 8:07 AM, Jesse Noller <jnoller at gmail.com> wrote:

> As long as external URLs eventually are completely removed I'm okay with caching things

So I have mixed feelings on caching the URLs. I'm not completely against it; however, it does present the problem of "Well, how do we know if the URL we are fetching is the accurate URL for that package?" Downloading and caching them and presenting them the same as if someone uploaded them directly to PyPI loses the distinction between "PyPI can verify this is the package that the author intended to release" and "This is something we think that the author releases, maybe, probably?".

It does solve the backwards compatibility issue of killing external urls immediately so I'm not flat out against it, but there may be legal issues involved too?

> 
> On Mar 8, 2013, at 6:49 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:
> 
>> On 08.03.2013 02:40, Donald Stufft wrote:
>>> So I updated my script (had to remove eventlet) and I believe it's now accurate. The total time was ~54 hours so this is hardly scientific but it should give a good idea what sort of impact we are talking about.
>>> 
>>> This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI.
>>> 
>>> The results and script is available at: https://gist.github.com/dstufft/5088915
>>> 
>>> Some statistics:
>>> 
>>>   Projects affected (with dev): 2269
>>>   Versions affected (with dev): 8006
>>> 
>>>   Projects affected (without dev): 1880
>>>   Versions affected (without dev): 7586
>>> 
>>> These numbers are if all external urls were immediately removed from PyPI, so this would be the total affected. This does not test if the actual package is installable, just if pip is able to locate an url that it thinks represents a version for that project.
>> 
>> Thanks for running the test.
>> 
>> About 10% of all packages. The numbers are already impressive,
>> but if you factor in the popularity of some of those
>> packages, the situation becomes worse.
>> 
>> I'm beginning to wonder whether caching the external link content
>> on the PyPI CDN wouldn't be a better idea.
>> 
>> We'd have to make that legally waterproof and also have an opt-out
>> mechanism, but it would get us from here to there a lot faster.
>> 
>> Together with the added hash tag on the download file URLs (*),
>> this would solve the availability and the security aspects.
>> Instead of deprecating external links altogether, we could then
>> deprecate non-compliant download links and get an overall
>> very flexible system for Python package distribution.
>> 
>> (*) Yes, I know, I still have to deliver the updated proposal -
>> been working on getting our indexes ready to serve as example :-)
>> 
>> -- 
>> Marc-Andre Lemburg
>> eGenix.com
>> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From donald at stufft.io  Fri Mar  8 14:18:44 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 08:18:44 -0500
Subject: [Catalog-sig] Deprecation of External Urls, Statistics
In-Reply-To: <5A30A698-71A8-445D-9565-07D5769951BD@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com>
	<396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com>
	<5A30A698-71A8-445D-9565-07D5769951BD@stufft.io>
Message-ID: <814AE930-BFD3-4507-BFA2-21BC07C4C07A@stufft.io>


On Mar 8, 2013, at 8:13 AM, Donald Stufft <donald at stufft.io> wrote:

> 
> On Mar 8, 2013, at 8:07 AM, Jesse Noller <jnoller at gmail.com> wrote:
> 
>> As long as external URLs eventually are completely removed I'm okay with caching things
> 
> So I have mixed feelings on caching the urls. I'm not completely against it however it does present a problem of "Well how do we know if the url we are fetching is the accurate url for that package". Downloading and caching them and presenting them the same as if someone uploaded them directly to PyPI loses a point of distinction between "PyPI can verify this is the package that the author intended to release" and "This is something we think that the author releases, maybe, probably?".

The distinction can be restored with a rel="external" or rel="cached" or whatever. I believe all the tools will still find them as downloadable targets, and they can be adapted to print a warning if that's desired. We *might* be caching a package that has already been replaced by an attacker, but by caching and centralizing it we have a better way of removing it once it's found. The legal issues are something we'd probably need to ask VanL about.
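
As a rough sketch of how a tool could tell the two kinds of links
apart, assuming the index page marks cached or external files with a
rel attribute as suggested above. The rel values, class name and
function name here are illustrative only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from the anchors of an index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            self.links.append((href, attributes.get("rel") or ""))


def split_links(page):
    """Return (uploaded, flagged) lists of hrefs found in page."""
    collector = LinkCollector()
    collector.feed(page)
    flagged_rels = ("external", "cached")  # assumed marker values
    uploaded = [h for h, rel in collector.links if rel not in flagged_rels]
    flagged = [h for h, rel in collector.links if rel in flagged_rels]
    return uploaded, flagged
```

An installer could install from either list but print a warning for
anything that comes from the flagged one.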

So that's an Ok, Neutral, and Unknown for my 3 major complaints.

> 
> It does solve the backwards compatibility issue of killing external urls immediately so I'm not flat out against it, but there may be legal issues involved too?
> 
>> 
>> On Mar 8, 2013, at 6:49 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:
>> 
>>> On 08.03.2013 02:40, Donald Stufft wrote:
>>>> So I updated my script (had to remove eventlet) and I believe it's now accurate. The total time was ~54 hours so this is hardly scientific but it should give a good idea what sort of impact we are talking about.
>>>> 
>>>> This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI.
>>>> 
>>>> The results and script is available at: https://gist.github.com/dstufft/5088915
>>>> 
>>>> Some statistics:
>>>> 
>>>>  Projects affected (with dev): 2269
>>>>  Versions affected (with dev): 8006
>>>> 
>>>>  Projects affected (without dev): 1880
>>>>  Versions affected (without dev): 7586
>>>> 
>>>> These numbers are if all external urls were immediately removed from PyPI, so this would be the total affected. This does not test if the actual package is installable, just if pip is able to locate an url that it thinks represents a version for that project.
>>> 
>>> Thanks for running the test.
>>> 
>>> About 10% of all packages. The numbers are already impressive,
>>> but if you factor in the popularity of some of those
>>> packages, the situation becomes worse.
>>> 
>>> I'm beginning to wonder whether caching the external link content
>>> on the PyPI CDN wouldn't be a better idea.
>>> 
>>> We'd have to make that legally waterproof and also have an opt-out
>>> mechanism, but it would get us from here to there a lot faster.
>>> 
>>> Together with the added hash tag on the download file URLs (*),
>>> this would solve the availability and the security aspects.
>>> Instead of deprecating external links altogether, we could then
>>> deprecate non-compliant download links and get an overall
>>> very flexible system for Python package distribution.
>>> 
>>> (*) Yes, I know, I still have to deliver the updated proposal -
>>> been working on getting our indexes ready to serve as example :-)
>>> 
>>> -- 
>>> Marc-Andre Lemburg
>>> eGenix.com
>>> 
> 
> 
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From jnoller at gmail.com  Fri Mar  8 14:19:07 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 8 Mar 2013 08:19:07 -0500
Subject: [Catalog-sig] Deprecation of External Urls, Statistics
In-Reply-To: <5A30A698-71A8-445D-9565-07D5769951BD@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com>
	<396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com>
	<5A30A698-71A8-445D-9565-07D5769951BD@stufft.io>
Message-ID: <D6017E07-C769-476C-B632-167CAB870D78@gmail.com>



On Mar 8, 2013, at 8:13 AM, Donald Stufft <donald at stufft.io> wrote:

> 
> On Mar 8, 2013, at 8:07 AM, Jesse Noller <jnoller at gmail.com> wrote:
> 
>> As long as external URLs eventually are completely removed I'm okay with caching things
> 
> So I have mixed feelings on caching the urls. I'm not completely against it however it does present a problem of "Well how do we know if the url we are fetching is the accurate url for that package". Downloading and caching them and presenting them the same as if someone uploaded them directly to PyPI loses a point of distinction between "PyPI can verify this is the package that the author intended to release" and "This is something we think that the author releases, maybe, probably?".
> 
> It does solve the backwards compatibility issue of killing external urls immediately so I'm not flat out against it, but there may be legal issues involved too?

Let them opt out.

> 
>> 
>> On Mar 8, 2013, at 6:49 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:
>> 
>>> On 08.03.2013 02:40, Donald Stufft wrote:
>>>> So I updated my script (had to remove eventlet) and I believe it's now accurate. The total time was ~54 hours so this is hardly scientific but it should give a good idea what sort of impact we are talking about.
>>>> 
>>>> This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI.
>>>> 
>>>> The results and script is available at: https://gist.github.com/dstufft/5088915
>>>> 
>>>> Some statistics:
>>>> 
>>>>  Projects affected (with dev): 2269
>>>>  Versions affected (with dev): 8006
>>>> 
>>>>  Projects affected (without dev): 1880
>>>>  Versions affected (without dev): 7586
>>>> 
>>>> These numbers are if all external urls were immediately removed from PyPI, so this would be the total affected. This does not test if the actual package is installable, just if pip is able to locate an url that it thinks represents a version for that project.
>>> 
>>> Thanks for running the test.
>>> 
>>> About 10% of all packages. The numbers are already impressive,
>>> but if you factor in the popularity of some of those
>>> packages, the situation becomes worse.
>>> 
>>> I'm beginning to wonder whether caching the external link content
>>> on the PyPI CDN wouldn't be a better idea.
>>> 
>>> We'd have to make that legally waterproof and also have an opt-out
>>> mechanism, but it would get us from here to there a lot faster.
>>> 
>>> Together with the added hash tag on the download file URLs (*),
>>> this would solve the availability and the security aspects.
>>> Instead of deprecating external links altogether, we could then
>>> deprecate non-compliant download links and get an overall
>>> very flexible system for Python package distribution.
>>> 
>>> (*) Yes, I know, I still have to deliver the updated proposal -
>>> been working on getting our indexes ready to serve as example :-)
>>> 
>>> -- 
>>> Marc-Andre Lemburg
>>> eGenix.com
>>> 
> 
> 
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 

From mal at egenix.com  Fri Mar  8 14:32:20 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 08 Mar 2013 14:32:20 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <F8202A62-0015-4CF9-BD11-80C1C80D239E@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<F8202A62-0015-4CF9-BD11-80C1C80D239E@stufft.io>
Message-ID: <5139E864.8010507@egenix.com>



On 08.03.2013 14:09, Donald Stufft wrote:
> Accidentally sent this to only MAL so resending!
> 
> On Mar 8, 2013, at 7:50 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:
> 
>> On 08.03.2013 13:15, Christian Heimes wrote:
>>> Am 08.03.2013 12:49, schrieb M.-A. Lemburg:
>>>> Together with the added hash tag on the download file URLs (*),
>>>> this would solve the availability and the security aspects.
>>>> Instead of deprecating external links altogether, we could then
>>>> deprecate non-compliant download links and get an overall
>>>> very flexible system for Python package distribution.
>>>>
>>>> (*) Yes, I know, I still have to deliver the updated proposal -
>>>> been working on getting our indexes ready to serve as example :-)
>>>
>>> What does your proposal look like?
>>
>> Here's the first version with the basic idea:
>>
>> http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal
>>
>> After the feedback I got from Holger and Phillip, I'm currently
>> writing a new version, which drops some of the unneeded
>> requirements and spells out a few more things.
>>
>> Here's a very short version...
>>
>> Installers are modified:
>>
>> * to only follow rel="download" links from the /simple/ index page,
>>  which have a hash tag (e.g. #md5=...)
> 
> Sounds like a pretty serious break in backwards compat. Only 29 releases out of 144493 currently have a #md5= in their download_url. Either PyPI will be expected to download the URL and compute a hash (a DoS vector that will need to be coded properly), or maintainers will have to add the hash themselves, which is error prone and likely to break in non-obvious ways.
> 
> While I'm obviously not against breaking backwards compatibility, I think if we're going to do that we might as well go whole hog and kill external links completely.

This was just the main new download theme. If the new scheme
doesn't work, installers should revert to the old scheme,
after giving the user a BIG warning.

Later on they could switch to requiring users to use an
option to reenable the old scheme.

In any case, I'll have to put all this into proper words and
will then post it for another review cycle.
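
As a rough illustration of the first installer rule quoted above (only
follow rel="download" links that carry a hash tag), here is a minimal
sketch. The class and function names are hypothetical and are not taken
from pip, distribute or setuptools; the page markup is assumed to use
plain <a href="..." rel="..."> anchors.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class HashedDownloadLinks(HTMLParser):
    """Collect rel="download" anchors whose URL has a key=value fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        rel = (attrs.get("rel") or "").split()
        # Split "http://host/page#md5=<hex>" into URL and fragment.
        url, fragment = urldefrag(href)
        # Links without rel="download" or without a hash tag are skipped.
        if "download" in rel and "=" in fragment:
            self.links.append((url, fragment))


def hashed_download_links(html):
    """Return (url, fragment) pairs for the links an installer may follow."""
    parser = HashedDownloadLinks()
    parser.feed(html)
    return parser.links
```

A fallback to the old behaviour, as described above, would then only be
needed when this function returns an empty list.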

>> * will only use the fetched download page if its contents match
>>  the hash tag
>> * scan that page for rel="download" links, which again have to
>>  have a hash tag to be taken into account
>> * only install files for which the hash tag matches the
>>  downloaded content
>>
>> This should provide a good way to make sure that the downloaded
>> files are indeed under control of the package maintainer.
>>
>> So far the only practical problem I've found with the approach
>> is that the download page may not contain dynamic data, e.g.
>> a date or timestamp, since that causes the hash tag not to
>> verify.
>>
>> The package maintainer will also have to reregister the
>> package whenever changes to the download page are made -
>> but that's actually intended :-)
>>
>>> I'd like to propose query string-like
>>> key/value pairs. key/value pairs are more flexible and allow us to
>>> add/remove new information in the future.
>>
>> Good idea. I'll add that as extension mechanism.
>>
>>> I also propose that we add the file size in octets (bytes with 8 bits in
>>> each byte) to the fragment identifier. File size validation prohibits
>>> e.g. length extension attacks. It is also useful for download tools. I know
>>> that HTTP servers usually set a Content-Length header for static files.
>>> But the header is set by the CDN while the information in the fragment
>>> identifier shall come from PyPI's internal database.
>>>
>>> Example:
>>>
>>> defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324
>>
>> Minor nit: s/octets/size
>>
>> We could probably even add GPG sigs to the link.
>>
>> The only problem with the extension mechanism is that the currently
>> available installers only support "#md5=...".
> 
> pip works just fine with any of the algorithms from hashlib. The installers all
> also support #egg=, and there might be some others I can't recall offhand.

Ah, good to know. Thanks.
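
To make the query-style fragment idea concrete, here is a minimal sketch
of how a tool might check a downloaded file against such a fragment. It
is an assumption-laden illustration, not pip's implementation: the
function name, the "size" key (following the s/octets/size nit above)
and the list of accepted algorithms are all hypothetical choices.

```python
import hashlib
from urllib.parse import parse_qsl

# Hypothetical whitelist; hashlib offers more algorithms than these.
KNOWN_HASHES = {"md5", "sha1", "sha256", "sha512"}


def verify_download(data, fragment):
    """Check bytes against every recognised key in a key=value&... fragment."""
    checked = 0
    for key, value in parse_qsl(fragment):
        if key == "size":
            if len(data) != int(value):
                return False
            checked += 1
        elif key in KNOWN_HASHES:
            if hashlib.new(key, data).hexdigest() != value.lower():
                return False
            checked += 1
        # Unrecognised keys (e.g. "egg") are ignored, which keeps the
        # format extensible.
    # A fragment with nothing to verify is treated as unverified.
    return checked > 0
```

Ignoring unknown keys is the design choice that lets new information be
added later without breaking tools that only understand the older keys.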

>>
>> Perhaps there's some way to trick them into still working with
>> the query-style fragment links ?!

-- 
Marc-Andre Lemburg
eGenix.com


From mal at egenix.com  Fri Mar  8 14:47:14 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 08 Mar 2013 14:47:14 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <5139DE99.9020005@egenix.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
Message-ID: <5139EBE2.9020500@egenix.com>

On 08.03.2013 13:50, M.-A. Lemburg wrote:
> On 08.03.2013 13:15, Christian Heimes wrote:
>> I'd like to propose query string-like
>> key/value pairs. key/value pairs are more flexible and allow us to
>> add/remove new information in the future.
> 
> Good idea. I'll add that as extension mechanism.
> 
>> I also propose that we add the file size in octets (bytes with 8 bits in
>> each byte) to the fragment identifier. File size validation prohibits
>> e.g. length extension attacks. It is also useful for download tools. I know
>> that HTTP servers usually set a Content-Length header for static files.
>> But the header is set by the CDN while the information in the fragment
>> identifier shall come from PyPI's internal database.
>>
>> Example:
>>
>> defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324
> 
> Minor nit: s/octets/size
> 
> We could probably even add GPG sigs to the link.
> 
> The only problem with the extension mechanism is that the currently
> available installers only support "#md5=...".
> 
> Perhaps there's some way to trick them into still working with
> the query-style fragment links ?!

Too bad... at least distribute/setuptools enforces this:

    def check_md5(self, cs, info, filename, tfp):
        if re.match('md5=[0-9a-f]{32}$', info):
           ...

If it weren't for that '$', we'd have no problem.

At least distribute currently doesn't check the download links
from the /simple/ page at all, so we can use the extension
mechanism there without breaking older versions of the tools.
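
The effect of that trailing '$' can be shown with a small sketch. Only
the regular expression itself comes from the snippet above; the helper
name and the "relaxed" variant are hypothetical, and the fragment reuses
the defusedxml example with "size" in place of "octets".

```python
import re

# Pattern as quoted above, and the same pattern without the trailing '$'.
STRICT = re.compile(r"md5=[0-9a-f]{32}$")
RELAXED = re.compile(r"md5=[0-9a-f]{32}")

plain = "md5=09873c31ce773d48b8a4759571655a2c"
extended = plain + "&sha1=33821e6891e3fc3829f5a238a93490f939533d62&size=48324"


def accepts(pattern, fragment):
    """True if the pattern matches at the start of the fragment."""
    return pattern.match(fragment) is not None
```

With these definitions, accepts(STRICT, plain) holds, while
accepts(STRICT, extended) does not, because the '&' after the digest
prevents '$' from matching; RELAXED accepts both.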

-- 
Marc-Andre Lemburg
eGenix.com


From solipsis at pitrou.net  Fri Mar  8 15:00:40 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 8 Mar 2013 14:00:40 +0000 (UTC)
Subject: [Catalog-sig] Search engine relevance
Message-ID: <loom.20130308T145827-470@post.gmane.org>


Hello,

It seems the PyPI search engine is quite crude and doesn't try to make the
results relevant at all.
For example, if I'm trying to search "agi" in the hope of finding modules
relevant to the Asterisk Gateway Interface (nicknamed "AGI"), I get the
following results:

https://pypi.python.org/pypi?%3Aaction=search&term=agi&submit=search

As you can see, a large number of results pop up simply because they contain
the word "magic", which apparently is considered to match the "agi" request.
Clearly either the selection or the weighting algorithm isn't very efficient
here.
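
The behaviour described above is what plain substring matching would
produce. The sketch below contrasts it with whole-word matching; it is
only an illustration under that assumption, since PyPI's actual search
code is not shown here, and both function names are hypothetical.

```python
import re


def substring_match(term, text):
    """Match if the term occurs anywhere, even inside another word."""
    return term.lower() in text.lower()


def word_match(term, text):
    """Match only if the term occurs as a whole word."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None
```

Here substring_match("agi", "A magic helper") is true, whereas
word_match("agi", "A magic helper") is false and
word_match("agi", "Asterisk AGI bindings") is true.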

Regards

Antoine.



From jacob at jacobian.org  Fri Mar  8 15:51:05 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Fri, 8 Mar 2013 08:51:05 -0600
Subject: [Catalog-sig] Search engine relevance
In-Reply-To: <loom.20130308T145827-470@post.gmane.org>
References: <loom.20130308T145827-470@post.gmane.org>
Message-ID: <CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>

Hi Antoine -

Yes, PyPI's search engine is rather simplistic, I think that's a
pretty well-known problem.

For the time being you might try Crate instead (crate.io); I've found
its search engine to be much much better.

Jacob

On Fri, Mar 8, 2013 at 8:00 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
> Hello,
>
> It seems the PyPI search engine is quite crude and doesn't try to make the
> results relevant at all.
> For example, if I'm trying to search "agi" in the hope of finding modules
> relevant to the Asterisk Gateway Interface (nicknamed "AGI"), I get the
> following results:
>
> https://pypi.python.org/pypi?%3Aaction=search&term=agi&submit=search
>
> As you can see, a large number of results pop up simply because they contain
> the word "magic", which apparently is considered to match the "agi" request.
> Clearly either the selection or the weighting algorithm isn't very efficient
> here.
>
> Regards
>
> Antoine.
>
>

From ubershmekel at gmail.com  Fri Mar  8 16:03:32 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Fri, 8 Mar 2013 07:03:32 -0800
Subject: [Catalog-sig] Search engine relevance
In-Reply-To: <CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>
References: <loom.20130308T145827-470@post.gmane.org>
	<CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>
Message-ID: <CANSw7KycEVeNXStuZ2yhSnc3261V9W=CA=Mi8jeoGVGFUOWQtg@mail.gmail.com>

https://crate.io/?has_releases=on&q=agi

No results found.



On Fri, Mar 8, 2013 at 6:51 AM, Jacob Kaplan-Moss <jacob at jacobian.org> wrote:

> Hi Antoine -
>
> Yes, PyPI's search engine is rather simplistic, I think that's a
> pretty well-known problem.
>
> For the time being you might try Crate instead (crate.io); I've found
> its search engine to be much much better.
>
> Jacob
>
> On Fri, Mar 8, 2013 at 8:00 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> >
> > Hello,
> >
> > It seems the PyPI search engine is quite crude and doesn't try to make the
> > results relevant at all.
> > For example, if I'm trying to search "agi" in the hope of finding modules
> > relevant to the Asterisk Gateway Interface (nicknamed "AGI"), I get the
> > following results:
> >
> > https://pypi.python.org/pypi?%3Aaction=search&term=agi&submit=search
> >
> > As you can see, a large number of results pop up simply because they contain
> > the word "magic", which apparently is considered to match the "agi" request.
> > Clearly either the selection or the weighting algorithm isn't very efficient
> > here.
> >
> > Regards
> >
> > Antoine.
> >
> >
>

From donald at stufft.io  Fri Mar  8 16:22:39 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 10:22:39 -0500
Subject: [Catalog-sig] Search engine relevance
In-Reply-To: <CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>
References: <loom.20130308T145827-470@post.gmane.org>
	<CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>
Message-ID: <77B5D8F8-7334-441E-A987-17FAD068D90C@stufft.io>


On Mar 8, 2013, at 9:51 AM, Jacob Kaplan-Moss <jacob at jacobian.org> wrote:

> Hi Antoine -
> 
> Yes, PyPI's search engine is rather simplistic, I think that's a
> pretty well-known problem.
> 
> For the time being you might try Crate instead (crate.io); I've found
> its search engine to be much much better.

Crate's search uses ElasticSearch whereas I believe PyPI is just using SQL against the DB.

That being said Crate's search could be a lot better still :/ But I'm not an expert on how to get the best search results.

> 
> Jacob
> 
> On Fri, Mar 8, 2013 at 8:00 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> 
>> Hello,
>> 
>> It seems the PyPI search engine is quite crude and doesn't try to make the
>> results relevant at all.
>> For example, if I'm trying to search "agi" in the hope of finding modules
>> relevant to the Asterisk Gateway Interface (nicknamed "AGI"), I get the
>> following results:
>> 
>> https://pypi.python.org/pypi?%3Aaction=search&term=agi&submit=search
>> 
>> As you can see, a large number of results pop up simply because they contain
>> the word "magic", which apparently is considered to match the "agi" request.
>> Clearly either the selection or the weighting algorithm isn't very efficient
>> here.
>> 
>> Regards
>> 
>> Antoine.
>> 
>> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From solipsis at pitrou.net  Fri Mar  8 16:24:00 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 8 Mar 2013 15:24:00 +0000 (UTC)
Subject: [Catalog-sig] Search engine relevance
References: <loom.20130308T145827-470@post.gmane.org>
	<CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>
	<CANSw7KycEVeNXStuZ2yhSnc3261V9W=CA=Mi8jeoGVGFUOWQtg@mail.gmail.com>
Message-ID: <loom.20130308T162304-685@post.gmane.org>

Yuval Greenfield <ubershmekel <at> gmail.com> writes:
> 
> https://crate.io/?has_releases=on&q=agi
> 
> No results found.

Thanks for the answers.
Yes, crate.io is at least missing pyst2 which does mention AGI in its
description:
https://crate.io/packages/pyst2/

(pyst2 is rather unmaintained, but that shouldn't matter a lot here :-))

Regards

Antoine.



From donald at stufft.io  Fri Mar  8 16:28:06 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 10:28:06 -0500
Subject: [Catalog-sig] Search engine relevance
In-Reply-To: <loom.20130308T162304-685@post.gmane.org>
References: <loom.20130308T145827-470@post.gmane.org>
	<CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>
	<CANSw7KycEVeNXStuZ2yhSnc3261V9W=CA=Mi8jeoGVGFUOWQtg@mail.gmail.com>
	<loom.20130308T162304-685@post.gmane.org>
Message-ID: <6EE76A9E-FF3B-48E5-9370-3153A7C51561@stufft.io>


On Mar 8, 2013, at 10:24 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> Yuval Greenfield <ubershmekel <at> gmail.com> writes:
>> 
>> https://crate.io/?has_releases=on&q=agi
>> 
>> No results found.
> 
> Thanks for the answers.
> Yes, crate.io is at least missing pyst2 which does mention AGI in its
> description:
> https://crate.io/packages/pyst2/

It does come up when you search for "asterisk" (https://crate.io/?q=asterisk&has_releases=on); however, that is less than optimal.

Basically, the long_description isn't currently included in Crate's index because indexing it trashed the search relevancy, and with my limited experience in search I was unable to come up with a method that included it without trashing the overall relevancy.

> 
> (pyst2 is rather unmaintained, but that shouldn't matter a lot here :-))
> 
> Regards
> 
> Antoine.
> 
> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From ubershmekel at gmail.com  Fri Mar  8 16:29:44 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Fri, 8 Mar 2013 07:29:44 -0800
Subject: [Catalog-sig] Search engine relevance
In-Reply-To: <loom.20130308T162304-685@post.gmane.org>
References: <loom.20130308T145827-470@post.gmane.org>
	<CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>
	<CANSw7KycEVeNXStuZ2yhSnc3261V9W=CA=Mi8jeoGVGFUOWQtg@mail.gmail.com>
	<loom.20130308T162304-685@post.gmane.org>
Message-ID: <CANSw7Kzj-28F0e7VyC-avE4Z-KBUYQ50-ngm4e_8634UMZR5uQ@mail.gmail.com>

On Fri, Mar 8, 2013 at 7:24 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> Yes, crate.io is at least missing pyst2 which does mention AGI in its
>  description:
> https://crate.io/packages/pyst2/
>
>
>
I agree. There's only one effective search engine for PyPI that I know of:

https://www.google.com/search?q=site%3Apypi.python.org+agi

From pje at telecommunity.com  Fri Mar  8 20:16:57 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 8 Mar 2013 14:16:57 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <5139DE99.9020005@egenix.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
Message-ID: <CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>

On Fri, Mar 8, 2013 at 7:50 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> After the feedback I got from Holger and Phillip, I'm currently
> writing a new version, which drops some of the unneeded
> requirements and spells out a few more things.
>
> Here's a very short version...
>
> Installers are modified:
>
> * to only follow rel="download" links from the /simple/ index page,
>   which have a hash tag (e.g. #md5=...)
> * will only use the fetched download page if its contents match
>   the hash tag
> * scan that page for rel="download" links, which again have to
>   have a hash tag to be taken into account
> * only install files for which the hash tag matches the
>   downloaded content
>
> This should provide a good way to make sure that the downloaded
> files are indeed under control of the package maintainer.

There is, as I said before, a MUCH simpler way to do this, that works
right now: put direct #md5 download links in your description, and
phase out the rel="" attributes altogether.

The key to making this transition isn't creating elaborate new
standards for the tools, it's *creating new tools for the standards*.

Specifically, *migration tools*.  A migration tool could be made that
scans existing external links and converts found links to #md5 links
or alternately uploads the files themselves to PyPI.  You can do that
without changing pip or distribute or anything else but PyPI, so
there's no need to wait out update cycles to take advantage.
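
The link-conversion half of such a migration tool could be as small as
the sketch below. It assumes the tool has already fetched the file
contents; the function name and the "algorithm" parameter are
hypothetical, and md5 is the default only because "#md5=" is the
fragment form discussed here.

```python
import hashlib
from urllib.parse import urldefrag


def add_hash_fragment(url, content, algorithm="md5"):
    """Return the URL with a '#<algorithm>=<hexdigest>' fragment for content."""
    # Drop any existing fragment so the result carries exactly one hash tag.
    base, _old_fragment = urldefrag(url)
    digest = hashlib.new(algorithm, content).hexdigest()
    return "{0}#{1}={2}".format(base, algorithm, digest)
```

For example, calling it on "http://example.com/pkg-1.0.tar.gz" with the
downloaded bytes yields a direct link of the form
"http://example.com/pkg-1.0.tar.gz#md5=<hexdigest>".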

Once a project/version has switched to either #md5 links or PyPI
copies, you can just drop the rel="" attributes and you're done.

Alternately, if using the description for download links is considered
a bad idea, add a new field to PyPI for them.

Point is, this entire thing can be done correctly at the PyPI end and
work with the existing API of the download tools.


> So far the only practical problem I've found with the approach
> is that the download page may not contain dynamic data, e.g.
> a date or timestamp, since that causes the hash tag not to
> verify.

Which is completely unnecessary if one simply exposes the *actual*
download links directly on PyPI.  The download page is redundant, in a
couple different ways.  First, since it can't change, there's no point
in re-fetching it all the time.  Second, since it's only going to be
read by tools anyway, there's no point to it containing anything
besides the link.

So, since the page only contains links, might as well put the links
straight on PyPI, or at most have an option/tool to load the links
from an external source.

Again, the key to making this work is going to be somebody putting
buttons in the PyPI interface (and making setuptools/distutils
commands or similar CLI tools) to migrate their files (or links to the
files) to PyPI hosting.  A new API for such tools is entirely
unnecessary -- at most there might need to be a new field made
available/accessible.  (Personally I don't care if your download links
have to be in the description field if you're hosting off-site, but
that's just me.)

From noah at coderanger.net  Fri Mar  8 20:52:33 2013
From: noah at coderanger.net (Noah Kantrowitz)
Date: Fri, 8 Mar 2013 11:52:33 -0800
Subject: [Catalog-sig] hash tags
In-Reply-To: <5139DE99.9020005@egenix.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
Message-ID: <D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>


On Mar 8, 2013, at 4:50 AM, M.-A. Lemburg wrote:

> On 08.03.2013 13:15, Christian Heimes wrote:
>> Am 08.03.2013 12:49, schrieb M.-A. Lemburg:
>>> Together with the added hash tag on the download file URLs (*),
>>> this would solve the availability and the security aspects.
>>> Instead of deprecating external links altogether, we could then
>>> deprecate non-compliant download links and get an overall
>>> very flexible system for Python package distribution.
>>> 
>>> (*) Yes, I know, I still have to deliver the updated proposal -
>>> been working on getting our indexes ready to serve as example :-)
>> 
>> What does your proposal look like?
> 
> Here's the first version with the basic idea:
> 
> http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal
> 
> After the feedback I got from Holger and Phillip, I'm currently
> writing a new version, which drops some of the unneeded
> requirements and spells out a few more things.
> 
> Here's a very short version...
> 
> Installers are modified:
> 
> * to only follow rel="download" links from the /simple/ index page,
>  which have a hash tag (e.g. #md5=...)
> * will only use the fetched download page if its contents match
>  the hash tag
> * scan that page for rel="download" links, which again have to
>  have a hash tag to be taken into account
> * only install files for which the hash tag matches the
>  downloaded content
> 
> This should provide a good way to make sure that the downloaded
> files are indeed under control of the package maintainer.

MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption.

--Noah


From pje at telecommunity.com  Fri Mar  8 20:54:28 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 8 Mar 2013 14:54:28 -0500
Subject: [Catalog-sig] Deprecation of External Urls, Statistics
In-Reply-To: <5A30A698-71A8-445D-9565-07D5769951BD@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com>
	<396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com>
	<5A30A698-71A8-445D-9565-07D5769951BD@stufft.io>
Message-ID: <CALeMXf459Q4NijQSWU-Vhi-qH-qG=HtquNtZ24AG6riVz3JKmA@mail.gmail.com>

On Fri, Mar 8, 2013 at 8:13 AM, Donald Stufft <donald at stufft.io> wrote:
> It does solve the backwards compatibility issue of killing external urls immediately so I'm not flat out against it, but there may be legal issues involved too?

I've mentioned this in the other thread as well, but the best way to
actually ensure this stuff gets moved over to PyPI is to make it
*easy*.  Give developers a button to click on PyPI that fetches all
their external links (requiring first that you give matching MD5 or
other checksums) and uploads them to PyPI, and a whole bunch of those
projects are likely to be okay with clicking it a few times.  A
command-line tool to do it (especially as a distutils/setuptools
command) would be a good idea, too.

Of the tiny minority of remaining people who object to PyPI hosting
for some reason other than convenience/familiarity (e.g. MAL's
licensing objection), it will likely be sufficient to provide an
option to add #md5 links to their description, in lieu of actual
rehosting.

FWIW, it's hard to get people to change behavior when one condemns
that behavior as unlikeable or socially undesirable, because it means
one is less likely to consider the other person's motivations, needs,
etc., and on top of that, the other person's resistance and rebellion
are stirred up by being the subject of one's disapproval.

So please, let's all stop talking about ways to work around the
package authors and project maintainers, or how to force them into
doing our bidding, and start talking instead about how to make it
*easy* and *obvious* for them to do what we want.

(And people who think it's already easy and obvious enough, so those
10% of projects must be stupid, will obviously not have anything
positive to contribute.)

So let me kick off that discussion with a list of known-so-far use
cases for external hosting, in descending order of my extremely rough
guesstimate of frequency:

* Always did it that way, never saw a reason to change, or didn't know
you could upload to PyPI
* Lots of files that are currently generated on the system where
they're hosted, or in an automated system that would need significant
rework to support PyPI
* Development snapshots (which may in fact be depended upon by other
in-development projects, so manual URL specification doesn't help
here)
* Had an issue w/PyPI availability in the past
* Objectors to PyPI's licensing requirements

Automation is aimed at the first two: make it easy enough, w/a carrot
and a stick ("external link spidering is going away, you have to put
either the links or the files on PyPI directly if you want them
found"), and a lot of people will move (assuming they're actually
still maintaining their project).

Development snapshots are an interesting case, because one of the
reasons they're valuable is that PyPI's existing multi-release
behavior is a major PITA.  You can't upload a new version of something
without PyPI creating a new release for it...  and automatically
hiding all your previous releases, including your stable release.
There's a lot that would have to be done to PyPI's release management
before it would actually be sane to track such releases there.  So the
obvious fix is to do nothing; such links being external doesn't hurt
availability for people that don't depend on them (unlike
rel=homepage/download links).

The last two issues are education/persuasion problems that won't be
affected by technology changes.

Does anybody know of any other use cases for the thousands of projects
and releases relying on external link discovery spidering?

(Disparaging remarks about why a particular use case is bad, no good,
makes you go blind, etc. need not apply: they serve only to show that
the person providing the opinion lacks sufficient empathy with the
target audience to be *useful* in a discussion of how to persuade that
target audience to behave differently.)

From donald at stufft.io  Fri Mar  8 21:06:17 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 15:06:17 -0500
Subject: [Catalog-sig] Deprecation of External Urls, Statistics
In-Reply-To: <CALeMXf459Q4NijQSWU-Vhi-qH-qG=HtquNtZ24AG6riVz3JKmA@mail.gmail.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com>
	<396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com>
	<5A30A698-71A8-445D-9565-07D5769951BD@stufft.io>
	<CALeMXf459Q4NijQSWU-Vhi-qH-qG=HtquNtZ24AG6riVz3JKmA@mail.gmail.com>
Message-ID: <83DDC321-9809-4E08-B0DC-3159A15130DA@stufft.io>

On Mar 8, 2013, at 2:54 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Fri, Mar 8, 2013 at 8:13 AM, Donald Stufft <donald at stufft.io> wrote:
>> It does solve the backwards compatibility issue of killing external urls immediately so I'm not flat out against it, but there may be legal issues involved too?
> 
> I've mentioned this in the other thread as well, but the best way to
> actually ensure this stuff gets moved over to PyPI is to make it
> *easy*.  Give developers a button to click on PyPI that fetches all
> their external links (requiring first that you give matching MD5 or
> other checksums) and uploads them to PyPI, and a whole bunch of those
> projects are likely to be okay with clicking it a few times.  A
> command-line tool to do it (especially as a distutils/setuptools
> command) would be a good idea, too.

Tooling is the easy part. I've already volunteered to write a PR to add this functionality to PyPI, maybe with a mail out for maximal conversion.

> 
> Of the tiny minority of remaining people who object to PyPI hosting
> for some reason other than convenience/familiarity (e.g. MAL's
> licensing objection), it will likely be sufficient to provide an
> option to add #md5 links to their description, in lieu of actual
> rehosting.

Keeping the ability to include external links lowers the overall effectiveness of the service in uptime and privacy. MD5 hashes are also unacceptable as a secure hash but that's another argument.

> 
> FWIW, it's hard to get people to change behavior when one condemns
> that behavior as unlikeable or socially undesirable, because it means
> one is less likely to consider the other person's motivations, needs,
> etc., and on top of that, the other person's resistance and rebellion
> are stirred up by being the subject of one's disapproval.
> 
> So please, let's all stop talking about ways to work around the
> package authors and project maintainers, or how to force them into
> doing our bidding, and start talking instead about how to make it
> *easy* and *obvious* for them to do what we want.
> 
> (And people who think it's already easy and obvious enough, so those
> 10% of projects must be stupid, will obviously not have anything
> positive to contribute.)
> 
> So let me kick off that discussion with a list of known-so-far use
> cases for external hosting, in descending order of my extremely rough
> guesstimate of frequency:
> 
> * Always did it that way, never saw a reason to change, or didn't know
> you could upload to PyPI
> * Lots of files that are currently generated on the system where
> they're hosted, or in an automated system that would need significant
> rework to support PyPI
> * Development snapshots (which may in fact be depended upon by other
> in-development projects, so manual URL specification doesn't help
> here)
> * Had an issue w/PyPI availability in the past
> * Objectors to PyPI's licensing requirements
> 
> Automation is aimed at the first two: make it easy enough, w/a carrot
> and a stick ("external link spidering is going away, you have to put
> either the links or the files on PyPI directly if you want them
> found"), and a lot of people will move (assuming they're actually
> still maintaining their project).
> 
> Development snapshots are an interesting case, because one of the
> reasons they're valuable is that PyPI's existing multi-release
> behavior is a major PITA.  You can't upload a new version of something
> without PyPI creating a new release for it...  and automatically
> hiding all your previous releases, including your stable release.
> There's a lot that would have to be done to PyPI's release management
> before it would actually be sane to track such releases there.  So the
> obvious fix is to do nothing; such links being external doesn't hurt
> availability for people that don't depend on them (unlike
> rel=homepage/download links).

This is false: PyPI has a toggle to turn off the automatic hiding that happens by default. However, PyPI does need an option to prefer the stable release as the default one shown when you visit a project's page in the Web UI.

If you're going to release a snapshot to PyPI you _should_ need to create a new release for it.

> 
> The last two issues are education/persuasion problems that won't be
> affected by technology changes.
> 
> Does anybody know of any other use cases for the thousands of projects
> and releases relying on external link discovery spidering?
> 
> (Disparaging remarks about why a particular use case is bad, no good,
> makes you go blind, etc. need not apply: they serve only to show that
> the person providing the opinion lacks sufficient empathy with the
> target audience to be *useful* in a discussion of how to persuade that
> target audience to behave differently.)


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 841 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://mail.python.org/pipermail/catalog-sig/attachments/20130308/d975ed70/attachment.pgp>

From mal at egenix.com  Fri Mar  8 22:11:38 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 08 Mar 2013 22:11:38 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
Message-ID: <513A540A.1010703@egenix.com>

On 08.03.2013 20:52, Noah Kantrowitz wrote:
> 
> On Mar 8, 2013, at 4:50 AM, M.-A. Lemburg wrote:
> 
>> On 08.03.2013 13:15, Christian Heimes wrote:
>>> On 08.03.2013 12:49, M.-A. Lemburg wrote:
>>>> Together with the added hash tag on the download file URLs (*),
>>>> this would solve the availability and the security aspects.
>>>> Instead of deprecating external links altogether, we could then
>>>> deprecate non-compliant download links and get an overall
>>>> very flexible system for Python package distribution.
>>>>
>>>> (*) Yes, I know, I still have to deliver the updated proposal -
>>>> been working on getting our indexes ready to serve as example :-)
>>>
>>> What does your proposal look like?
>>
>> Here's the first version with the basic idea:
>>
>> http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal
>>
>> After the feedback I got from Holger and Phillip, I'm currently
>> writing a new version, which drops some of the unneeded
>> requirements and spells out a few more things.
>>
>> Here's a very short version...
>>
>> Installers are modified:
>>
>> * to only follow rel="download" links from the /simple/ index page,
>>  which have a hash tag (e.g. #md5=...)
>> * will only use the fetched download page if its contents match
>>  the hash tag
>> * scan that page for rel="download" links, which again have to
>>  have a hash tag to be taken into account
>> * only install files for which the hash tag matches the
>>  downloaded content
>>
>> This should provide a good way to make sure that the downloaded
>> files are indeed under control of the package maintainer.
> 
> MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption.

I was only using the existing md5 hash tags as example. Tools should
migrate to support all hashlib algorithms (pip already does),
so the hash tag can be e.g. #sha1=... or #sha256=...

For Python 2.4 only md5 and sha1 would work, since it didn't
come with a hashlib module.

With the extension mechanism Christian proposed, we can also
add all sorts of other things, e.g. size indications,
GPG key ID, GPG sigs, etc.
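
A minimal sketch of how a tool might check such a hash tag, assuming the
tag has the form #<algorithm>=<hexdigest> as in the examples above. The
function name verify_hash_tag and the SUPPORTED allow-list are
hypothetical, chosen only for illustration; they are not taken from pip
or setuptools.

```python
import hashlib
from urllib.parse import urldefrag

# Assumed allow-list of algorithm names accepted in a hash tag.
SUPPORTED = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def verify_hash_tag(url, content):
    """Return True if content matches the '#<algorithm>=<hexdigest>' tag on url."""
    _, fragment = urldefrag(url)
    algorithm, separator, expected = fragment.partition("=")
    if not separator or algorithm not in SUPPORTED:
        # No tag at all, or an algorithm this sketch does not accept.
        return False
    actual = hashlib.new(algorithm, content).hexdigest()
    return actual == expected.lower()


payload = b"example sdist bytes"
url = "http://example.com/pkg-1.0.tar.gz#sha256=" + hashlib.sha256(payload).hexdigest()
print(verify_hash_tag(url, payload))         # content matches the tag
print(verify_hash_tag(url, payload + b"!"))  # content was altered
```

Treating a missing or unknown tag as a failure, rather than skipping the
check, is a design choice of this sketch.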

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 07 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From pje at telecommunity.com  Fri Mar  8 22:12:05 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 8 Mar 2013 16:12:05 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
Message-ID: <CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>

On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz <noah at coderanger.net> wrote:
> MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption.

So, you're saying that someone has found a second-preimage attack
against MD5 that's more efficient than the current 2**127 threshold
established in 2009?

"Anything security related" is pretty broad.  Out of the many classes
of attacks on hashes, AFAIK the only class that's relevant to PyPI is
second preimage attacks,  i.e. one where the attacker has the original
file and the hash, and must construct a new file that produces the
same hash value.

Did you have some other type of hash attack in mind?  And in either
case, do you have a reference for the attack complexity?
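
For a sense of scale, the 2**127 figure can be set against the generic
birthday bound of 2**64 for finding a collision in any 128-bit digest.
This is only back-of-the-envelope arithmetic: the guess rate below is an
assumption picked for illustration, and practical MD5 collision attacks
are known to be far cheaper than the generic bound.

```python
# Assumed guess rate; purely illustrative, not a measured figure.
GUESSES_PER_SECOND = 10**12
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

SECOND_PREIMAGE_WORK = 2**127  # figure quoted in the message above
GENERIC_COLLISION_WORK = 2**64  # birthday bound for a 128-bit digest


def years(work):
    """Years needed to perform `work` hash evaluations at the assumed rate."""
    return work / (GUESSES_PER_SECOND * SECONDS_PER_YEAR)


print("second preimage:   %.2e years" % years(SECOND_PREIMAGE_WORK))
print("generic collision: %.2e years" % years(GENERIC_COLLISION_WORK))
```

At this assumed rate the first figure is on the order of 10**18 years,
while the second is under one year.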

From mal at egenix.com  Fri Mar  8 22:17:41 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 08 Mar 2013 22:17:41 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
Message-ID: <513A5575.5000200@egenix.com>

On 08.03.2013 20:16, PJ Eby wrote:
> On Fri, Mar 8, 2013 at 7:50 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> After the feedback I got from Holger and Phillip, I'm currently
>> writing a new version, which drops some of the unneeded
>> requirements and spells out a few more things.
>>
>> Here's a very short version...
>>
>> Installers are modified:
>>
>> * to only follow rel="download" links from the /simple/ index page,
>>   which have a hash tag (e.g. #md5=...)
>> * will only use the fetched download page if its contents match
>>   the hash tag
>> * scan that page for rel="download" links, which again have to
>>   have a hash tag to be taken into account
>> * only install files for which the hash tag matches the
>>   downloaded content
>>
>> This should provide a good way to make sure that the downloaded
>> files are indeed under control of the package maintainer.
> 
> There is, as I said before, a MUCH simpler way to do this, that works
> right now: put direct #md5 download links in your description, and
> phase out the rel="" attributes altogether.

No, that would be a pretty poor design :-)

The rel="" attributes are good design, since they were meant for
exactly this purpose (machine reading and understanding relations
between origin and target).
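
As a rough sketch of the machine-reading side, an installer could pick
out only those rel="download" links that carry a hash tag. The class
name DownloadLinkParser and the sample page are hypothetical; this is
one way to do it with the standard library, not how any existing tool
is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class DownloadLinkParser(HTMLParser):
    """Collect href values of <a rel="download"> links that carry a hash tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").split()
        href = attributes.get("href") or ""
        _, fragment = urldefrag(href)
        # Keep the link only if it is a download link with an algo=digest tag.
        if "download" in rel_values and "=" in fragment:
            self.links.append(href)


page = """
<a rel="download" href="http://example.com/dl/#md5=0123">tagged</a>
<a rel="download" href="http://example.com/dl/">untagged</a>
<a rel="homepage" href="http://example.com/#md5=0123">homepage</a>
"""
parser = DownloadLinkParser()
parser.feed(page)
print(parser.links)  # ['http://example.com/dl/#md5=0123']
```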

-- 
Marc-Andre Lemburg
eGenix.com

From donald at stufft.io  Fri Mar  8 22:26:07 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 16:26:07 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
	<CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
Message-ID: <B619B06C-A7A8-4AB3-A6F6-E2825F4728F9@stufft.io>

On Mar 8, 2013, at 4:12 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz <noah at coderanger.net> wrote:
>> MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption.
> 
> So, you're saying that someone has found a second-preimage attack
> against MD5 that's more efficient than the current 2**127 threshold
> established in 2009?
> 
> "Anything security related" is pretty broad.  Out of the many classes
> of attacks on hashes, AFAIK the only class that's relevant to PyPI is
> second preimage attacks,  i.e. one where the attacker has the original
> file and the hash, and must construct a new file that produces the
> same hash value.

"Relevant to PyPI" is pretty broad too, and when you're developing a secure system you need to look past what is OK *today* and design for the next 5, 10, or 20 years. So even if there's no attack that directly allows replacing the target file with a new one, continuing to use MD5 is bad. It has a number of weaknesses that do not instill confidence in its future security, while there are a number of other hashes which _do_.

Unless you'd rather be trying to replace hashes everywhere once it's already completely broken.
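
One hedged illustration of designing for migration rather than for a
single algorithm: if a link could carry several hash tags, a client
might pick the strongest one it knows and refuse anything below a
floor. The ranking, the minimum parameter, and the name pick_hash are
assumptions for illustration, not part of any proposal in this thread.

```python
# Strongest first; this ordering is an assumption made for the sketch.
PREFERENCE = ("sha512", "sha256", "sha1", "md5")


def pick_hash(tags, minimum="sha256"):
    """Return (algorithm, digest) for the strongest acceptable tag, or None.

    tags maps algorithm names to hex digests; anything ranked below
    `minimum` in PREFERENCE is ignored.
    """
    allowed = PREFERENCE[: PREFERENCE.index(minimum) + 1]
    for algorithm in allowed:
        if algorithm in tags:
            return algorithm, tags[algorithm]
    return None


print(pick_hash({"md5": "aa", "sha256": "bb"}))  # ('sha256', 'bb')
print(pick_hash({"md5": "aa"}))                  # None
print(pick_hash({"md5": "aa"}, minimum="md5"))   # ('md5', 'aa')
```

Raising the floor later is then a one-line policy change on the client
instead of a rewrite of every stored hash.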

> 
> Did you have some other type of hash attack in mind?  And in either
> case, do you have a reference for the attack complexity?
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From r1chardj0n3s at gmail.com  Fri Mar  8 22:26:54 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Sat, 9 Mar 2013 08:26:54 +1100
Subject: [Catalog-sig] Search engine relevance
In-Reply-To: <CANSw7Kzj-28F0e7VyC-avE4Z-KBUYQ50-ngm4e_8634UMZR5uQ@mail.gmail.com>
References: <loom.20130308T145827-470@post.gmane.org>
	<CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>
	<CANSw7KycEVeNXStuZ2yhSnc3261V9W=CA=Mi8jeoGVGFUOWQtg@mail.gmail.com>
	<loom.20130308T162304-685@post.gmane.org>
	<CANSw7Kzj-28F0e7VyC-avE4Z-KBUYQ50-ngm4e_8634UMZR5uQ@mail.gmail.com>
Message-ID: <CAHrZfZB8cCoB3De5YPWo60mLnAhYkuPNj9yEPj3R+i-xscaxOA@mail.gmail.com>

That *was* the original search engine :-)

Then after user complaints we devised a better solution...

Always happy to take criticism of it and improve it! :-)

Sent from my portable device, please excuse the brevity.
On Mar 9, 2013 2:29 AM, "Yuval Greenfield" <ubershmekel at gmail.com> wrote:

> On Fri, Mar 8, 2013 at 7:24 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
>> Yes, crate.io is at least missing pyst2 which does mention AGI in its
>>  description:
>> https://crate.io/packages/pyst2/
>>
>>
>>
> I agree. There's only one effective search engine for pypi I know of, e.g.
>
> https://www.google.com/search?q=site%3Apypi.python.org+agi

From mal at egenix.com  Fri Mar  8 22:28:31 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 08 Mar 2013 22:28:31 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
Message-ID: <513A57FF.6000905@egenix.com>

On 08.03.2013 20:16, PJ Eby wrote:
> On Fri, Mar 8, 2013 at 7:50 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> So far the only practical problem I've found with the approach
>> is that the download page may not contain dynamic data, e.g.
>> a date or timestamp, since that causes the hash tag not to
>> verify.
> 
> Which is completely unnecessary if one simply exposes the *actual*
> download links directly on PyPI.  The download page is redundant, in a
> couple different ways.  First, since it can't change, there's no point
> in re-fetching it all the time.  Second, since it's only going to be
> read by tools anyway, there's no point to it containing anything
> besides the link.
> 
> So, since the page only contains links, might as well put the links
> straight on PyPI, or at most have an option/tool to load the links
> from an external source.

I don't follow you. We only have a single download_url field
available to store a download link.

We'd need to modify the meta data format to allow for more than
one such field, which doesn't work if you want to stay backwards
compatible.

BTW: If we go with the CDN caching model for external files, we'd
pull the download page links directly on the /simple/ index
page - as files, not external links.
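
The dynamic-data problem quoted at the top of this message is easy to
reproduce. In this small sketch the page template and the footer text
are made up for illustration; the point is only that any byte that
changes between fetches changes the digest, so the hash tag stops
verifying.

```python
import hashlib

# Hypothetical download page; {footer} stands in for optional dynamic content.
TEMPLATE = (
    '<html><body><a rel="download" href="pkg-1.0.tar.gz#sha256=abc">pkg</a>'
    "{footer}</body></html>"
)


def page_digest(footer):
    """SHA-256 hex digest of the rendered page."""
    return hashlib.sha256(TEMPLATE.format(footer=footer).encode("utf-8")).hexdigest()


static_a = page_digest("")
static_b = page_digest("")
dated_a = page_digest("<p>Generated 2013-03-08</p>")
dated_b = page_digest("<p>Generated 2013-03-09</p>")

print(static_a == static_b)  # True: identical bytes, identical digest
print(dated_a == dated_b)    # False: the date alone changes the digest
```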

-- 
Marc-Andre Lemburg
eGenix.com

From donald at stufft.io  Fri Mar  8 22:32:21 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 16:32:21 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
	<CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
Message-ID: <8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io>


On Mar 8, 2013, at 4:12 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz <noah at coderanger.net> wrote:
>> MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption.
> 
> So, you're saying that someone has found a second-preimage attack
> against MD5 that's more efficient than the current 2**127 threshold
> established in 2009?
> 
> "Anything security related" is pretty broad.  Out of the many classes
> of attacks on hashes, AFAIK the only class that's relevant to PyPI is
> second preimage attacks,  i.e. one where the attacker has the original
> file and the hash, and must construct a new file that produces the
> same hash value.
> 
> Did you have some other type of hash attack in mind?  And in either
> case, do you have a reference for the attack complexity?

Here's some more information pulled straight from Wikipedia:

However, it has since been shown that MD5 is not collision resistant;[3] as such, MD5 is not suitable for applications like SSL certificates or digital signatures that rely on this property. In 1996, a flaw was found with the design of MD5, and while it was not a clearly fatal weakness, cryptographers began recommending the use of other algorithms, such as SHA-1, which has since been found to be vulnerable as well. In 2004, more serious flaws were discovered in MD5, making further use of the algorithm for security purposes questionable; specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum.[4][5] Further advances were made in breaking MD5 in 2005, 2006, and 2007.[6] In December 2008, a group of researchers used this technique to fake SSL certificate validity,[7][8] and CMU Software Engineering Institute now says that MD5 "should be considered cryptographically broken and unsuitable for further use",[9] and most U.S. government applications now require the SHA-2 family of hash functions.[10]

Here are the important highlights:

    - specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum
    - MD5 "should be considered cryptographically broken and unsuitable for further use"
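
The two attack classes being argued over in this thread can be told
apart with a toy experiment. The sketch below truncates SHA-256 to 16
bits so that both searches finish quickly; the truncation, the message
formats, and the function names are assumptions for illustration and
say nothing about the real cost of attacking MD5. A collision search may
pick both inputs freely and is expected to need roughly 2**(BITS/2)
tries, while a second-preimage search must match one fixed digest and is
expected to need roughly 2**BITS tries.

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size, chosen so the searches finish quickly


def toy_digest(data):
    """First BITS bits of SHA-256, standing in for a deliberately weak hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - BITS)


def find_collision():
    """Birthday search: any two distinct inputs with the same toy digest."""
    seen = {}
    for n in count():
        message = b"a-%d" % n
        digest = toy_digest(message)
        if digest in seen:
            return seen[digest], message, n + 1
        seen[digest] = message


def find_second_preimage(target_message):
    """Search for a different input matching one fixed, given toy digest."""
    target = toy_digest(target_message)
    for n in count():
        candidate = b"b-%d" % n
        if toy_digest(candidate) == target:
            return candidate, n + 1


original = b"original release file"
first, second, collision_tries = find_collision()
forged, preimage_tries = find_second_preimage(original)
print("collision after", collision_tries, "tries")
print("second preimage after", preimage_tries, "tries")
```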


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From donald at stufft.io  Fri Mar  8 22:33:44 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 16:33:44 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <513A57FF.6000905@egenix.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A57FF.6000905@egenix.com>
Message-ID: <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>

On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> BTW: If we go with the CDN caching model for external files, we'd
> pull the download page links directly on the /simple/ index
> page - as files, not external links.

We cannot download and rehost (even if we call it a cache) external files without getting permission from their owners to do so.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From noah at coderanger.net  Fri Mar  8 22:35:50 2013
From: noah at coderanger.net (Noah Kantrowitz)
Date: Fri, 8 Mar 2013 13:35:50 -0800
Subject: [Catalog-sig] hash tags
In-Reply-To: <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A57FF.6000905@egenix.com>
	<539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
Message-ID: <C28056D6-F23D-4399-9B20-798EDBC0FC7A@coderanger.net>


On Mar 8, 2013, at 1:33 PM, Donald Stufft wrote:

> On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:
> 
>> BTW: If we go with the CDN caching model for external files, we'd
>> pull the download page links directly on the /simple/ index
>> page - as files, not external links.
> 
> We cannot download and rehost (even if we call it a cache) external files without getting permission from their owners to do so.

At which point, they can just upload them the normal way.

--Noah

From dholth at gmail.com  Fri Mar  8 22:43:59 2013
From: dholth at gmail.com (Daniel Holth)
Date: Fri, 8 Mar 2013 16:43:59 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A57FF.6000905@egenix.com>
	<539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
Message-ID: <CAG8k2+5Q28ZcafOHeMi=3PZzSSDfieDRnK3PottCNpKeC11Xrg@mail.gmail.com>

Check out https://blake2.net/ ; it is both faster and more secure than
md5. md5 does have to go, no matter how secure it is in this
particular application. SHA2 is the only choice that doesn't require a
long explanation. When this came up a little less than a year ago we
talked about maybe including the SHA2 hash in one of the link
attributes <a href= something="hash"> for the benefit of old clients.
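
A sketch of what such a link could look like when generated, assuming
the existing #md5 fragment is kept for old clients and the SHA-2 digest
travels in a separate attribute. The attribute name data-sha256 and the
function render_link are hypothetical; the thread does not settle on a
name.

```python
import hashlib
from html import escape


def render_link(filename, base_url, content):
    """Build an anchor with an #md5 fragment plus a separate SHA-256 attribute."""
    md5_hex = hashlib.md5(content).hexdigest()
    sha256_hex = hashlib.sha256(content).hexdigest()
    href = "%s/%s#md5=%s" % (base_url, filename, md5_hex)
    # "data-sha256" is a hypothetical attribute name, not an agreed convention.
    return '<a href="%s" data-sha256="%s">%s</a>' % (
        escape(href, quote=True),
        sha256_hex,
        escape(filename),
    )


link = render_link("pkg-1.0.tar.gz", "http://example.com/packages", b"example bytes")
print(link)
```

Old clients that only understand the fragment keep working, while newer
ones can read the stronger digest from the attribute.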

From mal at egenix.com  Fri Mar  8 22:45:14 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 08 Mar 2013 22:45:14 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A57FF.6000905@egenix.com>
	<539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
Message-ID: <513A5BEA.1090603@egenix.com>

On 08.03.2013 22:33, Donald Stufft wrote:
> On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:
> 
>> BTW: If we go with the CDN caching model for external files, we'd
>> pull the download page links directly on the /simple/ index
>> page - as files, not external links.
> 
> We cannot download and rehost (even if we call it a cache) external files without getting permission from their owners to do so.

Well, in the CDN version of the /simple/ dir, they would look
like files hosted on the CDN. The download pages would still
be feeding the CDN, though.

-- 
Marc-Andre Lemburg
eGenix.com

From donald at stufft.io  Fri Mar  8 22:47:27 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 16:47:27 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <513A5BEA.1090603@egenix.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A57FF.6000905@egenix.com>
	<539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
	<513A5BEA.1090603@egenix.com>
Message-ID: <9963F1EB-A9DF-4405-B1E4-86ADCB2A1040@stufft.io>

On Mar 8, 2013, at 4:45 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 08.03.2013 22:33, Donald Stufft wrote:
>> On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:
>> 
>>> BTW: If we go with the CDN caching model for external files, we'd
>>> pull the download page links directly on the /simple/ index
>>> page - as files, not external links.
>> 
>> We cannot download and rehost (even if we call it a cache) external files without getting permission from their owners to do so.
> 
> Well, in the CDN version of the /simple/ dir, they would look
> like files hosted on the CDN. The download pages would still
> be feeding the CDN, though.

I'm unsure what you're saying here. If it involves downloading files hosted outside of PyPI and putting them on a PSF-controlled CDN, it's a non-starter.


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From christian at python.org  Fri Mar  8 22:50:55 2013
From: christian at python.org (Christian Heimes)
Date: Fri, 08 Mar 2013 22:50:55 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A57FF.6000905@egenix.com>
	<539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
Message-ID: <513A5D3F.9020202@python.org>

On 08.03.2013 22:33, Donald Stufft wrote:
> On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" <mal at egenix.com>
> wrote:
> 
>> BTW: If we go with the CDN caching model for external files,
>> we'd pull the download page links directly on the /simple/ index
>> page - as files, not external links.
> 
> We cannot download and rehost (even if we call it a cache) external
> files without getting permission from their owners to do so.

(CC to Van as this is a legal matter)

Would it be sufficient to add a checkbox to the administration section
of PyPI packages that says something like "I'm an owner of this package
and I grant PyPI the permission to rehost my stuff"?

Christian

From donald at stufft.io  Fri Mar  8 22:59:15 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 16:59:15 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <513A5D3F.9020202@python.org>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A57FF.6000905@egenix.com>
	<539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
	<513A5D3F.9020202@python.org>
Message-ID: <435A6200-BE8B-4DA5-884F-B193EF5984DF@stufft.io>

On Mar 8, 2013, at 4:50 PM, Christian Heimes <christian at python.org> wrote:

> On 08.03.2013 22:33, Donald Stufft wrote:
> > On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" <mal at egenix.com>
> > wrote:
> > 
> >> BTW: If we go with the CDN caching model for external files,
> >> we'd pull the download page links directly on the /simple/ index 
> >> page - as files, not external links.
> > 
> > We cannot download and rehost (even if we call it a cache) external
> > files without getting permission from their owners to do so.
> 
> (CC to Van as this is a legal matter)
> 
> Would it be sufficient to add a checkbox to the administration section
> of PyPI packages that says something like "I'm an owner of this package
> and I grant PyPI the permission to rehost my stuff"?
> 
> Christian
> 

If we have permission to rehost we might as well just kill the external list and rehost it.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From christian at python.org  Fri Mar  8 23:02:11 2013
From: christian at python.org (Christian Heimes)
Date: Fri, 08 Mar 2013 23:02:11 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <CAG8k2+5Q28ZcafOHeMi=3PZzSSDfieDRnK3PottCNpKeC11Xrg@mail.gmail.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A57FF.6000905@egenix.com>
	<539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
	<CAG8k2+5Q28ZcafOHeMi=3PZzSSDfieDRnK3PottCNpKeC11Xrg@mail.gmail.com>
Message-ID: <513A5FE3.2010604@python.org>

On 08.03.2013 22:43, Daniel Holth wrote:
> Check out https://blake2.net/ ; it is both faster and more secure than
> md5. md5 does have to go, no matter how secure it is in this
> particular application. SHA2 is the only choice that doesn't require a
> long explanation. When this came up a little less than a year ago we
> talked about maybe including the SHA2 hash in one of the link
> attributes <a href= something="hash"> for the benefit of old clients.

Let's not add yet another crypto hash algorithm. :)

We have SHA-1 and SHA-2; that ought to be enough. SHA-3 is available
for Python 3.4 and I provide stand-alone sources and binaries for 2.6 to
3.3. Blake2 looks nice but we should stick to NIST-approved algorithms.

The combination of file size, MD5 (for legacy reasons), SHA-1 and
perhaps SHA-256 is more than sufficient. Don't forget that files have to
be valid tar.gz, tar.bz2, zip or Windows binaries, too ...
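
A minimal sketch of how a client could check that combination in one
pass over a downloaded file, using only hashlib from the stdlib. The
helper name file_fingerprint and the chunk size are assumptions made
for illustration; this is not code from PyPI or any existing tool.

```python
import hashlib
import io


def file_fingerprint(stream, algorithms=("md5", "sha1", "sha256"),
                     chunk_size=64 * 1024):
    """Return (size_in_bytes, {algorithm: hexdigest}) for a binary stream.

    Hypothetical helper: reads the stream once and feeds every chunk to
    each hash object, so large archives are never held in memory.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    size = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        size += len(chunk)
        for hasher in hashers.values():
            hasher.update(chunk)
    digests = {name: hasher.hexdigest() for name, hasher in hashers.items()}
    return size, digests


if __name__ == "__main__":
    size, digests = file_fingerprint(io.BytesIO(b"example sdist contents"))
    print(size, digests["sha256"])
```

An attacker would then have to match the size and every digest at once
while still producing a valid archive.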

Christian

From donald at stufft.io  Fri Mar  8 23:03:30 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 17:03:30 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <513A5FE3.2010604@python.org>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A57FF.6000905@egenix.com>
	<539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
	<CAG8k2+5Q28ZcafOHeMi=3PZzSSDfieDRnK3PottCNpKeC11Xrg@mail.gmail.com>
	<513A5FE3.2010604@python.org>
Message-ID: <D9B854F0-1A21-47AA-842A-19CE071FBCAA@stufft.io>

On Mar 8, 2013, at 5:02 PM, Christian Heimes <christian at python.org> wrote:

> On 08.03.2013 22:43, Daniel Holth wrote:
>> Check out https://blake2.net/ ; it is both faster and more secure than
>> md5. md5 does have to go, no matter how secure it is in this
>> particular application. SHA2 is the only choice that doesn't require a
>> long explanation. When this came up a little less than a year ago we
>> talked about maybe including the SHA2 hash in one of the link
>> attributes <a href= something="hash"> for the benefit of old clients.
> 
> Let's not add yet another crypto hash algorithm. :)
> 
> We have SHA-1 and SHA-2; that ought to be enough. SHA-3 is available
> for Python 3.4 and I provide stand-alone sources and binaries for 2.6 to
> 3.3. Blake2 looks nice but we should stick to NIST-approved algorithms.
> 
> The combination of file size, MD5 (for legacy reasons), SHA-1 and
> perhaps SHA-256 is more than sufficient. Don't forget that files have to
> be valid tar.gz, tar.bz2, zip or Windows binaries, too ...

SHA-1 is broken. SHA-2 or better is the only really acceptable option in the stdlib.

> 
> Christian


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From mal at egenix.com  Fri Mar  8 23:04:24 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 08 Mar 2013 23:04:24 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <9963F1EB-A9DF-4405-B1E4-86ADCB2A1040@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A57FF.6000905@egenix.com>
	<539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
	<513A5BEA.1090603@egenix.com>
	<9963F1EB-A9DF-4405-B1E4-86ADCB2A1040@stufft.io>
Message-ID: <513A6068.6090207@egenix.com>

On 08.03.2013 22:47, Donald Stufft wrote:
> On Mar 8, 2013, at 4:45 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:
> 
>> On 08.03.2013 22:33, Donald Stufft wrote:
>>> On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:
>>>
>>>> BTW: If we go with the CDN caching model for external files, we'd
>>>> pull the download page links directly on the /simple/ index
>>>> page - as files, not external links.
>>>
>>> We cannot download and rehost (even if we call it a cache) external files without getting permission from their owners to do so.
>>
>> Well, in the CDN version of the /simple/ dir, they would look
>> like files hosted on the CDN. The download pages would still
>> be feeding the CDN, though.
> 
> I'm unsure what you're saying here. If it involves downloading files hosted outside of PyPI and putting it on a PSF controlled CDN it's a non starter.

My idea was to have PyPI send a redirect to the external URL
when getting a request for the file, so we could avoid hosting
the files and instead just have the CDN cache them for a certain
time period.

However, I've now read up on the CloudFront docs, which point
out that the CDN won't follow the redirect, but simply forward
it to the user, bypassing the CDN:

http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#ResponseCustomRedirects

I suspect other CDNs work the same way, so the redirect
idea doesn't work.

We'd have to use a proxy solution on the PyPI server to make the
caching CDN work, but that will likely cause more legal problems
than the plain caching of content on the way to the user.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 07 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From pje at telecommunity.com  Fri Mar  8 23:06:28 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 8 Mar 2013 17:06:28 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <513A5575.5000200@egenix.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A5575.5000200@egenix.com>
Message-ID: <CALeMXf7bkD4ajyOLY2uYqmjykaXcDhoWeTg9A495dCvRwZnDhA@mail.gmail.com>

On Fri, Mar 8, 2013 at 4:17 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 08.03.2013 20:16, PJ Eby wrote:
>> There is, as I said before, a MUCH simpler way to do this, that works
>> right now: put direct #md5 download links in your description, and
>> phase out the rel="" attributes altogether.
>
> No, that would be a pretty poor design :-)
>
> The rel="" attributes are good design, since they were meant for
> exactly this purpose (machine reading and understanding relations
> between origin and target).

That depends on the goal of your design.  If the goal is to phase out
offsite spidering by downloader tools in a reasonably easy and
low-cost way, introducing new API is not a good way to do it.

The simple way to do it is to replace download-time end-user
unsupervised spidering with upload-time or registration-time
author-supervised spidering, which requires only that the tools exist
and people be informed of them (and encouraged to use them).

From pje at telecommunity.com  Fri Mar  8 23:08:52 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 8 Mar 2013 17:08:52 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <B619B06C-A7A8-4AB3-A6F6-E2825F4728F9@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
	<CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
	<B619B06C-A7A8-4AB3-A6F6-E2825F4728F9@stufft.io>
Message-ID: <CALeMXf7bv+wxLobn+MiNiWQwd5qXHEFptDXfVuWEH9r1wLuG2g@mail.gmail.com>

On Fri, Mar 8, 2013 at 4:26 PM, Donald Stufft <donald at stufft.io> wrote:
> On Mar 8, 2013, at 4:12 PM, PJ Eby <pje at telecommunity.com> wrote:
>
>> On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz <noah at coderanger.net> wrote:
>>> MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption.
>>
>> So, you're saying that someone has found a second-preimage attack
>> against MD5 that's more efficient than the current 2**127 threshold
>> established in 2009?
>>
>> "Anything security related" is pretty broad.  Out of the many classes
>> of attacks on hashes, AFAIK the only class that's relevant to PyPI is
>> second preimage attacks,  i.e. one where the attacker has the original
>> file and the hash, and must construct a new file that produces the
>> same hash value.
>
> Relevant to PyPI is pretty broad, and when you're developing a secure system you need to look past what is ok *today* and design for the next 5, 10, or 20 years. So even if there's no attack that can directly allow replacing the target file with a new one, continuing to utilize it is bad. It has a number of weaknesses which do not instill confidence in its future security; meanwhile there are a number of other hashes which _do_.
>
> Unless you'd rather be trying to replace hashes everywhere once it's already completely broken.

We can replace it completely in a lot less than that many years, if
the new PEP-based tools can be brought to pass.  Using new protocols
(e.g. the embedded signatures in wheel files) will make most of this
moot.

What I'm against is trying to patch over the existing protocol when
what we really want is to replace it altogether.  Adding hashes and
filesizes and whatnot is just gilding the existing lily, or more like
gilding the pond scum, actually.  ;-)

From christian at python.org  Fri Mar  8 23:09:58 2013
From: christian at python.org (Christian Heimes)
Date: Fri, 08 Mar 2013 23:09:58 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <D9B854F0-1A21-47AA-842A-19CE071FBCAA@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A57FF.6000905@egenix.com>
	<539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io>
	<CAG8k2+5Q28ZcafOHeMi=3PZzSSDfieDRnK3PottCNpKeC11Xrg@mail.gmail.com>
	<513A5FE3.2010604@python.org>
	<D9B854F0-1A21-47AA-842A-19CE071FBCAA@stufft.io>
Message-ID: <513A61B6.8020003@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 08.03.2013 23:03, Donald Stufft wrote:
> Sha-1 is broken. Sha-2 or better is the only real acceptable one
> in the stdlib.

Well, then SHA-384 it is.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQIcBAEBCAAGBQJROmGyAAoJEMeIxMHUVQ1FmiMQAIqRskGY53GFclfE1TDUGkBk
KsmatFXfenMYkvJ2w1m5GGqj0AKeeLEBHub/+efgynzd1TVzx0CZUwGJt+XTzB7Y
jUeqbGOxlqPcOujvI880Yh4npYzxJvmLbhiUSx3/6PEOje4TIhlRW8iiLjHKSNt7
Ky0jHA5c3I/I0WaOG+KlgvYGr7McOVoSfRyqKO8IjiLqxeRi757OzLOHCtbyuuEj
N/zWt8dzoXn56D1WNaeV50qvJBejfu+OtCSfvohL2uCmEWFTNulgGy9W4um7U/L2
RHClqchO1aSKUTDzwEKNiDFQK7FAkk1YlehCZva5Er43dTQJFINYm+WDKLEFamWO
7KoNah9ToyIoKENTJJr/Oe/3wsBVh82bcl4pKlP7heOtRQx3bDn1z4ktWWYDSEcr
3MgJOKeu+NyebnOr3DfwQPeNfxPa1qpfX3+UvmMgstvWFEOxJ828SBTZDIIr8LGq
Fb/9IrCVxbXUo5F8qS8klAXbnPGrTGyktYkwi9wMHEoMOrrrNKPqiqpSt0/cpTJV
Kj16JpgT+zvHyJx3hgtu+iynvRSnQ7G4SzI29t1eLhLhNG2RbSNYDafk3yjs8UGA
tUDS+PIxRKEgDMH5stdlAKJJWSDYfpMqf+06TC8FoUKhZQHPwbajsp/anRihidFm
TJ8hCbAGsh3iaR0k8dA9
=W+vN
-----END PGP SIGNATURE-----

From pje at telecommunity.com  Fri Mar  8 23:11:34 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 8 Mar 2013 17:11:34 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <513A57FF.6000905@egenix.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
	<513A57FF.6000905@egenix.com>
Message-ID: <CALeMXf4iLA70EhYu+7F4mTTrbSTWfA=onmNiUAz8TpFG4Br+wQ@mail.gmail.com>

On Fri, Mar 8, 2013 at 4:28 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 08.03.2013 20:16, PJ Eby wrote:
>> So, since the page only contains links, might as well put the links
>> straight on PyPI, or at most have an option/tool to load the links
>> from an external source.
>
> I don't follow you. We only have a single download_url field
> available to store a download link.
>
> We'd need to modify the meta data format to allow for more than
> one such field, which doesn't work if you want to stay backwards
> compatible.

Links included in the long description field are placed on the /simple
index of links.  So you can just edit your standard metadata right
this minute if you want to offer more download links.  And you can put
#md5 tags on them if you want the tools to check that.
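
A minimal sketch of the check a download tool performs on such a link,
assuming the digest is carried as a "#md5=<hexdigest>" URL fragment.
The helper name check_md5_link and the example.com URL are made up for
illustration; this is not the code of any particular tool.

```python
import hashlib
from urllib.parse import urldefrag


def check_md5_link(url, payload):
    """Compare payload (bytes) with the digest in the URL's #md5= fragment.

    Returns True or False when a digest is present, or None when the
    link carries no md5 fragment. Hypothetical helper, not an existing API.
    """
    _, fragment = urldefrag(url)
    if not fragment.startswith("md5="):
        return None
    expected = fragment[len("md5="):].lower()
    return hashlib.md5(payload).hexdigest() == expected


if __name__ == "__main__":
    data = b"abc"
    link = ("https://example.com/dist/demo-1.0.tar.gz#md5="
            + hashlib.md5(data).hexdigest())
    print(check_md5_link(link, data))
```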

From donald at stufft.io  Fri Mar  8 23:12:46 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 17:12:46 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <CALeMXf7bv+wxLobn+MiNiWQwd5qXHEFptDXfVuWEH9r1wLuG2g@mail.gmail.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
	<CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
	<B619B06C-A7A8-4AB3-A6F6-E2825F4728F9@stufft.io>
	<CALeMXf7bv+wxLobn+MiNiWQwd5qXHEFptDXfVuWEH9r1wLuG2g@mail.gmail.com>
Message-ID: <DE2F40CC-F141-43C4-8D77-C2225749D81E@stufft.io>

On Mar 8, 2013, at 5:08 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Fri, Mar 8, 2013 at 4:26 PM, Donald Stufft <donald at stufft.io> wrote:
>> On Mar 8, 2013, at 4:12 PM, PJ Eby <pje at telecommunity.com> wrote:
>> 
>>> On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz <noah at coderanger.net> wrote:
>>>> MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption.
>>> 
>>> So, you're saying that someone has found a second-preimage attack
>>> against MD5 that's more efficient than the current 2**127 threshold
>>> established in 2009?
>>> 
>>> "Anything security related" is pretty broad.  Out of the many classes
>>> of attacks on hashes, AFAIK the only class that's relevant to PyPI is
>>> second preimage attacks,  i.e. one where the attacker has the original
>>> file and the hash, and must construct a new file that produces the
>>> same hash value.
>> 
>> Relevant to PyPI is pretty broad, and when you're developing a secure system you need to look past what is ok *today* and design for the next 5, 10, or 20 years. So even if there's no attack that can directly allow replacing the target file with a new one, continuing to utilize it is bad. It has a number of weaknesses which do not instill confidence in its future security; meanwhile there are a number of other hashes which _do_.
>> 
>> Unless you'd rather be trying to replace hashes everywhere once it's already completely broken.
> 
> We can replace it completely in a lot less than that many years, if
> the new PEP-based tools can be brought to pass.  Using new protocols
> (e.g. the embedded signatures in wheel files) will make most of this
> moot.
> 
> What I'm against is trying to patch over the existing protocol when
> what we really want is to replace it altogether.  Adding hashes and
> filesizes and whatnot is just gilding the existing lily, or more like
> gilding the pond scum, actually.  ;-)

Unless we are planning on removing the existing tooling this still matters even with the new system in place.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From pje at telecommunity.com  Fri Mar  8 23:50:37 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 8 Mar 2013 17:50:37 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
	<CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
	<8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io>
Message-ID: <CALeMXf5-Fh1okVJ6udWXju26B7aNRuL+wuvTb-58qcaxtHf2KQ@mail.gmail.com>

On Fri, Mar 8, 2013 at 4:32 PM, Donald Stufft <donald at stufft.io> wrote:
> Here's some more information pulled straight from Wikipedia:

Trust me, I've read a LOT of Wikipedia (and even more from other
sites, including at least the conclusions of a number of cryptography
papers) about hashing attacks recently, because I was seeing
inconsistencies in what people are saying about hashes and their
weaknesses and so forth.  99.9% of the discussion about attacks on
hashes has to do with collision attacks, prefix attacks, and length
extension attacks, all of which are extremely relevant for
*cryptographic* purposes.  Specifically, the use of hashes to verify
identity, authority, repudiability, etc...  which emphatically do
*not* apply to the use of an MD5 as a checksum to verify a correct
download.

All of these attacks depend on *something else* being at stake besides
the integrity of the original message.  For example length-extension
attacks bypass the need to know a "secret" used in a naive hash-based
signature scheme (which is why you're supposed to use HMAC for such
things), while collision attacks let you trick a signer into signing
something that you can later replace with something altered.
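
To make the HMAC point concrete, here is a small sketch contrasting the
naive construction with the stdlib hmac module. The function names are
illustrative assumptions, and SHA-256 is chosen only as an example
digest.

```python
import hashlib
import hmac


def naive_tag(secret, message):
    # Naive scheme: hash(secret || message). With Merkle-Damgard hashes
    # such as MD5, SHA-1 and SHA-256 this is open to length extension:
    # a valid tag for a longer message can be derived without the secret.
    return hashlib.sha256(secret + message).hexdigest()


def hmac_tag(secret, message):
    # HMAC mixes the key into an inner and an outer hash, which is what
    # defeats length extension.
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, message, tag):
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(hmac_tag(secret, message), tag)


if __name__ == "__main__":
    key = b"shared secret"
    tag = hmac_tag(key, b"release-1.0.tar.gz")
    print(verify(key, b"release-1.0.tar.gz", tag))
    print(verify(key, b"release-1.0.tar.gz plus extra", tag))
```

Both schemes need a shared secret in the first place, which a public
download checksum does not have.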

The current use of #md5 tags isn't subject to either of these kinds of
attack, because:

1. There is no "secret" to be revealed, and
2. The author and signer are the same person

So the only type of attack I've found out about thus far, in my
(admittedly few) hours of study on the subject, that is relevant to
the way we use MD5 on PyPI at present is the so-called "second
pre-image" attack, which is when you're given an existing message and
a hash, and have to create a new message with the same hash...  while
also incorporating something useful in the new message.

The most recent report I saw on second pre-image attacks against full
MD5 estimated a 2**127 strength, meaning that even if you could
process a great many billion tries per second, it would take you
thousands of years to come up with a file that could masquerade as an
existing download.  (And most people's computers and/or internet
connections would choke on the massive file sizes needed for the
still-theoretical Kelsey-Schneier generalized preimage attack, which
in any case would apply equally to just about any other hash we could
currently put out in the field. i.e., it's not specific to a
particular hash algorithm, it just relies on certain properties of the
algorithm.)

So, yeah, MD5 is *cryptographically* broken, sure.  But it's not
broken for *data integrity*.  And in the PyPI use case, the
"cryptographic" part is all in the SSL being used to fetch the MD5
link in the first place.


> Here's the important highlights:
>
>     - specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum

Right, that's what's called a "collision attack".  It means that you
can go out *ahead of time*, and make two files with the same checksum,
one good, one evil.  It does *not* mean you get to take an existing
file, and then make a second file with the same checksum.  (The latter
is a "second preimage" attack, which is *not* broken.)
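
The difference in effort can be seen on a toy hash. The sketch below
truncates MD5 to 16 bits (an assumption made purely so the searches
finish quickly; nothing here attacks real MD5) and counts the tries
each search needs. All names are made up for this illustration.

```python
import hashlib
from itertools import count


def tiny_hash(data, bits=16):
    """MD5 truncated to `bits` bits: deliberately weak, for demonstration."""
    full = int.from_bytes(hashlib.md5(data).digest(), "big")
    return full >> (128 - bits)


def find_collision(bits=16):
    """Attacker picks BOTH messages: returns (msg_a, msg_b, tries)."""
    seen = {}
    for tries in count(1):
        msg = b"msg-%d" % tries
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, tries
        seen[h] = msg


def find_second_preimage(target, bits=16):
    """Attacker must match a GIVEN message: returns (msg, tries)."""
    wanted = tiny_hash(target, bits)
    for tries in count(1):
        msg = b"evil-%d" % tries
        if msg != target and tiny_hash(msg, bits) == wanted:
            return msg, tries


if __name__ == "__main__":
    a, b, collision_tries = find_collision()
    forged, preimage_tries = find_second_preimage(b"release-1.0.tar.gz")
    print("collision after", collision_tries, "tries")
    print("second preimage after", preimage_tries, "tries")
```

For an n-bit hash with no structural weakness, a collision search
typically needs around 2**(n/2) tries and a second-preimage search
around 2**n.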

Hash collision attacks in PyPI would basically require an author to
upload a special version of their package that looked innocent, and
then they could later switch that version out with one that's harmful.
And the *way* that this works is that you specially generate *both*
files, in advance.  Which means that the author themselves is
compromised, so the threat is moot.  The author can already upload
compromised code (either through being evil or having their PC
hijacked), and what #md5 it has is 100% irrelevant.

That is, there's nothing stopping an evil author or an author with a
compromised PC from simply uploading a new file with a new MD5,
because PyPI will pass it along in exactly the same way.  Changing
hash algorithms will not affect this threat vector in the slightest.

Given these facts, it makes no sense to fuss over the hash algorithm
in current use, since a concurrent goal here is to switch to file
formats that can be directly signed using, you know, *actual*
cryptography.  ;-)

The new .wheel format makes provisions for modern signature
techniques.  It'd be good if sdists also did.  Then the #md5 tag can
die a natural death, hopefully replaced within the year by a hash tag
that, say, fingerprints the author's public key as registered with
PyPI, or something of that sort.  In the meantime, there's no actual
threat here, so bikeshedding what to replace it with *while keeping
the current system* is like rearranging office furniture in a building
that's about to have demolition charges set underneath it.  ;-)

From donald at stufft.io  Sat Mar  9 00:15:13 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 8 Mar 2013 18:15:13 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <CALeMXf5-Fh1okVJ6udWXju26B7aNRuL+wuvTb-58qcaxtHf2KQ@mail.gmail.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
	<CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
	<8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io>
	<CALeMXf5-Fh1okVJ6udWXju26B7aNRuL+wuvTb-58qcaxtHf2KQ@mail.gmail.com>
Message-ID: <9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io>


On Mar 8, 2013, at 5:50 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Fri, Mar 8, 2013 at 4:32 PM, Donald Stufft <donald at stufft.io> wrote:
>> Here's some more information pulled straight from Wikipedia:
> 
> Trust me, I've read a LOT of Wikipedia (and even more from other
> sites, including at least the conclusions of a number of cryptography
> papers) about hashing attacks recently, because I was seeing
> inconsistencies in what people are saying about hashes and their
> weaknesses and so forth.  99.9% of the discussion about attacks on
> hashes have to do with collision attacks, prefix attacks, and length
> extension attacks, all of which are extremely relevant for
> *cryptographic* purposes.  Specifically, the use of hashes to verify
> identity, authority, repudiability, etc...  which emphatically do
> *not* apply to the use of an MD5 as a checksum to verify a correct
> download.
> 
> All of these attacks depend on *something else* being at stake besides
> the integrity of the original message.  For example length-extension
> attacks bypass the need to know a "secret" used in a naive hash-based
> signature scheme (which is why you're supposed to use HMAC for such
> things), while collision attacks let you trick a signer into signing
> something that you can later replace with something altered.
> 
> The current use of #md5 tags isn't subject to either of these kinds of
> attack, because:
> 
> 1. There is no "secret" to be revealed, and
> 2. The author and signer are the same person
> 
> So the only type of attack I've found out about thus far, in my
> (admittedly few) hours of study on the subject, that is relevant to
> the way we use MD5 on PyPI at present is the so-called "second
> pre-image" attack, which is when you're given an existing message and
> a hash, and have to create a new message with the same hash...  while
> also incorporating something useful in the new message.
> 
> The most recent report I saw on second pre-image attacks against full
> MD5 estimated a 2**127 strength, meaning that even if you could
> process a great many billion tries per second, it would take you
> thousands of years to come up with a file that could masquerade as an
> existing download.  (And most people's computers and/or internet
> connections would choke on the massive file sizes needed for the
> still-theoretical Kelsey-Schneier generalized preimage attack, which
> in any case would apply equally to just about any other hash we could
> currently put out in the field. i.e., it's not specific to a
> particular hash algorithm, it just relies on certain properties of the
> algorithm.)
> 
> So, yeah, MD5 is *cryptographically* broken, sure.  But it's not
> broken for *data integrity*.  And in the PyPI use case, the
> "cryptographic" part is all in the SSL being used to fetch the MD5
> link in the first place.
> 
> 
>> Here's the important highlights:
>> 
>>    - specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum
> 
> Right, that's what's called a "collision attack".  It means that you
> can go out *ahead of time*, and make two files with the same checksum,
> one good, one evil.  It does *not* mean you get to take an existing
> file, and then make a second file with the same checksum.  (The latter
> is a "second preimage" attack, which is *not* broken.)
> 
> Hash collision attacks in PyPI would basically require an author to
> upload a special version of their package that looked innocent, and
> then they could later switch that version out with one that's harmful.
> And the *way* that this works is that you specially generate *both*
> files, in advance.  Which means that the author themselves is
> compromised, so the threat is moot.  The author can already upload
> compromised code (either through being evil or having their PC
> hijacked), and what #md5 it has is 100% irrelevant.
> 
> That is, there's nothing stopping an evil author or an author with a
> compromised PC from simply uploading a new file with a new MD5,
> because PyPI will pass it along in exactly the same way.  Changing
> hash algorithms will not affect this threat vector in the slightest.
> 
> Given these facts, it makes no sense to fuss over the hash algorithm
> in current use, since a concurrent goal here is to switch to file
> formats that can be directly signed using, you know, *actual*
> cryptography.  ;-)
> 
> The new .wheel format makes provisions for modern signature
> techniques.  It'd be good if sdists also did.  Then the #md5 tag can
> die a natural death, hopefully replaced within the year by a hash tag
> that, say, fingerprints the author's public key as registered with
> PyPI, or something of that sort.  In the meantime, there's no actual
> threat here, so bikeshedding what to replace it with *while keeping
> the current system* is like rearranging office furniture in a building
> that's about to have demolition charges set underneath it.  ;-)


http://i.imgur.com/wq6GH17.gif

There's an old saying inside the NSA: "Attacks always get better; they never get worse." [1]

Even if you accept the premise that MD5 is still theoretically OK for this one tiny segment, MD5 isn't going to get any better. The simple API is not going anywhere. Waving your hands and saying the new stuff will obsolete all of this is great, but it won't. This stuff is going to be around for a long time, and we need to look towards the future, not stick our heads in the sand and point towards a toolchain that may or may not happen in the near future.

[1] Stolen from Bruce Schneier

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From rasky at develer.com  Sat Mar  9 02:06:48 2013
From: rasky at develer.com (Giovanni Bajo)
Date: Sat, 9 Mar 2013 02:06:48 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
	<CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
	<8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io>
	<CALeMXf5-Fh1okVJ6udWXju26B7aNRuL+wuvTb-58qcaxtHf2KQ@mail.gmail.com>
	<9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io>
Message-ID: <DFA593E5-EB37-477C-A473-38B74A8E4C31@develer.com>

On 09 Mar 2013, at 00:15, Donald Stufft <donald at stufft.io> wrote:

> 
> On Mar 8, 2013, at 5:50 PM, PJ Eby <pje at telecommunity.com> wrote:
> 
>> On Fri, Mar 8, 2013 at 4:32 PM, Donald Stufft <donald at stufft.io> wrote:
>>> Here's some more information pulled straight from Wikipedia:
>> 
>> Trust me, I've read a LOT of Wikipedia (and even more from other
>> sites, including at least the conclusions of a number of cryptography
>> papers) about hashing attacks recently, because I was seeing
>> inconsistencies in what people are saying about hashes and their
>> weaknesses and so forth.  99.9% of the discussion about attacks on
>> hashes have to do with collision attacks, prefix attacks, and length
>> extension attacks, all of which are extremely relevant for
>> *cryptographic* purposes.  Specifically, the use of hashes to verify
>> identity, authority, repudiability, etc...  which emphatically do
>> *not* apply to the use of an MD5 as a checksum to verify a correct
>> download.
>> 
>> All of these attacks depend on *something else* being at stake besides
>> the integrity of the original message.  For example length-extension
>> attacks bypass the need to know a "secret" used in a naive hash-based
>> signature scheme (which is why you're supposed to use HMAC for such
>> things), while collision attacks let you trick a signer into signing
>> something that you can later replace with something altered.
>> 
>> The current use of #md5 tags isn't subject to either of these kinds of
>> attack, because:
>> 
>> 1. There is no "secret" to be revealed, and
>> 2. The author and signer are the same person
>> 
>> So the only type of attack I've found out about thus far, in my
>> (admittedly few) hours of study on the subject, that is relevant to
>> the way we use MD5 on PyPI at present is the so-called "second
>> pre-image" attack, which is when you're given an existing message and
>> a hash, and have to create a new message with the same hash...  while
>> also incorporating something useful in the new message.
>> 
>> The most recent report I saw on second pre-image attacks against full
>> MD5 estimated a 2**127 strength, meaning that even if you could
>> process a great many billion tries per second, it would take you
>> thousands of years to come up with a file that could masquerade as an
>> existing download.  (And most people's computers and/or internet
>> connections would choke on the massive file sizes needed for the
>> still-theoretical Kelsey-Schneier generalized preimage attack, which
>> in any case would apply equally to just about any other hash we could
>> currently put out in the field. i.e., it's not specific to a
>> particular hash algorithm, it just relies on certain properties of the
>> algorithm.)
>> 
>> So, yeah, MD5 is *cryptographically* broken, sure.  But it's not
>> broken for *data integrity*.  And in the PyPI use case, the
>> "cryptographic" part is all in the SSL being used to fetch the MD5
>> link in the first place.
>> 
>> 
>>> Here's the important highlights:
>>> 
>>>   - specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum
>> 
>> Right, that's what's called a "collision attack".  It means that you
>> can go out *ahead of time*, and make two files with the same checksum,
>> one good, one evil.  It does *not* mean you get to take an existing
>> file, and then make a second file with the same checksum.  (The latter
>> is a "second preimage" attack, which is *not* broken.)
>> 
>> Hash collision attacks in PyPI would basically require an author to
>> upload a special version of their package that looked innocent, and
>> then they could later switch that version out with one that's harmful.
>> And the *way* that this works is that you specially generate *both*
>> files, in advance.  Which means that the author themselves is
>> compromised, so the threat is moot.  The author can already upload
>> compromised code (either through being evil or having their PC
>> hijacked), and what #md5 it has is 100% irrelevant.
>> 
>> That is, there's nothing stopping an evil author or an author with a
>> compromised PC from simply uploading a new file with a new MD5,
>> because PyPI will pass it along in exactly the same way.  Changing
>> hash algorithms will not affect this threat vector in the slightest.
>> 
>> Given these facts, it makes no sense to fuss over the hash algorithm
>> in current use, since a concurrent goal here is to switch to file
>> formats that can be directly signed using, you know, *actual*
>> cryptography.  ;-)
>> 
>> The new .wheel format makes provisions for modern signature
>> techniques.  It'd be good if sdists also did.  Then the #md5 tag can
>> die a natural death, hopefully within the year replaced by a hashtag
>> that say, fingerprints the author's public key as registered with
>> PyPI, or something of that sort.  In the meantime, there's no actual
>> threat here, so bikeshedding what to replace it with *while keeping
>> the current system* is like rearranging office furniture in a building
>> that's about to have demolition charges set underneath it.  ;-)
> 
> 
> http://i.imgur.com/wq6GH17.gif
> 
> There's an old saying inside the NSA: "Attacks always get better; they never get worse." [1]
> 
> Even if you accept the premise that MD5 is still theoretically OK for this one tiny little segment, MD5 isn't going to get any better. The simple API is not going anywhere. Waving your hands and saying the new stuff will obsolete all of this is great, but it won't. This stuff is going to be around for a long time, and we need to look towards the future, not stick our heads in the sand and point towards a toolchain that may or may not happen in the near future.


Exactly.

Pj, the point is that even if MD5 is not currently broken for 1st/2nd pre-image attacks, there is absolutely no confidence that the security margin of such a broken algorithm is enough to keep it safe in that case for even a short time. That is to say, it's not unrealistic that a 1st/2nd pre-image attack gets published tomorrow. You should expect it *any day* at this point.

It's a good practice to avoid crypto algorithms whose foundations are known to be broken. This is one of those cases. If we ever touch code that uses MD5, we should drop it immediately. There is no reason to keep it and wait for someone to release an attack, so that the world can point fingers at us and laugh.
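
For scale, here is a back-of-the-envelope sketch of why the exponent is what matters. It takes the 2**127 estimate quoted above and an assumed rate of 10**12 tries per second (an illustrative number, not a figure from this thread), then repeats the sum for a purely hypothetical improved attack at 2**64. No such attack is known; the second number only shows how quickly the margin would vanish if the work factor dropped.

```python
# Rough time estimate for a brute-force second pre-image search.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
TRIES_PER_SECOND = 10 ** 12        # assumption: hypothetical attacker speed


def years_to_exhaust(work_factor, rate=TRIES_PER_SECOND):
    """Return the years needed to perform `work_factor` tries at `rate` tries/s."""
    return work_factor / float(rate * SECONDS_PER_YEAR)


quoted_estimate = years_to_exhaust(2 ** 127)   # figure quoted in this thread
hypothetical = years_to_exhaust(2 ** 64)       # NOT a known attack, illustration only

print("2**127 tries: %.2e years" % quoted_estimate)
print("2**64 tries:  %.2f years" % hypothetical)
```

The rate barely matters here: a thousand times more hardware changes the first figure by three orders of magnitude out of eighteen, whereas a better attack changes the exponent itself.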
-- 
Giovanni Bajo   ::  rasky at develer.com
Develer S.r.l.  ::  http://www.develer.com

My Blog: http://giovanni.bajo.it





From holger at merlinux.eu  Sat Mar  9 07:51:03 2013
From: holger at merlinux.eu (holger krekel)
Date: Sat, 9 Mar 2013 06:51:03 +0000
Subject: [Catalog-sig] hash tags
In-Reply-To: <CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<CALeMXf61HLFe3J=29RL5S=E_AuK5fdmMh932fXCzAftTEC7YmQ@mail.gmail.com>
Message-ID: <20130309065103.GW9677@merlinux.eu>

Hi Philip, all,

On Fri, Mar 08, 2013 at 14:16 -0500, PJ Eby wrote:
> The key to making this transition isn't creating elaborate new
> standards for the tools, it's *creating new tools for the standards*.

If we can find a way to improve PyPI and not require the world to
change first, that's a big plus in my book as well.

> Point is, this entire thing can be done correctly at the PyPI end and
> work with the existing API of the download tools.

I think so as well.  Will suggest a transition model in a 
new top-level thread, trying to follow this idea.

best,
holger

From holger at merlinux.eu  Sat Mar  9 08:22:22 2013
From: holger at merlinux.eu (holger krekel)
Date: Sat, 9 Mar 2013 07:22:22 +0000
Subject: [Catalog-sig] transition to pypi-hosting through server-side changes
Message-ID: <20130309072222.GX9677@merlinux.eu>

Hi all,

i think Philip Eby brought up a very worthwhile idea to consider: 
if we can transition to a no-external-hosting situation by making
pypi-server changes without requiring client-side installers or
releases processes to change, that would be great.  We would
have one place to implement things, and less friction on the probably
millions of places where pip/easy_install and CI/release processes
are used today.

Basically all revolves around the issue of what links are
served on the simple/* pages.

What about adding a "hosting mode" field to a package which affects
all historic and future releases, i.e. the mode is not specific to a 
particular release but to all releases.  This field could have these
values and meanings:

- "pypi-only": homepage/download links are not added to simple/ pages
  unless they are #egg ones.  Release registration with a non-empty and
  non-#egg download url is rejected.  client-side tools will not need to
  crawl or download anything externally unless requiring an #egg
  development tarball. 

- "pypi-cache": homepage/download pages are crawled at the pypi server side
  exactly once at release registration time.  Or once at "transition" time
  when an author chooses to have his externally hosted release files be
  served from pypi.
  
- "pypi-linkext": homepage/download urls are crawled at the pypi server
  side for release files, and the simple/ page serves links to them without
  requiring client-side tools to crawl external sites for determining
  the set of candidate release files.  Legally, this should not pose
  a problem because the files are still hosted externally so we could
  at some point automatically switch projects to this mode.
  
- "pypi-ext": like it is today: homepage/download urls are presented in 
  simple/ pages and client-side tools need to crawl them themselves to 
  find release file links.

Now it is a matter of choosing good defaults and designing friendly
user interactions to allow package maintainers to move to at least pypi-cache 
or best "pypi-only" mode.  My current thoughts on this:

- 90% of the projects could directly get the "pypi-only" mode as a default
  according to Donald's statistics.  They'd still receive a mail 
  with a link to a page where they can change the mode, if needed.
  And of course the friendly information that "pypi-only" provides
  the fastest and most reliable way for users to install their package.

- 10% of the projects having external release files: 
  - if they have their newest releases on pypi already, they could get
    a "linkext" mode so that client-side tools will not need to crawl
    and not need to download from external sites, if they only 
    look for the newest release
  - if they do not have their newest release on pypi, they could get "ext" mode
    as default

  in either case, maintainers/authors get a mail with a link to the page
  where they can change the mode.  And with information about the time frame
  for phasing out particular modes:

  - pypi-ext: in N months we automatically switch this mode to pypi-linkext
  - in N+M months only "pypi-only" and "pypi-cache" are allowed.
    
    With the latter you can still host your files externally but you need to 
    accept that pypi caches release files at release registration time and 
    serves them afterwards itself.  
    If you do not agree, your release files will not be automatically 
    discoverable anymore and you need to tell your users how to install 
    things manually through the description of your package.
  - (and maybe: in N+M+X months only pypi-hosted is allowed as a mode)

I think this (or a variation/refinements of this scheme) would offer a 
smooth transition where nobody needs to get upset and people would clearly 
see we are doing everything we can to make it easy to transition.

cheers,
holger


From ncoghlan at gmail.com  Sat Mar  9 09:05:44 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 9 Mar 2013 18:05:44 +1000
Subject: [Catalog-sig] transition to pypi-hosting through server-side
	changes
In-Reply-To: <20130309072222.GX9677@merlinux.eu>
References: <20130309072222.GX9677@merlinux.eu>
Message-ID: <CADiSq7eHF_sC=9SucNUkQ6-E6E3nY7m9f8sGqDMo5k=k3rby=g@mail.gmail.com>

On Sat, Mar 9, 2013 at 5:22 PM, holger krekel <holger at merlinux.eu> wrote:
> I think this (or a variation/refinements of this scheme) would offer a
> smooth transition where nobody needs to get upset and people would clearly
> see we are doing everything we can to make it easy to transition.

It sounds good to me, too (says the guy not writing the new code who
already hosts his releases on PyPI...)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From mal at egenix.com  Sat Mar  9 15:56:17 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Sat, 09 Mar 2013 15:56:17 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <DFA593E5-EB37-477C-A473-38B74A8E4C31@develer.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
	<CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
	<8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io>
	<CALeMXf5-Fh1okVJ6udWXju26B7aNRuL+wuvTb-58qcaxtHf2KQ@mail.gmail.com>
	<9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io>
	<DFA593E5-EB37-477C-A473-38B74A8E4C31@develer.com>
Message-ID: <513B4D91.80005@egenix.com>

[Discussion about MD5]

I think there's not much point in discussing MD5 in this context.
When creating new designs, you should always use the current
best and most widely deployed algorithm, IMO.

For Python, this is the SHA-2 family at the moment, since SHA-3 is
not supported by Python's hashlib. MD5 is only needed to support older
software. SHA-1 is also supported by Python versions older than Python 2.5.

It seems that SHA-256 and SHA-512, both from the SHA-2 family,
are the most popular at the moment, so I guess SHA-256 is a good
candidate to move forward and satisfy the 80/20 rule.

Agreed ?

FWIW, I'm pretty sure, SHA-256 will be broken in 10 years from
now as well :-)
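
As a minimal sketch of what this could look like on the client side, assuming a `#sha256=<hexdigest>` fragment modelled on the existing `#md5=` tags (the tag name, the `ALLOWED` tuple and the `check_hash_tag` helper are illustrative assumptions, not something PyPI or any installer defines in this thread):

```python
import hashlib
from urllib.parse import urlsplit

# Assumption: the set of hash tags a tool would be willing to accept.
ALLOWED = ("md5", "sha256", "sha512")


def check_hash_tag(url, data):
    """Verify `data` (bytes) against an '#<algo>=<hexdigest>' fragment on `url`.

    Returns None when the URL carries no recognised hash tag,
    otherwise True or False depending on whether the digest matches.
    """
    fragment = urlsplit(url).fragment
    algo, sep, expected = fragment.partition("=")
    if not sep or algo not in ALLOWED:
        return None
    return hashlib.new(algo, data).hexdigest() == expected.lower()
```

Because the algorithm name travels in the tag, old `#md5=` links and new `#sha256=` links could be served side by side during a transition.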

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 09 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From donald at stufft.io  Sat Mar  9 15:59:21 2013
From: donald at stufft.io (Donald Stufft)
Date: Sat, 9 Mar 2013 09:59:21 -0500
Subject: [Catalog-sig] hash tags
In-Reply-To: <513B4D91.80005@egenix.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
	<CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
	<8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io>
	<CALeMXf5-Fh1okVJ6udWXju26B7aNRuL+wuvTb-58qcaxtHf2KQ@mail.gmail.com>
	<9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io>
	<DFA593E5-EB37-477C-A473-38B74A8E4C31@develer.com>
	<513B4D91.80005@egenix.com>
Message-ID: <70844BDB-9538-4C1A-B853-3D6E60E749C1@stufft.io>

On Mar 9, 2013, at 9:56 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> [Discussion about MD5]
> 
> I think there's not much point in discussing MD5 in this context.
> When creating new designs, you should always use the current
> best and most widely deployed algorithm, IMO.
> 
> For Python, this is the SHA-2 family at the moment, since SHA-3 is
> not supported by Python's hashlib. MD5 is only needed to support older
> software. SHA-1 is also supported by Python versions older than Python 2.5.
> 
> It seems that SHA-256 and SHA-512, both from the SHA-2 family,
> are the most popular at the moment, so I guess SHA-256 is a good
> candidate to move forward and satisfy the 80/20 rule.

Sha256 and Sha512 are generally considered equivalent in a security context and either would be a perfectly fine candidate.

> 
> Agreed ?
> 
> FWIW, I'm pretty sure, SHA-256 will be broken in 10 years from
> now as well :-)
> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From christian at python.org  Sat Mar  9 19:09:37 2013
From: christian at python.org (Christian Heimes)
Date: Sat, 09 Mar 2013 19:09:37 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <DFA593E5-EB37-477C-A473-38B74A8E4C31@develer.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
	<CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
	<8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io>
	<CALeMXf5-Fh1okVJ6udWXju26B7aNRuL+wuvTb-58qcaxtHf2KQ@mail.gmail.com>
	<9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io>
	<DFA593E5-EB37-477C-A473-38B74A8E4C31@develer.com>
Message-ID: <513B7AE1.6060002@python.org>

On 09.03.2013 02:06, Giovanni Bajo wrote:
> It's a good practice to avoid crypto algorithms whose foundations are known to be broken. This is one of those cases. If we ever touch code that uses MD5, we should drop it immediately. There is no reason to keep it and wait for someone to release an attack, so that the world can point fingers at us and laugh.

Relax, MD5 is still fine to detect broken or partial downloads. Trust
me, this still happens a lot with broken proxy servers and unstable
network connections. I have seen my fair share of broken files during
deployments at work.

If we are going to remove MD5 *now*, then we are going to remove the
last bit of security from old tools. I agree that MD5 doesn't provide
strong cryptographic security. But it's still better than no checksum.

I also agree that we should no longer endorse MD5 and move to a strong
hash algorithm for checksums. People will point their fingers towards us
and laugh about Python when somebody abuses MD5 for an attack on PyPI.

file size + MD5 (for legacy) + SHA-2 look good to me.
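
A rough sketch of that combination, using only hashlib; the `verify_download` helper and its parameter names are made up for illustration and do not describe how any existing installer works:

```python
import hashlib


def verify_download(data, expected_size, expected_md5=None, expected_sha256=None):
    """Return True if `data` (bytes) matches the size and every digest supplied.

    The size check is cheap and catches truncated downloads; MD5 is kept
    for legacy metadata; SHA-256 is the stronger check when available.
    """
    if len(data) != expected_size:
        return False
    if expected_md5 is not None:
        if hashlib.md5(data).hexdigest() != expected_md5:
            return False
    if expected_sha256 is not None:
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            return False
    return True
```

Old tools would keep looking at the MD5 value only, while newer ones could require the SHA-2 digest as well.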

Christian

From rasky at develer.com  Sat Mar  9 20:20:17 2013
From: rasky at develer.com (Giovanni Bajo)
Date: Sat, 9 Mar 2013 20:20:17 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <513B7AE1.6060002@python.org>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io>
	<5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org>
	<5139DE99.9020005@egenix.com>
	<D75DCD20-1DCC-48EA-BA2F-C45138804163@coderanger.net>
	<CALeMXf7V5+4QW=g-wQrqRt0eTf_yja4yYW3BgpXX4uQZz7c4gA@mail.gmail.com>
	<8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io>
	<CALeMXf5-Fh1okVJ6udWXju26B7aNRuL+wuvTb-58qcaxtHf2KQ@mail.gmail.com>
	<9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io>
	<DFA593E5-EB37-477C-A473-38B74A8E4C31@develer.com>
	<513B7AE1.6060002@python.org>
Message-ID: <C0FB07FB-F553-46FE-8405-D503FE4B52F2@develer.com>

On 09 Mar 2013, at 19:09, Christian Heimes <christian at python.org> wrote:

> On 09.03.2013 02:06, Giovanni Bajo wrote:
>> It's a good practice to avoid crypto algorithms whose foundations are known to be broken. This is one of those cases. If we ever touch code that uses MD5, we should drop it immediately. There is no reason to keep it and wait for someone to release an attack, so that the world can point fingers at us and laugh.
> 
> Relax, MD5 is still fine to detect broken or partial downloads. Trust
> me, this still happens a lot with broken proxy servers and unstable
> network connections. I have seen my fair share of broken files during
> deployments at work.
> 
> If we are going to remove MD5 *now*, then we are going to remove the
> last bit of security from old tools. I agree that MD5 doesn't provide
> strong cryptographic security. But it's still better than no checksum.

When I said "we should drop it", I obviously meant "replace it with a different algorithm".

The post was intended to make sure that we migrate away from it, since we're touching that code. I certainly wasn't advocating using no checksum algorithm at all.
-- 
Giovanni Bajo   ::  rasky at develer.com
Develer S.r.l.  ::  http://www.develer.com

My Blog: http://giovanni.bajo.it





From ubershmekel at gmail.com  Sun Mar 10 09:05:57 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Sun, 10 Mar 2013 10:05:57 +0200
Subject: [Catalog-sig] Search engine relevance
In-Reply-To: <CAHrZfZB8cCoB3De5YPWo60mLnAhYkuPNj9yEPj3R+i-xscaxOA@mail.gmail.com>
References: <loom.20130308T145827-470@post.gmane.org>
	<CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>
	<CANSw7KycEVeNXStuZ2yhSnc3261V9W=CA=Mi8jeoGVGFUOWQtg@mail.gmail.com>
	<loom.20130308T162304-685@post.gmane.org>
	<CANSw7Kzj-28F0e7VyC-avE4Z-KBUYQ50-ngm4e_8634UMZR5uQ@mail.gmail.com>
	<CAHrZfZB8cCoB3De5YPWo60mLnAhYkuPNj9yEPj3R+i-xscaxOA@mail.gmail.com>
Message-ID: <CANSw7KyCmd8b0_SLpG=7BaKSxOAyS8CXuX6=-2HFDya9W9ermw@mail.gmail.com>

On Fri, Mar 8, 2013 at 11:26 PM, Richard Jones <r1chardj0n3s at gmail.com> wrote:

> That *was* the original search engine :-)
>
> Then after user complaints we devised a better solution...
>
> Always happy to take criticism of it and improve it! :-)
>
> Sent from my portable device, please excuse the brevity.
>
>
We can go a few directions:

Easy & python.org styled
* google's JS search API to get, parse and display results. $5 per 1K
queries.
* bing's JS search API. $5 per 2.5K queries.

Easy but external
* textbox links to a google/bing search with site:pypi.python.org

Hard to get good results, but perhaps easy to try:
* Change/improve internal search engine, and invent a good ranking
algorithm.

Though I wouldn't say this is high priority at all. I personally never use
pypi search, just site:pypi.python.org on google.

Yuval

From r1chardj0n3s at gmail.com  Sun Mar 10 09:23:43 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Sun, 10 Mar 2013 19:23:43 +1100
Subject: [Catalog-sig] Search engine relevance
In-Reply-To: <CANSw7KyCmd8b0_SLpG=7BaKSxOAyS8CXuX6=-2HFDya9W9ermw@mail.gmail.com>
References: <loom.20130308T145827-470@post.gmane.org>
	<CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>
	<CANSw7KycEVeNXStuZ2yhSnc3261V9W=CA=Mi8jeoGVGFUOWQtg@mail.gmail.com>
	<loom.20130308T162304-685@post.gmane.org>
	<CANSw7Kzj-28F0e7VyC-avE4Z-KBUYQ50-ngm4e_8634UMZR5uQ@mail.gmail.com>
	<CAHrZfZB8cCoB3De5YPWo60mLnAhYkuPNj9yEPj3R+i-xscaxOA@mail.gmail.com>
	<CANSw7KyCmd8b0_SLpG=7BaKSxOAyS8CXuX6=-2HFDya9W9ermw@mail.gmail.com>
Message-ID: <CAHrZfZA4GcVSU1J_7RJPSBPeB0UEge32eY0VPNS_CUFpHZpG0g@mail.gmail.com>

On 10 March 2013 19:05, Yuval Greenfield <ubershmekel at gmail.com> wrote:
> On Fri, Mar 8, 2013 at 11:26 PM, Richard Jones <r1chardj0n3s at gmail.com>
> wrote:
>>
>> That *was* the original search engine :-)
>>
>> Then after user complaints we devised a better solution...
>>
>> Always happy to take criticism of it and improve it! :-)
>>
>> Sent from my portable device, please excuse the brevity.
>>
>>
>
> We can go a few directions:
>
> Easy & python.org styled
> * google's JS search API to get, parse and display results. $5 per 1K
> queries.
> * bing's JS search API. 5$ per 2.5K queries.

Would be worth investigating if we can reasonably format the results.
Figuring out the billing will be something to discuss with the PSF
admin.


> Easy but external
> * textbox links to a google/bing search with site:pypi.python.org

As I said, this is how it was done, but there were complaints.


> Hard to get good results, but perhaps easy to try:
> * Change/improve internal search engine, and invent a good ranking
> algorithm.

We could probably just use the text search stuff built into postgres,
rather than the current naive LIKE searching. There is a ranking
algorithm in place and it does strongly prefer matching the name
you've entered; it doubly prefers an exact package name match. This
might solve the AGI problem and could probably produce good results
using the current ranking algorithm. Not sure. Google's search
algorithms are far advanced ;-)
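
To make the ranking idea concrete, here is a toy sketch of "prefer a name match, doubly prefer an exact name match". It is not the actual PyPI code; the weights, the `score` and `search` helpers and the sample packages are all invented for illustration:

```python
def score(query, name, summary):
    """Toy relevance score: name matches outrank summary matches."""
    q = query.lower()
    n = name.lower()
    points = 0
    if q in summary.lower():
        points += 1       # weak signal: term appears in the summary
    if q in n:
        points += 10      # strong preference: term appears in the name
    if q == n:
        points += 10      # doubled again for an exact package name match
    return points


def search(query, packages):
    """Return package names by descending score, dropping non-matches.

    `packages` is an iterable of (name, summary) pairs.
    """
    scored = [(score(query, name, summary), name) for name, summary in packages]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for points, name in ranked if points > 0]


# Hypothetical sample data, not real index entries.
sample = [
    ("imaging", "Image tools"),
    ("agi", "Gateway interface helpers"),
    ("pyagi", "agi bindings"),
    ("requests", "HTTP library"),
]
print(search("agi", sample))
```

Note that a plain substring test still lets "imaging" match "agi"; the weighting only pushes the exact name to the top.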


> Though I wouldn't say this is high priority at all. I personally never use
> pypi search, just site:pypi.python.org on google.

I also often use google - but I don't even bother with the site: bit.
My go-to search is usually just "python <whatever>".

I note though that unless I add "site:pypi.python.org" to the search
even google struggles to suggest something on PyPI (try "python
agi"...)


    Richard

From robertc at robertcollins.net  Sun Mar 10 09:52:27 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Sun, 10 Mar 2013 21:52:27 +1300
Subject: [Catalog-sig] Search engine relevance
In-Reply-To: <CAHrZfZA4GcVSU1J_7RJPSBPeB0UEge32eY0VPNS_CUFpHZpG0g@mail.gmail.com>
References: <loom.20130308T145827-470@post.gmane.org>
	<CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>
	<CANSw7KycEVeNXStuZ2yhSnc3261V9W=CA=Mi8jeoGVGFUOWQtg@mail.gmail.com>
	<loom.20130308T162304-685@post.gmane.org>
	<CANSw7Kzj-28F0e7VyC-avE4Z-KBUYQ50-ngm4e_8634UMZR5uQ@mail.gmail.com>
	<CAHrZfZB8cCoB3De5YPWo60mLnAhYkuPNj9yEPj3R+i-xscaxOA@mail.gmail.com>
	<CANSw7KyCmd8b0_SLpG=7BaKSxOAyS8CXuX6=-2HFDya9W9ermw@mail.gmail.com>
	<CAHrZfZA4GcVSU1J_7RJPSBPeB0UEge32eY0VPNS_CUFpHZpG0g@mail.gmail.com>
Message-ID: <CAJ3HoZ1g63okip+bwu_wtZN1md1PuOGQakxS_ekxfFOz=VMmhA@mail.gmail.com>

On 10 March 2013 21:23, Richard Jones <r1chardj0n3s at gmail.com> wrote:

> We could probably just use the text search stuff built into postgres,
> rather than the current naive LIKE searching. There is a ranking
> algorithm in place and it does strongly prefer matching the name
> you've entered; it doubly prefers an exact package name match. This
> might solve the AGI problem and could probably produce good results
> using the current ranking algorithm. Not sure. Google's search
> algorithms are far advanced ;-)

tsearch2 is hard to get good results with - we had issues with that
when I was working on Launchpad.

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Cloud Services

From holger at merlinux.eu  Sun Mar 10 16:07:40 2013
From: holger at merlinux.eu (holger krekel)
Date: Sun, 10 Mar 2013 15:07:40 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi
	site
Message-ID: <20130310150740.GE9677@merlinux.eu>

Hi Donald, Richard, Nick, Philip, Marc-Andre, all,

after some more thinking i wrote a simplified PEP draft for
transitioning hosting of release files to pypi.python.org.  A PEP is
warranted IMO because the corresponding changes will affect all python
package maintainers and the Python packaging ecology in general.  See
the current draft (pre-submit-v1) further below in this mail. 
I also created a bitbucket repository, see "PEP-PYPI-DRAFT.txt"  at 

    https://bitbucket.org/hpk42/pep-pypi/src

Donald, i'd be happy if you join as a co-author and contribute
your statistics script and possibly more implementation stuff (PRs 
to pypi software etc.).  

Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig:
scrutiny and feedback welcome.

Nick: if you could collect feedback on the PEP (draft) around the 
packaging and distribution mini-summit at Pycon US (15th March), that'd 
be very useful.  

Richard: I may ask you to become BDFL-delegate for this PEP especially
since you will need to integrate any resulting changes :)

I'd like to formally submit this PEP soon, but not before i have
received some feedback.

I am not subscribed to distutils-sig and i think distutils is not much
affected, but it probably still would help if someone cross-posts there
(please put me in CC).

cheers,
holger


PEP-draft: transition to release file hosting at pypi.python.org
=================================================================

Status
-----------

PRE-SUBMIT-v1

Abstract
------------

This PEP proposes to move hosting of all release files to
pypi.python.org itself.  To ease transition and minimize client-side
friction, **no changes to distutils or installers** are required.
Rather, the transition is implemented through changes to the pypi.python.org 
implementation and by interactions with package maintainers.

Problem
---------------

Today, python package installers (pip and easy_install) need to
query multiple sites to discover release files.  Apart from querying
pypi.python.org's simple index pages, an installer also needs to crawl
all homepages and download pages ever specified with any release of
a package.  The need for installers to crawl 3rd party sites slows
down installation and makes for a brittle, unreliable installation
process.

As of March 2013, about 10% of packages have release files which
are not hosted directly from pypi.python.org but rather from places
referenced by download/homepage sites.  

Conversely, roughly 90% of packages are hosted directly on
pypi.python.org [1]_.  Even for them installers still need to crawl the
homepage(s) of a package.  In particular, many package uploaders are not
aware that specifying the "homepage" will slow down the installation
process.


Solution
-----------

Each package is going to get a "hosting mode" field which affects
all historic and future releases of a package and its release files.
The field has these values and meanings:                            

- "pypi-ext" (transitional) encodes exactly the current mode of operations:
  homepage/download urls are presented in simple/ pages and client-side
  tools need to crawl them themselves to find release file links. 

- "pypi-cache": Release files located on remote sites will be downloaded 
  and cached by pypi.python.org by crawling homepage/download metadata sites.
  The resulting simple index contains links to release files hosted by
  pypi.python.org.  The original homepage/download links are added as
  links without a ``rel`` attribute if they have the ``#egg`` format.

- "pypi-only": homepage/download links are served on simple indexes
  but without a ``rel`` attribute.  Installation tools will thus not
  crawl those pages anymore.  Use this option if you commit to always
  uploading your release files to pypi.python.org.
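
As a rough illustration of the three modes, here is a minimal sketch of how
a simple/ page could render its links.  The function names and data layout
are hypothetical and not taken from the pypi software; the sketch also
assumes that "#egg format" means the URL carries an ``#egg=`` fragment.

```python
from html import escape


def _link(url, text, rel=None):
    """Render one <a> tag, with a rel attribute only when requested."""
    rel_attr = ' rel="%s"' % rel if rel else ""
    return '<a href="%s"%s>%s</a>' % (escape(url, quote=True), rel_attr, escape(text))


def simple_index_links(mode, hosted_files, homepage=None, download_url=None):
    """Return the links of a project's simple/ page for a hosting mode.

    hosted_files is a list of (filename, url) pairs served by the index
    itself.  All names here are illustrative assumptions.
    """
    links = [_link(url, name) for name, url in hosted_files]
    for rel, url in (("homepage", homepage), ("download", download_url)):
        if not url:
            continue
        if mode == "pypi-ext":
            # Current behaviour: the rel attribute invites installers to crawl.
            links.append(_link(url, rel, rel=rel))
        elif mode == "pypi-only":
            # Link stays visible, but without rel installers will not crawl it.
            links.append(_link(url, rel))
        elif mode == "pypi-cache":
            # Assumption: only "#egg=" style links are kept, again without rel.
            if "#egg=" in url:
                links.append(_link(url, rel))
        else:
            raise ValueError("unknown hosting mode: %r" % mode)
    return links
```

The only difference between "pypi-ext" and "pypi-only" in this sketch is the
presence of the ``rel`` attribute, which is what lets the change stay purely
server-side.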


Phases of transition
-------------------------

1. At the outset, we set hosting-mode to "pypi-ext" for all packages.
   This will not change any link served via the simple index and thus
   no bad effects are expected.  Early adopters and testers may now
   change the mode to either pypi-only or pypi-cache to help with
   ironing out issues.  After implementation and UI issues are
   resolved, the next phase can start.

2. We perform automatic analysis for each package to determine if it is
   a package with externally hosted release files.  Packages which only 
   have release files on pypi.python.org are put in the group "A",
   those which have at least some release files hosted elsewhere are put
   in the group "B".

   We then send a mail to all maintainers of packages in A
   that their hosting-mode is going to be switched automatically to
   "pypi-only" after N weeks, unless they visit their package
   administration page and set it to either pypi-cache or
   pypi-only earlier.

   We then send a mail to all maintainers of packages in B
   that their hosting-mode is going to be switched automatically to
   "pypi-cache" after N weeks, unless they visit their package
   administration page and set it to either pypi-only or
   pypi-cache earlier.

3. All packages will have a hosting mode of either "pypi-cache"
   or "pypi-only", with the result that installers only query
   packages hosted through pypi.python.org.
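
The grouping in phase 2 and the automatic switch after the deadline could
look roughly like the following sketch.  The helper names and the idea of
classifying by the host part of each release file URL are assumptions made
for illustration only.

```python
from urllib.parse import urlparse

PYPI_HOST = "pypi.python.org"

# Default target mode per group, as described in phase 2.
DEFAULT_MODE = {"A": "pypi-only", "B": "pypi-cache"}


def classify(release_file_urls):
    """Return "A" if every release file is on PyPI, otherwise "B"."""
    external = [u for u in release_file_urls
                if urlparse(u).hostname != PYPI_HOST]
    return "B" if external else "A"


def mode_after_deadline(current_mode, release_file_urls):
    """Apply the automatic switch once the N weeks have passed.

    A maintainer who already picked a mode keeps it; only packages
    still in "pypi-ext" are switched.
    """
    if current_mode != "pypi-ext":
        return current_mode
    return DEFAULT_MODE[classify(release_file_urls)]
```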
  

Transitioning to "pypi-cache" mode
-------------------------------------

When transitioning from the currently implicit "pypi-ext" mode to
"pypi-cache" for a given package, a package maintainer should 
be able to retrieve/verify the historic release files which will 
be cached from pypi.python.org.  The UI should present this list
and have the maintainer accept it for completing the transition
to the "pypi-cache" mode.  Upon future release registration actions,
pypi.python.org will crawl the homepage/download sites
and cache release files *before* returning a success code for
the release registration.
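
A minimal sketch of the "cache before success" ordering follows.  The
``fetch`` and ``store`` callables, the ``CacheError`` exception and the
return value are hypothetical stand-ins, not part of the pypi software.

```python
class CacheError(Exception):
    """Raised when an external release file cannot be cached."""


def register_release(external_urls, fetch, store):
    """Cache every external release file, then report success.

    fetch(url) returns the file content as bytes and may raise OSError;
    store(url, data) persists the content on the index.  Nothing is
    stored unless all downloads succeeded.
    """
    fetched = {}
    for url in external_urls:
        try:
            fetched[url] = fetch(url)
        except OSError as exc:
            # Registration fails instead of leaving a release without files.
            raise CacheError("could not cache %s" % url) from exc
    for url, data in fetched.items():
        store(url, data)
    return "ok"
```

Failing the registration when a download site is unreachable is what makes
the step potentially fragile, but it also gives the maintainer immediate
feedback.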


References
------------

.. [1] ratio of externally hosted versus pypi-hosted http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html

Acknowledgments
----------------------

Donald Stufft for pushing away from external hosting, writing
the 90/10% statistics script and offering to implement a PR.

Philip Eby for precise information and the basic idea to
implement the transition via server-side changes only.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig for thinking
through issues regarding getting rid of "external hosting".


Copyright
-----------------

This document has been placed in the public domain.



From donald at stufft.io  Sun Mar 10 18:35:00 2013
From: donald at stufft.io (Donald Stufft)
Date: Sun, 10 Mar 2013 13:35:00 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <20130310150740.GE9677@merlinux.eu>
References: <20130310150740.GE9677@merlinux.eu>
Message-ID: <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>


On Mar 10, 2013, at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:

> [...]

Some concerns:

1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.

If we're going to do a phased-in, per-project solution like this, I think it would work much better to have 2 modes.

1. Legacy - Current behavior, new external links are accepted, existing ones are displayed
2. PyPI Only - New behavior, no new external links are accepted, existing ones are removed

Present the project owners with 2 one way buttons:
   - Switch to PyPI Only and re-host external files [1]
   - Switch to PyPI Only and do NOT re-host external files

These buttons would be one-time and irreversible: once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would start out in PyPI Only. After some amount of time, switch all projects to PyPI Only but _do not_ re-host their packages, as we cannot legally do so without their permission.
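
To make the one-way switch concrete, here is a rough sketch. The `Project` class, its attribute names and the `fetch` callable are made up for illustration and are not existing PyPI code.

```python
class Project:
    """Toy model of the two proposed modes: "legacy" and "pypi-only"."""

    def __init__(self, name, external_links=()):
        self.name = name
        self.mode = "legacy"
        self.external_links = list(external_links)
        self.hosted = {}  # url -> file content re-hosted on the index

    def add_external_link(self, url):
        # Only Legacy projects may register new external links.
        if self.mode != "legacy":
            raise RuntimeError("PyPI Only projects accept no new external links")
        self.external_links.append(url)

    def switch_to_pypi_only(self, rehost, fetch=None):
        """One-way switch; rehost=True pulls the external files in first."""
        if self.mode == "pypi-only":
            raise RuntimeError("already switched; there is no way back to legacy")
        if rehost:
            if fetch is None:
                raise ValueError("re-hosting needs a fetch(url) callable")
            # Fetch everything before changing state, so a failed download
            # leaves the project untouched in Legacy mode.
            fetched = {url: fetch(url) for url in self.external_links}
            self.hosted.update(fetched)
        self.external_links = []
        self.mode = "pypi-only"
```

The automatic switch after the grace period would correspond to calling `switch_to_pypi_only(rehost=False)`, since re-hosting needs the owner's permission.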

The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues.

[1] There is still a small window here where someone could MITM PyPI fetching these files; however, since it would be a one-time-and-done deal, this risk is minimal and is worth it to move to a PyPI-only solution.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From jnoller at gmail.com  Sun Mar 10 18:46:32 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Sun, 10 Mar 2013 13:46:32 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
Message-ID: <E22AF5DD-DDF5-47BC-8E9A-ED4BF132CCC7@gmail.com>

+1

On Mar 10, 2013, at 1:35 PM, Donald Stufft <donald at stufft.io> wrote:

> [...]

From holger at merlinux.eu  Sun Mar 10 19:18:28 2013
From: holger at merlinux.eu (holger krekel)
Date: Sun, 10 Mar 2013 18:18:28 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
Message-ID: <20130310181828.GH9677@merlinux.eu>

On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
> On Mar 10, 2013, at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
> > [...]
> > Transitioning to "pypi-cache" mode
> > -------------------------------------
> > 
> > When transitioning from the currently implicit "pypi-ext" mode to
> > "pypi-cache" for a given package, a package maintainer should 
> > be able to retrieve/verify the historic release files which will 
> > be cached from pypi.python.org.  The UI should present this list
> > and have the maintainer accept it for completing the transition
> > to the "pypi-cache" mode.  Upon future release registration actions,
> > pypi.python.org will perform crawling for the homepage/download sites
> > and cache release files *before* returning a success return code for
> > the release registration.
> >  [...]
> 
> Some concerns:
> 
> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.

Could you detail how you arrive at this conclusion?
(I've seen the claim before but not the underlying reasoning, maybe
i just missed it)

There would be prior notifications to the package maintainers.  If they
don't want to have their packages cached at pypi.python.org, they can set
the mode to "pypi-only" and leave manual instructions.  I suspect there will
be very few people, if any, objecting to pypi-cache mode.  If that turns out
to be false, we might need to prolong pypi-ext mode some more for them and
switch them to pypi-only when we eventually decide to get
rid of external hosting.

> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.

fragility: not sure it's too bad.  Once the mode is activated, release
registration ("submit" POST action on "/pypi" http endpoint) will only
succeed if the corresponding release files can be found through
homepage/download.
Changing the mode to pypi-cache in the presence of historic release
files hosted elsewhere needs a good pypi.python.org UI interaction and
may take several tries if necessary sites cannot be reached.  Nevertheless,
this step is potentially fragile [X].

Security: the PEP does not try to prevent package tampering. MITM attacks
between pypi.python.org and the download sites may occur as much as they
can happen today between installers and the download sites.  
I think we should consider protection against package tampering 
in a separate discussion/PEP.

> If we're going to do a phased in per project solution like this I think it would work much better to have 2 modes.
> 
> 1. Legacy - Current behavior, new external links are accepted, existing ones are displayed

> 2. PyPI Only - New behavior, no new external links are accepted, existing ones are removed
> 
> Present the project owners with 2 one way buttons:
>    - Switch to PyPI Only and re-host external files [1]

Doesn't this have the same fragility problem as [X] above?

>    - Switch to PyPI Only and do NOT re-host external files

Are there any problems with doing this automatically (with a prior
notification to maintainers) for all the projects where we don't
find externally hosted packages?  I'd expect very few false negatives
and they can be quickly switched back.

Back to pypi-cache: it is there to make it super-easy for package
maintainers.  There are all kinds of release habits and scripts pushing out
things to google/bitbucket/github/other sites.  With "pypi-cache" they
don't need to change any of that.  They just need to be fine with
pypi.python.org pulling in the packages for caching.

We might think about phasing out pypi-cache after a longer time
frame so that we eventually only have pypi-only and things are
simpler and saner.

best,
holger






From donald at stufft.io  Sun Mar 10 19:29:34 2013
From: donald at stufft.io (Donald Stufft)
Date: Sun, 10 Mar 2013 14:29:34 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <20130310181828.GH9677@merlinux.eu>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
Message-ID: <D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>


On Mar 10, 2013, at 2:18 PM, holger krekel <holger at merlinux.eu> wrote:

> On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
>> On Mar 10, 2013, at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
>>> [...]
>>> Transitioning to "pypi-cache" mode
>>> -------------------------------------
>>> 
>>> When transitioning from the currently implicit "pypi-ext" mode to
>>> "pypi-cache" for a given package, a package maintainer should 
>>> be able to retrieve/verify the historic release files which will 
>>> be cached from pypi.python.org.  The UI should present this list
>>> and have the maintainer accept it for completing the transition
>>> to the "pypi-cache" mode.  Upon future release registration actions,
>>> pypi.python.org will perform crawling for the homepage/download sites
>>> and cache release files *before* returning a success return code for
>>> the release registration.
>>> [...]
>> 
>> Some concerns:
>> 
>> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
> 
> Could you detail how you arrive at this conclusion?
> (I've seen the claim before but not the underlying reasoning, maybe
> i just missed it)
> 
> There would be prior notifications to the package maintainers.  If they 
> don't want to have their packages cached at pypi.python.org, they can set
> the mode to "pypi-only" and leave manual instructions.  I suspect there will
> be very few people if anyone, objecting to pypi-cache mode.  If that is
> false we might need to prolong pypi-ext mode some more for them and 
> eventually switch them to pypi-only when we eventually decide to get
> rid of external hosting.

I asked VanL. His statement on re-hosting packages was:

    "We could do it if we had permission. The tricky part would be getting permission for already-existing packages."

I'm pretty sure that emailing someone and assuming we have permission if they don't opt-out doesn't count as permission.

> 
>> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
> 
> fragility: not sure it's too bad.  Once the mode is activated release
> registration ("submit" POST action on "/pypi" http endpoint) will only
> succeed if according releases can be found through homepage/download.
> Changing the mode to pypi-cache in the presence of historic release
> files hosted elsewhere needs a good pypi.python.org UI interaction and
> may take several tries if necessary sites cannot be reached.  Nevertheless,
> this step is potentially fragile [X].

I see, so pypi-cache would only be triggered once during release creation. The name "cache" makes it sound like we'd continuously monitor the given external URLs, when it is actually a pull-based method of getting files.

> 
> Security: the PEP does not try to prevent package tampering. MITM attacks
> between pypi.python.org and the download sites may occur as much as they
> can happen today between installers and the download sites.  
> I think we should consider protection against package tampering 
> in a separate discussion/PEP.
> 
>> If we're going to do a phased in per project solution like this I think it would work much better to have 2 modes.
>> 
>> 1. Legacy - Current behavior, new external links are accepted, existing ones are displayed
> 
>> 2. PyPI Only - New behavior, no new external links are accepted, existing ones are removed
>> 
>> Present the project owners with 2 one way buttons:
>>   - Switch to PyPI Only and re-host external files [1]
> 
> Doesn't this have the same fragility problem as [X] above?

Yes, and any pull-based solution will. The difference is that with a one-time-and-done solution we can live with a little bit more fragility.

> 
>>   - Switch to PyPI Only and do NOT re-host external files
> 
> Are there any problems for doing this automatically (with a prior 
> notification to maintainers) for all the projects where we don't 
> find externally hosted packages?  I'd expect very few false negatives
> and they can be quickly switched back.

The only thing I can think of is a host that is temporarily down being counted as a false positive.

> 
> Back to pypi-cache: it is there to make it super-easy for package
> maintainers.  There are all kinds of release habits and scripts pushing out
> things to google/bitbucket/github/other sites.  With "pypi-cache" they
> don't need to change any of that.  They just need to be fine with
> pypi.python.org pulling in the packages for caching.

Yes, I understand the goal here. The problem is that there's not really a good way to secure this without requiring changes to their workflow. At best they'll have to push information about every file so that PyPI is able to verify the files it is downloading, and if we are requiring them to push data about those files we might as well require them to push the files themselves. This also has the effect that we can provide immediate feedback when files do not validate on PyPI.
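
As a sketch of what "push information about every file" could mean, assume the maintainer submits a SHA-256 hex digest alongside each external URL (the digest choice and the function name are my assumptions, nothing like this exists in PyPI today):

```python
import hashlib
import hmac


def verify_download(data, expected_sha256):
    """Check fetched file content against a maintainer-supplied digest.

    data is the downloaded file as bytes; expected_sha256 is the hex
    digest the maintainer pushed when registering the release.
    """
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

Note that the maintainer has to compute and upload the digest from the very file they would otherwise upload, which is the point above: the workflow changes either way.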

> 
> We might think about phasing out pypi-cache after some larger time
> frame so that we eventually only have pypi-only and things are eventually
> simple and saner.
> 
> best,
> holger
> 
> 
> 
>> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission.
>> 
>> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues.
>> 
>> [1] There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one time and down deal this risk is minimal and is worth it to move to an pypi only solution.
>> 
>> -----------------
>> Donald Stufft
>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>> 
> 
> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From asmeurer at gmail.com  Sun Mar 10 19:51:15 2013
From: asmeurer at gmail.com (Aaron Meurer)
Date: Sun, 10 Mar 2013 12:51:15 -0600
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
Message-ID: <-8601578799313976966@unknownmsgid>

On Mar 10, 2013, at 12:29 PM, Donald Stufft <donald at stufft.io> wrote:

>
> On Mar 10, 2013, at 2:18 PM, holger krekel <holger at merlinux.eu> wrote:
>
>> On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
>>> On Mar 10, 2013, at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
>>>> [...]
>>>> Transitioning to "pypi-cache" mode
>>>> -------------------------------------
>>>>
>>>> When transitioning from the currently implicit "pypi-ext" mode to
>>>> "pypi-cache" for a given package, a package maintainer should
>>>> be able to retrieve/verify the historic release files which will
>>>> be cached from pypi.python.org.  The UI should present this list
>>>> and have the maintainer accept it for completing the transition
>>>> to the "pypi-cache" mode.  Upon future release registration actions,
>>>> pypi.python.org will perform crawling for the homepage/download sites
>>>> and cache release files *before* returning a success return code for
>>>> the release registration.
>>>> [...]
>>>
>>> Some concerns:
>>>
>>> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
>>
>> Could you detail how you arrive at this conclusion?
>> (I've seen the claim before but not the underlying reasoning, maybe
>> i just missed it)
>>
>> There would be prior notifications to the package maintainers.  If they
>> don't want to have their packages cached at pypi.python.org, they can set
>> the mode to "pypi-only" and leave manual instructions.  I suspect there will
>> be very few people if anyone, objecting to pypi-cache mode.  If that is
>> false we might need to prolong pypi-ext mode some more for them and
>> eventually switch them to pypi-only when we eventually decide to get
>> rid of external hosting.
>
> I asked VanL. His statement on re-hosting packages was:
>
>    "We could do it if we had permission. The tricky part would be getting permission for already-existing packages."
>
> I'm pretty sure that emailing someone and assuming we have permission if they don't opt-out doesn't count as permission.
>
>>
>>> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
>>
>> fragility: not sure it's too bad.  Once the mode is activited release
>> registration ("submit" POST action on "/pypi" http endpoint) will only
>> succeed if according releases can be found through homepage/download.
>> Changing the mode to pypi-cache in the presence of historic release
>> files hosted elsewhere needs a good pypi.python.org UI interaction and
>> may take several tries if neccessary sites cannot be reached.  Nevertheless,
>> this step is potentially fragile [X].
>
> I see, so pypi-cache would only be triggered once during release creation. Cache makes it sound like we'd continuously monitor the given external urls instead of it actually being a pull based method of getting files.

I think the term "mirror" is more accurate than "cache" here.

Aaron Meurer

>
>>
>> Security: the PEP does not try to prevent package tampering. MITM attacks
>> between pypi.python.org and the download sites may occur as much as they
>> can happen today between installers and the download sites.
>> I think we should consider protection against package tampering
>> in a separate discussion/PEP.
>>
>>> If we're going to do a phased in per project solution like this I think it would work much better to have 2 modes.
>>>
>>> 1. Legacy - Current behavior, new external links are accepted, existing ones are displayed
>>
>>> 2. PyPI Only - New behavior, no new external links are accepted, existing ones are removed
>>>
>>> Present the project owners with 2 one way buttons:
>>>  - Switch to PyPI Only and re-host external files [1]
>>
>> Doesn't this have the same fragility problem as [X] above?
>
> Yes, and any pull based solution will. The difference is with a one time and done solution we can live with a little bit more fragility.
>
>>
>>>  - Switch to PyPI Only and do NOT re-host external files
>>
>> Are there any problems for doing this automatically (with a prior
>> notification to maintainers) for all the projects where we don't
>> find externally hosted packages?  I'd expect very few false negatives
>> and they can be quickly switched back.
>
> Only thing I could think of is a host being temporarily down being counted as a false positive.
>
>>
>> Back to pypi-cache: it is there to make it super-easy for package
>> maintainers.  There are all kinds of release habits and scripts pushing out
>> things to google/bitbucket/github/other sites.  With "pypi-cache" they
>> don't need to change any of that.  They just need to be fine with
>> pypi.python.org pulling in the packages for caching.
>
> Yes I understand the goal here. The problem is that there's not really a good way to secure this without requiring changes to their workflow. At best they'll have to push information about every file so that PyPI is able to verify the files it is downloading, and if we are requiring them to push data about those files we might as well require them to push the files themselves. This also has the effect we can provide immediate feedback when files do not validate on PyPI.
>
>>
>> We might think about phasing out pypi-cache after some larger time
>> frame so that we eventually only have pypi-only and things are eventually
>> simple and saner.
>>
>> best,
>> holger
>>
>>
>>
>>> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission.
>>>
>>> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues.
>>>
>>> [1] There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one time and down deal this risk is minimal and is worth it to move to an pypi only solution.
>>>
>>> -----------------
>>> Donald Stufft
>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

From pje at telecommunity.com  Sun Mar 10 20:41:50 2013
From: pje at telecommunity.com (PJ Eby)
Date: Sun, 10 Mar 2013 15:41:50 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <20130310150740.GE9677@merlinux.eu>
References: <20130310150740.GE9677@merlinux.eu>
Message-ID: <CALeMXf5b41ppcRz6EFdGXuZNknw58bZ8M6miDLz-ac5DzHgRRw@mail.gmail.com>

On Sun, Mar 10, 2013 at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
> Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig:
> scrutiny and feedback welcome.

Hi Holger.  I'm having some difficulty interpreting your proposal
because it is leaving out some things, and in other places
contradicting what I know of how the tools work.  It is also a bit at
odds with itself in some places.

For instance, at the beginning, the PEP states its proposed solution
is to host all release files on PyPI, but then the problem section
describes the problems that arise from crawling external pages:
problems that can be solved without actually hosting the files on
PyPI.

To me, it needs a clearer explanation of why the actual hosting part
also needs to be on PyPI, not just the links.  In the threads to date,
people have argued about uptime, security, etc., and these points are
not covered by the PEP or even really touched on for the most part.

(Actually, thinking about that makes me wonder....  Donald: did your
analysis collect any stats on *where* those externally hosted files
were hosted?  My intuition says that the bulk of the files (by *file
count*) will come from a handful of highly-available domains, i.e.
sourceforge, github, that sort of thing, with actual self-hosting
being relatively rare *for the files themselves*, vs. a much wider
range of domains for the homepage/download URLs (especially because
those change from one release to the next.)  If that's true, then most
complaints about availability are being caused by crawling multiple
not-highly-available HTML pages, *not* by the downloading of the
actual files.  If my intuition about the distribution is wrong, OTOH,
it would provide a stronger argument for moving the files themselves
to PyPI as well.)
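
The tally asked for here could be sketched roughly as follows, assuming a flat list of externally hosted file URLs is already available. The sample URLs are made up for illustration and are not results from any analysis.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_host(urls):
    """Count release-file URLs per hosting domain."""
    return Counter(urlsplit(u).hostname or "" for u in urls)


# Hypothetical sample data, not real measurements.
sample_urls = [
    "http://sourceforge.net/download/someproject-1.2.tgz",
    "http://sourceforge.net/download/someproject-1.3.tgz",
    "http://example.org/files/other-0.1.tar.gz",
]
hosts = count_by_host(sample_urls)
```

Running the same count over the homepage/download URLs and comparing the two distributions would show whether the file hosts or the crawled pages are the less reliable part.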

Digression aside, this is one of the things that needs to be clearer so
that there's a better explanation for package authors as to why
they're being asked to change.  And although the base argument is good
("specifying the "homepage" will slow down the installation process"),
it could be amplified further with an example of some project that has
had multiple homepages over its lifetime, listing all the URLs that
currently must be crawled before an installer can be sure it has found
all available versions, platforms, and formats of that project.

Okay, on to the Solution section.  Again, your stated problem is to
fix crawling, but the solution is all about file hosting.  Regardless
of which of these three "hosting modes" is selected, it remains an
option for the developer to host files elsewhere, and provide the
links in their description...  unless of course you intended to rule
that out and forgot to mention it.  (Or, I suppose, if you did *not*
intend to rule it out and intentionally omitted mention of that so the
rabid anti-externalists would think you were on their side and not
create further controversy...  in which case I've now spoiled things.
Darn.  ;-) )

Some technical details are also either incorrect or confusing.  For
example, you state that "The original homepage/download links are
added as links without a ``rel`` attribute if they have the ``#egg``
format".  But if they are added without a rel attribute, it doesn't
*matter* whether they have an #egg marker or not.  It is quite
possible for a PyPI package to have a download_url of say,
"http://sourceforge.net/download/someproject-1.2.tgz".

Thus, I would suggest simply stating that changing hosting mode does
not actually remove any links from the /simple index, it just removes
the rel="" attributes from the "Home page" and "Download" links, thus
preventing them from being crawled in search of additional file links.
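
As a rough sketch of what "removing the rel attributes" amounts to on a /simple page: the function name and sample markup below are illustrative assumptions, and a real implementation would more likely omit the attribute when the page is generated than rewrite the HTML afterwards.

```python
import re

# Matches the rel="homepage" / rel="download" attributes that installers
# treat as an invitation to crawl the linked page.
REL_ATTR = re.compile(r'\s+rel="(?:homepage|download)"')


def strip_rel(simple_page_html: str) -> str:
    """Return the page with the crawl-triggering rel attributes removed."""
    return REL_ATTR.sub("", simple_page_html)


# Hypothetical link as it might appear on a /simple page.
page = '<a href="http://example.org/" rel="homepage">1.2 home_page</a>'
stripped = strip_rel(page)
```

The link itself stays on the page; only the hint that tells installers to spider it goes away.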

With that out of the way, that brings me to the larger scope issue
with the modes as presented.  Notice now that with this clarification,
there is no real difference in *state* between the "pypi-cache" and
"pypi-only" modes.  There is only a *functional* difference...  and
that function is underspecified in the PEP.

What I mean is, in both pypi-cache and pypi-only, the *state* of
things is that rel="" attributes are gone, and there are links to
files on PyPI.  The only difference is in *how* the files get there.

And for the pypi-cache mode, this function is *really*
under-specified.  Arguably, this is the meat of the proposal, but it
is entirely missing.  There is nothing here about the frequency of
crawling, the methods used to select or validate files, whether there
is any expiration...  it is all just magically assumed to happen
somehow.

My suggestion would be to do two things:

First, make the state a boolean: crawl external links, with the
current state yes and the future state no, with "no" simply meaning
that the rel="" attribute is removed from the links that currently
have it.

Second, propose to offer tools in the PyPI interface (and command
line) to assist authors in making the transition, rather than
proposing a completely unspecified caching mechanism.  Better to have
some vaguely specified tools than a completely unspecified caching
mechanism, and better still to spell out very precisely what those
tools do.

Okay, on to the "Phases of transition".  This section gets a lot
simpler if there are only two stages.  Specifically, we let everyone
know the change is going to happen, and how long they have, give 'em
links to migration tools.  Done.  ;-)

(Okay, so analysis still makes sense: the people who don't have any
externally hosted files can get a different message, i.e., "Hey, we
notice that installing your package is slow because you have these
links that don't go anywhere.  Click here if you'd like PyPI to stop
sending people on wild goose chases".  The people who have external
hosted files will need a more involved message.)

Whew.  Okay, that ends my critique of the PEP as it sits.  Now for an
outside-the-box suggestion.

If you'd like to be able to transition people away from spidered links
in the fewest possible steps, with the least user action, no legal
issues, and in a completely automated way, note that this can be done
with a one-time spidering of the existing links to find the download
links, then adding those links directly to the /simple index, and
switching off the rel="" attributes.  This can be done without
explicit user consent, though they can be given the chance to do it
manually, sooner.

To implement this you'd need two project-level (*not* release-level)
fields: one to indicate whether the project is using rel="" or not,
and one to contain the list of external download links, which would be
user-editable.
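
A minimal sketch of those two project-level fields, with the one-time transition as a method. The class and attribute names are hypothetical; nothing like this exists in the PyPI schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProjectHosting:
    """Hypothetical per-project record holding the two proposed fields."""
    name: str
    # True means the rel="" attributes are still emitted, so installers crawl.
    crawl_external_links: bool = True
    # User-editable list of direct download links shown on /simple.
    external_download_urls: List[str] = field(default_factory=list)

    def stop_crawling(self, found_urls):
        """Record the links found by a one-time spidering, then stop crawling."""
        self.external_download_urls = sorted(set(found_urls))
        self.crawl_external_links = False
```

Because both fields live on the project rather than on each release, the transition is a single state change per project.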

This overall approach I'm proposing can be extended to also support
mirroring, since it provides an explicit place to list what it is
you're mirroring.  (At any rate, it's more explicitly specified than
any such place in the current PEP.)

That field can also be fairly easily populated for any given project
in just a few lines of code:

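    # Collect every download location setuptools can find for a project.
    from pkg_resources import Requirement
    from setuptools.package_index import PackageIndex

    pr = Requirement.parse('Projectname')
    pi = PackageIndex(search_path=[], python=None, platform=None)
    pi.find_packages(pr)  # scans the index and follows its rel="" links
    all_urls = [dist.location for dist in pi[pr.key]]
    # Keep only the links that are not hosted on PyPI itself.
    external_urls = [url for url in all_urls if '//pypi.python.org' not in url]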

(Although if you want more information, like what kind of link each
one is, the dist objects actually know a bit more than just the URL.)

Anyway, I hope you found at least some of all this helpful.  ;-)

From holger at merlinux.eu  Sun Mar 10 20:54:05 2013
From: holger at merlinux.eu (holger krekel)
Date: Sun, 10 Mar 2013 19:54:05 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
Message-ID: <20130310195405.GI9677@merlinux.eu>

On Sun, Mar 10, 2013 at 14:29 -0400, Donald Stufft wrote:
> 
> On Mar 10, 2013, at 2:18 PM, holger krekel <holger at merlinux.eu> wrote:
> 
> > On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
> >> On Mar 10, 2013, at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
> >>> [...]
> >>> Transitioning to "pypi-cache" mode
> >>> -------------------------------------
> >>> 
> >>> When transitioning from the currently implicit "pypi-ext" mode to
> >>> "pypi-cache" for a given package, a package maintainer should 
> >>> be able to retrieve/verify the historic release files which will 
> >>> be cached from pypi.python.org.  The UI should present this list
> >>> and have the maintainer accept it for completing the transition
> >>> to the "pypi-cache" mode.  Upon future release registration actions,
> >>> pypi.python.org will perform crawling for the homepage/download sites
> >>> and cache release files *before* returning a success return code for
> >>> the release registration.
> >>> [...]
> >> 
> >> Some concerns:
> >> 
> >> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
> > 
> > Could you detail how you arrive at this conclusion?
> > (I've seen the claim before but not the underlying reasoning, maybe
> > i just missed it)
> > 
> > There would be prior notifications to the package maintainers.  If they 
> > don't want to have their packages cached at pypi.python.org, they can set
> > the mode to "pypi-only" and leave manual instructions.  I suspect there will
> > be very few people if anyone, objecting to pypi-cache mode.  If that is
> > false we might need to prolong pypi-ext mode some more for them and 
> > eventually switch them to pypi-only when we eventually decide to get
> > rid of external hosting.
> 
> I asked VanL. His statement on re-hosting packages was:
> 
>     "We could do it if we had permission. The tricky part would be getting permission for already-existing packages."
> 
> I'm pretty sure that emailing someone and assuming we have permission if they don't opt-out doesn't count as permission.

Hum, I saw Jesse Noller saying a few days ago "let them opt out".
But i guess VanL can trump that :)  If that is true we could change the
notification to maintainers of B packages that hosting mode is going to
change to pypi-only, which would lose their release files unless they
opt-in to pypi-cache.  As long as that is a no-brainer for them, we are
not asking for much and can count on most people's good will to not make
other people's installation life harder.

Besides, admins could still set the "pypi-ext" mode if a maintainer can
explain why it's a problem for them to agree to "pypi-cache" or
"pypi-only".  I'd really like to not have too many packages lingering
around in "pypi-ext" mode if it can be avoided.

> > 
> >> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
> > 
> > fragility: not sure it's too bad.  Once the mode is activited release
> > registration ("submit" POST action on "/pypi" http endpoint) will only
> > succeed if according releases can be found through homepage/download.
> > Changing the mode to pypi-cache in the presence of historic release
> > files hosted elsewhere needs a good pypi.python.org UI interaction and
> > may take several tries if neccessary sites cannot be reached.  Nevertheless,
> > this step is potentially fragile [X].
> 
> I see, so pypi-cache would only be triggered once during release creation. Cache makes it sound like we'd continuously monitor the given external urls instead of it actually being a pull based method of getting files.

Right, we need to avoid cache invalidation problems by only allowing
updates at user-chosen points in time (there might also be an explicit
"update cache" button in case a maintainer pushes an egg/wheel later).
It's still technically a cache i think but the term "rehost" would
work as well i guess.

> [...]
> > Back to pypi-cache: it is there to make it super-easy for package
> > maintainers.  There are all kinds of release habits and scripts
> > pushing out things to google/bitbucket/github/other sites.  With
> > "pypi-cache" they don't need to change any of that.  They just need
> > to be fine with pypi.python.org pulling in the packages for caching.
> 
> Yes I understand the goal here. The problem is that there's not really
> a good way to secure this without requiring changes to their workflow. 
> At best they'll have to push information about every file so that PyPI
> is able to verify the files it is downloading, and if we are requiring
> them to push data about those files we might as well require them to
> push the files themselves. 

Is this about protection against package tampering?  If so, I think a
proper solution involves maintainers signing their release files but
this is outside the intended scope of the PEP.

Otherwise, the "re-hosting" process for pypi-cache mode is at least as
secure as currently where all hosts issuing pip/easy_install commands
visit external sites and can thus be MITM-attacked.  For pypi-only
server packages it's safer because no crawling takes place.

In any case, asking people to change their release process is not 
a no-brainer.  The PEP tries to avoid this source of friction.
That being said, i think we both agree to recommend maintainers to
(eventually) go for pypi-only and change their release processes
accordingly.  This PEP is not the end of the story of evolving package
hosting and i'd like to be careful about asking maintainers to change 
how they do things.

> This also has the effect we can provide
> immediate feedback when files do not validate on PyPI.

At release registration or switch-to-pypi-rehost time we could also do
package validation but i am inclined to see this as out of scope
for this PEP which tries to focus on the minimal steps to move 
from pypi-ext to everything-hosted-through-pypi.python.org.

cheers,
holger

> 
> > 
> > We might think about phasing out pypi-cache after some larger time
> > frame so that we eventually only have pypi-only and things are eventually
> > simple and saner.
> > 
> > best,
> > holger
> > 
> > 
> > 
> >> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission.
> >> 
> >> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues.
> >> 
> >> [1] There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one time and down deal this risk is minimal and is worth it to move to an pypi only solution.
> >> 
> >> -----------------
> >> Donald Stufft
> >> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> >> 
> > 
> > 
> 
> 
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 



From pje at telecommunity.com  Sun Mar 10 20:59:43 2013
From: pje at telecommunity.com (PJ Eby)
Date: Sun, 10 Mar 2013 15:59:43 -0400
Subject: [Catalog-sig] Search engine relevance
In-Reply-To: <CAHrZfZA4GcVSU1J_7RJPSBPeB0UEge32eY0VPNS_CUFpHZpG0g@mail.gmail.com>
References: <loom.20130308T145827-470@post.gmane.org>
	<CAK8PqJFLeP3a1OEoKhkwFh5XGiKFD+mbdfokHEH0ONUM5uxrsw@mail.gmail.com>
	<CANSw7KycEVeNXStuZ2yhSnc3261V9W=CA=Mi8jeoGVGFUOWQtg@mail.gmail.com>
	<loom.20130308T162304-685@post.gmane.org>
	<CANSw7Kzj-28F0e7VyC-avE4Z-KBUYQ50-ngm4e_8634UMZR5uQ@mail.gmail.com>
	<CAHrZfZB8cCoB3De5YPWo60mLnAhYkuPNj9yEPj3R+i-xscaxOA@mail.gmail.com>
	<CANSw7KyCmd8b0_SLpG=7BaKSxOAyS8CXuX6=-2HFDya9W9ermw@mail.gmail.com>
	<CAHrZfZA4GcVSU1J_7RJPSBPeB0UEge32eY0VPNS_CUFpHZpG0g@mail.gmail.com>
Message-ID: <CALeMXf5wZKJG5wGnutVaL0HT3_i4OXtjQNtPfzi4p9W=2z8=yA@mail.gmail.com>

On Sun, Mar 10, 2013 at 4:23 AM, Richard Jones <r1chardj0n3s at gmail.com> wrote:
> This might solve the AGI problem and could probably produce good results
> using the current ranking algorithm. Not sure. Google's search
> algorithms are far advanced ;-)

Heh.  This just gave me a bit of a chuckle, taken out of context.

"AGI", you see, is also an acronym for "artificial general
intelligence", so for a moment there I thought you were suggesting
that using Postgres rankings properly could bring about the
Singularity.  ;-)

From jnoller at gmail.com  Sun Mar 10 21:03:42 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Sun, 10 Mar 2013 16:03:42 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <20130310195405.GI9677@merlinux.eu>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
Message-ID: <1639F813-A646-4ECF-BDF1-A5C581A64CE0@gmail.com>

I said that before we talked to a lawyer

On Mar 10, 2013, at 3:54 PM, holger krekel <holger at merlinux.eu> wrote:

> On Sun, Mar 10, 2013 at 14:29 -0400, Donald Stufft wrote:
>> 
>> On Mar 10, 2013, at 2:18 PM, holger krekel <holger at merlinux.eu> wrote:
>> 
>>> On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
>>>> On Mar 10, 2013, at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
>>>>> [...]
>>>>> Transitioning to "pypi-cache" mode
>>>>> -------------------------------------
>>>>> 
>>>>> When transitioning from the currently implicit "pypi-ext" mode to
>>>>> "pypi-cache" for a given package, a package maintainer should 
>>>>> be able to retrieve/verify the historic release files which will 
>>>>> be cached from pypi.python.org.  The UI should present this list
>>>>> and have the maintainer accept it for completing the transition
>>>>> to the "pypi-cache" mode.  Upon future release registration actions,
>>>>> pypi.python.org will perform crawling for the homepage/download sites
>>>>> and cache release files *before* returning a success return code for
>>>>> the release registration.
>>>>> [...]
>>>> 
>>>> Some concerns:
>>>> 
>>>> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
>>> 
>>> Could you detail how you arrive at this conclusion?
>>> (I've seen the claim before but not the underlying reasoning, maybe
>>> i just missed it)
>>> 
>>> There would be prior notifications to the package maintainers.  If they 
>>> don't want to have their packages cached at pypi.python.org, they can set
>>> the mode to "pypi-only" and leave manual instructions.  I suspect there will
>>> be very few people if anyone, objecting to pypi-cache mode.  If that is
>>> false we might need to prolong pypi-ext mode some more for them and 
>>> eventually switch them to pypi-only when we eventually decide to get
>>> rid of external hosting.
>> 
>> I asked VanL. His statement on re-hosting packages was:
>> 
>>    "We could do it if we had permission. The tricky part would be getting permission for already-existing packages."
>> 
>> I'm pretty sure that emailing someone and assuming we have permission if they don't opt-out doesn't count as permission.
> 
> Hum, i I saw Jesse Noller saying a few days ago "let them opt out".
> But i guess VanL can trump that :)  If that is true we could change the
> notification to maintainers of B packages that hosting mode is going to
> change to pypi-only, which would loose their release files unless they
> opt-in to pypi-cache.  As long as that is a no-brainer for them, we are
> not asking for much and can count on most people's good will to not make
> other people's installation life harder.
> 
> Besides, admins could still set the "pypi-ext" mode if a maintainer can
> explain why it's a problem for them to agree to "pypi-cache" or
> "pypi-only".  I'd really like to not have too many packages lingering
> around in "pypi-ext" mode if it can be avoided.
> 
>>> 
>>>> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
>>> 
>>> fragility: not sure it's too bad.  Once the mode is activited release
>>> registration ("submit" POST action on "/pypi" http endpoint) will only
>>> succeed if according releases can be found through homepage/download.
>>> Changing the mode to pypi-cache in the presence of historic release
>>> files hosted elsewhere needs a good pypi.python.org UI interaction and
>>> may take several tries if neccessary sites cannot be reached.  Nevertheless,
>>> this step is potentially fragile [X].
>> 
>> I see, so pypi-cache would only be triggered once during release creation. Cache makes it sound like we'd continuously monitor the given external urls instead of it actually being a pull based method of getting files.
> 
> Right, we need to avoid cache invalidation problems by only allowing
> updates at user-chosen point in times (there might also be an explicit 
> "update cache" button in case a maintainer pushes a egg/wheel later).  
> It's still technically a cache i think but the term "rehost" would 
> work as well i guess.
> 
>> [...]
>>> Back to pypi-cache: it is there to make it super-easy for package
>>> maintainers.  There are all kinds of release habits and scripts
>>> pushing out things to google/bitbucket/github/other sites.  With
>>> "pypi-cache" they don't need to change any of that.  They just need
>>> to be fine with pypi.python.org pulling in the packages for caching.
>> 
>> Yes I understand the goal here. The problem is that there's not really
>> a good way to secure this without requiring changes to their workflow. 
>> At best they'll have to push information about every file so that PyPI
>> is able to verify the files it is downloading, and if we are requiring
>> them to push data about those files we might as well require them to
>> push the files themselves.
> 
> Is this about protection against package tampering?  If so, I think a
> proper solution involves maintainers signing their release files but
> this is outside the intended scope of the PEP.
> 
> Otherwise, the "re-hosting" process for pypi-cache mode is at least as
> secure as currently where all hosts issuing pip/easy_install commands
> visit external sites and can thus be MITM-attacked.  For pypi-only
> server packages it's safer because no crawling takes place.
> 
> In any case, asking people to change their release process is not 
> a no-brainer.  The PEP tries to avoid this source of friction.
> That being said, i think we both agree to recommend maintainers to
> (eventually) go for pypi-only and change their release processes
> accordingly.  This PEP is not the end of the story of evolving package
> hosting and i'd like to be careful about asking maintainers to change 
> how they do things.
> 
>> This also has the effect we can provide
>> immediate feedback when files do not validate on PyPI.
> 
> At release registration or switch-to-pypi-rehost time we could also do
> package validation but i am inclined to see this as out of scope
> for this PEP which tries to focus on the minimal steps to move 
> from pypi-ext to everything-hosted-through-pypi.python.org.
> 
> cheers,
> holger
> 
>> 
>>> 
>>> We might think about phasing out pypi-cache after some larger time
>>> frame so that we eventually only have pypi-only and things are eventually
>>> simple and saner.
>>> 
>>> best,
>>> holger
>>> 
>>> 
>>> 
>>>> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission.
>>>> 
>>>> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues.
>>>> 
>>>> [1] There is still a small window here where someone could MITM PyPI fetching these files; however, since it would be a one-time-and-done deal, this risk is minimal and is worth it to move to a PyPI-only solution.
>>>> 
>>>> -----------------
>>>> Donald Stufft
>>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>> 
>> 
>> -----------------
>> Donald Stufft
>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 
> 
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

From donald at stufft.io  Sun Mar 10 21:59:14 2013
From: donald at stufft.io (Donald Stufft)
Date: Sun, 10 Mar 2013 16:59:14 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf5b41ppcRz6EFdGXuZNknw58bZ8M6miDLz-ac5DzHgRRw@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<CALeMXf5b41ppcRz6EFdGXuZNknw58bZ8M6miDLz-ac5DzHgRRw@mail.gmail.com>
Message-ID: <73570046-C5E3-429B-B390-29C6578721A3@stufft.io>


On Mar 10, 2013, at 3:41 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Sun, Mar 10, 2013 at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
>> Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig:
>> scrutiny and feedback welcome.
> 
> Hi Holger.  I'm having some difficulty interpreting your proposal
> because it is leaving out some things, and in other places
> contradicting what I know of how the tools work.  It is also a bit at
> odds with itself in some places.
> 
> For instance, at the beginning, the PEP states its proposed solution
> is to host all release files on PyPI, but then the problem section
> describes the problems that arise from crawling external pages:
> problems that can be solved without actually hosting the files on
> PyPI.
> 
> To me, it needs a clearer explanation of why the actual hosting part
> also needs to be on PyPI, not just the links.  In the threads to date,
> people have argued about uptime, security, etc., and these points are
> not covered by the PEP or even really touched on for the most part.
> 
> (Actually, thinking about that makes me wonder....  Donald: did your
> analysis collect any stats on *where* those externally hosted files
> were hosted?  My intuition says that the bulk of the files (by *file
> count*) will come from a handful of highly-available domains, i.e.
> sourceforge, github, that sort of thing, with actual self-hosting
> being relatively rare *for the files themselves*, vs. a much wider
> range of domains for the homepage/download URLs (especially because
> those change from one release to the next.)  If that's true, then most
> complaints about availability are being caused by crawling multiple
> not-highly-available HTML pages, *not* by the downloading of the
> actual files.  If my intuition about the distribution is wrong, OTOH,
> it would provide a stronger argument for moving the files themselves
> to PyPI as well.)

No, but it wouldn't be difficult to take the list of packages I generated and run another script to see where the files that aren't available on PyPI are actually located. I'd like to emphasize again, though, that it doesn't really matter how good their uptime is: the best-case scenario is that it doesn't hurt uptime, and the worst case (and typical case) is that it decreases it. A high-uptime host will just decrease it _less_ than a low-uptime host.
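
To put rough numbers on that (the availability figures below are invented for illustration, not measurements of PyPI or of any real host): if an install needs both PyPI and an external host to be reachable, and failures are assumed to be independent, the chance of success is the product of the availabilities, which can never exceed PyPI's own.

```python
# Illustrative sketch only: the availability figures are assumptions,
# not measurements of PyPI or of any real file host.

def combined_availability(*availabilities):
    """Probability that every required host is up, assuming independent failures."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


# Install needs only PyPI.
pypi_only = combined_availability(0.999)

# Install needs PyPI plus a very reliable external host.
with_good_host = combined_availability(0.999, 0.9999)

# Install needs PyPI plus a less reliable external host.
with_poor_host = combined_availability(0.999, 0.95)
```

Under these assumptions even a very reliable external host only shrinks the loss; it cannot add availability.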

> 
> Digression aside, this is one of things that needs to be clearer so
> that there's a better explanation for package authors as to why
> they're being asked to change.  And although the base argument is good
> ("specifying the "homepage" will slow down the installation process"),
> it could be amplified further with an example of some project that has
> had multiple homepages over its lifetime, listing all the URLs that
> currently must be crawled before an installer can be sure it has found
> all available versions, platforms, and formats of that project.
> 
> Okay, on to the Solution section.  Again, your stated problem is to
> fix crawling, but the solution is all about file hosting.  Regardless
> of which of these three "hosting modes" is selected, it remains an
> option for the developer to host files elsewhere, and provide the
> links in their description...  unless of course you intended to rule
> that out and forgot to mention it.  (Or, I suppose, if you did *not*
> intend to rule it out and intentionally omitted mention of that so the
> rabid anti-externalists would think you were on their side and not
> create further controversy...  in which case I've now spoiled things.
> Darn.  ;-) )
> 
> Some technical details are also either incorrect or confusing.  For
> example, you state that "The original homepage/download links are
> added as links without a ``rel`` attribute if they have the ``#egg``
> format".  But if they are added without a rel attribute, it doesn't
> *matter* whether they have an #egg marker or not.  It is quite
> possible for a PyPI package to have a download_url of say,
> "http://sourceforge.net/download/someproject-1.2.tgz".
> 
> Thus, I would suggest simply stating that changing hosting mode does
> not actually remove any links from the /simple index, it just removes
> the rel="" attributes from the "Home page" and "Download" links, thus
> preventing them from being crawled in search of additional file links.

In my opinion the final, PyPI only mode needs to remove all external links from the /simple/ index.
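
As a rough sketch of what "remove all external links" could mean for a single /simple/ page: keep only links that are relative or point at the index host itself. The sample HTML, the `LinkCollector` class and the `pypi_hosted_links` helper are hypothetical names made up for this illustration; none of this is PyPI's actual code or exact page layout.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            if "href" in attributes:
                self.links.append((attributes["href"], attributes.get("rel")))


def pypi_hosted_links(html, index_host="pypi.python.org"):
    """Return only the links served by the index host (or relative to it)."""
    collector = LinkCollector()
    collector.feed(html)
    kept = []
    for href, _rel in collector.links:
        host = urlparse(href).netloc
        # An empty host means a relative link, i.e. a file on the index itself.
        if host in ("", index_host):
            kept.append(href)
    return kept


# Hypothetical page: one hosted file, one home page link, one external download.
SAMPLE = """
<a href="../../packages/source/e/example/example-1.0.tar.gz#md5=abc">example-1.0.tar.gz</a>
<a href="http://example.com/" rel="homepage">1.0 home_page</a>
<a href="http://sourceforge.net/download/example-1.1.tgz" rel="download">1.1 download_url</a>
"""

hosted = pypi_hosted_links(SAMPLE)
```

Note that this drops the external download link whether or not it carries a rel attribute, which is the difference from merely switching off crawling.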

> 
> With that out of the way, that brings me to the larger scope issue
> with the modes as presented.  Notice now that with this clarification,
> there is no real difference in *state* between the "pypi-cache" and
> "pypi-only" modes.  There is only a *functional* difference...  and
> that function is underspecified in the PEP.
> 
> What I mean is, in both pypi-cache and pypi-only, the *state* of
> things is that rel="" attributes are gone, and there are links to
> files on PyPI.  The only difference is in *how* the files get there.
> 
> And for the pypi-cache mode, this function is *really*
> under-specified.  Arguably, this is the meat of the proposal, but it
> is entirely missing.  There is nothing here about the frequency of
> crawling, the methods used to select or validate files, whether there
> is any expiration...  it is all just magically assumed to happen
> somehow.
> 
> My suggestion would be to do two things:
> 
> First, make the state a boolean: crawl external links, with the
> current state yes and the future state no, with "no" simply meaning
> that the rel="" attribute is removed from the links that currently
> have it.
> 
> Second, propose to offer tools in the PyPI interface (and command
> line) to assist authors in making the transition, rather than
> proposing a completely unspecified caching mechanism.  Better to have
> some vaguely specified tools than a completely unspecified caching
> mechanism, and better still to spell out very precisely what those
> tools do.
> 
> Okay, on to the "Phases of transition".  This section gets a lot
> simpler if there are only two stages.  Specifically, we let everyone
> know the change is going to happen, and how long they have, give 'em
> links to migration tools.  Done.  ;-)

This is my opinion as well. Though I think we differ in what the final stage should look like.

> 
> (Okay, so analysis still makes sense: the people who don't have any
> externally hosted files can get a different message, i.e., "Hey, we
> notice that installing your package is slow because you have these
> links that don't go anywhere.  Click here if you'd like PyPI to stop
> sending people on wild goose chases".  The people who have external
> hosted files will need a more involved message.)
> 
> Whew.  Okay, that ends my critique of the PEP as it sits.  Now for an
> outside-the-box suggestion.
> 
> If you'd like to be able to transition people away from spidered links
> in the fewest possible steps, with the least user action, no legal
> issues, and in a completely automated way, note that this can be done
> with a one-time spidering of the existing links to find the download
> links, then adding those links directly to the /simple index, and
> switching off the rel="" attributes.  This can be done without
> explicit user consent, though they can be given the chance to do it
> manually, sooner.
> 
> To implement this you'd need two project-level (*not* release-level)
> fields: one to indicate whether the project is using rel="" or not,
> and one to contain the list of external download links, which would be
> user-editable.
> 
> This overall approach I'm proposing can be extended to also support
> mirroring, since it provides an explicit place to list what it is
> you're mirroring.  (At any rate, it's more explicitly specified than
> any such place in the current PEP.)
> 
> That field can also be fairly easily populated for any given project
> in just a few lines of code:
> 
>    from pkg_resources import Requirement
>    pr = Requirement.parse('Projectname')
>    from setuptools.package_index import PackageIndex
>    pi = PackageIndex(search_path=[], python=None, platform=None)
>    pi.find_packages(pr)
>    all_urls = [dist.location for dist in pi[pr.key]]
>    external_urls = [ url for url in all_urls if not '//pypi.python.org' in url]
> 
> (Although if you want more information, like what kind of link each
> one is, the dist objects actually know a bit more than just the URL.)
> 
> Anyway, I hope you found at least some of all this helpful.  ;-)


I'm still against any off-PyPI hosting of files. I call it "external links" a lot, but what I really mean is the requirement to contact any host other than PyPI to install a package.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From donald at stufft.io  Sun Mar 10 22:16:24 2013
From: donald at stufft.io (Donald Stufft)
Date: Sun, 10 Mar 2013 17:16:24 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <20130310195405.GI9677@merlinux.eu>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
Message-ID: <AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>


On Mar 10, 2013, at 3:54 PM, holger krekel <holger at merlinux.eu> wrote:

> On Sun, Mar 10, 2013 at 14:29 -0400, Donald Stufft wrote:
>> 
>> On Mar 10, 2013, at 2:18 PM, holger krekel <holger at merlinux.eu> wrote:
>> 
>>> On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
>>>> On Mar 10, 2013, at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
>>>>> [...]
>>>>> Transitioning to "pypi-cache" mode
>>>>> -------------------------------------
>>>>> 
>>>>> When transitioning from the currently implicit "pypi-ext" mode to
>>>>> "pypi-cache" for a given package, a package maintainer should 
>>>>> be able to retrieve/verify the historic release files which will 
>>>>> be cached from pypi.python.org.  The UI should present this list
>>>>> and have the maintainer accept it for completing the transition
>>>>> to the "pypi-cache" mode.  Upon future release registration actions,
>>>>> pypi.python.org will perform crawling for the homepage/download sites
>>>>> and cache release files *before* returning a success return code for
>>>>> the release registration.
>>>>> [...]
>>>> 
>>>> Some concerns:
>>>> 
>>>> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
>>> 
>>> Could you detail how you arrive at this conclusion?
>>> (I've seen the claim before but not the underlying reasoning, maybe
>>> i just missed it)
>>> 
>>> There would be prior notifications to the package maintainers.  If they 
>>> don't want to have their packages cached at pypi.python.org, they can set
>>> the mode to "pypi-only" and leave manual instructions.  I suspect there will
>>> be very few people if anyone, objecting to pypi-cache mode.  If that is
>>> false we might need to prolong pypi-ext mode some more for them and 
>>> eventually switch them to pypi-only when we eventually decide to get
>>> rid of external hosting.
>> 
>> I asked VanL. His statement on re-hosting packages was:
>> 
>>    "We could do it if we had permission. The tricky part would be getting permission for already-existing packages."
>> 
>> I'm pretty sure that emailing someone and assuming we have permission if they don't opt-out doesn't count as permission.
> 
> Hum, i saw Jesse Noller saying a few days ago "let them opt out".
> But i guess VanL can trump that :)  If that is true we could change the
> notification to maintainers of B packages that hosting mode is going to
> change to pypi-only, which would lose their release files unless they
> opt-in to pypi-cache.  As long as that is a no-brainer for them, we are
> not asking for much and can count on most people's good will to not make
> other people's installation life harder.
> 
> Besides, admins could still set the "pypi-ext" mode if a maintainer can
> explain why it's a problem for them to agree to "pypi-cache" or
> "pypi-only".  I'd really like to not have too many packages lingering
> around in "pypi-ext" mode if it can be avoided.

0 packages allowing external links is the only useful end goal.

> 
>>> 
>>>> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
>>> 
>>> fragility: not sure it's too bad.  Once the mode is activated release
>>> registration ("submit" POST action on "/pypi" http endpoint) will only
>>> succeed if the corresponding releases can be found through homepage/download.
>>> Changing the mode to pypi-cache in the presence of historic release
>>> files hosted elsewhere needs a good pypi.python.org UI interaction and
>>> may take several tries if necessary sites cannot be reached.  Nevertheless,
>>> this step is potentially fragile [X].
>> 
>> I see, so pypi-cache would only be triggered once during release creation. Cache makes it sound like we'd continuously monitor the given external urls instead of it actually being a pull based method of getting files.
> 
> Right, we need to avoid cache invalidation problems by only allowing
> updates at user-chosen points in time (there might also be an explicit
> "update cache" button in case a maintainer pushes an egg/wheel later).
> It's still technically a cache i think but the term "rehost" would 
> work as well i guess.
> 
>> [...]
>>> Back to pypi-cache: it is there to make it super-easy for package
>>> maintainers.  There are all kinds of release habits and scripts
>>> pushing out things to google/bitbucket/github/other sites.  With
>>> "pypi-cache" they don't need to change any of that.  They just need
>>> to be fine with pypi.python.org pulling in the packages for caching.
>> 
>> Yes I understand the goal here. The problem is that there's not really
>> a good way to secure this without requiring changes to their workflow. 
>> At best they'll have to push information about every file so that PyPI
>> is able to verify the files it is downloading, and if we are requiring
>> them to push data about those files we might as well require them to
>> push the files themselves. 
> 
> Is this about protection against package tampering?  If so, I think a
> proper solution involves maintainers signing their release files but
> this is outside the intended scope of the PEP.

This part of it is, yes; it's also about accidentally mirroring an unreleased file. We're going to need to require certain information to be pushed to PyPI anyway, so requiring the files to be pushed to PyPI in addition to that is not a big deal. Furthermore, if people really, really want this pull-based behavior they can easily set it up outside of PyPI.
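
To make the "push information about every file" point concrete, here is a minimal sketch of the check PyPI would have to run on a fetched file, assuming the maintainer has separately pushed a digest for it. The function name and the choice of SHA-256 are assumptions for illustration; the PEP draft specifies neither.

```python
import hashlib


def matches_declared_digest(data, declared_sha256):
    """Return True if the fetched bytes hash to the digest the maintainer declared."""
    return hashlib.sha256(data).hexdigest() == declared_sha256.lower()


# Hypothetical release file and the digest its maintainer would have pushed.
payload = b"example release file contents"
declared = hashlib.sha256(payload).hexdigest()

untampered_ok = matches_declared_digest(payload, declared)
tampered_ok = matches_declared_digest(payload + b" plus injected code", declared)
```

The maintainer already needs the file on hand to compute the digest, so uploading the file itself at that point is little extra work.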

> 
> Otherwise, the "re-hosting" process for pypi-cache mode is at least as
> secure as currently where all hosts issuing pip/easy_install commands
> visit external sites and can thus be MITM-attacked.  For pypi-only
> server packages it's safer because no crawling takes place.

It's at least as secure as a completely insecure process. That's not setting a very high bar.

> 
> In any case, asking people to change their release process is not 
> a no-brainer.  The PEP tries to avoid this source of friction.
> That being said, i think we both agree to recommend maintainers to
> (eventually) go for pypi-only and change their release processes
> accordingly.  This PEP is not the end of the story of evolving package
> hosting and i'd like to be careful about asking maintainers to change 
> how they do things.

If someone's release process forces PyPI to have security, uptime, and privacy issues then I'm very sorry but their release process is going to need to change. It's not fun, it's a shitty situation, but trying to bend over backwards to enable their current release processes is like trying to bend over backwards to enable people to still walk into their bank's vault and grab a stack of currency.

This isn't a case of "I don't like the way your process works, and I want you to change it". This is a case of "Your process actively causes the greater Python community to be vulnerable to a host of issues".

> 
>> This also has the effect we can provide
>> immediate feedback when files do not validate on PyPI.
> 
> At release registration or switch-to-pypi-rehost time we could also do
> package validation but i am inclined to see this as out of scope
> for this PEP which tries to focus on the minimal steps to move 
> from pypi-ext to everything-hosted-through-pypi.python.org.
> 
> cheers,
> holger
> 
>> 
>>> 
>>> We might think about phasing out pypi-cache after some larger time
>>> frame so that we eventually only have pypi-only and things are eventually
>>> simple and saner.
>>> 
>>> best,
>>> holger
>>> 
>>> 
>>> 
>>>> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission.
>>>> 
>>>> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues.
>>>> 
>>>> [1] There is still a small window here where someone could MITM PyPI fetching these files; however, since it would be a one-time-and-done deal, this risk is minimal and is worth it to move to a PyPI-only solution.
>>>> 
>>>> -----------------
>>>> Donald Stufft
>>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>>>> 
>>> 
>>> 
>> 
>> 
>> -----------------
>> Donald Stufft
>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>> 
> 
> 


There isn't a good middle ground here: any externally hosted or spidered file leads us back to at least 2 of the 3 major issues I outlined. The end goal *needs* to be that all external links are removed from PyPI's simple page, and only files hosted on PyPI are accepted there.

The only really useful discussion is how we get from where we are now to the zero external links/files situation. At some point this is going to *require* breaking things for anyone who hasn't put their files on PyPI. Adding more steps only draws out the pain; like a bandaid, it's best if it's ripped off quickly.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From pje at telecommunity.com  Sun Mar 10 23:41:22 2013
From: pje at telecommunity.com (PJ Eby)
Date: Sun, 10 Mar 2013 18:41:22 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
Message-ID: <CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>

On Sun, Mar 10, 2013 at 5:16 PM, Donald Stufft <donald at stufft.io> wrote:
> If someone's release process forces PyPI to have security, uptime, and privacy issues then I'm very sorry but their release process is going to need to change. It's not fun, it's a shitty situation, but trying to bend over backwards to enable their current release processes is like trying to bend over backwards to enable people to still walk into their bank's vault and grab a stack of currency.

When people in group 1 express disapproval of people in group 2, this
creates a rallying effect among members of group 1, and a *negative*
counter-reaction in members of group 2.

This is effective if, and *only* if, the people in group 2 have less
power in the situation than the people in group 1.   For example, if
co-operation from the people in group 2 is not needed in order to
carry out the wishes of group 1.

However, in the situation under discussion, such co-operation is
required, which means an alternative motivational strategy is
indicated.

That strategy involves giving persons in group 2 a better reason to
care than "because we in group 1 think you group 2 people are
thieves."

And by better, I mean, a reason that *benefits group 2*, and more
specifically, each individual in group 2 who chooses to co-operate.

And ideally, you work also to lower the cost of that co-operation.

That's what *this* thread was originally about (lowering the cost of
co-operation), before these "burn the witch" sentiments started up
again.  So, why not just step aside and let the adults go back to
working on the actual problem?

Just kidding, of course.   ;-)  That's an example of me using the same
type of communication style, in the opposite direction: spewing
disapproval at something I don't like, instead of giving you a reason
that benefits *you*, to do what I want.  See how it feels, going the
other direction?  Did it motivate you to be helpful?  I'm guessing
not.  ;-)

Anyway, my point is this: people don't like it one bit when you tell
them what to do.

If you tell them, "you must do X", you get resistance.

But if you offer them a choice, "Are you going to do X or Y?", there's
much less resistance.

And if one choice is less convenient than the other, most will pick
the easier choice.

So, would you rather fight with developers to make them do it your
way, or have most of them do exactly what you want and most of the
rest get pretty close, but not have to fight with them about it?

Right now, the impression you and certain other people are giving me
is that it is more important that whatever action we take be seen as
censuring the practice of off-PyPI hosting, than that we actually fix
the problems!

And it's difficult to take such a position seriously, because the
post-hoc rationalization of harms is, well, unconvincing at best to a
neutral party.  When PyPI was first built, it didn't *have* hosting,
so there was nothing morally wrong about off-site hosting then.

And when hosting was first added, automated downloading didn't exist
yet, either.  So it still wasn't wrong.

And when I added automated downloading, I made the choice to encourage
people to collaborate by making it as easy as possible.  So offsite
hosting still wasn't wrong, in fact it was a documented alternative.

And that's been the case for, oh, 8 years now?

So what you're actually doing isn't crusading against evil-doers, it's
more like saying that every restaurant that isn't McDonalds should be
immediately remodeled, because you have just noticed the shocking
trend that hardly any of those restaurants will serve you food as
quickly!

And that of course, the restaurant owners should undertake the
remodeling and procedure changes, retraining, retooling, etc. at
*their* expense, on *your* timeline.

Just so that *you*, who *chose to visit those restaurants in the first
place*, can get your food a bit more quickly.

Sure, I know that's not how *you* see it.

But surely you can see that's how the *restaurant owners* are going to see it.

And if you want them to co-operate, it's probably going to be in your
interest to focus your attention on their side of the equation, rather
than on yours.  You already agree with your point of view.  They
don't.

I realize that can be difficult to do when you have strong feelings
about a subject.  For example, as I write this I keep backing up and
deleting all sorts of unhelpful things I find myself wanting to say.
;-)

And I'm doing that because I'm consciously reminding myself that
*getting to a solution* is more important to me than *making you feel
bad* for being "wrong on the internet".

What's more important to you?  The *actual* state of PyPI, or the
state of who is to be considered right or wrong?

If it's the former, you would probably find it useful to your goals,
to please refrain from calling me and that other 10% of PyPI thieves.
Or really any other names whatsoever, explicitly OR implicitly.

Thanks.

From donald at stufft.io  Mon Mar 11 01:25:16 2013
From: donald at stufft.io (Donald Stufft)
Date: Sun, 10 Mar 2013 20:25:16 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
Message-ID: <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>

On Mar 10, 2013, at 6:41 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Sun, Mar 10, 2013 at 5:16 PM, Donald Stufft <donald at stufft.io> wrote:
>> If someone's release process forces PyPI to have security, uptime, and privacy issues then I'm very sorry but their release process is going to need to change. It's not fun, it's a shitty situation, but trying to bend over backwards to enable their current release processes is like trying to bend over backwards to enable people to still walk into their bank's vault and grab a stack of currency.
> 
> When people in group 1 express disapproval of people in group 2, this
> creates a rallying effect among members of group 1, and a *negative*
> counter-reaction in members of group 2.
> 
> This is effective if, and *only* if, the people in group 2 have less
> power in the situation than the people in group 1.   For example, if
> co-operation from the people in group 2 is not needed in order to
> carry out the wishes of group 1.
> 
> However, in the situation under discussion, such co-operation is
> required, which means an alternative motivational strategy is
> indicated.
> 
> That strategy involves giving persons in group 2 a better reason to
> care than "because we in group 1 think you group 2 people are
> thieves."
> 
> And by better, I mean, a reason that *benefits group 2*, and more
> specifically, each individual in group 2 who chooses to co-operate.
> 
> And ideally, you work also to lower the cost of that co-operation.
> 
> That's what *this* thread was originally about (lowering the cost of
> co-operation), before these "burn the witch" sentiments started up
> again.  So, why not just step aside and let the adults go back to
> working on the actual problem?
> 
> Just kidding, of course.   ;-)  That's an example of me using the same
> type of communication style, in the opposite direction: spewing
> disapproval at something I don't like, instead of giving you a reason
> that benefits *you*, to do what I want.  See how it feels, going the
> other direction?  Did it motivate you to be helpful?  I'm guessing
> not.  ;-)
> 
> Anyway, my point is this: people don't like it one bit when you tell
> them what to do.
> 
> If you tell them, "you must do X", you get resistance.
> 
> But if you offer them a choice, "Are you going to do X or Y?", there's
> much less resistance.
> 
> And if one choice is less convenient than the other, most will pick
> the easier choice.
> 
> So, would you rather fight with developers to make them do it your
> way, or have most of them do exactly what you want and most of the
> rest get pretty close, but not have to fight with them about it?
> 
> Right now, the impression you and certain other people are giving me
> is that it is more important that whatever action we take be seen as
> censuring the practice of off-PyPI hosting, than that we actually fix
> the problems!
> 
> And it's difficult to take such a position seriously, because the
> post-hoc rationalization of harms is, well, unconvincing at best to a
> neutral party.  When PyPI was first built, it didn't *have* hosting,
> so there was nothing morally wrong about off-site hosting then.
> 
> And when hosting was first added, automated downloading didn't exist
> yet, either.  So it still wasn't wrong.
> 
> And when I added automated downloading, I made the choice to encourage
> people to collaborate by making it as easy as possible.  So offsite
> hosting still wasn't wrong, in fact it was a documented alternative.
> 
> And that's been the case for, oh, 8 years now?
> 
> So what you're actually doing isn't crusading against evil-doers, it's
> more like saying that every restaurant that isn't McDonalds should be
> immediately remodeled, because you have just noticed the shocking
> trend that hardly any of those restaurants will serve you food as
> quickly!
> 
> And that of course, the restaurant owners should undertake the
> remodeling and procedure changes, retraining, retooling, etc. at
> *their* expense, on *your* timeline.
> 
> Just so that *you*, who *chose to visit those restaurants in the first
> place*, can get your food a bit more quickly.
> 
> Sure, I know that's not how *you* see it.
> 
> But surely you can see that's how the *restaurant owners* are going to see it.
> 
> And if you want them to co-operate, it's probably going to be in your
> interest to focus your attention on their side of the equation, rather
> than on yours.  You already agree with your point of view.  They
> don't.
> 
> I realize that can be difficult to do when you have strong feelings
> about a subject.  For example, as I write this I keep backing up and
> deleting all sorts of unhelpful things I find myself wanting to say.
> ;-)
> 
> And I'm doing that because I'm consciously reminding myself that
> *getting to a solution* is more important to me than *making you feel
> bad* for being "wrong on the internet".
> 
> What's more important to you?  The *actual* state of PyPI, or the
> state of who is to be considered right or wrong?
> 
> If it's the former, you would probably find it useful to your goals,
> to please refrain from calling me and that other 10% of PyPI thieves.
> Or really any other names whatsoever, explicitly OR implicitly.
> 
> Thanks.

I don't think anyone is bad here, nor am I arguing against any particular person or group of people. I'm arguing against a practice and a system. You're going out of your way to find excuses to throw all sorts of stop energy here. All I said was that their process needed to change, I even expressed sympathy with the fact it did need to change. 

I've never called *anyone* on this list, or on PyPI a thief. My analogy served only to put into light that the system that I'm trying to change is insecure, just like allowing anyone to walk into a bank vault and pick up money would be insecure. I fully believe that the people using such a system are completely trustworthy people. But just because *they* are trustworthy doesn't mean that a system which allows *anyone* to attack other Python developers is *ok*.

When discussing security of a system it's necessary to divorce yourself from the implementations of things. When you get wrapped up in the implementation you turn things into an Us vs Them game (as evidenced by several of your messages) instead of discussing the merits of the various systems and which ones best serve the greatest needs of the community.

I believe I've said it before, but if not, here it is again: I will donate *my free time* to help ANYONE who is using a release process which this change would break to engineer a new release process that has as little impact on their actual process as possible and does not have all these issues for the greater Python community. And let's just be clear, I'm offering to put aside a massive list of things I need to be doing to help the very folks you're saying I'm disparaging.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From pje at telecommunity.com  Mon Mar 11 07:09:16 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 11 Mar 2013 02:09:16 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
Message-ID: <CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>

On Sun, Mar 10, 2013 at 8:25 PM, Donald Stufft <donald at stufft.io> wrote:
> I don't think anyone is bad here, nor am I arguing against any particular person or group of people. I'm arguing against a practice and a system. You're going out of your way to find excuses to throw all sorts of stop energy here.

Calling a legitimate disagreement with your point of view "stop
energy" seems inappropriate to me, since my issue is with you
derailing the topic of how to get people to *voluntarily* migrate to a
better situation than the present one, and to develop tools for that
process.  The only thing I wish you to stop is the repeated assertion
without proof that 1) external links must go *and* 2) this must be an
enforced directive rather than a (highly-encouraged) option.

I have even gone so far as to suggest, earlier in this thread, what
evidence I would find at least suggestive of your POV.  But your
response to that and prior challenges to those assertions, has been
simply to move your goalpost.  E.g. from "current uptime is bad" to
"any uptime lower than PyPI's is totally unacceptable".

I, on the other hand, have moved in the direction of *your* proposals
repeatedly, making adjustments as I find actually-convincing evidence
and/or reasoning, or find ways to deal with the issues.  I have
compromised quite a bit.  (And have already spent a fair amount of
time writing setuptools code to lay a foundation for these changes.)

You, as far as I can tell, have not moved your position in the slightest.

Which of these is "stop energy"?

It is not the case that external links must be removed from PyPI in
order to ensure security, or uptime.  And it is *especially* not the
case that you are the BDFL of uptime.  You're definitely not the BDFL
of uptime for any given project hosted on PyPI, that you *voluntarily
choose* to make a part of your build process.  If your primary
argument is that project X must host its files on PyPI because of your
build process, then I think you misunderstand open source, and also
the part where you *chose* to make it part of your build process.  It
certainly doesn't give you the right to force projects Y, Z, and Q --
that you don't even use! -- to also host their projects on PyPI,
because project X -- the one you do use -- has a slow or unreliable
file host!

It seems disingenuous to then shift the argument back to security when
challenged on uptime, and back to uptime when challenged on security.
We've looped back and forth over those for some time: when I point out
that wheels have signatures which will make off-site hosting
relatively unimportant to the security picture, you jump back to
talking about uptime.  When I point out that uptime is a consensual
factor that in no way justifies legislating what other people can do
with their projects, you go back to talking about security.

Make up your mind.  What problem are you actually trying to solve?

(I expect your response on wheels to be that wheels aren't there yet,
etc., but that isn't actually a response to the objection unless
you're going to change your position to, "okay, external links to file
formats that can be signed can stay," or something of that sort.
Otherwise, you're not actually compromising, just using the fact that
wheels aren't in common use yet as an argument to keep the position
you started with.)


> My analogy served only to put into light that the system that I'm trying to change is insecure, just like allowing anyone to walk into a bank vault and pick up money would be insecure. I fully believe that the people using such a system are completely trustworthy people. But just because *they* are trustworthy doesn't mean that a system which allows *anyone* to attack other Python developers is *ok*.

And my analogy served only to put into light the part where you're
insisting that one group of people change for the benefit of a group
which is already benefiting from their pre-existing generosity.

That being said, I do see that I could have misinterpreted the intent
of your analogy -- it sounded like you were saying that the developers
who host off-PyPI were thieves walking into your bank and taking your
money (i.e., analogizing theft with inconveniencing you by making your
builds fail or run slowly).

Though to be honest, I still don't comprehend how else to make any
kind of sense to that analogy in its original context.  Who is the
bank?  Whose money is being taken?  The whole thing is utterly
confusing to me if I try to take it any other way than the way I did,
because it doesn't seem to have any other simple 1:1 mapping to the
situation, as far as I can see.   Your explanation seems terribly
abstract and tortured to me, as far as analogies go.


> When discussing security of a system it's necessary to divorce yourself from the implementations of things. When you get wrapped up in the implementation you turn things into a Us vs Them game (as evidenced by several of your messages) instead of discussing the merits of the various systems and which ones serve the greatest needs of the community the best.

I think you've got things backwards here.  It's you who's been arguing
that the solution to the problem of "improved uptime and security" is
best implemented by "ban all non-PyPI hosting".  It is I who has been
arguing that this is a premature judgment and rush to implementation,
without considering all of the design angles.  And I am the one asking
you to stop insisting on this one implementation and state your
*actual* problem with external links.

(By which I mean, a problem stated such that, if you're given a
solution that *doesn't* involve banning them from PyPI, you aren't
going to rejigger the problem statement so that it once again requires
banning.  That's moving the goalposts, and that's what keeps happening
in this discussion, at least as far as I can see.  I, on the other
hand, have given you my actual problem with your proposal, and I have
not moved *my* goalposts.  Instead, I've moved towards your position,
more than once.  But I've moved as far towards it as I can go at this
time, without you providing any additional evidence or explanation or
*some* kind of engagement with the points that I've raised above that
you've previously ignored, in this thread and others.)

From regebro at gmail.com  Mon Mar 11 07:23:35 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Mon, 11 Mar 2013 07:23:35 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
Message-ID: <CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>

On Mon, Mar 11, 2013 at 7:09 AM, PJ Eby <pje at telecommunity.com> wrote:
> I think you've got things backwards here.  It's you who's been arguing
> that the solution to the problem of "improved uptime and security" is
> best implemented by "ban all non-PyPI hosting".

The uptime problem is *only* solvable by minimizing the number of
hosts involved. The minimum number of hosts is one. That means we
should get all releases onto PyPI. This has been obvious for years,
and I'm overjoyed to see that work is finally being done to make that
happen. Discussion should be about how to best do that, not if we
should do that or not.
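
As a rough illustration of the arithmetic behind the "fewer hosts" argument: if an install needs every involved host to be reachable, and the hosts fail independently, the chance of success is the product of the individual availabilities. This is only a sketch; the independence assumption, the function name and the figures are all made up for illustration and are not measurements of PyPI or any other host.

```python
from math import prod


def combined_availability(host_availabilities):
    """Chance that every host needed for an install is reachable.

    Assumes hosts fail independently of each other, which is a
    simplifying assumption for illustration only.
    """
    return prod(host_availabilities)


# Hypothetical availability figures, not real measurements.
index_only = combined_availability([0.999])
index_plus_two_external = combined_availability([0.999, 0.99, 0.99])

print(f"index only:               {index_only:.4f}")
print(f"index + 2 external hosts: {index_plus_two_external:.4f}")
```

Each additional host can only lower the product, which is the point being made here. It says nothing about whether a local mirror would achieve the same effect.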

We can also discuss wordings. Nobody is for example trying to strictly
speaking ban hosting on other hosts than PyPI. But if you do host on
another server, your package will not be a part of the Python
ecosystem, and it will not be installable by easy_install or pip or
buildout, etc. You can call that a "ban" if you want, but maybe that
causes negative connotations that are best avoided. But whatever you
call it the end goal and result is the same. Packages not hosted on
PyPI will not be easily installable. This is, and must be, the end
goal.

Now let's discuss how to get there instead.

//Lennart

From ronaldoussoren at mac.com  Mon Mar 11 09:06:21 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Mon, 11 Mar 2013 09:06:21 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting
	at	pypi site
In-Reply-To: <CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
Message-ID: <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>


On 11 Mar, 2013, at 7:23, Lennart Regebro <regebro at gmail.com> wrote:

> On Mon, Mar 11, 2013 at 7:09 AM, PJ Eby <pje at telecommunity.com> wrote:
>> I think you've got things backwards here.  It's you who's been arguing
>> that the solution to the problem of "improved uptime and security" is
>> best implemented by "ban all non-PyPI hosting".
> 
> The uptime problem is *only* solvable by minimizing the number of
> hosts involved. The minimum number of hosts is one.

I mostly agree when you change hosts to websites ;-). 

> That means we
> should get all releases onto PyPI.

But this isn't necessarily true, there is another solution: mirror your requirements locally.  That way you don't have problems when the remote PyPI server is unreachable for some reason, and you can be sure that the exact version you tested with is available and used.

> This has been obvious for years,
> and I'm overjoyed to see that work is finally being done to make that
> happen. Discussion should be about how to best do that, not if we
> should do that or not.
> 
> We can also discuss wordings. Nobody is for example trying to strictly
> speaking ban hosting on other hosts than PyPI. But if you do host on
> another server, your package will not be a part of the Python
> ecosystem, and it will not be installable by easy_install or pip or
> buildout, etc. You can call that a "ban" if you want, but maybe that
> causes negative connotations that are best avoided. But what ever you
> call it the end goal and result is the same. Packages not hosted on
> PyPI will not be easily installable. This is, and must be, the end
> goal.

The end goal is to make it easy and safe to install packages.

> 
> Now let's discuss how to get there instead.

Is it even clear why numerous archives aren't hosted on PyPI?  IMHO it would be better to 
remove barriers than force projects to host files on PyPI.

Ronald

> 
> //Lennart


From ronaldoussoren at mac.com  Mon Mar 11 09:14:11 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Mon, 11 Mar 2013 09:14:11 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting
	at	pypi site
In-Reply-To: <AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
Message-ID: <BAA01A69-0061-433D-8DCE-68D21CE138C8@mac.com>


On 10 Mar, 2013, at 22:16, Donald Stufft <donald at stufft.io> wrote:
> 
> There isn't a good middle ground here, any externally hosted or spidered file leads us back to at least 2 of the 3 major issues I outlined. The end goal *needs* to be that all external links are removed from PyPI's simple page, and only files hosted on PyPI are accepted there.

Why is that? Is there something in the proposed package signing solution that won't work when files aren't on PyPI? If so, will it still be possible to run in-house package repositories (partial PyPI mirrors and/or repositories with non-public software)?

Ronald

From regebro at gmail.com  Mon Mar 11 09:18:28 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Mon, 11 Mar 2013 09:18:28 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
	<212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
Message-ID: <CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>

On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
> But this isn't necessarily true, there is another solution: mirror your requirements locally.

I do that. This is not a solution, because your requirements yesterday
are not your requirements tomorrow.

> Is it even clear why numerous archives aren't hosted on PyPI?

No, the only one that has mentioned why is Marc-André, I think, whose
eGenix packages are distributed as binary packages for loads of
different platforms. It's unclear to me if all these binary packages
should be uploaded to PyPI, and it is also unclear to me why they
can't be, it seems to be mostly a case of it being too much work.

He also mentioned the big Python distributions eGenix does as being
too large for PyPI, but I don't really see the point of uploading
Python distributions to PyPI, they can't be installed with Python
installers anyway.

>  IMHO it would be better to remove barriers than force projects to host files on PyPI.

Nobody has really been able to point out any real barriers, so we
don't know what they are or if they exist.

//Lennart

From ronaldoussoren at mac.com  Mon Mar 11 09:33:51 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Mon, 11 Mar 2013 09:33:51 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
	<212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
	<CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
Message-ID: <826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com>


On 11 Mar, 2013, at 9:18, Lennart Regebro <regebro at gmail.com> wrote:

> On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>> But this isn't necessarily true, there is another solution: mirror your requirements locally.
> 
> I do that. This is not a solution, because your requirements yesterday
> is not your requirements tomorrow.

So? When your requirements change you change the local mirror.

> 
>> Is it even clear why numerous archives aren't hosted on PyPI?
> 
> No, the only one that has mentioned why is Marc-André, I think, whose
> eGenix packages are distributed as binary packages for loads of
> different platforms. It's unclear to me if all these binary packages
> should be uploaded to PyPI, and it is also unclear to me why they
> can't be, it seems to be mostly a case of it being too much work.
> 
> He also mentioned the big Python distributions eGenix does as being
> too large for PyPI, but I don't really see the point of uploading
> Python distributions to PyPI, they can't be installed with Python
> installers anyway.

Some reasons I've seen mentioned in the past:

* In some big companies it might be easier to publish archives on the company webserver than on PyPI due to truckloads of red tape on their part (not something we can fix)

* It is easier to publish all related archives in the same place for projects where the python package is just one component (for example client libraries for a network server)

* Authors might not know it is possible to upload archives to PyPI

> 
>> IMHO it would be better to remove barriers than force projects to host files on PyPI.
> 
> Nobody has really been able to point out any real barriers, so we
> don't know what they are or if they exist.

It may be as simple as lack of knowledge (e.g. "I didn't know I could host files on PyPI"),
or unnecessary friction in the release process.

I guess the only way we will know why some authors don't upload archives to
PyPI is to ask (some of) them.   

Ronald

From mal at egenix.com  Mon Mar 11 10:23:03 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 11 Mar 2013 10:23:03 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
	<212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
	<CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
Message-ID: <513DA277.2010809@egenix.com>

On 11.03.2013 09:18, Lennart Regebro wrote:
> On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>> But this isn't necessarily true, there is another solution: mirror your requirements locally.
> 
> I do that. This is not a solution, because your requirements yesterday
> is not your requirements tomorrow.
> 
>> Is it even clear why numerous archives aren't hosted on PyPI?
> 
> No, the only one that has mentioned why is Marc-André, I think, whose
> eGenix packages are distributed as binary packages for loads of
> different platforms. It's unclear to me if all these binary packages
> should be uploaded to PyPI, and it is also unclear to me why they
> can't be, it seems to be mostly a case of it being too much work.

I've listed all the reasons in one of the previous emails:

http://mail.python.org/pipermail/catalog-sig/2013-March/005502.html

Others will likely have additional reasons, like e.g.

* the PyPI uploads not being compatible with their release process

* not knowing that it's possible to host files on PyPI - after
  all it's an *index*, not a repository :-)

* still believing that PyPI is an unreliable hosting provider
  due to the many downtimes and problems it had in the past - which
  is no longer true today

* not wanting to host and maintain files in several different
  places

* not wanting to host release files at all, i.e. have people
  check out the version from a repository instead of doing
  the download, unzip, install dance

* not wanting to separate associated library or product
  code from the Python wrapper code (think e.g. the
  Python interface for subversion)

* not being allowed to upload files to external servers
  by company policy, or having to deal with a company
  policy that makes this difficult/unattractive

* having issues with the added latency of PyPI downloads compared
  to a simple file based index hosted on a company web server

* having a strong need to know the number of downloads per
  package and associated statistics such as downloads per
  country, per year/month/day/hour

* not wanting to give up access to the download log files

* having a requirement to restrict downloads on a per country
  basis, e.g. for export controlled software or software which
  may not be imported/used in certain countries

* having PyPI not provide the technical means to host the
  release files, e.g. due to the releases using a format
  which is not supported by PyPI (e.g. all the ActiveState
  packages - http://code.activestate.com/pypm/)

* user experience/support issues:
  if the package has external dependencies,
  or needs special setup, it may provide a better user experience
  to host the Python wrapper on the same page as the dependencies
  and instructions on how to install them; rather than having
  them on PyPI which lets people believe that a simple
  "pip install something" will get them a working setup

Those are just a few things that come to mind. I'm sure there
are more issues that keep authors from uploading their
packages to PyPI.

Overall, I think we should encourage people to make their
code available through PyPI and make it attractive to them,
but keep the possibility to continue using external hosting
platforms, should they run into issues that PyPI cannot
solve for them.

> He also mentioned the big Python distributions eGenix does as being
> too large for PyPI, but I don't really see the point of uploading
> Python distributions to PyPI, they can't be installed with Python
> installers anyway.

Not sure what you mean here.

PyPI is also used to index Python projects which are not Python
packages to be installed by pip/easy_install/etc.

Some of those may also want to

>>  IMHO it would be better to remove barriers than force projects to host files on PyPI.
> 
> Nobody has really been able to point out any real barriers, so we
> don't know what they are or if they exist.

Again, please see the email where I listed the ones affecting
at least eGenix.

Most of those can be addressed in one way or another, e.g.
by having PyPI cache the files, provide access to the download
counts by country, provide a way to host separate indexes for
UCS2/UCS4 egg files, etc.

The only issues that need more investigation are the PyPI license
terms and the general issue of not being able to host export
regulated files on PyPI.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 11 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From regebro at gmail.com  Mon Mar 11 10:31:45 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Mon, 11 Mar 2013 10:31:45 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
	<212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
	<CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
	<826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com>
Message-ID: <CAL0kPAU-JbVa0AsSfSrOA3yKgiXxb85=9dwg_MOuu1ULGv-qng@mail.gmail.com>

On Mon, Mar 11, 2013 at 9:33 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>
> On 11 Mar, 2013, at 9:18, Lennart Regebro <regebro at gmail.com> wrote:
>
>> On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>>> But this isn't necessarily true, there is another solution: mirror your requirements locally.
>>
>> I do that. This is not a solution, because your requirements yesterday
>> is not your requirements tomorrow.
>
> So? When your requirements change you change the local mirror.

How? You can't mirror something that you can't reach.
The only local solution to this is to mirror every file that is
reachable via PyPI, in advance. That is obviously *not* a feasible
solution.

> I guess the only way we will know why some authors don't upload archives to
> PyPI is to ask (some of) them.

Right. I don't think it's feasible to discuss speculative reasons, and
in any case I strongly believe that whatever reason people have, we
still should not let the Python tools install packages from
third-party hosts by default. If you have your own index (like Plone
currently does, largely because of the problems caused by having
packages on several different servers) that should of course be
allowed.

I have a list of emails already, if somebody wants to ask people. :-)
It's 2651 emails though, and I think most of those people have
registered packages that don't actually have *any* distributions.
:-) I didn't check for that.

//Lennart

From ronaldoussoren at mac.com  Mon Mar 11 10:56:23 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Mon, 11 Mar 2013 10:56:23 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAL0kPAU-JbVa0AsSfSrOA3yKgiXxb85=9dwg_MOuu1ULGv-qng@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
	<212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
	<CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
	<826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com>
	<CAL0kPAU-JbVa0AsSfSrOA3yKgiXxb85=9dwg_MOuu1ULGv-qng@mail.gmail.com>
Message-ID: <DAEC3884-78A8-4CDD-BDFD-37DB458E7F6B@mac.com>


On 11 Mar, 2013, at 10:31, Lennart Regebro <regebro at gmail.com> wrote:

> On Mon, Mar 11, 2013 at 9:33 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>> 
>> On 11 Mar, 2013, at 9:18, Lennart Regebro <regebro at gmail.com> wrote:
>> 
>>> On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>>>> But this isn't necessarily true, there is another solution: mirror your requirements locally.
>>> 
>>> I do that. This is not a solution, because your requirements yesterday
>>> is not your requirements tomorrow.
>> 
>> So? When your requirements change you change the local mirror.
> 
> How? You can't mirror something that you can't reach.

Now I'm confused. You want to change a dependency without testing it beforehand?

I'm probably getting old, but for production software I tend to download and archive
all versions used instead of assuming that all software can at all times easily be
downloaded. 

When I want to update a dependency (new version, new external package)
I first download and test, then add it to the local archive.

Part of the reason for this is that the production site doesn't have a fast, always-on
internet connection, another part is that the local archive ensures I can reproduce
the exact installation on another server without cloning the first one.
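
A minimal sketch of the "test first, then archive" step, assuming the
archive is a plain directory of release files plus a JSON manifest of
checksums. The function name, the manifest file name and the layout are
illustrative assumptions, not an existing tool:

```python
import hashlib
import json
import shutil
from pathlib import Path


def add_to_local_archive(tested_file, archive_dir):
    """Copy an already-tested release file into the archive and record its SHA-256.

    Hypothetical helper: the manifest format is an assumption for illustration.
    """
    tested_file = Path(tested_file)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Keep the original file name so installers can match project and version.
    target = archive_dir / tested_file.name
    shutil.copy2(tested_file, target)

    # Record the checksum so another server can verify it got the same bytes.
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    manifest_path = archive_dir / "manifest.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    manifest[target.name] = digest
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return digest
```

A directory like this can then be used without network access, for
example with pip's --no-index and --find-links options.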

> The only local solution to this is to mirror every file that is
> reachable via PyPI, in advance. That is obviously *not* a feasible
> solution.
> 
>> I guess the only way we will know why some authors don't upload archives to
>> PyPI is to ask (some of) them.
> 
> Right. I don't think it's feasible to discuss speculative reasons, and
> in any case I strongly believe that whatever reason people have, we
> still should not let the Python tools install packages from
> third-party hosts by default.

I don't have problems with installing from third-party hosts; as someone noted
earlier, some of those third-party hosts have very high uptimes themselves (github,
bitbucket, ...).

The current way to get to those third-party hosts is hacky and could be changed,
for example by adding a PyPI API for registering download links and other metadata
for specific files (that is, a way to add items to the file list on PyPI that aren't hosted on PyPI).  

I don't know how feasible this would be when packages are signed
using TUF, but it could work with Giovanni's proposal using PGP signatures. 

A problem with adding such an API is that there is no reason to assume that
it would actually be used: using that API would be about as much work as
using the upload API in the first place.

> If you have your own index (like Plone
> currently does, largely because of the problems caused by having
> packages on several different servers) that should of course be
> allowed.
> 
> I have a list of emails already, if somebody wants to ask people. :-)

That won't be me, I don't have enough time available to act upon the results.

Ronald

From holger at merlinux.eu  Mon Mar 11 11:02:25 2013
From: holger at merlinux.eu (holger krekel)
Date: Mon, 11 Mar 2013 10:02:25 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CALeMXf5b41ppcRz6EFdGXuZNknw58bZ8M6miDLz-ac5DzHgRRw@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<CALeMXf5b41ppcRz6EFdGXuZNknw58bZ8M6miDLz-ac5DzHgRRw@mail.gmail.com>
Message-ID: <20130311100225.GL9677@merlinux.eu>

Hi Philip,

thanks for your helpful review, almost all makes sense to me ...
some more inlined comments below.  Up front, i am open to you 
co-authoring the PEP if you like and share the goal to find a minimum
viable approach to speed up and simplify the interactions for installers.

On Sun, Mar 10, 2013 at 15:41 -0400, PJ Eby wrote:
> On Sun, Mar 10, 2013 at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
> > Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig:
> > scrutiny and feedback welcome.
> 
> Hi Holger.  I'm having some difficulty interpreting your proposal
> because it is leaving out some things, and in other places
> contradicting what I know of how the tools work.  It is also a bit at
> odds with itself in some places.

Certainly, it was a quick draft to get the process going and useful
feedback which worked so far :)

> For instance, at the beginning, the PEP states its proposed solution
> is to host all release files on PyPI, but then the problem section
> describes the problems that arise from crawling external pages:
> problems that can be solved without actually hosting the files on
> PyPI.
>
> To me, it needs a clearer explanation of why the actual hosting part
> also needs to be on PyPI, not just the links.  In the threads to date,
> people have argued about uptime, security, etc., and these points are
> not covered by the PEP or even really touched on for the most part.

Makes sense to clarify this more.

> (Actually, thinking about that makes me wonder....  Donald: did your
> analysis collect any stats on *where* those externally hosted files
> were hosted?  My intuition says that the bulk of the files (by *file
> count*) will come from a handful of highly-available domains, i.e.
> sourceforge, github, that sort of thing, with actual self-hosting
> being relatively rare *for the files themselves*, vs. a much wider
> range of domains for the homepage/download URLs (especially because
> those change from one release to the next.)  If that's true, then most
> complaints about availability are being caused by crawling multiple
> not-highly-available HTML pages, *not* by the downloading of the
> actual files.  If my intuition about the distribution is wrong, OTOH,
> it would provide a stronger argument for moving the files themselves
> to PyPI as well.)
> 
> Digression aside, this is one of things that needs to be clearer so
> that there's a better explanation for package authors as to why
> they're being asked to change.  And although the base argument is good
> ("specifying the "homepage" will slow down the installation process"),
> it could be amplified further with an example of some project that has
> had multiple homepages over its lifetime, listing all the URLs that
> currently must be crawled before an installer can be sure it has found
> all available versions, platforms, and formats of that project.

Right, an example makes sense.

> Okay, on to the Solution section.  Again, your stated problem is to
> fix crawling, but the solution is all about file hosting.  Regardless
> of which of these three "hosting modes" is selected, it remains an
> option for the developer to host files elsewhere, and provide the
> links in their description...  unless of course you intended to rule
> that out and forgot to mention it.  (Or, I suppose, if you did *not*
> intend to rule it out and intentionally omitted mention of that so the
> rabid anti-externalists would think you were on their side and not
> create further controversy...  in which case I've now spoiled things.
> Darn.  ;-) )

To be honest, while drafting i forgot about the fact that the
long_description can contain package links as well.

> Some technical details are also either incorrect or confusing.  For
> example, you state that "The original homepage/download links are
> added as links without a ``rel`` attribute if they have the ``#egg``
> format".  But if they are added without a rel attribute, it doesn't
> *matter* whether they have an #egg marker or not.  It is quite
> possible for a PyPI package to have a download_url of say,
> "http://sourceforge.net/download/someproject-1.2.tgz".

Right.  I just wanted to clarify that the distutils metadata 
"download_url" can contain an #egg format link and that this link
should still be served (without a rel).

> Thus, I would suggest simply stating that changing hosting mode does
> not actually remove any links from the /simple index, it just removes
> the rel="" attributes from the "Home page" and "Download" links, thus
> preventing them from being crawled in search of additional file links.

That's certainly a better description of what effectively happens 
and avoids the special mentioning of #egg.
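
A minimal sketch of that effect, assuming a /simple page made of plain
anchor tags whose crawlable links carry rel="homepage" or rel="download".
The function name and the sample page are made up for illustration; this
is not the pypi.python.org implementation:

```python
import re

# Assumed rel values marking links that installers crawl for more files.
_REL_ATTR = re.compile(r'\s+rel="(?:homepage|download)"')


def strip_crawl_rels(simple_page_html):
    """Return the page with the rel markers removed; the links themselves stay."""
    return _REL_ATTR.sub("", simple_page_html)


# Hypothetical /simple page for a project called "proj".
page = (
    '<a href="http://example.org/" rel="homepage">Home page</a><br/>\n'
    '<a href="http://example.org/dl/" rel="download">Download</a><br/>\n'
    '<a href="../../packages/source/p/proj/proj-1.0.tar.gz">proj-1.0.tar.gz</a>\n'
)
print(strip_crawl_rels(page))
```

All three links are still served; only the hint to crawl the first two
is gone.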

> With that out of the way, that brings me to the larger scope issue
> with the modes as presented.  Notice now that with this clarification,
> there is no real difference in *state* between the "pypi-cache" and
> "pypi-only" modes.  There is only a *functional* difference...  and
> that function is underspecified in the PEP.

Agreed.

> What I mean is, in both pypi-cache and pypi-only, the *state* of
> things is that rel="" attributes are gone, and there are links to
> files on PyPI.  The only difference is in *how* the files get there.

Yes.

> And for the pypi-cache mode, this function is *really*
> under-specified.  Arguably, this is the meat of the proposal, but it
> is entirely missing.  There is nothing here about the frequency of
> crawling, the methods used to select or validate files, whether there
> is any expiration...  it is all just magically assumed to happen
> somehow.

I'd like to avoid cache-invalidation issues by only performing cache
updates upon three user actions:

- when a release is registered for a package which is in 
  "pypi-cache" hosting mode.

- when a maintainer chooses to set "pypi-cache" 

- when a maintainer explicitly triggers a "cache" update 

All actions allow pypi.python.org to provide feedback / error codes
so there is nothing hidden going on in regular intervals or so.
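
A rough sketch of that rule, assuming the three hosting modes are named
"pypi-ext", "pypi-cache" and "pypi-only". The enum and function names, and
the EDIT_DESCRIPTION action used as a counter-example, are hypothetical:

```python
from enum import Enum


class HostingMode(Enum):
    PYPI_EXT = "pypi-ext"      # current behaviour: external links are crawled
    PYPI_CACHE = "pypi-cache"  # PyPI fetches and serves copies of external files
    PYPI_ONLY = "pypi-only"    # only files uploaded to PyPI are served


class UserAction(Enum):
    REGISTER_RELEASE = "register-release"
    SET_HOSTING_MODE = "set-hosting-mode"
    TRIGGER_CACHE_UPDATE = "trigger-cache-update"
    EDIT_DESCRIPTION = "edit-description"  # hypothetical action that never refreshes


# The three user actions that may cause a cache update.
_CACHE_ACTIONS = frozenset({
    UserAction.REGISTER_RELEASE,
    UserAction.SET_HOSTING_MODE,
    UserAction.TRIGGER_CACHE_UPDATE,
})


def should_update_cache(action, mode_after_action):
    """True if this user action should make PyPI refresh its cached files.

    mode_after_action is the project's hosting mode once the action is applied,
    so switching a project to pypi-cache counts as a trigger.
    """
    if mode_after_action is not HostingMode.PYPI_CACHE:
        return False
    return action in _CACHE_ACTIONS
```

Because every refresh is tied to a request from a maintainer, the
response to that request can carry the result or the error.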


> My suggestion would be to do two things:
> 
> First, make the state a boolean: crawl external links, with the
> current state yes and the future state no, with "no" simply meaning
> that the rel="" attribute is removed from the links that currently
> have it.
> 
> Second, propose to offer tools in the PyPI interface (and command
> line) to assist authors in making the transition, rather than
> proposing a completely unspecified caching mechanism.  Better to have
> some vaguely specified tools than a completely unspecified caching
> mechanism, and better still to spell out very precisely what those
> tools do.

This structure makes sense to me except that i see the need to start off with
"pypi-ext", i.e. a third state which encodes the current behaviour.
Thing is that the pypi.python.org doesn't have an extensive test 
suite and we will thus need to rely on a few early adopters 
using the tools/state-changes before starting phase 2 (mass mailings etc.).
Also in case of problems we can always switch back packages to the safe
"pypi-ext" mode.  IOW, the motivation for this third state is considering
the actual implementation process.

> Okay, on to the "Phases of transtion".  This section gets a lot
> simpler if there are only two stages.  Specifically, we let everyone
> know the change is going to happen, and how long they have, give 'em
> links to migration tools.  Done.  ;-)
> 
> (Okay, so analysis still makes sense: the people who don't have any
> externally hosted files can get a different message, i.e., "Hey, we
> notice that installing your package is slow because you have these
> links that don't go anywhere.  Click here if you'd like PyPI to stop
> sending people on wild goose chases".  The people who have external
> hosted files will need a more involved message.)
> 
> Whew.  Okay, that ends my critique of the PEP as it sits.  Now for an
> outside-the-box suggestion.
> 
> If you'd like to be able to transition people away from spidered links
> in the fewest possible steps, with the least user action, no legal
> issues, and in a completely automated way, note that this can be done
> with a one-time spidering of the existing links to find the download
> links, then adding those links directly to the /simple index, and
> switching off the rel="" attributes.  This can be done without
> explicit user consent, though they can be given the chance to do it
> manually, sooner.

Right, my mail preceding the "pre-pep" one contained a "linkext" state
which spidered the links and offered them directly.  It's certainly possible
and indeed would likely not have legal issues.  It might have 
cache-invalidation issues and probably makes the pypi-side implementation 
more complex.  Also it goes a bit against the current intention of the
PEP to have pypi.python.org control all hosting of release files.
 
> To implement this you'd need two project-level (*not* release-level)
> fields: one to indicate whether the project is using rel="" or not,
> and one to contain the list of external download links, which would be
> user-editable.
> 
> This overall approach I'm proposing can be extended to also support
> mirroring, since it provides an explicit place to list what it is
> you're mirroring.  (At any rate, it's more explicitly specified than
> any such place in the current PEP.)
> 
> That field can also be fairly easily populated for any given project
> in just a few lines of code:
> 
>     from pkg_resources import Requirement
>     pr = Requirement.parse('Projectname')
>     from setuptools.package_index import PackageIndex
>     pi = PackageIndex(search_path=[], python=None, platform=None)
>     pi.find_packages(pr)
>     all_urls = [dist.location for dist in pi[pr.key]]
>     external_urls = [url for url in all_urls if '//pypi.python.org' not in url]
> 
> (Although if you want more information, like what kind of link each
> one is, the dist objects actually know a bit more than just the URL.)
> 
> Anyway, I hope you found at least some of all this helpful.  ;-)

Certainly!  Will try to do an update incorporating your suggestions
in the next days.

best,
holger


From regebro at gmail.com  Mon Mar 11 11:44:31 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Mon, 11 Mar 2013 11:44:31 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <DAEC3884-78A8-4CDD-BDFD-37DB458E7F6B@mac.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
	<212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
	<CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
	<826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com>
	<CAL0kPAU-JbVa0AsSfSrOA3yKgiXxb85=9dwg_MOuu1ULGv-qng@mail.gmail.com>
	<DAEC3884-78A8-4CDD-BDFD-37DB458E7F6B@mac.com>
Message-ID: <CAL0kPAWuaupd1ykf+okzLfRebSonLw=u6mKOHeaAkqGhJsMoBw@mail.gmail.com>

On Mon, Mar 11, 2013 at 10:56 AM, Ronald Oussoren
<ronaldoussoren at mac.com> wrote:
> Now I'm confused. You want to change a dependency without testing it before hand?

How do you test a dependency without changing it? How do you test a
dependency that is unreachable?

It seems to me you are arbitrarily limiting this discussion to
problems installing software on production servers. This is not a
reasonable real-life limitation. If a server is unreachable, it is
unreachable even if you aren't installing production software on a
server. It's equally unreachable if I need to download something for
testing on my local machine.

That's now all the energy I'm willing to spend on discussing this
topic. Third-party hosting needs to go. I believe there is a broad
consensus on this. Let's instead discuss *how* to implement it.

//Lennart

From donald at stufft.io  Mon Mar 11 12:14:14 2013
From: donald at stufft.io (Donald Stufft)
Date: Mon, 11 Mar 2013 07:14:14 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
Message-ID: <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>


On Mar 11, 2013, at 2:09 AM, PJ Eby <pje at telecommunity.com> wrote:

> On Sun, Mar 10, 2013 at 8:25 PM, Donald Stufft <donald at stufft.io> wrote:
>> I don't think anyone is bad here, nor am I arguing against any particular person or group of people. I'm arguing against a practice and a system. You're going out of your way to find excuses to throw all sorts of stop energy here.
> 
> Calling a legitimate disagreement with your point of view "stop
> energy" seems inappropriate to me, since my issue is with you
> derailing the topic of how to get people to *voluntarily* migrate to a
> better situation than the present one, and to develop tools for that
> process.  The only thing I wish you to stop is the repeated assertion
> without proof that 1) external links must go *and* 2) this must be an
> enforced directive rather than a (highly-encouraged) option.

1) Proof of what? That it's insecure? That it harms uptime? That it violates people's privacy? I don't understand what you want here. Do you want me to go and find insecure hosts and start boosting malware onto people's machines?
2) Even a single project remaining causes the entire thing to cascade, Weakest Link Theory.

> 
> I have even gone so far as to suggest, earlier in this thread, what
> evidence I would find at least suggestive of your POV.  But your
> response to that and prior challenges to those assertions, has been
> simply to move your goalpost.  E.g. from "current uptime is bad" to
> "any uptime lower than PyPI's is totally unacceptable".

I outlined all 3 of the major reasons in my very first email. I've never changed them.

> 
> I, on the other hand, have moved in the direction of *your* proposals
> repeatedly, making adjustments as I find actually-convincing evidence
> and/or reasoning, or find ways to deal with the issues.  I have
> compromised quite a bit.  (And have already spent a fair amount of
> time writing setuptools code to lay a foundation for these changes.)
> 
> You, as far as I can tell, have not moved your position in the slightest.
> 
> Which of these is "stop energy"?

I've not been willing to compromise because none of the solutions presented solves all the actual issues. They just rearrange deck chairs on the Titanic.

> 
> It is not the case that external links must be removed from PyPI in
> order to ensure security, or uptime.  And it is *especially* not the
> case that you are the BDFL of uptime.  You're definitely not the BDFL
> of uptime for any given project hosted on PyPI, that you *voluntarily
> choose* to make a part of your build process.  If your primary
> argument is that project X must host its files on PyPI because of your
> build process, then I think you misunderstand open source, and also
> the part where you *chose* to make it part of your build process.  It
> certainly doesn't give you the right to force projects Y, Z, and Q --
> that you don't even use! -- to also host their projects on PyPI,
> because project X -- the one you do use -- has a slow or unreliable
> file host!
> 
> It seems disingenuous to then shift the argument back to security when
> challenged on uptime, and back to uptime when challenged on security.
> We've looped back and forth over those for some time: when I point out
> that wheels have signatures which will make off-site hosting
> relatively unimportant to the security picture, you jump back to
> talking about uptime.  When I point out that uptime is a consensual
> factor that in no way justifies legislating what other people can do
> with their projects, you go back to talking about security.
> 
> Make up your mind.  What problem are you actually trying to solve?

All of them, as outlined in my original email.

> 
> (I expect your response on wheels to be that wheels aren't there yet,
> etc., but that isn't actually a response to the objection unless
> you're going to change your position to, "okay, external links to file
> formats that can be signed can stay," or something of that sort.
> Otherwise, you're not actually compromising, just using the fact that
> wheels aren't in common use yet as an argument to keep the position
> you started with.)

Signed releases solve 1/3 of the original issues and bring with them their own. How do you transmit the signatures? How do you decide which signatures are valid for any given file? There's a pretty complicated system called TUF which handles some of these issues (but again it only solves 1/3 of them), and until we get that, transmission of the signatures in a sane way is unlikely.

> 
> 
>> My analogy served only to put into light that the system that I'm trying to change is insecure, just like allowing anyone to walk into a bank vault and pick up money would be insecure. I fully believe that the people using such a system are completely trustworthy people. But just because *they* are trustworthy doesn't mean that a system which allows *anyone* to attack other Python developers is *ok*.
> 
> And my analogy served only to put into light the part where you're
> insisting that one group of people change for the benefit of a group
> which is already benefiting from their pre-existing generosity.
> 
> That being said, I do see that I could have misinterpreted the intent
> of your analogy -- it sounded like you were saying that the developers
> who host off-PyPI were thieves walking into your bank and taking your
> money (i.e., analogizing theft with inconveniencing you by making your
> builds fail or run slowly).
> 
> Though to be honest, I still don't comprehend how else to make any
> kind of sense to that analogy in its original context.  Who is the
> bank?  Whose money is being taken?  The whole thing is utterly
> confusing to me if I try to take it any other way than the way I did,
> because it doesn't seem to have any other simple 1:1 mapping to the
> situation, as far as I can see.   Your explanation seems terribly
> abstract and tortured to me, as far as analogies go.

Bank == PyPI, People insisting that the bank vault remain open so they can walk in and grab their own money because it's easier == folks arguing for the existing solution because they don't want to change their release process. Combined this leaves the bank (and in the actual situation, PyPI) open to a number of issues.

> 
> 
>> When discussing security of a system it's necessary to divorce yourself from the implementations of things. When you get wrapped up in the implementation you turn things into a Us vs Them game (as evidenced by several of your messages) instead of discussing the merits of the various systems and which ones serve the greatest needs of the community the best.
> 
> I think you've got things backwards here.  It's you who's been arguing
> that the solution to the problem of "improved uptime and security" is
> best implemented by "ban all non-PyPI hosting".  It is I who has been
> arguing that this is a premature judgment and rush to implementation,
> without considering all of the design angles.  And I am the one asking
> you to stop insisting on this one implementation and state your
> *actual* problem with external links.

Read my first email. Security, uptime, privacy. Note security isn't just about changing out files either; there's a whole host of possible problems, most of them documented here: https://www.updateframework.com/wiki/Docs/Security . It's true that this won't solve all of those issues immediately, but it moves us to a position where we can start trying.

> 
> (By which I mean, a problem stated such that, if you're given a
> solution that *doesn't* involve banning them from PyPI, you aren't
> going to rejigger the problem statement so that it once again requires
> banning.  That's moving the goalposts, and that's what keeps happening
> in this discussion, at least as far as I can see.  I, on the other
> hand, have given you my actual problem with your proposal, and I have
> not moved *my* goalposts.  Instead, I've moved towards your position,
> more than once.  But I've moved as far towards it as I can go at this
> time, without you providing any additional evidence or explanation or
> *some* kind of engagement with the points that I've raised above that
> you've previously ignored, in this thread and others.)


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From donald at stufft.io  Mon Mar 11 12:18:30 2013
From: donald at stufft.io (Donald Stufft)
Date: Mon, 11 Mar 2013 07:18:30 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
	<212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
	<CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
	<826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com>
Message-ID: <259AAEEB-CCAB-438A-9A64-AFF2450AA7DF@stufft.io>


On Mar 11, 2013, at 4:33 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:

> 
> On 11 Mar, 2013, at 9:18, Lennart Regebro <regebro at gmail.com> wrote:
> 
>> On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>>> But this isn't necessarily true, there is another solution: mirror your requirements locally.
>> 
>> I do that. This is not a solution, because your requirements yesterday
>> is not your requirements tomorrow.
> 
> So? When your requirements change you change the local mirror.
> 
>> 
>>> Is it even clear why numerous archives aren't hosted on PyPI?
>> 
>> No, the only one that has mentioned why is Marc-André, I think, whose
>> eGenix packages are distributed as binary packages for loads of
>> different platforms. It's unclear to me if all these binary packages
>> should be uploaded to PyPI, and it is also unclear to me why they
>> can't be, it seems to be mostly a case of it being too much work.
>> 
>> He also mentioned the big Python distributions eGenix does as being
>> too large for PyPI, but I don't really see the point of uploading
>> Python distributions to PyPI, they can't be installed with Python
>> installers anyway.
> 
> Some reasons I've seen mentioned in the past:
> 
> * In some big companies it might be easier to publish archives on the company webserver than on PyPI due to truckloads of red tape on their part (not something we can fix)

Publish your own PyPI. It's easy to do. You can even list your project on PyPI with instructions on how to add your company-wide PyPI to someone's deployment process. People just won't be able to automatically install your software from PyPI.

> 
> * It is easier to publish all related archives in the same place for projects where the python package is just one component (for example client libraries for a network server)

If it's too hard for the hypothetical you to push just the Python parts to PyPI then same answer as above.

> 
> * Authors might not know it is possible to upload archives to PyPI
> 
>> 
>>> IMHO it would be better to remove barriers than force projects to host files on PyPI.
>> 
>> Nobody has really been able to point out any real barriers, so we
>> don't know what they are or if they exist.
> 
> It may be as simple as lack of knowledge (e.g. "I didn't know I could host files on PyPI"),
> or unnecessary friction in the release process. 
> 
> I guess the only way we will know why some authors don't upload archives to
> PyPI is to ask (some of) them.   
> 
> Ronald


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From donald at stufft.io  Mon Mar 11 12:32:25 2013
From: donald at stufft.io (Donald Stufft)
Date: Mon, 11 Mar 2013 07:32:25 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <513DA277.2010809@egenix.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
	<212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
	<CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
	<513DA277.2010809@egenix.com>
Message-ID: <E50D827F-E341-4DA2-8977-B9A52A3EB43F@stufft.io>


On Mar 11, 2013, at 5:23 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 11.03.2013 09:18, Lennart Regebro wrote:
>> On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>>> But this isn't necessarily true, there is another solution: mirror your requirements locally.
>> 
>> I do that. This is not a solution, because your requirements yesterday
>> is not your requirements tomorrow.
>> 
>>> Is it even clear why numerous archives aren't hosted on PyPI?
>> 
>> No, the only one that has mentioned why is Marc-André, I think, whose
>> eGenix packages are distributed as binary packages for loads of
>> different platforms. It's unclear to me if all these binary packages
>> should be uploaded to PyPI, and it is also unclear to me why they
>> can't be, it seems to be mostly a case of it being too much work.
> 
> I've listed all the reasons in one of the previous emails:
> 
> http://mail.python.org/pipermail/catalog-sig/2013-March/005502.html
> 
> Others will likely have additional reasons, like e.g.
> 
> * the PyPI uploads not being compatible to their release process

I've offered to donate my free time to anyone whose release process actually dictates they can't upload to PyPI to move to one that is as close as possible to their current solution while also not requiring external hosting.

> 
> * not knowing that it's possible to host files on PyPI - after
>  all it's an *index*, not a repository :-)

I know you're joking, but if this is an actual limiting factor my next proposal will be to change the name >:].

> 
> * still believing that PyPI is an unreliable hosting provider
>  due the many downtimes and problems it had in the past - which
>  is no longer true today
> 
> * not wanting to host and maintain files in several different
>  places

Publish their own repository, and put instructions on their PyPI page for how to add it to a potential user's deployment.

> 
> * not wanting to host release files at all, i.e. have people
>  check out the version from a repository instead of doing
>  the download, unzip, install dance

So don't? Include instructions. No one's proposal prevents people from listing projects on PyPI that aren't hosted there. It just means that if you don't want to host your things on PyPI you'll need to provide instructions for getting your files.

> 
> * not wanting to separate associated library or product
>  code from the Python wrapper code (think e.g. the
>  Python interface for subversion)

Same answer as before, either separate it or provide instructions on your PyPI index page.

> 
> * not being allowed to upload files to external servers
>  by company policy, or having to deal with a company
>  policy that makes this difficult/unattractive

Again, include instructions in the PyPI description.

> 
> * having issues with the added latency of PyPI downloads compared
>  to a simple file based index hosted on a company web server

This seems backwards. If they are upset with the latency why aren't they just installing directly from the index on the company web server? Why are they hitting PyPI at all?

> 
> * having a strong need to know the number of downloads per
>  package and associated statistics such as downloads per
>  country, per year/month/day/hour

Daily stats are published per filename. They don't include breakdowns per country, though. I will fight for any statistic people actually want that doesn't expose sensitive information (no IP addresses etc.; countries are fine).

> 
> * not wanting to give up access to the download log files

That runs counter to privacy concerns. If this is an actual blocker then I suggest they run their own index again.

> 
> * having a requirement to restrict downloads on a per country
>  basis, e.g. for export controlled software or software which
>  may not be imported/used in certain countries

Don't host the files on PyPI, publish instructions for installing your software on PyPI.

> 
> * having PyPI not provide the technical means to host the
>  release files, e.g. due to the releases using a format
>  which is not supported by PyPI (e.g. all the ActiveState
>  packages - http://code.activestate.com/pypm/)

Open a discussion here about including your format, open a ticket on the tracker about including your format, submit a PR about including your format, or host your own repository if it makes sense for your format (see ActiveState again).

> 
> * user experience/support issues:
>  if the package has external dependencies,
>  or needs special setup, it may provide a better user experience
>  to host the Python wrapper on the same page as the dependencies
>  and instructions on how to install them; rather than having
>  them on PyPI which lets people believe that a simple
>  "pip install something" will get them a working setup

So don't host the files on PyPI, include your instructions on PyPI.

> 
> Those are just a few things that come to mind. I'm sure there
> are more issues that keep authors from uploading their
> packages to PyPI.
> 
> Overall, I think we should encourage people to make their
> code available through PyPI and make it attractive to them,
> but keep the possibility to continue using external hosting
> platforms, should they run into issues that PyPI cannot
> solve for them.

This is a nice thought, but it doesn't work in practice because of the "Weakest Link Theory". Basically you're only as strong as your weakest link. The weakest link is any external package.

> 
>> He also mentioned the big Python distributions eGenix does as being
>> too large for PyPI, but I don't really see the point of uploading
>> Python distributions to PyPI, they can't be installed with Python
>> installers anyway.
> 
> Not sure what you mean here.
> 
> PyPI is also used to index Python projects which are not Python
> packages to be installed by pip/easy_install/etc.

That's fine. No one's saying you can't list a package on PyPI that doesn't include files. Just the external links won't be available on /simple/.

> 
> Some of those may also want to
> 
>>> IMHO it would be better to remove barriers than force projects to host files on PyPI.
>> 
>> Nobody has really been able to point out any real barriers, so we
>> don't know what they are or if they exist.
> 
> Again, please see the email where I listed the ones affecting
> at least eGenix.
> 
> Most of those can be addressed in one way or another, e.g.
> by having PyPI cache the files, provide access to the download
> counts by country, provide a way to host separate indexes for
> UCS2/UCS4 egg files, etc.
> 
> The only issues that need more investigation are the PyPI license
> terms and the general issue of not being able to host export
> regulated files on PyPI.
> 
> -- 
> Marc-Andre Lemburg
> eGenix.com
> 
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From jnoller at gmail.com  Mon Mar 11 12:33:18 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Mon, 11 Mar 2013 07:33:18 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
Message-ID: <E292EC07-AE58-4061-9244-2C9FC1AF71E9@gmail.com>

Couldn't have said it better Donald. +1

On Mar 11, 2013, at 7:14 AM, Donald Stufft <donald at stufft.io> wrote:

> 
> On Mar 11, 2013, at 2:09 AM, PJ Eby <pje at telecommunity.com> wrote:
> 
>> On Sun, Mar 10, 2013 at 8:25 PM, Donald Stufft <donald at stufft.io> wrote:
>>> I don't think anyone is bad here, nor am I arguing against any particular person or group of people. I'm arguing against a practice and a system. You're going out of your way to find excuses to throw all sorts of stop energy here.
>> 
>> Calling a legitimate disagreement with your point of view "stop
>> energy" seems inappropriate to me, since my issue is with you
>> derailing the topic of how to get people to *voluntarily* migrate to a
>> better situation than the present one, and to develop tools for that
>> process.  The only thing I wish you to stop is the repeated assertion
>> without proof that 1) external links must go *and* 2) this must be an
>> enforced directive rather than a (highly-encouraged) option.
> 
> 1) Proof of what? That it's insecure? That it harms uptime? That it violates people's privacy? I don't understand what you want here. Do you want me to go and find insecure hosts and start boosting malware onto people's machines?
> 2) Even a single project remaining causes the entire thing to cascade, Weakest Link Theory.
> 
>> 
>> I have even gone so far as to suggest, earlier in this thread, what
>> evidence I would find at least suggestive of your POV.  But your
>> response to that and prior challenges to those assertions, has been
>> simply to move your goalpost.  E.g. from "current uptime is bad" to
>> "any uptime lower than PyPI's is totally unacceptable".
> 
> I outlined all 3 of the major reasons in my very first email. I've never changed them.
> 
>> 
>> I, on the other hand, have moved in the direction of *your* proposals
>> repeatedly, making adjustments as I find actually-convincing evidence
>> and/or reasoning, or find ways to deal with the issues.  I have
>> compromised quite a bit.  (And have already spent a fair amount of
>> time writing setuptools code to lay a foundation for these changes.)
>> 
>> You, as far as I can tell, have not moved your position in the slightest.
>> 
>> Which of these is "stop energy"?
> 
> I've not been willing to compromise because none of the solutions presented solves all the actual issues. They just rearrange deck chairs on the Titanic.
> 
>> 
>> It is not the case that external links must be removed from PyPI in
>> order to ensure security, or uptime.  And it is *especially* not the
>> case that you are the BDFL of uptime.  You're definitely not the BDFL
>> of uptime for any given project hosted on PyPI, that you *voluntarily
>> choose* to make a part of your build process.  If your primary
>> argument is that project X must host its files on PyPI because of your
>> build process, then I think you misunderstand open source, and also
>> the part where you *chose* to make it part of your build process.  It
>> certainly doesn't give you the right to force projects Y, Z, and Q --
>> that you don't even use! -- to also host their projects on PyPI,
>> because project X -- the one you do use -- has a slow or unreliable
>> file host!
>> 
>> It seems disingenuous to then shift the argument back to security when
>> challenged on uptime, and back to uptime when challenged on security.
>> We've looped back and forth over those for some time: when I point out
>> that wheels have signatures which will make off-site hosting
>> relatively unimportant to the security picture, you jump back to
>> talking about uptime.  When I point out that uptime is a consensual
>> factor that in no way justifies legislating what other people can do
>> with their projects, you go back to talking about security.
>> 
>> Make up your mind.  What problem are you actually trying to solve?
> 
> All of them, as outlined in my original email.
> 
>> 
>> (I expect your response on wheels to be that wheels aren't there yet,
>> etc., but that isn't actually a response to the objection unless
>> you're going to change your position to, "okay, external links to file
>> formats that can be signed can stay," or something of that sort.
>> Otherwise, you're not actually compromising, just using the fact that
>> wheels aren't in common use yet as an argument to keep the position
>> you started with.)
> 
> Signed releases solve 1/3 of the original issues and bring with them their own. How do you transmit the signatures? How do you decide which signatures are valid for any given file? There's a pretty complicated system called TUF which handles some of these issues (but again it only solves 1/3 of them), and until we get that, transmitting the signatures in a sane way is unlikely.
> 
>> 
>> 
>>> My analogy served only to put into light that the system that I'm trying to change is insecure, just like allowing anyone to walk into a bank vault and pick up money would be insecure. I fully believe that the people using such a system are completely trustworthy people. But just because *they* are trustworthy doesn't mean that a system which allows *anyone* to attack other Python developers is *ok*.
>> 
>> And my analogy served only to put into light the part where you're
>> insisting that one group of people change for the benefit of a group
>> which is already benefiting from their pre-existing generosity.
>> 
>> That being said, I do see that I could have misinterpreted the intent
>> of your analogy -- it sounded like you were saying that the developers
>> who host off-PyPI were thieves walking into your bank and taking your
>> money (i.e., analogizing theft with inconveniencing you by making your
>> builds fail or run slowly).
>> 
>> Though to be honest, I still don't comprehend how else to make any
>> kind of sense of that analogy in its original context.  Who is the
>> bank?  Whose money is being taken?  The whole thing is utterly
>> confusing to me if I try to take it any other way than the way I did,
>> because it doesn't seem to have any other simple 1:1 mapping to the
>> situation, as far as I can see.   Your explanation seems terribly
>> abstract and tortured to me, as far as analogies go.
> 
> Bank == PyPI. People insisting that the bank vault remain open so they can walk in and grab their own money because it's easier == folks arguing for the existing solution because they don't want to change their release process. Combined, this leaves the bank (and in the actual situation, PyPI) open to a number of issues.
> 
>> 
>> 
>>> When discussing security of a system it's necessary to divorce yourself from the implementations of things. When you get wrapped up in the implementation you turn things into a Us vs Them game (as evidenced by several of your messages) instead of discussing the merits of the various systems and which ones serve the greatest needs of the community the best.
>> 
>> I think you've got things backwards here.  It's you who's been arguing
>> that the solution to the problem of "improved uptime and security" is
>> best implemented by "ban all non-PyPI hosting".  It is I who has been
>> arguing that this is a premature judgment and rush to implementation,
>> without considering all of the design angles.  And I am the one asking
>> you to stop insisting on this one implementation and state your
>> *actual* problem with external links.
> 
> Read my first email. Security, uptime, privacy. Note that security isn't just about swapping out files either; there's a whole host of possible problems, most of them documented here: https://www.updateframework.com/wiki/Docs/Security . It's true that this won't solve all of those issues immediately, but it moves us to a position where we can start trying.
> 
>> 
>> (By which I mean, a problem stated such that, if you're given a
>> solution that *doesn't* involve banning them from PyPI, you aren't
>> going to rejigger the problem statement so that it once again requires
>> banning.  That's moving the goalposts, and that's what keeps happening
>> in this discussion, at least as far as I can see.  I, on the other
>> hand, have given you my actual problem with your proposal, and I have
>> not moved *my* goalposts.  Instead, I've moved towards your position,
>> more than once.  But I've moved as far towards it as I can go at this
>> time, without you providing any additional evidence or explanation or
>> *some* kind of engagement with the points that I've raised above that
>> you've previously ignored, in this thread and others.)
> 
> 
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 

From ncoghlan at gmail.com  Mon Mar 11 12:55:38 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 11 Mar 2013 21:55:38 +1000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <E50D827F-E341-4DA2-8977-B9A52A3EB43F@stufft.io>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
	<212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
	<CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
	<513DA277.2010809@egenix.com>
	<E50D827F-E341-4DA2-8977-B9A52A3EB43F@stufft.io>
Message-ID: <CADiSq7cWZKtJLpKCGwK9Qxzoff+gGMrk89v7GNONJ3QL9nvX1A@mail.gmail.com>

On Mon, Mar 11, 2013 at 9:32 PM, Donald Stufft <donald at stufft.io> wrote:
> I know you're joking, but if this is an actual limiting factor my next proposal will be to change the name >:].

PyPR would not only be more accurate, it would actually get rid of the
confusion with PyPy. We'd get a new pronunciation argument
(Pie-pee-arr vs Pie-per) to go along with the existing one,
though (Pie-pee-eye vs Pie-pie).

Hell, the next generation of PyPI is going to have a different enough
architecture for metadata distribution that a name change may be
entirely appropriate :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From holger at merlinux.eu  Mon Mar 11 12:57:57 2013
From: holger at merlinux.eu (holger krekel)
Date: Mon, 11 Mar 2013 11:57:57 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <20130311100225.GL9677@merlinux.eu>
References: <20130310150740.GE9677@merlinux.eu>
	<CALeMXf5b41ppcRz6EFdGXuZNknw58bZ8M6miDLz-ac5DzHgRRw@mail.gmail.com>
	<20130311100225.GL9677@merlinux.eu>
Message-ID: <20130311115757.GP9677@merlinux.eu>

Hi again,

A correction on one point of my last mail to you,

On Mon, Mar 11, 2013 at 10:02 +0000, holger krekel wrote:
> > My suggestion would be to do two things:
> > 
> > First, make the state a boolean: crawl external links, with the
> > current state yes and the future state no, with "no" simply meaning
> > that the rel="" attribute is removed from the links that currently
> > have it.
> > 
> > Second, propose to offer tools in the PyPI interface (and command
> > line) to assist authors in making the transition, rather than
> > proposing a completely unspecified caching mechanism.  Better to have
> > some vaguely specified tools than a completely unspecified caching
> > mechanism, and better still to spell out very precisely what those
> > tools do.
> 
> This structure makes sense to me except that i see the need to start off with
> "pypi-ext", i.e. a third state which encodes the current behaviour.

Wait, your suggestion of a boolean "crawl external" set to yes
would encode the current behaviour, so my "except" is invalid.

> Thing is that the pypi.python.org doesn't have an extensive test 
> suite and we will thus need to rely on a few early adopters 
> using the tools/state-changes before starting phase 2 (mass mailings etc.).
> Also in case of problems we can always switch back packages to the safe
> "pypi-ext" mode.  IOW, the motivation for this third state is considering
> the actual implementation process.

This can also be done with your two-state suggestion (switching back 
to crawl=yes).  So no disagreement on that either.
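
To make the two-state idea concrete, here is a minimal sketch of how a simple-index link might be rendered with and without the rel attribute. The function name and the `crawl_external` flag are hypothetical and do not come from the PyPI code base; it only illustrates the "no" state meaning that rel="" is dropped.

```python
from html import escape


def render_link(url, label, rel=None, crawl_external=True):
    """Render one anchor for a simple-index page.

    With crawl_external=False the rel attribute is left out, so an
    installer has no hint telling it to follow the link and scrape it.
    """
    rel_attr = ' rel="{}"'.format(escape(rel)) if (rel and crawl_external) else ""
    return '<a href="{}"{}>{}</a>'.format(escape(url, quote=True), rel_attr, escape(label))


if __name__ == "__main__":
    # Current behaviour (crawl=yes) versus the proposed future state (crawl=no).
    print(render_link("http://example.com/", "home_page", rel="homepage"))
    print(render_link("http://example.com/", "home_page", rel="homepage",
                      crawl_external=False))
```

Switching a project back to crawl=yes is then just a matter of flipping the flag again.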

best,
holger

From regebro at gmail.com  Mon Mar 11 13:33:07 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Mon, 11 Mar 2013 13:33:07 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CADiSq7cWZKtJLpKCGwK9Qxzoff+gGMrk89v7GNONJ3QL9nvX1A@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
	<212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
	<CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
	<513DA277.2010809@egenix.com>
	<E50D827F-E341-4DA2-8977-B9A52A3EB43F@stufft.io>
	<CADiSq7cWZKtJLpKCGwK9Qxzoff+gGMrk89v7GNONJ3QL9nvX1A@mail.gmail.com>
Message-ID: <CAL0kPAWSQodDdZOexqViCJfvuK9ptDgQkVN9pPm=gB5gTXfiRQ@mail.gmail.com>

On Mon, Mar 11, 2013 at 12:55 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Mon, Mar 11, 2013 at 9:32 PM, Donald Stufft <donald at stufft.io> wrote:
>> I know you're joking, but if this is an actual limiting factor my next proposal will be to change the name >:].
>
> PyPR would not only be more accurate, it would actually get rid of the
> confusion with PyPy. We'd get a new pronunciation argument
> (Pie-pee-arr vs Pie-per) to go along with the existing one,
> though (Pie-pee-eye vs Pie-pie).

Hey! Are you a piper that is trying to lure us poor rats away from the
cheeseshop? :-)

//Lennart

From rasky at develer.com  Mon Mar 11 14:34:48 2013
From: rasky at develer.com (Giovanni Bajo)
Date: Mon, 11 Mar 2013 14:34:48 +0100
Subject: [Catalog-sig] PyPI/pip security: waiting for input
Message-ID: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>

Hi Justin,

just a quick reminder that we are still waiting for you guys to move over and start actually doing something. Are you going to draft a document on how exactly we can use TUF within the context of pip + PyPI, with all the different concerns and threat models handled in my document?

Thanks!
-- 
Giovanni Bajo   ::  rasky at develer.com
Develer S.r.l.  ::  http://www.develer.com

My Blog: http://giovanni.bajo.it






From dholth at gmail.com  Mon Mar 11 15:06:34 2013
From: dholth at gmail.com (Daniel Holth)
Date: Mon, 11 Mar 2013 10:06:34 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAL0kPAWSQodDdZOexqViCJfvuK9ptDgQkVN9pPm=gB5gTXfiRQ@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
	<212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
	<CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
	<513DA277.2010809@egenix.com>
	<E50D827F-E341-4DA2-8977-B9A52A3EB43F@stufft.io>
	<CADiSq7cWZKtJLpKCGwK9Qxzoff+gGMrk89v7GNONJ3QL9nvX1A@mail.gmail.com>
	<CAL0kPAWSQodDdZOexqViCJfvuK9ptDgQkVN9pPm=gB5gTXfiRQ@mail.gmail.com>
Message-ID: <CAG8k2+6HP5E5RxeGKwHbFoVUG9Bxq402O9=4dyPaJ=k_ZfyAOA@mail.gmail.com>

It will probably wind up working more like every other package manager
I'm familiar with, where you have a "sources.d" that lists the
repositories you would like to search. Use Plone, add their repository
to the list.
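
A minimal sketch of what reading such a directory could look like, assuming one index URL per line in files ending in ".list" and "#" for comments. None of this is an existing pip or setuptools feature; the layout and the function name are invented for illustration.

```python
from pathlib import Path


def load_indexes(sources_dir):
    """Collect index URLs from every *.list file in sources_dir.

    Files are read in sorted name order so that prefixes such as
    "10-" and "20-" decide which repository is searched first.
    """
    urls = []
    for path in sorted(Path(sources_dir).glob("*.list")):
        for line in path.read_text().splitlines():
            line = line.strip()
            # Skip blank lines and comments.
            if line and not line.startswith("#"):
                urls.append(line)
    return urls
```

Using Plone would then mean dropping one extra file into that directory.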

We also seem to be making good progress on "contact the central
repository much less often" by keeping local copies of the packages
you actually need. The most frustrating thing about pypi being down
was that you already had a virtualenv with all the packages you
actually needed, but maybe you couldn't re-install them elsewhere
without contacting pypi again.

Wheel signatures are handy because they travel with the archive but
the eventual security system will probably look very different, at
most taking advantage of the feature when available but doing
something else for sdists. The trust chain is the tricky part.

From jcappos at poly.edu  Mon Mar 11 15:17:36 2013
From: jcappos at poly.edu (Justin Cappos)
Date: Mon, 11 Mar 2013 10:17:36 -0400
Subject: [Catalog-sig] PyPI/pip security: waiting for input
In-Reply-To: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>
References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>
Message-ID: <CAMVss_r1fAm7ej9A9ccfr1f_oLhhWzcv=wZ4gP9o=AtTsshU0A@mail.gmail.com>

Yes, we're finishing this up now.   We have a working demo with TUF signing
PyPI metadata and pip (integrated with TUF) correctly checking signatures,
etc.

Trishank: when do you plan to share this?   Does Kon still have some
integration tests to write to show we meet the use cases from Giovanni's
document?

Thanks,
Justin


On Mon, Mar 11, 2013 at 9:34 AM, Giovanni Bajo <rasky at develer.com> wrote:

> Hi Justin,
>
> just a quick reminder that we are still waiting for you guys to move over
> and start actually doing something. Are you going to draft a document on
> how exactly we can use TUF within the context of pip + PyPI, with all the
> different concerns and threat models handled in my document?
>
> Thanks!
> --
> Giovanni Bajo   ::  rasky at develer.com
> Develer S.r.l.  ::  http://www.develer.com
>
> My Blog: http://giovanni.bajo.it
>
>
>
>
>
>
>
>

From rasky at develer.com  Mon Mar 11 15:31:00 2013
From: rasky at develer.com (Giovanni Bajo)
Date: Mon, 11 Mar 2013 15:31:00 +0100
Subject: [Catalog-sig] PyPI/pip security: waiting for input
In-Reply-To: <CAMVss_r1fAm7ej9A9ccfr1f_oLhhWzcv=wZ4gP9o=AtTsshU0A@mail.gmail.com>
References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>
	<CAMVss_r1fAm7ej9A9ccfr1f_oLhhWzcv=wZ4gP9o=AtTsshU0A@mail.gmail.com>
Message-ID: <80DFF1BA-A6E1-48D6-AB51-FEAE07E20B6A@develer.com>

On 11 Mar 2013, at 15:17, Justin Cappos <jcappos at poly.edu> wrote:

> Yes, we're finishing this up now.   We have a working demo with TUF signing PyPI metadata and pip (integrated with TUF) correctly checking signatures, etc.   
> 
> Trishank: when do you plan to share this?   Does Kon still have some integration tests to write to show we meet the use cases from Giovanni's document?


While the code is great, I'm mainly concerned with documenting the workflow and making sure it matches the proposed requirements: how to create a key, how to revoke it, how to use an offline list of authorized keys for installation of packages, etc.

As I mentioned before, my proposal would only take me a few days to prototype (repeating this in case someone thinks that my proposal requires millions of man hours for any reason); I held it off waiting for a discussion with you.

Relink to my proposal:
https://docs.google.com/a/develer.com/document/d/1DgQdDCZY5LiTY5mvfxVVE4MTWiaqIGccK3QCUI8np4k/edit
-- 
Giovanni Bajo   ::  rasky at develer.com
Develer S.r.l.  ::  http://www.develer.com

My Blog: http://giovanni.bajo.it






From jcappos at poly.edu  Mon Mar 11 15:33:42 2013
From: jcappos at poly.edu (Justin Cappos)
Date: Mon, 11 Mar 2013 10:33:42 -0400
Subject: [Catalog-sig] PyPI/pip security: waiting for input
In-Reply-To: <80DFF1BA-A6E1-48D6-AB51-FEAE07E20B6A@develer.com>
References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>
	<CAMVss_r1fAm7ej9A9ccfr1f_oLhhWzcv=wZ4gP9o=AtTsshU0A@mail.gmail.com>
	<80DFF1BA-A6E1-48D6-AB51-FEAE07E20B6A@develer.com>
Message-ID: <CAMVss_qHAAuf3bafCNJP8UQGSYUcaVxqi6n+tpmYR4+1NpNY6w@mail.gmail.com>

Yep, we have the doc mostly together and are finishing it up / polishing
it.

We'll have something to you soon.   We have a lightning talk set up at
PyCon and will post all then at the latest.   We do want to announce /
share before then though.

Justin


On Mon, Mar 11, 2013 at 10:31 AM, Giovanni Bajo <rasky at develer.com> wrote:

> On 11 Mar 2013, at 15:17, Justin Cappos <jcappos at poly.edu> wrote:
>
> Yes, we're finishing this up now.   We have a working demo with TUF
> signing PyPI metadata and pip (integrated with TUF) correctly checking
> signatures, etc.
>
> Trishank: when do you plan to share this?   Does Kon still have some
> integration tests to write to show we meet the use cases from Giovanni's
> document?
>
>
> While the code is great, I'm mainly concerned with documenting the
> workflow and making sure it matches the proposed requirements: how to
> create a key, how to revoke it, how to use an offline list of authorized
> keys for installation of packages, etc.
>
> As I mentioned before, my proposal would only take me a few days to
> prototype (repeating this in case someone thinks that my proposal requires
> millions of man hours for any reason); I held it off waiting for a
> discussion with you.
>
> Relink to my proposal:
>
> https://docs.google.com/a/develer.com/document/d/1DgQdDCZY5LiTY5mvfxVVE4MTWiaqIGccK3QCUI8np4k/edit
> --
> Giovanni Bajo   ::  rasky at develer.com
> Develer S.r.l.  ::  http://www.develer.com
>
> My Blog: http://giovanni.bajo.it
>
>
>
>
>
>

From dholth at gmail.com  Mon Mar 11 15:52:46 2013
From: dholth at gmail.com (Daniel Holth)
Date: Mon, 11 Mar 2013 10:52:46 -0400
Subject: [Catalog-sig] PyPI/pip security: waiting for input
In-Reply-To: <80DFF1BA-A6E1-48D6-AB51-FEAE07E20B6A@develer.com>
References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>
	<CAMVss_r1fAm7ej9A9ccfr1f_oLhhWzcv=wZ4gP9o=AtTsshU0A@mail.gmail.com>
	<80DFF1BA-A6E1-48D6-AB51-FEAE07E20B6A@develer.com>
Message-ID: <CAG8k2+4FNdCs4mrOPoBnfLeZCdbW-WOk5zs1AKgPQjcV2kr0FA@mail.gmail.com>

Super impressed after reading all the TUF papers and comparing it to
my own feeble proposal, they had addressed a whole bevy of problems
that I hadn't even thought of - infinite-length download attacks,
server-asserted timestamps, quorum signatures, sophisticated trust
delegation, consistency of all the metadata all the time ...
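
To give a flavour of just one of these, the infinite-length download problem comes down to never reading more than the signed metadata says a file should contain. The sketch below is not TUF's API; the function name and parameters are made up, and it only shows the general idea of capping a read.

```python
import io


def read_bounded(stream, max_bytes, chunk_size=8192):
    """Read a stream to its end, refusing anything longer than max_bytes."""
    data = bytearray()
    while True:
        # Ask for at most one byte beyond the limit, so an oversized
        # response is detected without buffering it in full.
        chunk = stream.read(min(chunk_size, max_bytes + 1 - len(data)))
        if not chunk:
            return bytes(data)
        data.extend(chunk)
        if len(data) > max_bytes:
            raise ValueError("download exceeds expected length")


if __name__ == "__main__":
    # A well-behaved "server" that sends exactly the expected five bytes.
    print(read_bounded(io.BytesIO(b"hello"), max_bytes=5))
```

A server that keeps sending data past the expected length triggers the ValueError instead of filling the disk.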

From donald at stufft.io  Mon Mar 11 15:53:38 2013
From: donald at stufft.io (Donald Stufft)
Date: Mon, 11 Mar 2013 10:53:38 -0400
Subject: [Catalog-sig] PyPI/pip security: waiting for input
In-Reply-To: <CAG8k2+4FNdCs4mrOPoBnfLeZCdbW-WOk5zs1AKgPQjcV2kr0FA@mail.gmail.com>
References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>
	<CAMVss_r1fAm7ej9A9ccfr1f_oLhhWzcv=wZ4gP9o=AtTsshU0A@mail.gmail.com>
	<80DFF1BA-A6E1-48D6-AB51-FEAE07E20B6A@develer.com>
	<CAG8k2+4FNdCs4mrOPoBnfLeZCdbW-WOk5zs1AKgPQjcV2kr0FA@mail.gmail.com>
Message-ID: <933A6E76-7631-44CB-BD28-C4123C24E2CD@stufft.io>


On Mar 11, 2013, at 10:52 AM, Daniel Holth <dholth at gmail.com> wrote:

> Super impressed after reading all the TUF papers and comparing it to
> my own feeble proposal, they had addressed a whole bevy of problems
> that I hadn't even thought of - infinite-length download attacks,
> server-asserted timestamps, quorum signatures, sophisticated trust
> delegation, consistency of all the metadata all the time ...



Agreed, and they've been very helpful with questions when asked.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From pje at telecommunity.com  Mon Mar 11 17:12:12 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 11 Mar 2013 12:12:12 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
Message-ID: <CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>

On Mon, Mar 11, 2013 at 7:14 AM, Donald Stufft <donald at stufft.io> wrote:
> 1) Proof of what? That it's insecure? That it harms uptime? That it violates people's privacy?

That any of those things apply to anybody who *isn't using those packages*.

Without this, you are only providing a reason to encourage people to
change, not to force them to do so.


> 2) Even a single project remaining causes the entire thing to cascade

Cascade *how*?  Please explain.

From tseaver at palladion.com  Mon Mar 11 17:21:02 2013
From: tseaver at palladion.com (Tres Seaver)
Date: Mon, 11 Mar 2013 12:21:02 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
Message-ID: <khl0gr$h16$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/11/2013 02:23 AM, Lennart Regebro wrote:

> The uptime problem is *only* solvable by minimizing the number of 
> hosts involved. The minimum number of hosts is one. That means we 
> should get all releases onto PyPI.

Uptime for *production* use is a red herring here.  Anybody who needs
uptime should be maintaining their own deployment-specific index, or
paying somebody to do that for them.  Period.  Anybody who needs that
kind of uptime *also* needs insulation from other factors which PyPI
project authors can inject into the equation, regardless of PyPI's uptime
(or that of any external host).

- - Uploading undocumented backward-incompatible changes in third-dot
  releases.

- - Uploading a new feature release which injects new security
  vulnerabilities (think of the Ruby-YAML stuff).

- - Deleting distributions or releases.

- - Re-uploading a *different* tarball over the top of an existing one
  (without bumping the version).

Not to mention the possibility of uploaded trojans / malware when a
developer loses control of his laptop / keys, etc. to a hostile actor.

PyPI's uptime is primarily important for *development* use cases, not for
deployment / operations, and in those cases convenience, safety, and
community building are as important as uptime (consumers of FLOSS don't
have any SLA with the producers).

At a sprint, for instance, it is obnoxious to have a dependency with
external files on a slow or hanging host:  it breaks the repeatability of
builds, as well as damaging the velocity of the sprint.  But the
sprinters do *not* have recourse (other than complaining loudly) for such
cases, where they have chosen to rely on PyPI or the external sites for
quick and convenient discovery of those dependencies, instead of going to
the trouble to create a curated index for their own use.



Tres.
- -- 
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iEYEARECAAYFAlE+BG4ACgkQ+gerLs4ltQ5bpgCgzT12UDoqjsaXTBWS5CYuglkI
n0wAnjl0+b/9RZpaUetSBDPovg9fGY+I
=G56Q
-----END PGP SIGNATURE-----


From regebro at gmail.com  Mon Mar 11 17:45:15 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Mon, 11 Mar 2013 17:45:15 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
Message-ID: <CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>

On Mon, Mar 11, 2013 at 5:12 PM, PJ Eby <pje at telecommunity.com> wrote:
> On Mon, Mar 11, 2013 at 7:14 AM, Donald Stufft <donald at stufft.io> wrote:
>> 1) Proof of what? That it's insecure? That it harms uptime? That it violates people's privacy?
>
> That any of those things apply to anybody who *isn't using those packages*.

If nobody is using the packages, it does indeed harm no-one.

//Lennart

From tk47 at students.poly.edu  Mon Mar 11 18:09:41 2013
From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy)
Date: Mon, 11 Mar 2013 13:09:41 -0400
Subject: [Catalog-sig] PyPI/pip security: waiting for input
In-Reply-To: <CAMVss_r1fAm7ej9A9ccfr1f_oLhhWzcv=wZ4gP9o=AtTsshU0A@mail.gmail.com>
References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>
	<CAMVss_r1fAm7ej9A9ccfr1f_oLhhWzcv=wZ4gP9o=AtTsshU0A@mail.gmail.com>
Message-ID: <513E0FD5.9020000@students.poly.edu>

Hello everyone,

On 3/11/13 10:17 AM, Justin Cappos wrote:
> Yes, we're finishing this up now.   We have a working demo with TUF
> signing PyPI metadata and pip (integrated with TUF) correctly checking
> signatures, etc.

Yes, and we are excited to be sharing this very soon!

> Trishank: when do you plan to share this?   Does Kon still have some
> integration tests to write to show we meet the use cases from Giovanni's
> document?

I have the demo up and running, and I just need to get the documentation 
together. Complicating this is that I have a midterm tomorrow, but I 
should have the basic documentation together by today. Let me get back 
to you then!

Thanks,
Trishank


From pje at telecommunity.com  Mon Mar 11 18:42:29 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 11 Mar 2013 13:42:29 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
Message-ID: <CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>

On Mon, Mar 11, 2013 at 12:45 PM, Lennart Regebro <regebro at gmail.com> wrote:
> On Mon, Mar 11, 2013 at 5:12 PM, PJ Eby <pje at telecommunity.com> wrote:
>> On Mon, Mar 11, 2013 at 7:14 AM, Donald Stufft <donald at stufft.io> wrote:
>>> 1) Proof of what? That it's insecure? That it harms uptime? That it violates people's privacy?
>>
>> That any of those things apply to anybody who *isn't using those packages*.
>
> If nobody is using the packages, it does indeed harm no-one.

Then there is no reason to ban them.

From regebro at gmail.com  Mon Mar 11 18:45:37 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Mon, 11 Mar 2013 18:45:37 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
Message-ID: <CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>

On Mon, Mar 11, 2013 at 6:42 PM, PJ Eby <pje at telecommunity.com> wrote:
> On Mon, Mar 11, 2013 at 12:45 PM, Lennart Regebro <regebro at gmail.com> wrote:
>> On Mon, Mar 11, 2013 at 5:12 PM, PJ Eby <pje at telecommunity.com> wrote:
>>> On Mon, Mar 11, 2013 at 7:14 AM, Donald Stufft <donald at stufft.io> wrote:
>>>> 1) Proof of what? That it's insecure? That it harms uptime? That it violates people's privacy?
>>>
>>> That any of those things apply to anybody who *isn't using those packages*.
>>
>> If nobody is using the packages, it does indeed harm no-one.
>
> Then there is no reason to ban them.

So, we should not remove the links for external packages until
somebody traverses those links? But as soon as somebody asks for those
links, we should remove them? In fact before we give them the link?

That to me, is indistinguishable from removing the links.

//Lennart

From pje at telecommunity.com  Mon Mar 11 20:57:30 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 11 Mar 2013 15:57:30 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
Message-ID: <CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>

On Mon, Mar 11, 2013 at 1:45 PM, Lennart Regebro <regebro at gmail.com> wrote:
> So, we should not remove the links for external packages until
> somebody traverses those links? But as soon as somebody asks for those
> links, we should remove them? In fact before we give them the link?

I'm saying that if someone objects to the presence of links they
don't actually use, they are speaking nonsense.  Might as well ask to
ban all packages from PyPI that they don't personally like -- it's the
same request.  Nobody is forcing you to depend on packages that don't
host on PyPI, so there is no point to the censorship.

If you don't use the links, you can't argue that their presence is
causing you harm.

From carl at oddbird.net  Mon Mar 11 21:07:50 2013
From: carl at oddbird.net (Carl Meyer)
Date: Mon, 11 Mar 2013 14:07:50 -0600
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
Message-ID: <513E3996.10203@oddbird.net>

On 03/11/2013 01:57 PM, PJ Eby wrote:
> I'm saying that if someone objects to the presence of  links they
> don't actually use, they are speaking nonsense.  Might as well ask to
> ban all packages from PyPI that they don't personally like -- it's the
> same request.  Nobody is forcing you to depend on packages that don't
> host on PyPI, so there is no point to the censorship.
> 
> If you don't use the links, you can't argue that their presence is
> causing you harm.

You can, of course, argue that the mere presence of those links
(combined with the current behavior of easy_install/pip) is an
"attractive nuisance" that indirectly causes harm to unsuspecting new
users of Python who never even consider the possibility that tools like
easy_install and pip might spider off PyPI to arbitrary websites (a
reasonable assumption based on experience with automatic installation
toolchains and software repositories in other communities). I've talked
to many such users, so there is no question that they exist, and I think
probably in significant numbers.

Carl
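
As a rough illustration of the spidering behaviour described above, here
is a minimal sketch. It assumes the /simple page marks off-site links
with rel values such as "homepage" and "download", and that the index
host is pypi.python.org. The class and function names are hypothetical;
this is not the actual easy_install or pip code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed index host and rel values, chosen for illustration only.
INDEX_HOST = "pypi.python.org"
SPIDERED_RELS = {"homepage", "download"}


class SimplePageLinks(HTMLParser):
    """Collect (absolute_url, rel) pairs from the anchors of one index page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            rel = attributes.get("rel") or ""
            self.links.append((urljoin(self.base_url, href), rel))


def classify(page_html, base_url):
    """Split links into index-hosted files and off-site pages to crawl."""
    parser = SimplePageLinks(base_url)
    parser.feed(page_html)
    hosted, spidered = [], []
    for url, rel in parser.links:
        if urlparse(url).hostname == INDEX_HOST:
            hosted.append(url)
        elif rel in SPIDERED_RELS:
            # An installer that follows these leaves the index entirely.
            spidered.append(url)
    return hosted, spidered


# Made-up page content; the hosts are placeholders.
page = """
<a href="/packages/source/e/example/example-1.0.tar.gz">example-1.0.tar.gz</a>
<a href="http://example.invalid/" rel="homepage">home page</a>
<a href="http://downloads.example.invalid/" rel="download">download</a>
"""
hosted, spidered = classify(page, "https://pypi.python.org/simple/example/")
print(hosted)
print(spidered)
```

Everything that ends up in the second list is a site the user never
named on the command line, which is the surprise being described here.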

From pje at telecommunity.com  Mon Mar 11 22:15:08 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 11 Mar 2013 17:15:08 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <513E3996.10203@oddbird.net>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<513E3996.10203@oddbird.net>
Message-ID: <CALeMXf5H2F24CZhEtWrbsdja2WY0ZuJKUuSwEtsWmWU2SZ9FUQ@mail.gmail.com>

On Mon, Mar 11, 2013 at 4:07 PM, Carl Meyer <carl at oddbird.net> wrote:
> On 03/11/2013 01:57 PM, PJ Eby wrote:
>> I'm saying that if someone objects to the presence of  links they
>> don't actually use, they are speaking nonsense.  Might as well ask to
>> ban all packages from PyPI that they don't personally like -- it's the
>> same request.  Nobody is forcing you to depend on packages that don't
>> host on PyPI, so there is no point to the censorship.
>>
>> If you don't use the links, you can't argue that their presence is
>> causing you harm.
>
> You can, of course, argue that the mere presence of those links
> (combined with the current behavior of easy_install/pip) is an
> "attractive nuisance" that indirectly causes harm to unsuspecting new
> users of Python who never even consider the possibility that tools like
> easy_install and pip might spider off PyPI to arbitrary websites

Which is why I think removing rel="" spidering is a good idea.  In
fact, I'm the one who suggested that.  I also suggested moving to
turning it off by default in future versions of easy_install, adding
warnings, etc.

But that's not the same thing as agreeing that it should be *banned*
for people to publish machine-readable download information on PyPI
for a file that's hosted off-PyPI.  ISTM that Python's "consenting
adults" standard sets a higher bar for banning a feature than it does
for marking it, "here there be dragons" and offering a better
alternative.  Heck, even in Python the language, the mere removal of a
feature in a new version of Python doesn't stop people from
continuing to use the old one.  Here we're talking about
infrastructure that everybody uses; it's not like there's a PyPI X.1
that people can keep using if X.2 comes out.

From donald at stufft.io  Mon Mar 11 22:26:39 2013
From: donald at stufft.io (Donald Stufft)
Date: Mon, 11 Mar 2013 17:26:39 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <513E3996.10203@oddbird.net>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<513E3996.10203@oddbird.net>
Message-ID: <51132ED7-CBD5-441A-838D-D83B50A3C983@stufft.io>

On Mar 11, 2013, at 4:07 PM, Carl Meyer <carl at oddbird.net> wrote:

> On 03/11/2013 01:57 PM, PJ Eby wrote:
>> I'm saying that if someone objects to the presence of  links they
>> don't actually use, they are speaking nonsense.  Might as well ask to
>> ban all packages from PyPI that they don't personally like -- it's the
>> same request.  Nobody is forcing you to depend on packages that don't
>> host on PyPI, so there is no point to the censorship.
>> 
>> If you don't use the links, you can't argue that their presence is
>> causing you harm.
> 
> You can, of course, argue that the mere presence of those links
> (combined with the current behavior of easy_install/pip) is an
> "attractive nuisance" that indirectly causes harm to unsuspecting new
> users of Python who never even consider the possibility that tools like
> easy_install and pip might spider off PyPI to arbitrary websites (a
> reasonable assumption based on experience with automatic installation
> toolchains and software repositories in other communities). I've talked
> to many such users, so there is no question that they exist, and I think
> probably in significant numbers.
> 
> Carl

Since it was asked, I ran a script over the projects/versions that my earlier script had identified as not being hosted on PyPI, to determine _where_ people are hosting these files. These statistics include dev releases.

There are 10538 total external file links that pip locates that do not exist on PyPI.

Of these, here are the top 20 hosts:

    (u'downloads.tryton.org', 1201),
    (u'github.com', 811),
    (u'bitbucket.org', 428),
    (u'launchpad.net', 279),
    (u'www.doughellmann.com', 255),
    (u'walco.n--tree.net', 161),
    (u'prdownloads.sourceforge.net', 156),
    (u'infrae.com', 150),
    (u'downloads.sourceforge.net', 139),
    (u'keepnote.org', 138),
    (u'downloads.reviewboard.org', 124),
    (u'tilestache.org', 121),
    (u'mercurial.selenic.com', 120),
    (u'www.defuze.org', 85),
    (u'www.vicbioinformatics.com', 74),
    (u'downloads.review-board.org', 70),
    (u'samba.org', 70),
    (u'python-graph.googlecode.com', 67),
    (u'cyberelk.net', 65),
    (u'tuohela.net', 61),

I suspect that a lot of the github, bitbucket etc links are dev links (of which there are roughly 420 total).

Here is the complete listing: https://gist.github.com/dstufft/5137885

I ran a quick heuristic to see how many were not hosted on one of the big-name hosting sites:

>>> sum(x[1] for x in b if "github.com" not in x[0] and "bitbucket.org" not in x[0] and "google" not in x[0] and "sourceforge" not in x[0])
7097
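
For anyone who wants to reproduce this kind of tally, here is a minimal
sketch of the counting step. It assumes the external links are already
available as a flat list of URLs; the sample URLs are made-up
placeholders and the function names are hypothetical, not the script
used for the numbers above.

```python
from collections import Counter
from urllib.parse import urlparse

# Substrings treated as "big name" hosts, mirroring the one-liner above.
BIG_HOSTS = ("github.com", "bitbucket.org", "google", "sourceforge")


def tally_hosts(urls):
    """Count how many external links point at each host."""
    return Counter(urlparse(url).netloc for url in urls)


def count_outside_big_hosts(counts):
    """Sum the links whose host matches none of the big-name substrings."""
    return sum(
        n for host, n in counts.items()
        if not any(name in host for name in BIG_HOSTS)
    )


# Placeholder data; real input would be the links an installer discovers.
urls = [
    "https://github.com/someone/project/tarball/master",
    "https://github.com/other/thing/zipball/dev",
    "http://downloads.example.invalid/pkg-1.0.tar.gz",
    "http://prdownloads.sourceforge.net/proj/proj-0.3.tar.gz",
    "http://files.example.invalid/lib-2.1.zip",
]
counts = tally_hosts(urls)
print(counts.most_common(20))
print(count_outside_big_hosts(counts))
```

Note that substring matching is deliberately loose: "google" also
catches project-specific googlecode.com subdomains, at the cost of
matching any unrelated host that happens to contain the word.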

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From tk47 at students.poly.edu  Mon Mar 11 23:18:37 2013
From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy)
Date: Mon, 11 Mar 2013 18:18:37 -0400
Subject: [Catalog-sig] PyPI/pip security: waiting for input
In-Reply-To: <513E0FD5.9020000@students.poly.edu>
References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>
	<CAMVss_r1fAm7ej9A9ccfr1f_oLhhWzcv=wZ4gP9o=AtTsshU0A@mail.gmail.com>
	<513E0FD5.9020000@students.poly.edu>
Message-ID: <513E583D.4080306@students.poly.edu>

On 03/11/2013 01:09 PM, Trishank Karthik Kuppusamy wrote:
>
> I have the demo up and running, and I just need to get the documentation
> together. Complicating this is that I have a midterm tomorrow, but I
> should have the basic documentation together by today. Let me get back
> to you then!

We are working on the documentation and integration tests, and barring 
unexpected circumstances, we hope to show you a well-documented demo of 
PyPI + TUF + pip tomorrow.

Actually, many pieces of the documentation and tests are already online, 
but we want to glue them all together and complete the missing pieces 
before showing them to you.

We thank you for your patience and continued interest.

-Trishank


From pje at telecommunity.com  Tue Mar 12 00:04:27 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 11 Mar 2013 19:04:27 -0400
Subject: [Catalog-sig] A 90% Solution
Message-ID: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>

Just a thought, but...

If 90% of PyPI projects do not have any external files to download,
then, wouldn't it make sense to:

1. Add a project-level option to enable or disable the adding of the
rel="" attribute to /simple links (but not affecting the links in any
other way)
2. Default it to disabled for new projects, and
3. Set it to disabled *now* for the 90% of projects that *don't have
external files*?

If the arguments about banning external links are as valid and
important as some people claim, wouldn't it make sense to do this part
*now*, without first requiring a commitment to force the switch to a
disabled state in the future?

Immediately, 90% of the problem goes away - no random spidering of
stuff that doesn't contain a link now, but which could be taken over
by a malicious party in the future, and 90% fewer sites having to be
up in order for you to build something from PyPI.

Seems like a serious win to me -- and one that might not even need a PEP.

Next steps after this would be providing tools to help people move
their files and links, promoting that people switch it off if they no
longer support the offsite links, educating about security concerns,
etc.

I really don't understand why the 90% solution isn't *already* the
consensus position, since it doesn't preclude follow-on efforts
towards reducing the 10% towards 0%.

And if the problem is so important, why must we keep 90% of the
problems in place, just so we can keep arguing about censoring the
10%?  That doesn't make sense to me.

To me, if somebody's injured, the first thing you do is clean and
close the wound, not argue about whether it's a complete solution and
what might happen days or weeks later.

Just a thought.
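
To make step 1 concrete, here is a minimal sketch of how the /simple
page rendering could honour such a project-level option. The function
name, its parameters, and the spider_external flag are hypothetical
illustrations, not existing PyPI code.

```python
from html import escape


def render_simple_links(release_files, external_urls, spider_external):
    """Render the anchors for one project's /simple page.

    release_files: (filename, url) pairs hosted on the index itself.
    external_urls: (rel, url) pairs, e.g. ("homepage", "http://...").
    spider_external: the hypothetical per-project option from step 1.
    """
    lines = []
    for filename, url in release_files:
        lines.append('<a href="%s">%s</a>' % (escape(url), escape(filename)))
    for rel, url in external_urls:
        if spider_external:
            lines.append(
                '<a href="%s" rel="%s">%s</a>'
                % (escape(url), escape(rel), escape(rel))
            )
        else:
            # The link stays on the page but carries no rel hint,
            # so installers have nothing telling them to crawl it.
            lines.append('<a href="%s">%s</a>' % (escape(url), escape(rel)))
    return "\n".join(lines)


files = [("example-1.0.tar.gz", "/packages/source/e/example/example-1.0.tar.gz")]
external = [("homepage", "http://example.invalid/")]

enabled_page = render_simple_links(files, external, spider_external=True)
disabled_page = render_simple_links(files, external, spider_external=False)
print(enabled_page)
print(disabled_page)
```

Steps 2 and 3 then amount to choosing the default value of the flag for
new projects and flipping it for existing projects that have no
externally hosted files.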

From donald at stufft.io  Tue Mar 12 00:39:50 2013
From: donald at stufft.io (Donald Stufft)
Date: Mon, 11 Mar 2013 19:39:50 -0400
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
Message-ID: <CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>


On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:

> Just a thought, but...
> 
> If 90% of PyPI projects do not have any external files to download,
> then, wouldn't it make sense to:

To be accurate, it's that 90% don't have any files/releases available *only* externally. Most have external files to download, because it's very rare that a project doesn't include a home_page or a download_url, especially since distutils complains if you don't.

> 
> 1. Add a project-level option to enable or disable the adding of the
> rel="" attribute to /simple links (but not affecting the links in any
> other way)
> 2. Default it to disabled for new projects, and
> 3. Set it to disabled *now* for the 90% of projects that *don't have
> external files*?

+1 except 1. should be to remove the links entirely from the /simple/
index, not to just remove the rel attribute.

> 
> If the arguments about banning external links are as valid and
> important as some people claim, wouldn't it make sense to do this part
> *now*, without first requiring a commitment to force the switch to a
> disabled state in the future?
> 
> Immediately, 90% of the problem goes away - no random spidering of
> stuff that doesn't contain a link now, but which could be taken over
> by a malicious party in the future, and 90% fewer sites having to be
> up in order for you to build something from PyPI.
> 
> Seems like a serious win to me -- and one that might not even need a PEP.

Absolutely, and similar to something I asked Richard at the start of this, I'm waiting on an OK from someone with authority that they'd merge such a change and I'll have a PR out for it asap after that.

> 
> Next steps after this would be providing tools to help people move
> their files and links, promoting that people switch it off if they no
> longer support the offsite links, educating about security concerns,
> etc.
> 
> I really don't understand why the 90% solution isn't *already* the
> consensus position, since it doesn't preclude follow-on efforts
> towards reducing the 10% towards 0%.
> 
> And if the problem is so important, why must we keep 90% of the
> problems in place, just so we can keep arguing about censoring the
> 10%?  That doesn't make sense to me.
> 
> To me, if somebody's injured, the first thing you do is clean and
> close the wound, not argue about whether it's a complete solution and
> what might happen days or weeks later.

Like I said above, I'm just waiting on an ok that this has a chance of landing before bothering to implement it.

> 
> Just a thought.


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From ncoghlan at gmail.com  Tue Mar 12 00:50:52 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 12 Mar 2013 09:50:52 +1000
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
	<CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
Message-ID: <CADiSq7fpnDR13Jg5GfP--WYoes2fagt-TMuqiNKTBEW=cHZb=w@mail.gmail.com>

Richard's in transit at the moment and I'm about to be, but this sounds
worth doing to me.

I say send the pull request :)

Cheers,
Nick.
On 12 Mar 2013 09:42, "Donald Stufft" <donald at stufft.io> wrote:

>
> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
>
> > Just a thought, but...
> >
> > If 90% of PyPI projects do not have any external files to download,
> > then, wouldn't it make sense to:
>
> To be accurate, it's that 90% don't have any files/releases available *only*
> externally. Most have external files to download, because it's very rare
> that a project doesn't include a home_page or a download_url, especially
> since distutils complains if you don't.
>
> >
> > 1. Add a project-level option to enable or disable the adding of the
> > rel="" attribute to /simple links (but not affecting the links in any
> > other way)
> > 2. Default it to disabled for new projects, and
> > 3. Set it to disabled *now* for the 90% of projects that *don't have
> > external files*?
>
> +1 except 1. should be to remove the links entirely from the /simple/
> index, not to just remove the rel attribute.
>
> >
> > If the arguments about banning external links are as valid and
> > important as some people claim, wouldn't it make sense to do this part
> > *now*, without first requiring a commitment to force the switch to a
> > disabled state in the future?
> >
> > Immediately, 90% of the problem goes away - no random spidering of
> > stuff that doesn't contain a link now, but which could be taken over
> > by a malicious party in the future, and 90% fewer sites having to be
> > up in order for you to build something from PyPI.
> >
> > Seems like a serious win to me -- and one that might not even need a PEP.
>
> Absolutely, and similar to something I asked Richard at the start of this,
> I'm waiting on an OK from someone with authority that they'd merge such a
> change and I'll have a PR out for it asap after that.
>
> >
> > Next steps after this would be providing tools to help people move
> > their files and links, promoting that people switch it off if they no
> > longer support the offsite links, educating about security concerns,
> > etc.
> >
> > I really don't understand why the 90% solution isn't *already* the
> > consensus position, since it doesn't preclude follow-on efforts
> > towards reducing the 10% towards 0%.
> >
> > And if the problem is so important, why must we keep 90% of the
> > problems in place, just so we can keep arguing about censoring the
> > 10%?  That doesn't make sense to me.
> >
> > To me, if somebody's injured, the first thing you do is clean and
> > close the wound, not argue about whether it's a complete solution and
> > what might happen days or weeks later.
>
> Like I said above, I'm just waiting on an ok that this has a chance of
> landing before bothering to implement it.
>
> >
> > Just a thought.
>
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
>
>
>

From pje at telecommunity.com  Tue Mar 12 01:12:09 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 11 Mar 2013 20:12:09 -0400
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
	<CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
Message-ID: <CALeMXf55AfwEcdNpaOudvRtFgVHaZNrMA5r8iFjqaP+zjt-6yQ@mail.gmail.com>

On Mon, Mar 11, 2013 at 7:39 PM, Donald Stufft <donald at stufft.io> wrote:
>
> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
>
>> Just a thought, but...
>>
>> If 90% of PyPI projects do not have any external files to download,
>> then, wouldn't it make sense to:
>
> To be accurate, it's that 90% don't have any files/releases available *only* externally. Most have external files to download, because it's very rare for a project not to include a home_page or a download_url, especially since distutils complains if you don't.

So what is the % of projects for whom the option can be disabled
automatically, *without* disabling automated downloadability of a
project's externally hosted files?

Your statement is confusing to me, because the having of a home page
or download URL doesn't have anything to do with whether that page has
any files to download from it.

I am saying that if a project has no *downloadable* files (not web
pages) whose links can only be found by spidering, then we can turn
off the rel attribute.

How many projects do not have any download links listed on their
rel=""-linked pages?


>> 1. Add a project-level option to enable or disable the adding of the
>> rel="" attribute to /simple links (but not affecting the links in any
>> other way)
>> 2. Default it to disabled for new projects, and
>> 3. Set it to disabled *now* for the 90% of projects that *don't have
>> external files*?
>
> +1 except 1. should be to remove the links entirely from the /simple/
> index, not to just remove the rel attribute.

-1, since sometimes download links are in fact *download links*.  So
this design choice would unnecessarily limit the number of projects to
which the option could be applied automatically and immediately.

That is, a project with a download link of "foobar.com/foobar-1.2.tgz"
would no longer be usable if you removed the download link from the
/simple index, but would remain usable if the rel attribute were
removed.
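
The distinction above can be sketched in code. This is only an illustration, assuming that an installer follows (spiders) links marked rel="homepage" or rel="download" and that a link counts as a direct download when its path ends in a known archive suffix. The suffix list, the sample page, and the names SimpleLinkParser and classify are hypothetical, not taken from PyPI or any installer.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed, non-exhaustive list of suffixes treated as direct downloads.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".zip", ".egg")


class SimpleLinkParser(HTMLParser):
    """Collect (href, rel) pairs from the anchors of a /simple-style page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "href" in attrs:
                self.links.append((attrs["href"], attrs.get("rel")))


def classify(html):
    """Return (direct, spidered): direct archive links and rel-tagged pages."""
    parser = SimpleLinkParser()
    parser.feed(html)
    direct, spidered = [], []
    for href, rel in parser.links:
        if urlsplit(href).path.lower().endswith(ARCHIVE_SUFFIXES):
            direct.append(href)        # usable with or without a rel attribute
        elif rel in ("homepage", "download"):
            spidered.append(href)      # only followed because of the rel attribute
    return direct, spidered


# Hypothetical page for a project whose download_url is itself an archive.
PAGE = (
    '<a href="http://foobar.com/foobar-1.2.tgz" rel="download">1.2 download_url</a>\n'
    '<a href="http://foobar.com/" rel="homepage">1.2 home_page</a>\n'
)
# The same page with the rel attributes dropped but the links kept.
PAGE_NO_REL = PAGE.replace(' rel="download"', "").replace(' rel="homepage"', "")
```

With the rel attributes dropped, the archive link is still classified as a direct download while the home page is no longer a spidering candidate; removing the links entirely would lose both.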

From donald at stufft.io  Tue Mar 12 01:23:12 2013
From: donald at stufft.io (Donald Stufft)
Date: Mon, 11 Mar 2013 20:23:12 -0400
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <CALeMXf55AfwEcdNpaOudvRtFgVHaZNrMA5r8iFjqaP+zjt-6yQ@mail.gmail.com>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
	<CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
	<CALeMXf55AfwEcdNpaOudvRtFgVHaZNrMA5r8iFjqaP+zjt-6yQ@mail.gmail.com>
Message-ID: <7981DD31-7868-4E24-94E2-1F80ACCB46C8@stufft.io>


On Mar 11, 2013, at 8:12 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Mon, Mar 11, 2013 at 7:39 PM, Donald Stufft <donald at stufft.io> wrote:
>> 
>> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
>> 
>>> Just a thought, but...
>>> 
>>> If 90% of PyPI projects do not have any external files to download,
>>> then, wouldn't it make sense to:
>> 
>> To be accurate, it's that 90% don't have any files/releases available *only* externally. Most have external files to download, because it's very rare for a project not to include a home_page or a download_url, especially since distutils complains if you don't.
> 
> So what is the % of projects for whom the option can be disabled
> automatically, *without* disabling automated downloadability of a
> project's externally hosted files?
> 
> Your statement is confusing to me, because the having of a home page
> or download URL doesn't have anything to do with whether that page has
> any files to download from it.

I didn't differentiate between spidered links and direct links to external files. I simply iterated over all files that the pip PackageFinder was able to find, figured out the version for each URL, and recorded whether that version came from a link to a pypi.python.org resource or from a different domain. I then diffed the two lists to get a list of versions that are _only_ installable externally. That 90% is the share of projects that could have *all* links whatsoever, other than the ones hosted on PyPI itself, removed without any version becoming uninstallable.
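
A minimal sketch of that kind of measurement follows. It is not the script that was actually run: it assumes the link discovery has already produced (project, version, url) triples, and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

PYPI_HOST = "pypi.python.org"


def externally_only_versions(found):
    """Map each project to the versions that have no PyPI-hosted file.

    `found` is a list of (project, version, url) triples, assumed to come
    from whatever link discovery the installer performs.
    """
    hosted = defaultdict(set)
    external = defaultdict(set)
    for project, version, url in found:
        bucket = hosted if urlsplit(url).hostname == PYPI_HOST else external
        bucket[project].add(version)
    result = {}
    for project, versions in external.items():
        only_external = versions - hosted[project]   # the "diff" of the two lists
        if only_external:
            result[project] = only_external
    return result


def unaffected_fraction(found):
    """Fraction of projects that lose no version if external links go away."""
    projects = {project for project, _, _ in found}
    affected = externally_only_versions(found)
    return 1 - len(affected) / len(projects)


# Hypothetical sample data: project "a" is fully mirrored on PyPI,
# project "b" has version 0.1 available only from an external host.
SAMPLE = [
    ("a", "1.0", "https://pypi.python.org/packages/source/a/a-1.0.tar.gz"),
    ("a", "1.0", "http://example.com/a-1.0.tar.gz"),
    ("b", "0.1", "http://example.org/b-0.1.zip"),
    ("b", "0.2", "https://pypi.python.org/packages/source/b/b-0.2.tar.gz"),
]
```

Note that this counts a version as safe whenever any file for it is hosted on PyPI, whether the external copy was found by spidering or through a direct link.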

> 
> I am saying that if a project has no *downloadable* files (not web
> pages) whose links can only be found by spidering, then we can turn
> off the rel attribute.
> 
> How many projects do not have any download links listed on their
> rel=""-linked pages?
> 
> 
>>> 1. Add a project-level option to enable or disable the adding of the
>>> rel="" attribute to /simple links (but not affecting the links in any
>>> other way)
>>> 2. Default it to disabled for new projects, and
>>> 3. Set it to disabled *now* for the 90% of projects that *don't have
>>> external files*?
>> 
>> +1 except 1. should be to remove the links entirely from the /simple/
>> index, not to just remove the rel attribute.
> 
> -1, since sometimes download links are in fact *download links*.  So
> this design choice would unnecessarily limit the number of projects to
> which the option could be applied automatically and immediately.
> 
> That is, a project with a download link of "foobar.com/foobar-1.2.tgz"
> would no longer be usable if you removed the download link from the
> /simple index, but would remain usable if the rel attribute were
> removed.


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From mal at egenix.com  Tue Mar 12 01:28:30 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 01:28:30 +0100
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
	<CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
Message-ID: <513E76AE.10601@egenix.com>

On 12.03.2013 00:39, Donald Stufft wrote:
> 
> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
> 
>> Just a thought, but...
>>
>> If 90% of PyPI projects do not have any external files to download,
>> then, wouldn't it make sense to:
> 
> To be accurate, it's that 90% don't have any files/releases available *only* externally. Most have external files to download, because it's very rare for a project not to include a home_page or a download_url, especially since distutils complains if you don't.

How are you going to verify that disabling the links
on those projects won't make certain release versions of
those packages unavailable for pip/easy_install ?

How are you planning to inform the package authors of that
change, so that they can take corrective action ?

Which options would be available for authors ?

PyPI is a much too important Python resource to play around with.
We need a good understanding of the effects a change may have
and provide ways to deal with them, before putting a change,
which potentially breaks hundreds of packages, into
production.

So yeah, just a thought ;-)

>> 1. Add a project-level option to enable or disable the adding of the
>> rel="" attribute to /simple links (but not affecting the links in any
>> other way)
>> 2. Default it to disabled for new projects, and
>> 3. Set it to disabled *now* for the 90% of projects that *don't have
>> external files*?
> 
> +1 except 1. should be to remove the links entirely from the /simple/
> index, not to just remove the rel attribute.

Removing those links removes the possibility of tools
to still download or display information based on those
links, e.g. to build a semantic web of Python resources.

Please remember that the /simple/ index is part
of the PyPI API, so it needs to be handled with the same
care as the rest of the PyPI APIs.

If you want to experiment with new ways of building the
index, I'd suggest first experimenting with a new index, say
/simple-v2/, before touching the main /simple/ index.

Regarding the links, it's probably better to not
remove the rel="" attributes but instead change them
from rel="download" to e.g. rel="external-download";
or to keep the old index semantics around as /simple-v1/.
This keeps the valuable semantic relation available for
tools that want to use it.
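
As a rough sketch of the relabeling idea, assuming the index page is plain HTML whose anchors carry rel="homepage" or rel="download" exactly as written: the function name relabel_rel and the default prefix are hypothetical, and a real index generator would change its templates rather than post-process HTML with a regular expression.

```python
import re

# Matches only the two rel values that installers are assumed to act on.
_REL = re.compile(r'rel="(homepage|download)"')


def relabel_rel(html, prefix="external-"):
    """Rename rel values so the semantic hint survives under a new name."""
    return _REL.sub(lambda m: 'rel="%s%s"' % (prefix, m.group(1)), html)


BEFORE = '<a href="http://example.com/dl/" rel="download">1.0 download_url</a>'
AFTER = relabel_rel(BEFORE)
```

Tools that look only for the exact values "homepage" and "download" would then skip these links, while tools interested in the relation could still read the renamed value.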

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 12 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From mal at egenix.com  Tue Mar 12 01:32:47 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 01:32:47 +0100
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <7981DD31-7868-4E24-94E2-1F80ACCB46C8@stufft.io>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
	<CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
	<CALeMXf55AfwEcdNpaOudvRtFgVHaZNrMA5r8iFjqaP+zjt-6yQ@mail.gmail.com>
	<7981DD31-7868-4E24-94E2-1F80ACCB46C8@stufft.io>
Message-ID: <513E77AF.6050408@egenix.com>

On 12.03.2013 01:23, Donald Stufft wrote:
> 
> On Mar 11, 2013, at 8:12 PM, PJ Eby <pje at telecommunity.com> wrote:
> 
>> On Mon, Mar 11, 2013 at 7:39 PM, Donald Stufft <donald at stufft.io> wrote:
>>>
>>> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
>>>
>>>> Just a thought, but...
>>>>
>>>> If 90% of PyPI projects do not have any external files to download,
>>>> then, wouldn't it make sense to:
>>>
>>> To be accurate, it's that 90% don't have any files/releases available *only* externally. Most have external files to download, because it's very rare for a project not to include a home_page or a download_url, especially since distutils complains if you don't.
>>
>> So what is the % of projects for whom the option can be disabled
>> automatically, *without* disabling automated downloadability of a
>> project's externally hosted files?
>>
>> Your statement is confusing to me, because the having of a home page
>> or download URL doesn't have anything to do with whether that page has
>> any files to download from it.
> 
> I didn't differentiate between spidered links and direct links to external files. I simply iterated over all files that the pip PackageFinder was able to find, figured out the version for each URL, and recorded whether that version came from a link to a pypi.python.org resource or from a different domain. I then diffed the two lists to get a list of versions that are _only_ installable externally. That 90% is the share of projects that could have *all* links whatsoever, other than the ones hosted on PyPI itself, removed without any version becoming uninstallable.

Which kinds of distribution files can pip's PackageFinder find ?
Does it find MSIs, EXEs, egg files ?

AFAIK, it only supports .tar.gz and .zip files, but no binary
files (except for the new .whl binary format).

-- 
Marc-Andre Lemburg
eGenix.com


From pje at telecommunity.com  Tue Mar 12 03:46:15 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 11 Mar 2013 22:46:15 -0400
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <513E76AE.10601@egenix.com>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
	<CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
	<513E76AE.10601@egenix.com>
Message-ID: <CALeMXf4hjXe2eSWPJhVwmenJ=yCxg3pVrEWQjSYqFXOmo3LgVw@mail.gmail.com>

On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 12.03.2013 00:39, Donald Stufft wrote:
>>
>> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
>>
>>> Just a thought, but...
>>>
>>> If 90% of PyPI projects do not have any external files to download,
>>> then, wouldn't it make sense to:
>>
>> To be accurate, it's that 90% don't have any files/releases available *only* externally. Most have external files to download, because it's very rare for a project not to include a home_page or a download_url, especially since distutils complains if you don't.
>
> How are you going to verify that disabling the links
> on those projects won't make certain release versions of
> those packages unavailable for pip/easy_install ?

I'm not sure if you're asking Donald or me here.  My proposal was to
only automatically disable the rel attributes for links to pages that
do *not* contain any easy_install or pip-able download links.  So, by
definition, this would not make any releases unavailable.

As for what Donald is proposing, I honestly have no idea what he's
talking about, or whether the 90% statistic actually applies for what
I'm proposing.

So it's possible that it might be a lot less than 90% that my proposal
would be able to affect *instantly*, without contacting any authors.


> How are you planning to inform the package authors of that
> change, so that they can take corrective action ?
>
> Which options would be available for authors ?

Do see my proposal again, which was simply that there be a switch to
enable or disable the rel attributes, that it default off for new
packages, and be switched to off for exactly that set of packages
which would not result in the loss of access to any download files.

There is, at this point, the question of how to handle projects that
have some of their releases hosted externally, or with some of the
files external and some not.  I would prefer that any automated
changeover apply only to packages where the set of discoverable links
is exactly equal to the links found on the project's /simple page.
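
The criterion in the last paragraph can be stated as a set comparison. The sketch below assumes that, for each project, two collections of download URLs are already known: those listed directly on its /simple page and those an installer can discover in total, including by spidering. All names and the sample data are hypothetical.

```python
def can_disable_automatically(simple_links, discoverable_links):
    """True when spidering finds nothing beyond what /simple already lists."""
    return set(discoverable_links) == set(simple_links)


def partition(projects):
    """Split projects into those safe to switch now and those needing contact.

    `projects` maps a project name to (simple_links, discoverable_links).
    """
    automatic, needs_author = [], []
    for name, (simple, found) in sorted(projects.items()):
        if can_disable_automatically(simple, found):
            automatic.append(name)
        else:
            needs_author.append(name)
    return automatic, needs_author


# Hypothetical data: "mixed" has one release that is only found by spidering.
PROJECTS = {
    "hosted": (
        ["https://pypi.python.org/packages/source/h/hosted/hosted-1.0.tar.gz"],
        ["https://pypi.python.org/packages/source/h/hosted/hosted-1.0.tar.gz"],
    ),
    "mixed": (
        ["https://pypi.python.org/packages/source/m/mixed/mixed-2.0.tar.gz"],
        [
            "https://pypi.python.org/packages/source/m/mixed/mixed-2.0.tar.gz",
            "http://example.com/downloads/mixed-1.0.tar.gz",
        ],
    ),
}
```

Projects in the second list are the ones whose authors would have to be contacted before anything is switched off.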


> Regarding the links, it's probably better to not
> remove the rel="" attributes but instead change them
> from rel="download" to e.g. rel="external-download";
> or to keep the old index semantics around as /simple-v1/.
> This keeps the valuable semantic relation available for
> tools that want to use it.

For what?  If you must keep them, rel="disabled-homepage" etc. would
get the message across.  But I really don't see the point, and I
*invented* the bloody things.

Frankly, I'm more than prepared to toss the rel attributes altogether,
after adequate notice is given for people to move their files or links
to the files.  I just don't want any changes in the *rest* of the
/simple generation algorithm.

From regebro at gmail.com  Tue Mar 12 06:20:20 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Tue, 12 Mar 2013 06:20:20 +0100
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
Message-ID: <CAL0kPAVHLKb2217rWjLQqyFioa_oQRUXTiFzPggbz0XTDBDx7g@mail.gmail.com>

On Tue, Mar 12, 2013 at 12:04 AM, PJ Eby <pje at telecommunity.com> wrote:
> Just a thought, but...
>
> If 90% of PyPI projects do not have any external files to download,
> then, wouldn't it make sense to:
>
> 1. Add a project-level option to enable or disable the adding of the
> rel="" attribute to /simple links (but not affecting the links in any
> other way)
> 2. Default it to disabled for new projects, and
> 3. Set it to disabled *now* for the 90% of projects that *don't have
> external files*?

That doesn't solve the problem, but it would make easy_install faster, so +1

> Immediately, 90% of the problem goes away

That's not 90% of the problem. The problem with externally hosted
files is not primarily that easy_install gets slower.

> stuff that doesn't contain a link now, but which could be taken over
> by a malicious party in the future, and 90% fewer sites having to be
> up in order for you to build something from PyPI.

Well, if the sites that do not contain the packages are down, that
only results in the build being *really* slow; it doesn't fail. It's when
the sites which *are* hosting packages are down that the build fails.

//Lennart

From regebro at gmail.com  Tue Mar 12 06:25:08 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Tue, 12 Mar 2013 06:25:08 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
Message-ID: <CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>

On Mon, Mar 11, 2013 at 8:57 PM, PJ Eby <pje at telecommunity.com> wrote:
> On Mon, Mar 11, 2013 at 1:45 PM, Lennart Regebro <regebro at gmail.com> wrote:
>> So, we should not remove the links for external packages until
>> somebody traverses those links? But as soon as somebody asks for those
>> links, we should remove them? In fact before we give them the link?
>
> I'm saying that if someone objects to the presence of  links they
> don't actually use, they are speaking nonsense.  Might as well ask to
> ban all packages from PyPI that they don't personally like -- it's the
> same request.  Nobody is forcing you to depend on packages that don't
> host on PyPI, so there is no point to the censorship.
>
> If you don't use the links, you can't argue that their presence is
> causing you harm.

Externally hosted files are a real-world, actual problem. We can only
solve it by not having externally hosted files. This discussion has
long since gone past reason into pure stop energy. I'm not
wasting more energy on it.

//Lennart

From mal at egenix.com  Tue Mar 12 08:57:22 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 08:57:22 +0100
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <CALeMXf4hjXe2eSWPJhVwmenJ=yCxg3pVrEWQjSYqFXOmo3LgVw@mail.gmail.com>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
	<CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
	<513E76AE.10601@egenix.com>
	<CALeMXf4hjXe2eSWPJhVwmenJ=yCxg3pVrEWQjSYqFXOmo3LgVw@mail.gmail.com>
Message-ID: <513EDFE2.2000907@egenix.com>

On 12.03.2013 03:46, PJ Eby wrote:
> On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 12.03.2013 00:39, Donald Stufft wrote:
>>>
>>> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
>>>
>>>> Just a thought, but...
>>>>
>>>> If 90% of PyPI projects do not have any external files to download,
>>>> then, wouldn't it make sense to:
>>>
>>> To be accurate, it's that 90% don't have any files/releases available *only* externally. Most have external files to download, because it's very rare for a project not to include a home_page or a download_url, especially since distutils complains if you don't.
>>
>> How are you going to verify that disabling the links
>> on those projects won't make certain release versions of
>> those packages unavailable for pip/easy_install ?
> 
> I'm not sure if you're asking Donald or me here. 

I was asking Donald, since he came up with the list. Given that
he was using the pip PackageFinder, it is not clear whether this
actually covers all easy_install'able packages as well (most likely
not, since pip doesn't support e.g. egg files).

> My proposal was to
> only automatically disable the rel attributes for links to pages that
> do *not* contain any easy_install or pip-able download links.  So, by
> definition, this would not make any releases unavailable.

Ok.

> As for what Donald is proposing, I honestly have no idea what he's
> talking about, or whether the 90% statistic actually applies for what
> I'm proposing.
> 
> So it's possible that it might be a lot less than 90% that my proposal
> would be able to affect *instantly*, without contacting any authors.

We'd still need to inform authors that we changed a setting
in their package, since they may want to use the feature
to host packages or releases off-PyPI again in the future.

>> How are you planning to inform the package authors of that
>> change, so that they can take corrective action ?
>>
>> Which options would be available for authors ?
> 
> Do see my proposal again, which was simply that there be a switch to
> enable or disable the rel attributes, that it default off for new
> packages, and be switched to off for exactly that set of packages
> which would not result in the loss of access to any download files.

Yes, I saw that, but was putting up the questions in the context
of Donald's idea to remove the links altogether.

> There is, at this point, the question of how to handle projects that
> have some of their releases hosted externally, or with some of the
> files external and some not.  I would prefer that any automated
> changeover apply only to packages where the set of discoverable links
> is exactly equal to the links found on the project's /simple page.

That would be safer, yes.

>> Regarding the links, it's probably better to not
>> remove the rel="" attributes but instead change them
>> from rel="download" to e.g. rel="external-download";
>> or to keep the old index semantics around as /simple-v1/.
>> This keeps the valuable semantic relation available for
>> tools that want to use it.
> 
> For what?  If you must keep them, rel="disabled-homepage" etc. would
> get the message across.  But I really don't see the point, and I
> *invented* the bloody things.

True, but they are now part of the PyPI API and thus cannot be
changed or removed easily.

The rel="" attributes provide extra information to tools
using the /simple/ index as (static) API and losing such
information would break the API.

You're only thinking about installers using the /simple/
API, but there may very well also be e.g. researchers interested
in scanning the index for homepages to find out where Python
software lives, how the community is connected, which
preferences for hosting and developing Python software
there are, etc.

That's a different context and in that context, the rel=""
attributes play a different role.

Removing them would make such research impossible to implement
using the /simple/ index and researchers would have to either go
with the XML-RPC API (which is slow compared to /simple/, puts a
lot of load on the PyPI server and cannot be placed on a CDN)
or revert to the old-style scanning of the PyPI package pages.

> Frankly, I'm more than prepared to toss the rel attributes altogether,
> after adequate notice is given for people to move their files or links
> to the files.  I just don't want any changes in the *rest* of the
> /simple generation algorithm.

See above.

-- 
Marc-Andre Lemburg
eGenix.com


From holger at merlinux.eu  Tue Mar 12 09:21:18 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 08:21:18 +0000
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
Message-ID: <20130312082118.GY9677@merlinux.eu>

On Mon, Mar 11, 2013 at 19:04 -0400, PJ Eby wrote:
> Just a thought, but...
> 
> If 90% of PyPI projects do not have any external files to download,
> then, wouldn't it make sense to:

sidenote: we need to verify and clarify the 90/10 ratio.  It would be 
the basis for action/changing pypi-state so we need to have this accurate
and double-checked.

> 1. Add a project-level option to enable or disable the adding of the
> rel="" attribute to /simple links (but not affecting the links in any
> other way)
> 2. Default it to disabled for new projects, and
> 3. Set it to disabled *now* for the 90% of projects that *don't have
> external files*?
>
> If the arguments about banning external links are as valid and
> important as some people claim, wouldn't it make sense to do this part
> *now*, without first requiring a commitment to force the switch to a
> disabled state in the future?

Pre-announcing the step to maintainers is good communication style. 
There is always the issue of bugs in your determination of "external hosting"
or tools that rely on "rel" attributes without us knowing etc.  

> Immediately, 90% of the problem goes away - no random spidering of
> stuff that doesn't contain a link now, but which could be taken over
> by a malicious party in the future, and 90% fewer sites having to be
> up in order for you to build something from PyPI.
> 
> Seems like a serious win to me -- and one that might not even need a PEP.

Yes and no: a PEP-like document is a good place to point people to.

> Next steps after this would be providing tools to help people move
> their files and links, promoting that people switch it off if they no
> longer support the offsite links, educating about security concerns,
> etc.
>
> I really don't understand why the 90% solution isn't *already* the
> consensus position, since it doesn't preclude follow-on efforts
> towards reducing the 10% towards 0%.
>
> And if the problem is so important, why must we keep 90% of the
> problems in place, just so we can keep arguing about censoring the
> 10%?  That doesn't make sense to me.

The idea of changing only the pypi-server side evolved just last week -
so we are not that slow in moving on here :)

cheers,
holger


> 
> To me, if somebody's injured, the first thing you do is clean and
> close the wound, not argue about whether it's a complete solution and
> what might happen days or weeks later.
> 
> Just a thought.
> 

From jnoller at gmail.com  Tue Mar 12 10:14:07 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 12 Mar 2013 05:14:07 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
Message-ID: <A29A9890-2807-4131-9836-26E7A87EEB0A@gmail.com>



On Mar 12, 2013, at 1:25 AM, Lennart Regebro <regebro at gmail.com> wrote:

> On Mon, Mar 11, 2013 at 8:57 PM, PJ Eby <pje at telecommunity.com> wrote:
>> On Mon, Mar 11, 2013 at 1:45 PM, Lennart Regebro <regebro at gmail.com> wrote:
>>> So, we should not remove the links for external packages until
>>> somebody traverses those links? But as soon as somebody asks for those
>>> links, we should remove them? In fact before we give them the link?
>> 
>> I'm saying that if someone objects to the presence of  links they
>> don't actually use, they are speaking nonsense.  Might as well ask to
>> ban all packages from PyPI that they don't personally like -- it's the
>> same request.  Nobody is forcing you to depend on packages that don't
>> host on PyPI, so there is no point to the censorship.
>> 
>> If you don't use the links, you can't argue that their presence is
>> causing you harm.
> 
> Externally hosted files are a real-world, actual problem. We can only
> solve it by not having externally hosted files. This discussion has
> long since gone past reason into pure stop energy. I'm not
> wasting more energy on it.
> 
> //Lennart
> 

Likewise. I'd like to see a pull request cleaning things up in a reasonable way that we can discuss with Richard at PyCon.

From jnoller at gmail.com  Tue Mar 12 10:20:11 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 12 Mar 2013 05:20:11 -0400
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <513EDFE2.2000907@egenix.com>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
	<CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
	<513E76AE.10601@egenix.com>
	<CALeMXf4hjXe2eSWPJhVwmenJ=yCxg3pVrEWQjSYqFXOmo3LgVw@mail.gmail.com>
	<513EDFE2.2000907@egenix.com>
Message-ID: <2A3B41AD-BDF2-481A-8830-F6E26E1D17BC@gmail.com>



On Mar 12, 2013, at 3:57 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 12.03.2013 03:46, PJ Eby wrote:
>> On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 12.03.2013 00:39, Donald Stufft wrote:
>>>> 
>>>> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
>>>> 
>>>>> Just a thought, but...
>>>>> 
>>>>> If 90% of PyPI projects do not have any external files to download,
>>>>> then, wouldn't it make sense to:
>>>> 
>>>> To be accurate, it's that 90% don't have any files/releases available *only* externally. Most have external files to download because it's very rare that a project doesn't include a home_page or a download_url, especially since distutils complains if you don't.
>>> 
>>> How are you going to verify that disabling the links
>>> on those projects won't make certain release versions of
>>> those packages unavailable for pip/easy_install ?
>> 
>> I'm not sure if you're asking Donald or me here. 
> 
> I was asking Donald, since he came up with the list. Given that
> he was using the pip PackageFinder, it is not clear whether this
> actually covers all easy_install'able packages as well (most likely
> not, since pip doesn't support e.g. egg files).
> 
>> My proposal was to
>> only automatically disable the rel attributes for links to pages that
>> do *not* contain any easy_install or pip-able download links.  So, by
>> definition, this would not make any releases unavailable.
> 
> Ok.
> 
>> As for what Donald is proposing, I honestly have no idea what he's
>> talking about, or whether the 90% statistic actually applies for what
>> I'm proposing.
>> 
>> So it's possible that it might be a lot less than 90% that my proposal
>> would be able to affect *instantly*, without contacting any authors.
> 
> We'd still need to inform authors that we changed a setting
> in their package, since they may want to use the feature
> to host packages or releases off-PyPI again in the future.
> 
>>> How are you planning to inform the package authors of that
>>> change, so that they can take corrective action ?
>>> 
>>> Which options would be available for authors ?
>> 
>> Do see my proposal again, which was simply that there be a switch to
>> enable or disable the rel attributes, that it default off for new
>> packages, and be switched to off for exactly that set of packages
>> which would not result in the loss of access to any download files.
> 
> Yes, I saw that, but was putting up the questions in the context
> of Donald's idea to remove the links altogether.
> 
>> There is, at this point, the question of how to handle projects that
>> have some of their releases hosted externally, or with some of the
>> files external and some not.  I would prefer that any automated
>> changeover apply only to packages where the set of discoverable links
>> is exactly equal to the links found on the project's /simple page.
> 
> That would be safer, yes.
> 
>>> Regarding the links, it's probably better to not
>>> remove the rel="" attributes but instead change them
>>> from rel="download" to e.g. rel="external-download";
>>> or to keep the old index semantics around as /simple-v1/.
>>> This keeps the valuable semantic relation available for
>>> tools that want to use it.
>> 
>> For what?  If you must keep them, rel="disabled-homepage" etc. would
>> get the message across.  But I really don't see the point, and I
>> *invented* the bloody things.
> 
> True, but they are now part of the PyPI API and thus cannot be
> changed or removed easily.
> 
> The rel="" attributes provide extra information to tools
> using the /simple/ index as (static) API and losing such
> information would break the API.
> 
> You're only thinking about installers using the /simple/
> API, but there may very well also be e.g. researchers interested
> in scanning the index for homepages to find out where Python
> software lives, how the community is connected, which
> preferences for hosting and developing Python software
> there are, etc.
> 
> That's a different context and in that context, the rel=""
> attributes play a different role.
> 
> Removing them would make such research impossible to implement
> using the /simple/ index and researchers would have to either go
> with the XML-RPC API (which is slow compared to /simple/, puts a
> lot of load on the PyPI server and cannot be placed on a CDN)
> or revert to the old-style scanning of the PyPI package pages.
> 

So because of hypothetical researchers we can't make the system better.


>> Frankly, I'm more than prepared to toss the rel attributes altogether,
>> after adequate notice is given for people to move their files or links
>> to the files.  I just don't want any changes in the *rest* of the
>> /simple generation algorithm.
> 
> See above.
> 

From mal at egenix.com  Tue Mar 12 10:50:23 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 10:50:23 +0100
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <2A3B41AD-BDF2-481A-8830-F6E26E1D17BC@gmail.com>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
	<CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
	<513E76AE.10601@egenix.com>
	<CALeMXf4hjXe2eSWPJhVwmenJ=yCxg3pVrEWQjSYqFXOmo3LgVw@mail.gmail.com>
	<513EDFE2.2000907@egenix.com>
	<2A3B41AD-BDF2-481A-8830-F6E26E1D17BC@gmail.com>
Message-ID: <513EFA5F.5000302@egenix.com>

On 12.03.2013 10:20, Jesse Noller wrote:
> 
> 
> On Mar 12, 2013, at 3:57 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:
> 
>> On 12.03.2013 03:46, PJ Eby wrote:
>>> On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> On 12.03.2013 00:39, Donald Stufft wrote:
>>>>>
>>>>> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
>>>>>
>>>>>> Just a thought, but...
>>>>>>
>>>>>> If 90% of PyPI projects do not have any external files to download,
>>>>>> then, wouldn't it make sense to:
>>>>>
>>>>> To be accurate, it's that 90% don't have any files/releases available *only* externally. Most have external files to download because it's very rare that a project doesn't include a home_page or a download_url, especially since distutils complains if you don't.
>>>>
>>>> How are you going to verify that disabling the links
>>>> on those projects won't make certain release versions of
>>>> those packages unavailable for pip/easy_install ?
>>>
>>> I'm not sure if you're asking Donald or me here. 
>>
>> I was asking Donald, since he came up with the list. Given that
>> he was using the pip PackageFinder, it is not clear whether this
>> actually covers all easy_install'able packages as well (most likely
>> not, since pip doesn't support e.g. egg files).
>>
>>> My proposal was to
>>> only automatically disable the rel attributes for links to pages that
>>> do *not* contain any easy_install or pip-able download links.  So, by
>>> definition, this would not make any releases unavailable.
>>
>> Ok.
>>
>>> As for what Donald is proposing, I honestly have no idea what he's
>>> talking about, or whether the 90% statistic actually applies for what
>>> I'm proposing.
>>>
>>> So it's possible that it might be a lot less than 90% that my proposal
>>> would be able to affect *instantly*, without contacting any authors.
>>
>> We'd still need to inform authors that we changed a setting
>> in their package, since they may want to use the feature
>> to host packages or releases off-PyPI again in the future.
>>
>>>> How are you planning to inform the package authors of that
>>>> change, so that they can take corrective action ?
>>>>
>>>> Which options would be available for authors ?
>>>
>>> Do see my proposal again, which was simply that there be a switch to
>>> enable or disable the rel attributes, that it default off for new
>>> packages, and be switched to off for exactly that set of packages
>>> which would not result in the loss of access to any download files.
>>
>> Yes, I saw that, but was putting up the questions in the context
>> of Donald's idea to remove the links altogether.
>>
>>> There is, at this point, the question of how to handle projects that
>>> have some of their releases hosted externally, or with some of the
>>> files external and some not.  I would prefer that any automated
>>> changeover apply only to packages where the set of discoverable links
>>> is exactly equal to the links found on the project's /simple page.
>>
>> That would be safer, yes.
>>
>>>> Regarding the links, it's probably better to not
>>>> remove the rel="" attributes but instead change them
>>>> from rel="download" to e.g. rel="external-download";
>>>> or to keep the old index semantics around as /simple-v1/.
>>>> This keeps the valuable semantic relation available for
>>>> tools that want to use it.
>>>
>>> For what?  If you must keep them, rel="disabled-homepage" etc. would
>>> get the message across.  But I really don't see the point, and I
>>> *invented* the bloody things.
>>
>> True, but they are now part of the PyPI API and thus cannot be
>> changed or removed easily.
>>
>> The rel="" attributes provide extra information to tools
>> using the /simple/ index as (static) API and losing such
>> information would break the API.
>>
>> You're only thinking about installers using the /simple/
>> API, but there may very well also be e.g. researchers interested
>> in scanning the index for homepages to find out where Python
>> software lives, how the community is connected, which
>> preferences for hosting and developing Python software
>> there are, etc.
>>
>> That's a different context and in that context, the rel=""
>> attributes play a different role.
>>
>> Removing them would make such research impossible to implement
>> using the /simple/ index and researchers would have to either go
>> with the XML-RPC API (which is slow compared to /simple/, puts a
>> lot of load on the PyPI server and cannot be placed on a CDN)
>> or revert to the old-style scanning of the PyPI package pages.
>>
> 
> So because of hypothetical researchers we can't make the system better.

Of course we can, but just like with Python itself, we have to
pay attention to backwards compatibility.

Not hard to do: we'd just need to keep the old index in place
using a different URL, e.g. /simple-v1/.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 12 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From holger at merlinux.eu  Tue Mar 12 12:38:17 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 11:38:17 +0000
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on
	PYPI
Message-ID: <20130312113817.GA9677@merlinux.eu>

Hi all,

below is the new PEP pre-submit version (V2) which incorporates the
latest suggestions and aims at a rapidly deployable solution.  Thanks in
particular to Philip, Donald and Marc-Andre.  I also added a few notes
on how installers should behave with respect to non-PYPI crawling.  

I think a PEP-like doc is warranted and that we should not silently
change things without proper communication to maintainers and without
pre-planning the implementation/change process.  Arguably, the changes are
more invasive than "oh, let's just do an http->https redirect", which didn't
work too well either.

Now, if there is some agreement, I can submit this PEP officially tomorrow,
and given agreement/refinements from the Pycon folks and the likes of
Richard, we may be able to get going very shortly after Pycon.

cheers,
holger


PEP-draft: transitioning to release-file hosting on PYPI
====================================================================

Status
-----------

PRE-SUBMIT-v2

Abstract
------------

This PEP proposes a backward-compatible transition process to speed up,
simplify and robustify installing from the pypi.python.org (PYPI)
package index.  The initial transition will put most packages on PYPI
automatically in a configuration mode which will prevent client-side
crawling from installers.  To ease automatic transition and minimize
client-side friction, **no changes to distutils or installation tools** are
required.  Instead, the transition is implemented by modifying PYPI to
serve links from ``simple/`` pages in a configurable way, preventing or
allowing crawling of non-PYPI sites for detecting release files.
Maintainers of all PYPI packages will be notified ahead of those
changes.

Maintainers of packages which currently are hosted on non-PYPI sites
shall receive instructions and tools to ease "re-hosting" of their
historic and future package release files.  The implementation of such
tools is NOT required for implementing the initial automatic transition.

Installation tools like pip and easy_install shall warn about crawling
non-PYPI sites and later default to disallow it and only allow it with
an explicit option.


History and motivations for external hosting
------------------------------------------------

When PYPI went online, it offered release registration but had no
facility to host release files itself.  When hosting was added, no
automated downloading tool existed yet.  When Philip Eby implemented
automated downloading (through setuptools), he made the choice 
to allow people to use download hosts of their choice.  This was
implemented by the PYPI ``simple/`` index containing links of type
``rel=homepage`` or ``rel=download`` which are crawled by installation
tools to discover package links.  As of March 2013, a substantial share
of packages (estimated at about 10%) make use of this mechanism to host
files on github, bitbucket, sourceforge or their own hosting sites like
``mercurial.selenic.com``, to name just a few.

There are many reasons [2]_ why people choose to use external hosting,
to cite just a few:

- release processes and scripts have been developed already and 
  upload to external sites 

- it takes too long to upload large files from some places in the world

- export restrictions e.g. for crypto-related software

- company policies which prescribe offering open source packages through
  own sites

- problems with integrating uploading to PYPI into one's release process
  (because of release policies)

- perceived bad reliability of PYPI

- missing knowledge that you can upload files

Irrespective of the present-day validity of these reasons, there clearly
is a history behind why people choose to host files externally, and for
some time it was even the only way you could do things.


Problem
---------------

**Today, python package installers (pip and easy_install) often need to
query non-PYPI sites even if there are no externally hosted files**.
In addition to querying pypi.python.org's simple index pages, an installer
also crawls all homepages and download pages ever specified with any
release of a package.  The need for installers to crawl 3rd party sites
slows down installation and makes for a brittle, unreliable installation
process.  Those sites and packages also don't take part in the
:pep:`381` mirroring infrastructure, further decreasing reliability and
speed of automated installation processes around the world.

Roughly 90% of packages are hosted directly on pypi.python.org [1]_.
Even for them installers still need to crawl the homepage(s) of a
package.  In particular, many package uploaders are not aware that
specifying the "homepage" in their release process will slow down
the installation process for all of their users.

Relying on third party sites also opens up more attack vectors
for injecting malicious packages into sites using automated installs.  
A simple attack might just involve getting hold of an old now-unused
homepage domain and placing malicious packages there.  Moreover,
performing a Man-in-the-Middle (MITM) attack between an installation
site and any of the download sites can inject malicious packages on the
installation site.  As many homepages and download locations are using
HTTP and not proper HTTPS, such attacks are not very hard to launch.
Such MITM attacks can happen even for packages which never intended to
host files externally as their homepages are contacted by installers
anyway.

There is currently no way for package maintainers to avoid 3rd party
crawling, other than removing all homepage/download url metadata
for all historic releases.  While a script [3]_ has been written to 
perform this action, it is not a good general solution because it removes
semantic information like the "homepage" specification from PYPI packages.
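
To illustrate the crawling described above, here is a minimal sketch
(Python 3 standard library only) of how a tool could pick out the links on
a ``simple/`` page that installers treat as crawl targets.  The sample page,
the ``example.com`` URLs and the helper names are made up for illustration;
this is not the actual pip or easy_install code.

```python
from html.parser import HTMLParser

# rel values that cause installers to crawl the linked page
CRAWLED_RELS = {"homepage", "download"}


class RelLinkCollector(HTMLParser):
    """Collect hrefs of <a> tags whose rel marks them as crawl targets."""

    def __init__(self):
        super().__init__()
        self.crawl_targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if href and CRAWLED_RELS.intersection(rels):
            self.crawl_targets.append(href)


def find_crawl_targets(simple_page_html):
    """Return the external pages an installer would visit for this page."""
    collector = RelLinkCollector()
    collector.feed(simple_page_html)
    collector.close()
    return collector.crawl_targets


# Hypothetical simple/ page: one PYPI-hosted file plus two rel links.
SAMPLE_PAGE = """
<html><body>
<a href="../../packages/source/e/example/example-1.0.tar.gz">example-1.0.tar.gz</a><br/>
<a href="http://example.com/" rel="homepage">1.0 home_page</a><br/>
<a href="http://example.com/downloads/" rel="download">1.0 download_url</a><br/>
</body></html>
"""

if __name__ == "__main__":
    for url in find_crawl_targets(SAMPLE_PAGE):
        print("would crawl:", url)
```

Note that the release file itself is hosted on PYPI in this sample, yet two
third-party pages would still be contacted.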


Solution
-----------

The proposed solution consists of the following implementation and
communication steps:

- determine which packages have release files only on PYPI (group A)
  and which have externally hosted release files (group B).

- Prepare PYPI implementation to allow a per-project "hosting mode",
  effectively enabling or disabling external crawling.  When enabled 
  nothing changes from the current situation of producing ``rel=download`` 
  and ``rel=homepage`` attributed links on ``simple/`` pages, 
  causing installers to crawl those sites.  
  When disabled, the attributions of links will change 
  to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
  avoid crawling 3rd party sites.  Retaining the meta-information allows
  tools to still make use of the semantic information.

- send mail to maintainers of A that their project is going to be 
  automatically configured to "disable crawling" in one week
  and encourage them to set this mode earlier to help all of 
  their users.

- send mail to maintainers of B that their package hosting mode 
  is "crawling enabled", and list the sites which currently are crawled,
  and suggest that they re-host their packages directly on PYPI and
  then switch the hosting-mode to "disable crawling".  Provide instructions
  and ideally tools to help with this "re-uploading" process.

In addition, maintainers of installation tools are asked to release
two updates.  The first one shall provide clear warnings if external
crawling needs to happen, stating for which projects and URLs exactly
this happens, and that in the future crawling will be disabled by default.
The next update shall change the default to disallow crawling and allow
crawling only with an explicit option like ``--crawl-externals`` and
another option to limit which hosts are allowed to be crawled
at all.
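
The per-project switch could be sketched roughly as follows.  This is a
hypothetical helper, not PYPI's actual rendering code; the function names
and the link text are assumptions, and only the ``rel`` values come from the
proposal above.

```python
from html import escape

# Prefix used for the non-crawled variants (rel=newhomepage, rel=newdownload)
DISABLED_PREFIX = "new"
KNOWN_KINDS = ("homepage", "download")


def rel_for(kind, crawling_enabled):
    """Return the rel value to serve for a homepage or download link."""
    if kind not in KNOWN_KINDS:
        raise ValueError("unknown link kind: %r" % (kind,))
    return kind if crawling_enabled else DISABLED_PREFIX + kind


def render_link(url, kind, crawling_enabled):
    """Render one simple/ page link for the given hosting mode."""
    return '<a href="%s" rel="%s">%s</a>' % (
        escape(url, quote=True),
        rel_for(kind, crawling_enabled),
        kind,
    )


if __name__ == "__main__":
    print(render_link("http://example.com/", "homepage", True))
    print(render_link("http://example.com/", "homepage", False))
```

The link itself stays on the page in both modes, so tools that only want
the semantic information can still read it.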


Hosting-Mode state transitions
----------------------------------

1. At the outset, we set hosting-mode to "notset" for all packages.
   This will not change any link served via the simple index and thus
   no bad effects are expected.  Early adopters and testers may now
   change the mode to either "crawl" or "nocrawl" to help with
   streamlining issues in the PYPI implementation.

2. When maintainers of B packages are mailed, their mode is directly
   set to "crawl".

3. When maintainers of A are mailed, we leave the mode at "notset" to allow
   people to change it to "nocrawl" themselves or to set it to "crawl" 
   if they think they are wrongly in the "A" group.  After a week 
   all "notset" modes are set to "nocrawl".

A week after the mailings all packages will be in "crawl" or "nocrawl"
hosting mode.  It is then a matter of good tools and reaching out to
maintainers of B packages to increase the A/B ratio.
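
The transitions above can be sketched as a small state machine.  The mode
names come from this section; the function names and the in-memory dict are
assumptions for illustration only, not a description of how PYPI stores
this.

```python
NOTSET, CRAWL, NOCRAWL = "notset", "crawl", "nocrawl"


def initial_modes(projects):
    """Step 1: every project starts out as "notset"."""
    return {name: NOTSET for name in projects}


def mail_group_b(modes, group_b):
    """Step 2: projects with externally hosted files go straight to "crawl"."""
    for name in group_b:
        modes[name] = CRAWL


def finish_grace_period(modes):
    """Step 3: after a week, anything still "notset" becomes "nocrawl"."""
    for name in list(modes):
        if modes[name] == NOTSET:
            modes[name] = NOCRAWL


if __name__ == "__main__":
    modes = initial_modes(["a1", "a2", "b1"])
    mail_group_b(modes, ["b1"])
    modes["a2"] = CRAWL  # a maintainer who thinks they are wrongly in group A
    finish_grace_period(modes)
    print(modes)
```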

Open questions
----------------------

- Should the support tools for "rehosting" packages be implemented  on the
  server side or on the client side?  Implementing it on the client
  side probably is quicker to get right and less fatal in terms of failures.

- double-check if ``rel=newhomepage`` and ``rel=newdownload`` cause the 
  desired behaviour of pip and easy_install (both the distribute and 
  setuptools based one) to not crawl those pages.

- are the "support tools" for re-hosting outside the scope of this PEP?

- Think some more about pip/easy_install "allow-hosts" mode etc.

References
------------

.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted, http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html

.. [2] Marc-Andre Lemburg, reasons for external hosting, http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [3] Holger Krekel, Script to remove homepage/download metadata for
       all releases http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html

Acknowledgments
----------------------

Philip Eby for precise information and the basic ideas to
implement the transition via server-side changes only.

Donald Stufft for pushing away from external hosting and doing
the 90/10 % statistics script and offering to implement a PR.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig for thinking
through issues regarding getting rid of "external hosting".


Copyright
-----------------

This document has been placed in the public domain.



From ncoghlan at gmail.com  Tue Mar 12 16:19:32 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 13 Mar 2013 01:19:32 +1000
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <20130312113817.GA9677@merlinux.eu>
References: <20130312113817.GA9677@merlinux.eu>
Message-ID: <CADiSq7fZPNuuUD-J7pu=e91Yi_OHzc5UFTz68jOibhE+PbMObg@mail.gmail.com>

That looks pretty good to me. My only comment is that qualifiers like "new"
don't age well in an API. The explicit "nocrawlhomepage" and
"nocrawldownload" might be a better choice.

Cheers,
Nick.

From pje at telecommunity.com  Tue Mar 12 16:28:54 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 11:28:54 -0400
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <513EFA5F.5000302@egenix.com>
References: <CALeMXf4admmZ5sVy2Hhd-dCcrVx9HqHf4_kbWtxkWtMorYitLQ@mail.gmail.com>
	<CF975217-751D-408A-A4FA-69F8E3BAEF9F@stufft.io>
	<513E76AE.10601@egenix.com>
	<CALeMXf4hjXe2eSWPJhVwmenJ=yCxg3pVrEWQjSYqFXOmo3LgVw@mail.gmail.com>
	<513EDFE2.2000907@egenix.com>
	<2A3B41AD-BDF2-481A-8830-F6E26E1D17BC@gmail.com>
	<513EFA5F.5000302@egenix.com>
Message-ID: <CALeMXf4bgoEKWJCKgoUrjdL590tabMf9i4b_oey+VvHeYWsaQg@mail.gmail.com>

On Tue, Mar 12, 2013 at 5:50 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> Not hard to do: we'd just need to keep the old index in place
> using a different URL, e.g. /simple-v1/.

That's not necessary: the XML-RPC API lets you query those URLs
directly.  They're part of the metadata standard, after all...  which
means you can *also* access them by downloading the DOAP records,
browsing the PyPI pages directly, etc.

There are plenty of ways to get that data, no point adding another one.

From pje at telecommunity.com  Tue Mar 12 16:38:22 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 11:38:22 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
Message-ID: <CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>

On Tue, Mar 12, 2013 at 1:25 AM, Lennart Regebro <regebro at gmail.com> wrote:
> Externally hosted files are a real world actual problem.

You're leaving out some important words from that sentence.  Words
like, "for some people" and "who choose to depend on projects using
them".

PyPI isn't your private personal playground.  Other people have rights, too.

> This discussion has since a long time gone past reason into pure stop energy.

I agree - hardly anyone is giving any reasoning that justifies why one
group of people should have their projects censored to benefit a few
blowhards on Catalog-SIG.

Carl's the only person who's even *tried* giving a justification.
Everyone else just shuts up or changes the subject when I ask that
question.

I'll ask it again: why should *thousands* of projects be censored or
made to change their release processes, because *you* can't be
bothered to cache the distributions of the projects you depend on?

Not, why would it be a good idea for them to change anyway.

Why should they be *forced* to do it?

Bonus points: answer why, *every time* somebody proposes a way of
improving things that doesn't *ban* external hosting, you guys go all
stop energy on that and derail the discussion with why it has to be
total.

AFAICT, you're the ones stopping things moving forward here,
filibustering against every possible compromise.

From jacob at jacobian.org  Tue Mar 12 16:42:28 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Tue, 12 Mar 2013 10:42:28 -0500
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
Message-ID: <CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>

On Tue, Mar 12, 2013 at 10:38 AM, PJ Eby <pje at telecommunity.com> wrote:
> I'll ask it again: why should *thousands* of projects be censored or
> made to change their release processes, because *you* can't be
> bothered to cache the distributions of the projects you depend on?

Because externally-hosted files are a security risk, one that most
users don't realize exists.

We can either fix this problem now, or we can wait until someone is
compromised using PyPI as a vector.

Jacob

From jacob at jacobian.org  Tue Mar 12 16:44:17 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Tue, 12 Mar 2013 10:44:17 -0500
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
Message-ID: <CAK8PqJHoMXBG8CDssR55CRvrMCxkn2V1QZci0+QGi4S42Y==pA@mail.gmail.com>

On Tue, Mar 12, 2013 at 10:38 AM, PJ Eby <pje at telecommunity.com> wrote:
> AFAICT, you're the ones stopping things moving forward here,
> filibustering against every possible compromise.

Sorry, one more thing: I'm interested in what your compromise would be.
Can you write up a counter-proposal to Holger's?

Jacob

From pje at telecommunity.com  Tue Mar 12 16:53:10 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 11:53:10 -0400
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <20130312113817.GA9677@merlinux.eu>
References: <20130312113817.GA9677@merlinux.eu>
Message-ID: <CALeMXf6M+5pSmFq+Krqe1sc-p_z-tADhzaEmcdd5RwVw5VXc-w@mail.gmail.com>

On Tue, Mar 12, 2013 at 7:38 AM, holger krekel <holger at merlinux.eu> wrote:
> In addition, maintainers of installation tools are asked to release
> two updates.  The first one shall provide clear warnings if external
> crawling needs to happen,

A clarification here: "needs to happen" is not well-specified.  An
installer tasked with finding the latest or best-matching version of a
package must currently *always* crawl.  So the warning would be
always.

The strategy I originally chose for making this change in easy_install
is to warn once at the beginning that --allow-hosts has not been set,
and thus packages might be downloaded from anywhere on the internet.

I've since become uncertain that this change is actually workable in
the short term, since until most of the packages are actually moved
onto PyPI, a lot of installs will fail if somebody changes their
configuration to be more secure.  So I'm thinking the warning needs to
be deferred until at least the more popular packages have moved to
PyPI.
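
As a rough sketch of what restricting hosts amounts to (this is not
easy_install's actual code; the function name and the glob-style patterns
are assumptions for illustration), a download URL is only followed when its
host matches one of the allowed patterns:

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def host_allowed(url, allowed_patterns):
    """Return True if the URL's host matches any glob-style pattern."""
    host = urlparse(url).hostname or ""
    return any(fnmatchcase(host, pattern) for pattern in allowed_patterns)


if __name__ == "__main__":
    patterns = ["*.python.org"]
    for url in ("http://pypi.python.org/simple/example/",
                "http://example.com/downloads/example-1.0.tar.gz"):
        print(url, "->", "allowed" if host_allowed(url, patterns) else "blocked")
```

With such a restriction in place, any project whose files live only on a
non-matching host fails to install, which is the short-term problem
described above.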


> Now, if there is some agreement, I can submit this PEP officially tomorrow,
> and given agreement/refinements from the Pycon folks and the likes of
> Richard, we may be able to get going very shortly after Pycon.

I'd like to suggest that the PEP should be explicit that no other
changes to the /simple generation algorithm are being made, just the
removal or alteration of rel="" attributes.  i.e., it will still be
possible -- at least in the near term -- for projects to include
explicit download links to files made available elsewhere.  Changing
that situation is more controversial and will require wider community
participation than has occurred to date.

It might also be good to suggest that authors of PyPI clones plan
their own phase-out of rel="" attributes.

From m.van.rees at zestsoftware.nl  Tue Mar 12 17:04:52 2013
From: m.van.rees at zestsoftware.nl (Maurits van Rees)
Date: Tue, 12 Mar 2013 17:04:52 +0100
Subject: [Catalog-sig] Inconsistency on f.pypi.python.org with
	Products.PluggableAuthService
In-Reply-To: <kh53a9$i3$1@ger.gmane.org>
References: <kh53a9$i3$1@ger.gmane.org>
Message-ID: <khnjn1$gji$1@ger.gmane.org>

Op 05-03-13 16:34, Christian Theune schreef:
> Hi,
>
>
> it seems my fight to keep f.pypi.python.org is at least keeping the
> pypi-mirrors.org page happy.
>
>
> Unfortunately one of our users detected another inconsistency that the
> mirror script doesn't find or clean up by itself. I also don't know how
> to get this back in line.
>
>
> If you compare those pages:
>
>
> http://f.pypi.python.org/packages/source/P/Products.PluggableAuthService/
>
> http://f.pypi.python.org/simple/Products.PluggableAuthService
>
> http://pypi.python.org/simple/Products.PluggableAuthService
> <http://f.pypi.python.org/simple/Products.PluggableAuthService>
>
>
> There's definitely something wrong.
>
>
> Suggestions?

I meant to look at this earlier, as I noticed it too.  Apparently it has 
not solved itself.  The latest release is 1.10.0, which was uploaded on 
19 February, which is the day that PyPI switched to https.  My guess is 
that some mirrors did an update at a point in time when PyPI had 
problems because of this switch and that those mirrors somehow got 
affected by this.

Let's look at the state of the various mirrors.

http://a.pypi.python.org is perfect.

http://b.pypi.python.org says "Package Products.PluggableAuthService 
does not exist", which should not be true, as this package has existed 
for years.  Also, http://b.pypi.python.org/packages/source/P/ does list 
Products.PluggableAuthService, but that page has an empty html body.

http://c.pypi.python.org/simple/Products.PluggableAuthService says: "The 
requested URL /simple/Products.PluggableAuthService/ was not found on 
this server." 
http://c.pypi.python.org/packages/source/P/Products.PluggableAuthService/ does 
exist and lists all except the last release.

d.pypi.python.org is unavailable.

e.pypi.python.org is perfect.

f.pypi.python.org: same as c.

g.pypi.python.org works, but has not been updated in over a month so it 
misses the latest release.  I guess this one would work if it got 
updated again.

pypi.crate.io is fine.


So b, c and f have a problem.  http://www.pypi-mirrors.org lists these 
respectively as old, aging and fresh.

If anyone knows what could be done to solve this, that would be good.
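
A comparison like the one above can be scripted.  The Python sketch
below only builds the two URLs to inspect per lettered mirror and diffs
two file listings that have already been fetched; the function names are
hypothetical and the fetching and HTML parsing are left out on purpose.

```python
def mirror_urls(letter, project):
    """Build the simple-index and source-listing URLs for one lettered mirror."""
    base = "http://%s.pypi.python.org" % letter
    return {
        "simple": "%s/simple/%s/" % (base, project),
        "packages": "%s/packages/source/%s/%s/" % (base, project[0], project),
    }


def missing_files(reference, mirror):
    """Return the files present in the reference listing but absent on the mirror."""
    return sorted(set(reference) - set(mirror))
```

Using the listing from a.pypi.python.org as the reference, a non-empty
result for c or f would name the releases those mirrors are missing.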

-- 
Maurits van Rees: http://maurits.vanrees.org/
Zest Software: http://zestsoftware.nl


From mal at egenix.com  Tue Mar 12 17:06:26 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 17:06:26 +0100
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <20130312113817.GA9677@merlinux.eu>
References: <20130312113817.GA9677@merlinux.eu>
Message-ID: <513F5282.3010206@egenix.com>

On 12.03.2013 12:38, holger krekel wrote:
> Hi all,
> 
> below is the new PEP pre-submit version (V2) which incorporates the
> latest suggestions and aims at a rapidly deployable solution.  Thanks in
> particular to Philip, Donald and Marc-Andre.  I also added a few notes
> on how installers should behave with respect to non-PYPI crawling.  
> 
> I think a PEP like doc is warranted and that we should not silently
> change things without proper communication to maintainers and pre-planning
> the implementation/change process.  Arguably, the changes are more
> invasive than "oh, let's just do a http->https redirect" which didn't
> work too well either.
> 
> Now, if there is some agreement, i can submit this PEP officially tomorrow,
> and given agreement/refinements from the Pycon folks and the likes of
> Richard, we may be able to get going very shortly after Pycon.
> 
> cheers,
> holger
> 
> 
> PEP-draft: transitioning to release-file hosting on PYPI
> ====================================================================
> 
> Status
> -----------
> 
> PRE-SUBMIT-v2
> 
> Abstract
> ------------
> 
> This PEP proposes a backward-compatible transition process to speed up,
> simplify and robustify installing from the pypi.python.org (PYPI)
> package index.  The initial transition will put most packages on PYPI
> automatically in a configuration mode which will prevent client-side
> crawling from installers.  To ease automatic transition and minimize
> client-side friction, **no changes to distutils or installation tools** are
> required.  Instead, the transition is implemented by modifying PYPI to
> serve links from ``simple/`` pages in a configurable way, preventing or
> allowing crawling of non-PYPI sites for detecting release files.
> Maintainers of all PYPI packages will be notified ahead of those
> changes.
> 
> Maintainers of packages which currently are hosted on non-PYPI sites
> shall receive instructions and tools to ease "re-hosting" of their
> historic and future package release files.  The implementation of such
> tools is NOT required for implementing the initial automatic transition.
> 
> Installation tools like pip and easy_install shall warn about crawling
> non-PYPI sites and later default to disallow it and only allow it with
> an explicit option.
> 
> 
> History and motivations for external hosting
> ------------------------------------------------
> 
> When PYPI went online, it offered release registration but had no
> facility to host release files itself.  When hosting was added, no
> automated downloading tool existed yet.  When Philip Eby implemented
> automated downloading (through setuptools), he made the choice 
> to allow people to use download hosts of their choice.  This was
> implemented by the PYPI ``simple/`` index containing links of type
> ``rel=homepage`` or ``rel=download`` which are crawled by installation
> tools to discover package links.  As of March 2013, a substantial part
> of packages (estimated at about 10%) make use of this mechanism to host
> files on github, bitbucket, sourceforge or their own hosting sites like
> ``mercurial.selenic.com``, to name just a few.
> 
> There are many reasons [2]_ why people choose to use external hosting,
> to cite just a few:
> 
> - release processes and scripts have been developed already and 
>   upload to external sites 
> 
> - it takes too long to upload large files from some places in the world
> 
> - export restrictions e.g. for crypto-related software
> 
> - company policies which prescribe offering open source packages through
>   own sites
> 
> - problems with integrating uploading to PYPI into one's release process
>   (because of release policies)
> 
> - perceived bad reliability of PYPI
> 
> - missing knowledge that you can upload files
> 
> Irrespective of the present-day validity of these reasons, there clearly
> is a history why people choose to host files externally and it even was 
> for some time the only way you could do things.  
> 
> 
> Problem
> ---------------
> 
> **Today, python package installers (pip and easy_install) often need to
> query non-PYPI sites even if there are no externally hosted files**.
> Apart from querying pypi.python.org's simple index pages, also all
> homepages and download pages ever specified with any release of a
> package are crawled by an installer.  The need for installers to
> crawl 3rd party sites slows down installation and makes for a brittle
> unreliable installation process.   Those sites and packages also don't 
> take part in the :pep:`381` mirroring infrastructure, further decreasing
> reliability and speed of automated installation processes around the world. 
> 
> Roughly 90% of packages are hosted directly on pypi.python.org [1]_.
> Even for them installers still need to crawl the homepage(s) of a
> package.  Many package uploaders in particular are not aware that
> specifying the "homepage" in their release process will slow down
> the installation process for all of the package's users.
> 
> Relying on third party sites also opens up more attack vectors
> for injecting malicious packages into sites using automated installs.  
> A simple attack might just involve getting hold of an old now-unused
> homepage domain and placing malicious packages there.  Moreover,
> performing a Man-in-The-Middle (MITM) attack between an installation
> site and any of the download sites can inject malicious packages on the
> installation site.  As many homepages and download locations are using
> HTTP and not proper HTTPS, such attacks are not very hard to launch.
> Such MITM attacks can happen even for packages which never intended to
> host files externally as their homepages are contacted by installers
> anyway.
> 
> There is currently no way for package maintainers to avoid 3rd party
> crawling, other than removing all homepage/download url metadata
> for all historic releases.  While a script [3]_ has been written to 
> perform this action, it is not a good general solution because it removes
> semantic information like the "homepage" specification from PYPI packages.
> 
> 
> Solution
> -----------
> 
> The proposed solution consists of the following implementation and
> communication steps:
> 
> - determine which packages have release files only on PYPI (group A)
>   and which have externally hosted release files (group B).
> 
> - Prepare PYPI implementation to allow a per-project "hosting mode",
>   effectively enabling or disabling external crawling.  When enabled 
>   nothing changes from the current situation of producing ``rel=download`` 
>   and ``rel=homepage`` attributed links on ``simple/`` pages, 
>   causing installers to crawl those sites.  
>   When disabled, the attributions of links will change 
>   to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
>   avoid crawling 3rd party sites.  Retaining the meta-information allows
>   tools to still make use of the semantic information.

Please start using versioned APIs for these things. The
old style index should still be available under some
URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/

> - send mail to maintainers of A that their project is going to be 
>   automatically configured to "disable crawling" in one week
>   and encourage them to set this mode earlier to help all of 
>   their users.

One week ? That's a somewhat unrealistic timeframe.

I'm also missing some real-life tests to see what the effects
are on actual users, e.g. set up the new index using a
URL /simple-v2/ and let users play with it for a month
before making /simple/ == /simple-v2/.

> - send mail to maintainers of B that their package hosting mode 
>   is "crawling enabled", and list the sites which currently are crawled,
>   and suggest that they re-host their packages directly on PYPI and 
>   then switch the hosting-mode to "disable crawling".  Provide instructions
>   and at best tools to help with this "re-uploading" process.

That email should clearly state the PyPI terms to not
cause surprises among the maintainers.

I'd wait with this step until we've sorted out the PyPI terms
issues on the python-legal list, to not cause an uproar
from people who get to read the terms for the first time ;-)

> In addition, maintainers of installation tools are asked to release
> two updates.  The first one shall provide clear warnings if external
> crawling needs to happen, for which projects and URLS exactly 
> this happens, and that in the future crawling will be disabled by default.  
> The next update shall change the default to disallow crawling and allow 
> crawling only with an explicit option like ``--crawl-externals`` and 
> another option allowing to limit which hosts are allowed to be crawled
> at all.

AFAIK, both already exist in easy_install. Not sure about pip.
They are not enabled by default, though.

> Hosting-Mode state transitions
> ----------------------------------
> 
> 1. At the outset, we set hosting-mode to "notset" for all packages.
>    This will not change any link served via the simple index and thus
>    no bad effects are expected.  Early adopters and testers may now
>    change the mode to either "crawl" or "nocrawl" to help with
>    streamlining issues in the PYPI implementation.
> 
> 2. When maintainers of B packages are mailed their mode is directly
>    set to "crawl".
> 
> 3. When maintainers of A are mailed we leave the mode at "notset" to allow
>    people to change it to "nocrawl" themselves or to set it to "crawl" 
>    if they think they are wrongly in the "A" group.  After a week 
>    all "notset" modes are set to "nocrawl".
> 
> A week after the mailings all packages will be in "crawl" or "nocrawl"
> hosting mode.  It is then a matter of good tools and reaching out to
> maintainers of B packages to increase the A/B ratio.
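
Read literally, the three steps quoted above amount to a small state
machine.  A minimal Python sketch, with hypothetical function names and
an in-memory dict standing in for whatever storage PyPI would actually
use:

```python
NOTSET, CRAWL, NOCRAWL = "notset", "crawl", "nocrawl"


def initial_modes(projects):
    """Step 1: every project starts as 'notset'; served links are unchanged."""
    return {name: NOTSET for name in projects}


def mail_group_b(modes, group_b):
    """Step 2: projects with externally hosted files are set to 'crawl'."""
    for name in group_b:
        modes[name] = CRAWL


def finish_grace_period(modes):
    """Step 3: once the grace period ends, leftover 'notset' becomes 'nocrawl'."""
    for name, mode in modes.items():
        if mode == NOTSET:
            modes[name] = NOCRAWL
```

A maintainer in group A who sets the mode by hand before the grace
period ends keeps that choice, because only "notset" entries are
touched in the last step.
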
> 
> Open questions
> ----------------------
> 
> - Should the support tools for "rehosting" packages be implemented  on the
>   server side or on the client side?  Implementing it on the client
>   side probably is quicker to get right and less fatal in terms of failures.

Not sure what you mean here.

You are also completely leaving out the idea to only cache
distribution files on the PyPI CDN, without having to actually
upload them.

> - double-check if ``rel=newhomepage`` and ``rel=newdownload`` cause the 
>   desired behaviour of pip and easy_install (both the distribute and 
>   setuptools based one) to not crawl those pages.

Indeed :-)

Note that it will still be possible to add links to the
distribution files in the long description of the package.

Those links also show up on the /simple/ index page and
will then get used, regardless of whether they have a rel
attribute set or not.

> - are the "support tools" for re-hosting outside the scope of this PEP?

As with any PEP proposing an API change or a new API, it
has to provide a reference implementation.

The current distutils upload command is geared towards
uploading files at release time. While it is possible
to trick it into uploading existing distribution files,
it is not at all obvious how this is done.

> - Think some more about pip/easy_install "allow-hosts" mode etc.

Note that tools such as zc.buildout provide easy ways
of adding extra indexes and external URLs to scan for
distribution files.

I'm not sure how the above would fit such use cases,
i.e. if setuptools were to stop crawling external
links by default, this could mean that user-hosted
PyPI-style indexes stop working with newer releases.

Here's an example list of indexes used in Plone 4.2:

# Add additional egg download sources here. dist.plone.org contains archives
# of Plone packages.
find-links =
    http://dist.plone.org
    http://download.zope.org/ppix/
    http://download.zope.org/distribution/
    http://effbot.org/downloads
    http://dist.plone.org/release/4.2

None of these seem to use the rel attribute feature, so those
will likely continue to work fine.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 12 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From mal at egenix.com  Tue Mar 12 17:19:34 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 17:19:34 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
Message-ID: <513F5596.5090302@egenix.com>

On 12.03.2013 16:42, Jacob Kaplan-Moss wrote:
> On Tue, Mar 12, 2013 at 10:38 AM, PJ Eby <pje at telecommunity.com> wrote:
>> I'll ask it again: why should *thousands* of projects be censored or
>> made to change their release processes, because *you* can't be
>> bothered to cache the distributions of the projects you depend on?
> 
> Because externally-hosted files are a security risk, one that most
> users don't realize exists.
> 
> We can either fix this problem now, or we can wait until someone is
> compromised using PyPI as a vector.

We can fix this problem, yes, but we need to do this right and
try not to break things.

I don't see the need to rush this, just to address some perceived
high risk. Files hosted on PyPI are just as risky to use as files
on any other server.

The only way to minimize the risk is to download all the packages
you need and review all of them, again each time a new release
is published. If you then point your installers only to the repository
where you keep your reviewed files, then you can feel safer.

In reality, this doesn't happen, though, so a lot of the stuff
we're talking about here is security theater, no matter how
much crypto/signing/hashing/hosting/CDN we throw at it :-)

So let's do this carefully and find a good solution before
jumping to conclusions.

-- 
Marc-Andre Lemburg
eGenix.com


From holger at merlinux.eu  Tue Mar 12 17:20:28 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 16:20:28 +0000
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <CADiSq7fZPNuuUD-J7pu=e91Yi_OHzc5UFTz68jOibhE+PbMObg@mail.gmail.com>
References: <20130312113817.GA9677@merlinux.eu>
	<CADiSq7fZPNuuUD-J7pu=e91Yi_OHzc5UFTz68jOibhE+PbMObg@mail.gmail.com>
Message-ID: <20130312162028.GE9677@merlinux.eu>

On Wed, Mar 13, 2013 at 01:19 +1000, Nick Coghlan wrote:
> That looks pretty good to me. My only comment is that qualifiers like "new"
> don't age well in an API. The explicit "nocrawlhomepage" and
> "nocrawldownload" might be a better choice.

Right, we might also consider dropping rel-attributing given that
you can indeed access release metadata via the xmlrpc or json API.

best,
holger

> Cheers,
> Nick.

From jacob at jacobian.org  Tue Mar 12 17:29:45 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Tue, 12 Mar 2013 11:29:45 -0500
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <513F5596.5090302@egenix.com>
References: <20130310150740.GE9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
Message-ID: <CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>

On Tue, Mar 12, 2013 at 11:19 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> So let's do this carefully and find a good solution before
> jumping to conclusions.

Completely agreed; rushing is a bad idea.

But so is not starting. What I'm seeing -- as a total outsider, a user
of these tools, not someone who creates them -- is that a bunch of
people (Holger, Donald, Richard, the pip maintainers, etc.) have the
beginnings of a solution ready to go *right now*, and I want to
capture that energy and enthusiasm before it evaporates.

This isn't an academic situation; I've seen companies decline to adopt
Python over this exact security issue. I can't share details in
writing but ask me at PyCon and I can tell you some stories.
Externally-hosted packages are a security risk, full stop.

There's likely an even better solution involving strong cryptography
and such, but there's also an incremental improvement on the table
right now. Nobody's suggesting that we do this hastily or all at once,
but there *is* a proposal to get the process started right now. Why
shouldn't we get going while there's momentum?

Jacob

From holger at merlinux.eu  Tue Mar 12 17:33:39 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 16:33:39 +0000
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <CALeMXf6M+5pSmFq+Krqe1sc-p_z-tADhzaEmcdd5RwVw5VXc-w@mail.gmail.com>
References: <20130312113817.GA9677@merlinux.eu>
	<CALeMXf6M+5pSmFq+Krqe1sc-p_z-tADhzaEmcdd5RwVw5VXc-w@mail.gmail.com>
Message-ID: <20130312163339.GF9677@merlinux.eu>

On Tue, Mar 12, 2013 at 11:53 -0400, PJ Eby wrote:
> On Tue, Mar 12, 2013 at 7:38 AM, holger krekel <holger at merlinux.eu> wrote:
> > In addition, maintainers of installation tools are asked to release
> > two updates.  The first one shall provide clear warnings if external
> > crawling needs to happen,
> 
> A clarification here: "needs to happen" is not well-specified.  An
> installer tasked with finding the latest or best-matching version of a
> package must currently *always* crawl.  So the warning would be
> always.

Not after the initial automatic PYPI transition. For the 90% of the 
packages you wouldn't see the warning then.

> The strategy I originally chose for making this change in easy_install
> is to warn once at the beginning that --allow-hosts has not been set,
> and thus packages might be downloaded from anywhere on the internet.

From a UI perspective i'd like to see a summary of actually consulted but
non-specified websites (including whether it was http or https) at the
very end of an installer's output.  By "non-specified" i mean sites
that weren't specified as an indexserver or allow-host.
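
A minimal Python sketch of such an end-of-run summary follows.  The
class and method names are hypothetical and not part of pip or
easy_install; it simply records each contacted URL and reports the
scheme and host of those that were not configured.

```python
from urllib.parse import urlparse


class ConsultedSites:
    """Collect hosts an installer contacted that were not explicitly specified."""

    def __init__(self, specified_hosts):
        # Hosts given as index server or allow-host are never reported.
        self.specified = {h.lower() for h in specified_hosts}
        self.seen = []

    def record(self, url):
        """Remember the (scheme, host) pair of a URL unless the host was specified."""
        parts = urlparse(url)
        host = (parts.hostname or "").lower()
        entry = (parts.scheme, host)
        if host and host not in self.specified and entry not in self.seen:
            self.seen.append(entry)

    def summary(self):
        """Return one 'scheme://host' line per non-specified site, in first-seen order."""
        return ["%s://%s" % entry for entry in self.seen]
```

Keeping the scheme in each entry lets the summary show whether a site
was reached over http or https.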

> I've since become uncertain that this change is actually workable in
> the short term, since until most of the packages are actually moved
> onto PyPI, a lot of installs will fail if somebody changes their
> configuration to be more secure.  So I'm thinking the warning needs to
> be deferred until at least the more popular packages have moved to
> PyPI.

I think it's fine to wait until after the initial "hosting-mode" transition.

> > Now, if there is some agreement, i can submit this PEP officially tomorrow,
> > and given agreement/refinements from the Pycon folks and the likes of
> > Richard, we may be able to get going very shortly after Pycon.
> 
> I'd like to suggest that the PEP should be explicit that no other
> changes to the /simple generation algorithm are being made, just the
> removal or alteration of rel="" attributes.  i.e., it will still be
> possible -- at least in the near term -- for projects to include
> explicit download links to files made available elsewhere.  Changing
> that situation is more controversial and will require wider community
> participation than has occurred to date.

I kind of agree.  To transition forward, we should leave out the
question of further modifying the "simple/" pages at the moment.
Mentioning that this means you can put "http://PKGNAME-VER.tar.gz" in
your PKGNAME long_description or download_url metadata makes sense.
For that, the installers will give warnings, however, and eventually 
change defaults according to the PEP draft.

> It might also be good to suggest that authors of PyPI clones plan
> their own phase-out of rel="" attributes.

Most alternative servers i've seen don't use the "rel" attribution
but it's good to mention it.

best,
holger


From mal at egenix.com  Tue Mar 12 17:41:31 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 17:41:31 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
Message-ID: <513F5ABB.9030006@egenix.com>

On 12.03.2013 17:29, Jacob Kaplan-Moss wrote:
> On Tue, Mar 12, 2013 at 11:19 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> So let's do this carefully and find a good solution before
>> jumping to conclusions.
> 
> Completely agreed; rushing is a bad idea.
> 
> But so is not starting. What I'm seeing -- as a total outsider, a user
> of these tools, not someone who creates them -- is that a bunch of
> people (Holger, Donald, Richard, the pip maintainers, etc.) have the
> beginnings of a solution ready to go *right now*, and I want to
> capture that energy and enthusiasm before it evaporates.
> 
> This isn't an academic situation; I've seen companies decline to adopt
> Python over this exact security issue. I can't share details in
> writing but ask me at PyCon and I can tell you some stories.
> Externally-hosted packages are a security risk, full stop.
> 
> There's likely an even better solution involving strong cryptography
> and such, but there's also an incremental improvement on the table
> right now. Nobody's suggesting that we do this hastily or all at once,
> but there *is* a proposal to get the process started right now. Why
> shouldn't we get going while there's momentum?

Sure; I'm just saying that we need to test drive the proposal
before actually adopting it.

I'm also trying to get some of the more radical unneeded changes
reconsidered. We don't need to break things just because we can -
let's leave that to our kids ;-)

Holger has already addressed much of this in his V2 proposal
and apart from the time frame and some details, it looks good.

Meanwhile, I've been playing around with the earlier proposal
I put forward:

http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal

to secure external links and found several issues while
implementing it. It's easy to draw up a design, but you
only get down to the problems when actually trying to
implement it.

-- 
Marc-Andre Lemburg
eGenix.com


From carl at oddbird.net  Tue Mar 12 17:48:08 2013
From: carl at oddbird.net (Carl Meyer)
Date: Tue, 12 Mar 2013 10:48:08 -0600
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <20130312113817.GA9677@merlinux.eu>
References: <20130312113817.GA9677@merlinux.eu>
Message-ID: <513F5C48.3070602@oddbird.net>

Hi Holger,

I am confused about the discrepancy between the title of this pre-PEP
("transition to release file hosting on PyPI") and the contents of the
PEP, which describe a transition to not crawling _HTML pages_ on
external sites looking for distribution download links. These are not
the same thing at all.

Current installer tools will only crawl external HTML pages if they are
rel="download" or rel="homepage", but they will use any link they find
in the simple index (regardless of rel attr) if the target of the link
appears to be a distribution file (as determined by filename
pattern-matching or #egg fragment).
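
To make the distinction concrete, here is a simplified Python sketch of
how a simple-index page's links fall into those categories.  The suffix
list, the category names and the function names are assumptions for
illustration; real installers apply more elaborate filename matching.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed set of archive suffixes; actual tools recognise more patterns.
DIST_SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz", ".zip", ".egg")
CRAWL_RELS = {"homepage", "download"}


def looks_like_distribution(href):
    """True if the link target looks like a release file or carries an #egg fragment."""
    parts = urlparse(href)
    return (parts.path.lower().endswith(DIST_SUFFIXES)
            or parts.fragment.startswith("egg="))


class SimplePageLinks(HTMLParser):
    """Collect (href, kind) pairs from the anchors of a simple-index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        rels = set((attrs.get("rel") or "").lower().split())
        if looks_like_distribution(href):
            kind = "direct"    # used as a download candidate, whatever the rel
        elif rels & CRAWL_RELS:
            kind = "crawl"     # HTML page that gets spidered for more links
        else:
            kind = "ignored"
        self.links.append((href, kind))


def classify_links(html):
    """Parse a page and return its links with their classification."""
    parser = SimplePageLinks()
    parser.feed(html)
    parser.close()
    return parser.links
```

Renaming the rel values only empties the "crawl" category; anything
classified as "direct" is still picked up no matter where it is hosted.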

At the end of the process you describe, if all packages migrate to
"nocrawl", the rel-link HTML spidering will no longer happen. This is a
good first step: it will speed up installation somewhat, and reduce the
frustration of some package owners when installers find files linked
from their project homepage that they never intended for automated
installation. But installers will still find and download release
packages that are not hosted on PyPI, if those package files are linked
directly in the simple index. This is still surprising behavior to many
new Python users, and still carries the security and reliability
concerns that this PEP claims to address.

I'm honestly not sure whether the title or the content more accurately
reflects the intent of this PEP; depending which it is, I suggest one of
the following:

1) Add to the PEP a description of a further step in the migration
process, which actually does transition away from automated installation
of non-PyPI-hosted release files (as the default behavior of
installation tools); or

2) Change the title of the PEP to something like "Transitioning away
from non-PyPI HTML crawling" and add a paragraph to the PEP clarifying
that this PEP does not address the issue of actual release files hosted
off-PyPI.

Carl

From holger at merlinux.eu  Tue Mar 12 18:05:08 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 17:05:08 +0000
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <513F5282.3010206@egenix.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
Message-ID: <20130312170508.GG9677@merlinux.eu>

Hi Marc-Andre, all,

On Tue, Mar 12, 2013 at 17:06 +0100, M.-A. Lemburg wrote:
> On 12.03.2013 12:38, holger krekel wrote:
> > Hi all,
> > 
> > below is the new PEP pre-submit version (V2) which incorporates the
> > latest suggestions and aims at a rapidly deployable solution.  Thanks in
> > particular to Philip, Donald and Marc-Andre.  I also added a few notes
> > on how installers should behave with respect to non-PYPI crawling.  
> > 
> > I think a PEP like doc is warranted and that we should not silently
> > change things without proper communication to maintainers and pre-planning
> > the implementation/change process.  Arguably, the changes are more
> > invasive than "oh, let's just do a http->https redirect" which didn't
> > work too well either.
> > 
> > Now, if there is some agreement, i can submit this PEP officially tomorrow,
> > and given agreement/refinements from the Pycon folks and the likes of
> > Richard, we may be able to get going very shortly after Pycon.
> > 
> > cheers,
> > holger
> > 
> > 
> > PEP-draft: transitioning to release-file hosting on PYPI
> > ====================================================================
> > 
> > Status
> > -----------
> > 
> > PRE-SUBMIT-v2
> > 
> > Abstract
> > ------------
> > 
> > This PEP proposes a backward-compatible transition process to speed up,
> > simplify and robustify installing from the pypi.python.org (PYPI)
> > package index.  The initial transition will put most packages on PYPI
> > automatically in a configuration mode which will prevent client-side
> > crawling from installers.  To ease automatic transition and minimize
> > client-side friction, **no changes to distutils or installation tools** are
> > required.  Instead, the transition is implemented by modifying PYPI to
> > serve links from ``simple/`` pages in a configurable way, preventing or
> > allowing crawling of non-PYPI sites for detecting release files.
> > Maintainers of all PYPI packages will be notified ahead of those
> > changes.
> > 
> > Maintainers of packages which currently are hosted on non-PYPI sites
> > shall receive instructions and tools to ease "re-hosting" of their
> > historic and future package release files.  The implementation of such
> > tools is NOT required for implementing the initial automatic transition.
> > 
> > Installation tools like pip and easy_install shall warn about crawling
> > non-PYPI sites and later default to disallow it and only allow it with
> > an explicit option.
> > 
> > 
> > History and motivations for external hosting
> > ------------------------------------------------
> > 
> > When PYPI went online, it offered release registration but had no
> > facility to host release files itself.  When hosting was added, no
> > automated downloading tool existed yet.  When Philip Eby implemented
> > automated downloading (through setuptools), he made the choice 
> > to allow people to use download hosts of their choice.  This was
> > implemented by the PYPI ``simple/`` index containing links of type
> > ``rel=homepage`` or ``rel=download`` which are crawled by installation
> > tools to discover package links.  As of March 2013, a substantial part 
> > of packages (estimated to about 10%) make use of this mechanism to host
> > files on github, bitbucket, sourceforge or own hosting sites like 
> > ``mercurial.selenic.com``, to just name a few.
> > 
> > There are many reasons [2]_ why people choose to use external hosting,
> > to cite just a few:
> > 
> > - release processes and scripts have been developed already and 
> >   upload to external sites 
> > 
> > - it takes too long to upload large files from some places in the world
> > 
> > - export restrictions e.g. for crypto-related software
> > 
> > - company policies which prescribe offering open source packages through
> >   own sites
> > 
> > - problems with integrating uploading to PYPI into one's release process
> >   (because of release policies)
> > 
> > - perceived bad reliability of PYPI
> > 
> > - missing knowledge that you can upload files
> > 
> > Irrespective of the present-day validity of these reasons, there clearly
> > is a history why people choose to host files externally and it even was 
> > for some time the only way you could do things.  
> > 
> > 
> > Problem
> > ---------------
> > 
> > **Today, python package installers (pip and easy_install) often need to
> > query non-PYPI sites even if there are no externally hosted files**.
> > Apart from querying pypi.python.org's simple index pages, also all
> > homepages and download pages ever specified with any release of a
> > package are crawled by an installer.  The need for installers to
> > crawl 3rd party sites slows down installation and makes for a brittle
> > unreliable installation process.   Those sites and packages also don't 
> > take part in the :pep:`381` mirroring infrastructure, further decreasing
> > reliability and speed of automated installation processes around the world. 
> > 
> > Roughly 90% of packages are hosted directly on pypi.python.org [1]_.
> > Even for them installers still need to crawl the homepage(s) of a
> > package.  Many package uploaders are particularly not aware that
> > specifying the "homepage" in their release process will slow down 
> > the installation process for all its users.
> > 
> > Relying on third party sites also opens up more attack vectors
> > for injecting malicious packages into sites using automated installs.  
> > A simple attack might just involve getting hold of an old now-unused
> > homepage domain and placing malicious packages there.  Moreover,
> > performing a Man-in-The-Middle (MITM) attack between an installation
> > site and any of the download sites can inject malicious packages on the
> > installation site.  As many homepages and download locations are using
> > HTTP and not proper HTTPS, such attacks are not very hard to launch.
> > Such MITM attacks can happen even for packages which never intended to
> > host files externally as their homepages are contacted by installers
> > anyway.
> > 
> > There is currently no way for package maintainers to avoid 3rd party
> > crawling, other than removing all homepage/download url metadata
> > for all historic releases.  While a script [3]_ has been written to 
> > perform this action, it is not a good general solution because it removes
> > semantic information like the "homepage" specification from PYPI packages.
> > 
> > 
> > Solution
> > -----------
> > 
> > The proposed solution consists of the following implementation and
> > communication steps:
> > 
> > - determine which packages have release files only on PYPI (group A)
> >   and which have externally hosted release files (group B).
> > 
> > - Prepare PYPI implementation to allow a per-project "hosting mode",
> >   effectively enabling or disabling external crawling.  When enabled 
> >   nothing changes from the current situation of producing ``rel=download`` 
> >   and ``rel=homepage`` attributed links on ``simple/`` pages, 
> >   causing installers to crawl those sites.  
> >   When disabled, the attributions of links will change 
> >   to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
> >   avoid crawling 3rd party sites.  Retaining the meta-information allows
> >   tools to still make use of the semantic information.
> 
> Please start using versioned APIs for these things. The
> old style index should still be available under some
> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/

Not sure it is necessary in this case.  I would think it makes
the implementation harder and it would probably break PEP381 (mirroring
infrastructure) as well.
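
To show how small the server-side change is, here is a minimal sketch of the link rendering under the per-project hosting mode from the quoted step. The function name and signature are made up for illustration; this is not the actual PYPI code.

```python
from html import escape

# rel values that installers treat as "crawl this page".
CRAWL_RELS = ("homepage", "download")


def render_link(url, rel, hosting_mode):
    """Render one simple/ index link for a project.

    hosting_mode is one of "notset", "crawl" or "nocrawl".  Only
    "nocrawl" changes the output: the rel value is renamed so that
    installers no longer follow the link, while other tools can still
    read the semantic information.
    """
    if hosting_mode == "nocrawl" and rel in CRAWL_RELS:
        rel = "new" + rel
    return '<a href="%s" rel="%s">%s</a>' % (
        escape(url, quote=True),
        rel,
        escape(url),
    )
```

The same /simple/ URL keeps being served, which is why i think a versioned index is not needed for this step.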

> > - send mail to maintainers of A that their project is going to be 
> >   automatically configured to "disable crawling" in one week
> >   and encourage them to set this mode earlier to help all of 
> >   their users.
> 
> One week ? That's a somewhat unrealistic timeframe.

Assuming we get our initial analysis correct, it's not a super-critical
change.  Also very easy to switch it back on a per-project basis.

I suggest we refine and repeat Donald's script from multiple places in
the world and merge the results to get a consolidated set of
"needs-no-crawling" packages.  If in doubt, we put a project into the
"needs-crawl" category.  Therefore, we can assume our set of
"needs-no-crawling" packages to be safe enough to perform the switching.
The one week is just there as an additional safety net, to give the
authors a chance to act if they think we got it wrong.  I don't think
we will end up with many problems, and they will be localized to very few
packages.  Extending the time frame will not significantly
reduce this number.  The main problem will be mails not reaching
a human, i suspect.
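
The merging rule i have in mind could look roughly like this. It is a sketch only: the input format (one dict per run, mapping project name to whether that run found externally hosted files) is an assumption and not what Donald's script produces today.

```python
def consolidate(runs):
    """Merge crawl surveys taken from several places in the world.

    Each run maps a project name to True if that run found externally
    hosted release files and False if everything was on PYPI.  A project
    is "needs-no-crawling" only if every run saw it and every run said
    False; anything missing or disputed lands in "needs-crawl".
    """
    projects = set().union(*runs)
    needs_no_crawling = {
        name
        for name in projects
        if all(run.get(name) is False for run in runs)
    }
    needs_crawl = projects - needs_no_crawling
    return needs_no_crawling, needs_crawl
```

A project that one run could not reach at all is therefore treated as "in doubt" and keeps its crawling.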
 
> I'm also missing some real-life tests to see what the effects
> are on actual users, e.g. set up the new index using a
> URL /simple-v2/ and let users play with it for a month
> before making /simple/ == /simple-v2/.

Preparation time is specified in the PEP by bringing the PYPI changes
online and asking _some_ people to set their hosting-mode.  As of now,
the changes to PYPI are fairly trivial.

> > - send mail to maintainers of B that their package hosting mode 
> >   is "crawling enabled", and list the sites which currently are crawled,
> >   and suggest that they re-host their packages directly on PYPI and 
> >   then switch the hosting-mode "disable crawling".  Provide instructions 
> >   and at best tools to help with this "re-uploading" process.
> 
> That email should clearly state the PyPI terms to not
> cause surprises among the maintainers.

Can't the PYPI TOS be referenced from that mail, along with
an address they can write to in case of questions?

> I'd wait with this step until we've sorted out the PyPI terms
> issues on the python-legal list, to not cause an uproar
> from people who get to read the terms for the first time ;-)

We could postpone the B packages maintainers mailing if there 
is a legal need.  We can still migrate "A" packages already.

> > In addition, maintainers of installation tools are asked to release
> > two updates.  The first one shall provide clear warnings if external
> > crawling needs to happen, for which projects and URLS exactly 
> > this happens, and that in the future crawling will be disabled by default.  
> > The next update shall change the default to disallow crawling and allow 
> > crawling only with an explicit option like ``--crawl-externals`` and 
> > another option allowing to limit which hosts are allowed to be crawled
> > at all.
> 
> AFAIK, both already exist in easy_install. Not sure about pip.
> They are not enabled by default, though.

Right, i didn't investigate the current cmdline options in detail.
To keep things simple i'd like to just specify the meta-level of (a)
giving warnings and (b) changing the default.

> > Hosting-Mode state transitions
> > ----------------------------------
> > 
> > 1. At the outset, we set hosting-mode to "notset" for all packages.
> >    This will not change any link served via the simple index and thus
> >    no bad effects are expected.  Early adopters and testers may now
> >    change the mode to either "crawl" or "nocrawl" to help with
> >    streamlining issues in the PYPI implementation.
> > 
> > 2. When maintainers of B packages are mailed their mode is directly
> >    set to "crawl".
> > 
> > 3. When maintainers of A are mailed we leave the mode at "notset" to allow
> >    people to change it to "nocrawl" themselves or to set it to "crawl" 
> >    if they think they are wrongly in the "A" group.  After a week 
> >    all "notset" modes are set to "nocrawl".
> > 
> > A week after the mailings all packages will be in "crawl" or "nocrawl"
> > hosting mode.  It is then a matter of good tools and reaching out to
> > maintainers of B packages to increase the A/B ratio.
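
The three transitions above can be sketched as follows. This is illustration only: the function names are made up, and keeping an early adopter's explicit choice when the B mails go out is an assumption the draft does not spell out.

```python
def initial_modes(projects):
    """Step 1: every project starts as "notset"."""
    return {name: "notset" for name in projects}


def mail_group_b(modes, group_b):
    """Step 2: B projects are set to "crawl" when their mail goes out.

    Assumption: a mode already chosen by the maintainer is left alone.
    """
    updated = dict(modes)
    for name in group_b:
        if updated.get(name) == "notset":
            updated[name] = "crawl"
    return updated


def close_grace_period(modes):
    """Step 3: a week after the A mails, "notset" becomes "nocrawl"."""
    return {
        name: ("nocrawl" if mode == "notset" else mode)
        for name, mode in modes.items()
    }
```

After close_grace_period() no project is left in "notset", which matches the statement that all packages end up in "crawl" or "nocrawl".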
> > 
> > Open questions
> > ----------------------
> > 
> > - Should the support tools for "rehosting" packages be implemented  on the
> >   server side or on the client side?  Implementing it on the client
> >   side probably is quicker to get right and less fatal in terms of failures.
> 
> Not sure what you mean here.

"Rehosting" tools help to transfer release files to PYPI which
are currently served on non-PYPI sites through the "crawling" algo.  
This could be done via a server-side interface or via client-side tools.  
I prefer the latter because i'd like to keep changes on the PYPI 
server minimal.  I am sure Richard agrees :)

> You are also completely leaving out the idea to only cache
> distribution files on the PyPI CDN, without having to actually
> upload them.

Not sure what you mean.  FWIW, how PYPI hosts packages itself is completely
left out of this PEP on purpose.  PYPI might evolve to offer packages on a CDN
or improve the existing PEP381 infrastructure or introduce simple
"rsync-ability" (like CPAN).  IOW, this "no crawling" PEP is orthogonal 
to this question.

> > - double-check if ``rel=newhomepage`` and ``rel=newdownload`` cause the 
> >   desired behaviour of pip and easy_install (both the distribute and 
> >   setuptools based one) to not crawl those pages.
> 
> Indeed :-)

We might just avoid rel-attributions and point to the XMLRPC/JSON API - 
i am sure this works with easy_install and pip :)  

> Note that it will still be possible to add links to the
> distribution files in the long description of the package.

> Those links also show up on the /simple/ index page and
> will then get used, regardless of whether they have a rel
> attribute set or not.

Yes, this should be noted.

> > - are the "support tools" for re-hosting outside the scope of this PEP?
> 
> As with any PEP proposing an API change or a new API, it
> has to provide a reference implementation.

The re-hosting tools are NOT required for the "transition" part of
the PEP.  The PYPI implementation changes are required, of course.
Donald offered to help with a PYPI PR and the PEP tries to minimize
the necessary changes.

> The current distutils upload command is geared towards
> uploading files at release time. While it is possible
> to trick it into uploading existing distribution files,
> it is not at all obvious how this is done.

Right, but i've written the code for that in another project.  Unless
someone else (probably Donald) beats me to it, i can try to help with
writing such a re-hosting tool.

> > - Think some more about pip/easy_install "allow-hosts" mode etc.
> 
> Note that tools such as zc.buildout provide easy ways
> of adding extra indexes and external URLs to scan for
> distribution files.
> 
> I'm not sure how the above would fit such use cases,
> i.e. if setuptools were to stop crawling external
> links per default, this could mean that user hosted
> PyPI-style indexes stop working with newer releases.
> 
> Here's an example list of indexes used in Plone 4.2:
> 
> # Add additional egg download sources here. dist.plone.org contains archives
> # of Plone packages.
> find-links =
>     http://dist.plone.org
>     http://download.zope.org/ppix/
>     http://download.zope.org/distribution/
>     http://effbot.org/downloads
>     http://dist.plone.org/release/4.2
> 
> None of these seem to use the rel attribute feature, so those
> will likely continue to work fine.

I am not surprised.  I don't know of alternative PYPI implementations
that actually implement "rel" attribution.  Most of them have the purpose
of controlling which packages are installed in company environments and
thus have no need to implement this crawling mechanism but rather always
host files in their database.

cheers,
holger



From holger at merlinux.eu  Tue Mar 12 18:17:40 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 17:17:40 +0000
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <513F5C48.3070602@oddbird.net>
References: <20130312113817.GA9677@merlinux.eu> <513F5C48.3070602@oddbird.net>
Message-ID: <20130312171740.GH9677@merlinux.eu>

Hi Carl,

On Tue, Mar 12, 2013 at 10:48 -0600, Carl Meyer wrote:
> Hi Holger,
> 
> I am confused about the discrepancy between the title of this pre-PEP
> ("transition to release file hosting on PyPI") and the contents of the
> PEP, which describe a transition to not crawling _HTML pages_ on
> external sites looking for distribution download links. These are not
> the same thing at all.

I agree the title is not quite right at the moment.

> Current installer tools will only crawl external HTML pages if they are
> rel="download" or rel="homepage", but they will use any link they find
> in the simple index (regardless of rel attr) if the target of the link
> appears to be a distribution file (as determined by filename
> pattern-matching or #egg fragment).

Right.

> At the end of the process you describe, if all packages migrate to
> "nocrawl", the rel-link HTML spidering will no longer happen. This is a
> good first step: it will speed up installation somewhat, and reduce the
> frustration of some package owners when installers find files linked
> from their project homepage that they never intended for automated
> installation. But installers will still find and download release
> packages that are not hosted on PyPI, if those package files are linked
> directly in the simple index. This is still surprising behavior to many
> new Python users, and still carries the security and reliability
> concerns that this PEP claims to address.

Yes, and here the installers should move to give clear warnings
and change defaults.

> I'm honestly not sure whether the title or the content more accurately
> reflects the intent of this PEP; depending which it is, I suggest one of
> the following:
> 
> 1) Add to the PEP a description of a further step in the migration
> process, which actually does transition away from automated installation
> of non-PyPI-hosted release files (as the default behavior of
> installation tools); or

This makes sense to me.  Do you feel like opening a pull request on

    https://bitbucket.org/hpk42/pep-pypi

to help refine this aspect?  I am also on IRC for co-ordination (also
about the title) as i intend to create the PEP submission for
python-ideas and maybe already the pep-editors (?!).  In any case, it
wouldn't mean the PEP's discussion is finalized, of course, and i'd
continue to post new versions here and ask for feedback.

cheers,
holger

> 2) Change the title of the PEP to something like "Transitioning away
> from non-PyPI HTML crawling" and add a paragraph to the PEP clarifying
> that this PEP does not address the issue of actual release files hosted
> off-PyPI.


> Carl
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
> 

From pje at telecommunity.com  Tue Mar 12 18:18:05 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 13:18:05 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
Message-ID: <CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>

On Tue, Mar 12, 2013 at 12:29 PM, Jacob Kaplan-Moss <jacob at jacobian.org> wrote:
> On Tue, Mar 12, 2013 at 11:19 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> So let's do this carefully and find a good solution before
>> jumping to conclusions.
>
> Completely agreed; rushing is a bad idea.
>
> But so is not starting. What I'm seeing (as a total outsider, a user
> of these tools, not someone who creates them) is that a bunch of
> people (Holger, Donald, Richard, the pip maintainers, etc.) have the
> beginnings of a solution ready to go *right now*, and I want to
> capture that energy and enthusiasm before it evaporates.
>
> This isn't an academic situation; I've seen companies decline to adopt
> Python over this exact security issue.

Nobody told them about how to configure a restricted, site-wide
default --allow-hosts setting?   (
http://peak.telecommunity.com/DevCenter/EasyInstall#restricting-downloads-with-allow-hosts
and http://docs.python.org/2/install/index.html#location-and-names-of-config-files
)

(FWIW, --allow-hosts was added in setuptools 0.6a6 -- *years* before
the distribute fork or the existence of pip, and pip offers the same
option.)

I've already agreed to change setuptools to default this option to
only allow downloads from the same host as its index URL, in a future
release (i.e. to default --allow-hosts to the host of the
--index-url option), and I support removing rel="" spidering
from PyPI (which will significantly mitigate the immediate speed and
security issues).  Heck, I've been the one who has repeatedly proposed
various ways of cutting back or removing rel="" attributes from the
/simple index.

The result of these two changes will actually have the same net effect
that people are asking for here: you'll only be able to download
stuff hosted on PyPI, unless you explicitly override the --allow-hosts
to get a wider range of packages.

Already today, when a URL is blocked by --allow-hosts, it's announced
as part of easy_install's output, so you can see exactly how much
wider you need to extend your trust for the download to succeed.
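
As a rough illustration of the proposed default (a sketch, not the setuptools source; the function names are made up here), host filtering with glob-style patterns amounts to something like this:

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def default_allow_hosts(index_url):
    """Proposed default: trust only the host serving the index."""
    return [urlparse(index_url).hostname]


def url_allowed(url, allow_hosts):
    """True if the URL's host matches any of the glob patterns."""
    host = urlparse(url).hostname or ""
    return any(fnmatchcase(host, pattern.lower()) for pattern in allow_hosts)


def filter_links(links, index_url, allow_hosts=None):
    """Split candidate links into (allowed, blocked) lists.

    Blocked URLs are returned rather than dropped so a tool can report
    them, letting the user see which hosts would have to be trusted.
    """
    patterns = allow_hosts or default_allow_hosts(index_url)
    allowed, blocked = [], []
    for url in links:
        (allowed if url_allowed(url, patterns) else blocked).append(url)
    return allowed, blocked
```

Passing allow_hosts=["*"] restores today's behaviour, so the choice stays with the user.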

The *only* thing I object to is removing the ability for people to
*choose* their own levels of trust.

And I have not yet seen an argument that justifies removing people's
ability to *choose* to be more inclusive in their downloads.

And I've put multiple compromise proposals out there to begin
mitigating the problem *now* (i.e. for non-updated versions of
setuptools), and every time, the objection is, "no, we need to ban it
all now, no discussion, no re-evaluation, no personal choice, everyone
must do as we say, no argument".

And I don't understand that, at all.

From holger at merlinux.eu  Tue Mar 12 18:22:26 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 17:22:26 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
References: <CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
Message-ID: <20130312172226.GI9677@merlinux.eu>

On Tue, Mar 12, 2013 at 13:18 -0400, PJ Eby wrote:
> On Tue, Mar 12, 2013 at 12:29 PM, Jacob Kaplan-Moss <jacob at jacobian.org> wrote:
> > On Tue, Mar 12, 2013 at 11:19 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> >> So let's do this carefully and find a good solution before
> >> jumping to conclusions.
> >
> > Completely agreed; rushing is a bad idea.
> >
> > But so is not starting. What I'm seeing (as a total outsider, a user
> > of these tools, not someone who creates them) is that a bunch of
> > people (Holger, Donald, Richard, the pip maintainers, etc.) have the
> > beginnings of a solution ready to go *right now*, and I want to
> > capture that energy and enthusiasm before it evaporates.
> >
> > This isn't an academic situation; I've seen companies decline to adopt
> > Python over this exact security issue.
> 
> Nobody told them about how to configure a restricted, site-wide
> default --allow-hosts setting?   (
> http://peak.telecommunity.com/DevCenter/EasyInstall#restricting-downloads-with-allow-hosts
> and http://docs.python.org/2/install/index.html#location-and-names-of-config-files
> )
> 
> (FWIW, --allow-hosts was added in setuptools 0.6a6 -- *years* before
> the distribute fork or the existence of pip, and pip offers the same
> option.)
> 
> I've already agreed to change setuptools to default this option to
> only allow downloads from the same host as its index URL, in a future
> release (i.e. to default --allow-hosts to the host of the
> --index-url option), and I support removing rel="" spidering
> from PyPI (which will significantly mitigate the immediate speed and
> security issues).  Heck, I've been the one who has repeatedly proposed
> various ways of cutting back or removing rel="" attributes from the
> /simple index.
> 
> The result of these two changes will actually have the same net effect
> that people are asking for here: you'll only be able to download
> stuff hosted on PyPI, unless you explicitly override the --allow-hosts
> to get a wider range of packages.
> 
> Already today, when a URL is blocked by --allow-hosts, it's announced
> as part of easy_install's output, so you can see exactly how much
> wider you need to extend your trust for the download to succeed.
> 
> The *only* thing I object to is removing the ability for people to
> *choose* their own levels of trust.
> 
> And I have not yet seen an argument that justifies removing people's
> ability to *choose* to be more inclusive in their downloads.
> 
> And I've put multiple compromise proposals out there to begin
> mitigating the problem *now* (i.e. for non-updated versions of
> setuptools), and every time, the objection is, "no, we need to ban it
> all now, no discussion, no re-evaluation, no personal choice, everyone
> must do as we say, no argument".

FWIW, the PEP draft in V2 doesn't take this approach and i don't
plan to introduce it in subsequent versions. IOW, i agree that
we should keep things backward-compatible in the sense that users
can choose to use non-default settings to get the current behaviour 
(which might make their installation process less reliable/secure, 
but that's their choice).

cheers,
holger
 
> And I don't understand that, at all.



From jnoller at gmail.com  Tue Mar 12 18:33:55 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 12 Mar 2013 13:33:55 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
Message-ID: <2564CB0F5D96477E86F655096A06941E@gmail.com>


> 
> And I've put multiple compromise proposals out there to begin
> mitigating the problem *now* (i.e. for non-updated versions of
> setuptools), and every time, the objection is, "no, we need to ban it
> all now, no discussion, no re-evaluation, no personal choice, everyone
> must do as we say, no argument".
> 
> And I don't understand that, at all.

There's not much to understand: external hosting of packages is *actively harmful*, period. End users of easy_install and pip *don't even realize* 99% of the time that these tools are following links off of PyPI and installing packages from random, probably insecure/non-HTTPS locations all over the internet. Once they realize it, they recoil in terror if they have any understanding of the implications.

Let me put this in different terms: of the packages using external hosting, can you prove to me that 100% of them aren't hosted on compromised machines serving malware, performing MITM attacks, etc.? The fact that the end user tools support this is a bug, but one from history. The fact that PyPI continues to support external links on simple/ is inexcusable given that we know that they are an attack vector.

A simple proof of concept on a popular package hosted off site, deployed during PyCon, would be terrible; it was bad enough that last year people were trying to MITM due to lack of SSL.

jesse

From pje at telecommunity.com  Tue Mar 12 18:54:25 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 13:54:25 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <2564CB0F5D96477E86F655096A06941E@gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<2564CB0F5D96477E86F655096A06941E@gmail.com>
Message-ID: <CALeMXf5v+YU06oLHgbd57oUKHmrJrq8inhHGMrRe+A-9Nq9z=Q@mail.gmail.com>

On Tue, Mar 12, 2013 at 1:33 PM, Jesse Noller <jnoller at gmail.com> wrote:
> There's not much to understand: external hosting of packages is *actively harmful*, period. End users of easy_install and pip *don't even realize* 99% of the time that these tools are following links off of PyPi and installing packages from random, probably insecure/non https locations all over the internet. Once they realize it they recoil in terror if they have any understanding of the implications.

This is a rationale for secure defaults for various options, like the
ones I outlined in the portions of my post that you *didn't* quote.

It's not a rationale for removing the options themselves.

From mal at egenix.com  Tue Mar 12 19:00:21 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 19:00:21 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <2564CB0F5D96477E86F655096A06941E@gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<2564CB0F5D96477E86F655096A06941E@gmail.com>
Message-ID: <513F6D35.2030707@egenix.com>

On 12.03.2013 18:33, Jesse Noller wrote:
> 
>>
>> And I've put multiple compromise proposals out there to begin
>> mitigating the problem *now* (i.e. for non-updated versions of
>> setuptools), and every time, the objection is, "no, we need to ban it
>> all now, no discussion, no re-evaluation, no personal choice, everyone
>> must do as we say, no argument".
>>
>> And I don't understand that, at all.
> 
> There's not much to understand: external hosting of packages is *actively harmful*, period. End users of easy_install and pip *don't even realize* 99% of the time that these tools are following links off of PyPi and installing packages from random, probably insecure/non https locations all over the internet. Once they realize it they recoil in terror if they have any understanding of the implications.
> 
> Let me put this in different terms: out of the packages using external hosting: can you prove to me that 100% of them aren't compromised machines serving malware, performing MITM attacks, etc? The fact that the end user tools support this is a bug, but one from history. The fact that PyPI continues to support external links on simple/ is inexcusable given that we know that they are an attack vector. 
> 
> A simple proof of concept on a popular package hosted off site deployed during PyCon would be terrible, it was bad enough that last year people were trying to MITM due to lack of SSL. 

Let's please not exaggerate all this. It's not like PyPI is
the only server out there implementing HTTPS, ye know ;-)

A single package uploaded on PyPI with os.system('rm -rf')
in its setup.py could easily ruin all this and no HTTPS in this
world would stop it from showing its ugly face.

The whole Python package eco-system works based on trust and
injecting fear into this system is not helpful, IMO.

People need to understand the possible issues, we need to make
things safer from both the client and the server side and
improve the tool chain. There's really nothing new here.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 12 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From mal at egenix.com  Tue Mar 12 19:07:28 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 19:07:28 +0100
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <20130312170508.GG9677@merlinux.eu>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
	<20130312170508.GG9677@merlinux.eu>
Message-ID: <513F6EE0.6080503@egenix.com>

Just a quick note (more later, if time permits)...

On 12.03.2013 18:05, holger krekel wrote:
> Hi Marc-Andre, all,
> 
>>> - Prepare PYPI implementation to allow a per-project "hosting mode",
>>>   effectively enabling or disabling external crawling.  When enabled 
>>>   nothing changes from the current situation of producing ``rel=download`` 
>>>   and ``rel=homepage`` attributed links on ``simple/`` pages, 
>>>   causing installers to crawl those sites.  
>>>   When disabled, the attributions of links will change 
>>>   to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
>>>   avoid crawling 3rd party sites.  Retaining the meta-information allows
>>>   tools to still make use of the semantic information.
>>
>> Please start using versioned APIs for these things. The
>> old style index should still be available under some
>> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/
> 
> Not sure it is necessary in this case.  I would think it makes
> the implementation harder and it would probably break PEP381 (mirroring
> infrastructure) as well.

Here's what I meant:

We publish the current implementation of the /simple/ index API
under a new URL /simple-v1/, so that people that want to use
the old API can continue to do so.

Then we set up a new /simple-v2/ index API with your proposed
change, perhaps even dropping the rel attribute altogether.

During testing, we'd then have:

/simple/    - same as /simple-v1/
/simple-v1/ - old API with rel attributes always set
/simple-v2/ - new API with your changes (rel attributes only
              set in some cases)

After a month or so of testing, we then switch this to:

/simple/    - same as /simple-v2/
/simple-v1/ - old API with rel attributes always set
/simple-v2/ - new API with your changes (rel attributes only
              set in some cases)
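
To make the effect of the rel attributes concrete, here is a minimal
sketch (not PyPI or installer code) of how a client might decide which
links on a simple/ page to crawl. The sample HTML, the class name and
the CRAWLED_RELS set are assumptions for illustration; only the rel
values themselves come from the proposal quoted above.

```python
from html.parser import HTMLParser

# rel values that, per the quoted proposal, cause installers to crawl.
CRAWLED_RELS = {"download", "homepage"}


class SimplePageLinks(HTMLParser):
    """Collect (href, rel) pairs from the anchors of a simple/ page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if "href" in attrs:
            self.links.append((attrs["href"], attrs.get("rel")))


def links_to_crawl(html):
    """Return the hrefs an installer would follow to third-party sites."""
    parser = SimplePageLinks()
    parser.feed(html)
    return [href for href, rel in parser.links if rel in CRAWLED_RELS]


# Hypothetical page: one release file, one crawled link, and one link
# carrying the proposed non-crawled rel value.
SAMPLE = """
<a href="../../packages/source/e/example/example-1.0.tar.gz">example-1.0.tar.gz</a>
<a href="http://example.com/" rel="homepage">home</a>
<a href="http://example.com/dl/" rel="newdownload">download</a>
"""

print(links_to_crawl(SAMPLE))
```

Under these assumptions only the rel=homepage link is followed; the
release file and the rel=newdownload link are left alone, while the
rel value is still there for tools that want the meta-information.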

-- 
Marc-Andre Lemburg
eGenix.com


From mal at egenix.com  Tue Mar 12 19:15:17 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 19:15:17 +0100
Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource
	sorting algorithm
Message-ID: <513F70B5.5030501@egenix.com>

I've run into a weird issue with easy_install, that I'm trying to solve:

If I place two files named

egenix_mxodbc_connect_client-2.0.2-py2.6.egg
egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip

into the same directory and let easy_install, running on Linux,
scan it, it picks the second file (the Windows one) as the best
match.

Is the algorithm used for determining the best match documented
somewhere ?

I've had a look at the implementation, but this left me rather
clueless.

I thought that setuptools would prefer the .egg file over
the prebuilt .zip file - binary files being easier to install
than "source" files.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


From donald at stufft.io  Tue Mar 12 19:17:55 2013
From: donald at stufft.io (Donald Stufft)
Date: Tue, 12 Mar 2013 14:17:55 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <513F5ABB.9030006@egenix.com>
References: <20130310150740.GE9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<513F5ABB.9030006@egenix.com>
Message-ID: <98847D8D-02D3-4C00-AB63-9B100C283D29@stufft.io>

On Mar 12, 2013, at 12:41 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 12.03.2013 17:29, Jacob Kaplan-Moss wrote:
>> On Tue, Mar 12, 2013 at 11:19 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> So let's do this carefully and find a good solution before
>>> jumping to conclusions.
>> 
>> Completely agreed; rushing is a bad idea.
>> 
>> But so is not starting. What I'm seeing, as a total outsider, a user
>> of these tools, not someone who creates them, is that a bunch of
>> people (Holger, Donald, Richard, the pip maintainers, etc.) have the
>> beginnings of a solution ready to go *right now*, and I want to
>> capture that energy and enthusiasm before it evaporates.
>> 
>> This isn't an academic situation; I've seen companies decline to adopt
>> Python over this exact security issue. I can't share details in
>> writing but ask me at PyCon and I can tell you some stories.
>> Externally-hosted packages are a security risk, full stop.
>> 
>> There's likely an even better solution involving strong cryptography
>> and such, but there's also an incremental improvement on the table
>> right now. Nobody's suggesting that we do this hastily or all at once,
>> but there *is* a proposal to get the process started right now. Why
>> shouldn't we get going while there's momentum?
> 
> Sure; I'm just saying that we need to test drive the proposal
> before actually adopting it.

fwiw https://restricted.crate.io/ is the simple index minus any external URLs and has existed for over a year. I use it full time, and others are doing the same.

> 
> I'm also trying to get some of the more radical unneeded changes
> reconsidered. We don't need to break things just because we can -
> let's leave that to our kids ;-)
> 
> Holger has already addressed much of this in his V2 proposal
> and apart from the time frame and some details, it looks good.
> 
> Meanwhile, I've been playing around with the earlier proposal
> I put forward:
> 
> http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal
> 
> to secure external links and found several issues while
> implementing it. It's easy to draw up a design, but you
> only get down to the problems when actually trying to
> implement it.
> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From carl at oddbird.net  Tue Mar 12 19:18:53 2013
From: carl at oddbird.net (Carl Meyer)
Date: Tue, 12 Mar 2013 12:18:53 -0600
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
Message-ID: <513F718D.4040307@oddbird.net>

It seems to me that there's a remarkable level of consensus developing
here (though it may not look like it), and a small set of remaining open
questions.

The consensus (as I see it):

- Migrate away from scraping external HTML pages, with package owners in
control of the migration but a deadline for a forced switch, as outlined
in Holger's PEP (with all appropriate caution and testing).

- In some way, migrate to a situation where the popular installer tools
install only release files from PyPI by default, but are capable of
installing from other locations if the user provides an option.

The open question is basically how to implement the latter portion. I
see two options proposed:

A) Leave external links in the PyPI simple index, but migrate the major
tools to not use external links by default (i.e. Philip's plan to make
allow-hosts=pypi the default in a future setuptools), with an option to
turn them back on.

or

B) Do a second PyPI migration, again with a per-package toggle and
package owners in control, to a "no external links in simple index" setting.

Consider for a moment how similar the end state here is with either A or
B. In either case, by default users install only from PyPI, but by
providing a special option they can install from some external source.
(In B, that special option would be something like --find-links with a
URL). In either case, we can continue to allow packages to register
themselves on PyPI, be found in searches, etc, without uploading release
files to PyPI if they prefer not to; they'll just have to provide
special installation instructions to their users in that case.

Here are some differences:

1) With B, we can provide a gentler migration for package owners, where
they are in control of when the switch happens. With A, regardless of
how it's done at some point some package owners are likely to start
getting "hey, i can't install your stuff anymore" reports from users,
and they can't control when that starts happening.

2) With B, all end users benefit from the new defaults, not only end
users who update to the latest and greatest tools.

3) With B (and probably some forms of A as well), end users clearly
state which external sources they would like to trust and install from,
rather than having a global "trust everything!" flag, which is less
secure and less sensible.

It seems to me that option B (a controlled, per-package, PyPI migration
to no-external-links in simple index) is a better migration path than A
(leaving it up to external tools), and the end result either way is very
similar.
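
As a rough illustration of the end state both options aim for, here is
a minimal sketch of a host filter that trusts only the index by default
and accepts extra hosts only when the user names them. This is not
setuptools or pip code: the function names, the default host tuple and
the glob-style matching are assumptions made for the example.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

# Assumed default: only the index host itself is trusted.
DEFAULT_ALLOWED_HOSTS = ("pypi.python.org",)


def is_allowed(url, extra_hosts=()):
    """Return True if url's host matches the default or a user-supplied pattern."""
    host = urlparse(url).hostname or ""
    patterns = DEFAULT_ALLOWED_HOSTS + tuple(extra_hosts)
    return any(fnmatch(host, pattern) for pattern in patterns)


def select_links(urls, extra_hosts=()):
    """Keep only the candidate links the user has agreed to install from."""
    return [url for url in urls if is_allowed(url, extra_hosts)]


# Hypothetical candidate links for one project.
candidates = [
    "https://pypi.python.org/packages/source/e/example/example-1.0.tar.gz",
    "http://downloads.example.com/example-1.1.tar.gz",
]

print(select_links(candidates))                                 # default: index only
print(select_links(candidates, extra_hosts=["*.example.com"]))  # explicit opt-in
```

The point of the sketch is difference 3 above: the user names the
external sources to trust, instead of flipping a single global switch.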

Carl

From robertc at robertcollins.net  Tue Mar 12 19:43:06 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Wed, 13 Mar 2013 07:43:06 +1300
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <513F718D.4040307@oddbird.net>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
Message-ID: <CAJ3HoZ1oDd9+keKeNGpntTput7FwaYAwAd9-Gv2A==vvo=z+zQ@mail.gmail.com>

On 13 March 2013 07:18, Carl Meyer <carl at oddbird.net> wrote:
> It seems to me that there's a remarkable level of consensus developing
> here (though it may not look like it), and a small set of remaining open
> questions.
>
> The consensus (as I see it):

I think that is a fair summary.

One thing I'd like to mention, that I don't recall seeing so far is
that PyPI is *really slow*. I don't mean 'the pypi web host is on a
bad link' - far from it.

pip, and I presume setuptools, spider to check dependencies and do the
external HTML scraping and so forth.

This takes an age when each new web host to talk to is a new DNS
lookup (say 0.3 seconds) + HTTP request (0.6 seconds) with possible
HTTPS setup in there too (up to 1.2 seconds). A project with dozens of
dependencies in its transitive dependency graph may take minutes
*just spidering*.

Now, if you read those figures and go 'zomg that's slow' - well yes,
light speed isn't that fast - and even then, while much of
round-the-globe traffic is at light speed, a considerable chunk of
time isn't.

Moving all releases to one HTTPS host (and ensuring persistent
connections are used for repeated index queries) [and then drop to
HTTP for release files so they can be squid cached] is the simplest
short term solution to this, and I'm *really* excited to see it being
tackled.
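
The figures above can be turned into a back-of-the-envelope estimate.
This is only a sketch under stated assumptions: requests are made one
after another, the HTTPS setup cost comes on top of the DNS lookup and
the request, and every externally hosted dependency lives on a
different host. The function names are made up for the example.

```python
# Rough per-step costs taken from the message above (seconds).
DNS_LOOKUP = 0.3
HTTP_REQUEST = 0.6
HTTPS_SETUP = 1.2  # quoted as an upper bound


def spidering_many_hosts(hosts):
    """Every host is new: DNS lookup, HTTPS setup and one request each."""
    return hosts * (DNS_LOOKUP + HTTPS_SETUP + HTTP_REQUEST)


def spidering_one_host(requests):
    """One DNS lookup and HTTPS setup, then requests on a persistent connection."""
    return DNS_LOOKUP + HTTPS_SETUP + requests * HTTP_REQUEST


# Compare both layouts for a few dependency-graph sizes.
for n in (12, 36, 60):
    print(n, round(spidering_many_hosts(n), 1), round(spidering_one_host(n), 1))
```

With these assumed numbers the per-host overhead dominates: a graph of
a few dozen externally hosted projects lands in the minute range, while
the same number of requests to a single host over one connection pays
the DNS and HTTPS setup cost only once.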

Longer term I'd love to see PyPI offer an API to return transitive
data, to avoid the spidering altogether.
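
No such API is specified in this thread, so the following is only a
sketch of what "transitive data" could mean: given the direct
dependencies of each project, the index computes the full closure once,
and the client gets it in a single response. The DIRECT_DEPS table and
the function name are hypothetical.

```python
from collections import deque

# Hypothetical direct-dependency data, as an index could store it per project.
DIRECT_DEPS = {
    "app": ["web", "db"],
    "web": ["templates", "http"],
    "db": ["http"],
    "templates": [],
    "http": [],
}


def transitive_deps(project, direct_deps):
    """Return every project reachable from `project`, in breadth-first order."""
    seen = []
    queue = deque(direct_deps.get(project, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already collected via another path
        seen.append(name)
        queue.extend(direct_deps.get(name, []))
    return seen


print(transitive_deps("app", DIRECT_DEPS))
```

A client holding this list would know every project it needs before
making any further request, instead of discovering them one page at a
time.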

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Cloud Services

From jacob at jacobian.org  Tue Mar 12 19:52:25 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Tue, 12 Mar 2013 13:52:25 -0500
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf5v+YU06oLHgbd57oUKHmrJrq8inhHGMrRe+A-9Nq9z=Q@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<2564CB0F5D96477E86F655096A06941E@gmail.com>
	<CALeMXf5v+YU06oLHgbd57oUKHmrJrq8inhHGMrRe+A-9Nq9z=Q@mail.gmail.com>
Message-ID: <CAK8PqJEZeE6D0oOUp6j6B711AprZS40HenyCAF95dwdKypTLLA@mail.gmail.com>

On Tue, Mar 12, 2013 at 12:54 PM, PJ Eby <pje at telecommunity.com> wrote:
> This is a rationale for secure defaults for various options, like the
> ones I outlined in the portions of my post that you *didn't* quote.
>
> It's not a rationale for removing the options themselves.

Exactly; thanks for saying this better than I did.

As we've seen from the recent Rails security vulnerabilities, secure
has to be the default. Users having to explicitly choose the "secure"
option is an anti-pattern, with teeth.

As long as the default, out-of-the-box behavior is secure it's fine;
users who want to run their tools with the "--hack-me-if-you-can" flag
will find a way to do so. This isn't about taking away people's
options, but about putting secure-by-default tools into the hands of
the people who need them the most.

Jacob

From jacob at jacobian.org  Tue Mar 12 19:56:00 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Tue, 12 Mar 2013 13:56:00 -0500
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <513F6D35.2030707@egenix.com>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<2564CB0F5D96477E86F655096A06941E@gmail.com>
	<513F6D35.2030707@egenix.com>
Message-ID: <CAK8PqJEaUfuDQOt=MzpKXLfhjd8ro0vWYP1xJsLsdOceYJOhsQ@mail.gmail.com>

On Tue, Mar 12, 2013 at 1:00 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> The whole Python package eco-system works based on trust and
> injecting fear into this system is not helpful, IMO.

I'm sorry if my words came across that way; I'm not trying to scare
anyone. I'm trying to emphasize that this isn't an academic
discussion; the insecurity of PyPI is something that actively prevents
the adoption of Python. I think I'm probably right in saying that
everyone here wants to push Python forward; I'm trying to articulate
how security fits into that. Again, sorry for not being clearer;
you're totally right that fear-mongering isn't helpful.

Jacob

From jnoller at gmail.com  Tue Mar 12 19:58:01 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 12 Mar 2013 14:58:01 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CAK8PqJEaUfuDQOt=MzpKXLfhjd8ro0vWYP1xJsLsdOceYJOhsQ@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<2564CB0F5D96477E86F655096A06941E@gmail.com>
	<513F6D35.2030707@egenix.com>
	<CAK8PqJEaUfuDQOt=MzpKXLfhjd8ro0vWYP1xJsLsdOceYJOhsQ@mail.gmail.com>
Message-ID: <1A6A72B4C3944FD1A1D02F21AAABF82F@gmail.com>



On Tuesday, March 12, 2013 at 2:56 PM, Jacob Kaplan-Moss wrote:

> On Tue, Mar 12, 2013 at 1:00 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> > The whole Python package eco-system works based on trust and
> > injecting fear into this system is not helpful, IMO.
> 
> 
> 
> I'm sorry if my words came across that way; I'm not trying to scare
> anyone. I'm trying to emphasize that this isn't an academic
> discussion; the insecurity of PyPI is something that actively prevents
> the adoption of Python. I think I'm probably right in saying that
> everyone here wants to push Python forward; I'm trying to articulate
> how security fits into that. Again, sorry for not being clearer;
> you're totally right that fear-mongering isn't helpful.
> 
> Jacob

Nah, that was me injecting fear. I call dibs on that one.


From jacob at jacobian.org  Tue Mar 12 19:59:24 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Tue, 12 Mar 2013 13:59:24 -0500
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <1A6A72B4C3944FD1A1D02F21AAABF82F@gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<2564CB0F5D96477E86F655096A06941E@gmail.com>
	<513F6D35.2030707@egenix.com>
	<CAK8PqJEaUfuDQOt=MzpKXLfhjd8ro0vWYP1xJsLsdOceYJOhsQ@mail.gmail.com>
	<1A6A72B4C3944FD1A1D02F21AAABF82F@gmail.com>
Message-ID: <CAK8PqJF=uq72Gcmp-OSQ3OJp+-GStWZ3utwOseJHZd2tnGO6Kg@mail.gmail.com>

On Tue, Mar 12, 2013 at 1:58 PM, Jesse Noller <jnoller at gmail.com> wrote:
> Nah, that was me injecting fear. I call dibs on that one.

Aw, man!

Can I have Uncertainty and Doubt then?

Jacob

From jnoller at gmail.com  Tue Mar 12 20:01:12 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 12 Mar 2013 15:01:12 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CAK8PqJF=uq72Gcmp-OSQ3OJp+-GStWZ3utwOseJHZd2tnGO6Kg@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<2564CB0F5D96477E86F655096A06941E@gmail.com>
	<513F6D35.2030707@egenix.com>
	<CAK8PqJEaUfuDQOt=MzpKXLfhjd8ro0vWYP1xJsLsdOceYJOhsQ@mail.gmail.com>
	<1A6A72B4C3944FD1A1D02F21AAABF82F@gmail.com>
	<CAK8PqJF=uq72Gcmp-OSQ3OJp+-GStWZ3utwOseJHZd2tnGO6Kg@mail.gmail.com>
Message-ID: <DC2993D1F1A449CE918D4E48E337A150@gmail.com>



On Tuesday, March 12, 2013 at 2:59 PM, Jacob Kaplan-Moss wrote:

> On Tue, Mar 12, 2013 at 1:58 PM, Jesse Noller <jnoller at gmail.com> wrote:
> > Nah, that was me injecting fear. I call dibs on that one.
> 
> 
> 
> Aw, man!
> 
> Can I have Uncertainty and Doubt then?
> 
> Jacob 
Yes. Just as long as you call me Fear Injector. 


From mordred at inaugust.com  Tue Mar 12 19:51:15 2013
From: mordred at inaugust.com (Monty Taylor)
Date: Tue, 12 Mar 2013 11:51:15 -0700
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <513F6D35.2030707@egenix.com>
References: <20130310150740.GE9677@merlinux.eu>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<2564CB0F5D96477E86F655096A06941E@gmail.com>
	<513F6D35.2030707@egenix.com>
Message-ID: <513F7923.4050106@inaugust.com>



On 03/12/2013 11:00 AM, M.-A. Lemburg wrote:
> On 12.03.2013 18:33, Jesse Noller wrote:
>>
>>>
>>> And I've put multiple compromise proposals out there to begin
>>> mitigating the problem *now* (i.e. for non-updated versions of
>>> setuptools), and every time, the objection is, "no, we need to ban it
>>> all now, no discussion, no re-evaluation, no personal choice, everyone
>>> must do as we say, no argument".
>>>
>>> And I don't understand that, at all.
>>
>> There's not much to understand: external hosting of packages is *actively harmful*, period. End users of easy_install and pip *don't even realize* 99% of the time that these tools are following links off of PyPi and installing packages from random, probably insecure/non https locations all over the internet. Once they realize it they recoil in terror if they have any understanding of the implications.
>>
>> Let me put this in different terms: out of the packages using external hosting: can you prove to me that 100% of them aren't compromised machines serving malware, performing MITM attacks, etc? The fact that the end user tools support this is a bug, but one from history. The fact that PyPI continues to support external links on simple/ is inexcusable given that we know that they are an attack vector. 
>>
>> A simple proof of concept on a popular package hosted off site deployed during PyCon would be terrible, it was bad enough that last year people were trying to MITM due to lack of SSL. 
> 
> Let's please not exaggerate all this. It's not like PyPI is
> the only server out there implementing HTTPS, ye know ;-)
> 
> A single package uploaded on PyPI with os.system('rm -rf')
> in its setup.py could easily ruin all this and no HTTPS in this
> world would stop it from showing its ugly face.
> 
> The whole Python package eco-system works based on trust and
> injecting fear into this system is not helpful, IMO.
> 
> People need to understand the possible issues, we need to make
> things safer from both the client and the server side and
> improve the tool chain. There's really nothing new here.

The problem with externally hosted packages isn't just security. It's
also the reliability of the service. PyPI as it is right now with externally
hosted packages is 100% unusable in automated systems for reasons having
nothing to do with security. For better or for worse, PyPI _IS_ the
place where python packages are expected to exist and be uploaded.
However, attempting to hang on to a feature which undermines the ability
of the service to be used is absolutely mind-blowing to me.

Why, you ask, is it broken?

a) it's massively unreliable, because reliability is now dependent on
the availability of ALL of the external link hosting sites combined.
It's not even just the packages - version information lookups, which
should take 0.1 second and be the most reliable thing ever, have to
spider a billion web pages.

b) It's massively slow. All that spidering of lycos and altavista and
some random trac site? Slow. Guess what - that spidering is happening on
my LAPTOP - so while sitting here on this plane, if I want to install a
package that's on PyPI, it has to go web-spider other things.

c) It's aggressive about being both of the above. Even if packages are
hosted on PyPI, my local client will STILL spider external sites that
are listed.
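
To put a rough number on point (a): if an install has to reach every listed host, and host failures are assumed independent, the chance that the whole thing works is the product of the individual availabilities. The figures below are hypothetical, chosen only to show the shape of the effect; they are not measurements of PyPI or of any real site.

```python
from math import prod


def combined_availability(host_availabilities):
    """Probability that every host answers, assuming independent failures."""
    return prod(host_availabilities)


# Hypothetical figures: PyPI itself plus 20 external sites at 99% each.
hosts = [0.999] + [0.99] * 20
print(round(combined_availability(hosts), 3))
```

With these made-up inputs the combined figure comes out at roughly 0.82, well below that of any single host in the list.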

The funny part is, if you remove the externally hosted packages, pypi is
a wonderfully elegant system that is super easy to scale. A PyPI can be
completely static, which is how we run the partial-mirror that OpenStack
is forced to run due to the instability of homepages stored on Apple
IIe's of various random people who decided that "python setup.py sdist
upload" is too hard to run. It's great. We love it. It works for just
about everything.

Except for those darned external links.

Why are we persisting in trying to make this super complex? Can we
revisit PEP20 here? Specifically:

Explicit is better than implicit.
Simple is better than complex.
Flat is better than nested.
...
There should be one-- and preferably only one --obvious way to do it

If I run :

pip install foo

I am EXPLICITLY asking for a package from PyPI, not from launchpad.
There is a URL option, which would allow me to request a package from
somewhere that is not PyPI should I want to do that.

Having to spider out to external sites is more complex than not doing that.

External sites are effectively needless nesting.

Most importantly - PyPI is there - it's where we upload packages. What
benefit do we gain from subverting that?

Nothing.

Remove the external links. Please.

Monty

From holger at merlinux.eu  Tue Mar 12 20:11:41 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 19:11:41 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <513F718D.4040307@oddbird.net>
References: <CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
Message-ID: <20130312191141.GJ9677@merlinux.eu>

On Tue, Mar 12, 2013 at 12:18 -0600, Carl Meyer wrote:
> It seems to me that there's a remarkable level of consensus developing
> here (though it may not look like it), and a small set of remaining open
> questions.
> 
> The consensus (as I see it):
> 
> - Migrate away from scraping external HTML pages, with package owners in
> control of the migration but a deadline for a forced switch, as outlined
> in Holger's PEP (with all appropriate caution and testing).
> 
> - In some way, migrate to a situation where the popular installer tools
> install only release files from PyPI by default, but are capable of
> installing from other locations if the user provides an option.
> 
> The open question is basically how to implement the latter portion. I
> see two options proposed:
> 
> A) Leave external links in the PyPI simple index, but migrate the major
> tools to not use external links by default (i.e. Philip's plan to make
> allow-hosts=pypi the default in a future setuptools), with an option to
> turn them back on.
> 
> or
> 
> B) Do a second PyPI migration, again with a per-package toggle and
> package owners in control, to a "no external links in simple index" setting.
> 
> Consider for a moment how similar the end state here is with either A or
> B. In either case, by default users install only from PyPI, but by
> providing a special option they can install from some external source.
> (In B, that special option would be something like --find-links with a
> URL). In either case, we can continue to allow packages to register
> themselves on PyPI, be found in searches, etc, without uploading release
> files to PyPI if they prefer not to; they'll just have to provide
> special installation instructions to their users in that case.
> 
> Here are some differences:
> 
> 1) With B, we can provide a gentler migration for package owners, where
> they are in control of when the switch happens. With A, regardless of
> how it's done at some point some package owners are likely to start
> getting "hey, i can't install your stuff anymore" reports from users,
> and they can't control when that starts happening.
> 
> 2) With B, all end users benefit from the new defaults, not only end
> users who update to the latest and greatest tools.
> 
> 3) With B (and probably some forms of A as well), end users clearly
> state which external sources they would like to trust and install from,
> rather than having a global "trust everything!" flag, which is less
> secure and less sensible.
> 
> It seems to me that option B (a controlled, per-package, PyPI migration
> to no-external-links in simple index) is a better migration path than A
> (leaving it up to external tools), and the end result either way is very
> similar.

Thanks for outlining this so well.  I agree with the B approach and
suggest introducing three per-package hosting states then:

- pypi-only: only pypi-hosted files and all #egg links are served via simple/
  (#egg links are necessary and a special case for installing
  development snapshots - we should not leave them out, I think)

- pypi-nocrawl: all links as of now but without rel-attribution (i.e.
  all description links are served and also the homepage/download ones but
  without rel-attribution)

- pypi-crawl: all links as of now
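
A minimal sketch of how the three states could filter the links on a simple/ page. This is an illustration only, not the PyPI implementation: the `Link` type, the `links_for_mode` name, and the `PYPI_PREFIX` value are all made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed prefix for files hosted on PyPI itself (illustrative only).
PYPI_PREFIX = "https://pypi.python.org/"


@dataclass
class Link:
    url: str
    rel: Optional[str] = None  # "homepage", "download", or None


def links_for_mode(links: List[Link], mode: str) -> List[Link]:
    """Return the links a simple/ page would serve in the given hosting mode."""
    if mode == "pypi-crawl":
        # Everything as it is today, rel attributes included.
        return list(links)
    if mode == "pypi-nocrawl":
        # Same links, but with the rel attribution dropped.
        return [Link(link.url, None) for link in links]
    if mode == "pypi-only":
        # Only PyPI-hosted files and #egg links survive.
        return [
            Link(link.url, None)
            for link in links
            if link.url.startswith(PYPI_PREFIX) or "#egg=" in link.url
        ]
    raise ValueError("unknown hosting mode: %r" % mode)
```

The automatic transition would then amount to choosing, per package, which of the three strings to store.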

The automatic transition of the hosting-mode for most packages (with
pre-announcements) specified in the PEP will need to differentiate
between switching to pypi-only and pypi-nocrawl.  

And as discussed elsewhere, the implementation of the underlying
analysis script and the PYPI changes certainly need to be ready
before the PEP can be finally accepted.

I am open to a corresponding PR to the PEP-draft :)

holger


> 
> Carl

From holger at merlinux.eu  Tue Mar 12 20:17:21 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 19:17:21 +0000
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <513F6EE0.6080503@egenix.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
	<20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com>
Message-ID: <20130312191721.GK9677@merlinux.eu>

On Tue, Mar 12, 2013 at 19:07 +0100, M.-A. Lemburg wrote:
> Just a quick note (more later, if time permits)...
> 
> On 12.03.2013 18:05, holger krekel wrote:
> > Hi Marc-Andre, all,
> > 
> >>> - Prepare PYPI implementation to allow a per-project "hosting mode",
> >>>   effectively enabling or disabling external crawling.  When enabled 
> >>>   nothing changes from the current situation of producing ``rel=download`` 
> >>>   and ``rel=homepage`` attributed links on ``simple/`` pages, 
> >>>   causing installers to crawl those sites.  
> >>>   When disabled, the attributions of links will change 
> >>>   to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
> >>>   avoid crawling 3rd party sites.  Retaining the meta-information allows
> >>>   tools to still make use of the semantic information.
> >>
> >> Please start using versioned APIs for these things. The
> >> old style index should still be available under some
> >> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/
> > 
> > Not sure it is necessary in this case.  I would think it makes
> > the implementation harder and it would probably break PEP381 (mirroring
> > infrastructure) as well.
> 
> Here's what I meant:
> 
> We publish the current implementation of the /simple/ index API
> under a new URL /simple-v1/, so that people that want to use
> the old API can continue to do so.
> 
> Then we set up a new /simple-v2/ index API with your proposed
> change, perhaps even dropping the rel attribute altogether.
> 
> During testing, we'd then have:
> 
> /simple/    - same as /simple-v1/
> /simple-v1/ - old API with rel attributes always set
> /simple-v2/ - new API with your changes (rel attributes only
>               set in some cases)
> 
> After a month or so of testing, we then switch this to:
> 
> /simple/    - same as /simple-v2/
> /simple-v1/ - old API with rel attributes always set
> /simple-v2/ - new API with your changes (rel attributes only
>               set in some cases)

I understand but am not sure how easy this is to manage at the moment.
I'd like to put this up in open questions and have (eventually) Richard 
comment on this before evolving it further.

best,
holger


From pje at telecommunity.com  Tue Mar 12 20:21:43 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 15:21:43 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <513F718D.4040307@oddbird.net>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
Message-ID: <CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>

On Tue, Mar 12, 2013 at 2:18 PM, Carl Meyer <carl at oddbird.net> wrote:
> It seems to me that there's a remarkable level of consensus developing
> here (though it may not look like it), and a small set of remaining open
> questions.
>
> The consensus (as I see it):
>
> - Migrate away from scraping external HTML pages, with package owners in
> control of the migration but a deadline for a forced switch, as outlined
> in Holger's PEP (with all appropriate caution and testing).
>
> - In some way, migrate to a situation where the popular installer tools
> install only release files from PyPI by default, but are capable of
> installing from other locations if the user provides an option.

Perhaps I'm confused, but ISTM that every time I've said this, Donald
and Lennart argue that it should not be possible to provide such an
option -- or to be more specific, that PyPI should not publish the
information that makes that option possible.

If that's *not* the position they're taking, it'd be good to know,
because we could totally stop arguing about it in that case.


> A) Leave external links in the PyPI simple index, but migrate the major
> tools to not use external links by default (i.e. Philip's plan to make
> allow-hosts=pypi the default in a future setuptools), with an option to
> turn them back on.

I don't know who has proposed this option, but it's not me.  You seem
to be confusing external links and HTML-scraped links (rel=""
attributed links in /simple).

I was the first person to propose disabling HTML-scraped links from
PyPI *ASAP*.  I still want them gone.  That won't require tool
changes, it just requires a rollout plan.  Holger has one, let's work
on that.

The second thing I proposed is that new tools be developed to *assist*
package authors in moving their files onto PyPI, so that future tool
changes wouldn't result in widespread instances of people needing to
set their tools to insecure settings just to get anything done.  We
need to get people's files moving onto PyPI *first*, in order to make
changing the tool defaults practical.

The *only* thing I object to is the part where some people want to ban
external links from /simple, always and forever, regardless of the
package authors' choice in the matter.


> B) Do a second PyPI migration, again with a per-package toggle and
> package owners in control, to a "no external links in simple index" setting.
>
> Consider for a moment how similar the end state here is with either A or
> B. In either case, by default users install only from PyPI, but by
> providing a special option they can install from some external source.
> (In B, that special option would be something like --find-links with a
> URL). In either case, we can continue to allow packages to register
> themselves on PyPI, be found in searches, etc, without uploading release
> files to PyPI if they prefer not to; they'll just have to provide
> special installation instructions to their users in that case.

Not true: approach B means that you won't know what values to pass to
the option.

It's also confused about an important point.  All the links that
appear in /simple are *already* completely under the package author's
control.  No new switches are required to remove external links - you
can simply remove them from your releases' descriptions.  This process
could be made more transparent or easy, sure -- but it's a mistake to
say that this is granting the package owners control that they don't
already have.

What they lack control over is the rel="" attributes, short of
removing those links entirely.  That's why I've proposed having a
switch for that, as reflected in Holger's pre-PEP.


> 1) With B, we can provide a gentler migration for package owners, where
> they are in control of when the switch happens.
>
> 2) With B, all end users benefit from the new defaults, not only end
> users who update to the latest and greatest tools.
>
> 3) With B (and probably some forms of A as well), end users clearly
> state which external sources they would like to trust and install from,
> rather than having a global "trust everything!" flag, which is less
> secure and less sensible.

These 3 statements all mischaracterize things substantially, because
none of those benefits are exclusive to B, and nobody has proposed a
"trust everything" flag.  Removing rel="" attributes also benefits
everyone right away, *without* new tools.
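
To illustrate why the rel="" attribute is the lever here, below is a simplified sketch of how an installer might split the links on a /simple page into "crawl this page" and "treat as a direct candidate". It is not setuptools' actual code; the `SimplePageParser` and `classify` names are hypothetical, and real tools handle rel values more loosely than this exact match.

```python
from html.parser import HTMLParser

# rel values that, in this sketch, make an installer crawl the target page.
CRAWLED_RELS = {"homepage", "download"}


class SimplePageParser(HTMLParser):
    """Collect hrefs from a simple/ page, split by whether they get crawled."""

    def __init__(self):
        super().__init__()
        self.to_crawl = []  # links whose target page would be spidered
        self.direct = []    # links treated as plain download candidates

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        if attributes.get("rel") in CRAWLED_RELS:
            self.to_crawl.append(href)
        else:
            self.direct.append(href)


def classify(html):
    """Return (to_crawl, direct) for the given simple/ page markup."""
    parser = SimplePageParser()
    parser.feed(html)
    return parser.to_crawl, parser.direct
```

Under these assumptions, renaming `rel="homepage"` to `rel="newhomepage"` on the server moves a link from the first list to the second, with no change to the client.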

From pje at telecommunity.com  Tue Mar 12 20:24:52 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 15:24:52 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAJ3HoZ1oDd9+keKeNGpntTput7FwaYAwAd9-Gv2A==vvo=z+zQ@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CAJ3HoZ1oDd9+keKeNGpntTput7FwaYAwAd9-Gv2A==vvo=z+zQ@mail.gmail.com>
Message-ID: <CALeMXf6YydrpgzTVTn+RF=oTKoBW6JzXN_m=UDOv0ksAZ5LBTA@mail.gmail.com>

On Tue, Mar 12, 2013 at 2:43 PM, Robert Collins
<robertc at robertcollins.net> wrote:
> This takes an age when each new web host to talk to is a new DNS
> lookup (say 0.3 seconds) + HTTP request (0.6 seconds) with possible
> HTTPS setup in there too (up to 1.2 seconds). A project with dozens of
> dependencies in its transitive dependency graph may take minutes
> *just spidering*.

Which is why we should act on Holger's pre-PEP to drop the rel=""
attributes from projects that don't actually use them -- builds
involving those projects will immediately drop to one HTTP request to
PyPI, plus one to whatever host has the actually needed file.

And that's without any tooling changes whatsoever: builds all over the
planet will just get faster and more secure, right away.
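
A back-of-the-envelope version of Robert's numbers, as a sketch. It assumes hosts are contacted one after another and reads the HTTPS figure as an extra cost on top of DNS and HTTP; both are assumptions, and the host counts in the usage note are hypothetical.

```python
# Per-host costs in seconds, taken from the figures quoted above.
DNS_LOOKUP = 0.3
HTTP_REQUEST = 0.6
HTTPS_SETUP = 1.2  # upper bound; assumed here to be additional


def spider_seconds(external_hosts, use_https=False):
    """Estimate sequential spidering time for a number of distinct hosts."""
    per_host = DNS_LOOKUP + HTTP_REQUEST
    if use_https:
        per_host += HTTPS_SETUP
    return external_hosts * per_host
```

For, say, 40 dependencies that each list two external hosts, `spider_seconds(80)` is about 72 seconds and `spider_seconds(80, use_https=True)` about 168 seconds. With the rel="" links gone the external host count drops to zero and so does the estimate.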

From mal at egenix.com  Tue Mar 12 20:28:21 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 20:28:21 +0100
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <20130312191721.GK9677@merlinux.eu>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
	<20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com>
	<20130312191721.GK9677@merlinux.eu>
Message-ID: <513F81D5.1040802@egenix.com>

On 12.03.2013 20:17, holger krekel wrote:
> On Tue, Mar 12, 2013 at 19:07 +0100, M.-A. Lemburg wrote:
>> Just a quick note (more later, if time permits)...
>>
>> On 12.03.2013 18:05, holger krekel wrote:
>>> Hi Marc-Andre, all,
>>>
>>>>> - Prepare PYPI implementation to allow a per-project "hosting mode",
>>>>>   effectively enabling or disabling external crawling.  When enabled 
>>>>>   nothing changes from the current situation of producing ``rel=download`` 
>>>>>   and ``rel=homepage`` attributed links on ``simple/`` pages, 
>>>>>   causing installers to crawl those sites.  
>>>>>   When disabled, the attributions of links will change 
>>>>>   to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
>>>>>   avoid crawling 3rd party sites.  Retaining the meta-information allows
>>>>>   tools to still make use of the semantic information.
>>>>
>>>> Please start using versioned APIs for these things. The
>>>> old style index should still be available under some
>>>> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/
>>>
>>> Not sure it is necessary in this case.  I would think it makes
>>> the implementation harder and it would probably break PEP381 (mirroring
>>> infrastructure) as well.
>>
>> Here's what I meant:
>>
>> We publish the current implementation of the /simple/ index API
>> under a new URL /simple-v1/, so that people that want to use
>> the old API can continue to do so.
>>
>> Then we set up a new /simple-v2/ index API with your proposed
>> change, perhaps even dropping the rel attribute altogether.
>>
>> During testing, we'd then have:
>>
>> /simple/    - same as /simple-v1/
>> /simple-v1/ - old API with rel attributes always set
>> /simple-v2/ - new API with your changes (rel attributes only
>>               set in some cases)
>>
>> After a month or so of testing, we then switch this to:
>>
>> /simple/    - same as /simple-v2/
>> /simple-v1/ - old API with rel attributes always set
>> /simple-v2/ - new API with your changes (rel attributes only
>>               set in some cases)
> 
> I understand but am not sure how easy this is to manage at the moment.
> I'd like to put this up in open questions and have (eventually) Richard 
> comment on this before evolving it further.

Should be pretty easy to do...

Just add a version parameter to .run_simple()
at
https://bitbucket.org/loewis/pypi/src/dc6c23cce746bb25e0b013a1a1e020bc27bb332b/webui.py?at=default#cl-706
and then hook it up to the two URLs
at
https://bitbucket.org/loewis/pypi/src/dc6c23cce746bb25e0b013a1a1e020bc27bb332b/webui.py?at=default#cl-486
and
https://bitbucket.org/loewis/pypi/src/dc6c23cce746bb25e0b013a1a1e020bc27bb332b/pypi.wsgi?at=default#cl-71
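
As a rough sketch of the URL side of this (not the actual webui.py code; `route_simple` and `DEFAULT_VERSION` are hypothetical names), the three URLs can be mapped to a version number that is then passed along to the page renderer:

```python
import re

# Matches /simple/..., /simple-v1/..., /simple-v2/... and so on.
_SIMPLE_URL = re.compile(r"^/simple(?:-v(?P<version>\d+))?/(?P<rest>.*)$")

# What plain /simple/ means; flip to 2 once the testing period is over.
DEFAULT_VERSION = 1


def route_simple(path, default_version=DEFAULT_VERSION):
    """Return (api_version, remaining_path), or None for non-simple URLs."""
    match = _SIMPLE_URL.match(path)
    if match is None:
        return None
    version = match.group("version")
    api_version = int(version) if version else default_version
    return api_version, match.group("rest")
```

Switching /simple/ from v1 to v2 after testing is then a one-line change to the default.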

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 12 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From jacob at jacobian.org  Tue Mar 12 20:36:20 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Tue, 12 Mar 2013 14:36:20 -0500
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
Message-ID: <CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>

On Tue, Mar 12, 2013 at 2:21 PM, PJ Eby <pje at telecommunity.com> wrote:
> The *only* thing I object to is the part where some people want to ban
> external links from /simple, always and forever, regardless of the
> package authors' choice in the matter.

Here's the thing though, there are already a bunch of other ways users
can install packages from external repositories. I can think of at
least two:

* I can pip/easy_install a given URL (e.g. easy_install
https://www.djangoproject.com/download/1.5/tarball/)
* I can use a custom index server (pip install -i http://localserver/ django)

The important part is that in each of those cases I can see clearly
where I'm getting things from.

OTOH, if I do "pip install Django" I, the person making the install,
have no control over where that package comes from. It really violates
people's expectations that this reaches out to somewhere that's
not-pypi. More importantly it prevents me from making a security
choice -- I literally don't know until the download starts where the
file might be coming from.

From where I stand the absolutely non-negotiable part is that
`pip/easy_install/whatever package` should NEVER access an external
host (after some suitable transition period). This needs to include
older installer software, and it needs to make it hard for new tools
to do the wrong thing. How this is achieved really doesn't matter to
me -- if there's a "pip install --insecure Django" that's fine too --
but to me it's non-negotiable that the out-of-the-box configuration
not allow external hosts.

Yes, this means taking some options away from the package creator. It
means that when I'm wearing my author-of-Django hat I can't choose to
list Django on PyPI but provide the download elsewhere. That's not
perfect, but given a "creator choice" vs "out of the box security"
choice the latter has to win. [And as a package creator I still have
options: I can run my own package server, fairly easy to do these
days.]

Again, the *how* isn't a big deal to me, but the result is really
important: the tooling has to be secure-by-default, and that means
(among other things) `pip install package` can never hit something
that's not PyPI without me explicitly asking for it.
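
One way to picture "secure by default" on the client side is a host check applied before any download, in the spirit of easy_install's allow-hosts option. This is only a sketch: `is_allowed` and `DEFAULT_ALLOWED_HOSTS` are hypothetical names, not part of pip or setuptools.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

# Out of the box, only the index itself is trusted (assumed host name).
DEFAULT_ALLOWED_HOSTS = ("pypi.python.org",)


def is_allowed(url, allowed_hosts=DEFAULT_ALLOWED_HOSTS):
    """Return True if the URL's host matches one of the allowed patterns."""
    host = urlparse(url).hostname or ""
    return any(fnmatch(host, pattern) for pattern in allowed_hosts)
```

With the default tuple, a djangoproject.com tarball URL is refused; it is accepted only once the user adds a pattern such as `"*.djangoproject.com"`, which is the "explicitly asking for it" step.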

Jacob

From pje at telecommunity.com  Tue Mar 12 20:46:48 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 15:46:48 -0400
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <513F6EE0.6080503@egenix.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
	<20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com>
Message-ID: <CALeMXf4+tRKTF=dVdx8wXfaM=ZknA2+9PXh3h6_FDtD4e1rAbw@mail.gmail.com>

On Tue, Mar 12, 2013 at 2:07 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> Just a quick note (more later, if time permits)...
>
> On 12.03.2013 18:05, holger krekel wrote:
>> Hi Marc-Andre, all,
>>
>>>> - Prepare PYPI implementation to allow a per-project "hosting mode",
>>>>   effectively enabling or disabling external crawling.  When enabled
>>>>   nothing changes from the current situation of producing ``rel=download``
>>>>   and ``rel=homepage`` attributed links on ``simple/`` pages,
>>>>   causing installers to crawl those sites.
>>>>   When disabled, the attributions of links will change
>>>>   to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
>>>>   avoid crawling 3rd party sites.  Retaining the meta-information allows
>>>>   tools to still make use of the semantic information.
>>>
>>> Please start using versioned APIs for these things. The
>>> old style index should still be available under some
>>> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/
>>
>> Not sure it is necessary in this case.  I would think it makes
>> the implementation harder and it would probably break PEP381 (mirroring
>> infrastructure) as well.
>
> Here's what I meant:
>
> We publish the current implementation of the /simple/ index API
> under a new URL /simple-v1/, so that people that want to use
> the old API can continue to do so.

Do you know of anyone who's *actually* going to need/use this
alternate API?  Why can't they just use the XML-RPC API, the DOAP API, or
any other means of obtaining this information?

Heck, the proposal to just change the value of the rel attribute isn't
going to stop anybody from using that data.  Please let's not
complicate this by adding more API formats for PyPI to support.
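
To make the rel-attribute point concrete, here is a minimal sketch of
how a tool might read a /simple page under the proposal.  It assumes
only what the pre-PEP text quoted above says (rel=download and
rel=homepage links get crawled, other rel values do not); the class
name, the helper and the sample page are made up for illustration and
are not taken from any installer.

```python
from html.parser import HTMLParser

# rel values that, per the quoted pre-PEP text, cause installers to crawl.
CRAWLED_RELS = {"download", "homepage"}


class SimplePageLinks(HTMLParser):
    """Hypothetical parser that sorts /simple links by their rel value."""

    def __init__(self):
        super().__init__()
        self.crawl = []       # links an installer would follow
        self.info_only = []   # links kept purely as metadata

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        rels = set((attrs.get("rel") or "").split())
        if rels & CRAWLED_RELS:
            self.crawl.append(href)
        elif rels:
            # e.g. rel=newdownload: not crawled, but still readable by tools
            self.info_only.append((href, sorted(rels)))


def classify(page):
    """Return (links to crawl, links that are information only)."""
    parser = SimplePageLinks()
    parser.feed(page)
    return parser.crawl, parser.info_only


# Made-up sample page; the URLs are placeholders.
PAGE = """
<a href="http://example.com/foo/" rel="homepage">home</a>
<a href="http://example.com/dl/" rel="newdownload">download</a>
<a href="foo-1.0.tar.gz">foo-1.0.tar.gz</a>
"""
```

Under these assumptions the renamed link is no longer crawled, yet any
tool that wants the data can still read it from the page.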

From holger at merlinux.eu  Tue Mar 12 20:57:07 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 19:57:07 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
References: <CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
Message-ID: <20130312195707.GL9677@merlinux.eu>

On Tue, Mar 12, 2013 at 14:36 -0500, Jacob Kaplan-Moss wrote:
> On Tue, Mar 12, 2013 at 2:21 PM, PJ Eby <pje at telecommunity.com> wrote:
> > The *only* thing I object to is the part where some people want to ban
> > external links from /simple, always and forever, regardless of the
> > package authors' choice in the matter.
> 
> Here's the thing though, there are already a bunch of other ways users
> can install packages from external repositories. I can think of at
> least two:
> 
> * I can pip/easy_install a given URL (e.g. easy_install
> https://www.djangoproject.com/download/1.5/tarball/)
> * I can use a custom index server (pip install -i http://localserver/ django)
> 
> The important part is that in each of those cases I can see clearly
> where I'm getting things from.
> 
> OTOH, if I do "pip install Django" I -- the person making the install --
> have no control over where that package comes from. It really violates
> people's expectations that this reaches out to somewhere that's
> not-pypi. More importantly it prevents me from making a security
> choice -- I literally don't know until the download starts where the
> file might be coming from.
> 
> From where I stand the absolutely non-negotiable part is that
> `pip/easy_install/whatever package` should NEVER access an external
> host (after some suitable transition period). This needs to include
> older installer software, and it needs to make it hard for new tools
> to do the wrong thing. How this is achieved really doesn't matter to
> me -- if there's a "pip install --insecure Django" that's fine too --
> but to me it's non-negotiable that the out-of-the-box configuration
> not allow external hosts.
> 
> Yes, this means taking some options away from the package creator. It
> means that when I'm wearing my author-of-Django hat I can't choose to
> list Django on PyPI but provide the download elsewhere. That's not
> perfect, but given a "creator choice" vs "out of the box security"
> choice the latter has to win. [And as a package creator I still have
> options: I can run my own package server, fairly easy to do these
> days.]
> 
> Again, the *how* isn't a big deal to me, but the result is really
> important: the tooling has to be secure-by-default, and that means
> (among other things) `pip install package` can never hit something
> that's not PyPI without me explicitly asking for it.

Let's be clear, however, that we are at most reducing attack vectors;
there are substantial attack vectors left.  Nobody should be led to
think that PYPI is a trusted or reviewed source of software even
if we got rid of external hosting completely.

holger

> Jacob
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
> 

From holger at merlinux.eu  Tue Mar 12 20:59:02 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 19:59:02 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
References: <CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
Message-ID: <20130312195902.GM9677@merlinux.eu>

On Tue, Mar 12, 2013 at 15:21 -0400, PJ Eby wrote:
> On Tue, Mar 12, 2013 at 2:18 PM, Carl Meyer <carl at oddbird.net> wrote:
> > It seems to me that there's a remarkable level of consensus developing
> > here (though it may not look like it), and a small set of remaining open
> > questions.
> >
> > The consensus (as I see it):
> >
> > - Migrate away from scraping external HTML pages, with package owners in
> > control of the migration but a deadline for a forced switch, as outlined
> > in Holger's PEP (with all appropriate caution and testing).
> >
> > - In some way, migrate to a situation where the popular installer tools
> > install only release files from PyPI by default, but are capable of
> > installing from other locations if the user provides an option.
> 
> Perhaps I'm confused, but ISTM that every time I've said this, Donald
> and Lennart argue that it should not be possible to provide such an
> option -- or to be more specific, that PyPI should not publish the
> information that makes that option possible.
> 
> If that's *not* the position they're taking, it'd be good to know,
> because we could totally stop arguing about it in that case.

I don't know.  At least the pre-PEP doesn't take the position
to unconditionally ban external links.  Maybe Lennart or Donald can say
whether they oppose the moves outlined in the PEP.  I'd hope
that the perceived "perfect" doesn't become the enemy
of the good here :)

> > A) Leave external links in the PyPI simple index, but migrate the major
> > tools to not use external links by default (i.e. Philip's plan to make
> > allow-hosts=pypi the default in a future setuptools), with an option to
> > turn them back on.
> 
> I don't know who has proposed this option, but it's not me.  You seem
> to be confusing external links and HTML-scraped links (rel=""
> attributed links in /simple).

The suggested behaviour of installers is not fully formulated yet in
the PEP.  We should improve that.

> I was the first person to propose disabling HTML-scraped links from
> PyPI *ASAP*.

Yes, and thanks for pushing us in this direction. 

> I still want them gone.  That won't require tool
> changes, it just requires a rollout plan.  Holger has one, let's work
> on that.
>
> The second thing I proposed is that new tools be developed to *assist*
> package authors in moving their files onto PyPI, so that future tool
> changes wouldn't result in widespread instances of people needing to
> set their tools to insecure settings just to get anything done.  We
> need to get people's files moving onto PyPI *first*, in order to make
> changing the tool defaults practical.

Indeed, it's a good idea to require the "re-hosting" or "transfer" tool
to be ready before installers change their defaults.

> The *only* thing I object to is the part where some people want to ban
> external links from /simple, always and forever, regardless of the
> package authors' choice in the matter.

I agree the package author should have a choice about the serving of links
for their package.  And installers should change defaults so that
install-users eventually have a choice as well, to control whether they
are fine with crawling or using external links.
 
> > B) Do a second PyPI migration, again with a per-package toggle and
> > package owners in control, to a "no external links in simple index" setting.
> >
> > Consider for a moment how similar the end state here is with either A or
> > B. In either case, by default users install only from PyPI, but by
> > providing a special option they can install from some external source.
> > (In B, that special option would be something like --find-links with a
> > URL). In either case, we can continue to allow packages to register
> > themselves on PyPI, be found in searches, etc, without uploading release
> > files to PyPI if they prefer not to; they'll just have to provide
> > special installation instructions to their users in that case.
> 
> Not true: approach B means that you won't know what values to pass to
> the option.

Yes and no: in the one case you need to specify "--crawl" or
"--use-external-links" and in the other "--find-links https://...".
The latter requires reading the homepage or long_description of a
package to find the correct URL, so it is less obvious to install-users.

> It's also confused about an important point.  All the links that
> appear in /simple are *already* completely under the package author's
> control.  No new switches are required to remove external links - you
> can simply remove them from your releases' descriptions.  This process
> could be made more transparent or easy, sure -- but it's a mistake to
> say that this is granting the package owners control that they don't
> already have.

Right.  I think allowing a package maintainer to say "actually, please don't
serve external links for my package" (hosting mode "pypi-only") is an
easy expressive way to exert this control.

> What they lack control over is the rel="" attributes, short of
> removing those links entirely.  That's why I've proposed having a
> switch for that, as reflected in Holger's pre-PEP.
> 
> 
> > 1) With B, we can provide a gentler migration for package owners, where
> > they are in control of when the switch happens.
> >
> > 2) With B, all end users benefit from the new defaults, not only end
> > users who update to the latest and greatest tools.
> >
> > 3) With B (and probably some forms of A as well), end users clearly
> > state which external sources they would like to trust and install from,
> > rather than having a global "trust everything!" flag, which is less
> > secure and less sensible.
> 
> These 3 statements all mischaracterize things substantially, because
> none of those benefits are exclusive to A, and nobody has proposed a

I guess you mean "B" here.

> "trust everything" flag.  Removing rel="" attributes also benefits
> everyone right away, *without* new tools.

Right.  I don't see much overall disagreement however ...
let's re-check once the next PEP draft is out :)

holger


From mal at egenix.com  Tue Mar 12 20:59:30 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 20:59:30 +0100
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <CALeMXf4+tRKTF=dVdx8wXfaM=ZknA2+9PXh3h6_FDtD4e1rAbw@mail.gmail.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
	<20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com>
	<CALeMXf4+tRKTF=dVdx8wXfaM=ZknA2+9PXh3h6_FDtD4e1rAbw@mail.gmail.com>
Message-ID: <513F8922.90008@egenix.com>

On 12.03.2013 20:46, PJ Eby wrote:
> On Tue, Mar 12, 2013 at 2:07 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> Just a quick note (more later, if time permits)...
>>
>> On 12.03.2013 18:05, holger krekel wrote:
>>> Hi Marc-Andre, all,
>>>
>>>>> - Prepare PYPI implementation to allow a per-project "hosting mode",
>>>>>   effectively enabling or disabling external crawling.  When enabled
>>>>>   nothing changes from the current situation of producing ``rel=download``
>>>>>   and ``rel=homepage`` attributed links on ``simple/`` pages,
>>>>>   causing installers to crawl those sites.
>>>>>   When disabled, the attributions of links will change
>>>>>   to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
>>>>>   avoid crawling 3rd party sites.  Retaining the meta-information allows
>>>>>   tools to still make use of the semantic information.
>>>>
>>>> Please start using versioned APIs for these things. The
>>>> old style index should still be available under some
>>>> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/
>>>
>>> Not sure it is necessary in this case.  I would think it makes
>>> the implementation harder and it would probably break PEP381 (mirroring
>>> infrastructure) as well.
>>
>> Here's what I meant:
>>
>> We publish the current implementation of the /simple/ index API
>> under a new URL /simple-v1/, so that people that want to use
>> the old API can continue to do so.
> 
> Do you know of anyone who's *actually* going to need/use this
> alternate API?

I think we should establish a versioned API like that for PyPI
to make progress easier. All major web APIs use versioning
for this reason.

> Why can't they just use the XML-RPC API, the DOAP API, or
> any other means of obtaining this information?

Those cannot easily be put on the CDN and
would cause an unnecessary strain on the PyPI server.

We could/should probably also make the PKG-INFO meta data file,
plus some other static information such as upload/release dates
(as RSS/Atom file) available on the /simple/ page to make this
easier to use over the CDN.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 12 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From mal at egenix.com  Tue Mar 12 20:59:59 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 20:59:59 +0100
Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource
 sorting algorithm
In-Reply-To: <513F70B5.5030501@egenix.com>
References: <513F70B5.5030501@egenix.com>
Message-ID: <513F893F.9010707@egenix.com>

On 12.03.2013 19:15, M.-A. Lemburg wrote:
> I've run into a weird issue with easy_install, that I'm trying to solve:
> 
> If I place two files named
> 
> egenix_mxodbc_connect_client-2.0.2-py2.6.egg
> egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip
> 
> into the same directory and let easy_install running on Linux
> scan this, it considers the second file for Windows as best
> match.
> 
> Is the algorithm used for determining the best match documented
> somewhere ?
> 
> I've had a look at the implementation, but this left me rather
> clueless.
> 
> I thought that setuptools would prefer the .egg file over
> the prebuilt .zip file - binary files being easier to install
> than "source" files.

After some experiments, I found that the following change
in filename (swapping platform and python version, in addition
to using '-' instead of '.') works:

egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip

OTOH, this one doesn't (notice the difference ?):

egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip

The logic behind all this looks rather fragile to me.
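
One heuristic that would be consistent with these three observations
is sketched below.  It is a guess for illustration only and is not
taken from the setuptools source: it assumes the archive basename is
split on '-' and that an archive counts as a prebuilt binary only if
one of the later parts is exactly a Python tag such as 'py2.6'.  The
function names are hypothetical.

```python
import re

# A part counts as a Python tag only if it is exactly "pyX.Y".
PY_TAG = re.compile(r"py\d\.\d$")

ARCHIVE_EXTS = (".tar.gz", ".tar.bz2", ".tgz", ".zip")


def strip_archive_ext(filename):
    """Drop a known archive extension, if present."""
    for ext in ARCHIVE_EXTS:
        if filename.endswith(ext):
            return filename[: -len(ext)]
    return filename


def looks_like_prebuilt(filename):
    """Guess whether a '-'-separated archive name carries a Python tag.

    Assumed heuristic: True means "binary for one Python version, do not
    treat it as a source archive"; False means it would be treated like
    a source archive and compete with other candidates.
    """
    parts = strip_archive_ext(filename).split("-")
    return any(PY_TAG.match(part) for part in parts[2:])


ORIGINAL = "egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip"
WORKING = "egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip"
FAILING = "egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip"
```

Under that assumption only the second spelling yields a standalone
'py2.6' part; in the other two the tag is fused with its neighbours
('py2.6.prebuilt', '2.0.2.py2.6') and goes unrecognised, which is the
kind of fragility described above.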

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


From donald at stufft.io  Tue Mar 12 21:01:26 2013
From: donald at stufft.io (Donald Stufft)
Date: Tue, 12 Mar 2013 16:01:26 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <20130312195707.GL9677@merlinux.eu>
References: <CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
	<20130312195707.GL9677@merlinux.eu>
Message-ID: <2C9D488E-7F9E-4FE9-ABB8-7CBBC309C90F@stufft.io>


On Mar 12, 2013, at 3:57 PM, holger krekel <holger at merlinux.eu> wrote:

> On Tue, Mar 12, 2013 at 14:36 -0500, Jacob Kaplan-Moss wrote:
>> On Tue, Mar 12, 2013 at 2:21 PM, PJ Eby <pje at telecommunity.com> wrote:
>>> The *only* thing I object to is the part where some people want to ban
>>> external links from /simple, always and forever, regardless of the
>>> package authors' choice in the matter.
>> 
>> Here's the thing though, there are already a bunch of other ways users
>> can install packages from external repositories. I can think of at
>> least two:
>> 
>> * I can pip/easy_install a given URL (e.g. easy_install
>> https://www.djangoproject.com/download/1.5/tarball/)
>> * I can use a custom index server (pip install -i http://localserver/ django)
>> 
>> The important part is that in each of those cases I can see clearly
>> where I'm getting things from.
>> 
>> OTOH, if I do "pip install Django" I -- the person making the install --
>> have no control over where that package comes from. It really violates
>> people's expectations that this reaches out to somewhere that's
>> not-pypi. More importantly it prevents me from making a security
>> choice -- I literally don't know until the download starts where the
>> file might be coming from.
>> 
>> From where I stand the absolutely non-negotiable part is that
>> `pip/easy_install/whatever package` should NEVER access an external
>> host (after some suitable transition period). This needs to include
>> older installer software, and it needs to make it hard for new tools
>> to do the wrong thing. How this is achieved really doesn't matter to
>> me -- if there's a "pip install --insecure Django" that's fine too --
>> but to me it's non-negotiable that the out-of-the-box configuration
>> not allow external hosts.
>> 
>> Yes, this means taking some options away from the package creator. It
>> means that when I'm wearing my author-of-Django hat I can't choose to
>> list Django on PyPI but provide the download elsewhere. That's not
>> perfect, but given a "creator choice" vs "out of the box security"
>> choice the latter has to win. [And as a package creator I still have
>> options: I can run my own package server, fairly easy to do these
>> days.]
>> 
>> Again, the *how* isn't a big deal to me, but the result is really
>> important: the tooling has to be secure-by-default, and that means
>> (among other things) `pip install package` can never hit something
>> that's not PyPI without me explicitly asking for it.
> 
> Let's be clear, however, that we are at most reducing attack vectors;
> there are substantial attack vectors left.  Nobody should be led to
> think that PYPI is a trusted or reviewed source of software even 
> if we got rid of external hosting completely.

"Trust" depends on your trust model. 

PyPI is not and will never be a system where you can pip install random packages and expect nothing bad to happen.

You should however be able to trust that when you `pip install foo==1.0` you will get exactly that, and that it will not have been tampered with. It's up to you to decide if foo 1.0 is something trustworthy. There's handwaving here about what foo 1.0 is defined as. But in general when you ask for X you should get exactly X, no more, no less.
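
As a rough illustration of "ask for X, get exactly X", here is a minimal sketch of checking a downloaded file against a digest the user already trusts. How that digest is obtained is deliberately left open (it is an assumption that one is available at all), and verify_download is a hypothetical helper, not part of pip or PyPI.

```python
import hashlib
import hmac


def verify_download(data, expected_sha256):
    """Return True only if data hashes to the expected hex digest.

    expected_sha256 is assumed to come from a source the user already
    trusts; this sketch says nothing about how it is distributed.
    """
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings differ.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A check like this says nothing about whether foo 1.0 is trustworthy, only that the bytes received are the bytes that were asked for.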

> 
> holger
> 
>> Jacob


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From holger at merlinux.eu  Tue Mar 12 21:02:15 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 20:02:15 +0000
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <513F8922.90008@egenix.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
	<20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com>
	<CALeMXf4+tRKTF=dVdx8wXfaM=ZknA2+9PXh3h6_FDtD4e1rAbw@mail.gmail.com>
	<513F8922.90008@egenix.com>
Message-ID: <20130312200215.GN9677@merlinux.eu>

On Tue, Mar 12, 2013 at 20:59 +0100, M.-A. Lemburg wrote:
> On 12.03.2013 20:46, PJ Eby wrote:
> > On Tue, Mar 12, 2013 at 2:07 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> >> Just a quick note (more later, if time permits)...
> >>
> >> On 12.03.2013 18:05, holger krekel wrote:
> >>> Hi Marc-Andre, all,
> >>>
> >>>>> - Prepare PYPI implementation to allow a per-project "hosting mode",
> >>>>>   effectively enabling or disabling external crawling.  When enabled
> >>>>>   nothing changes from the current situation of producing ``rel=download``
> >>>>>   and ``rel=homepage`` attributed links on ``simple/`` pages,
> >>>>>   causing installers to crawl those sites.
> >>>>>   When disabled, the attributions of links will change
> >>>>>   to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
> >>>>>   avoid crawling 3rd party sites.  Retaining the meta-information allows
> >>>>>   tools to still make use of the semantic information.
> >>>>
> >>>> Please start using versioned APIs for these things. The
> >>>> old style index should still be available under some
> >>>> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/
> >>>
> >>> Not sure it is necessary in this case.  I would think it makes
> >>> the implementation harder and it would probably break PEP381 (mirroring
> >>> infrastructure) as well.
> >>
> >> Here's what I meant:
> >>
> >> We publish the current implementation of the /simple/ index API
> >> under a new URL /simple-v1/, so that people that want to use
> >> the old API can continue to do so.
> > 
> > Do you know of anyone who's *actually* going to need/use this
> > alternate API?
> 
> I think we should establish a versioned API like that for PyPI
> to make progress easier. All major web APIs use versioning
> for this reason.

> > Why can't they just use the XML-RPC API, the DOAP API, or
> > any other means of obtaining this information?
> 
> Those cannot easily be put on the CDN and
> would cause an unnecessary strain on the PyPI server.

The JSON API could be put on the CDN however.

> We could/should probably also make the PKG-INFO meta data file,
> plus some other static information such as upload/release dates
> (as RSS/Atom file) available on the /simple/ page to make this
> easier to use over the CDN.

That should go into a renewed CDN PEP :)

holger


From carl at oddbird.net  Tue Mar 12 21:14:59 2013
From: carl at oddbird.net (Carl Meyer)
Date: Tue, 12 Mar 2013 14:14:59 -0600
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
Message-ID: <513F8CC3.2070002@oddbird.net>

On 03/12/2013 01:21 PM, PJ Eby wrote:
>> - In some way, migrate to a situation where the popular installer tools
>> install only release files from PyPI by default, but are capable of
>> installing from other locations if the user provides an option.
> 
> Perhaps I'm confused, but ISTM that every time I've said this, Donald
> and Lennart argue that it should not be possible to provide such an
> option -- or to be more specific, that PyPI should not publish the
> information that makes that option possible.
> 
> If that's *not* the position they're taking, it'd be good to know,
> because we could totally stop arguing about it in that case.

I think there's been misunderstanding on this point. Donald and Lennart
can confirm for themselves, but I don't believe _anyone_ thinks that
tools should not be able to install from non-PyPI sources when
explicitly requested to do so. And IIUC from your previous message,
you've "already agreed to change setuptools to default this option to
only allow downloads from the same host as its index URL, in a future
release". So I think everyone is roughly on the same page about where we
should be headed.

There is disagreement about how to make that work. My point is that I
don't think PyPI publishing scraped-from-metadata external links on the
simple/ index specifically, in perpetuity, is necessary or even
beneficial to that future state.

>> A) Leave external links in the PyPI simple index, but migrate the major
>> tools to not use external links by default (i.e. Philip's plan to make
>> allow-hosts=pypi the default in a future setuptools), with an option to
>> turn them back on.
> 
> I don't know who has proposed this option, but it's not me.  You seem
> to be confusing external links and HTML-scraped links (rel=""
> attributed links in /simple).

No, I'm not confusing those. All I'm referring to here is where you said
you've "already agreed to change setuptools to default [allow-hosts] to
only allow downloads from the same host as its index URL, in a future
release." Did I not characterize that accurately?

> I was the first person to propose disabling HTML-scraped links from
> PyPI *ASAP*.  I still want them gone.  That won't require tool
> changes, it just requires a rollout plan.  Holger has one, let's work
> on that.

Fully agreed. I understand from Holger that he would like his PEP to
also discuss the rough plan beyond just disabling rel-link HTML
scraping, for how to get to a point where the tools don't follow
off-PyPI links at all by default. This second stage is what I'm talking
about.

> The second thing I proposed is that new tools be developed to *assist*
> package authors in moving their files onto PyPI, so that future tool
> changes wouldn't result in widespread instances of people needing to
> set their tools to insecure settings just to get anything done.  We
> need to get people's files moving onto PyPI *first*, in order to make
> changing the tool defaults practical.

Totally agreed that such tools could be useful, I should have included
that point explicitly in my summary.

> The *only* thing I object to is the part where some people want to ban
> external links from /simple, always and forever, regardless of the
> package authors' choice in the matter.

I think the question of external links in /simple is causing far more
heat than it's worth (from all sides), because it's fundamentally an
implementation detail, not an end in itself.  Discussing the pros and
cons of this implementation detail is more or less what the rest is all about.

>> B) Do a second PyPI migration, again with a per-package toggle and
>> package owners in control, to a "no external links in simple index" setting.
>>
>> Consider for a moment how similar the end state here is with either A or
>> B. In either case, by default users install only from PyPI, but by
>> providing a special option they can install from some external source.
>> (In B, that special option would be something like --find-links with a
>> URL). In either case, we can continue to allow packages to register
>> themselves on PyPI, be found in searches, etc, without uploading release
>> files to PyPI if they prefer not to; they'll just have to provide
>> special installation instructions to their users in that case.
> 
> Not true: approach B means that you won't know what values to pass to
> the option.

You say below that "nobody has proposed a 'trust everything' flag." If
there is no "trust everything" flag, then it seems to me that with
either option A or option B the user needs to specify what they intend
to trust. I.e. if you make the default value of allow-hosts the index
url host, as you said you plan to do at some point, users would need to
override it with the hosts they want to allow.
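
To illustrate what "default allow-hosts to the index url host" could
mean in practice, here is a minimal sketch.  It assumes host patterns
are shell-style globs matched against the URL's hostname; the function
names are hypothetical and this is not the setuptools implementation.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def default_allow_hosts(index_url):
    """Assumed default: trust only the host that serves the index."""
    return [urlsplit(index_url).hostname]


def host_allowed(url, allowed_patterns):
    """Return True if the URL's host matches any allowed glob pattern."""
    host = (urlsplit(url).hostname or "").lower()
    return any(fnmatchcase(host, pattern.lower())
               for pattern in allowed_patterns)
```

With only the default in place a link to any other host is refused; a
user who wants an external host has to name it (or a pattern covering
it) explicitly, which is the "specify what they intend to trust" case.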

It seems like maybe what you are wanting is automatically-discoverable
installation from externally-hosted files? I.e. that I could say
"easy_install Foo --allow-external", without needing to know any
specific external url for Foo?

This is what I was characterizing as a "trust everything" flag, but on
reflection I don't think I have any problem with that. I do think that:

1) external release-file URLs should be explicitly nominated by the
package owner, not automatically sucked out of text metadata.

2) (After a suitable package-owner-controlled migration) those external
links should live at a new separate (machine-readable) endpoint, not the
existing /simple index. This has two benefits: a) even tools that exist
today eventually gain the benefit of safer-by-default installations, and
b) it's simpler and more reliable for future tools to distinguish
between internal and external release file links.
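
To make benefit (b) concrete, here is a minimal sketch of how a tool could
tell internal from external release-file links by comparing each link's host
with the index host. The function name `split_links` and the sample URLs are
assumptions made up for illustration; this is not an existing PyPI, pip, or
setuptools API.

```python
from urllib.parse import urlsplit


def split_links(index_url, links):
    """Partition links into (internal, external) relative to the index host.

    Hypothetical helper: relative links (no host) are treated as internal.
    """
    index_host = urlsplit(index_url).hostname
    internal, external = [], []
    for link in links:
        host = urlsplit(link).hostname
        if host is None or host == index_host:
            internal.append(link)
        else:
            external.append(link)
    return internal, external


if __name__ == "__main__":
    # Sample URLs are invented for the example.
    internal, external = split_links(
        "https://pypi.python.org/simple/",
        [
            "https://pypi.python.org/packages/source/F/Foo/Foo-1.2.tar.gz",
            "http://foo.example.com/downloads/Foo-1.3rc1.tar.gz",
        ],
    )
    print("internal:", internal)
    print("external:", external)
```

With a separate endpoint for external links, a tool would not need this kind
of host comparison at all, which is the point of (b).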

> It's also confused about an important point.  All the links that
> appear in /simple are *already* completely under the package author's
> control.  No new switches are required to remove external links - you
> can simply remove them from your releases' descriptions.  This process
> could be made more transparent or easy, sure -- but it's a mistake to
> say that this is granting the package owners control that they don't
> already have.

This is partly true. An explicit flag grants package owners more control
in that right now they don't have a choice about whether external links
to tarballs in their long_description automatically get sucked into the
simple index. This is not hypothetical; even if there were no rel-link
scraping, I've had cases where package owners have complained to me
about pip installing an RC tarball they had linked directly from their
long-description, not intending it to be auto-installable.

I think it would be preferable if in the future package owners wouldn't
need to be careful what release-file links they might place in their
long_description, and release files would be only explicitly nominated.
I think the current "automatically suck in links to simple/" behavior is
only useful as a backwards-compatibility hack, which is why I think an
explicit switch to disable it (on by default for newly-registered
projects, slowly, gently, carefully migrated to on for existing
projects) is better than keeping this link-scraping behavior
indefinitely for all projects and asking package owners to clean up
their long-descriptions.

> What they lack control over is the rel="" attributes, short of
> removing those links entirely.  That's why I've proposed having a
> switch for that, as reflected in Holger's pre-PEP.

I agree with this switch, but I think there is more benefit than cost in
extending the concept to all automatically-sucked-in external links.

>> 1) With B, we can provide a gentler migration for package owners, where
>> they are in control of when the switch happens.
>>
>> 2) With B, all end users benefit from the new defaults, not only end
>> users who update to the latest and greatest tools.
>>
>> 3) With B (and probably some forms of A as well), end users clearly
>> state which external sources they would like to trust and install from,
>> rather than having a global "trust everything!" flag, which is less
>> secure and less sensible.
> 
> These 3 statements all mischaracterize things substantially, because
> none of those benefits are exclusive to A, and nobody has proposed a
> "trust everything" flag.  

You're right that item 1 is not technically exclusive to B, although I
think B makes it much easier and simpler for package owners. "Just flip
a switch and done" rather than "Go clean up all your package metadata
including all past releases, or trust this tool we built to go editing
all your release metadata for you." I'm not even sure how that
hypothetical tool would work - what exactly would it do to automatically
clean up a link to an external tarball that it finds in the
long_description of a release from three years ago? Just remove it? What
if the package owner actually wants that link there for human use?

> Removing rel="" attributes also benefits
> everyone right away, *without* new tools.

Sure, and I'm fully in support of that being the first stage.

Carl


From pje at telecommunity.com  Tue Mar 12 21:16:25 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 16:16:25 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
Message-ID: <CALeMXf4Q6hAtLhya1fY76-Kro62kKzcQcNSHX2Fd6CEvCTb2oA@mail.gmail.com>

On Tue, Mar 12, 2013 at 3:36 PM, Jacob Kaplan-Moss <jacob at jacobian.org> wrote:
> On Tue, Mar 12, 2013 at 2:21 PM, PJ Eby <pje at telecommunity.com> wrote:
>> The *only* thing I object to is the part where some people want to ban
>> external links from /simple, always and forever, regardless of the
>> package authors' choice in the matter.
>
> Here's the thing though, there are already a bunch of other ways users
> can install packages from external repositories. I can think of at
> least two:
>
> * I can pip/easy_install a given URL (e.g. easy_install
> https://www.djangoproject.com/download/1.5/tarball/)
> * I can use a custom index server (pip install -i http://localserver/ django)
>
> The important part is that in each of those cases I can see clearly
> where I'm getting things from.
>

>
> From where I stand the absolutely non-negotiable part is that
> `pip/easy_install/whatever package` should NEVER access an external
> host (after some suitable transition period). This needs to include
> older installer software, and it needs to make it hard for new tools
> to do the wrong thing. How this is achieved really doesn't matter to
> me -- if there's a "pip install --insecure Django" that's fine too --
> but to me it's non-negotiable that the out-of-the-box configuration
> not allow external hosts.

I'm confused by this statement.  "never access an external host" is
not consistent with "have the option to specify what hosts you trust",
while still keeping PyPI as a universal index of Python software.


> Yes, this means taking some options away from the package creator. It
> means that when I'm wearing my author-of-Django hat I can't choose to
> list Django on PyPI but provide the download elsewhere. That's not
> perfect, but given a "creator choice" vs "out of the box security"
> choice the latter has to win. [And as a package creator I still have
> options: I can run my own package server, fairly easy to do these
> days.]
>
> Again, the *how* isn't a big deal to me, but the result is really
> important: the tooling has to be secure-by-default, and that means
> (among other things) `pip install package` can never hit something
> that's not PyPI without me explicitly asking for it.

That part's fine.  As I've said repeatedly, though, it's the removing
other links from the /simple index entirely that's the problem.

Under what I've proposed, as soon as the tools are updated to a secure
default (which is already the situation *now* if you set your --allow-hosts
to PyPI-only), easy_install will announce which URLs it is skipping
because they're not on PyPI.  (pip too, IIUC.)

I can't tell you how to configure pip for this, but if you want to
configure easy_install to be secure right *now*, add:

[easy_install]
allow_hosts=pypi.python.org

to your user-level or site-wide distutils .cfg file.
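
For illustration only, here is a minimal Python sketch that reads that
setting back out of a distutils-style config file using the standard-library
configparser. The helper name `read_allow_hosts`, the comma-separated
parsing, and the "*" fallback (meaning "no restriction") are assumptions made
for the sketch; it is not how easy_install itself loads its options.

```python
import configparser

# The same fragment as above, as it would appear in the .cfg file.
SAMPLE_CFG = """
[easy_install]
allow_hosts = pypi.python.org
"""


def read_allow_hosts(cfg_text):
    """Return the allow_hosts patterns from config text as a list.

    Hypothetical helper: falls back to ["*"] when the option is absent.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("easy_install", "allow_hosts", fallback="*")
    return [host.strip() for host in raw.split(",") if host.strip()]


if __name__ == "__main__":
    print(read_allow_hosts(SAMPLE_CFG))
```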

Better yet, encourage other people to add it now, find out what they
can no longer install, and talk to their upstream providers about
moving to PyPI.

This is all good.

I'm just saying, we don't need to change PyPI to do anything but drop
the rel="" links, and change the tools to default allow-hosts to equal
index-url.  (pip has the same parameters, but I'm not sure what config
files it uses; I don't think it inherits [easy_install] settings.)

From donald at stufft.io  Tue Mar 12 21:23:14 2013
From: donald at stufft.io (Donald Stufft)
Date: Tue, 12 Mar 2013 16:23:14 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <513F8CC3.2070002@oddbird.net>
References: <20130310150740.GE9677@merlinux.eu>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<513F8CC3.2070002@oddbird.net>
Message-ID: <DADDA8CB-A770-43A8-9BF1-55EB95F857CE@stufft.io>

On Mar 12, 2013, at 4:14 PM, Carl Meyer <carl at oddbird.net> wrote:

> On 03/12/2013 01:21 PM, PJ Eby wrote:
>>> - In some way, migrate to a situation where the popular installer tools
>>> install only release files from PyPI by default, but are capable of
>>> installing from other locations if the user provides an option.
>> 
>> Perhaps I'm confused, but ISTM that every time I've said this, Donald
>> and Lennart argue that it should not be possible to provide such an
>> option -- or to be more specific, that PyPI should not publish the
>> information that makes that option possible.
>> 
>> If that's *not* the position they're taking, it'd be good to know,
>> because we could totally stop arguing about it in that case.
> 
> I think there's been misunderstanding on this point. Donald and Lennart
> can confirm for themselves, but I don't believe _anyone_ thinks that
> tools should not be able to install from non-PyPI sources when
> explicitly requested to do so. And IIUC from your previous message,
> you've "already agreed to change setuptools to default this option to
> only allow downloads from the same host as its index URL, in a future
> release". So I think everyone is roughly on the same page about where we
> should be headed.

I've never supported, and never will support, a proposal that removes the end user's ability to install from a non-PyPI source when requested to do so. Considering I operate a non-PyPI source, I'm not sure how this idea started.

> 
> There is disagreement about how to make that work. My point is that I
> don't think PyPI publishing scraped-from-metadata external links on the
> simple/ index specifically, in perpetuity, is necessary or even
> beneficial to that future state.
> 
>>> A) Leave external links in the PyPI simple index, but migrate the major
>>> tools to not use external links by default (i.e. Philip's plan to make
>>> allow-hosts=pypi the default in a future setuptools), with an option to
>>> turn them back on.
>> 
>> I don't know who has proposed this option, but it's not me.  You seem
>> to be confusing external links and HTML-scraped links (rel=""
>> attributed links in /simple).
> 
> No, I'm not confusing those. All I'm referring to here is where you said
> you've "already agreed to change setuptools to default [allow-hosts] to
> only allow downloads from the same host as its index URL, in a future
> release." Did I not characterize that accurately?
> 
>> I was the first person to propose disabling HTML-scraped links from
>> PyPI *ASAP*.  I still want them gone.  That won't require tool
>> changes, it just requires a rollout plan.  Holger has one, let's work
>> on that.
> 
> Fully agreed. I understand from Holger that he would like his PEP to
> also discuss the rough plan beyond just disabling rel-link HTML
> scraping, for how to get to a point where the tools don't follow
> off-PyPI links at all by default. This second stage is what I'm talking
> about.
> 
>> The second thing I proposed is that new tools be developed to *assist*
>> package authors in moving their files onto PyPI, so that future tool
>> changes wouldn't result in widespread instances of people needing to
>> set their tools to insecure settings just to get anything done.  We
>> need to get people's files moving onto PyPI *first*, in order to make
>> changing the tool defaults practical.
> 
> Totally agreed that such tools could be useful, I should have included
> that point explicitly in my summary.
> 
>> The *only* thing I object to is the part where some people want to ban
>> external links from /simple, always and forever, regardless of the
>> package authors' choice in the matter.
> 
> I think the question of external links in /simple is causing far more
> heat than it's worth (from all sides), because it's fundamentally an
> implementation detail, not an end in itself.  Discussing the pros and
> cons of this implementation detail is more or less what the rest is all about.
> 
>>> B) Do a second PyPI migration, again with a per-package toggle and
>>> package owners in control, to a "no external links in simple index" setting.
>>> 
>>> Consider for a moment how similar the end state here is with either A or
>>> B. In either case, by default users install only from PyPI, but by
>>> providing a special option they can install from some external source.
>>> (In B, that special option would be something like --find-links with a
>>> URL). In either case, we can continue to allow packages to register
>>> themselves on PyPI, be found in searches, etc, without uploading release
>>> files to PyPI if they prefer not to; they'll just have to provide
>>> special installation instructions to their users in that case.
>> 
>> Not true: approach B means that you won't know what values to pass to
>> the option.
> 
> You say below that "nobody has proposed a 'trust everything' flag." If
> there is no "trust everything" flag, then it seems to me that with
> either option A or option B the user needs to specify what they intend
> to trust. I.e. if you make the default value of allow-hosts the index
> url host, as you said you plan to do at some point, users would need to
> override it with the hosts they want to allow.
> 
> It seems like maybe what you are wanting is automatically-discoverable
> installation from externally-hosted files? I.e. that I could say
> "easy_install Foo --allow-external", without needing to know any
> specific external url for Foo?
> 
> This is what I was characterizing as a "trust everything" flag, but on
> reflection I don't think I have any problem with that. I do think that:
> 
> 1) external release-file URLs should be explicitly nominated by the
> package owner, not automatically sucked out of text metadata.
> 
> 2) (After a suitable package-owner-controlled migration) those external
> links should live at a new separate (machine-readable) endpoint, not the
> existing /simple index. This has two benefits: a) even tools that exist
> today eventually gain the benefit of safer-by-default installations, and
> b) it's simpler and more reliable for future tools to distinguish
> between internal and external release file links.
> 
>> It's also confused about an important point.  All the links that
>> appear in /simple are *already* completely under the package author's
>> control.  No new switches are required to remove external links - you
>> can simply remove them from your releases' descriptions.  This process
>> could be made more transparent or easy, sure -- but it's a mistake to
>> say that this is granting the package owners control that they don't
>> already have.
> 
> This is partly true. An explicit flag grants package owners more control
> in that right now they don't have a choice about whether external links
> to tarballs in their long_description automatically get sucked into the
> simple index. This is not hypothetical; even if there were no rel-link
> scraping, I've had cases where package owners have complained to me
> about pip installing an RC tarball they had linked directly from their
> long-description, not intending it to be auto-installable.
> 
> I think it would be preferable if in the future package owners wouldn't
> need to be careful what release-file links they might place in their
> long_description, and release files would be only explicitly nominated.
> I think the current "automatically suck in links to simple/" behavior is
> only useful as a backwards-compatibility hack, which is why I think an
> explicit switch to disable it (on by default for newly-registered
> projects, slowly, gently, carefully migrated to on for existing
> projects) is better than keeping this link-scraping behavior
> indefinitely for all projects and asking package owners to clean up
> their long-descriptions.
> 
>> What they lack control over is the rel="" attributes, short of
>> removing those links entirely.  That's why I've proposed having a
>> switch for that, as reflected in Holger's pre-PEP.
> 
> I agree with this switch, but I think there is more benefit than cost in
> extending the concept to all automatically-sucked-in external links.
> 
>>> 1) With B, we can provide a gentler migration for package owners, where
>>> they are in control of when the switch happens.
>>> 
>>> 2) With B, all end users benefit from the new defaults, not only end
>>> users who update to the latest and greatest tools.
>>> 
>>> 3) With B (and probably some forms of A as well), end users clearly
>>> state which external sources they would like to trust and install from,
>>> rather than having a global "trust everything!" flag, which is less
>>> secure and less sensible.
>> 
>> These 3 statements all mischaracterize things substantially, because
>> none of those benefits are exclusive to A, and nobody has proposed a
>> "trust everything" flag.  
> 
> You're right that item 1 is not technically exclusive to B, although I
> think B makes it much easier and simpler for package owners. "Just flip
> a switch and done" rather than "Go clean up all your package metadata
> including all past releases, or trust this tool we built to go editing
> all your release metadata for you." I'm not even sure how that
> hypothetical tool would work - what exactly would it do to automatically
> clean up a link to an external tarball that it finds in the
> long_description of a release from three years ago? Just remove it? What
> if the package owner actually wants that link there for human use?
> 
>> Removing rel="" attributes also benefits
>> everyone right away, *without* new tools.
> 
> Sure, and I'm fully in support of that being the first stage.
> 
> Carl
> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From jacob at jacobian.org  Tue Mar 12 21:30:22 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Tue, 12 Mar 2013 15:30:22 -0500
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf4Q6hAtLhya1fY76-Kro62kKzcQcNSHX2Fd6CEvCTb2oA@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
	<CALeMXf4Q6hAtLhya1fY76-Kro62kKzcQcNSHX2Fd6CEvCTb2oA@mail.gmail.com>
Message-ID: <CAK8PqJHiyn+NJKdyqFDaYCa_mwHo4Qg_F7fe0D-whx=qq_kXyg@mail.gmail.com>

On Tue, Mar 12, 2013 at 3:16 PM, PJ Eby <pje at telecommunity.com> wrote:
> I'm confused by this statement.  "never access an external host" is
> not consistent with "have the option to specify what hosts you trust",
> while still keeping PyPI as a universal index of Python software.

Sorry to be confusing! I'm trying to make a distinction between the
out-of-the-box defaults and optional... options.

Here's what I mean: imagine I'm new to Python and getting started. I
grab my machine, install Python (via apt-get, homebrew, from source,
whatever), and grab whatever the programmer next to me at work tells
me is latest and greatest in the packaging world. No configuration, no
editing of a config file, no reading of documentation, just `apt-get
install python python-pip` or the equivalent.

Now I type `pip install Django`. Again, with no configuration, no
tweaking, no editing of anything, and no real understanding of what's
going on.

The point I'm trying to make is that I consider it absolutely critical
that this by-the-defaults approach gets me the *best* security the
Python ecosystem has to offer. So this means no external packages; it
also means signing and verifying once that infrastructure is in place
[1].

On the other hand, the "have the option" means that `pip install
<url>` needs to continue to work, too.

Is that clear? Again I'm sorry if I'm being confusing; I think I'm
having "translate from brain to keyboard" fail.

> I'm just saying, we don't need to change PyPI to do anything but drop
> the rel="" links, and change the tools to default allow-hosts to equal
> index-url.  (pip has the same parameters, not sure what config files
> it uses, though.  I don't think it inherits [easy_install] settings,
> though.)

As I've said, the implementation details aren't of a concern to me;
the result is.

Jacob

[1] Another important step a bit further down the line is
eliminating or drastically reducing the use of an executable setup.py.
But that's another show.

From jacob at jacobian.org  Tue Mar 12 21:35:23 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Tue, 12 Mar 2013 15:35:23 -0500
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAK8PqJHiyn+NJKdyqFDaYCa_mwHo4Qg_F7fe0D-whx=qq_kXyg@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
	<CALeMXf4Q6hAtLhya1fY76-Kro62kKzcQcNSHX2Fd6CEvCTb2oA@mail.gmail.com>
	<CAK8PqJHiyn+NJKdyqFDaYCa_mwHo4Qg_F7fe0D-whx=qq_kXyg@mail.gmail.com>
Message-ID: <CAK8PqJFAiH6+nMbNbmTeLojRNLCNeNjwE6B7iVCbY-++3dSE-Q@mail.gmail.com>

On Tue, Mar 12, 2013 at 3:30 PM, Jacob Kaplan-Moss <jacob at jacobian.org> wrote:
> As I've said, the implementation details aren't of a concern to me;
> the result is.

You know what though, I kinda lied.

While I don't care about the implementation, I *do* care about keeping
this process moving forward. Holger has a PEP that's essentially done
(if controversial), and Donald's offered to implement it. The PyCon
sprints next week mean we'll have a ton of focused attention, so
there's a very good chance if we strike now we'll have this done in
the next couple weeks.

So yeah, I'm going to back the proposal that has a critical mass
behind it, and it solves the problem. My experience with Python
packaging is that there's a massive amount of inertia, so I think it's
pretty vital to get work done while there are people who've got time
to work on it.

Not to put too fine a point on it, but unless there's actually
something really wrong with Holger's proposal I can't see why we'd
want to wait for some hypothetically better solution.

Jacob

From pje at telecommunity.com  Tue Mar 12 22:22:02 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 17:22:02 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <513F8CC3.2070002@oddbird.net>
References: <20130310150740.GE9677@merlinux.eu>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<513F8CC3.2070002@oddbird.net>
Message-ID: <CALeMXf60-beq4DfxQO87PKycEz4eo2FAN1nJLD5oe6E8KuaYqQ@mail.gmail.com>

On Tue, Mar 12, 2013 at 4:14 PM, Carl Meyer <carl at oddbird.net> wrote:
> You say below that "nobody has proposed a 'trust everything' flag." If
> there is no "trust everything" flag, then it seems to me that with
> either option A or option B the user needs to specify what they intend
> to trust. I.e. if you make the default value of allow-hosts the index
> url host, as you said you plan to do at some point, users would need to
> override it with the hosts they want to allow.
>
> It seems like maybe what you are wanting is automatically-discoverable
> installation from externally-hosted files? I.e. that I could say
> "easy_install Foo --allow-external", without needing to know any
> specific external url for Foo?
>
> This is what I was characterizing as a "trust everything" flag, but on
> reflection I don't think I have any problem with that.

Here's a story to illustrate what I mean:

Joe wants to install foo.  He runs "easy_install Foo".  Foo is hosted
externally to PyPI, so easy_install says:

URL foo.com/downloads/foo-1.2.tgz BLOCKED by allow-hosts option --
install failed.

(Or words to that effect; I'd have to check the source to get you the
exact phrasing).

The point is, Joe now *knows where to get foo from*, because PyPI
still had the information.  Joe can now decide whether he wants to
download it manually and inspect it first, expand his allow-hosts
option, or give Foo a pass.

The proposals that call for banning all links from the /simple index
prevent Joe from being able to do this at all.
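
Joe's story can be sketched roughly as follows: each candidate URL's host is
matched against the allow-hosts patterns, and anything that does not match is
reported rather than silently dropped. The function names, the message
wording, and the sample URLs are assumptions for illustration (as noted
above, the exact phrasing would need checking against the source); this is a
sketch, not the actual easy_install code.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def host_allowed(url, allow_hosts):
    """True if the URL's host matches any glob pattern in allow_hosts."""
    host = urlsplit(url).hostname or ""
    return any(fnmatchcase(host, pattern.lower()) for pattern in allow_hosts)


def filter_candidates(urls, allow_hosts):
    """Split URLs into installable ones and 'BLOCKED' messages for the rest."""
    allowed, messages = [], []
    for url in urls:
        if host_allowed(url, allow_hosts):
            allowed.append(url)
        else:
            # The index still told us where the file lives, so we can say so.
            messages.append("URL %s BLOCKED by allow-hosts option" % url)
    return allowed, messages


if __name__ == "__main__":
    allowed, messages = filter_candidates(
        ["http://foo.com/downloads/foo-1.2.tgz"],
        ["pypi.python.org"],
    )
    for message in messages:
        print(message)
```

The blocked message is only possible because the index still lists the
external URL; if the link is not published at all, there is nothing to
report.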


> This is partly true. An explicit flag grants package owners more control
> in that right now they don't have a choice about whether external links
> to tarballs in their long_description automatically get sucked into the
> simple index. This is not hypothetical; even if there were no rel-link
> scraping, I've had cases where package owners have complained to me
> about pip installing an RC tarball they had linked directly from their
> long-description, not intending it to be auto-installable.

Fair enough.  Thank you for actually providing an illustration of a
problem.  There's been far too much handwaving of problems without any
explicit description of what the problem *is*.

I would support making references to external links explicit rather
than implicit.


> I think it would be preferable if in the future package owners wouldn't
> need to be careful what release-file links they might place in their
> long_description, and release files would be only explicitly nominated.

Ok.


> I think the current "automatically suck in links to simple/" behavior is
> only useful as a backwards-compatibility hack, which is why I think an
> explicit switch to disable it (on by default for newly-registered
> projects, slowly, gently, carefully migrated to on for existing
> projects) is better than keeping this link-scraping behavior
> indefinitely for all projects and asking package owners to clean up
> their long-descriptions.

I would agree with dropping link parsing from the description field,
provided that an alternative way is provided for projects to
explicitly add external links to /simple, concurrent with the other
changes.

Thank you for taking the time to engage and re-engage on this issue,
and to "Explain It Like I'm Five" for me, with an illustration of an
actual problematic use case.  ;-)

From tk47 at students.poly.edu  Tue Mar 12 22:10:48 2013
From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy)
Date: Tue, 12 Mar 2013 17:10:48 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CAK8PqJFAiH6+nMbNbmTeLojRNLCNeNjwE6B7iVCbY-++3dSE-Q@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
	<CALeMXf4Q6hAtLhya1fY76-Kro62kKzcQcNSHX2Fd6CEvCTb2oA@mail.gmail.com>
	<CAK8PqJHiyn+NJKdyqFDaYCa_mwHo4Qg_F7fe0D-whx=qq_kXyg@mail.gmail.com>
	<CAK8PqJFAiH6+nMbNbmTeLojRNLCNeNjwE6B7iVCbY-++3dSE-Q@mail.gmail.com>
Message-ID: <513F99D8.7080309@students.poly.edu>

Hello Jacob,

Good to hear from you! Thanks for stating your concerns so clearly, and 
we do understand them. We agree that inertia is important to maintain. 
In fact, we are excited to show this in person to the PyPI community on 
Friday.

We expect to release a design document and a demo in a few hours. Let me 
finish my midterm, and then I will get back to you :)

Thanks,
Trishank

On 03/12/2013 04:35 PM, Jacob Kaplan-Moss wrote:
> On Tue, Mar 12, 2013 at 3:30 PM, Jacob Kaplan-Moss <jacob at jacobian.org> wrote:
>> As I've said, the implementation details aren't of a concern to me;
>> the result is.
>
> You know what though, I kinda lied.
>
> While I don't care about the implementation, I *do* care about keeping
> this process moving forward. Holger has a PEP that's essentially done
> (if controversial), and Donald's offered to implement it. The PyCon
> sprints next week means we'll have a ton of focused attention, so
> there's a very good chance if we strike now we'll have this done in
> the next couple weeks.
>
> So yeah, I'm going to back the proposal that has a critical mass
> behind it, and it solves the problem. My experience with Python
> packaging is that there's a massive amount of inertia, so I think it's
> pretty vital to get work done while there are people who've got time
> to work on it.
>
> Not to put too fine a point on it, but unless there's actually
> something really wrong with Holger's proposal I can't see why we'd
> want to wait for some hypothetically better solution.
>
> Jacob



From pje at telecommunity.com  Tue Mar 12 22:26:22 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 17:26:22 -0400
Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource
 sorting algorithm
In-Reply-To: <513F893F.9010707@egenix.com>
References: <513F70B5.5030501@egenix.com>
	<513F893F.9010707@egenix.com>
Message-ID: <CALeMXf5fOoVGHz9E2DV-QZFQJUwMpNkJcYQcJDtJSHJ_WRqbHA@mail.gmail.com>

On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 12.03.2013 19:15, M.-A. Lemburg wrote:
>> I've run into a weird issue with easy_install, that I'm trying to solve:
>>
>> If I place two files named
>>
>> egenix_mxodbc_connect_client-2.0.2-py2.6.egg
>> egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip
>>
>> into the same directory and let easy_install running on Linux
>> scan this, it considers the second file for Windows as best
>> match.
>>
>> Is the algorithm used for determining the best match documented
>> somewhere ?
>>
>> I've had a look at the implementation, but this left me rather
>> clueless.
>>
>> I thought that setuptools would prefer the .egg file over
>> the prebuilt .zip file - binary files being easier to install
>> than "source" files.
>
> After some experiments, I found that the following change
> in filename (swapping platform and Python version, in addition
> to using '-' instead of '.') works:
>
> egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip
>
> OTOH, this one doesn't (notice the difference ?):
>
> egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip
>
> The logic behind all this looks rather fragile to me.

easy_install only guarantees sane version parsing for distribution
files built using setuptools' naming algorithms.  If you use
distutils, it can only make guesses, because the distutils does not
have a completely unambiguous file naming scheme.  And if you are
naming the files by hand, God help you.  ;-)
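
To illustrate why a hand-chosen name leaves an installer guessing, here is a
minimal sketch.  It is not setuptools' actual code; candidate_splits is a
hypothetical helper that simply lists every way a basename can be cut into a
(name, version) pair at a '-'.

```python
def candidate_splits(basename):
    """Return every (name, version) pair obtainable by cutting at one '-'."""
    parts = basename.split("-")
    return [
        ("-".join(parts[:i]), "-".join(parts[i:]))
        for i in range(1, len(parts))
    ]


# Basenames from the thread, with the archive extension stripped.
egg_style = "egenix_mxodbc_connect_client-2.0.2-py2.6"
hand_named = "egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt"

for label, basename in (("egg", egg_style), ("hand-named", hand_named)):
    splits = candidate_splits(basename)
    print(label, "->", len(splits), "possible readings")
    for name, version in splits:
        print("   ", name, "|", version)
```

The egg-style name, where the project name contains no '-', has only two
possible cuts, while the hand-named file has five, and nothing in the
filename itself says which one is right.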

From carl at oddbird.net  Tue Mar 12 22:52:54 2013
From: carl at oddbird.net (Carl Meyer)
Date: Tue, 12 Mar 2013 15:52:54 -0600
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
 pypi site
In-Reply-To: <CALeMXf60-beq4DfxQO87PKycEz4eo2FAN1nJLD5oe6E8KuaYqQ@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<513F8CC3.2070002@oddbird.net>
	<CALeMXf60-beq4DfxQO87PKycEz4eo2FAN1nJLD5oe6E8KuaYqQ@mail.gmail.com>
Message-ID: <513FA3B6.8000307@oddbird.net>

On 03/12/2013 03:22 PM, PJ Eby wrote:
> Here's a story to illustrate what I mean:
> 
> Joe wants to install foo.  He runs "easy_install Foo".  Foo is hosted
> externally to PyPI, so easy_install says:
> 
> URL foo.com/downloads/foo-1.2.tgz BLOCKED by allow-hosts option --
> install failed.
> 
> (Or words to that effect; I'd have to check the source to get you the
> exact phrasing).
> 
> The point is, Joe now *knows where to get foo from*, because PyPI
> still had the information.  Joe can now decide whether he wants to
> download it manually and inspect it first, expand his allow-hosts
> option, or give Foo a pass.
> 
> The proposals that call for banning all links from the /simple index,
> prevent Joe from being able to do this at all.

Ah, thank you! Yes, I was indeed missing that mode of getting the
information to the user. Makes perfect sense now.

> I would support making references to external links explicit rather
> than implicit.

Excellent.

>> I think the current "automatically suck in links to simple/" behavior is
>> only useful as a backwards-compatibility hack, which is why I think an
>> explicit switch to disable it (on by default for newly-registered
>> projects, slowly, gently, carefully migrated to on for existing
>> projects) is better than keeping this link-scraping behavior
>> indefinitely for all projects and asking package owners to clean up
>> their long-descriptions.
> 
> I would agree with dropping link parsing from the description field,
> provided that an alternative way is provided for projects to
> explicitly add external links to /simple, concurrent with the other
> changes.

So the other change I proposed is that these new explicitly-nominated
external links would not be added to the main simple/ index page for a
project, but to a with-external-links/ sub-page that includes all links,
internal and external. (This being, of course, subject to the same
package-owner-controlled migration process, nothing done abruptly). The
long-term benefits I see to making this tweak:

1) Users still using today's easy_install on RHEL in five years will
automatically get the benefit of safe-by-default (as each package owner
makes their migration) without needing to upgrade their easy_install.

2) Implementors of future installers can make explicit choices about
which set of links to ask for, without every single installer needing to
reimplement possibly-error-prone and possibly-subject-to-attack
host-comparison code.

I realize that this requires updating easy_install/pip/buildout in order
to take advantage of externally-hosted files in the new system, but
since end-user tooling updates are part of the plan either way, I think
in the spirit of safe-by-default it's preferable to require end-user
tooling updates to get access to less-safe options, rather than require
end-user tooling updates in order to become safer by default.

What do you think?

> Thank you for taking the time to engage and re-engage on this issue,
> and to "Explain It Like I'm Five" for me, with an illustration of an
> actual problematic use case.  ;-)

Of course, and likewise; I've learned a lot from this exchange and
appreciate you sticking with it and explaining things the second and
third time until I got it. :-)

Carl

From reinout at vanrees.org  Tue Mar 12 23:13:59 2013
From: reinout at vanrees.org (Reinout van Rees)
Date: Tue, 12 Mar 2013 23:13:59 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CAL0kPAWuaupd1ykf+okzLfRebSonLw=u6mKOHeaAkqGhJsMoBw@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<CAL0kPAUELqLo+__GJD0NF0j_dM8zzeDa-wPxjedyKCO-sqLcqQ@mail.gmail.com>
	<212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com>
	<CAL0kPAUt7NKex9i9rm-0K-haYxemCJdjPZmz1pH7z6ZpM2UyGg@mail.gmail.com>
	<826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com>
	<CAL0kPAU-JbVa0AsSfSrOA3yKgiXxb85=9dwg_MOuu1ULGv-qng@mail.gmail.com>
	<DAEC3884-78A8-4CDD-BDFD-37DB458E7F6B@mac.com>
	<CAL0kPAWuaupd1ykf+okzLfRebSonLw=u6mKOHeaAkqGhJsMoBw@mail.gmail.com>
Message-ID: <kho9b3$bia$1@ger.gmane.org>

On 11-03-13 11:44, Lennart Regebro wrote:
> That's now all the energy I'm willing to spend on discussing this
> topic. Third-party hosting needs to go. I believe there is a broad
> consensus on this. Let's instead discuss *how* to implement it.

Hear hear!

I'm so fed up with other people's non-pypi hosts breaking down and
breaking my releases...

I should not be forced to deploy some caching proxy between others and
my releases in order to get a marginally-working system.

Those that have good reasons to break everybody's build processes should 
take their packages elsewhere.


Reinout

-- 
Reinout van Rees                    http://reinout.vanrees.org/
reinout at vanrees.org             http://www.nelen-schuurmans.nl/
"If you're not sure what to do, make something. -- Paul Graham"


From reinout at vanrees.org  Tue Mar 12 23:21:47 2013
From: reinout at vanrees.org (Reinout van Rees)
Date: Tue, 12 Mar 2013 23:21:47 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
References: <20130310150740.GE9677@merlinux.eu>
	<710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
	<20130310181828.GH9677@merlinux.eu>
	<D1CA9D5F-91E5-4C4F-B0C0-EC3E1C7EC7C0@stufft.io>
	<20130310195405.GI9677@merlinux.eu>
	<AA59B0A4-0ADF-4D94-853A-34191BB829C8@stufft.io>
	<CALeMXf7t2NFNTuOgkXjokMzNbEL2hGUZO3RewP+omDVDawN=rA@mail.gmail.com>
	<1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
	<CALeMXf6qrWSKQmT2nsK2HhB5h40yQ0AybuneQaCSOPiOMrZ3Tg@mail.gmail.com>
	<459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
	<CALeMXf6uxF6w6c92ve8qgRB4zFnSHur+3gmmrnwoJJZXCAhwCg@mail.gmail.com>
	<CAL0kPAXasJ1OAR2uWXOGVKdLKgGKL5GMPtyFvWtN8Lj+bHi-xQ@mail.gmail.com>
	<CALeMXf5gHocepdaOrJ=vHv+0Lf5+6AsRG07nNoCqyhg8vd=FQw@mail.gmail.com>
	<CAL0kPAWsmzM76O7mFx-9eLCgVdgiMMN7nf2sNF-FBCxCQh5Sow@mail.gmail.com>
	<CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
Message-ID: <kho9pn$l2a$1@ger.gmane.org>

On 12-03-13 16:38, PJ Eby wrote:
> I'll ask it again: why should *thousands* of projects be censored or
> made to change their release processes, because *you* can't be
> bothered to cache the distributions of the projects you depend on?

So... everyone that uses pypi should be *forced* to use their own 
private pypi+externals cache? Otherwise they're not friendly enough to 
projects that don't want to use Pypi but that do it anyway?

Wow, that's user friendly... Thanks!

Why aren't there instructions on the front page of pypi on how to set up 
a private mirror of all external packages as that's obviously the 
professional requirement of every single person that types in "pip install"?



Reinout

-- 
Reinout van Rees                    http://reinout.vanrees.org/
reinout at vanrees.org             http://www.nelen-schuurmans.nl/
"If you're not sure what to do, make something. -- Paul Graham"


From ncoghlan at gmail.com  Wed Mar 13 07:28:47 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 12 Mar 2013 23:28:47 -0700
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <513F8922.90008@egenix.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
	<20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com>
	<CALeMXf4+tRKTF=dVdx8wXfaM=ZknA2+9PXh3h6_FDtD4e1rAbw@mail.gmail.com>
	<513F8922.90008@egenix.com>
Message-ID: <CADiSq7fh89J0SfCDYEz5y0Zn_z2nqG3U6NaU8ohekn9rW_5CrQ@mail.gmail.com>

On Tue, Mar 12, 2013 at 12:59 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> I think we should establish a versioned API like that for PyPI
> to make progress easier. All major web APIs use versioning
> for this reason.

Why set up versioning for something we want to phase out? There will
never be a simple-v3, so this is really overengineering the proposed
change.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From tk47 at students.poly.edu  Wed Mar 13 07:41:55 2013
From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy)
Date: Wed, 13 Mar 2013 02:41:55 -0400
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
Message-ID: <51401FB3.7000408@students.poly.edu>

Hello everyone,

I am pleased to announce our demonstration of PyPI and pip with TUF.

Firstly, we solicit your thoughts and comments on our design document 
for integrating PyPI with TUF:

https://docs.google.com/document/d/1sHMhgrGXNCvBZdmjVJzuoN5uMaUAUDWBmn3jo7vxjjw/edit?usp=sharing

Secondly, you may wish to test our demo of PyPI and pip with TUF:

https://github.com/dachshund/pip/wiki/pip-over-TUF

Thirdly, this is how little it takes to secure pip with TUF:

https://github.com/dachshund/pip/compare/develop...tuf

Finally, you may be interested to learn about how one might manually 
secure a PyPI package index with TUF:

https://github.com/dachshund/pip/wiki/PyPI-over-TUF

We are excited to be able to show this to you now, and in person at our 
lightning talk at PyCon this Friday.

We think that there is great potential for the PyPI and TUF community to 
work together to secure Python package management. This is just the 
beginning, and there is some work left to do, but we are confident that 
we have demonstrated to you that PyPI could be secured with TUF in the 
very near future. We would be happy to discuss with you how we compare 
with other proposals.

We look forward to your questions and feedback!

Thanks,
Trishank


From mal at egenix.com  Wed Mar 13 09:23:24 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 13 Mar 2013 09:23:24 +0100
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <CADiSq7fh89J0SfCDYEz5y0Zn_z2nqG3U6NaU8ohekn9rW_5CrQ@mail.gmail.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
	<20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com>
	<CALeMXf4+tRKTF=dVdx8wXfaM=ZknA2+9PXh3h6_FDtD4e1rAbw@mail.gmail.com>
	<513F8922.90008@egenix.com>
	<CADiSq7fh89J0SfCDYEz5y0Zn_z2nqG3U6NaU8ohekn9rW_5CrQ@mail.gmail.com>
Message-ID: <5140377C.90909@egenix.com>

On 13.03.2013 07:28, Nick Coghlan wrote:
> On Tue, Mar 12, 2013 at 12:59 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> I think we should establish a versioned API like that for PyPI
>> to make progress easier. All major web APIs use versioning
>> for this reason.
> 
> Why set up versioning for something we want to phase out? There will
> never be a simple-v3, so this is really overengineering the proposed
> change.

Who says that we want to phase out the /simple/ index ?

FWIW, I don't think that two or three small changes to the PyPI
server (see my email to Holger) warrant calling this over-engineering.
This is about moving forward in a backwards-compatible and
future-proof way.

-- 
Marc-Andre Lemburg
eGenix.com

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From ncoghlan at gmail.com  Wed Mar 13 09:09:26 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 13 Mar 2013 01:09:26 -0700
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <51401FB3.7000408@students.poly.edu>
References: <51401FB3.7000408@students.poly.edu>
Message-ID: <CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>

On Tue, Mar 12, 2013 at 11:41 PM, Trishank Karthik Kuppusamy
<tk47 at students.poly.edu> wrote:
> Hello everyone,
>
> I am pleased to announce our demonstration of PyPI and pip with TUF.
>
> Firstly, we solicit your thoughts and comments on our design document for
> integrating PyPI with TUF:
>
> https://docs.google.com/document/d/1sHMhgrGXNCvBZdmjVJzuoN5uMaUAUDWBmn3jo7vxjjw/edit?usp=sharing

Thanks for putting this together!

Just a few notes regarding key management:
- the PSF board generally stays out of the technical details of
running the python.org infrastructure, so it's likely that any root
keys would be handled by the PSF infrastructure committee. A (2, 4) or
(3, 5) trust configuration would likely be manageable at this level.
- at the target delegation level, PyPI supports the registration of
new projects through the web service (see
http://docs.python.org/2/distutils/packageindex.html). If my
understanding of target delegation is correct, this means the "simple"
and "packages/source/<letter>" delegations will need to be (1, 1) and
online.
- higher levels of the target delegation hierarchy could conceivably
be kept offline, but there seems little value in doing so if they're
trusting an online (1, 1) key
- many PyPI packages are maintained by single developers, so (1, 1) or
(1, n) is likely to be the only generally feasible level of signing at
the project level.

With the current focus being on getting an improvement from the status
quo that we can successfully deploy in a reasonable period of time,
the target delegation side of things probably needs to be
substantially simpler in the initial iteration. Yes, it leaves us open
to certain vulnerabilities we would like to remove in the long run,
but we need to be very cautious in the additional demands we place on
the users uploading to PyPI. It may even mean the initial iteration
allows projects to rely on a PyPI provided signing key for their TUF
metadata, using the existing upload mechanisms to add the files to
PyPI.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From tk47 at students.poly.edu  Wed Mar 13 10:13:16 2013
From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy)
Date: Wed, 13 Mar 2013 05:13:16 -0400
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
References: <51401FB3.7000408@students.poly.edu>
	<CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
Message-ID: <5140432C.7000904@students.poly.edu>

Hello Nick,

On 3/13/13 4:09 AM, Nick Coghlan wrote:
>
> - the PSF board generally stays out of the technical details of
> running the python.org infrastructure, so it's likely that any root
> keys would be handled by the PSF infrastructure committee. A (2, 4) or
> (3, 5) trust configuration would likely be manageable at this level.

Understood. We think a higher (t, n) [where t out of n signatures are 
needed to trust the metadata for a role] is better for the root role 
simply because its crucial metadata (the authorized keys for top-level 
roles) should change very rarely.
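
As a rough illustration of the (t, n) notation, here is a minimal sketch.  It
is not the TUF API and performs no cryptographic verification;
meets_threshold and the key ids are hypothetical, and the signer list is
assumed to contain only keys whose signatures have already been checked.

```python
def meets_threshold(signer_key_ids, authorized_key_ids, t):
    """Return True if at least t distinct authorized keys have signed."""
    distinct_valid = set(signer_key_ids) & set(authorized_key_ids)
    return len(distinct_valid) >= t


# A (3, 5) root role: n = 5 authorized keys, t = 3 signatures required.
root_keys = {"k1", "k2", "k3", "k4", "k5"}

print(meets_threshold(["k1", "k4", "k5"], root_keys, t=3))  # True
print(meets_threshold(["k1", "k1", "zz"], root_keys, t=3))  # False
```

Repeated signatures from one key and signatures from unknown keys do not
count toward t, which is why a higher t makes a single compromised key less
damaging.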

> - at the target delegation level, PyPI supports the registration of
> new projects through the web service (see
> http://docs.python.org/2/distutils/packageindex.html). If my
> understanding of target delegation is correct, this means the "simple"
> and "packages/source/<letter>" delegations will need to be (1, 1) and
> online.
> - higher levels of the target delegation hierarchy could conceivably
> be kept offline, but there seems little value in doing so if they're
> trusting an online (1, 1) key

Fortunately, the "targets/simple" and 
"targets/packages/(version)/(letter)/" roles should not require (1, 1) 
online keys, as their metadata (simply target delegations and no actual 
target files) should also fluctuate fairly rarely. I should make this 
clearer in our design document.

> - many PyPI packages are maintained by single developers, so (1, 1) or
> (1, n) is likely to be the only generally feasible level of signing at
> the project level.

Yes, the package developers themselves could choose any (t, n) they 
like. In our design, we propose that PyPI could eventually delegate to 
"stable" packages which need little change (and use more security with 
more offline keys) and to "unstable" packages which need frequent change 
(and use less security with more online keys).

> With the current focus being on getting an improvement from the status
> quo that we can successfully deploy in a reasonable period of time,
> the target delegation side of things probably needs to be
> substantially simpler in the initial iteration. Yes, it leaves us open
> to certain vulnerabilities we would like to remove in the long run,
> but we need to be very cautious in the additional demands we place on
> the users uploading to PyPI. It may even mean the initial iteration
> allows projects to rely on a PyPI provided signing key for their TUF
> metadata, using the existing upload mechanisms to add the files to
> PyPI.

I agree that there is a delicate problem of balancing security with 
usability here, especially in the beginning.

You raised a very good issue there: on first migration, how would PyPI 
accommodate packages which have not had their target files delegated to 
their developers? We imagine that in this case, PyPI could assume 
initial responsibility for these packages, and later PyPI would delegate 
those packages to their respective developers.

Thanks for your input,
Trishank


From holger at merlinux.eu  Wed Mar 13 12:21:59 2013
From: holger at merlinux.eu (holger krekel)
Date: Wed, 13 Mar 2013 11:21:59 +0000
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
	release files
Message-ID: <20130313112158.GO9677@merlinux.eu>

Hi all,

after some more discussions and hours spent by Carl Meyer (who is now
co-authoring the PEP) and me, here is a new V3 pre-submit draft.
It is now more ambitious than the previous draft, as should be obvious
from the modified abstract (and Carl Meyer's and Philip's earlier
interactions on this list).  There are also more details of how
the current link-scraping works, among other improvements and incorporations
of feedback from discussions here.

We intend to submit this draft tonight to the PEP editors.  

Feedback now and later remains welcome.  I am sure there are issues to 
be sorted and clarified, among them the versioning-API suggestion by 
Marc-Andre.

Thanks for everybody's support and feedback so far,
holger


PEP: XXX
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel <holger at merlinux.eu>, Carl Meyer <carl at oddbird.net>
Discussions-To: catalog-sig at python.org
Status: Draft (PRE-submit V3)
Type: Process
Content-Type: text/x-rst
Created: 10-Mar-2013
Post-History:


Abstract
========

This PEP proposes a backward-compatible two-phase transition process to speed
up, simplify and robustify installing from the pypi.python.org (PyPI)
package index.  To ease the transition and minimize client-side
friction, **no changes to distutils or existing installation tools are
required in order to benefit from the transition phases, which is to
result in faster, more reliable installs for most existing packages**.

The first transition phase implements easy and explicit means for
a package maintainer to control which release file links are
served to present-day installation tools.  The first phase also
includes the implementation of analysis tools for present-day packages,
to support communication with package maintainers and the automated
setting of default modes for controlling release file links.

The second transition phase will result in the current PyPI index
serving only PyPI-hosted files by default.  Externally hosted files
will still be automatically discoverable through a second index.
Present-day installation tools will be able to continue working
by specifying this second index.  New versions of installation
tools shall default to only installing packages from PyPI unless
the user explicitly wishes to include non-PyPI sites.



Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no
facility to host release files itself.  When hosting was added, no
automated downloading tool existed yet.  When Philip Eby implemented
automated downloading (through setuptools), he made the choice to
allow people to use download hosts of their choice.  The finding of
externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found
   anywhere in that package's metadata for any release. Links in the
   "Download-URL" and "Home-page" metadata fields are given
   ``rel=download`` and ``rel=homepage`` attributes, respectively.

#. Any of these links whose target is a file whose name appears to be
   in the form of an installable source or binary distribution, with
   basename in the form "packagename-version.ARCHIVEEXT", is considered 
   a potential installation candidate.

#. Similarly, any links suffixed with an "#egg=packagename-version"
   fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
   followed and, if HTML, are themselves scraped for release-file links
   in the above formats.
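
The two filename rules above can be sketched roughly as follows.  This is an
illustration only, not the code used by PyPI or setuptools:
``is_install_candidate`` and the ``ARCHIVE_EXTS`` tuple are assumptions made
for the example, and real tools also normalize project names, which is
ignored here.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed set of archive extensions; the real list may differ.
ARCHIVE_EXTS = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")


def is_install_candidate(url, package):
    """Apply the basename rule and the '#egg=' fragment rule to one link."""
    parts = urlsplit(url)
    prefix = package.lower() + "-"

    # Fragment rule: "#egg=packagename-version" marks a candidate.
    if parts.fragment.lower().startswith("egg=" + prefix):
        return True

    # Basename rule: "packagename-version.ARCHIVEEXT".
    basename = posixpath.basename(parts.path).lower()
    return basename.startswith(prefix) and basename.endswith(ARCHIVE_EXTS)


for link in (
    "http://foo.com/downloads/foo-1.2.tgz",
    "http://example.com/trunk#egg=foo-dev",
    "http://foo.com/docs/index.html",
):
    print(link, is_install_candidate(link, "foo"))
```

Note that neither rule looks at the host serving the link, so any URL placed
in a package's metadata can become an installation candidate.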

Today, most packages released on PyPI host their release files on
PyPI, but a small percentage (XXX need updated data) rely on external
hosting.

There are many reasons [2]_ why people have chosen external
hosting. To cite just a few:

- release processes and scripts have been developed already and upload
  to external sites

- it takes too long to upload large files from some places in the
  world

- export restrictions e.g. for crypto-related software

- company policies which require offering open source packages
  through their own sites

- problems with integrating uploading to PYPI into one's release
  process (because of release policies)

- desiring download statistics different from those maintained by PyPI

- perceived bad reliability of PYPI

- not aware that PyPI offers file-hosting

Irrespective of the present-day validity of these reasons, there
clearly is a history behind why people chose to host files externally,
and for some time it was even the only way to do things.


Problem
-------

**Today, Python package installers (pip, easy_install, buildout, and
others) often need to query many non-PyPI URLs even if there are no
externally hosted files**.  Apart from querying pypi.python.org's
simple index pages, also all homepages and download pages ever
specified with any release of a package are crawled by an installer.
The need for installers to crawl external sites slows down
installation and makes for a brittle and unreliable installation
process.  Those sites and packages also don't take part in the
:pep:`381` mirroring infrastructure, further decreasing reliability
and speed of automated installation processes around the world.

Most packages are hosted directly on pypi.python.org [1]_.  Even for
these packages, installers still crawl the homepage(s) of a package.
Many package uploaders are not aware that specifying the "homepage" in
their release process will slow down the installation process for all
users.

Relying on third party sites also opens up more attack vectors for
injecting malicious packages into sites using automated installs.  A
simple attack might just involve getting hold of an old now-unused
homepage domain and placing malicious packages there.  Moreover,
performing a Man-in-The-Middle (MITM) attack between an installation
site and any of the download sites can inject malicious packages on
the installation site.  As many homepages and download locations are
using HTTP and not HTTPS, such attacks are not hard to launch.  Such
MITM attacks can easily happen even for packages which never intended
to host files externally as their homepages are contacted by
installers anyway.

There is currently no way for package maintainers to avoid 3rd party
crawling, other than removing all homepage/download url metadata for
all historic releases.  While a script [3]_ has been written to
perform this action, it is not a good general solution because it
removes semantic information like the "homepage" specification from
PYPI packages.

Even if the "Homepage" and "Download-URL" links were not scraped for
further links, there is still no way under the current system for a
package owner to link to an installable file from their package
metadata without installation tools automatically considering that
file a candidate for installation.


Solution / two transition phases
================================

The first transition phase starts off by introducing a "hosting-mode"
field for each project on PyPI, allowing explicit control of which
machine-readable release file links are served to present-day
installation tools.  Once individual early adopters have successfully
switched hosting modes, the first phase will then set a
default hosting mode for existing packages, based on automated analysis.
**Maintainers will be notified one month ahead of any such automated
change**.  At completion of the first transition phase, **all
present-day existing release and installation processes and tools are
expected to continue working**.  Any remaining errors or problems are
expected to only relate to installation of individual packages and can
be easily corrected by package maintainers or PYPI admins if maintainers
are not reachable.

**The second transition phase will then get PyPI, after a three month
warning period, to only serve links for PyPI-hosted packages under the 
present-day ``simple/`` index**.  At this point, present-day installation 
tools will not see externally hosted links anymore, unless they specify
a new ``simple/-with-externals`` index which PYPI MUST offer ahead of 
the start of the second transition phase.  This new index contains 
the external links as controlled by a package maintainer.  Moreover, PYPI
MUST also provide means to register and control download
links, independently from the current metadata and remote html-scraping 
methods.  At completion of the second transition phase, all present-day
installation tools will, and all future installation tool releases SHALL,
default to installing only PYPI-hosted packages unless a user specifies
option(s) to include external links or the external index.  If an
installation tool chooses to use the new ``simple/-with-externals/`` as
a default, it MUST warn the user with a precise message stating which
external links were followed.

Maintainers of packages which currently host release files on non-PyPI
sites shall receive instructions and tools to ease "re-hosting" of
their historic and future package release files.  The implementation
of such a re-hosting tool is expected but NOT REQUIRED to be available 
at the beginning of phase 2.


Implementation
==============

The foundation of both transition phases is the introduction of three
"modes" of PyPI hosting for a package, affecting which links are
generated for the ``simple/`` index in transition phase 1.  These modes 
are implemented without requiring changes to installation tools via changes 
to the algorithm for generating the machine-readable "/simple" index.

The modes are:

- ``pypi-ext-crawl``: no change from the current situation of generating
  machine-readable links for installation tools, as outlined in the
  history_.

- ``pypi-ext``: for a package in this mode, the "Home-page" and
  "Download-url" links added to the simple index are given
  ``rel=ext-homepage`` and ``rel=ext-download`` attributes instead of
  ``rel=homepage`` and ``rel=download``. The effect of this (with no
  change in installation tools necessary) is that these links will
  not be followed and scraped for further candidate links. Only installable 
  files linked directly from PyPI metadata (wherever they are hosted) will be
  considered for installation.

- ``pypi-only``: for a package in this mode, only links to URLs on
  PyPI itself will be added to the simple index.
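
As a rough illustration of the three modes (a sketch, not the actual
PyPI implementation), the link selection for one project's ``simple/``
page might look like the following.  The function name
``simple_links``, the ``(url, rel)`` tuple layout and the constant
``PYPI_HOST`` are assumptions made for this sketch only.

```python
from urllib.parse import urlparse

PYPI_HOST = "pypi.python.org"  # assumed index host for this sketch

MODES = ("pypi-ext-crawl", "pypi-ext", "pypi-only")


def simple_links(mode, file_urls, homepage=None, download_url=None):
    """Return (url, rel) pairs for one project's simple/ page.

    file_urls are direct links to installable files found in the
    project metadata, wherever they are hosted.  rel is None for
    plain file links.
    """
    if mode not in MODES:
        raise ValueError("unknown hosting mode: %r" % (mode,))

    links = [(url, None) for url in file_urls]

    if mode == "pypi-only":
        # Only links to URLs on PyPI itself; no homepage/download links.
        return [(u, r) for (u, r) in links
                if urlparse(u).hostname == PYPI_HOST]

    # pypi-ext-crawl keeps the crawlable rel values; pypi-ext renames
    # them so present-day installers do not follow and scrape them.
    prefix = "" if mode == "pypi-ext-crawl" else "ext-"
    if homepage:
        links.append((homepage, prefix + "homepage"))
    if download_url:
        links.append((download_url, prefix + "download"))
    return links
```

The only difference between ``pypi-ext-crawl`` and ``pypi-ext`` in this
sketch is the ``rel`` value, which is what lets the change work without
touching installation tools.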

At the end of the warning period of transition phase 2, the ``simple/``
index will be restricted to only show links to URLs on PyPI itself while the 
``simple/-with-externals`` index will during both transition phases show 
links to PYPI and any externals as controlled by the package maintainer
and the hosting-mode.

For a package in ``pypi-only`` mode, external links will no longer be
automatically scraped from metadata and added to the two indexes.
However, PyPI will expose an interface for package maintainers to
explicitly specify any number of URLs to externally hosted installable
files for a given release, and these URLs will be added to the
``simple/-with-ext`` index page for that project but NOT to the basic 
``simple/`` index page. Thus the ``-with-ext`` alternative index gives
package owners with good reason to host their packages elsewhere a
means to do so (even under the ``pypi-only`` package mode) and still
have that information reflected on PyPI in machine-readable form, allowing
installation tool users an explicit and easy choice of whether they wish
to read an index that includes externally-hosted packages or one that
does not.
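
The split between the two index flavours for a ``pypi-only`` project
can be sketched as below.  This is a minimal illustration under
assumptions: ``index_pages`` and its arguments are hypothetical names,
and real index pages are HTML rather than plain lists.

```python
def index_pages(pypi_file_urls, registered_external_urls):
    """Build link lists for a project in ``pypi-only`` mode.

    Returns a dict with one entry per index flavour.  Explicitly
    registered external URLs appear only in the with-ext flavour.
    """
    simple = list(pypi_file_urls)
    with_ext = simple + [u for u in registered_external_urls
                         if u not in simple]
    return {"simple/": simple, "simple/-with-ext": with_ext}
```

A tool reading ``simple/`` therefore never sees the registered external
URLs, while a tool that opts into ``simple/-with-ext`` sees both sets.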

The goal of this PEP is that eventually all projects on PyPI can be
migrated to the ``pypi-only`` mode, while preserving the ability to
install release files hosted from third parties in an automated manner.

Deprecation of hosting-modes to eventually only allow the "pypi-only"
mode is NOT REGULATED by this PEP but is expected to become feasible
some time after successful implementation of the two transition phases
described in this PEP.


Implementation and interaction timeline
--------------------------------------------------

The proposed solution consists of multiple implementation and
communication steps:

#. Implement in PyPI the three modes and the ``-with-ext`` index as
   described above, and an interface for package owners to select the
   mode for each package and register explicit external file URLs for
   the ``-with-ext`` index (for projects in the ``pypi-only`` mode).
   Default all newly-registered packages to ``pypi-only`` mode (but
   package owners can still switch to the other modes as
   desired). Implement in ``pep381client`` the mirroring of the
   ``-with-ext`` index pages.

#. Determine which packages have installable versions available that
   are linked only from homepage/download pages (group B) and which
   packages have all installable files available on PyPI itself (group
   A).

#. Send mail to maintainers of projects in group A that their project
   is going to be automatically configured to ``pypi-ext`` mode in one
   month.  Inform them that this change is not expected to affect
   installability of their project at all, but will result in faster
   and safer installs for their users.  Encourage them to set this
   mode (or ``pypi-only``) themselves earlier to benefit their users.

#. Send mail to maintainers of packages in group B that their package
   hosting mode is ``pypi-ext-crawl``, list the sites which currently
   are crawled, and suggest that they re-host their packages directly
   on PyPI and then switch to ``pypi-only``.  Provide instructions and
   tools to help with this "re-uploading" process.
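
The group A / group B analysis in the second step could be
approximated as follows.  This is only a sketch: the function name
``classify_project`` and the input layout (per-version lists of
PyPI-hosted files and of files found only by crawling) are assumptions,
and the real analysis tool may use different criteria.

```python
def classify_project(releases):
    """Return "A" or "B" for one project.

    releases maps a version string to a dict with two lists of file
    URLs: "pypi" (hosted on PyPI itself) and "crawled" (found only by
    following homepage/download links).
    """
    for version, files in releases.items():
        if files["crawled"] and not files["pypi"]:
            # This version is installable only via crawling.
            return "B"
    return "A"
```

Under this reading, a project lands in group B as soon as one of its
versions would become uninstallable without crawling.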

In addition, maintainers of installation tools are asked to release
two updates.  The first one shall provide clear warnings if
externally-hosted packages (that is, packages at a URL whose domain
name differs from the domain name of the index URL in use) are
selected for download, for which projects and URLs exactly this
happens, and that in future versions externally-hosted downloads 
will be disabled by default.

The second update for installation tools should change the default
mode to allow only installation of package files hosted at the index
domain, and allow installation of externally-hosted packages only when
the user supplies an option (ideally an option specifying exactly
which external domains are to be trusted as download sources). When
download of an externally-hosted package is disallowed, the user
should be notified, with instructions for how to make the install
succeed and warnings about the potential consequences.
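
The two tool updates can be sketched with a host comparison between
the file URL and the index URL.  The helper names ``is_external`` and
``check_download``, the ``phase`` argument and the ``allowed_hosts``
parameter are hypothetical; they do not correspond to an existing
option of any installation tool.

```python
import warnings
from urllib.parse import urlparse


def is_external(file_url, index_url):
    """True if the file's host name differs from the index's host name."""
    return urlparse(file_url).hostname != urlparse(index_url).hostname


def check_download(file_url, index_url, allowed_hosts=(), phase=1):
    """Return True if the download may proceed.

    phase=1: first tool update, warn about external files but allow them.
    phase=2: second tool update, refuse unless the host was allowed.
    """
    if not is_external(file_url, index_url):
        return True
    if phase == 1:
        warnings.warn("%s is externally hosted; future versions will "
                      "disable such downloads by default" % file_url)
        return True
    return urlparse(file_url).hostname in allowed_hosts
```

In the second phase a refused download would additionally be reported
to the user together with instructions for enabling it.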

It is expected that tools in this release may choose to change the
default index url to ``https://pypi.python.org/simple/-with-ext`` in
order to support explicitly-registered external URLs for projects in
``pypi-only`` mode. Tools may choose to do this only when the user
requests installation of externally-hosted packages, or may choose to
do this in all cases so as to be able to notify users when an
externally-hosted file is available.

Specific timelines for deprecation of ``pypi-ext-crawl`` and
``pypi-ext`` modes are not mandated in this PEP; this will depend on
observed behavior of package owners and availability of tooling. It is
expected that ``pypi-ext-crawl`` mode will be an early candidate for
deprecation; it may be necessary to leave ``pypi-ext`` mode in place 
for quite some time, at least for those packages already
depending on it (it may be removed as an option for new packages when
tool support for explicit external URLs and the ``-with-ext`` index is
sufficient).



Open questions
==============

- Should we introduce a third index which maintains the old behaviour
  of providing links irrespective of a maintainer's hosting-mode choice?

- Should we introduce some form of PYPI API versioning in this PEP?
  (it might complicate matters and delay the implementation but is 
  often seen as good practise)


References
==========

.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted, http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html (XXX need to update this data for all easy_install-supported formats)

.. [2] Marc-Andre Lemburg, reasons for external hosting, http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [3] Holger Krekel, Script to remove homepage/download metadata for all releases http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html

Acknowledgements
================

Philip Eby for precise information and the basic ideas to implement
the transition via server-side changes only.

Donald Stufft for pushing away from external hosting and
offering to implement both a Pull Request for the necessary PYPI changes
and the analysis tool to drive the transition phase 1.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for 
thinking through issues regarding getting rid of "external hosting".

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From pje at telecommunity.com  Wed Mar 13 15:26:16 2013
From: pje at telecommunity.com (PJ Eby)
Date: Wed, 13 Mar 2013 10:26:16 -0400
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
 release files
In-Reply-To: <20130313112158.GO9677@merlinux.eu>
References: <20130313112158.GO9677@merlinux.eu>
Message-ID: <CALeMXf7Sv0tygUGyB+zS9x=mXbxosHYuJ5Gxbx6446fEao40Eg@mail.gmail.com>

On Wed, Mar 13, 2013 at 7:21 AM, holger krekel <holger at merlinux.eu> wrote:
> Hi all,
>
> after some more discussions and hours spent by Carl Meyer (who is now
> co-authoring the PEP) and me, here is a new V3 pre-submit draft.
> It is now more ambitious than the previous draft as should be obvious
> from the modified abstract (and Carl Meyer's and Philip's earlier
> interactions on this list).  There also are more details of how
> the current link-scraping works among other improvements and incorporations
> of feedback from discussions here.
>
> We intend to submit this draft tonight to the PEP editors.
>
> Feedback now and later remains welcome.  I am sure there are issues to
> be sorted and clarified, among them the versioning-API suggestion by
> Marc-Andre.
>
> Thanks for everybody's support and feedback so far,
> holger

Looks good to me!

Setuptools' two releases will probably look like this:

1. Default to externals index, warn when fetching URLs that are not
the same host as the index
2. Default to externals index, reject URLs that are not the same host
as the index unless --allow-hosts is configured  (IOW, default
allow-hosts to equal index-url host)

That way, external URLs can still be discovered by the user, but the
default configuration is still secure.

From tseaver at palladion.com  Wed Mar 13 17:54:04 2013
From: tseaver at palladion.com (Tres Seaver)
Date: Wed, 13 Mar 2013 12:54:04 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <20130312195707.GL9677@merlinux.eu>
References: <CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
	<20130312195707.GL9677@merlinux.eu>
Message-ID: <khqb7b$7eu$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/12/2013 03:57 PM, holger krekel wrote:
> Nobody should be led to think that PYPI is a trusted or reviewed
> source of software even if we got rid of external hosting completely.

Amen.  I still boggle at the amount of "sky is falling" stuff here over
MITM / external links / whatever, given the potential damage from
explicitly malicious uploads (trojans, viruses, whatever).  Package
signing might help here, but only for consumers who are willing to think
hard enough about the problem to manage a web of trust (frankly, a
vanishingly small minority).

And then there are these problems:

- - Backward-incompatible releases (even those which make appropriate
  signals in their version numbers).

- - Removal of distributions / releases / projects.

- - Re-upload of new distributions which silently replace previous
  distributions *of the same release* ("Yes, Virginia, there are
  people out there who do this").

which are deal-killers for the folks who want always-on, reliable,
repeatable, automatic installation from PyPI (instead of creating their
own indexes).

Adding HTTPS or removing external links does nothing to mitigate those
issues.


Tres.
- -- 
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iEYEARECAAYFAlFArywACgkQ+gerLs4ltQ7zLACgluGTMdUYheeMGoFgAUH1VZja
VJYAnjBPXbs8yeQ1FYa0mNZhAkTlcJQf
=8KSF
-----END PGP SIGNATURE-----


From donald at stufft.io  Wed Mar 13 18:06:08 2013
From: donald at stufft.io (Donald Stufft)
Date: Wed, 13 Mar 2013 13:06:08 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <khqb7b$7eu$1@ger.gmane.org>
References: <CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
	<20130312195707.GL9677@merlinux.eu> <khqb7b$7eu$1@ger.gmane.org>
Message-ID: <48C1CAC9-C80A-470A-A0FF-500391101918@stufft.io>


On Mar 13, 2013, at 12:54 PM, Tres Seaver <tseaver at palladion.com> wrote:

> Signed PGP part
> On 03/12/2013 03:57 PM, holger krekel wrote:
> > Nobody should be led to think that PYPI is a trusted or reviewed
> > source of software even if we got rid of external hosting completely.
> 
> Amen.  I still boggle at the amount of "sky is falling" stuff here over
> MITM / external links / whatever, given the potential damage from
> explicitly malicious uploads (trojans, viruses, whatever).  Package
> signing might help here, but only for consumers who are willing to think
> hard enough about the problem to manage a web of trust (frankly, a
> vanishingly small minority).

Really now? Let's see: I can easily protect against malicious uploads by only installing from trusted authors. I cannot easily prevent a MITM or a compromised external host if the tools don't protect me against it. Without the tooling and infrastructure moving to close this gap, the only way to do it is to not use that tooling or infrastructure at all. Namely, even if I am the author of the package myself, I cannot install it securely using the current toolchain and infrastructure unless I bend over backwards to make sure that no installable link appears anywhere in my long description, and I don't have a homepage, and I don't have a download url.

> 
> And then there are these problems:
> 
> - - Backward-incompatible releases (even those which make appropriate
>   signals in their version numbers).
> 
> - - Removal of distributions / releases / projects.
> 
> - - Re-upload of new distributions which silently replace previous
>   distributions *of the same release* ("Yes, Virginia, there are
>   people out there who do this").
> 
> which are deal-killers for the folks who want always-on, reliable,
> repeatable, automatic installation from PyPI (instead of creating their
> own indexes).
> 
> Adding HTTPS or removing external links does nothing to mitigate those
> issues.

Yes there are other problems, so let's just throw our hands in the air and say fuck it instead of iteratively working to secure the system.

> 
> 
> Tres.
> - -- 
> ===================================================================
> Tres Seaver          +1 540-429-0999          tseaver at palladion.com
> Palladion Software   "Excellence by Design"    http://palladion.com
> 
> 
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From donald at stufft.io  Wed Mar 13 18:12:59 2013
From: donald at stufft.io (Donald Stufft)
Date: Wed, 13 Mar 2013 13:12:59 -0400
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
	release files
In-Reply-To: <CALeMXf7Sv0tygUGyB+zS9x=mXbxosHYuJ5Gxbx6446fEao40Eg@mail.gmail.com>
References: <20130313112158.GO9677@merlinux.eu>
	<CALeMXf7Sv0tygUGyB+zS9x=mXbxosHYuJ5Gxbx6446fEao40Eg@mail.gmail.com>
Message-ID: <17BFC490-4CB2-4CE0-B946-2FDD30A34111@stufft.io>

On Mar 13, 2013, at 10:26 AM, PJ Eby <pje at telecommunity.com> wrote:

> On Wed, Mar 13, 2013 at 7:21 AM, holger krekel <holger at merlinux.eu> wrote:
>> Hi all,
>> 
>> after some more discussions and hours spent by Carl Meyer (who is now
>> co-authoring the PEP) and me, here is a new V3 pre-submit draft.
>> It is now more ambitious than the previous draft as should be obvious
>> from the modified abstract (and Carl Meyer's and Philip's earlier
>> interactions on this list).  There also are more details of how
>> the current link-scraping works among other improvements and incorporations
>> of feedback from discussions here.
>> 
>> We intend to submit this draft tonight to the PEP editors.
>> 
>> Feedback now and later remains welcome.  I am sure there are issues to
>> be sorted and clarified, among them the versioning-API suggestion by
>> Marc-Andre.
>> 
>> Thanks for everybody's support and feedback so far,
>> holger
> 
> Looks good to me!
> 
> Setuptools' two releases will probably look like this:
> 
> 1. Default to externals index, warn when fetching URLs that are not
> the same host as the index
> 2. Default to externals index, reject URLs that are not the same host
> as the index unless --allow-hosts is configured  (IOW, default
> allow-hosts to equal index-url host)
> 
> That way, external URLs can still be discovered by the user, but the
> default configuration is still secure.


For the record I support the PEP and these 2 steps sound ok to me.

My only suggestion is an additional rel attribute for indexes to indicate that a file is index-hosted, in case the index domain and the package host domain differ (as is the case with Crate).

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From tseaver at palladion.com  Wed Mar 13 18:21:45 2013
From: tseaver at palladion.com (Tres Seaver)
Date: Wed, 13 Mar 2013 13:21:45 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <48C1CAC9-C80A-470A-A0FF-500391101918@stufft.io>
References: <CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
	<20130312195707.GL9677@merlinux.eu> <khqb7b$7eu$1@ger.gmane.org>
	<48C1CAC9-C80A-470A-A0FF-500391101918@stufft.io>
Message-ID: <khqcqu$om3$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/13/2013 01:06 PM, Donald Stufft wrote:
> Really now? Let's see I can easily protect against malicious uploads
> by only installing from trusted authors

How do you know who to trust?  What if an author you trust adds a
dependency on a package by an author you have no knowledge of, or one
you actively distrust?  What if an author you trust commits one of the
other changes I outlined (removes a release / distribution, makes
backward-incompatible changes, re-uploads a changed distribution over an
existing one)?

The only way to implement "only install from trusted authors" is to run
your own index, and explicitly review / curate the package set maintained
there.   In that scenario, you run a script from time to time which looks
for new versions of your packages on PyPI and puts them into a queue for
review.

Bob, a casual reviewer, might install the new version from PyPI into a
fresh virtualenv and test it there before pushing it into the curated index.

Carol, more paranoid^Wsecurity-minded, downloads the package, verifies its
signature, unpacks the tarball, diffs it against the curated version,
compares that diff against the changelog, looks at new / changed
dependencies, and installs it into a hardened sandbox for testing.  Only
after that kind of review does she push the newly-reviewed distribution
into the curated index.

Adding an entirely new package to the curated index is a similar process,
but requires more effort from either Bob or Carol.
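
A minimal sketch of the periodic check described above, assuming the
curated and upstream package sets are already available as
dictionaries.  The function name ``review_queue`` and the data layout
are hypothetical, and version strings are ordered lexically here rather
than by a proper version comparison.

```python
def review_queue(curated, upstream):
    """List (project, version) pairs that still need review.

    curated and upstream both map a project name to a set of version
    strings.  Only projects already in the curated index are checked;
    adding an entirely new project is a separate, manual decision.
    """
    queue = []
    for name in sorted(curated):
        new_versions = upstream.get(name, set()) - curated[name]
        for version in sorted(new_versions):
            queue.append((name, version))
    return queue
```

Bob or Carol would then work through the returned pairs and push each
reviewed distribution into the curated index.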


Tres.
- -- 
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iEYEARECAAYFAlFAtakACgkQ+gerLs4ltQ5O4wCcC92ew66wVGEPBM/Jr8z1bYU8
e9AAoNXmaiuBHQOIFQlT0SRemI43hoG7
=idDp
-----END PGP SIGNATURE-----


From donald at stufft.io  Wed Mar 13 18:34:45 2013
From: donald at stufft.io (Donald Stufft)
Date: Wed, 13 Mar 2013 13:34:45 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <khqcqu$om3$1@ger.gmane.org>
References: <CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
	<20130312195707.GL9677@merlinux.eu> <khqb7b$7eu$1@ger.gmane.org>
	<48C1CAC9-C80A-470A-A0FF-500391101918@stufft.io>
	<khqcqu$om3$1@ger.gmane.org>
Message-ID: <7141E066-8DD0-49FE-BA28-DBCF81F37465@stufft.io>


On Mar 13, 2013, at 1:21 PM, Tres Seaver <tseaver at palladion.com> wrote:

> Signed PGP part
> On 03/13/2013 01:06 PM, Donald Stufft wrote:
> > Really now? Let's see I can easily protect against malicious uploads
> > by only installing from trusted authors
> 
> How do you know who to trust?  What if an author you trust adds a
> dependency on a package by an author you have no knowledge of, or one
> you actively distrust?  What if an author you trust commits one of the
> other changes I outlined (removes a release / distribution, makes
> backward-incompatible changes, re-uploads a changed distribution over an
> existing one)?
> 
> The only way to implement "only install from trusted authors" is to run
> your own index, and explicitly review / curate the package set maintained
> there.   In that scenario, you run a script from time to time which looks
> for new versions of your packages on PyPI and puts them into a queue for
> review.
> 
> Bob, a casual reviewer, might install the new version from PyPI into a
> fresh virtualenv and test it there before pushing it into the curated index.
> 
> Carol, more paranoid^Wsecurity-minded, downloads the package, verifies its
> signature, unpacks the tarball, diffs it against the curated version,
> compares that diff against the changelog, looks at new / changed
> dependencies, and installs it into a hardened sandbox for testing.  Only
> after that kind of review does she push the newly-reviewed distribution
> into the curated index.
> 
> Adding an entirely new package to the curated index is a similar process,
> but requires more effort from either Bob or Carol.
> 
> 
> Tres.
> - -- 
> ===================================================================
> Tres Seaver          +1 540-429-0999          tseaver at palladion.com
> Palladion Software   "Excellence by Design"    http://palladion.com
> 
> 


Threat models are a thing. The way it *should* work in PyPI is you ask for X, you get X, and it was not modified in transit (and ideally not on the repository either, but that is more difficult). PyPI is not and will never be a curated index. However, if I trust Author A, then I implicitly trust his actions. I trust that he won't do your stated issues.

Now is a curated index *more secure*? Well, again it depends on what your threat model is. PyPI isn't going to protect you from a malicious or incompetent author. For the threat model that PyPI is able to deliver on, your system is no more or less secure. In fact, without the sort of things you dismiss here, your proposal is also just as insecure unless you only ever access it on a protected network which you can be sure no attacker has gained access to.

Even your 3 issues are far less concerning than the fact that MITM on either PyPI (fixed now with pip 1.3) or an external url allows a random guy at PyCon to execute arbitrary code on your machine if you install a package from PyPI at PyCon, or at a coffee shop, or on any wifi ever that could have someone else on it.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From robertc at robertcollins.net  Wed Mar 13 18:41:33 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Thu, 14 Mar 2013 06:41:33 +1300
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at
	pypi site
In-Reply-To: <khqb7b$7eu$1@ger.gmane.org>
References: <CALeMXf5ieHxOsMMomg5f6DkXO0eJi-GMEa7TroX2POr8GVnAsA@mail.gmail.com>
	<CAL0kPAUErZ+9vzB+CSMeB5mXQ0Na9TUdk7qmzyA1LWK0M=qjDw@mail.gmail.com>
	<CALeMXf70Z+pT7yBKnEAmNOPHOGbmc9DzjcBy5aEPhNN=cSUx_A@mail.gmail.com>
	<CAK8PqJFXQkh2s21CJ=9QphQyf36Uqq7TrPx+nFbukq8TY=wqwQ@mail.gmail.com>
	<513F5596.5090302@egenix.com>
	<CAK8PqJH7wQU=OCV3NZgx-JQZy=uhjVvWMe0BdWHrfi_4qzQ1cA@mail.gmail.com>
	<CALeMXf5hSosdcnygor7G_M2qp7hF9rVzPdgyLnFOzv-wgJbGSg@mail.gmail.com>
	<513F718D.4040307@oddbird.net>
	<CALeMXf5xmc+qe+_Mf2p1oN5+6Ouc27qm3LvU+n2VdRqiLkxT5A@mail.gmail.com>
	<CAK8PqJEkDW9uOfmFDuK2e81uo9mmQLcshk-wzw9=_bUf+p76Zg@mail.gmail.com>
	<20130312195707.GL9677@merlinux.eu> <khqb7b$7eu$1@ger.gmane.org>
Message-ID: <CAJ3HoZ2Ew-eRt0PzmisYjA1AsyABRwZYd4oZugERL5N4nnZSiA@mail.gmail.com>

On 14 March 2013 05:54, Tres Seaver <tseaver at palladion.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 03/12/2013 03:57 PM, holger krekel wrote:
>> Nobody should be led to think that PYPI is a trusted or reviewed
>> source of software even if we got rid of external hosting completely.
>
> Amen.  I still boggle at the amount of "sky is falling" stuff here over
> MITM / external links / whatever, given the potential damage from
> explicitly malicious uploads (trojans, viruses, whatever).  Package
> signing might help here, but only for consumers who are willing to think
> hard enough about the problem to manage a web of trust (frankly, a
> vanishingly small minority).

Well yes HTTPS and external links are problems which it is necessary
to solve, and not sufficient to make 'pypi secure' - but that doesn't
mean we should do a poor job solving them.

-Rob
-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Cloud Services

From dholth at gmail.com  Wed Mar 13 19:15:16 2013
From: dholth at gmail.com (Daniel Holth)
Date: Wed, 13 Mar 2013 14:15:16 -0400
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <5140432C.7000904@students.poly.edu>
References: <51401FB3.7000408@students.poly.edu>
	<CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
	<5140432C.7000904@students.poly.edu>
Message-ID: <CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>

On Wed, Mar 13, 2013 at 5:13 AM, Trishank Karthik Kuppusamy
<tk47 at students.poly.edu> wrote:
> Hello Nick,
>
>
> On 3/13/13 4:09 AM, Nick Coghlan wrote:
>>
>>
>> - the PSF board generally stays out of the technical details of
>> running the python.org infrastructure, so it's likely that any root
>> keys would be handled by the PSF infrastructure committee. A (2, 4) or
>> (3, 5) trust configuration would likely be manageable at this level.
>
>
> Understood. We think a higher (t, n) [where t out of n signatures are needed
> to trust the metadata for a role] is better for the root role simply because
> its crucial metadata (the authorized keys for top-level roles) should change
> very rarely.
>
>
>> - at the target delegation level, PyPI supports the registration of
>> new projects through the web service (see
>> http://docs.python.org/2/distutils/packageindex.html). If my
>> understanding of target delegation is correct, this means the "simple"
>> and "packages/source/<letter>" delegations will need to be (1, 1) and
>> online.
>> - higher levels of the target delegation hierarchy could conceivably
>> be kept offline, but there seems little value in doing so if they're
>> trusting on online (1, 1) key
>
>
> Fortunately, the "targets/simple" and "targets/packages/(version)/(letter)/"
> roles should not require (1, 1) online keys, as their metadata (simply
> target delegations and no actual target files) should also fluctuate fairly
> rarely. I should make this clearer in our design document.
>
>
>> - many PyPI packages are maintained by single developers, so (1, 1) or
>> (1, n) is likely to be the only generally feasible level of signing at
>> the project level.
>
>
> Yes, the package developers themselves could choose any (t, n) they like. In
> our design, we propose that PyPI could eventually delegate to "stable"
> packages which need little change (and use more security with more offline
> keys) and to "unstable" packages which need frequent change (and use less
> security with more online keys).
>
>
>> With the current focus being on getting an improvement from the status
>> quo that we can successfully deploy in a reasonable period of time,
>> the target delegation side of things probably needs to be
>> substantially simpler in the initial iteration. Yes, it leaves us open
>> to certain vulnerabilities we would like to remove in the long run,
>> but we need to be very cautious in the additional demands we place on
>> the users uploading to PyPI. It may even mean the initial iteration
>> allows projects to rely on a PyPI provided signing key for their TUF
>> metadata, using the existing upload mechanisms to add the files to
>> PyPI.
>
>
> I agree that there is a delicate problem of balancing security with
> usability here, especially in the beginning.
>
> You raised a very good issue there: on first migration, how would PyPI
> accommodate packages which have not had their target files delegated to
> their developers? We imagine that in this case, PyPI could assume initial
> responsibility for these packages, and later PyPI would delegate those
> packages to their respective developers.
>
> Thanks for your input,
> Trishank
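
For readers unfamiliar with the (t, n) notation quoted above, here is a
minimal Python sketch of a threshold check: metadata for a role is trusted
only if at least t of the role's n authorized keys produced a valid
signature. All function and key names are hypothetical, and HMAC over a
shared secret stands in for real public-key signatures purely to keep the
example self-contained. This is not TUF's actual API.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: bytes) -> str:
    """Stand-in signature: HMAC-SHA256 (a real system would use public-key signatures)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def meets_threshold(payload: bytes, signatures: dict, authorized: dict, t: int) -> bool:
    """Return True if at least t distinct authorized keys signed payload.

    signatures maps key id -> signature; authorized maps key id -> secret.
    Signatures from key ids that are not authorized are ignored.
    """
    valid = 0
    for key_id, signature in signatures.items():
        secret = authorized.get(key_id)
        if secret is None:
            continue  # unknown key: does not count toward the threshold
        if hmac.compare_digest(sign(secret, payload), signature):
            valid += 1
    return valid >= t


# Hypothetical (2, 4) root role: four authorized keys, two signatures required.
ROOT_KEYS = {"key-a": b"a", "key-b": b"b", "key-c": b"c", "key-d": b"d"}
ROOT_METADATA = b'{"role": "root", "version": 1}'
```

With a (2, 4) configuration like the one Nick suggests, one compromised or
lost key is not enough to forge or block root metadata, which is the
trade-off being discussed for the offline roles.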

With all the different kinds of metadata, it's interesting to note
that TUF currently seems to be concerned only with the available file
names and their integrity. (Some of us will think of PEP 426
"PKG-INFO" first when we hear the word metadata.)

It looks like the D metadata lists all the filenames for Django, and
then Django lists them again with hashes and signatures. Why all the
lists? Does every Django release re-assert all the versions of Django
that are available on the index?

How might I deal with producing the official source distribution
myself and having a friend produce the official Windows build of a
package?

As an aside, PyPI has been doubling in size every 1.5 to 2 years.

Thanks

Daniel Holth
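
To put the doubling figure above in more familiar terms, the arithmetic
can be sketched in Python. The helper names are made up for illustration;
the only input taken from the message is the 1.5 to 2 year doubling period.

```python
import math


def annual_growth_factor(doubling_years: float) -> float:
    """Yearly multiplier implied by a given doubling period."""
    return 2.0 ** (1.0 / doubling_years)


def years_to_grow(factor: float, doubling_years: float) -> float:
    """Years needed to grow by `factor` at a given doubling period."""
    return doubling_years * math.log2(factor)


for period in (1.5, 2.0):
    rate = (annual_growth_factor(period) - 1.0) * 100.0
    print(f"doubling every {period} years: about {rate:.0f}% growth per year, "
          f"10x in about {years_to_grow(10.0, period):.1f} years")
```

In other words, the index grows by very roughly 40 to 60 percent per year,
which matters for any scheme whose metadata size scales with the number of
files listed.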

From jcappos at poly.edu  Wed Mar 13 19:29:49 2013
From: jcappos at poly.edu (Justin Cappos)
Date: Wed, 13 Mar 2013 14:29:49 -0400
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
References: <51401FB3.7000408@students.poly.edu>
	<CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
	<5140432C.7000904@students.poly.edu>
	<CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
Message-ID: <CAMVss_rcd+9kS2NizSi9d2A8V_faRBDJv=OPRk+DKj2-cjoj-w@mail.gmail.com>

Something in the doc may be unclear. We definitely don't worry only
about package names.

(In between meetings, will send a longer response in a bit.)

Thanks,
Justin



From mal at egenix.com  Wed Mar 13 19:57:58 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 13 Mar 2013 19:57:58 +0100
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
 release files
In-Reply-To: <20130313112158.GO9677@merlinux.eu>
References: <20130313112158.GO9677@merlinux.eu>
Message-ID: <5140CC36.10807@egenix.com>

On 13.03.2013 12:21, holger krekel wrote:
> Hi all,
> 
> after some more discussions and hours spent by Carl Meyer (who is now
> co-authoring the PEP) and me, here is a new V3 pre-submit draft.
> It is now more ambitious than the previous draft, as should be obvious
> from the modified abstract (and Carl Meyer's and Philip's earlier
> interactions on this list).  There are also more details of how
> the current link-scraping works, among other improvements and incorporations
> of feedback from discussions here.
> 
> We intend to submit this draft tonight to the PEP editors.  
> 
> Feedback now and later remains welcome.  I am sure there are issues to 
> be sorted and clarified, among them the versioning-API suggestion by 
> Marc-Andre.
> 
> Thanks for everybody's support and feedback so far,
> holger
> 
> 
> PEP: XXX
> Title: Transitioning to release-file hosting on PyPI
> Version: $Revision$
> Last-Modified: $Date$
> Author: Holger Krekel <holger at merlinux.eu>, Carl Meyer <carl at oddbird.net>
> Discussions-To: catalog-sig at python.org
> Status: Draft (PRE-submit V3)
> Type: Process
> Content-Type: text/x-rst
> Created: 10-Mar-2013
> Post-History:
> 
> 
> Abstract
> ========
> 
> This PEP proposes a backward-compatible two-phase transition process to speed
> up, simplify and robustify installing from the pypi.python.org (PyPI)
> package index.  To ease the transition and minimize client-side
> friction, **no changes to distutils or existing installation tools are
> required in order to benefit from the transition phases, which is to
> result in faster, more reliable installs for most existing packages**.
> 
> The first transition phase implements easy and explicit means for
> a package maintainer to control which release file links are
> served to present-day installation tools.  The first phase also
> includes the implementation of analysis tools for present-day packages,
> to support communication with package maintainers and the automated
> setting of default modes for controlling release file links.
> 
> The second transition phase will result in the current PyPI index
> serving only PyPI-hosted files by default.  Externally hosted files
> will still be automatically discoverable through a second index.
> Present-day installation tools will be able to continue working
> by specifying this second index.  New versions of installation
> tools shall default to installing packages only from PyPI unless
> the user explicitly wishes to include non-PyPI sites.

I must say, I don't like this change in motivation compared
to V1 and V2.

The original goal of the discussion was to make PyPI more secure
and the installation process faster and more reliable
by moving away from crawling arbitrary external web pages.

Both can be had by:

* limiting the crawling to specific URLs defined by the package
  author, with added hashes to make sure that the URLs and
  their target content are not modified (this is the securing
  external downloads part - see here for an example:
  https://pypi.python.org/pypi/egenix-pyopenssl/0.13.1.1.0.1.5),
  and

* adding a way for the package authors to say "PyPI, please go
  ahead and cache/copy my distribution files" (this is the
  increase download reliability part - can be had by doing
  opt-in CDN caching/proxying of external links via PyPI)
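
The first point can be sketched roughly as follows: the link registered by
the author carries the expected digest in its fragment (in the style of the
"#md5=..." fragments already seen on PyPI links), and the client refuses
content that does not match. This is only an illustration under that
assumption; the function names and the example URL are hypothetical and do
not describe how any existing installer implements the check.

```python
import hashlib
from urllib.parse import urldefrag

# Hash names this sketch accepts in the URL fragment (an assumption).
ALLOWED_ALGORITHMS = {"md5", "sha1", "sha256", "sha512"}


def expected_digest(url: str):
    """Split 'https://host/file.tar.gz#sha256=<hex>' into (url, algorithm, hex digest)."""
    base, fragment = urldefrag(url)
    algorithm, sep, digest = fragment.partition("=")
    if not sep or algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError("URL carries no usable hash fragment")
    return base, algorithm, digest.lower()


def content_matches(url: str, content: bytes) -> bool:
    """Return True if content hashes to the digest pinned in the URL fragment."""
    _, algorithm, digest = expected_digest(url)
    return hashlib.new(algorithm, content).hexdigest() == digest
```

A client would download the bytes from the URL without its fragment and
install only if content_matches() returns True, so a modified file on the
external host is detected even though PyPI never hosted it.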

Now, with V3 of the proposal, you are moving towards a system
that basically says "do it this way, or stay out of our
ecosystem", which, in my book, is not what the Python ecosystem
is all about.

Your V2 was much more inviting in this respect.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 13 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From jcappos at poly.edu  Wed Mar 13 19:58:31 2013
From: jcappos at poly.edu (Justin Cappos)
Date: Wed, 13 Mar 2013 14:58:31 -0400
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
References: <51401FB3.7000408@students.poly.edu>
	<CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
	<5140432C.7000904@students.poly.edu>
	<CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
Message-ID: <CAMVss_rqfKx-qnRABCoYTWt-s=cy8g0XYKgcM_DG5YQGQmM5Pg@mail.gmail.com>

We use the simple directory and filenames because that is what pip uses.

You have a nice suggestion to include other metadata in the TUF metadata.
We certainly could do this if desirable. It would require a redesign of the
PyPI API, and we weren't sure whether that was wanted. Our current doc /
prototype tries to minimize the changes needed all around.

Thanks,
Justin



From donald at stufft.io  Wed Mar 13 20:08:32 2013
From: donald at stufft.io (Donald Stufft)
Date: Wed, 13 Mar 2013 15:08:32 -0400
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
	release files
In-Reply-To: <5140CC36.10807@egenix.com>
References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com>
Message-ID: <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io>


On Mar 13, 2013, at 2:57 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 13.03.2013 12:21, holger krekel wrote:
>> [V3 proposal]
> 
> I must say, I don't like this change in motivation compared
> to V1 and V2.
> 
> The original goal of the discussion was to make PyPI more secure
> and the installation process faster and more reliable
> by moving away from crawling arbitrary external web pages.
> 
> Both can be had by:
> 
> * limiting the crawling to package author defined specific
>  URLs, with added hashes to make sure that the URLs and
>  their target content is not modified (this is the securing
>  external downloads part - see here for an example:
>  https://pypi.python.org/pypi/egenix-pyopenssl/0.13.1.1.0.1.5),
>  and
> 
> * adding a way for the package authors to say "PyPI, please go
>  ahead and cache/copy my distributions files" (this is the
>  increase download reliability part - can be had by doing
>  opt-in CDN caching/proxying of external links via PyPI)
> 
> Now, with V3 of the proposal, you are moving towards a system
> that basically says "do it this way, or stay out of our eco
> system", which, in my book, is not what the Python eco system
> is all about.
> 

I don't see how? The -with-externals index will still contain all the existing links, and indeed PJ Eby has already stated that setuptools will move to support this index by default, but with proper warnings so that people know they are installing a package from off-site.

This allows existing tools to be moved to a secure-by-default position. It allows future tools to choose whether to enable the existing behavior through use of -with-externals (hopefully with a warning or opt-in sort of thing like that laid out by PJE, but it's certainly not required). And it even allows users of existing tools to opt into the old behavior via the -i option.

Maybe I'm missing it, but in what way does this force authors to "do it this way or stay out of our eco system", since all the same options are available as there are today?
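
As a rough illustration of the opt-in described above, here is a small
Python sketch that builds the command line a user (or a wrapper script)
might run. The -i option is pip's existing index-URL flag; the index URL
below is a placeholder, since the thread names a -with-externals index but
not its address, and the function name is made up for the example.

```python
# Placeholder URL (an assumption): the real address of the second index
# is not given anywhere in this thread.
EXTERNALS_INDEX = "https://pypi.example/simple-with-externals/"


def pip_install_args(package: str, allow_externals: bool = False) -> list:
    """Build a pip command line; opting in to externals selects the second index via -i."""
    args = ["pip", "install"]
    if allow_externals:
        # Old behavior: also consider externally hosted release files.
        args += ["-i", EXTERNALS_INDEX]
    args.append(package)
    return args


print(" ".join(pip_install_args("example-pkg")))
print(" ".join(pip_install_args("example-pkg", allow_externals=True)))
```

The point of the sketch is only that the default path needs no extra
argument, while the previous behavior stays reachable with one explicit
option.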

> Your V2 was much more inviting in this respect.
> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From mal at egenix.com  Wed Mar 13 20:33:36 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 13 Mar 2013 20:33:36 +0100
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
 release files
In-Reply-To: <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io>
References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com>
	<8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io>
Message-ID: <5140D490.3040401@egenix.com>

On 13.03.2013 20:08, Donald Stufft wrote:
> 
> On Mar 13, 2013, at 2:57 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:
> 
>> On 13.03.2013 12:21, holger krekel wrote:
>>> [V3 proposal]
>>
>> I must say, I don't like this change in motivation compared
>> to V1 and V2.
>>
>> The original goal of the discussion was to make PyPI more secure
>> and the installation process faster and more reliable
>> by moving away from crawling arbitrary external web pages.
>>
>> Both can be had by:
>>
>> * limiting the crawling to package author defined specific
>>  URLs, with added hashes to make sure that the URLs and
>>  their target content is not modified (this is the securing
>>  external downloads part - see here for an example:
>>  https://pypi.python.org/pypi/egenix-pyopenssl/0.13.1.1.0.1.5),
>>  and
>>
>> * adding a way for the package authors to say "PyPI, please go
>>  ahead and cache/copy my distributions files" (this is the
>>  increase download reliability part - can be had by doing
>>  opt-in CDN caching/proxying of external links via PyPI)
>>
>> Now, with V3 of the proposal, you are moving towards a system
>> that basically says "do it this way, or stay out of our eco
>> system", which, in my book, is not what the Python eco system
>> is all about.
>>
> 
> I don't see how? The -with-externals index will still contain all the existing links, and indeed PJ Eby has already stated that setuptools will move to support this index by default, but with proper warnings so that people know they are installing a package from off-site.
> 
> This allows existing tools to be moved to a secure-by-default position. It allows future tools to choose whether to enable the existing behavior through use of -with-externals (hopefully with a warning or opt-in sort of thing like that laid out by PJE, but it's certainly not required). And it even allows users of existing tools to opt into the old behavior via the -i option.
> 
> Maybe I'm missing it, but in what way does this force authors to "do it this way or stay out of our eco system", since all the same options are available as there are today?

The proposal marks all external links as evil, and instead of
making external links more secure, the user is left with the option
to either not enable external links at all, or to let the
"devil" in :-)

That's not nice. It's also security theater.

The real problem is unreviewed code getting executed by users,
or worse, automated build systems. Yet, we let users believe
that everything is secured on PyPI.

Taking an extreme position, it would probably be better to just
leave everything as it is and instead educate users about the
risk they are taking with a "pip install AngryBirds", signed
with keys issued by the PSF on the official PyPI server,
delivered straight to your drive via the latest in crypto
technology, only to wipe your notebook...

But then, I don't like extreme positions, so I would rather
incrementally improve the situation from both the
server and the client side, addressing both user and author
concerns, and keeping the Python ecosystem a friendly place
to be.

>> Your V2 was much more inviting in this respect.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 13 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From dholth at gmail.com  Wed Mar 13 20:40:08 2013
From: dholth at gmail.com (Daniel Holth)
Date: Wed, 13 Mar 2013 15:40:08 -0400
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
 release files
In-Reply-To: <5140D490.3040401@egenix.com>
References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com>
	<8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io>
	<5140D490.3040401@egenix.com>
Message-ID: <CAG8k2+5PbTrF2bXXLfea8K1arQ4hYucmJZF5bpFHB--23N+epw@mail.gmail.com>

On Wed, Mar 13, 2013 at 3:33 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 13.03.2013 20:08, Donald Stufft wrote:
>>
>> On Mar 13, 2013, at 2:57 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:
>>
>>> On 13.03.2013 12:21, holger krekel wrote:
>>>> [V3 proposal]
>>>
>>> I must say, I don't like this change in motivation compared
>>> to V1 and V2.
>>>
>>> The original goal of the discussion was to make PyPI more secure
>>> and the installation process faster and more reliable
>>> by moving away from crawling arbitrary external web pages.
>>>
>>> Both can be had by:
>>>
>>> * limiting the crawling to package author defined specific
>>>  URLs, with added hashes to make sure that the URLs and
>>>  their target content is not modified (this is the securing
>>>  external downloads part - see here for an example:
>>>  https://pypi.python.org/pypi/egenix-pyopenssl/0.13.1.1.0.1.5),
>>>  and
>>>
>>> * adding a way for the package authors to say "PyPI, please go
>>>  ahead and cache/copy my distributions files" (this is the
>>>  increase download reliability part - can be had by doing
>>>  opt-in CDN caching/proxying of external links via PyPI)
>>>
>>> Now, with V3 of the proposal, you are moving towards a system
>>> that basically says "do it this way, or stay out of our eco
>>> system", which, in my book, is not what the Python eco system
>>> is all about.
>>>
>>
>> I don't see how? The -with-externals index will still contain all the existing links, and indeed PJ Eby has already stated that setuptools will move to support this index by default, but with proper warnings so that people know they are installing a package from off-site.
>>
>> This allows existing tools to be moved to a secure-by-default position. It allows future tools to choose whether to enable the existing behavior through use of -with-externals (hopefully with a warning or opt-in sort of thing like that laid out by PJE, but it's certainly not required). And it even allows users of existing tools to opt into the old behavior via the -i option.
>>
>> Maybe I'm missing it, but in what way does this force authors to "do it this way or stay out of our eco system", since all the same options are available as there are today?
>
> The proposal marks all external links as evil, and instead of
> making external links more secure, the user is left with the option
> to either not enable external links at all, or to let the
> "devil" in :-)
>
> That's not nice. It's also security theater.
>
> The real problem is unreviewed code getting executed by users,
> or worse, automated build systems. Yet, we let users believe
> that everything is secured on PyPI.
>
> Taking an extreme position, it would probably be better to just
> leave everything as it is and instead educate users about the
> risk they are taking with a "pip install AngryBirds", signed
> with keys issued by the PSF on the official PyPI server,
> delivered straight to your drive via the latest in crypto
> technology, only to wipe your notebook...
>
> But then, I don't like extreme positions, so I would rather
> incrementally improve the situation from both the
> server and the client side, addressing both user and author
> concerns, and keeping the Python ecosystem a friendly place
> to be.
>
>>> Your V2 was much more inviting in this respect.

Perhaps it would be better to decide whether it is "reliability
theater" and concentrate on consistency rather than on whether the code
actually does what you want. It is nice to have a system that at least
prevents targeted third-party bad-package attacks.

From donald at stufft.io  Wed Mar 13 20:46:37 2013
From: donald at stufft.io (Donald Stufft)
Date: Wed, 13 Mar 2013 15:46:37 -0400
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
	release files
In-Reply-To: <5140D490.3040401@egenix.com>
References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com>
	<8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io>
	<5140D490.3040401@egenix.com>
Message-ID: <8507A01C-D6C6-49C2-82D7-C3B48EDF16FF@stufft.io>


On Mar 13, 2013, at 3:33 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 13.03.2013 20:08, Donald Stufft wrote:
>> 
>> On Mar 13, 2013, at 2:57 PM, "M.-A. Lemburg" <mal at egenix.com> wrote:
>> 
>>> On 13.03.2013 12:21, holger krekel wrote:
>>>> [V3 proposal]
>>> 
>>> I must say, don't like this change in motivation compared
>>> to V1 and V2.
>>> 
>>> The original of the discussion was to make PyPI more secure
>>> and the installation process faster and more reliable
>>> by moving away from crawling arbitrary external web pages.
>>> 
>>> Both can be had by:
>>> 
>>> * limiting the crawling to package author defined specific
>>> URLs, with added hashes to make sure that the URLs and
>>> their target content is not modified (this is the securing
>>> external downloads part - see here for an example:
>>> https://pypi.python.org/pypi/egenix-pyopenssl/0.13.1.1.0.1.5),
>>> and
>>> 
>>> * adding a way for the package authors to say "PyPI, please go
>>> ahead and cache/copy my distributions files" (this is the
>>> increase download reliability part - can be had by doing
>>> opt-in CDN caching/proxying of external links via PyPI)
>>> 
>>> Now, with V3 of the proposal, you are moving towards a system
>>> that basically says "do it this way, or stay out of our eco
>>> system", which, in my book, is not what the Python eco system
>>> is all about.
>>> 
>> 
>> I don't see how? The -with-externals index will still contain all the existing links, and indeed PJ Elby has already stated that setuptools will move to support this index by default but with proper warnings to people so they know they are installing a package off site.
> 
>> This allows existing tools to be moved to a secure by default position. Allows future tools to choose if they want to enable the existing behavior through use of -with-externals (hopefully with a warning or opt-in sort of thing like laid out by PJE, but it's certainly not required). And even allows users of existing tools to opt into the old behavior via the -i option.
>> 
>> Maybe i'm missing it but in what way does this force authors to "do it this way or stay out of our eco system" since all the same options are available as there are today?
> 
> The proposal marks all external links as evil, and instead of
> making external links more secure, the user is left with the option
> to either not enable external links at all, or to let the
> "devil" in :-)

It doesn't mark them as evil, it marks them as requiring users to opt into them. Authors are free to not publish their packages directly to PyPI and users are free to opt in to installing the external urls that the authors have chosen to publish. Furthermore, it gives package authors complete control over what urls appear on their simple index page.

ISTM that this is even friendlier than before because now both sides have explicitly decided to use those urls, instead of it being completely implicit on one side, and partially implicit on the other.

> 
> That's not nice. It's also security theater.

It's not security theater, it moves the defaults to more secure. Further work can (and will) be done to ensure that for those users and authors who opt into the external urls it's still secure while again requiring both sides to explicitly opt into it.

> 
> The real problem is unreviewed code getting executed by users,
> or worse, automated build systems. Yet, we let users believe
> that everything is secured on PyPI.

"We"? I don't think anyone's ever said that *everything is secured on pypi*. The best the PyPI infrastructure and tooling can do (security wise) is to try and make as sure as possible that when you ask for foo==X.Y you get foo==X.Y. PyPI currently can't make that claim for external links.

On top of that many users (and I'd wager most users) are not aware that when they install something it reaches outwardly to other hosts. This proposal makes it so they *are* aware so they opt into potentially increasing their downtime and they opt into exposing details to external hosts (which may or may not be SSL secured).

> 
> Taking an extreme position, it would probably be better just
> leave everything as it is and instead educate users about the
> risk they are taking with a "pip install AngryBirds", signed
> with keys issued by the PSF on the official PyPI server,
> delivered straight to your drive via the latest in crypto
> technology, only to wipe your notebook...
> 
> But then, I don't like extreme positions, so would rather
> like to incrementally improve the situation both from the
> server and the client side, both addressing user and author
> concerns, and keeping the Python eco system a friendly place
> to be.
> 
>>> Your V2 was much more inviting in this respect.

This gives _all_ the abilities of the current system (besides spidering random urls) with *more* control given to the authors as to what exists on their various index pages. This is a net win for everyone involved. The only "loss" is that projects that choose to host externally to PyPI will have people trying to install it told to explicitly allow it (as mentioned by PJ Elby).

> 
> -- 
> Marc-Andre Lemburg
> eGenix.com
> 
> Professional Python Services directly from the Source  (#1, Mar 13 2013)
>>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
> ________________________________________________________________________
> 
> ::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::
> 
>   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>           Registered at Amtsgericht Duesseldorf: HRB 46611
>               http://www.egenix.com/company/contact/


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From tk47 at students.poly.edu  Thu Mar 14 01:11:04 2013
From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy)
Date: Wed, 13 Mar 2013 20:11:04 -0400
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
References: <51401FB3.7000408@students.poly.edu>
	<CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
	<5140432C.7000904@students.poly.edu>
	<CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
Message-ID: <51411598.1010100@students.poly.edu>

On 03/13/2013 02:15 PM, Daniel Holth wrote:
>
> With all the different kinds of metadata, It's interesting to note
> that currently TUF seems to only be concerned with the available file
> names and their integrity. (Some of us will think of PEP 426
> "PKG-INFO" first when we hear the word metadata.)

Yes, you are right that the many different kinds of metadata in this 
discussion (TUF metadata, PyPI metadata) make things a little confusing 
sometimes! :))

My understanding of PEP 426 is that the distribution metadata is 
specified by the developer with the setup.py script.

To take the running Django example, since the Django developers will 
sign everything under the Django role with their own keys that the D 
role will talk about, setup.py, as well as the generated "PKG-INFO", 
will be signed by the Django developers. This means that pip + TUF will 
be able to verify these distribution metadata indirectly via the source 
distribution package.

Does this answer your question?

> It looks like the D metadata lists all the filenames for Django, and
> then Django lists them again with hashes and signatures. Why all the
> lists? Does every Django release re-assert all the versions of Django
> that are available on the index?

Good observation. For D, you are talking about the "paths" attribute here:

https://updateframework.com/pypi/repository/metadata/targets/packages/source/D.txt

For Django, you are talking about the "targets" attribute here:

https://updateframework.com/pypi/repository/metadata/targets/packages/source/D/Django.txt

Why is "paths" in D listing all the "targets" that Django already talks 
about? Presently, this is because our target delegation tool 
(signercli.py) is being paranoid and making sure that D is explicitly 
delegating only targets matching these "paths".

However, the TUF specification allows for D to simply say, "I delegate 
any target whatsoever under Django", by setting "paths" to 
"packages/source/D/Django/**":

https://www.updateframework.com/browser/specs/tuf-spec.txt#L525
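
To illustrate the difference between the two delegation styles, here is
a minimal sketch. It assumes shell-style wildcard matching via Python's
fnmatch module, which only approximates the path patterns described in
the TUF specification. The file names and the is_delegated helper are
made up for illustration and are not part of signercli.py or TUF.

```python
from fnmatch import fnmatchcase


def is_delegated(target_path, delegated_paths):
    """Return True if target_path matches any delegated path pattern."""
    return any(fnmatchcase(target_path, pattern) for pattern in delegated_paths)


# Explicit delegation: every delegated file is listed by name (illustrative names).
explicit_paths = [
    "packages/source/D/Django/Django-1.4.tar.gz",
    "packages/source/D/Django/Django-1.5.tar.gz",
]

# Wildcard delegation: anything under the Django directory is delegated.
wildcard_paths = ["packages/source/D/Django/**"]

new_release = "packages/source/D/Django/Django-1.5.1.tar.gz"

# With explicit paths, D has to be re-signed before a new file is covered.
print(is_delegated(new_release, explicit_paths))
# With the wildcard, the new file is covered without touching D.
print(is_delegated(new_release, wildcard_paths))
```

The trade-off is the one described above: explicit paths are more
paranoid but require the delegating role to be updated for every new
file, while a wildcard keeps the delegating metadata small and stable.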

> How might I deal with producing the official source distribution
> myself and having a friend produce the official Windows build of a
> package?

There are a few solutions. You could have your friend produce the 
official Windows build for a package, and then you could sign it, 
implicitly trusting your friend but not publishing that trust.

A more secure solution would have you delegate that target to your friend.

> As an aside PyPI has been doubling in size every 1.5 - 2 years.

Exponential growth strikes again! We have anticipated this, and we have 
a few solutions to curb the growth of TUF metadata. Since TUF metadata 
is simply text, GZIP compression would go a long way. Alternatively, we 
could implement delta updates of TUF metadata.
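
As a rough way to see how much compression might help, the following
sketch builds some made-up targets metadata and compresses it with
gzip. The JSON layout here is only an assumption loosely modelled on
the "targets" attribute mentioned earlier, not the exact TUF format,
and the numbers it prints say nothing about real PyPI data.

```python
import gzip
import hashlib
import json

# Hypothetical targets metadata: one entry per (made-up) release file.
targets = {}
for minor in range(100):
    path = "packages/source/E/Example/Example-1.%d.tar.gz" % minor
    targets[path] = {
        "length": 10000 + minor,
        "hashes": {"sha256": hashlib.sha256(path.encode("utf-8")).hexdigest()},
    }

raw = json.dumps({"signed": {"targets": targets}}, sort_keys=True).encode("utf-8")
compressed = gzip.compress(raw)

# The repetitive paths and key names compress well; the hashes much less so.
print("raw bytes:", len(raw))
print("gzip bytes:", len(compressed))
```

Since the hash values themselves are close to incompressible, delta
updates would likely matter more than compression for metadata that
changes only slightly between releases.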

The more difficult problem is how to ensure that target delegation 
structure scales with PyPI growth. A good design will keep this in mind 
and plan accordingly.

Speaking of which, it may be the case that our design document for 
integrating PyPI with TUF may not be terribly easy to understand. (After 
all, you do need to understand TUF first, but TUF is fairly easy once 
you understand its main ideas.) I plan to publish a friendlier document 
which introduces TUF at a very high level and instead discusses more 
pragmatic issues (such as workflows).


From jcappos at poly.edu  Thu Mar 14 01:15:03 2013
From: jcappos at poly.edu (Justin Cappos)
Date: Wed, 13 Mar 2013 20:15:03 -0400
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <51411598.1010100@students.poly.edu>
References: <51401FB3.7000408@students.poly.edu>
	<CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
	<5140432C.7000904@students.poly.edu>
	<CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
	<51411598.1010100@students.poly.edu>
Message-ID: <CAMVss_rHjKVEU4JnTT2H5Nksbn_xF+oud9wZq2TVkhyGp_c5mw@mail.gmail.com>

> Speaking of which, it may be the case that our design document for
> integrating PyPI with TUF may not be terribly easy to understand. (After
> all, you do need to understand TUF first, but TUF is fairly easy once you
> understand its main ideas.) I plan to publish a friendlier document which
> introduce TUF at a very high-level and instead discuss more pragmatic
> issues (such as workflows).
>
>
Feel free to chime in if you'd rather see something else or want us to
focus on clarifying a specific topic.

Thanks,
Justin

From carl at oddbird.net  Thu Mar 14 01:16:30 2013
From: carl at oddbird.net (Carl Meyer)
Date: Wed, 13 Mar 2013 18:16:30 -0600
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
 release files
In-Reply-To: <5140D490.3040401@egenix.com>
References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com>
	<8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io>
	<5140D490.3040401@egenix.com>
Message-ID: <514116DE.50907@oddbird.net>

On 03/13/2013 01:33 PM, M.-A. Lemburg wrote:
> The proposal marks all external links as evil, 

I'm sorry the text of the PEP gave you that impression. I can see how
you'd have gotten it from some of the comments here on catalog-sig, but
we went to some lengths to avoid it in the PEP text, and plan to further
revise the text to try harder to avoid that implication.

In the proposed PEP, we are attempting to balance two things that I
believe to be true:

1) There are good and valid reasons for some package owners to prefer
external hosting, and it is good for automated installers to easily be
able to install such packages (on user request).

2) Installing non-PyPI-hosted packages should not be the *default*
behavior of installer tools, for many reasons, among them because that
is unusual and surprising behavior to many newcomers to the Python
ecosystem, and often leads to concerns on their part about the stability
of the ecosystem.

These are the axioms, if you will, of this proposal, and while I'd guess
many people in this discussion are at least slightly uncomfortable with
one or the other of them, I think accepting both is the most likely path
to a compromise everyone can live with.

I think we can find a solution that embraces both these axioms and
maintains good backwards-compatibility and usability. Holger and I had a
long talk this evening about that, and here are some of our thoughts:

A) You mentioned opt-in PyPI caching of externally-hosted files as a
means to improve reliability. We basically agree, but implementing this
on the PyPI side adds complexity to the PyPI implementation that we are
hesitant to propose. Rather, we propose that this is better handled by a
client-side tool that you point at a PyPI release with externally-hosted
files, and it simply copies those release files onto PyPI. This has
essentially the same effect. We envision this being a simple enough tool
that it could reasonably be run for every release of a project in an
ongoing way, not just as a one-time project-wide migration. We plan to
change the line in the PEP that says the existence of this tool is NOT
REQUIRED to begin the phase 2 transition to instead say that the
existence of this tool IS REQUIRED before the phase 2 transition begins.
(Holger already has a partial implementation of this tool.)

B) We also plan to change the PEP to say even more strongly that
installer tools should provide an easy option for installing
externally-hosted projects, and that our definition of "easy" includes
the ability for an installer to automatically tell a user what options
they can use to install a specific externally-hosted package that the
tool is refusing to install by default.

C) To make that latter part of (B) easier, we also propose that the
basic simple index include a link with a distinct rel attribute that
points to the -with-externals index page for that project, only for a
package that has external links. This way even tools using the
no-externals index by default can notify users of the existence of
external links for a project when they try to install it.

There's also another possible change, a bit more significant, that we
discussed that I'd be curious to hear your thoughts on. The initial
motivation for separating external links from the main simple/ index was
twofold: 1) Allow future tools to distinguish between internal and
external links without every tool needing to implement host-comparison
algorithms (which may break indexes that host "internal" files on a
CDN), and 2) Allow today's installers, without upgrade, to automatically
migrate eventually to no-external-installs-by-default.

Some things have caused us to re-evaluate these points:

- PyPI can automatically tag internal/external links in the simple index
with rel="internal" and rel="external", which gives future tools a more
reliable marker than host-comparison. So this takes care of #1.

- It may be that giving up #2 is acceptable in the interest of better
backward-compatibility. Old tools will still gain most of the benefits
of this PEP due to the eventual elimination of automatic link-scraping
(both from metadata and external pages) and the move to explicit
submission of external links, only for those projects that want them.
And old tools will not be able to provide a useful error message to
users trying to install an externally-hosted package that is no longer
listed in the main simple/ index, which is a bad usability breakage.

Given that, we are thinking of perhaps simplifying the PEP to eliminate
the separate -with-externals index, and list external links in the main
simple/ index, clearly marked with rel="external". The PEP would still
recommend that future installer tools not follow rel="external" links
without specific user authorization. Old tools still get many of the
benefits, without the breakage.
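
To make the simplified proposal concrete, here is a minimal sketch of
how a future installer might treat such a page. The HTML snippet, the
host names, and the allow_external parameter are all assumptions made
for illustration; the PEP does not define this markup in detail, and
no existing tool is being described here.

```python
from html.parser import HTMLParser


class SimpleIndexLinks(HTMLParser):
    """Collect hrefs from a simple index page, split by their rel attribute."""

    def __init__(self):
        super().__init__()
        self.internal = []
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        rels = (attrs.get("rel") or "").split()
        if "external" in rels:
            self.external.append(href)
        elif "internal" in rels:
            self.internal.append(href)
        # Links without either marker are ignored in this sketch.


def installable_links(page_html, allow_external=False):
    """Return (links to consider, external links found on the page)."""
    parser = SimpleIndexLinks()
    parser.feed(page_html)
    links = list(parser.internal)
    if allow_external:
        links.extend(parser.external)
    return links, parser.external


# Made-up index page for a hypothetical project called "example".
PAGE = """
<a rel="internal" href="../../packages/source/e/example/example-1.0.tar.gz">example-1.0.tar.gz</a>
<a rel="external" href="http://downloads.example.com/example-1.1.tar.gz">example-1.1.tar.gz</a>
"""

links, found_external = installable_links(PAGE)
if found_external:
    # An installer could use this to tell the user how to opt in.
    print("skipped %d external link(s); opt in to use them" % len(found_external))
```

Returning the external links even when they are not followed is what
lets the tool give a useful message instead of a bare "not found".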

> and instead of
> making external links more secure, the user is left with the option
> to either not enable external links at all, or to let the
> "devil" in :-)

There is no "instead of." There are parallel proposals (see the TUF
thread) to improve the security of the ecosystem, and those proposals
are not mutually exclusive with this one. If you search the PEP text,
note that you don't find the words "secure" or "security" anywhere
within it, or any claims of security achieved by this proposal alone.
There is a brief mention of MITM attacks, which is relevant to the PEP
because avoiding external link-crawling does reduce that attack surface,
even if other proposals will also help with that (even more).

Thanks for taking the time to read all this! Looking forward to hearing
your thoughts,

Carl

From dholth at gmail.com  Thu Mar 14 02:19:12 2013
From: dholth at gmail.com (Daniel Holth)
Date: Wed, 13 Mar 2013 21:19:12 -0400
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <51411598.1010100@students.poly.edu>
References: <51401FB3.7000408@students.poly.edu>
	<CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
	<5140432C.7000904@students.poly.edu>
	<CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
	<51411598.1010100@students.poly.edu>
Message-ID: <CAG8k2+5eR_GQb-rh=xzJDF3-iHjXg+g-u+HNxqoJ9GRbkrhTLQ@mail.gmail.com>

On Wed, Mar 13, 2013 at 8:11 PM, Trishank Karthik Kuppusamy
<tk47 at students.poly.edu> wrote:
> On 03/13/2013 02:15 PM, Daniel Holth wrote:
>>
>>
>> With all the different kinds of metadata, It's interesting to note
>> that currently TUF seems to only be concerned with the available file
>> names and their integrity. (Some of us will think of PEP 426
>> "PKG-INFO" first when we hear the word metadata.)
>
>
> Yes, you are right that the many different kinds of metadata in this
> discussion (TUF metadata, PyPI metadata) makes things a little confusing
> sometimes! :))
>
> My understanding of PEP 426 is that the distribution metadata is specified
> by the developer with the setup.py script.
>
> To take the running Django example, since the Django developers will sign
> everything under the Django role with their own keys that the D role will
> talk about, setup.py, as well as the generated "PKG-INFO", will be signed by
> the Django developers. This means that pip + TUF will be able to verify
> these distribution metadata indirectly via the source distribution package.
>
> Does this answer your question?

Thanks, yes. The individual .tar.gz distributions do contain PKG-INFO
but we would eventually like to expose it in a more efficient way.
Then to be suitably paranoid you would also have to check that it
matched the package you downloaded! :(
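
A sketch of that paranoid check, under the assumption that the index
publishes at least a name and version separately: pull PKG-INFO out of
the downloaded .tar.gz and compare. The helper names and the demo
archive are invented for illustration.

```python
import io
import tarfile
from email.parser import Parser


def read_pkg_info(sdist_bytes):
    """Return (name, version) from the top-level PKG-INFO of a .tar.gz sdist."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            parts = member.name.split("/")
            if len(parts) == 2 and parts[1] == "PKG-INFO":
                text = archive.extractfile(member).read().decode("utf-8")
                # PKG-INFO uses RFC 822 style headers.
                headers = Parser().parsestr(text)
                return headers["Name"], headers["Version"]
    raise ValueError("no top-level PKG-INFO found")


def matches_index(sdist_bytes, index_name, index_version):
    """Compare the archive's own metadata with what the index claimed."""
    return read_pkg_info(sdist_bytes) == (index_name, index_version)


def build_demo_sdist(name, version):
    """Build a tiny in-memory sdist containing only PKG-INFO (demo only)."""
    data = ("Metadata-Version: 1.1\nName: %s\nVersion: %s\n" % (name, version)).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo("%s-%s/PKG-INFO" % (name, version))
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


sdist = build_demo_sdist("example", "1.0")
print(matches_index(sdist, "example", "1.0"))
```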

Also note that on http://crate.io the simple index works the same way
as on pypi, except that the actual packages are on a different (CDN)
host.

Thanks,

Daniel

>> It looks like the D metadata lists all the filenames for Django, and
>> then Django lists them again with hashes and signatures. Why all the
>> lists? Does every Django release re-assert all the versions of Django
>> that are available on the index?
>
>
> Good observation. For D, you are talking about the "paths" attribute here:
>
> https://updateframework.com/pypi/repository/metadata/targets/packages/source/D.txt
>
> For Django, you are talking about the "targets" attribute here:
>
> https://updateframework.com/pypi/repository/metadata/targets/packages/source/D/Django.txt
>
> Why is "paths" in D listing all the "targets" that Django already talks
> about? Presently, this is because our target delegation tool (signercli.py)
> is being paranoid and making sure that D is explicitly delegating only
> targets matching these "paths".
>
> However, the TUF specification allows for D to simply say, "I delegate any
> target whatsoever under Django", by settings "paths" to
> "packages/source/D/Django/**":
>
> https://www.updateframework.com/browser/specs/tuf-spec.txt#L525
>
>
>> How might I deal with producing the official source distribution
>> myself and having a friend produce the official Windows build of a
>> package?
>
>
> There are a few solutions. You could have your friend produce the official
> Windows build for a package, and then you could sign it, implicitly trusting
> your friend but not publishing that trust.
>
> A more secure solution would have you delegate that target to your friend.
>
>
>> As an aside PyPI has been doubling in size every 1.5 - 2 years.
>
>
> Exponential growth strikes again! We have anticipated this, and we have a
> few solutions to curb the growth of TUF metadata. Since TUF metadata is
> simply text, GZIP compression would go a long way. Alternatively, we could
> implement delta updates of TUF metadata.
>
> The more difficult problem is how to ensure that target delegation structure
> scales with PyPI growth. A good design will keep this in mind and plan
> accordingly.
>
> Speaking of which, it may be the case that our design document for
> integrating PyPI with TUF may not be terribly easy to understand. (After
> all, you do need to understand TUF first, but TUF is fairly easy once you
> understand its main ideas.) I plan to publish a friendlier document which
> introduce TUF at a very high-level and instead discuss more pragmatic issues
> (such as workflows).
>

From fqj1994 at gmail.com  Thu Mar 14 05:17:35 2013
From: fqj1994 at gmail.com (Qijiang Fan)
Date: Thu, 14 Mar 2013 12:17:35 +0800
Subject: [Catalog-sig] ResponseNotReady error while trying to do fresh sync
Message-ID: <CAG1ZdCBPNVUQ4TNkzTA=rdCcS-sXQrpSTyL5KSZVyw2pd89fPw@mail.gmail.com>

Hello,
I'm maintaining e.pypi.python.org (with Aron Xu).
We ran into some issues with our network-attached storage, so we decided
to do a fresh sync of PyPI.
While doing that, we hit an exception, httplib.ResponseNotReady,
similar to the one reported in this mail:
"http://mail.python.org/pipermail/catalog-sig/2013-February/005224.html"

For now, we have skipped all packages that trigger the exception and
finished the sync, so some files will be missing.

The three packages which cause that exception are listed below:
https://pypi.python.org/simple/iterator/
https://pypi.python.org/simple/nester_test_ling/
https://pypi.python.org/simple/nesterswe/

Please notify us when this gets fixed, so that we can update the mirror
and complete the sync.

Best Regards,
Qijiang Fan

From tk47 at students.poly.edu  Thu Mar 14 06:47:17 2013
From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy)
Date: Thu, 14 Mar 2013 01:47:17 -0400
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <CAG8k2+5eR_GQb-rh=xzJDF3-iHjXg+g-u+HNxqoJ9GRbkrhTLQ@mail.gmail.com>
References: <51401FB3.7000408@students.poly.edu>
	<CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
	<5140432C.7000904@students.poly.edu>
	<CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
	<51411598.1010100@students.poly
Message-ID: <51416465.6080407@students.poly.edu>

On 3/13/13 9:19 PM, Daniel Holth wrote:
>
> Thanks, yes. The individual .tar.gz distributions do contain PKG-INFO
> but we would eventually like to expose it in a more efficient way.
> Then to be suitably paranoid you would also have to check that it
> matched the package you downloaded! :(

Great, glad we could help. Well, at least the paranoid would just need 
an extra download :))

> Also note that on http://crate.io the simple index works the same way
> as on pypi, except that the actual packages are on a different (CDN)
> host.

Got it. I'll take a look at crate.io to see how it works. Conceivably, 
the TUF metadata and the PyPI files could live in separate locations 
altogether and we would just have to check that the TUF metadata matches 
the PyPI files.
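
The check itself would be simple regardless of where the file came
from. A minimal sketch, assuming the metadata records a length and a
sha256 hash per target (the dictionary layout and the verify_target
name are illustrative, not the actual TUF client API):

```python
import hashlib


def verify_target(data, expected):
    """Check downloaded bytes against the length and sha256 from the metadata."""
    if len(data) != expected["length"]:
        return False
    digest = hashlib.sha256(data).hexdigest()
    return digest == expected["hashes"]["sha256"]


# Made-up file contents standing in for a package fetched from a CDN host.
payload = b"pretend this is example-1.0.tar.gz"
metadata_entry = {
    "length": len(payload),
    "hashes": {"sha256": hashlib.sha256(payload).hexdigest()},
}

print(verify_target(payload, metadata_entry))          # file matches metadata
print(verify_target(payload + b"!", metadata_entry))   # tampered or truncated file
```

Because only the signed metadata has to be trusted, the host serving
the files does not need to be the host serving the metadata.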


From ncoghlan at gmail.com  Thu Mar 14 07:19:15 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 13 Mar 2013 23:19:15 -0700
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <5140377C.90909@egenix.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
	<20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com>
	<CALeMXf4+tRKTF=dVdx8wXfaM=ZknA2+9PXh3h6_FDtD4e1rAbw@mail.gmail.com>
	<513F8922.90008@egenix.com>
	<CADiSq7fh89J0SfCDYEz5y0Zn_z2nqG3U6NaU8ohekn9rW_5CrQ@mail.gmail.com>
	<5140377C.90909@egenix.com>
Message-ID: <CADiSq7cgBxmZqoYbO6XGHqFFdpbY74_sKcMJRuH=HfTw7VaF=Q@mail.gmail.com>

On Wed, Mar 13, 2013 at 1:23 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 13.03.2013 07:28, Nick Coghlan wrote:
>> On Tue, Mar 12, 2013 at 12:59 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> I think we should establish a versioned API like that for PyPI
>>> to make progress easier. All major web APIs use versioning
>>> for this reason.
>>
>> Why set up versioning for something we want to phase out? There will
>> never be a simple-v3, so this is really overengineering the proposed
>> change.
>
> Who says that we want to phase out the /simple/ index ?

I want to render it redundant, because it's a crazy way to distribute
completely inadequate metadata.

Cheers,
Nick.

>
> FWIW, I don't think that two or three small changes to the PyPI
> (see my email to Holger) server warrants calling this over-engineering.
> This is about moving forward in a backwards compatible and future
> proof way.
>
> --
> Marc-Andre Lemburg
> eGenix.com
>
> Professional Python Services directly from the Source  (#1, Mar 13 2013)
>>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
> ________________________________________________________________________
>
> ::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::
>
>    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>            Registered at Amtsgericht Duesseldorf: HRB 46611
>                http://www.egenix.com/company/contact/



-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu Mar 14 07:25:27 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 13 Mar 2013 23:25:27 -0700
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting
 on PYPI
In-Reply-To: <CADiSq7cgBxmZqoYbO6XGHqFFdpbY74_sKcMJRuH=HfTw7VaF=Q@mail.gmail.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
	<20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com>
	<CALeMXf4+tRKTF=dVdx8wXfaM=ZknA2+9PXh3h6_FDtD4e1rAbw@mail.gmail.com>
	<513F8922.90008@egenix.com>
	<CADiSq7fh89J0SfCDYEz5y0Zn_z2nqG3U6NaU8ohekn9rW_5CrQ@mail.gmail.com>
	<5140377C.90909@egenix.com>
	<CADiSq7cgBxmZqoYbO6XGHqFFdpbY74_sKcMJRuH=HfTw7VaF=Q@mail.gmail.com>
Message-ID: <CADiSq7fJ-4kkww19zYsSuknKn-+t762U9y9pH1mg5nO-FmhVWQ@mail.gmail.com>

On Wed, Mar 13, 2013 at 11:19 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Wed, Mar 13, 2013 at 1:23 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 13.03.2013 07:28, Nick Coghlan wrote:
>>> On Tue, Mar 12, 2013 at 12:59 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> I think we should establish a versioned API like that for PyPI
>>>> to make progress easier. All major web APIs use versioning
>>>> for this reason.
>>>
>>> Why set up versioning for something we want to phase out? There will
>>> never be a simple-v3, so this is really overengineering the proposed
>>> change.
>>
>> Who says that we want to phase out the /simple/ index ?
>
> I want to render it redundant, because it's a crazy way to distribute
> completely inadequate metadata.

Specifically, once we have the infrastructure in place to publish
metadata v2.0 (or a suitable subset) to installation tools, the
relatively impoverished contents of the simple index will be a legacy
interface retained only to preserve the correct operation of existing
tools.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu Mar 14 07:43:20 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 13 Mar 2013 23:43:20 -0700
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
 release files
In-Reply-To: <514116DE.50907@oddbird.net>
References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com>
	<8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io>
	<5140D490.3040401@egenix.com> <514116DE.50907@oddbird.net>
Message-ID: <CADiSq7eV5P48RnAOo+cjrvcC7kNeKXq2oXuaCM80FzrDjQuG8Q@mail.gmail.com>

On Wed, Mar 13, 2013 at 5:16 PM, Carl Meyer <carl at oddbird.net> wrote:
> There is no "instead of." There are parallel proposals (see the TUF
> thread) to improve the security of the ecosystem, and those proposals
> are not mutually exclusive with this one. If you search the PEP text,
> note that you don't find the words "secure" or "security" anywhere
> within it, or any claims of security achieved by this proposal alone.
> There is a brief mention of MITM attacks, which is relevant to the PEP
> because avoiding external link-crawling does reduce that attack surface,
> even if other proposals will also help with that (even more).

Right, the changes to provide end-to-end security require more
extensive changes and need to be given appropriate consideration
before we proceed to implementation and deployment. This PEP,
especially with the additional changes you propose here, is an
excellent approach to *near term* improvement, as a parallel effort to
the more complex proposals.

The /simple/ index will also be around for a long time for backwards
compatibility reasons, regardless of any other changes that happen in
the overall distribution ecosystem.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu Mar 14 08:03:00 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Mar 2013 00:03:00 -0700
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <CAMVss_rqfKx-qnRABCoYTWt-s=cy8g0XYKgcM_DG5YQGQmM5Pg@mail.gmail.com>
References: <51401FB3.7000408@students.poly.edu>
	<CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
	<5140432C.7000904@students.poly.edu>
	<CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
	<CAMVss_rqfKx-qnRABCoYTWt-s=cy8g0XYKgcM_DG5YQGQmM5Pg@mail.gmail.com>
Message-ID: <CADiSq7ej5TtFdDqN7QOmWfpjDDKVBjzYCa0bHj519LMCZZk4kw@mail.gmail.com>

On Wed, Mar 13, 2013 at 11:58 AM, Justin Cappos <jcappos at poly.edu> wrote:
> We use the simple directory and filenames because that is what pip uses.
>
> You have a nice suggestion to include other metadata in the TUF metadata.
> We certainly could do this if desirable.   This would require a redesign of the
> PyPI API and we weren't sure if this was wanted.   Our current doc /
> prototype is trying to minimize the changes needed all around.

I think what you currently propose (signing the metadata pip already
understands) is a good first step, especially if we can have PyPI
signing *all* the target metadata in the initial deployment, and defer
the delegation to package developers until the next phase of the
rollout (we obviously want to do that eventually, but it's easier if
we can get a preliminary version working without needing to change the
upload tools).

While such an approach doesn't immediately give us the end-to-end
security we ultimately want to set up, it means a few things become
possible:
1. Rather than requiring every developer to start signing end-to-end
metadata immediately, we can ask a few major projects (e.g. Django,
Zope, NumPy) if they're willing to serve as guinea pigs for the
developer target signing delegations. Once we're happy the signing
process is usable, we can make it generally available as an option to
projects (while also allowing them to continue with PyPI's existing
upload mechanisms and only offer PyPI-user integrity checks rather
than developer-user)
2. Gives the PSF infrastructure team and the PyPI maintainers a chance
to work with the installation tool developers to get the PyPI-user
link sorted out, before needing to work on the developer-PyPI link
3. Considering alternate mirroring solutions based on replicating the
TUF metadata rather than PEP 381

Eventually I would also like to tunnel a subset of the PEP 426
metadata through TUF's "custom" fields, but again, I think we're
better off skipping that for the first iteration. Incremental
enhancements are a good thing :)

Regards,
Nick.

>
> Thanks,
> Justin
>
>
> On Wed, Mar 13, 2013 at 2:15 PM, Daniel Holth <dholth at gmail.com> wrote:
>>
>> On Wed, Mar 13, 2013 at 5:13 AM, Trishank Karthik Kuppusamy
>> <tk47 at students.poly.edu> wrote:
>> > Hello Nick,
>> >
>> >
>> > On 3/13/13 4:09 AM, Nick Coghlan wrote:
>> >>
>> >>
>> >> - the PSF board generally stays out of the technical details of
>> >> running the python.org infrastructure, so it's likely that any root
>> >> keys would be handled by the PSF infrastructure committee. A (2, 4) or
>> >> (3, 5) trust configuration would likely be manageable at this level.
>> >
>> >
>> > Understood. We think a higher (t, n) [where t out of n signatures are
>> > needed
>> > to trust the metadata for a role] is better for the root role simply
>> > because
>> > its crucial metadata (the authorized keys for top-level roles) should
>> > change
>> > very rarely.
>> >
>> >
>> >> - at the target delegation level, PyPI supports the registration of
>> >> new projects through the web service (see
>> >> http://docs.python.org/2/distutils/packageindex.html). If my
>> >> understanding of target delegation is correct, this means the "simple"
>> >> and "packages/source/<letter>" delegations will need to be (1, 1) and
>> >> online.
>> >> - higher levels of the target delegation hierarchy could conceivably
>> >> be kept offline, but there seems little value in doing so if they're
>> >> trusting an online (1, 1) key
>> >
>> >
>> > Fortunately, the "targets/simple" and
>> > "targets/packages/(version)/(letter)/"
>> > roles should not require (1, 1) online keys, as their metadata (simply
>> > target delegations and no actual target files) should also fluctuate
>> > fairly
>> > rarely. I should make this clearer in our design document.
>> >
>> >
>> >> - many PyPI packages are maintained by single developers, so (1, 1) or
>> >> (1, n) is likely to be the only generally feasible level of signing at
>> >> the project level.
>> >
>> >
>> > Yes, the package developers themselves could choose any (t, n) they
>> > like. In
>> > our design, we propose that PyPI could eventually delegate to "stable"
>> > packages which need little change (and use more security with more
>> > offline
>> > keys) and to "unstable" packages which need frequent change (and use
>> > less
>> > security with more online keys).
>> >
>> >
>> >> With the current focus being on getting an improvement from the status
>> >> quo that we can successfully deploy in a reasonable period of time,
>> >> the target delegation side of things probably needs to be
>> >> substantially simpler in the initial iteration. Yes, it leaves us open
>> >> to certain vulnerabilities we would like to remove in the long run,
>> >> but we need to be very cautious in the additional demands we place on
>> >> the users uploading to PyPI. It may even mean the initial iteration
>> >> allows projects to rely on a PyPI provided signing key for their TUF
>> >> metadata, using the existing upload mechanisms to add the files to
>> >> PyPI.
>> >
>> >
>> > I agree that there is a delicate problem of balancing security with
>> > usability here, especially in the beginning.
>> >
>> > You raised a very good issue there: on first migration, how would PyPI
>> > accommodate packages which have not had their target files delegated to
>> > their developers? We imagine that in this case, PyPI could assume
>> > initial
>> > responsibility for these packages, and later PyPI would delegate those
>> > packages to their respective developers.
>> >
>> > Thanks for your input,
>> > Trishank
>>
>> With all the different kinds of metadata, it's interesting to note
>> that currently TUF seems to only be concerned with the available file
>> names and their integrity. (Some of us will think of PEP 426
>> "PKG-INFO" first when we hear the word metadata.)
>>
>> It looks like the D metadata lists all the filenames for Django, and
>> then Django lists them again with hashes and signatures. Why all the
>> lists? Does every Django release re-assert all the versions of Django
>> that are available on the index?
>>
>> How might I deal with producing the official source distribution
>> myself and having a friend produce the official Windows build of a
>> package?
>>
>> As an aside PyPI has been doubling in size every 1.5 - 2 years.
>>
>> Thanks
>>
>> Daniel Holth
>
>
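
The (t, n) trust configurations quoted above (t valid signatures out of
n authorised keys) can be illustrated with a small sketch. This is not
TUF's implementation: the function names are hypothetical, and an HMAC
stands in for a real public-key signature only so that the example runs
with the standard library alone.

```python
import hashlib
import hmac


def sign(secret, message):
    """Stand-in signature: an HMAC, NOT a real public-key signature."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def meets_threshold(message, signatures, trusted_keys, t):
    """Return True if at least t distinct trusted keys signed message.

    signatures:   {keyid: signature} as attached to the metadata
    trusted_keys: {keyid: secret} for the n keys authorised for the role
    """
    valid = set()
    for keyid, signature in signatures.items():
        secret = trusted_keys.get(keyid)
        # Signatures from unknown keys are ignored rather than counted.
        if secret is not None and hmac.compare_digest(sign(secret, message), signature):
            valid.add(keyid)
    return len(valid) >= t


# A hypothetical (2, 4) configuration: 4 authorised keys, 2 signatures needed.
keys = {"k1": b"one", "k2": b"two", "k3": b"three", "k4": b"four"}
metadata = b'{"role": "root"}'
signatures = {"k1": sign(keys["k1"], metadata), "k3": sign(keys["k3"], metadata)}
print(meets_threshold(metadata, signatures, keys, 2))
```

With two of the four keys signing, the (2, 4) check passes; the same
signatures would not satisfy a (3, 5) style threshold of three.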



-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From mal at egenix.com  Thu Mar 14 08:54:05 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 14 Mar 2013 08:54:05 +0100
Subject: [Catalog-sig] Publishing metadata (was: V2 pre-PEP:
 transitioning to release file hosting on PYPI)
In-Reply-To: <CADiSq7fJ-4kkww19zYsSuknKn-+t762U9y9pH1mg5nO-FmhVWQ@mail.gmail.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
	<20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com>
	<CALeMXf4+tRKTF=dVdx8wXfaM=ZknA2+9PXh3h6_FDtD4e1rAbw@mail.gmail.com>
	<513F8922.90008@egenix.com>
	<CADiSq7fh89J0SfCDYEz5y0Zn_z2nqG3U6NaU8ohekn9rW_5CrQ@mail.gmail.com>
	<5140377C.90909@egenix.com>
	<CADiSq7cgBxmZqoYbO6XGHqFFdpbY74_sKcMJRuH=HfTw7VaF=Q@mail.gmail.com>
	<CADiSq7fJ-4kkww19zYsSuknKn-+t762U9y9pH1mg5nO-FmhVWQ@mail.gmail.com>
Message-ID: <5141821D.6060601@egenix.com>

On 14.03.2013 07:25, Nick Coghlan wrote:
> On Wed, Mar 13, 2013 at 11:19 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On Wed, Mar 13, 2013 at 1:23 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 13.03.2013 07:28, Nick Coghlan wrote:
>>>> On Tue, Mar 12, 2013 at 12:59 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>> I think we should establish a versioned API like that for PyPI
>>>>> to make progress easier. All major web APIs use versioning
>>>>> for this reason.
>>>>
>>>> Why set up versioning for something we want to phase out? There will
>>>> never be a simple-v3, so this is really overengineering the proposed
>>>> change.
>>>
>>> Who says that we want to phase out the /simple/ index ?
>>
>> I want to render it redundant, because it's a crazy way to distribute
>> completely inadequate metadata.
> 
> Specifically, once we have the infrastructure in place to publish
> metadata v2.0 (or a suitable subset) to installation tools, the
> relatively impoverished contents of the simple index will be a legacy
> interface retained only to preserve the correct operation of existing
> tools.

Those two are orthogonal.

The index itself is just a bag of things and, as such, one that's very
well suited to publish data, since it can easily be exposed in form
of static files, which can be put on CDNs or mirrored using
rsync.

It's easy to add the metadata file to that index for tools to
pick up - in addition to the other data exposed on the index
pages and perfectly backwards compatible.

As mentioned before, I think we should start publishing the
existing metadata stored in the PyPI database on those
index pages as PKG-INFO files, so that tools can easily
access the data without having to go through XML-RPC.
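
As a rough sketch of what a tool could do with such a file: PKG-INFO
uses RFC 822 style headers, so the standard library's email parser can
read it. The sample content and the read_pkg_info() helper below are
made up for illustration; how the file would be fetched from the index
page is left open.

```python
from email.parser import Parser

# Hypothetical PKG-INFO content (metadata 1.1 style), for illustration only.
PKG_INFO = """\
Metadata-Version: 1.1
Name: example-project
Version: 1.0.2
Summary: An example project
Classifier: Programming Language :: Python :: 2.7
Classifier: License :: OSI Approved :: MIT License
"""


def read_pkg_info(text):
    """Parse PKG-INFO text (RFC 822 style headers) into a dict."""
    msg = Parser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "summary": msg["Summary"],
        # Multi-valued fields such as Classifier need get_all().
        "classifiers": msg.get_all("Classifier") or [],
    }


info = read_pkg_info(PKG_INFO)
print(info["name"], info["version"])
```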

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 14 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From tk47 at students.poly.edu  Thu Mar 14 08:21:47 2013
From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy)
Date: Thu, 14 Mar 2013 03:21:47 -0400
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <CADiSq7ej5TtFdDqN7QOmWfpjDDKVBjzYCa0bHj519LMCZZk4kw@mail.gmail.com>
References: <51401FB3.7000408@students.poly.edu>
	<CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
	<5140432C.7000904@students.poly.edu>
	<CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
	<CAMVss_rqfKx-qnRABCoYTWt-s=cy8g0XYKgcM_DG5YQGQmM5Pg@mail.gmail.com>
	<CADiSq7ej5TtFdDqN7QOmWfpjDDKVBjzYCa0bHj519LMCZZk4kw@mail.gmail.com>
Message-ID: <51417A8B.8030909@students.poly.edu>

On 3/14/13 3:03 AM, Nick Coghlan wrote:
>
> I think what you currently propose (signing the metadata pip already
> understands) is a good first step, especially if we can have PyPI
> signing *all* the target metadata in the initial deployment, and defer
> the delegation to package developers until the next phase of the
> rollout (we obviously want to do that eventually, but it's easier if
> we can get a preliminary version working without needing to change the
> upload tools).
>
> While such an approach doesn't immediately give us the end-to-end
> security we ultimately want to set up, it means a few things become
> possible:
> 1. Rather than requiring every developer to start signing end-to-end
> metadata immediately, we can ask a few major projects (e.g. Django,
> Zope, NumPy) if they're willing to serve as guinea pigs for the
> developer target signing delegations. Once we're happy the signing
> process is usable, we can make it generally available as an option to
> projects (while also allowing them to continue with PyPI's existing
> upload mechanisms and only offer PyPI-user integrity checks rather
> than developer-user)
> 2. Gives the PSF infrastructure team and the PyPI maintainers a chance
> to work with the installation tool developers to get the PyPI-user
> link sorted out, before needing to work on the developer-PyPI link
> 3. Considering alternate mirroring solutions based on replicating the
> TUF metadata rather than PEP 381
>
> Eventually I would also like to tunnel a subset of the PEP 426
> metadata through TUF's "custom" fields, but again, I think we're
> better off skipping that for the first iteration. Incremental
> enhancements are a good thing :)

This sounds good to me --- I like the idea of incremental enhancements. 
Justin, what are your thoughts from a security perspective?


From holger at merlinux.eu  Thu Mar 14 09:58:01 2013
From: holger at merlinux.eu (holger krekel)
Date: Thu, 14 Mar 2013 08:58:01 +0000
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
 release files
In-Reply-To: <CADiSq7eV5P48RnAOo+cjrvcC7kNeKXq2oXuaCM80FzrDjQuG8Q@mail.gmail.com>
References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com>
	<8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io>
	<5140D490.3040401@egenix.com> <514116DE.50907@oddbird.net>
	<CADiSq7eV5P48RnAOo+cjrvcC7kNeKXq2oXuaCM80FzrDjQuG8Q@mail.gmail.com>
Message-ID: <20130314085800.GT9677@merlinux.eu>

On Wed, Mar 13, 2013 at 23:43 -0700, Nick Coghlan wrote:
> On Wed, Mar 13, 2013 at 5:16 PM, Carl Meyer <carl at oddbird.net> wrote:
> > There is no "instead of." There are parallel proposals (see the TUF
> > thread) to improve the security of the ecosystem, and those proposals
> > are not mutually exclusive with this one. If you search the PEP text,
> > note that you don't find the words "secure" or "security" anywhere
> > within it, or any claims of security achieved by this proposal alone.
> > There is a brief mention of MITM attacks, which is relevant to the PEP
> > because avoiding external link-crawling does reduce that attack surface,
> > even if other proposals will also help with that (even more).
> 
> Right, the changes to provide end-to-end security require more
> extensive changes and need to be given appropriate consideration
> before we proceed to implementation and deployment. This PEP,
> especially with the additional changes you propose here is an
> excellent approach to *near term* improvement, as a parallel effort to
> the more complex proposals.
> 
> The /simple/ index will also be around for a long time for backwards
> compatibility reasons, regardless of any other changes that happen in
> the overall distribution ecosystem.

I haven't followed the latest TUF discussions and related docs in
depth yet, but if those developments will regard "simple/" as a deprecated
interface, I think this PEP here should maybe not introduce
"simple/-with-externals" as it will just make the situation more 
complicated for everyone to understand in a few months from now.

best,
holger


> Cheers,
> Nick.
> 
> -- 
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From mal at egenix.com  Thu Mar 14 11:07:07 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 14 Mar 2013 11:07:07 +0100
Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource
 sorting algorithm
In-Reply-To: <CALeMXf5fOoVGHz9E2DV-QZFQJUwMpNkJcYQcJDtJSHJ_WRqbHA@mail.gmail.com>
References: <513F70B5.5030501@egenix.com> <513F893F.9010707@egenix.com>
	<CALeMXf5fOoVGHz9E2DV-QZFQJUwMpNkJcYQcJDtJSHJ_WRqbHA@mail.gmail.com>
Message-ID: <5141A14B.9030301@egenix.com>

On 12.03.2013 22:26, PJ Eby wrote:
> On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 12.03.2013 19:15, M.-A. Lemburg wrote:
>>> I've run into a weird issue with easy_install, that I'm trying to solve:
>>>
>>> If I place two files named
>>>
>>> egenix_mxodbc_connect_client-2.0.2-py2.6.egg
>>> egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip
>>>
>>> into the same directory and let easy_install running on Linux
>>> scan this, it considers the second file for Windows as best
>>> match.
>>>
>>> Is the algorithm used for determining the best match documented
>>> somewhere ?
>>>
>>> I've had a look at the implementation, but this left me rather
>>> clueless.
>>>
>>> I thought that setuptools would prefer the .egg file over
>>> the prebuilt .zip file - binary files being easier to install
>>> than "source" files.
>>
>> After some experiments, I found that the following change
>> in filename (swapping platform and python version, in addition
>> to using '-' instead of '.') works:
>>
>> egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip
>>
>> OTOH, this one doesn't (notice the difference ?):
>>
>> egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip
>>
>> The logic behind all this looks rather fragile to me.
> 
> easy_install only guarantees sane version parsing for distribution
> files built using setuptools' naming algorithms.  If you use
> distutils, it can only make guesses, because the distutils does not
> have a completely unambiguous file naming scheme.  And if you are
> naming the files by hand, God help you.  ;-)

The problem appears to be a bug in setuptools' package_index.py.

The function interpret_distro_name() creates a set of possible
separations of the found name into project name and version.

It does find the right separation, but for some reason, the
code using that function does not check the found project
names against the project name the user is trying to install,
but simply takes the last entry of the list returned by the
above function.

As a result, easy_install downloads and tries to install
project files that don't match the project name in some
cases.

Here's another example where it fails (say you're on an x64 Linux box):

# easy_install egenix-pyopenssl

As an example, say it finds these distribution files:

    'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-linux-x86_64-prebuilt.zip',
    'egenix_pyopenssl-0.13.1.1.0.1.5-py2.7-linux-x86_64.egg',
    'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-macosx-10.5-x86_64-prebuilt.zip',
    'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx-10.5-x86_64-prebuilt.zip',

It then creates different interpretations of those names, puts
them in a list and sorts them. Here's the end of that list:

egenix-pyopenssl; 0.13.1.1.0.1.5 <<-- this would be the correct .egg file
egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-linux-x86-64-prebuilt
egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-macosx-10.5-x86-64-prebuilt
egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt
egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs2-macosx; 10.5-x86-64-prebuilt
egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx; 10.5-x86-64-prebuilt

It picks the last entry, which would be for a project called
"egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx" - not the one
the user searched.
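
To illustrate the check I would expect, here is a minimal sketch. It is
not the setuptools code: normalize(), candidate_splits() and
splits_for_project() are hypothetical helpers. They enumerate every
project/version split of a file name and keep only the splits whose
project part matches the requested project.

```python
import re


def normalize(name):
    """Lower-case a project name and collapse '_' and similar runs to '-'."""
    return re.sub(r"[^a-z0-9.]+", "-", name.lower())


def candidate_splits(basename):
    """Yield every (project, version) split of a '-'-separated basename."""
    parts = basename.split("-")
    for i in range(1, len(parts)):
        yield "-".join(parts[:i]), "-".join(parts[i:])


def splits_for_project(basename, wanted):
    """Keep only the splits whose project name matches the requested one."""
    target = normalize(wanted)
    return [(n, v) for n, v in candidate_splits(basename) if normalize(n) == target]


basename = "egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx-10.5-x86_64-prebuilt"
print(splits_for_project(basename, "egenix-pyopenssl"))
```

With this filter, the split that treats
"egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx" as the project name
is dropped before any sorting takes place.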

I'm trying to find a way to get it to use the correct .egg file.
The .egg file does have precedence over the other files, since
easy_install regards those as source files with lower precedence.

This is important, because the /simple/ index page will have links
not only to .egg files, but also to our prebuilt .zip files,
which use a source file compatible setup.py interface.

-- 
Marc-Andre Lemburg
eGenix.com


From tk47 at students.poly.edu  Thu Mar 14 13:14:50 2013
From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy)
Date: Thu, 14 Mar 2013 08:14:50 -0400
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
 release files
In-Reply-To: <20130314085800.GT9677@merlinux.eu>
References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com>
	<8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io>
	<5140D490.3040401@egenix.com> <514116DE.50907@oddbird.net>
	<CADiSq7eV5P48RnAOo+cjrvcC7kNeKXq2oXuaCM80FzrDjQuG8Q@mail.gmail.com>
	<20130314085800.GT9677@merlinux.eu>
Message-ID: <5141BF3A.6060606@students.poly.edu>

On 3/14/13 4:58 AM, holger krekel wrote:
>
> I haven't followed the latest TUF discussions and related docs in
> depths yet but if those developments will regard "simple/" as a deprecated
> interface, i think this PEP here should maybe not introduce
> "simple/-with-externals" as it will just make the situation more
> complicated for everyone to understand in a few months from now.

I haven't yet followed your PEP in as much depth as I would like, but I 
wish to assure you that we do not regard "/simple/" as a deprecated 
interface. In fact, we aim to preserve backwards-compatibility as much 
as possible! :)


From jim at zope.com  Thu Mar 14 13:26:07 2013
From: jim at zope.com (Jim Fulton)
Date: Thu, 14 Mar 2013 08:26:07 -0400
Subject: [Catalog-sig] Packaging & Distribution Mini-Summit at PyCon US
In-Reply-To: <CAPDm-FiWSPAVsgYF6vHMMzChm-_3MvZX=QemzDqh0Q5H+MCKvg@mail.gmail.com>
References: <CADiSq7dNoiE=Qu1x_rP1Y84bNPu=qJ37RTJx4cUDxtGN68ndOQ@mail.gmail.com>
	<CAPDm-FiWSPAVsgYF6vHMMzChm-_3MvZX=QemzDqh0Q5H+MCKvg@mail.gmail.com>
Message-ID: <CAPDm-FgDqA=Gkj1s+LBJycGzK6+bx87RmU5UyST8jaG+ceGuFg@mail.gmail.com>

On Thu, Feb 7, 2013 at 10:19 AM, Jim Fulton <jim at zope.com> wrote:
> On Wed, Feb 6, 2013 at 3:15 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> As folks may be aware, I am moderating a panel called "Directions in
>> Packaging" on the Saturday afternoon at PyCon US.
>>
>> Before that though, I am also organising what I am calling a
>> "Packaging & Distribution Mini-Summit" as an open space on the Friday
>> night (we have one of the larger open space rooms reserved, so we
>> should have a fair bit of space if a decent crowd turns up).
>
> I wasn't going to be at PyCon, but I changed my plans specifically to
> participate in this. Thanks for setting this up.
>
>> An overview of what I'm hoping we can achieve at the session is at
>> https://us.pycon.org/2013/community/openspaces/packaginganddistributionminisummit/
>> (that page should be editable by anyone that has registered for PyCon
>> US).
>
> Cool.  A major difficulty in these sorts of discussions is that people
> have different problems they want to solve and argue about solutions
> without clearly stating their problems.
>
> If you don't mind, I'll try to find some time in the next few days to
> add a section
> to that page to list goals/problems.

OK, well, hopefully better late than never.  I took a stab at adding
this to the end of:

https://us.pycon.org/2013/community/openspaces/packaginganddistributionminisummit/

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton

From jcappos at poly.edu  Thu Mar 14 15:13:03 2013
From: jcappos at poly.edu (Justin Cappos)
Date: Thu, 14 Mar 2013 10:13:03 -0400
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
 release files
In-Reply-To: <5141BF3A.6060606@students.poly.edu>
References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com>
	<8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io>
	<5140D490.3040401@egenix.com> <514116DE.50907@oddbird.net>
	<CADiSq7eV5P48RnAOo+cjrvcC7kNeKXq2oXuaCM80FzrDjQuG8Q@mail.gmail.com>
	<20130314085800.GT9677@merlinux.eu>
	<5141BF3A.6060606@students.poly.edu>
Message-ID: <CAMVss_qb5C3+=OJZwZxRQ9DCFAG51a35g0wjFov8vneGV=KMHQ@mail.gmail.com>

Maybe a different way to say it is that the current TUF integration doc
assumes that it is desirable to make minimal change to PyPI's layout and
pip, easy_install, etc. while adding security.   We made several choices
based upon this assumption, including using and retaining the /simple dir.


If the community wants a more 'clean-slate' design, we could put that
together also.   This requires a lot of information specific to your setup
and use cases so we'd appreciate collaboration with you guys to write that
up.

Thanks,
Justin


On Thu, Mar 14, 2013 at 8:14 AM, Trishank Karthik Kuppusamy <
tk47 at students.poly.edu> wrote:

> On 3/14/13 4:58 AM, holger krekel wrote:
>
>>
>> I haven't followed the latest TUF discussions and related docs in
>> depths yet but if those developments will regard "simple/" as a deprecated
>> interface, i think this PEP here should maybe not introduce
>> "simple/-with-externals" as it will just make the situation more
>> complicated for everyone to understand in a few months from now.
>>
>
> I haven't yet followed your PEP in as much depth as I would like, but I
> wish to assure you that we do not regard "/simple/" as a deprecated
> interface. In fact, we aim to preserve backwards-compatibility as much as
> possible! :)
>
>

From ncoghlan at gmail.com  Thu Mar 14 15:39:46 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Mar 2013 07:39:46 -0700
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
 release files
In-Reply-To: <CAMVss_qb5C3+=OJZwZxRQ9DCFAG51a35g0wjFov8vneGV=KMHQ@mail.gmail.com>
References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com>
	<8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io>
	<5140D490.3040401@egenix.com> <514116DE.50907@oddbird.net>
	<CADiSq7eV5P48RnAOo+cjrvcC7kNeKXq2oXuaCM80FzrDjQuG8Q@mail.gmail.com>
	<20130314085800.GT9677@merlinux.eu>
	<5141BF3A.6060606@students.poly.edu>
	<CAMVss_qb5C3+=OJZwZxRQ9DCFAG51a35g0wjFov8vneGV=KMHQ@mail.gmail.com>
Message-ID: <CADiSq7d-nkQypyYzWooom7LW41iuPOZ_e3AGBJ=HgEtbbDxCdQ@mail.gmail.com>

On Thu, Mar 14, 2013 at 7:13 AM, Justin Cappos <jcappos at poly.edu> wrote:
> Maybe a different way to say it is that the current TUF integration doc
> assumes that it is desirable to make minimal change to PyPI's layout and
> pip, easy_install, etc. while adding security.   We made several choices
> based upon this assumption, including using and retaining the /simple dir.

I think what you're proposing now is a pretty good place to start
(although I'm suggesting making it even simpler in the near term by
starting by focusing on the PyPI->end user link, and then moving to
delegating signing of the per-project metadata to the individual
projects as a later step)

> If the community wants a more 'clean-slate' design, we could put that
> together also.   This requires a lot of information specific to your setup
> and use cases so we'd appreciate collaboration with you guys to write that
> up.

I'd like to do a "distribution 2.0" at some point where we make the
simple index redundant by including that info (and more) directly in
the TUF metadata, but I think that's a "later" project - securing what
we have now is a better place to start.

Cheers,
Nick.

>
> Thanks,
> Justin
>
>
> On Thu, Mar 14, 2013 at 8:14 AM, Trishank Karthik Kuppusamy
> <tk47 at students.poly.edu> wrote:
>>
>> On 3/14/13 4:58 AM, holger krekel wrote:
>>>
>>>
>>> I haven't followed the latest TUF discussions and related docs in
>>> depths yet but if those developments will regard "simple/" as a
>>> deprecated
>>> interface, i think this PEP here should maybe not introduce
>>> "simple/-with-externals" as it will just make the situation more
>>> complicated for everyone to understand in a few months from now.
>>
>>
>> I haven't yet followed your PEP in as much depth as I would like, but I
>> wish to assure you that we do not regard "/simple/" as a deprecated
>> interface. In fact, we aim to preserve backwards-compatibility as much as
>> possible! :)
>>
>>



-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu Mar 14 15:45:23 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Mar 2013 07:45:23 -0700
Subject: [Catalog-sig] Publishing metadata (was: V2 pre-PEP:
 transitioning to release file hosting on PYPI)
In-Reply-To: <5141821D.6060601@egenix.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com>
	<20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com>
	<CALeMXf4+tRKTF=dVdx8wXfaM=ZknA2+9PXh3h6_FDtD4e1rAbw@mail.gmail.com>
	<513F8922.90008@egenix.com>
	<CADiSq7fh89J0SfCDYEz5y0Zn_z2nqG3U6NaU8ohekn9rW_5CrQ@mail.gmail.com>
	<5140377C.90909@egenix.com>
	<CADiSq7cgBxmZqoYbO6XGHqFFdpbY74_sKcMJRuH=HfTw7VaF=Q@mail.gmail.com>
	<CADiSq7fJ-4kkww19zYsSuknKn-+t762U9y9pH1mg5nO-FmhVWQ@mail.gmail.com>
	<5141821D.6060601@egenix.com>
Message-ID: <CADiSq7cFEam55dsf4Uz5T4x-O8HsHK_=1QOPMbH83AmxfLaDUQ@mail.gmail.com>

On Thu, Mar 14, 2013 at 12:54 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> The index itself is just a bag of things and, as such, one that's very
> well suited to publish data, since it can easily be exposed in form
> of static files, which can be put on CDNs or mirrored using
> rsync.

The TUF metadata is also just a collection of static files which can
be put on CDNs and mirrored using rsync. That's one of the reasons TUF
is an interesting approach :)

> It's easy to add the metadata file to that index for tools to
> pick up - in addition to the other data exposed on the index
> pages and perfectly backwards compatible.
>
> As mentioned before, I think we should start publishing the
> existing metadata stored in the PyPI database on those
> index pages as PKG-INFO files, so that tools can easily
> access the data without having to go through XML-RPC.

Yes, I think that's a good near term approach. However, there's still
a lot of duplication of functionality between the TUF metadata and the
simple index, so if we get TUF-based security up and running, my long
term aim will be to make it so that once you have downloaded the TUF
metadata, you shouldn't *need* anything from the simple index, and
would be able to go directly to downloading the release files. That's
a longer term idea, though, and we may even decide it isn't worth the
hassle if PKG-INFO is made available through /simple.
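
A rough sketch of that longer term idea, assuming the client already
holds verified target metadata that maps file paths to a length and a
SHA-256 digest. The dictionary layout, the paths and both helper names
are assumptions for illustration, not TUF's actual API.

```python
import hashlib
import posixpath


def files_for_project(targets, project):
    """Return target paths whose file name starts with '<project>-'."""
    prefix = project.lower() + "-"
    return sorted(
        path for path in targets
        if posixpath.basename(path).lower().startswith(prefix)
    )


def verify(data, entry):
    """Check downloaded bytes against the recorded length and SHA-256."""
    return (
        len(data) == entry["length"]
        and hashlib.sha256(data).hexdigest() == entry["hashes"]["sha256"]
    )


# Hypothetical metadata and file content, for illustration only.
payload = b"pretend this is a release archive"
targets = {
    "packages/source/e/example/example-1.0.tar.gz": {
        "length": len(payload),
        "hashes": {"sha256": hashlib.sha256(payload).hexdigest()},
    },
    "packages/source/o/other/other-2.0.tar.gz": {
        "length": 0,
        "hashes": {"sha256": hashlib.sha256(b"").hexdigest()},
    },
}

for path in files_for_project(targets, "example"):
    print(path, verify(payload, targets[path]))
```

The lookup never touches the simple index: the file list and the
integrity data both come from the metadata.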

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From jcappos at poly.edu  Thu Mar 14 15:58:14 2013
From: jcappos at poly.edu (Justin Cappos)
Date: Thu, 14 Mar 2013 10:58:14 -0400
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <51417A8B.8030909@students.poly.edu>
References: <51401FB3.7000408@students.poly.edu>
	<CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
	<5140432C.7000904@students.poly.edu>
	<CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
	<CAMVss_rqfKx-qnRABCoYTWt-s=cy8g0XYKgcM_DG5YQGQmM5Pg@mail.gmail.com>
	<CADiSq7ej5TtFdDqN7QOmWfpjDDKVBjzYCa0bHj519LMCZZk4kw@mail.gmail.com>
	<51417A8B.8030909@students.poly.edu>
Message-ID: <CAMVss_qy2dDHwGNf+3=GOZ74bGsK7tCp70n+KTuond-phzAeZw@mail.gmail.com>

Yes, Nick's suggestions are good ones.

I'd agree that getting an initial deployment together that doesn't include
things like custom metadata is probably for the best.   We can certainly
add things incrementally.

Thanks,
Justin




On Thu, Mar 14, 2013 at 3:21 AM, Trishank Karthik Kuppusamy <
tk47 at students.poly.edu> wrote:

> On 3/14/13 3:03 AM, Nick Coghlan wrote:
>
>>
>> I think what you currently propose (signing the metadata pip already
>> understands) is a good first step, especially if we can have PyPI
>> signing *all* the target metadata in the initial deployment, and defer
>> the delegation to package developers until the next phase of the
>> rollout (we obviously want to do that eventually, but it's easier if
>> we can get a preliminary version working without needing to change the
>> upload tools).
>>
>> While such an approach doesn't immediately give us the end-to-end
>> security we ultimately want to set up, it means a few things become
>> possible:
>> 1. Rather than requiring every developer to start signing end-to-end
>> metadata immediately, we can ask a few major projects (e.g. Django,
>> Zope, NumPy) if they're willing to serve as guinea pigs for the
>> developer target signing delegations. Once we're happy the signing
>> process is usable, we can make it generally available as an option to
>> projects (while also allowing them to continue with PyPI's existing
>> upload mechanisms and only offer PyPI-user integrity checks rather
>> than developer-user)
>> 2. Gives the PSF infrastructure team and the PyPI maintainers a chance
>> to work with the installation tool developers to get the PyPI-user
>> link sorted out, before needing to work on the developer-PyPI link
>> 3. Considering alternate mirroring solutions based on replicating the
>> TUF metadata rather than PEP 381
>>
>> Eventually I would also like to tunnel a subset of the PEP 426
>> metadata through TUF's "custom" fields, but again, I think we're
>> better off skipping that for the first iteration. Incremental
>> enhancements are a good thing :)
>>
>
> This sounds good to me --- I like the idea of incremental enhancements.
> Justin, what are your thoughts from a security perspective?
>
>

From pje at telecommunity.com  Thu Mar 14 17:39:44 2013
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 14 Mar 2013 12:39:44 -0400
Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource
 sorting algorithm
In-Reply-To: <5141A14B.9030301@egenix.com>
References: <513F70B5.5030501@egenix.com> <513F893F.9010707@egenix.com>
	<CALeMXf5fOoVGHz9E2DV-QZFQJUwMpNkJcYQcJDtJSHJ_WRqbHA@mail.gmail.com>
	<5141A14B.9030301@egenix.com>
Message-ID: <CALeMXf6kx5Zrhu5_c5jUHqtCBjUbhdXUqNPdwmXBRwg1q2XMpg@mail.gmail.com>

On Thu, Mar 14, 2013 at 6:07 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 12.03.2013 22:26, PJ Eby wrote:
>> On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 12.03.2013 19:15, M.-A. Lemburg wrote:
>>>> I've run into a weird issue with easy_install, that I'm trying to solve:
>>>>
>>>> If I place two files named
>>>>
>>>> egenix_mxodbc_connect_client-2.0.2-py2.6.egg
>>>> egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip
>>>>
>>>> into the same directory and let easy_install running on Linux
>>>> scan this, it considers the second file for Windows as best
>>>> match.
>>>>
>>>> Is the algorithm used for determining the best match documented
>>>> somewhere ?
>>>>
>>>> I've had a look at the implementation, but this left me rather
>>>> clueless.
>>>>
>>>> I thought that setuptools would prefer the .egg file over
>>>> the prebuilt .zip file - binary files being easier to install
>>>> than "source" files.
>>>
>>> After some experiments, I found that the follow change
>>> in filename (swapping platform and python version, in addition
>>> to use '-' instead of '.) works:
>>>
>>> egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip
>>>
>>> OTOH, this one doesn't (notice the difference ?):
>>>
>>> egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip
>>>
>>> The logic behind all this looks rather fragile to me.
>>
>> easy_install only guarantees sane version parsing for distribution
>> files built using setuptools' naming algorithms.  If you use
>> distutils, it can only make guesses, because the distutils does not
>> have a completely unambiguous file naming scheme.  And if you are
>> naming the files by hand, God help you.  ;-)
>
> The problem appears to be a bug in setuptools' package_index.py.
>
> The function interpret_distro_name() creates a set of possible
> separations of the found name into project name and version.
>
> It does find the right separation, but for some reason, the
> code using that function does not check the found project
> names against the project name the user is trying to install,
> but simply takes the last entry of the list returned by the
> above function.
>
> As a result, easy_install downloads and tries to install
> project files that don't match the project name in some
> cases.
>
> Here's another example where it fails (say you're on a x64 Linux box):
>
> # easy_install egenix-pyopenssl
>
> As example, say it finds these distribution files:
>
>     'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-linux-x86_64-prebuilt.zip',
>     'egenix_pyopenssl-0.13.1.1.0.1.5-py2.7-linux-x86_64.egg',
>     'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-macosx-10.5-x86_64-prebuilt.zip',
>     'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx-10.5-x86_64-prebuilt.zip',
>
> It then creates different interpretations of those names, puts
> them in a list and sorts them. Here's the end of that list:
>
> egenix-pyopenssl; 0.13.1.1.0.1.5 <<-- this would be the correct .egg file
> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-linux-x86-64-prebuilt
> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-macosx-10.5-x86-64-prebuilt
> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt
> egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs2-macosx; 10.5-x86-64-prebuilt
> egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx; 10.5-x86-64-prebuilt
>
> It picks the last entry, which would be for a project called
> "egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx" - not the one
> the user searched.

Actually, that's not quite true.  It's picking:

egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt

Because it thinks that
'0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt' is a higher
version than 0.13.1.1.0.1.5.

It does also record the possibility you mentioned, but it doesn't pick
that one.  The project names actually *do* have to match.

If you open a ticket on the setuptools tracker, I'll try to see if I
can get it to recognize that strings like py2.7, macosx, ucs, and the
like are terminators for a version number.  I don't know how
successful I'll be, though.  Basically, those zip files are (I assume)
bdist_dumb distributions being taken for source distributions, and
easy_install doesn't actually support bdist_dumb files at the moment.
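
As an illustration of the splitting discussed above, here is a minimal
sketch (not the setuptools code itself) of how a hyphen-separated file
stem can be read as several (project name, version) pairs, and how
filtering on the requested project name leaves only one reading.  The
helper names candidate_splits and matching_versions are made up for this
example.

```python
def candidate_splits(stem):
    """Yield every (name, version) reading of a hyphen-separated stem."""
    parts = stem.split("-")
    for i in range(1, len(parts)):
        yield "-".join(parts[:i]), "-".join(parts[i:])


def matching_versions(stem, project):
    """Keep only the readings whose name equals the requested project."""
    return [version for name, version in candidate_splits(stem) if name == project]


stem = "egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt"
print(matching_versions(stem, "egenix-pyopenssl"))
```

The reading with project name
"egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx" is generated too, but
it is dropped by the name filter, so the long "version" string is the one
that ends up competing with 0.13.1.1.0.1.5.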

From mal at egenix.com  Thu Mar 14 19:11:59 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 14 Mar 2013 19:11:59 +0100
Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource
 sorting algorithm
In-Reply-To: <CALeMXf6kx5Zrhu5_c5jUHqtCBjUbhdXUqNPdwmXBRwg1q2XMpg@mail.gmail.com>
References: <513F70B5.5030501@egenix.com> <513F893F.9010707@egenix.com>
	<CALeMXf5fOoVGHz9E2DV-QZFQJUwMpNkJcYQcJDtJSHJ_WRqbHA@mail.gmail.com>
	<5141A14B.9030301@egenix.com>
	<CALeMXf6kx5Zrhu5_c5jUHqtCBjUbhdXUqNPdwmXBRwg1q2XMpg@mail.gmail.com>
Message-ID: <514212EF.4030505@egenix.com>

On 14.03.2013 17:39, PJ Eby wrote:
> On Thu, Mar 14, 2013 at 6:07 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 12.03.2013 22:26, PJ Eby wrote:
>>> On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> On 12.03.2013 19:15, M.-A. Lemburg wrote:
>>>>> I've run into a weird issue with easy_install, that I'm trying to solve:
>>>>>
>>>>> If I place two files named
>>>>>
>>>>> egenix_mxodbc_connect_client-2.0.2-py2.6.egg
>>>>> egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip
>>>>>
>>>>> into the same directory and let easy_install running on Linux
>>>>> scan this, it considers the second file for Windows as best
>>>>> match.
>>>>>
>>>>> Is the algorithm used for determining the best match documented
>>>>> somewhere ?
>>>>>
>>>>> I've had a look at the implementation, but this left me rather
>>>>> clueless.
>>>>>
>>>>> I thought that setuptools would prefer the .egg file over
>>>>> the prebuilt .zip file - binary files being easier to install
>>>>> than "source" files.
>>>>
>>>> After some experiments, I found that the follow change
>>>> in filename (swapping platform and python version, in addition
>>>> to use '-' instead of '.) works:
>>>>
>>>> egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip
>>>>
>>>> OTOH, this one doesn't (notice the difference ?):
>>>>
>>>> egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip
>>>>
>>>> The logic behind all this looks rather fragile to me.
>>>
>>> easy_install only guarantees sane version parsing for distribution
>>> files built using setuptools' naming algorithms.  If you use
>>> distutils, it can only make guesses, because the distutils does not
>>> have a completely unambiguous file naming scheme.  And if you are
>>> naming the files by hand, God help you.  ;-)
>>
>> The problem appears to be a bug in setuptools' package_index.py.
>>
>> The function interpret_distro_name() creates a set of possible
>> separations of the found name into project name and version.
>>
>> It does find the right separation, but for some reason, the
>> code using that function does not check the found project
>> names against the project name the user is trying to install,
>> but simply takes the last entry of the list returned by the
>> above function.
>>
>> As a result, easy_install downloads and tries to install
>> project files that don't match the project name in some
>> cases.
>>
>> Here's another example where it fails (say you're on a x64 Linux box):
>>
>> # easy_install egenix-pyopenssl
>>
>> As example, say it finds these distribution files:
>>
>>     'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-linux-x86_64-prebuilt.zip',
>>     'egenix_pyopenssl-0.13.1.1.0.1.5-py2.7-linux-x86_64.egg',
>>     'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-macosx-10.5-x86_64-prebuilt.zip',
>>     'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx-10.5-x86_64-prebuilt.zip',
>>
>> It then creates different interpretations of those names, puts
>> them in a list and sorts them. Here's the end of that list:
>>
>> egenix-pyopenssl; 0.13.1.1.0.1.5 <<-- this would be the correct .egg file
>> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-linux-x86-64-prebuilt
>> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-macosx-10.5-x86-64-prebuilt
>> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt
>> egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs2-macosx; 10.5-x86-64-prebuilt
>> egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx; 10.5-x86-64-prebuilt
>>
>> It picks the last entry, which would be for a project called
>> "egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx" - not the one
>> the user searched.
> 
> Actually, that's not quite true.  It's picking:
> 
> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt
> 
> Because it thinks that
> '0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt' is a higher
> version than 0.13.1.1.0.1.5.
> 
> It does also record the possibility you mentioned, but it doesn't pick
> that one.  The project names actually *do* have to match.

Ah, ok, that makes sense then.

Is there any way to have "0.13.1.1.0.1.5-<something>" sort before
"0.13.1.1.0.1.5" ? (e.g. like is done for release candidates)

Ideally, I'd like to get this to work without any changes
to setuptools, even though it would of course be better
not to take stuff after a Python version marker into account
when looking for a package version (since the Python marker
is actually a new component in the file name).

> If you open a ticket on the setuptools tracker, I'll try to see if I
> can get it to recognize that strings like py2.7, macosx, ucs, and the
> like are terminators for a version number.  I don't know how
> successful I'll be, though.  Basically, those zip files are (I assume)
> bdist_dumb distributions being taken for source distributions, and
> easy_install doesn't actually support bdist_dumb files at the moment.

If you could point me to that tracker, I'll open a ticket :-)

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 14 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From pje at telecommunity.com  Thu Mar 14 22:03:14 2013
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 14 Mar 2013 17:03:14 -0400
Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource
 sorting algorithm
In-Reply-To: <514212EF.4030505@egenix.com>
References: <513F70B5.5030501@egenix.com> <513F893F.9010707@egenix.com>
	<CALeMXf5fOoVGHz9E2DV-QZFQJUwMpNkJcYQcJDtJSHJ_WRqbHA@mail.gmail.com>
	<5141A14B.9030301@egenix.com>
	<CALeMXf6kx5Zrhu5_c5jUHqtCBjUbhdXUqNPdwmXBRwg1q2XMpg@mail.gmail.com>
	<514212EF.4030505@egenix.com>
Message-ID: <CALeMXf5Dsqy+bgQGBa1T9JC6yFAAs69EAayHYRMbAJhikX-QcQ@mail.gmail.com>

On Thu, Mar 14, 2013 at 2:11 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> Is there any way to have "0.13.1.1.0.1.5-<something>" sort before
> "0.13.1.1.0.1.5" ? (e.g. like is done for release candidates)

Make it "0.13.1.1.0.1.5-dev<something>", and it'll have lower
precedence than both "0.13.1.1.0.1.5" and
"0.13.1.1.0.1.5-<something>".

> If you could point me to that tracker, I'll open a ticket :-)

http://bugs.python.org/setuptools/
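
The ordering discussed in this thread can be tried out with a small
stand-alone sketch.  It is a simplified reimplementation written for
illustration, modelled on the tag-based comparison setuptools used at the
time; it is not the setuptools source, and the names legacy_key, _tokens,
_COMPONENT and _REPLACE are assumptions of this example.

```python
import re

# Split a version into digit runs, letter runs, "." and "-".
_COMPONENT = re.compile(r"(\d+|[a-z]+|\.|-)")
# Assumed tag aliases; "-" becomes a marker that sorts after "final".
_REPLACE = {"pre": "c", "preview": "c", "rc": "c", "dev": "@", "-": "final-"}


def _tokens(version):
    for piece in _COMPONENT.split(version.lower()):
        piece = _REPLACE.get(piece, piece)
        if not piece or piece == ".":
            continue
        # Numbers are zero-padded so they compare correctly as text;
        # everything else is prefixed with "*".
        yield piece.zfill(8) if piece[0].isdigit() else "*" + piece
    yield "*final"  # closing marker: pre-release tags sort below it


def legacy_key(version):
    """Return a tuple that sorts versions in the legacy tag-based order."""
    parts = []
    for tok in _tokens(version):
        if tok.startswith("*"):
            if tok < "*final":
                # A pre-release tag cancels a "-" marker just before it.
                while parts and parts[-1] == "*final-":
                    parts.pop()
            # Trailing zero components before a tag are dropped.
            while parts and parts[-1] == "00000000":
                parts.pop()
        parts.append(tok)
    return tuple(parts)


plain = legacy_key("0.13.1.1.0.1.5")
tagged = legacy_key("0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt")
dev = legacy_key("0.13.1.1.0.1.5-dev-py2.7")
print(dev < plain < tagged)
```

Under these assumptions a "-<something>" suffix sorts above the plain
version because the "-" marker compares greater than the closing marker,
while a "-dev" suffix sorts below it.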

From qwcode at gmail.com  Fri Mar 15 08:32:02 2013
From: qwcode at gmail.com (Marcus Smith)
Date: Fri, 15 Mar 2013 00:32:02 -0700
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
 release files
In-Reply-To: <20130313112158.GO9677@merlinux.eu>
References: <20130313112158.GO9677@merlinux.eu>
Message-ID: <CAPYWazpLJS2kS-FJAA+2J9Tj24DRqVj5J4h=ftMphyL_JBRwEA@mail.gmail.com>

> In addition, maintainers of installation tools are asked to release
> two updates.  The first one shall provide clear warnings [...]
> The second update for installation tools should change the default
> mode to allow only installation of package files hosted at the index
> domain,


sounds good to me.


> It is expected that tools in this release may choose to change the
> default index url to ``https://pypi.python.org/simple/-with-ext`` in
>

so, *eventually*, the /simple interface (that has been transitioned to only
serve pypi links) could be deprecated?
(because new tools would be smart enough to responsibly navigate
 /simple/-with-ext)

but slightly ironic that we'd be left with an interface called
"simple/-with-ext", given the goal of all this, but it makes sense.

Marcus

From holger at merlinux.eu  Fri Mar 15 10:29:59 2013
From: holger at merlinux.eu (holger krekel)
Date: Fri, 15 Mar 2013 09:29:59 +0000
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
Message-ID: <20130315092959.GA9677@merlinux.eu>

Hi all, in particular Philip, Marc-Andre, Donald,

Carl and I decided to simplify the PEP and avoid the somewhat
awkward ``simple/-with-externals`` index for various reasons, among them
Marc-Andre's criticisms.  This also means present-day installation tools
(shipped with Redhat/Debian/etc.) will continue to work as today for
those packages which remain in a hosting-mode that requires crawling and
scraping.  They will still benefit from the fact that most packages will
soon have a hosting-mode that avoids it.  Future releases of installation
tools will default to not crawling or using (scraped) external
links, and new PyPI projects will default to serving only uploaded files.

The V4 pre-PEP also renames the three PyPI hosting modes to be more
descriptive. Since all three modes allow external links, "pypi-ext" vs
"pypi-only" were misleading. The new naming distinguishes the mode that both
scrapes links from metadata and crawls external pages for more links
("pypi-scrape-crawl") from the mode that only scrapes links from metadata
("pypi-scrape") from the mode where all links are explicit ("pypi-explicit").

Without the separate external index, it also turns out that the two transition
phases separate cleanly into PyPI changes (phase one) and installer-tool
updates (phase two). There are no PyPI changes necessary in phase two.
As stated in a new open question, it should be possible to do
PEP-related installation tool updates during phase 1; that may still
require a bit of clarification in the PEP's language.

Carl and I are happy with this PEP version now and hope you all are as
well.  Donald is already working on improving the analysis tool so
we hopefully have some updated numbers soon.

cheers,

Holger


PEP: XXX
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel <holger at merlinux.eu>, Carl Meyer <carl at oddbird.net>
Discussions-To: catalog-sig at python.org
Status: Draft (PRE-submit V4)
Type: Process
Content-Type: text/x-rst
Created: 10-Mar-2013
Post-History:


Abstract
========

This PEP proposes a backward-compatible two-phase transition process
to speed up, simplify and robustify installing from the
pypi.python.org (PyPI) package index.  To ease the transition and
minimize client-side friction, **no changes to distutils or existing
installation tools are required in order to benefit from the first
transition phase, which will result in faster, more reliable installs
for most existing packages**.

The first transition phase implements an easy and explicit means for a
package maintainer to control which release file links are served to
present-day installation tools.  The first phase also includes the
implementation of analysis tools for present-day packages, to support
communication with package maintainers and the automated setting of
default modes for controlling release file links.  The first phase
will also make new projects on PyPI default to serving only
links to release files which were uploaded to PyPI.

The second transition phase concerns end-user installation tools,
which shall default to only install release files that are hosted on
PyPI and tell the user if external release files exist, offering
a choice to automatically use those external files.


Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no
facility to host release files itself.  When hosting was added, no
automated downloading tool existed yet.  When Philip Eby implemented
automated downloading (through setuptools), he made the choice to
allow people to use download hosts of their choice.  The finding of
externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found
   by scraping them from that package's long_description metadata for 
   any release. Links in the "Download-URL" and "Home-page" metadata
   fields are given ``rel=download`` and ``rel=homepage`` attributes,
   respectively.

#. Any of these links whose target is a file whose name appears to be
   in the form of an installable source or binary distribution, with
   name in the form "packagename-version.ARCHIVEEXT", is considered a
   potential installation candidate by installation tools.

#. Similarly, any links suffixed with an "#egg=packagename-version"
   fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
   crawled by installation tools and, if HTML, are themselves scraped
   for release-file links in the above formats.
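
The link-recognition rules above can be sketched roughly as follows.
This is an illustrative approximation, not the code of any installer: the
function name looks_like_candidate and the ARCHIVE_EXTS tuple are
assumptions, since the text only speaks of "ARCHIVEEXT" without listing
suffixes.

```python
import posixpath
from urllib.parse import urlparse

# Assumed archive suffixes, chosen for illustration only.
ARCHIVE_EXTS = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")


def looks_like_candidate(url, package):
    """Return True if a link would be treated as an installable file."""
    parsed = urlparse(url)
    prefix = package.lower() + "-"
    # Rule: an "#egg=packagename-version" fragment marks a candidate.
    if parsed.fragment.lower().startswith("egg=" + prefix):
        return True
    # Rule: a file name of the form packagename-version.ARCHIVEEXT.
    filename = posixpath.basename(parsed.path).lower()
    return filename.startswith(prefix) and filename.endswith(ARCHIVE_EXTS)


print(looks_like_candidate("http://example.com/dl/foo-1.0.tar.gz", "foo"))
```

Note that any link matching these patterns counts, wherever it appears in
the metadata, which is why a link placed in a long_description for
documentation purposes can become an installation candidate.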

Today, most packages released on PyPI host their release files on
PyPI, but a small percentage (XXX need updated data) rely on external
hosting.

There are many reasons [2]_ why people have chosen external
hosting. To cite just a few:

- release processes and scripts have been developed already and upload
  to external sites

- it takes too long to upload large files from some places in the
  world

- export restrictions e.g. for crypto-related software

- company policies which require offering open source packages
  through own sites

- problems with integrating uploading to PyPI into one's release
  process (because of release policies)

- desiring download statistics different from those maintained by PyPI

- perceived bad reliability of PyPI

- not aware that PyPI offers file-hosting

Irrespective of the present-day validity of these reasons, there
clearly is a history behind why people chose to host files externally,
and for some time it was even the only way to do things.  This PEP
takes the position that there are at least some valid reasons for
external hosting.

Problem
-------

**Today, Python package installers (pip, easy_install, buildout, and
others) often need to query many non-PyPI URLs even if there are no
externally hosted files**.  Apart from querying pypi.python.org's
simple index pages, an installer also crawls all homepages and download
pages ever specified with any release of a package.
The need for installers to crawl external sites slows down
installation and makes for a brittle and unreliable installation
process.  Those sites and packages also don't take part in the
:pep:`381` mirroring infrastructure, further decreasing reliability
and speed of automated installation processes around the world.

Most packages are hosted directly on pypi.python.org [1]_.  Even for
these packages, installers still crawl their homepage and
download-url, if specified.  Many package uploaders are not aware that
specifying the "homepage" or "download-url" in their package metadata
will needlessly slow down the installation process for all users.

Relying on third party sites also opens up more attack vectors for
injecting malicious packages into sites using automated installs.  A
simple attack might just involve getting hold of an old now-unused
homepage domain and placing malicious packages there.  Moreover,
performing a Man-in-The-Middle (MITM) attack between an installation
site and any of the download sites can inject malicious packages on
the installation site.  As many homepages and download locations are
using HTTP and not HTTPS, such attacks are not hard to launch.  Such
MITM attacks can easily happen even for packages which never intended
to host files externally as their homepages are contacted by
installers anyway.

There is currently no way for package maintainers to avoid
external-link crawling, other than removing all homepage/download url
metadata for all historic releases.  While a script [3]_ has been
written to perform this action, it is not a good general solution
because it removes useful metadata from PyPI releases.

Even if the sites referenced by "Homepage" and "Download-URL" links were 
not scraped for further links, there is no obvious way under the current
system for a package owner to link to an installable file from a 
long_description metadata field (which is shown as package documentation
on ``/pypi/PKG``) without installation tools automatically considering
that file a candidate for installation.  Conversely, there is no way
to explicitly register multiple external release files without
putting them in metadata fields.


Goals
-----

These are the goals to be achieved by implementation of this PEP:

* Package owners should be able to explicitly control which files are
  presented by PyPI to installer tools as installation
  candidates. Installation should not be slowed and made less reliable
  by extensive and unnecessary crawling of links that package owners
  did not explicitly nominate as installation files.

* It should remain possible for package owners to choose to host their
  release files on their own hosting, external to PyPI. It should be
  easy for a user to request the installation of such releases using
  automated installer tools.

* Automated installer tools should not install externally-hosted
  packages **by default**, but only when explicitly authorized to do
  so by the user. When tools refuse to install such a package by
  default, they should tell the user exactly which external link(s)
  they would need to follow, and what option(s) the user can provide
  to authorize the tool to follow those links. PyPI should provide all
  necessary metadata for installer tools to implement this easily
  and within a single request/reply interaction.

* Migration from the status quo to the above points should be gradual
  and minimize breakage. This includes tooling that makes it easy for
  package owners with an existing release process that uploads to
  non-PyPI hosting to also upload those release files to PyPI.  


Solution / two transition phases
================================

The first transition phase introduces a "hosting-mode" field for each
project on PyPI, allowing package owners explicit control of which
release file links are served to present-day installation tools in the
machine-readable ``simple/`` index. The first transition will, after
successful hosting-mode manipulations by individual early-adopters,
set a default hosting mode for existing packages, based on
automated analysis.  **Maintainers will be notified one month ahead of
any such automated change**.  At completion of the first transition
phase, **all present-day existing release and installation processes
and tools are expected to continue working**.  Any remaining errors or
problems are expected to only relate to installation of individual
packages and can be easily corrected by package maintainers or PyPI
admins if maintainers are not reachable.

Also in the first phase, each link served in the ``simple/`` index
will be explicitly marked as ``rel="internal"`` (hosted by the index
itself) or ``rel="external"`` (linking to an external site that is not
part of the index).

In the second transition phase, PyPI client installation tools shall
be updated to default to only install ``rel="internal"`` packages
unless a user specifies option(s) to permit installing from external
links.

Maintainers of packages which currently host release files on non-PyPI
sites shall receive instructions and tools to ease "re-hosting" of
their historic and future package release files.  This re-hosting tool
MUST be available before automated hosting-mode changes are announced
to package maintainers.


Implementation
==============

Hosting modes
-------------

The foundation of the first transition phase is the introduction of
three "modes" of PyPI hosting for a package, affecting which links are
generated for the ``simple/`` index.  These modes are implemented
without requiring changes to installation tools via changes to the
algorithm for generating the machine-readable ``simple/`` index.

The modes are:

- ``pypi-scrape-crawl``: no change from the current situation of
  generating machine-readable links for installation tools, as
  outlined in the history_.

- ``pypi-scrape``: for a package in this mode, links to be added to
  the ``simple/`` index are still scraped from package
  metadata. However, the "Home-page" and "Download-url" links are
  given ``rel=ext-homepage`` and ``rel=ext-download`` attributes
  instead of ``rel=homepage`` and ``rel=download``. The effect of this
  (with no change in installation tools necessary) is that these links
  will not be followed and scraped for further candidate links by present-day
  installation tools: only installable files directly hosted from PyPI or
  linked directly from PyPI metadata will be considered for installation.
  Installation tools MAY evolve to offer an option to use the new 
  rel-attribution to crawl external pages but MUST NOT default to it.

- ``pypi-explicit``: for a package in this mode, only links to release
  files uploaded to PyPI, and external links to release files
  explicitly nominated by the package owner (via a new interface
  exposed by PyPI) will be added to the ``simple/`` index.
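
As a rough sketch of how the three modes could affect the generated
``simple/`` page, consider the function below.  It is an illustration
under stated assumptions, not the PyPI implementation: the function name,
its parameters, and the choice of one ``rel`` value per link (with
scraped file links labelled ``external``) are all made up for this
example.

```python
def simple_index_links(mode, uploaded, scraped, homepage, download_url, explicit):
    """Return (url, rel) pairs for one project's simple/ page."""
    # Files uploaded to the index are served in every mode.
    links = [(url, "internal") for url in uploaded]
    if mode == "pypi-explicit":
        # Only explicitly registered external files are added.
        links += [(url, "external") for url in explicit]
        return links
    if mode not in ("pypi-scrape", "pypi-scrape-crawl"):
        raise ValueError("unknown hosting mode: %r" % (mode,))
    # Both scraping modes still serve links found in package metadata.
    links += [(url, "external") for url in scraped]
    # In pypi-scrape the rel values are renamed so that present-day
    # tools no longer crawl the home and download pages.
    prefix = "ext-" if mode == "pypi-scrape" else ""
    if homepage:
        links.append((homepage, prefix + "homepage"))
    if download_url:
        links.append((download_url, prefix + "download"))
    return links


for link in simple_index_links(
    "pypi-scrape",
    uploaded=["https://pypi.python.org/packages/foo-1.0.tar.gz"],
    scraped=["http://example.com/foo-0.9.tar.gz"],
    homepage="http://example.com/",
    download_url=None,
    explicit=[],
):
    print(link)
```

The point of the sketch is that the mode only changes what the server
emits; a present-day client reads the page exactly as before.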

Thus the hope is that eventually all projects on PyPI can be migrated
to the ``pypi-explicit`` mode, while preserving the ability to install
release files hosted externally via installer tools. Deprecation of
hosting modes to eventually only allow the ``pypi-explicit`` mode is
NOT REGULATED by this PEP but is expected to become feasible some time
after successful implementation of the transition phases described in
this PEP.  It is expected that deprecation requires **a new process to deal 
with abandoned packages** because of unreachable maintainers for still
popular packages.


First transition phase (PyPI)
-----------------------------

The proposed solution consists of multiple implementation and
communication steps:

#. Implement in PyPI the three modes described above, with an
   interface for package owners to select the mode for each package
   and register explicit external file URLs.

#. For packages in all modes, label all links in the ``simple/`` index
   with ``rel="internal"`` or ``rel="external"``, to make it easier
   for client tools to distinguish the types of links in the second
   transition phase.

#. Default all newly-registered packages to ``pypi-explicit`` mode
   (package owners can still switch to the other modes as desired).

#. Determine (via an automated analysis tool) which packages have all
   installable files available on PyPI itself (group A), which have
   all installable files linked directly from PyPI metadata (group B),
   and which have installable versions available that are linked only
   from external homepage/download HTML pages (group C).

#. Send mail to maintainers of projects in group A that their project
   will be automatically configured to ``pypi-explicit`` mode in one
   month, and similarly to maintainers of projects in group B that
   their project will be automatically configured to ``pypi-scrape``
   mode.  Inform them that this change is not expected to affect
   installability of their project at all, but will result in faster
   and safer installs for their users.  Encourage them to set this
   mode themselves sooner to benefit their users.

#. Send mail to maintainers of packages in group C that their package
   hosting mode is ``pypi-scrape-crawl``, list the URLs which
   currently are crawled, and suggest that they either re-host their
   packages directly on PyPI and switch to ``pypi-explicit``, or at
   least provide direct links to release files in PyPI metadata and
   switch to ``pypi-scrape``.  Provide instructions and tools to help
   with these transitions.
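
The grouping used in the steps above could be computed along these
lines.  This is only one possible reading of the group definitions, and
the function name classify, its three set-valued parameters, and the
MODE_FOR_GROUP table are assumptions of this sketch rather than part of
the analysis tool.

```python
# Hosting mode proposed for each group in the steps above.
MODE_FOR_GROUP = {
    "A": "pypi-explicit",
    "B": "pypi-scrape",
    "C": "pypi-scrape-crawl",
}


def classify(on_pypi, direct_links, via_crawl):
    """Classify a project from sets of version strings.

    on_pypi:      versions with a file uploaded to PyPI
    direct_links: versions linked directly from PyPI metadata
    via_crawl:    versions found by crawling homepage/download pages
    """
    if via_crawl - on_pypi - direct_links:
        return "C"  # some version is reachable only by crawling
    if direct_links - on_pypi:
        return "B"  # some version needs a scraped direct link
    return "A"      # everything installable is on PyPI itself


group = classify({"1.0", "1.1"}, {"1.1"}, set())
print(group, MODE_FOR_GROUP[group])
```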


Second transition phase (installer tools)
-----------------------------------------

For the second transition phase, maintainers of installation tools are
asked to release two updates. 

The first update shall provide clear warnings if externally-hosted
release files (that is, files whose link is ``rel="external"``) are
selected for download, for which projects and URLs exactly this
happens, and warn that in future versions externally-hosted downloads
will be disabled by default.

The second update should change the default mode to allow only
installation of ``rel="internal"`` package files, and allow
installation of externally-hosted packages only when the user supplies
an option (ideally an option specifying exactly which external domains
are to be trusted as download sources). When download of an
externally-hosted package is disallowed, the user should be notified,
with instructions for how to make the install succeed and warnings
about the implication (that a file will be downloaded from a site that
is not part of the package index).
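
As a rough illustration of the second update, the following sketch
shows how an installer might split the links of one ``simple/`` page
into allowed and blocked downloads.  It is not taken from any existing
tool: the names ``LinkCollector``, ``select_links`` and
``trusted_hosts``, the sample page and the message wording are all
assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from the anchors of an index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("href"):
            self.links.append((attrs["href"], attrs.get("rel") or ""))


def select_links(page_html, trusted_hosts=()):
    """Split links into (allowed, blocked) under the proposed default.

    Links marked rel="internal" are always allowed.  Any other link is
    allowed only if the user explicitly trusted its host.
    """
    collector = LinkCollector()
    collector.feed(page_html)
    allowed, blocked = [], []
    for href, rel in collector.links:
        host = urlsplit(href).hostname or ""
        if "internal" in rel.split() or host in trusted_hosts:
            allowed.append(href)
        else:
            blocked.append(href)
    return allowed, blocked


# Hypothetical index page, for illustration only.
PAGE = """
<a rel="internal" href="https://pypi.python.org/packages/source/f/foo/foo-1.0.tar.gz">foo-1.0.tar.gz</a>
<a rel="external" href="http://example.com/dl/foo-1.1.tar.gz">foo-1.1.tar.gz</a>
"""

allowed, blocked = select_links(PAGE)
for url in blocked:
    # Tell the user what was skipped and which host would need trusting.
    print("skipping externally-hosted file: %s" % url)
```

Passing ``trusted_hosts={"example.com"}`` would move the second link
into the allowed list, which corresponds to the user-supplied option
described above.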


Open questions / Tasks
===========================

- Should we introduce some form of PyPI API versioning in this PEP?
  (It might complicate matters and delay the implementation, but it is
  often seen as good practice.)

- In ``pypi-scrape`` mode: does PyPI itself determine which links are
  installation candidates, and avoid presenting the other random links
  that are currently served?

- Consider that installation tools may choose to release updates
  during transition phase 1 already, to warn about crawled and scraped
  links (which are easily identifiable both today and, via the new rel
  attributes, after transition phase 1).


References
==========

.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted, http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html (XXX need to update this data for all easy_install-supported formats)

.. [2] Marc-Andre Lemburg, reasons for external hosting, http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [3] Holger Krekel, Script to remove homepage/download metadata for all releases http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html

Acknowledgments
================

Philip Eby for precise information and the basic ideas to implement
the transition via server-side changes only.

Donald Stufft for pushing away from external hosting and offering to
implement both a Pull Request for the necessary PyPI changes and the
analysis tool to drive transition phase 1.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for
thinking through issues regarding getting rid of "external hosting".

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:


From pje at telecommunity.com  Fri Mar 15 16:15:57 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 15 Mar 2013 11:15:57 -0400
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
	PYPI
In-Reply-To: <20130315092959.GA9677@merlinux.eu>
References: <20130315092959.GA9677@merlinux.eu>
Message-ID: <CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>

Do we even need the internal/external rel info?  I was planning to
just use the URL hostname.

i.e., are there any use cases for designating an externally-hosted
file internal, or an internally-hosted file external?  If not, it
seems the rel="" is redundant.

It's also more work to implement, vs. just defaulting --allow-hosts to
be the --index-url host; a strategy ISTM pip could also use, since it
has the same two options available.

Also, if we're not doing homepage/download crawling any more, I was
hoping we could just drop the code that 'parses' rel="" links in the
first place, as it's an awkward ugly hack.  ;-)

From donald at stufft.io  Fri Mar 15 16:22:05 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 15 Mar 2013 11:22:05 -0400
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
	PYPI
In-Reply-To: <CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
Message-ID: <C2799F23-EE19-459A-A324-F98777E0CFDE@stufft.io>

On Mar 15, 2013, at 11:15 AM, PJ Eby <pje at telecommunity.com> wrote:

> Do we even need the internal/external rel info?  I was planning to
> just use the URL hostname.
> 
> i.e., are there any use cases for designating an externally-hosted
> file internal, or an internally-hosted file external?  If not, it
> seems the rel="" is redundant.
> 
> It's also more work to implement, vs. just defaulting --allow-hosts to
> be the --index-url host; a strategy ISTM pip could also use, since it
> has the same two options available.
> 
> Also, if we're not doing homepage/download crawling any more, I was
> hoping we could just drop the code that 'parses' rel="" links in the
> first place, as it's an awkward ugly hack.  ;-)

It makes things uglier for end users if you have packages and the simple index hosted on several sites. It also just adds extra information, so if setuptools/easy_install wants to use only the host comparison, that wouldn't be bad.

It's actually more defensible to keep the service (à la PyPI/simple index) and the user-uploaded content (à la distribution files) hosted on separate domains, as it makes things like GIFAR-style attacks harder to execute. Making a move like that would break mirroring on PyPI at the moment, but it's good information to include on the simple index to make it simpler for tools to determine which links are internal and which are external.

FWIW Crate has the uploaded files on an external domain for just this reason. (Also for CDN reasons, but that's because an SSL CDN is $$$$.)


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From holger at merlinux.eu  Fri Mar 15 16:30:41 2013
From: holger at merlinux.eu (holger krekel)
Date: Fri, 15 Mar 2013 15:30:41 +0000
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
 PYPI
In-Reply-To: <CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
Message-ID: <20130315153041.GF9677@merlinux.eu>

On Fri, Mar 15, 2013 at 11:15 -0400, PJ Eby wrote:
> Do we even need the internal/external rel info?  I was planning to
> just use the URL hostname.
> 
> i.e., are there any use cases for designating an externally-hosted
> file internal, or an internally-hosted file external?  If not, it
> seems the rel="" is redundant.
> 
> It's also more work to implement, vs. just defaulting --allow-hosts to
> be the --index-url host; a strategy ISTM pip could also use, since it
> has the same two options available.
> 
> Also, if we're not doing homepage/download crawling any more, I was
> hoping we could just drop the code that 'parses' rel="" links in the
> first place, as it's an awkward ugly hack.  ;-)

We wanted to avoid requiring hostname-checking especially in light of
parallel developments putting PyPI release files on a CDN, i.e. on
non-pypi.python.org domains.  The "rel=internal" communicates that this link
is under control of the index server and the installer should not be
worried and users need not know about allow-hosts etc.  For example,
Donald's https://crate.io is already operating in this manner and has
its files on crate-cdn.com.
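
A small sketch of why plain hostname checking falls short in the CDN
case.  Only the two domain names come from this thread; the URL paths,
the ``files.crate.io`` subdomain and the function names are made up for
illustration.

```python
from urllib.parse import urlsplit


def same_host(index_url, file_url):
    """Naive check: is the file served from the index's own hostname?"""
    return urlsplit(index_url).hostname == urlsplit(file_url).hostname


def is_internal(rel):
    """rel-based check: did the index itself mark the link as internal?"""
    return "internal" in rel.split()


# Hypothetical URLs; only the domains crate.io and crate-cdn.com are real.
INDEX_URL = "https://crate.io/simple/foo/"
CDN_FILE = "https://crate-cdn.com/packages/foo-1.0.tar.gz"
SUBDOMAIN_FILE = "https://files.crate.io/packages/foo-1.0.tar.gz"

host_ok = same_host(INDEX_URL, CDN_FILE)
subdomain_ok = same_host(INDEX_URL, SUBDOMAIN_FILE)
rel_ok = is_internal("internal")
print(host_ok, subdomain_ok, rel_ok)
```

Both host comparisons come out false, for the CDN domain and for the
made-up subdomain alike, while the rel check does not depend on where
the index chooses to keep its files.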

best,
holger



From carl at oddbird.net  Fri Mar 15 17:07:59 2013
From: carl at oddbird.net (Carl Meyer)
Date: Fri, 15 Mar 2013 10:07:59 -0600
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
 PYPI
In-Reply-To: <CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
Message-ID: <5143475F.50708@oddbird.net>

On 03/15/2013 09:15 AM, PJ Eby wrote:
> Do we even need the internal/external rel info?  I was planning to
> just use the URL hostname.
> 
> i.e., are there any use cases for designating an externally-hosted
> file internal, or an internally-hosted file external?  If not, it
> seems the rel="" is redundant.

Right; Donald and Holger already gave the rationale for this: there are
good reasons for an index to not have "internal" links actually on the
exact same hostname. Even just using a different subdomain would break
simple host comparison.

> It's also more work to implement, vs. just defaulting --allow-hosts to
> be the --index-url host; a strategy ISTM pip could also use, since it
> has the same two options available.

Pip actually doesn't currently have --allow-hosts, although there's no
good reason for that; it ought to.

> Also, if we're not doing homepage/download crawling any more, I was
> hoping we could just drop the code that 'parses' rel="" links in the
> first place, as it's an awkward ugly hack.  ;-)

Well, parsing HTML links as an API is an ugly hack, but within that
existing framework "rel" seems like the appropriate semantic attribute
for this type of information, not really upping the hackiness quotient :-)

Carl


From carl at oddbird.net  Fri Mar 15 17:10:41 2013
From: carl at oddbird.net (Carl Meyer)
Date: Fri, 15 Mar 2013 10:10:41 -0600
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of
 release files
In-Reply-To: <CAPYWazpLJS2kS-FJAA+2J9Tj24DRqVj5J4h=ftMphyL_JBRwEA@mail.gmail.com>
References: <20130313112158.GO9677@merlinux.eu>
	<CAPYWazpLJS2kS-FJAA+2J9Tj24DRqVj5J4h=ftMphyL_JBRwEA@mail.gmail.com>
Message-ID: <51434801.3090300@oddbird.net>

Hi Marcus,

On 03/15/2013 01:32 AM, Marcus Smith wrote:
> 
> 
>     In addition, maintainers of installation tools are asked to release
>     two updates.  The first one shall provide clear warnings [...]
>     The second update for installation tools should change the default
>     mode to allow only installation of package files hosted at the index
>     domain, 
> 
> 
> sounds good to me.

Excellent, having the installer-tool maintainers on-board is obviously
important here :-)

>     It is expected that tools in this release may choose to change the
>     default index url to ``https://pypi.python.org/simple/-with-ext``
>     <https://pypi.python.org/simple/-with-ext> in
> 
> 
> so, *eventually*, the /simple interface (that has been transitioned to
> only serve pypi links) could be deprecated?
> (because new tools would be smart enough to responsibly navigate
>  /simple/-with-ext)
> 
> but slightly ironic that we'd be left with an interface called
> "simple/-with-ext", given the goal of all this, but it makes sense.

Right, it was precisely this awkwardness (the likelihood that tools
would want to default to -with-ext and use host-comparison to
distinguish internal/external, so as to provide info about external
links with a single request-response) that led us to eliminate the
separate indexes in our latest V4 draft and use rel attributes to
distinguish link types.

Carl


From pje at telecommunity.com  Fri Mar 15 17:51:11 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 15 Mar 2013 12:51:11 -0400
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
	PYPI
In-Reply-To: <5143475F.50708@oddbird.net>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
	<5143475F.50708@oddbird.net>
Message-ID: <CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>

On Fri, Mar 15, 2013 at 12:07 PM, Carl Meyer <carl at oddbird.net> wrote:
> On 03/15/2013 09:15 AM, PJ Eby wrote:
>> Do we even need the internal/external rel info?  I was planning to
>> just use the URL hostname.
>>
>> i.e., are there any use cases for designating an externally-hosted
>> file internal, or an internally-hosted file external?  If not, it
>> seems the rel="" is redundant.
>
> Right; Donald and Holger already gave the rationale for this: there are
> good reasons for an index to not have "internal" links actually on the
> exact same hostname. Even just using a different subdomain would break
> simple host comparison.
>
>> It's also more work to implement, vs. just defaulting --allow-hosts to
>> be the --index-url host; a strategy ISTM pip could also use, since it
>> has the same two options available.
>
> Pip actually doesn't currently have --allow-hosts, although there's no
> good reason for that; it ought to.
>
>> Also, if we're not doing homepage/download crawling any more, I was
>> hoping we could just drop the code that 'parses' rel="" links in the
>> first place, as it's an awkward ugly hack.  ;-)
>
> Well, parsing HTML links as an API is an ugly hack, but within that
> existing framework "rel" seems like the appropriate semantic attribute
> for this type of information, not really upping the hackiness quotient :-)

Well, to be clear, I liked previous versions of the proposal better
than this one.  But while I *really* don't want to do any new rel
parsing, that's not the only or even the most important reason.

The main reason is that I think internal vs. external is a bogus
distinction: what's important (IMO) is what hosts you do and don't
trust.  Giving a blanket pass to all external links doesn't seem like
such a good idea to me, nor does allowing the index to define what
hosts the client should trust.   As for the internal ones, I'm not
sure why we can't at least make a subdomain requirement, or have users
explicitly add a PyPI CDN to their configured --allow-hosts.

To try to put it another way: there should be one, and preferably only
one, obvious way to specify where you get downloads from.  That way in
easy_install is currently --allow-hosts.  Adding new options that
interact and overlap with that looks like bad UI design to me,
increasing the possibility of user confusion.

From donald at stufft.io  Fri Mar 15 18:00:11 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 15 Mar 2013 13:00:11 -0400
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
	PYPI
In-Reply-To: <CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
	<5143475F.50708@oddbird.net>
	<CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
Message-ID: <82AFA590-D17E-443C-A57F-6B4AB466DEB0@stufft.io>


On Mar 15, 2013, at 12:51 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Fri, Mar 15, 2013 at 12:07 PM, Carl Meyer <carl at oddbird.net> wrote:
>> On 03/15/2013 09:15 AM, PJ Eby wrote:
>>> Do we even need the internal/external rel info?  I was planning to
>>> just use the URL hostname.
>>> 
>>> i.e., are there any use cases for designating an externally-hosted
>>> file internal, or an internally-hosted file external?  If not, it
>>> seems the rel="" is redundant.
>> 
>> Right; Donald and Holger already gave the rationale for this: there are
>> good reasons for an index to not have "internal" links actually on the
>> exact same hostname. Even just using a different subdomain would break
>> simple host comparison.
>> 
>>> It's also more work to implement, vs. just defaulting --allow-hosts to
>>> be the --index-url host; a strategy ISTM pip could also use, since it
>>> has the same two options available.
>> 
>> Pip actually doesn't currently have --allow-hosts, although there's no
>> good reason for that; it ought to.
>> 
>>> Also, if we're not doing homepage/download crawling any more, I was
>>> hoping we could just drop the code that 'parses' rel="" links in the
>>> first place, as it's an awkward ugly hack.  ;-)
>> 
>> Well, parsing HTML links as an API is an ugly hack, but within that
>> existing framework "rel" seems like the appropriate semantic attribute
>> for this type of information, not really upping the hackiness quotient :-)
> 
> Well, to be clear, I liked previous versions of the proposal better
> than this one.  But while I *really* don't want to do any new rel
> parsing, that's not the only or even the most important reason.
> 
> The main reason is that I think internal vs. external is a bogus
> distinction: what's important (IMO) is what hosts you do and don't
> trust.  Giving a blanket pass to all external links doesn't seem like
> such a good idea to me, nor does allowing the index to define what
> hosts the client should trust.   As for the internal ones, I'm not
> sure why we can't at least make a subdomain requirement, or have users
> explicitly add a PyPI CDN to their configured --allow-hosts.
> 
> To try to put it another way: there should be one, and preferably only
> one, obvious way to specify where you get downloads from.  That way in
> easy_install is currently --allow-hosts.  Adding new options that
> interact and overlap with that looks like bad UI design to me,
> increasing the possibility of user confusion.


You can do that fwiw. That's fine. You can optionally just use the internal links as an indicator about which hosts should automatically be added to the --allow-hosts for a particular index.
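
Roughly, and only as a sketch: the function name, the (href, rel) pair
format and the sample URLs below are assumptions for illustration, and
nothing here is tied to how easy_install actually parses its options.

```python
from urllib.parse import urlsplit


def implied_allow_hosts(index_url, links):
    """Return hosts to trust: the index host plus hosts of internal links.

    ``links`` is an iterable of (href, rel) pairs already parsed from an
    index page.
    """
    hosts = {urlsplit(index_url).hostname}
    for href, rel in links:
        if "internal" in rel.split():
            host = urlsplit(href).hostname
            if host:
                hosts.add(host)
    return hosts


# Hypothetical data for illustration only.
links = [
    ("https://crate-cdn.com/packages/foo-1.0.tar.gz", "internal"),
    ("http://example.com/dl/foo-1.1.tar.gz", "external"),
]
hosts = implied_allow_hosts("https://crate.io/simple/foo/", links)
print(",".join(sorted(hosts)))
```

The external host is left out, so the user would still have to add it
explicitly.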

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From carl at oddbird.net  Fri Mar 15 18:39:58 2013
From: carl at oddbird.net (Carl Meyer)
Date: Fri, 15 Mar 2013 11:39:58 -0600
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
 PYPI
In-Reply-To: <CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
	<5143475F.50708@oddbird.net>
	<CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
Message-ID: <51435CEE.3020505@oddbird.net>

On 03/15/2013 10:51 AM, PJ Eby wrote:
> Giving a blanket pass to all external links doesn't seem like
> such a good idea to me, 

This is a very good point, and it should be made clearer in the PEP that
we don't recommend a single blanket option to allow all external links,
but an option (like allow-hosts) that lets you specify with more
granularity which external links to use. I think perhaps rel="external"
confuses this point; the real purpose of the rel tags is just so that
rel="internal" can be considered "part of the index."

FWIW I think it would be just as reasonable UI for a hypothetical tool
to let you say "I want to trust external links for the Foo project"
rather than "I want to trust external links to djangoproject.com" and
avoid host-comparison altogether. IOW, I don't think "hostname" is
inherently a better or safer indicator of trust than "project name";
hosts can change ownership at least as easily and silently as PyPI
projects! So I don't think the PEP should require all installer tools to
choose trust-by-hostname (which would be implied by removing the rel tags).

> nor does allowing the index to define what
> hosts the client should trust.   

I'm not sure about this. By using an index at all, you are trusting that
index to provide whatever level of
reliability/stability/security/whatever you expect from it. Allowing the
index itself to specify that it keeps its files on a different host in a
way that is transparent to the user seems like a natural extension of
this trust that doesn't harm anything and aids usability greatly. (Cases
where the index is lying to you definitely fall outside the scope of
what this PEP is aiming to help with.)

> As for the internal ones, I'm not
> sure why we can't at least make a subdomain requirement, or have users
> explicitly add a PyPI CDN to their configured --allow-hosts.

Even a subdomain requirement can make a CDN more difficult/expensive to
implement. And once you go beyond simple host-equality comparisons and
into subdomain-equivalence I'm wary of the added implementation
complexity we're asking of every installer tool, and the potential for
subtle differences in implementation. This seems to me like a worse can
of worms than rel-parsing.

> To try to put it another way: there should be one, and preferably only
> one, obvious way to specify where you get downloads from.  That way in
> easy_install is currently --allow-hosts.  Adding new options that
> interact and overlap with that looks like bad UI design to me,
> increasing the possibility of user confusion.

Like Donald says, I don't see any problem with you choosing to keep
allow-hosts as the only user-facing option for easy_install. It would be
up to you whether you also want to use rel="internal" as a hint for
implicitly (perhaps with warning) adding to --allow-hosts, to allow
better compatibility with indexes that use a different host for
file-hosting (it's possible that even PyPI itself may move into this
category, I haven't been following the CDN discussions carefully).

PyPI wouldn't be enforcing a UI on you here, just providing metadata
that you can use as you wish. I do think the internal/external
distinction is meaningful and unambiguous metadata that the index is
able to provide, and there's no reason for the index to withhold it.
(That distinction is not new in this version of the PEP, either, it's
just made via rel tags now instead of via a separate index.)

Carl


From mal at python.org  Fri Mar 15 16:47:34 2013
From: mal at python.org (M.-A. Lemburg)
Date: Fri, 15 Mar 2013 16:47:34 +0100
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
 PYPI
In-Reply-To: <20130315092959.GA9677@merlinux.eu>
References: <20130315092959.GA9677@merlinux.eu>
Message-ID: <51434296.9030503@python.org>

Thanks, Holger. This version looks a lot better :-)

There are still some minor quirks which would need to be
addressed more explicitly, but overall, this proposal provides
a good way forward.

Perhaps it would also be possible to add the secured download
links and the caching/proxying ideas to the PEP at some point,
or we turn those into a new PEP.

I can't follow up in detail today, but will have a closer look
next week.

On 15.03.2013 10:29, holger krekel wrote:
> Hi all, in particular Philip, Marc-Andre, Donald,
> 
> Carl and I decided to simplify the PEP and avoid the somewhat
> awkward ``simple/-with-externals`` index for various reasons, among them
> Marc-Andre's criticisms.  This also means present-day installation tools
> (shipped with Redhat/Debian/etc.) will continue to work as today for
> those packages which remain in a hosting-mode that requires crawling and
> scraping.  They will still benefit from the fact that most packages will
> soon have a hosting-mode that avoids it.  Future releases of installation
> tools will default to not perform crawling or using (scraped) external
> links, and new PYPI projects will default to only serve uploaded files.
> 
> The V4 pre-PEP also renames the three PyPI hosting modes to be more
> descriptive. Since all three modes allow external links, "pypi-ext" vs
> "pypi-only" were misleading. The new naming distinguishes the mode that both
> scrapes links from metadata and crawls external pages for more links
> ("pypi-scrape-crawl") from the mode that only scrapes links from metadata
> ("pypi-scrape") from the mode where all links are explicit ("pypi-explicit").
> 
> Without the separate external index, it also turns out that the two transition
> phases are separated into PyPI changes (phase one) and installer-tool
> updates (phase two). There are no PyPI changes necessary in phase two.
> As stated in a new open question, it should be possible to do
> PEP-related installation tool updates during phase 1, though that may
> still require a bit of clarification in the PEP's language.
> 
> Carl and I are happy with this PEP version now and hope you all are as
> well.  Donald is already working on improving the analysis tool so
> we hopefully have some updated numbers soon.
> 
> cheers,
> 
> Holger
> 
> 
> PEP: XXX
> Title: Transitioning to release-file hosting on PyPI
> Version: $Revision$
> Last-Modified: $Date$
> Author: Holger Krekel <holger at merlinux.eu>, Carl Meyer <carl at oddbird.net>
> Discussions-To: catalog-sig at python.org
> Status: Draft (PRE-submit V4)
> Type: Process
> Content-Type: text/x-rst
> Created: 10-Mar-2013
> Post-History:
> 
> 
> Abstract
> ========
> 
> This PEP proposes a backward-compatible two-phase transition process
> to speed up, simplify and robustify installing from the
> pypi.python.org (PyPI) package index.  To ease the transition and
> minimize client-side friction, **no changes to distutils or existing
> installation tools are required in order to benefit from the first
> transition phase, which will result in faster, more reliable installs
> for most existing packages**.
> 
> The first transition phase implements an easy and explicit means for a
> package maintainer to control which release file links are served to
> present-day installation tools.  The first phase also includes the
> implementation of analysis tools for present-day packages, to support
> communication with package maintainers and the automated setting of
> default modes for controlling release file links.  The first phase
> will also make new projects on PyPI default to serving only
> links to release files which were uploaded to PyPI.
> 
> The second transition phase concerns end-user installation tools,
> which shall default to only install release files that are hosted on
> PyPI and tell the user if external release files exist, offering
> a choice to automatically use those external files.
> 
> 
> Rationale
> =========
> 
> .. _history:
> 
> History and motivations for external hosting
> --------------------------------------------
> 
> When PyPI went online, it offered release registration but had no
> facility to host release files itself.  When hosting was added, no
> automated downloading tool existed yet.  When Philip Eby implemented
> automated downloading (through setuptools), he made the choice to
> allow people to use download hosts of their choice.  The finding of
> externally-hosted packages was implemented as follows:
> 
> #. The PyPI ``simple/`` index for a package contains all links found
>    by scraping them from that package's long_description metadata for 
>    any release. Links in the "Download-URL" and "Home-page" metadata
>    fields are given ``rel=download`` and ``rel=homepage`` attributes,
>    respectively.
> 
> #. Any of these links whose target is a file whose name appears to be
>    in the form of an installable source or binary distribution, with
>    name in the form "packagename-version.ARCHIVEEXT", is considered a
>    potential installation candidate by installation tools.
> 
> #. Similarly, any links suffixed with an "#egg=packagename-version"
>    fragment are considered an installation candidate.
> 
> #. Additionally, the ``rel=homepage`` and ``rel=download`` links are
>    crawled by installation tools and, if HTML, are themselves scraped
>    for release-file links in the above formats.
> 
> Today, most packages released on PyPI host their release files on
> PyPI, but a small percentage (XXX need updated data) rely on external
> hosting.
> 
> There are many reasons [2]_ why people have chosen external
> hosting. To cite just a few:
> 
> - release processes and scripts have been developed already and upload
>   to external sites
> 
> - it takes too long to upload large files from some places in the
>   world
> 
> - export restrictions e.g. for crypto-related software
> 
> - company policies which require offering open source packages
>   through own sites
> 
> - problems with integrating uploading to PyPI into one's release
>   process (because of release policies)
> 
> - desiring download statistics different from those maintained by PyPI
> 
> - perceived bad reliability of PyPI
> 
> - not aware that PyPI offers file-hosting
> 
> Irrespective of the present-day validity of these reasons, there
> clearly is a history why people choose to host files externally and it
> even was for some time the only way you could do things.  This PEP
> takes the position that there are at least some valid reasons for
> external hosting.
> 
> Problem
> -------
> 
> **Today, python package installers (pip, easy_install, buildout, and
> others) often need to query many non-PyPI URLs even if there are no
> externally hosted files**.  Apart from querying pypi.python.org's
> simple index pages, also all homepages and download pages ever
> specified with any release of a package are crawled by an installer.
> The need for installers to crawl external sites slows down
> installation and makes for a brittle and unreliable installation
> process.  Those sites and packages also don't take part in the
> :pep:`381` mirroring infrastructure, further decreasing reliability
> and speed of automated installation processes around the world.
> 
> Most packages are hosted directly on pypi.python.org [1]_.  Even for
> these packages, installers still crawl their homepage and
> download-url, if specified.  Many package uploaders are not aware that
> specifying the "homepage" or "download-url" in their package metadata
> will needlessly slow down the installation process for all users.
> 
> Relying on third party sites also opens up more attack vectors for
> injecting malicious packages into sites using automated installs.  A
> simple attack might just involve getting hold of an old now-unused
> homepage domain and placing malicious packages there.  Moreover,
> performing a Man-in-The-Middle (MITM) attack between an installation
> site and any of the download sites can inject malicious packages on
> the installation site.  As many homepages and download locations are
> using HTTP and not HTTPS, such attacks are not hard to launch.  Such
> MITM attacks can easily happen even for packages which never intended
> to host files externally as their homepages are contacted by
> installers anyway.
> 
> There is currently no way for package maintainers to avoid
> external-link crawling, other than removing all homepage/download url
> metadata for all historic releases.  While a script [3]_ has been
> written to perform this action, it is not a good general solution
> because it removes useful metadata from PyPI releases.
> 
> Even if the sites referenced by "Homepage" and "Download-URL" links were 
> not scraped for further links, there is no obvious way under the current
> system for a package owner to link to an installable file from a 
> long_description metadata field (which is shown as package documentation
> on ``/pypi/PKG``) without installation tools automatically considering
> that file a candidate for installation.  Conversely, there is no way
> to explicitly register multiple external release files without
> putting them in metadata fields.
> 
> 
> Goals
> -----
> 
> These are the goals to be achieved by implementation of this PEP:
> 
> * Package owners should be able to explicitly control which files are
>   presented by PyPI to installer tools as installation
>   candidates. Installation should not be slowed and made less reliable
>   by extensive and unnecessary crawling of links that package owners
>   did not explicitly nominate as installation files.
> 
> * It should remain possible for package owners to choose to host their
>   release files on their own hosting, external to PyPI. It should be
>   easy for a user to request the installation of such releases using
>   automated installer tools.
> 
> * Automated installer tools should not install externally-hosted
>   packages **by default**, but only when explicitly authorized to do
>   so by the user. When tools refuse to install such a package by
>   default, they should tell the user exactly which external link(s)
>   they would need to follow, and what option(s) the user can provide
>   to authorize the tool to follow those links. PyPI should provide all
>   necessary metadata for installer tools to implement this easily
>   and within a single request/reply interaction.
> 
> * Migration from the status quo to the above points should be gradual
>   and minimize breakage. This includes tooling that makes it easy for
>   package owners with an existing release process that uploads to
>   non-PyPI hosting to also upload those release files to PyPI.  
> 
> 
> Solution / two transition phases
> ================================
> 
> The first transition phase introduces a "hosting-mode" field for each
> project on PyPI, allowing package owners explicit control of which
> release file links are served to present-day installation tools in the
> machine-readable ``simple/`` index. The first transition will, after
> successful hosting-mode manipulations by individual early-adopters,
> set a default hosting mode for existing packages, based on
> automated analysis.  **Maintainers will be notified one month ahead of
> any such automated change**.  At completion of the first transition
> phase, **all present-day existing release and installation processes
> and tools are expected to continue working**.  Any remaining errors or
> problems are expected to only relate to installation of individual
> packages and can be easily corrected by package maintainers or PyPI
> admins if maintainers are not reachable.
> 
> Also in the first phase, each link served in the ``simple/`` index
> will be explicitly marked as ``rel="internal"`` (hosted by the index
> itself) or ``rel="external"`` (linking to an external site that is not
> part of the index).
> 
> In the second transition phase, PyPI client installation tools shall
> be updated to default to only install ``rel="internal"`` packages
> unless a user specifies option(s) to permit installing from external
> links.
> 
> Maintainers of packages which currently host release files on non-PyPI
> sites shall receive instructions and tools to ease "re-hosting" of
> their historic and future package release files.  This re-hosting tool
> MUST be available before automated hosting-mode changes are announced
> to package maintainers.
> 
> 
> Implementation
> ==============
> 
> Hosting modes
> -------------
> 
> The foundation of the first transition phase is the introduction of
> three "modes" of PyPI hosting for a package, affecting which links are
> generated for the ``simple/`` index.  These modes are implemented
> without requiring changes to installation tools via changes to the
> algorithm for generating the machine-readable ``simple/`` index.
> 
> The modes are:
> 
> - ``pypi-scrape-crawl``: no change from the current situation of
>   generating machine-readable links for installation tools, as
>   outlined in the history_.
> 
> - ``pypi-scrape``: for a package in this mode, links to be added to
>   the ``simple/`` index are still scraped from package
>   metadata. However, the "Home-page" and "Download-url" links are
>   given ``rel=ext-homepage`` and ``rel=ext-download`` attributes
>   instead of ``rel=homepage`` and ``rel=download``. The effect of this
>   (with no change in installation tools necessary) is that these links
>   will not be followed and scraped for further candidate links by present-day
>   installation tools: only installable files directly hosted from PyPI or
>   linked directly from PyPI metadata will be considered for installation.
>   Installation tools MAY evolve to offer an option to use the new 
>   rel-attribution to crawl external pages but MUST NOT default to it.
> 
> - ``pypi-explicit``: for a package in this mode, only links to release
>   files uploaded to PyPI, and external links to release files
>   explicitly nominated by the package owner (via a new interface
>   exposed by PyPI) will be added to the ``simple/`` index.
> 
> Thus the hope is that eventually all projects on PyPI can be migrated
> to the ``pypi-explicit`` mode, while preserving the ability to install
> release files hosted externally via installer tools. Deprecation of
> hosting modes to eventually only allow the ``pypi-explicit`` mode is
> NOT REGULATED by this PEP but is expected to become feasible some time
> after successful implementation of the transition phases described in
> this PEP.  Deprecation is expected to require **a new process to deal
> with abandoned packages**, because the maintainers of some still-popular
> packages are unreachable.
> 
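As a rough illustration of the three modes above, link selection for the
``simple/`` index might be sketched as follows.  Only the mode names and
rel values come from the text; the ``Project`` fields and the function name
are hypothetical, and labelling scraped links ``external`` is an assumption,
not something the draft spells out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Project:
    # All field names are hypothetical, chosen for this sketch only.
    mode: str
    uploaded: List[str] = field(default_factory=list)           # files hosted on PyPI
    explicit_external: List[str] = field(default_factory=list)  # owner-nominated URLs
    scraped: List[str] = field(default_factory=list)            # URLs found in metadata
    homepage: Optional[str] = None
    download_url: Optional[str] = None


def links_for_simple_index(project):
    """Return (url, rel) pairs to serve on the project's simple/ page."""
    links = [(url, "internal") for url in project.uploaded]
    if project.mode == "pypi-explicit":
        # Only uploaded files and explicitly nominated external files.
        links += [(url, "external") for url in project.explicit_external]
        return links
    if project.mode not in ("pypi-scrape", "pypi-scrape-crawl"):
        raise ValueError("unknown hosting mode: %r" % project.mode)
    # Both scrape modes still serve links scraped from package metadata.
    links += [(url, "external") for url in project.scraped]
    # Only pypi-scrape-crawl keeps the rel values that tools crawl today.
    crawl = project.mode == "pypi-scrape-crawl"
    if project.homepage:
        links.append((project.homepage, "homepage" if crawl else "ext-homepage"))
    if project.download_url:
        links.append((project.download_url, "download" if crawl else "ext-download"))
    return links
```

Because present-day tools only follow ``rel=homepage`` and ``rel=download``,
renaming the rel value is enough to stop crawling without any client change.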
> 
> First transition phase (PyPI)
> -----------------------------
> 
> The proposed solution consists of multiple implementation and
> communication steps:
> 
> #. Implement in PyPI the three modes described above, with an
>    interface for package owners to select the mode for each package
>    and register explicit external file URLs.
> 
> #. For packages in all modes, label all links in the ``simple/`` index
>    with ``rel="internal"`` or ``rel="external"``, to make it easier
>    for client tools to distinguish the types of links in the second
>    transition phase.
> 
> #. Default all newly-registered packages to ``pypi-explicit`` mode
>    (package owners can still switch to the other modes as desired).
> 
> #. Determine (via an automated analysis tool) which packages have all
>    installable files available on PyPI itself (group A), which have
>    all installable files linked directly from PyPI metadata (group B),
>    and which have installable versions available that are linked only
>    from external homepage/download HTML pages (group C).
> 
> #. Send mail to maintainers of projects in group A that their project
>    will be automatically configured to ``pypi-explicit`` mode in one
>    month, and similarly to maintainers of projects in group B that
>    their project will be automatically configured to ``pypi-scrape``
>    mode.  Inform them that this change is not expected to affect
>    installability of their project at all, but will result in faster
>    and safer installs for their users.  Encourage them to set this
>    mode themselves sooner to benefit their users.
> 
> #. Send mail to maintainers of packages in group C that their package
>    hosting mode is ``pypi-scrape-crawl``, list the URLs which
>    currently are crawled, and suggest that they either re-host their
>    packages directly on PyPI and switch to ``pypi-explicit``, or at
>    least provide direct links to release files in PyPI metadata and
>    switch to ``pypi-scrape``.  Provide instructions and tools to help
>    with these transitions.
> 
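The grouping used in the steps above could be sketched like this.  It
assumes the analysis tool can say, for each installable file, where that
file was found; the location labels ``"pypi"``, ``"metadata"`` and
``"crawled"`` and both names below are hypothetical.

```python
# Mode each group would be switched to (group C simply keeps today's behaviour).
TARGET_MODE = {"A": "pypi-explicit", "B": "pypi-scrape", "C": "pypi-scrape-crawl"}


def classify_project(file_locations):
    """Classify a project into group A, B or C.

    file_locations holds one set per installable file, naming every place
    that file can be found: "pypi" (uploaded to PyPI), "metadata" (linked
    directly from PyPI metadata) or "crawled" (found only by following
    homepage/download pages).
    """
    file_locations = list(file_locations)
    if all("pypi" in locations for locations in file_locations):
        return "A"
    if all(locations & {"pypi", "metadata"} for locations in file_locations):
        return "B"
    return "C"
```

A project lands in group C as soon as a single file is reachable only by
crawling, which matches the "linked only from external pages" wording.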
> 
> Second transition phase (installer tools)
> -----------------------------------------
> 
> For the second transition phase, maintainers of installation tools are
> asked to release two updates. 
> 
> The first update shall provide clear warnings if externally-hosted
> release files (that is, files whose link is ``rel="external"``) are
> selected for download, state exactly which projects and URLs are
> affected, and warn that in future versions externally-hosted downloads
> will be disabled by default.
> 
> The second update should change the default mode to allow only
> installation of ``rel="internal"`` package files, and allow
> installation of externally-hosted packages only when the user supplies
> an option (ideally an option specifying exactly which external domains
> are to be trusted as download sources). When download of an
> externally-hosted package is disallowed, the user should be notified,
> with instructions for how to make the install succeed and warnings
> about the implication (that a file will be downloaded from a site that
> is not part of the package index).
> 
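The behaviour asked of the second update might look roughly like the
following.  This is only a sketch: the function name and the
``--allow-external-host`` option are invented for illustration and do not
exist in any installer.

```python
from urllib.parse import urlsplit


def check_download(url, rel, allowed_external_hosts=()):
    """Return (allowed, message) for one candidate link.

    rel is the link's rel value from the simple/ index.  The option name
    mentioned in the refusal message is hypothetical.
    """
    if rel == "internal":
        return True, ""
    host = urlsplit(url).hostname or ""
    if host in allowed_external_hosts:
        return True, ("warning: downloading %s from %s, which is not part "
                      "of the package index" % (url, host))
    return False, ("refusing external link %s; pass "
                   "--allow-external-host=%s to permit it" % (url, host))
```

The refusal message names both the link and the option needed to follow it,
as the paragraph above requests.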
> 
> Open questions / Tasks
> ===========================
> 
> - Should we introduce some form of PyPI API versioning in this PEP?
>   (it might complicate matters and delay the implementation but is
>   often seen as good practice).
> 
> - In pypi-scrape mode: does PyPI itself determine which links are
>   installation candidates and avoid presenting other random links (which
>   are currently served)?
> 
> - Consider that installation tools may choose to release updates
>   during transition phase 1 already, to warn about crawling and scraped
>   links (which are easily identifiable today, and remain so with the new
>   rel attributes after transition phase 1).
> 
> 
> References
> ==========
> 
> .. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted, http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html (XXX need to update this data for all easy_install-supported formats)
> 
> .. [2] Marc-Andre Lemburg, reasons for external hosting, http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html
> 
> .. [3] Holger Krekel, Script to remove homepage/download metadata for all releases http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html
> 
> Acknowledgments
> ================
> 
> Philip Eby for precise information and the basic ideas to implement
> the transition via server-side changes only.
> 
> Donald Stufft for pushing away from external hosting and offering to
> implement both a Pull Request for the necessary PyPI changes and the
> analysis tool to drive the transition phase 1.
> 
> Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for
> thinking through issues regarding getting rid of "external hosting".
> 
> Copyright
> =========
> 
> This document has been placed in the public domain.
> 
> 
> 
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
> 
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
> 

-- 
Marc-Andre Lemburg
PSF Vice Chairman

From pje at telecommunity.com  Fri Mar 15 19:59:56 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 15 Mar 2013 14:59:56 -0400
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
	PYPI
In-Reply-To: <51435CEE.3020505@oddbird.net>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
	<5143475F.50708@oddbird.net>
	<CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
	<51435CEE.3020505@oddbird.net>
Message-ID: <CALeMXf718pJ0+6uZ03T+FwwRBbjnpKgN=sDpMP7ZFFm4NKGh5w@mail.gmail.com>

On Fri, Mar 15, 2013 at 1:39 PM, Carl Meyer <carl at oddbird.net> wrote:
> up to you whether you also want to use rel="internal" as a hint for
> implicitly (perhaps with warning) adding to --allow-hosts,

That's the bit I don't like.  The security model is that if it's not
allowed by allowed-hosts, it's *not allowed*.  Introducing a way to
sneak something past allow-hosts is a bad idea, because it means
people either have to explicitly widen their allow-hosts to arbitrary
hosts, or else that you can't actually enforce an allowed-hosts
policy, or that you need to learn a whole bunch of options to
implement it.

ISTM that this is a bad design choice for users, and I'm not
comfortable with this without some way to define the allowed
"internal" hosts based in some way on the base index URL.  Not just
for ease of automated translation, but so that *users* can know who
they're dealing with, and easily predict the effects of their chosen
options.

A frequent refrain has been, "users don't know they're downloading
stuff from places other than PyPI", so if this new approach allows
downloads from somewhere other than *.pypi.python.org when you've
chosen pypi.python.org as your index, ISTM the proposal is failing to
meet its original goals.  As the PEP is written, PyPI could change out
to a different CDN each week or use different ones for different
files, and users would be back in the position of not being sure where
stuff is coming from.

I'm fine with extending the default host matching to
"indexhost,*.indexhost" if we want to leave more of an option for PyPI
and other indexes to use a CDN.  But I'm not sure how much point to it
there is, since a /simple index is static, and small in size compared
to the downloads, so you might as well host a copy of the /simple
index alongside the downloads, and make the index pypicdn.com/simple
or whatever in the first place.  (In other words, not a lot of benefit
to splitting a static index from its associated files, so why support
it?)
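
A minimal sketch of what default matching on "indexhost,*.indexhost" could
look like, assuming glob-style host patterns.  The function names are
hypothetical and this is not setuptools' actual implementation.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def default_allowed_hosts(index_url):
    """Derive the default patterns from the index URL: the host and its subdomains."""
    host = urlsplit(index_url).hostname
    return [host, "*." + host]


def host_allowed(url, patterns):
    """True if the URL's host matches at least one glob pattern."""
    host = urlsplit(url).hostname or ""
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)
```

With pypi.python.org as the index, a subdomain would pass under these
defaults while an unrelated CDN domain would not, so the user can predict
the outcome from the index URL alone.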


> PyPI wouldn't be enforcing a UI on you here, just providing metadata
> that you can use as you wish.

That's not what the PEP says.  It does in fact *mandate* the use of
the rel attributes.  So if somebody adds an "external link" that
actually points back to PyPI, technically I'm not supposed to use it
unless it's been explicitly authorized.  ;-)

I'd really prefer to see explicit language that says the rel
information is advisory only and that installers aren't required to
parse it, let alone use it.  At the moment, the PEP is a substantial
departure from the version I agreed with.

(If there were to be any meaningful distinction in the links
themselves, I would think it'd be more about whether, e.g., hash information
is available for the download.  That's a potentially relevant
distinction right now, in that PyPI automatically provides #md5 info.
Even so, I'm not sure that's enough of a distinction for anyone to
care about.)

From mal at egenix.com  Fri Mar 15 22:24:36 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 15 Mar 2013 22:24:36 +0100
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
 PYPI
In-Reply-To: <51434296.9030503@python.org>
References: <20130315092959.GA9677@merlinux.eu> <51434296.9030503@python.org>
Message-ID: <51439194.2070207@egenix.com>

A little off-topic, but I thought you might enjoy this in the
context of all the crypto, hash and signing debate:

http://xkcd.com/1181/

Cheers,
-- 
Marc-Andre Lemburg
eGenix.com


From carl at oddbird.net  Sat Mar 16 00:16:19 2013
From: carl at oddbird.net (Carl Meyer)
Date: Fri, 15 Mar 2013 17:16:19 -0600
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
 PYPI
In-Reply-To: <CALeMXf718pJ0+6uZ03T+FwwRBbjnpKgN=sDpMP7ZFFm4NKGh5w@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
	<5143475F.50708@oddbird.net>
	<CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
	<51435CEE.3020505@oddbird.net>
	<CALeMXf718pJ0+6uZ03T+FwwRBbjnpKgN=sDpMP7ZFFm4NKGh5w@mail.gmail.com>
Message-ID: <5143ABC3.8030603@oddbird.net>

tl;dr: I see your points, we'll change the PEP to allow clients to use
hostnames instead of the rel attributes if they prefer. More comments below:

On 03/15/2013 12:59 PM, PJ Eby wrote:
> That's the bit I don't like.  The security model is that if it's not
> allowed by allowed-hosts, it's *not allowed*.  Introducing a way to
> sneak something past allow-hosts is a bad idea, because it means
> people either have to explicitly widen their allow-hosts to arbitrary
> hosts, or else that you can't actually enforce an allowed-hosts
> policy, or that you need to learn a whole bunch of options to
> implement it.
> 
> ISTM that this is a bad design choice for users, and I'm not
> comfortable with this without some way to define the allowed
> "internal" hosts based in some way on the base index URL.  Not just
> for ease of automated translation, but so that *users* can know who
> they're dealing with, and easily predict the effects of their chosen
> options.
> 
> A frequent refrain has been, "users don't know they're downloading
> stuff from places other than PyPI", so if this new approach allows
> downloads from somewhere other than *.pypi.python.org when you've
> chosen pypi.python.org as your index, ISTM the proposal is failing to
> meet its original goals.  As the PEP is written, PyPI could change out
> to a different CDN each week or use different ones for different
> files, and users would be back in the position of not being sure where
> stuff is coming from.

I guess the key question is the definition of "places other than PyPI."
I think a CDN that is part of the index's architecture is just as much
"part of PyPI" whether it's on the same domain or not. But I understand
the difficulty of integrating this with the --allow-hosts option in a way
that maintains a clear and simple UI.

> I'm fine with extending the default host matching to
> "indexhost,*.indexhost" if we want to leave more of an option for PyPI
> and other indexes to use a CDN.  But I'm not sure how much point to it
> there is, since a /simple index is static, and small in size compared
> to the downloads, so you might as well host a copy of the /simple
> index alongside the downloads, and make the index pypicdn.com/simple
> or whatever in the first place.  (In other words, not a lot of benefit
> to splitting a static index from its associated files, so why support
> it?)

Putting the /simple/ API on a CDN isn't quite that easy because it
currently involves some server-side redirects to effectively make
project names case-insensitive. I think in a hypothetical
re-architecture of PyPI there may be good security reasons to put
user-uploaded files on a different domain from dynamic portions of the
API (Donald alluded to this, more discussion at
http://security.stackexchange.com/questions/11756/is-it-safe-to-serve-any-user-uploaded-file-under-only-white-listed-mime-content).

So I think this issue may come up again in the future. But I'm fine with
deferring it in this PEP for now...

>> PyPI wouldn't be enforcing a UI on you here, just providing metadata
>> that you can use as you wish.
> 
> That's not what the PEP says.  It does in fact *mandate* the use of
> the rel attributes.  So if somebody adds an "external link" that
> actually points back to PyPI, technically I'm not supposed to use it
> unless it's been explicitly authorized.  ;-)
> 
> I'd really prefer to see explicit language that says the rel
> information is advisory only and that installers aren't required to
> parse it, let alone use it.  At the moment, the PEP is a substantial
> departure from the version I agreed with.

Ok, pending agreement from Holger I'll make a change in the PEP to
explicitly allow clients to make decisions based on either the rel
attributes or based on hostnames. Would that be sufficient to address
your concerns?

Carl


From pje at telecommunity.com  Sat Mar 16 03:01:57 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 15 Mar 2013 22:01:57 -0400
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
	PYPI
In-Reply-To: <5143ABC3.8030603@oddbird.net>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
	<5143475F.50708@oddbird.net>
	<CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
	<51435CEE.3020505@oddbird.net>
	<CALeMXf718pJ0+6uZ03T+FwwRBbjnpKgN=sDpMP7ZFFm4NKGh5w@mail.gmail.com>
	<5143ABC3.8030603@oddbird.net>
Message-ID: <CALeMXf7ZeTENoEB+suubDBVEPRppEeko6FhvdZY_wG+4+CP91w@mail.gmail.com>

On Fri, Mar 15, 2013 at 7:16 PM, Carl Meyer <carl at oddbird.net> wrote:
> Ok, pending agreement from Holger I'll make a change in the PEP to
> explicitly allow clients to make decisions based on either the rel
> attributes or based on hostnames. Would that be sufficient to address
> your concerns?

Yes.  I just don't want to be in a situation down the road where
there's another argument about this on Catalog-SIG when PyPI starts
using a CDN that, "but it says this in the rel and you're supposed to
use that", and I say, "but Carl and Holger said..."  and they go,
"doesn't matter, PEP says"   ;-)

This way, the PEP will be clear that supporting a split of PyPI's
hostnames isn't in current scope.

I am also okay with the PEP allowing *.indexhost instead of just
indexhost as the filtering mechanism, as long as it specifies one
*now*.  (Again, so this doesn't have to be revisited later.)  If
somebody who knows something about CDNs, TUF, etc., needs to weigh in
on it first, that's fine.  I just want to know where things stand.


> Putting the /simple/ API on a CDN isn't quite that easy because it
> currently involves some server-side redirects to effectively make
> project names case-insensitive.

FWIW, easy_install works fine without this.  If a matching index page
isn't found, it checks the full package list.  PyPI's redirection just
reduces bandwidth usage and request overhead in the case where the
case of the user's request doesn't match the actual package listing.
But it could be completely static without affecting easy_install and
tools that use its package-finding code.
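
Roughly, the fallback described above can be sketched as below.  This is a
simplification for illustration, not the actual setuptools code:
``fetch_page`` and ``list_all_names`` are hypothetical callables standing in
for the HTTP requests, and real name normalization does more than
lowercasing.

```python
def locate_project(requested, fetch_page, list_all_names):
    """Find a project's index page without relying on server-side redirects.

    fetch_page(name) returns the page text, or None if no such page exists;
    list_all_names() returns every project name in the full package list.
    """
    page = fetch_page(requested)
    if page is not None:
        return requested, page
    # No exact match: scan the full listing for a case-insensitive match.
    wanted = requested.lower()
    for name in list_all_names():
        if name.lower() == wanted:
            return name, fetch_page(name)
    return None, None
```

The second step costs an extra request for the full listing, which is the
bandwidth and request overhead that the redirect avoids.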

From holger at merlinux.eu  Sat Mar 16 06:30:18 2013
From: holger at merlinux.eu (holger krekel)
Date: Sat, 16 Mar 2013 05:30:18 +0000
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
 PYPI
In-Reply-To: <CALeMXf7ZeTENoEB+suubDBVEPRppEeko6FhvdZY_wG+4+CP91w@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
	<5143475F.50708@oddbird.net>
	<CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
	<51435CEE.3020505@oddbird.net>
	<CALeMXf718pJ0+6uZ03T+FwwRBbjnpKgN=sDpMP7ZFFm4NKGh5w@mail.gmail.com>
	<5143ABC3.8030603@oddbird.net>
	<CALeMXf7ZeTENoEB+suubDBVEPRppEeko6FhvdZY_wG+4+CP91w@mail.gmail.com>
Message-ID: <20130316053018.GL9677@merlinux.eu>

On Fri, Mar 15, 2013 at 22:01 -0400, PJ Eby wrote:
> On Fri, Mar 15, 2013 at 7:16 PM, Carl Meyer <carl at oddbird.net> wrote:
> > Ok, pending agreement from Holger I'll make a change in the PEP to
> > explicitly allow clients to make decisions based on either the rel
> > attributes or based on hostnames. Would that be sufficient to address
> > your concerns?
> 
> Yes.  I just don't want to be in a situation down the road where
> there's another argument about this on Catalog-SIG when PyPI starts
> using a CDN that, "but it says this in the rel and you're supposed to
> use that", and I say, "but Carl and Holger said..."  and they go,
> "doesn't matter, PEP says"   ;-)
> 
> This way, the PEP will be clear that supporting a split of PyPI's
> hostnames isn't in current scope.
> 
> I am also okay with the PEP allowing *.indexhost instead of just
> indexhost as the filtering mechanism, as long as it specifies one
> *now*.  (Again, so this doesn't have to be revisited later.)  If
> somebody who knows something about CDNs, TUF, etc., needs to weigh in
> on it first, that's fine.  I just want to know where things stand.
 
One related question.  The "rel=internal" links will contain
a hash (currently md5), so if the referenced resource resolves to
a file matching that hash, we can be sure of its integrity.
What kind of security does host-checking add on top of that?
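
For concreteness, the hash check in question could be sketched as follows,
assuming the link carries the digest in a ``#md5=...`` fragment.  The
function name is hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def matches_fragment_hash(link, data):
    """Check downloaded bytes against the md5 digest in the link's fragment.

    Returns False when the link has no "#md5=<hexdigest>" fragment.
    """
    algorithm, separator, expected = urlsplit(link).fragment.partition("=")
    if not separator or algorithm != "md5":
        return False
    return hashlib.md5(data).hexdigest() == expected.lower()
```

The check ties the file to the digest on the index page whatever host
served the file; it assumes the index page itself was fetched intact.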

holger

> > Putting the /simple/ API on a CDN isn't quite that easy because it
> > currently involves some server-side redirects to effectively make
> > project names case-insensitive.
> 
> FWIW, easy_install works fine without this.  If a matching index page
> isn't found, it checks the full package list.  PyPI's redirection just
> reduces bandwidth usage and request overhead in the case where the
> case of the user's request doesn't match the actual package listing.
> But it could be completely static without affecting easy_install and
> tools that use its package-finding code.

From ncoghlan at gmail.com  Sat Mar 16 08:15:06 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 16 Mar 2013 00:15:06 -0700
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
	PYPI
In-Reply-To: <5143ABC3.8030603@oddbird.net>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
	<5143475F.50708@oddbird.net>
	<CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
	<51435CEE.3020505@oddbird.net>
	<CALeMXf718pJ0+6uZ03T+FwwRBbjnpKgN=sDpMP7ZFFm4NKGh5w@mail.gmail.com>
	<5143ABC3.8030603@oddbird.net>
Message-ID: <CADiSq7f+063rr3WBfzCw6Rd_u2bRzZVjLrT8Apvswythn9iXjw@mail.gmail.com>

On 15 Mar 2013 16:16, "Carl Meyer" <carl at oddbird.net> wrote:
>
> tl;dr: I see your points, we'll change the PEP to allow clients to use
> hostnames instead of the rel attributes if they prefer.

I will veto any such change. Clients MUST NOT assume that the architecture
of the index service will be limited to a single host name, they must
process the explicit metadata provided by the index that indicates which
hosts the index controls.

Adding a "--trust-indices" flag to make this optional in setuptools would
be fine, but it seems perverse to trust every aspect of an index *except*
its claims to control additional hosts.
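
As a sketch of what processing that explicit metadata might involve, a
client could read the rel values off the index page rather than compare
hostnames.  The class and function names are hypothetical and the page
layout is assumed.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect (href, rel values) for every anchor on an index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            # rel may hold several space-separated values, or be absent.
            rels = (attributes.get("rel") or "").split()
            self.links.append((href, rels))


def internal_links(page_html):
    """Return the hrefs that the index itself marks as rel="internal"."""
    parser = RelLinkParser()
    parser.feed(page_html)
    return [href for href, rels in parser.links if "internal" in rels]
```

A link on a different hostname still counts as internal here if the index
marks it so, which is the point of not assuming a single host name.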

Regards,
Nick.

From carl at oddbird.net  Mon Mar 18 00:09:28 2013
From: carl at oddbird.net (Carl Meyer)
Date: Sun, 17 Mar 2013 16:09:28 -0700
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
 PYPI
In-Reply-To: <CADiSq7f+063rr3WBfzCw6Rd_u2bRzZVjLrT8Apvswythn9iXjw@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
	<5143475F.50708@oddbird.net>
	<CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
	<51435CEE.3020505@oddbird.net>
	<CALeMXf718pJ0+6uZ03T+FwwRBbjnpKgN=sDpMP7ZFFm4NKGh5w@mail.gmail.com>
	<5143ABC3.8030603@oddbird.net>
	<CADiSq7f+063rr3WBfzCw6Rd_u2bRzZVjLrT8Apvswythn9iXjw@mail.gmail.com>
Message-ID: <51464D28.3080906@oddbird.net>

On 03/16/2013 12:15 AM, Nick Coghlan wrote:
> On 15 Mar 2013 16:16, "Carl Meyer" <carl at oddbird.net
> <mailto:carl at oddbird.net>> wrote:
>>
>> tl;dr: I see your points, we'll change the PEP to allow clients to use
>> hostnames instead of the rel attributes if they prefer.
> 
> I will veto any such change. Clients MUST NOT assume that the
> architecture of the index service will be limited to a single host name,
> they must process the explicit metadata provided by the index that
> indicates which hosts the index controls.
> 
> Adding a "--trust-indices" flag to make this optional in setuptools
> would be fine, but it seems perverse to trust every aspect of an index
> *except* its claims to control additional hosts.

Ok, based on this I retract my earlier comment. I've pushed a minor
update to the PEP (at https://bitbucket.org/hpk42/pep-pypi, not yet at
python.org) to clarify explicitly that indexes may choose to host
internal files on a separate host/domain.

Carl


From tk47 at students.poly.edu  Mon Mar 18 07:15:41 2013
From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy)
Date: Mon, 18 Mar 2013 02:15:41 -0400
Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF
In-Reply-To: <51411598.1010100@students.poly.edu>
References: <51401FB3.7000408@students.poly.edu>
	<CADiSq7cHDeG93XfL4gPDxaco1cZKKAq5q70jDMpPeX1tS-vEgQ@mail.gmail.com>
	<5140432C.7000904@students.poly.edu>
	<CAG8k2+4eOYo=F_HDhB-u0b-91mHC1i+kgmavHp+xFERaDpUE3w@mail.gmail.com>
	<51411598.1010100@students.poly
Message-ID: <5146B10D.1050606@students.poly.edu>

On 3/13/13 8:11 PM, Trishank Karthik Kuppusamy wrote:
>
> Speaking of which, it may be the case that our design document for
> integrating PyPI with TUF may not be terribly easy to understand. (After
> all, you do need to understand TUF first, but TUF is fairly easy once
> you understand its main ideas.) I plan to publish a friendlier document
> which introduces TUF at a very high level and instead discusses more
> pragmatic issues (such as workflows).

We presented a lightning talk on PyPI + TUF + pip at PyCon yesterday, 
and perhaps it would make things easier to understand:

https://www.youtube.com/watch?v=2sx1lS6cT3g

https://docs.google.com/presentation/d/1FMptD5sMH41BTgS3-PN0-7j5Zqvs_zZZ3ntsD_4u-7w/edit?usp=sharing



From pje at telecommunity.com  Mon Mar 18 18:22:20 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 18 Mar 2013 13:22:20 -0400
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
	PYPI
In-Reply-To: <CADiSq7f+063rr3WBfzCw6Rd_u2bRzZVjLrT8Apvswythn9iXjw@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
	<5143475F.50708@oddbird.net>
	<CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
	<51435CEE.3020505@oddbird.net>
	<CALeMXf718pJ0+6uZ03T+FwwRBbjnpKgN=sDpMP7ZFFm4NKGh5w@mail.gmail.com>
	<5143ABC3.8030603@oddbird.net>
	<CADiSq7f+063rr3WBfzCw6Rd_u2bRzZVjLrT8Apvswythn9iXjw@mail.gmail.com>
Message-ID: <CALeMXf6LUC6kX1JUsFpd24uroU74Fvf9qhinTEpW0SpU7CgO=A@mail.gmail.com>

On Sat, Mar 16, 2013 at 3:15 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> On 15 Mar 2013 16:16, "Carl Meyer" <carl at oddbird.net> wrote:
>>
>> tl;dr: I see your points, we'll change the PEP to allow clients to use
>> hostnames instead of the rel attributes if they prefer.
>
> I will veto any such change. Clients MUST NOT assume that the architecture
> of the index service will be limited to a single host name, they must
> process the explicit metadata provided by the index that indicates which
> hosts the index controls.
>
> Adding a "--trust-indices" flag to make this optional in setuptools would be
> fine, but it seems perverse to trust every aspect of an index *except* its
> claims to control additional hosts.

Actually, setuptools trusts redirects, so that mechanism is available
for splitting the hosted files to another domain.

As it stands, though, I don't see a way to support this without
introducing confusion.  The advantage of using allow-hosts based on
the index host is that it *also* specifies what to do with dependency
links provided by individual packages; the PEP does not provide any
real guidance on this point.
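
For illustration, host-based filtering of the kind allow-hosts performs can be sketched as below. This is a minimal sketch, not setuptools' actual code: the function name host_allowed and the example patterns are assumptions made for this example, and it deliberately ignores redirects.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def host_allowed(url, allow_hosts):
    """Return True if the URL's host matches any glob pattern in allow_hosts."""
    host = urlparse(url).netloc.lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in allow_hosts)


# Hypothetical policy: trust only the index host and its sibling hosts.
patterns = ["pypi.python.org", "*.python.org"]

print(host_allowed("https://pypi.python.org/simple/setuptools/", patterns))
print(host_allowed("http://example.com/downloads/foo-1.0.tar.gz", patterns))
```

Because one rule is applied to every URL, the same check would cover index pages, release files, and dependency links alike.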

So, I have to withdraw my support for the PEP with these recent
changes, as it no longer reflects the approach I previously agreed to,
and as yet there have been no alternatives proposed to address the
user confusion issues (which IMO at least are a big part of the point
of having the PEP).

Of course, if redirection is required for non-extrapolatable
hostnames, or if somebody comes up with a new and brilliant scheme to
manage the menage of permissions needed across dependency_links, the
index, and general host trusting issues (while remaining
comprehensible and predictable to end users), I'll certainly have a
look again.  But I took the weekend off from this discussion to try to
come up with one myself, and so far I've got nothing.

From pje at telecommunity.com  Mon Mar 18 18:26:10 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 18 Mar 2013 13:26:10 -0400
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
	PYPI
In-Reply-To: <CALeMXf6LUC6kX1JUsFpd24uroU74Fvf9qhinTEpW0SpU7CgO=A@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
	<5143475F.50708@oddbird.net>
	<CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
	<51435CEE.3020505@oddbird.net>
	<CALeMXf718pJ0+6uZ03T+FwwRBbjnpKgN=sDpMP7ZFFm4NKGh5w@mail.gmail.com>
	<5143ABC3.8030603@oddbird.net>
	<CADiSq7f+063rr3WBfzCw6Rd_u2bRzZVjLrT8Apvswythn9iXjw@mail.gmail.com>
	<CALeMXf6LUC6kX1JUsFpd24uroU74Fvf9qhinTEpW0SpU7CgO=A@mail.gmail.com>
Message-ID: <CALeMXf7zJ_j2eA1D7PhYz4Ep=mUiyQQJHdsyOCDi7tPN1pB2Ag@mail.gmail.com>

On Mon, Mar 18, 2013 at 1:22 PM, PJ Eby <pje at telecommunity.com> wrote:
> Actually, setuptools trusts redirects, so that mechanism is available
> for splitting the hosted files to another domain.
>
> As it stands, though, I don't see a way to support this without
> introducing confusion.

Oops - that wasn't clear.  By "this" I meant the current version of the PEP.

From richard at python.org  Mon Mar 18 20:02:35 2013
From: richard at python.org (Richard Jones)
Date: Mon, 18 Mar 2013 12:02:35 -0700
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
	PYPI
In-Reply-To: <20130315092959.GA9677@merlinux.eu>
References: <20130315092959.GA9677@merlinux.eu>
Message-ID: <CAHrZfZA_V6K2XGeP9XA60z16e_BEmNKq0gS9uF4UszbZ069a2Q@mail.gmail.com>

Some suggested edits; I'm otherwise quite happy with the current draft.

On 15 March 2013 02:29, holger krekel <holger at merlinux.eu> wrote:
> History and motivations for external hosting

Could we please have a reference to the Package Index "API"* here?


> Today, most packages released on PyPI host their release files on
> PyPI, but a small percentage (XXX need updated data) rely on
> external hosting.

The above should probably be re-worded since "rely" is loaded and we
don't necessarily know the motivation for projects using external
links. The important numbers though are:

projects with any external only links: 2581
projects with only external only links: 1332
total projects: 29117

Whether the projects with links that also have hosted files (ie. the
1249 project difference between those numbers) *rely* on us retaining
the external links facility is unknown.
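
The arithmetic behind those figures can be checked with a short sketch. The variable names are assumptions, and the comments reflect one reading of the three labels above; the percentages are simply each count divided by the total.

```python
total_projects = 29117
any_external = 2581    # projects with at least one external-only link
only_external = 1332   # projects whose links are all external-only

# Projects that have external links but also host files on PyPI.
mixed = any_external - only_external
print(mixed)  # 1249

for label, count in [("any external", any_external),
                     ("only external", only_external),
                     ("mixed", mixed)]:
    print("%s: %d (%.1f%% of all projects)"
          % (label, count, 100.0 * count / total_projects))
```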


> Hosting modes
> -------------
>
> The foundation of the first transition phase is the introduction of
> three "modes" of PyPI hosting for a package, affecting which links are
> generated for the ``simple/`` index.  These modes are implemented
> without requiring changes to installation tools via changes to the
> algorithm for generating the machine-readable ``simple/`` index.
>
> The modes are:
>
> - ``pypi-scrape-crawl``: no change from the current situation of
>   generating machine-readable links for installation tools, as
>   outlined in the history_.
>
> - ``pypi-scrape``: for a package in this mode, links to be added to
>   the ``simple/`` index are still scraped from package
>   metadata. However, the "Home-page" and "Download-url" links are
>   given ``rel=ext-homepage`` and ``rel=ext-download`` attributes
>   instead of ``rel=homepage`` and ``rel=download``. The effect of this
>   (with no change in installation tools necessary) is that these links
>   will not be followed and scraped for further candidate links by present-day
>   installation tools: only installable files directly hosted from PYPI or
>   linked directly from PyPI metadata will be considered for installation.
>   Installation tools MAY evolve to offer an option to use the new
>   rel-attribution to crawl external pages but MUST NOT default to it.

I'd just like to confirm that the rel="download" / rel="ext-download"
switch will not affect the installability of distribution downloads
linked directly by download_url.


> - ``pypi-explicit``: for a package in this mode, only links to release
>   files uploaded to PyPI, and external links to release files
>   explicitly nominated by the package owner (via a new interface
>   exposed by PyPI) will be added to the ``simple/`` index.

The bracketed bit there needs to be emphasised (ie. not just a
bracketed afterthought) as it changes the current packaging user
experience considerably for those who wish to remain externally
hosting files.



     Richard

* http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api

From donald at stufft.io  Mon Mar 18 20:16:19 2013
From: donald at stufft.io (Donald Stufft)
Date: Mon, 18 Mar 2013 15:16:19 -0400
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
	PYPI
In-Reply-To: <CAHrZfZA_V6K2XGeP9XA60z16e_BEmNKq0gS9uF4UszbZ069a2Q@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CAHrZfZA_V6K2XGeP9XA60z16e_BEmNKq0gS9uF4UszbZ069a2Q@mail.gmail.com>
Message-ID: <E467F9E8-4740-4A5D-A724-A53AC80503A6@stufft.io>


On Mar 18, 2013, at 3:02 PM, Richard Jones <richard at python.org> wrote:

> Some suggested edits; I'm otherwise quite happy with the current draft.
> 
> On 15 March 2013 02:29, holger krekel <holger at merlinux.eu> wrote:
>> History and motivations for external hosting
> 
> Could we please have a reference to the Package Index "API"* here?
> 
> 
>> Today, most packages released on PyPI host their release files on
>> PyPI, but a small percentage (XXX need updated data) rely on
>> external hosting.
> 
> The above should probably be re-worded since "rely" is loaded and we
> don't necessarily know the motivation for projects using external
> links. The important numbers though are:
> 
> projects with any external only links: 2581
> projects with only external only links: 1332
> total projects: 29117
> 
> Whether the projects with links that also have hosted files (ie. the
> 1249 project difference between those numbers) *rely* on us retaining
> the external links facility is unknown.
> 
> 
>> Hosting modes
>> -------------
>> 
>> The foundation of the first transition phase is the introduction of
>> three "modes" of PyPI hosting for a package, affecting which links are
>> generated for the ``simple/`` index.  These modes are implemented
>> without requiring changes to installation tools via changes to the
>> algorithm for generating the machine-readable ``simple/`` index.
>> 
>> The modes are:
>> 
>> - ``pypi-scrape-crawl``: no change from the current situation of
>>  generating machine-readable links for installation tools, as
>>  outlined in the history_.
>> 
>> - ``pypi-scrape``: for a package in this mode, links to be added to
>>  the ``simple/`` index are still scraped from package
>>  metadata. However, the "Home-page" and "Download-url" links are
>>  given ``rel=ext-homepage`` and ``rel=ext-download`` attributes
>>  instead of ``rel=homepage`` and ``rel=download``. The effect of this
>>  (with no change in installation tools necessary) is that these links
>>  will not be followed and scraped for further candidate links by present-day
>>  installation tools: only installable files directly hosted from PYPI or
>>  linked directly from PyPI metadata will be considered for installation.
>>  Installation tools MAY evolve to offer an option to use the new
>>  rel-attribution to crawl external pages but MUST NOT default to it.
> 
> I'd just like to confirm that the rel="download" / rel="ext-download"
> switch will not affect the installability of distribution downloads
> linked directly by download_url.

As far as I know all existing tools ignore the rel attribute for purposes of finding direct links.

> 
> 
>> - ``pypi-explicit``: for a package in this mode, only links to release
>>  files uploaded to PyPI, and external links to release files
>>  explicitly nominated by the package owner (via a new interface
>>  exposed by PyPI) will be added to the ``simple/`` index.
> 
> The bracketed bit there needs to be emphasised (ie. not just a
> bracketed afterthought) as it changes the current packaging user
> experience considerably for those who wish to remain externally
> hosting files.
> 
> 
> 
>     Richard
> 
> * http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From aclark at aclark.net  Mon Mar 18 22:41:03 2013
From: aclark at aclark.net (Alex Clark)
Date: Mon, 18 Mar 2013 17:41:03 -0400
Subject: [Catalog-sig] New PyPI stats available
References: <CAHrZfZD_k4pCopH65BkhT5cSwy1iEic9oVnzHZ1CF51MtYg41A@mail.gmail.com>
	<kftfcr$o9n$1@ger.gmane.org>
	<CALeMXf7GCJ3m83MO9Qrx0LB4FoM=LsX5VHPQsJEGTj26NQ3-Yg@mail.gmail.com>
	<kfvrb2$vcl$1@ger.gmane.org>
Message-ID: <ki81lb$s4k$1@ger.gmane.org>

On 2013-02-19 12:31:33 +0000, Alex Clark said:

> On 2013-02-18 22:06:56 +0000, PJ Eby said:
> 
>> On Mon, Feb 18, 2013 at 9:55 AM, Alex Clark <aclark at aclark.net> wrote:
>>> aclark at Alexs-MacBook-Pro:~/Developer/aclark/resume/ > vanity pydstat
>>> pydstat-1.0.0.tar.gz     2012-08-15    2,216
>>> pydstat-1.0.1.tar.gz     2012-08-23    4,367
>>> --------------------------------------------
>>> pydstat has been downloaded 6,583 times!
>> 
>> Nice -- any chance you could add version filtering?  "vanity
>> setuptools" reports ~8.4 million downloads for setuptools, but the
>> current release actually stands at only around 4.8 million.  ;-)
> 
> 
> Sure, can you specify what you want here?
> https://github.com/aclark4life/vanity/issues/7. I assume you mean:
> allow for easy reporting of the number of downloads for each release
> e.g. the current release. (Vanity currently displays all the release
> totals then the sum.)
> 
> (Of course, as I'm testing this, vanity is not working. Did XML-RPC on
> PyPI go away recently? Maybe I should switch to json e.g.
> https://pypi.python.org/pypi/setuptools/json)
> 
> 
>> (Also, the formatting is off for the most popular downloads, because
>> the count column isn't wide enough to show 7 significant figures.)
> 
> 
> Thanks, reported: https://github.com/aclark4life/vanity/issues/8


And... done in 1.2.5: https://pypi.python.org/pypi/vanity/1.2.5

> 
> 
> Alex


-- 
Alex Clark · http://about.me/alex.clark



From carl at oddbird.net  Tue Mar 19 00:37:53 2013
From: carl at oddbird.net (Carl Meyer)
Date: Mon, 18 Mar 2013 16:37:53 -0700
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
 PYPI
In-Reply-To: <CALeMXf6LUC6kX1JUsFpd24uroU74Fvf9qhinTEpW0SpU7CgO=A@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CALeMXf67QFgiwD6NkypTt63k1XpG8hFD58RtHFbEMN=y6Zskmw@mail.gmail.com>
	<5143475F.50708@oddbird.net>
	<CALeMXf4u-PX_WpY2=-AZu3207yOjg9ZG6tCKZwDQ5mTtfo=ZLQ@mail.gmail.com>
	<51435CEE.3020505@oddbird.net>
	<CALeMXf718pJ0+6uZ03T+FwwRBbjnpKgN=sDpMP7ZFFm4NKGh5w@mail.gmail.com>
	<5143ABC3.8030603@oddbird.net>
	<CADiSq7f+063rr3WBfzCw6Rd_u2bRzZVjLrT8Apvswythn9iXjw@mail.gmail.com>
	<CALeMXf6LUC6kX1JUsFpd24uroU74Fvf9qhinTEpW0SpU7CgO=A@mail.gmail.com>
Message-ID: <5147A551.4050506@oddbird.net>

On 03/18/2013 10:22 AM, PJ Eby wrote:
> Actually, setuptools trusts redirects, so that mechanism is available
> for splitting the hosted files to another domain.

By "trusts redirects" you mean that redirects bypass allow-hosts? This
seems to contradict your line of argument up to this point (that
allow-hosts must be simple and without exceptions or users will be
confused).

> As it stands, though, I don't see a way to support this without
> introducing confusion.  The advantage of using allow-hosts based on
> the index host is that it *also* specifies what to do with dependency
> links provided by individual packages; the PEP does not provide any
> real guidance on this point.

I'm updating the PEP to eliminate rel="external" (as it causes this
confusion and provides no additional value) and clarify that any link,
from anywhere, that is not rel="internal" should be considered an
external link.

> So, I have to withdraw my support for the PEP with these recent
> changes, as it no longer reflects the approach I previously agreed to,
> and as yet there have been no alternatives proposed to address the
> user confusion issues (which IMO at least are a big part of the point
> of having the PEP).

I don't think there is any "user confusion" problem for an installer
that does not already provide allow-hosts: just use a per-project "I
want to trust external links provided by Django" option instead. And I
don't really even think there is a user confusion problem in providing
both allow-hosts and a new option on this model - they are options at
different levels of abstraction and with different use cases (though I
think the value of allow-hosts is weak if redirects bypass it anyway).

IOW, this is not a problem with the PEP, this is a
backwards-compatibility question and UI choice for easy_install
maintainers. The PEP provides the right metadata, and there are
reasonable options (in general) for installer UIs to make use of this
metadata.

Carl


From carl at oddbird.net  Tue Mar 19 01:48:52 2013
From: carl at oddbird.net (Carl Meyer)
Date: Mon, 18 Mar 2013 17:48:52 -0700
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on
 PYPI
In-Reply-To: <CAHrZfZA_V6K2XGeP9XA60z16e_BEmNKq0gS9uF4UszbZ069a2Q@mail.gmail.com>
References: <20130315092959.GA9677@merlinux.eu>
	<CAHrZfZA_V6K2XGeP9XA60z16e_BEmNKq0gS9uF4UszbZ069a2Q@mail.gmail.com>
Message-ID: <5147B5F4.3050707@oddbird.net>

Hi Richard,

On 03/18/2013 12:02 PM, Richard Jones wrote:
> Some suggested edits; I'm otherwise quite happy with the current draft.
> 
> On 15 March 2013 02:29, holger krekel <holger at merlinux.eu> wrote:
>> History and motivations for external hosting
> 
> Could we please have a reference to the Package Index "API"* here?

Added.

>> Today, most packages released on PyPI host their release files on
>> PyPI, but a small percentage (XXX need updated data) rely on
>> external hosting.
> 
> The above should probably be re-worded since "rely" is loaded and we
> don't necessarily know the motivation for projects using external
> links. The important numbers though are:
> 
> projects with any external only links: 2581
> projects with only external only links: 1332
> total projects: 29117
> 
> Whether the projects with links that also have hosted files (ie. the
> 1249 project difference between those numbers) *rely* on us retaining
> the external links facility is unknown.

Done: updated to include the latest numbers, re-worded to remove the
word "rely", and added a link to the data and analysis tool source code
at https://github.com/dstufft/pypi.linkcheck

>> Hosting modes
>> -------------
>>
>> The foundation of the first transition phase is the introduction of
>> three "modes" of PyPI hosting for a package, affecting which links are
>> generated for the ``simple/`` index.  These modes are implemented
>> without requiring changes to installation tools via changes to the
>> algorithm for generating the machine-readable ``simple/`` index.
>>
>> The modes are:
>>
>> - ``pypi-scrape-crawl``: no change from the current situation of
>>   generating machine-readable links for installation tools, as
>>   outlined in the history_.
>>
>> - ``pypi-scrape``: for a package in this mode, links to be added to
>>   the ``simple/`` index are still scraped from package
>>   metadata. However, the "Home-page" and "Download-url" links are
>>   given ``rel=ext-homepage`` and ``rel=ext-download`` attributes
>>   instead of ``rel=homepage`` and ``rel=download``. The effect of this
>>   (with no change in installation tools necessary) is that these links
>>   will not be followed and scraped for further candidate links by present-day
>>   installation tools: only installable files directly hosted from PYPI or
>>   linked directly from PyPI metadata will be considered for installation.
>>   Installation tools MAY evolve to offer an option to use the new
>>   rel-attribution to crawl external pages but MUST NOT default to it.
> 
> I'd just like to confirm that the rel="download" / rel="ext-download"
> switch will not affect the installability of distribution downloads
> linked directly by download_url.

It won't. The rel attribute only affects whether a link to a non-archive
(HTML) resource is scraped for further links; it doesn't affect a direct
archive link.
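
To make that distinction concrete, here is a rough sketch of how a client might sort the links on a ``simple/`` page. It is not the code of any existing installer: the class name, the list of archive suffixes, and the sample page are assumptions for illustration only.

```python
from html.parser import HTMLParser

# Assumed list of archive suffixes; real tools apply their own rules.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")
# rel values that present-day tools follow and scrape for more links.
CRAWLED_RELS = {"homepage", "download"}


class SimpleIndexLinks(HTMLParser):
    """Sort anchors into direct archives, pages to crawl, and ignored links."""

    def __init__(self):
        super().__init__()
        self.direct = []
        self.crawl = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        rel = attrs.get("rel") or ""
        path = href.split("#", 1)[0]  # drop any "#md5=..." fragment
        if path.endswith(ARCHIVE_SUFFIXES):
            self.direct.append(href)   # installable regardless of rel
        elif rel in CRAWLED_RELS:
            self.crawl.append(href)    # HTML page that gets scraped
        else:
            self.ignored.append(href)  # e.g. rel=ext-homepage


page = """
<a href="../../packages/source/f/foo/foo-1.0.tar.gz#md5=0123">foo-1.0.tar.gz</a>
<a href="http://example.com/foo-1.1.zip" rel="ext-download">download</a>
<a href="http://example.com/foo/" rel="ext-homepage">home</a>
<a href="http://example.org/bar/" rel="homepage">home</a>
"""

parser = SimpleIndexLinks()
parser.feed(page)
print(parser.direct)
print(parser.crawl)
print(parser.ignored)
```

In this sketch the ``rel=ext-download`` link to a .zip file is still treated as a direct candidate, while the ``rel=ext-homepage`` page is neither installed from nor crawled.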

>> - ``pypi-explicit``: for a package in this mode, only links to release
>>   files uploaded to PyPI, and external links to release files
>>   explicitly nominated by the package owner (via a new interface
>>   exposed by PyPI) will be added to the ``simple/`` index.
> 
> The bracketed bit there needs to be emphasised (ie. not just a
> bracketed afterthought) as it changes the current packaging user
> experience considerably for those who wish to remain externally
> hosting files.

Done, and added the requirement that external links must include hashes,
as we just discussed in person.

All of these updates are in https://bitbucket.org/hpk42/pep-pypi - feel
free to sync to python.org at your leisure.

Carl


From r1chardj0n3s at gmail.com  Wed Mar 20 19:26:45 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Wed, 20 Mar 2013 11:26:45 -0700
Subject: [Catalog-sig] PEP 438 implementation on testpypi
Message-ID: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>

Thanks to Donald Stufft for his implementation of the PEP 438 changes,
I've made them live on testpypi.python.org - specifically the "urls"
page of package administration. Please poke and play.


     Richard

From mal at egenix.com  Wed Mar 20 20:31:23 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 20 Mar 2013 20:31:23 +0100
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
Message-ID: <514A0E8B.5030500@egenix.com>

On 20.03.2013 19:26, Richard Jones wrote:
> Thanks to Donald Stufft for his implementation of the PEP 438 changes,
> I've made them live on testpypi.python.org - specifically the "urls"
> page of package administration. Please poke and play.

Nice... first tests:

* Going to "urls" and then clicking on [Change] gives an error:

"""
Name and version are required

Name and version are required
"""

It doesn't matter which choice you select.

* Will there be an RPC interface to register URLs with PyPI ?

Doing this manually for a large number of files is, well,
not ideal :-)

* Adding URLs should do some more tests, I think:

It's possible to register "test#md5=123" (without http/ftp and
without providing the full MD5 sum).

It's possible to register "../test/#md5=123", i.e. point
to different files on PyPI itself. Not sure whether this
is a bug or feature ;-)

It's possible to register "test#md5=123&sha1=123". This is
actually a good thing, since it allows implementing the
hash tag extensions proposed by Christian Heimes. I'm just
mentioning this, so that it becomes a supported feature.

* I'm missing an option:

[ ] Ask tools to scrape only the Download URL.

This should result in the download_url being put on the /simple/
index page with rel="download" being set.

Reasoning: This is the designated URL where packages should be
downloaded from. With the current list of choices, I'd have to
select the last option, which includes the old long description
links and the homepage URL.
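
The extra URL checks suggested in the list above (absolute http/ftp URLs, full-length digests, several hash types in one fragment) could look roughly like this. It is only a sketch under assumptions: the function name, the accepted schemes, and the set of hash types are made up for this example and are not what PyPI implements.

```python
import re
from urllib.parse import urlparse

# Assumed hash types and their hex digest lengths.
HASH_LENGTHS = {"md5": 32, "sha1": 40}


def check_external_url(url):
    """Return a list of problems found in an external release-file URL."""
    problems = []
    parts = urlparse(url)
    if parts.scheme not in ("http", "https", "ftp") or not parts.netloc:
        problems.append("URL must be absolute (http, https or ftp)")
    if not parts.fragment:
        problems.append("missing hash fragment")
    # Allow several hashes joined by "&", e.g. "#md5=...&sha1=...".
    for item in filter(None, parts.fragment.split("&")):
        name, _, value = item.partition("=")
        expected = HASH_LENGTHS.get(name)
        if expected is None:
            problems.append("unknown hash type: %s" % name)
        elif not re.fullmatch(r"[0-9a-fA-F]{%d}" % expected, value):
            problems.append("%s digest must be %d hex characters"
                            % (name, expected))
    return problems


print(check_external_url("test#md5=123"))
print(check_external_url("../test/#md5=123"))
print(check_external_url("http://example.com/foo-1.0.tar.gz#md5=" + "0" * 32))
```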

Other things:
-------------

* Would it be possible to add a link to the corresponding
/simple/ index page on the package menu (the one with files,
urls, etc.) ?

* Could you add a link to the PKG-INFO file from
  pypi?:action=display_pkginfo to the /simple/ page as
  <version>-PKG-INFO (to match the other links) ?

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 20 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-03-13: Released eGenix pyOpenSSL 0.13 ...    http://egenix.com/go39

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From mal at egenix.com  Wed Mar 20 20:35:26 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 20 Mar 2013 20:35:26 +0100
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <514A0E8B.5030500@egenix.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
Message-ID: <514A0F7E.2090103@egenix.com>

On 20.03.2013 20:31, M.-A. Lemburg wrote:
> Other things:
> -------------
> 
> * Would it be possible to add a link to the corresponding
> /simple/ index page on the package menu (the one with files,
> urls, etc.) ?
> 
> * Could you add a link to the PKG-INFO file from
>   pypi?:action=display_pkginfo to the /simple/ page as
>   <version>-PKG-INFO (to match the other links) ?

Or even better and more suitable for the CDN...

Have PyPI publish the PKG-INFO under the /simple/ index URL:

/simple/package/<version>-PKG-INFO

(instead of just setting a link to the /pypi/ page)

-- 
Marc-Andre Lemburg
eGenix.com


From r1chardj0n3s at gmail.com  Wed Mar 20 21:16:11 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Wed, 20 Mar 2013 13:16:11 -0700
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <514A0E8B.5030500@egenix.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
Message-ID: <CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>

On 20 March 2013 12:31, M.-A. Lemburg <mal at egenix.com> wrote:
> On 20.03.2013 19:26, Richard Jones wrote:
>> Thanks to Donald Stufft for his implementation of the PEP 438 changes,
>> I've made them live on testpypi.python.org - specifically the "urls"
>> page of package administration. Please poke and play.
>
> Nice... first tests:
>
> * Going to "urls" and then clicking on [Change] gives an error:
>
> """
> Name and version are required
>
> Name and version are required
> """
>
> It doesn't matter which choice you select.

Oops. This is fixed. You'll have to reload the page to get the correct
form code.


> * Will there be an RPC interface to register URLs with PyPI ?
>
> Doing this manually for a large number of files is, well,
> not ideal :-)

It's just an HTTP POST and there are plans for a tool.



> * Adding URLs should do some more tests, I think:

I thought about it, but didn't see any benefit. It's documented...



> * I'm missing an option:
>
> [ ] Ask tools to scrape only the Download URL.

This is not part of the planned implementation. The download_url was
never well-specified, and only allows for one URL, hence the
implementation we have.


> * Would it be possible to add a link to the corresponding
> /simple/ index page on the package menu (the one with files,
> urls, etc.) ?

I guess this could be added, yes.


> * Could you add a link to the PKG-INFO file from
>   pypi?:action=display_pkginfo to the /simple/ page as
>   <version>-PKG-INFO (to match the other links) ?

We could think about it - what's the use-case?


      Richard

From mal at egenix.com  Wed Mar 20 21:27:38 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 20 Mar 2013 21:27:38 +0100
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
	<CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
Message-ID: <514A1BBA.8010605@egenix.com>

On 20.03.2013 21:16, Richard Jones wrote:
> On 20 March 2013 12:31, M.-A. Lemburg <mal at egenix.com> wrote:
>> * Will there be an RPC interface to register URLs with PyPI ?
>>
>> Doing this manually for a large number of files is, well,
>> not ideal :-)
> 
> It's just an HTTP POST and there are plans for a tool.

Is this documented somewhere ? I'd like to add support for it
to our release process.

>> * Adding URLs should do some more tests, I think:
> 
> I thought about it, but didn't see any benefit. It's documented...

Hmm, where ? :-)

>> * I'm missing an option:
>>
>> [ ] Ask tools to scrape only the Download URL.
> 
> This is not part of the planned implementation. The download_url was
> never well-specified, and only allows for one URL, hence the
> implementation we have.

I know it's not in PEP 438 at the moment, but was one of the
nits I mentioned to Holger last week. It's specified in the
meta-data format 1.1 as "A string containing the URL from
which this version of the package can be downloaded.":

http://www.python.org/dev/peps/pep-0314/

Having such an option would allow cleaning up the /simple/
index pages a lot, without any changes on the tools side.

It would also be needed for my proposal of securing
external downloads, where you point to a hashed download
page with the download_url.

>> * Would it be possible to add a link to the corresponding
>> /simple/ index page on the package menu (the one with files,
>> urls, etc.) ?
> 
> I guess this could be added, yes.

Great.

>> * Could you add a link to the PKG-INFO file from
>>   pypi?:action=display_pkginfo to the /simple/ page as
>>   <version>-PKG-INFO (to match the other links) ?
> 
> We could think about it - what's the use-case?

This would allow tools to easily and safely access meta-data
of a package release without downloading, extracting and
running the release files' setup.py.
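
As a sketch of why a plain PKG-INFO file is convenient for tools: it uses RFC 822 style headers, so the standard library can read it without executing any project code. The sample metadata below is invented for illustration.

```python
from email.parser import Parser

# Hypothetical PKG-INFO content; a tool would fetch this text from the index.
pkg_info = """\
Metadata-Version: 1.1
Name: example-dist
Version: 1.0
Summary: A hypothetical distribution
Home-page: http://example.com/
Download-URL: http://example.com/example-dist-1.0.tar.gz
"""

metadata = Parser().parsestr(pkg_info)
print(metadata["Name"], metadata["Version"])
print(metadata["Download-URL"])
```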

-- 
Marc-Andre Lemburg
eGenix.com


From holger at merlinux.eu  Wed Mar 20 22:02:50 2013
From: holger at merlinux.eu (holger krekel)
Date: Wed, 20 Mar 2013 21:02:50 +0000
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <514A1BBA.8010605@egenix.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
	<CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
	<514A1BBA.8010605@egenix.com>
Message-ID: <20130320210250.GL9677@merlinux.eu>

On Wed, Mar 20, 2013 at 21:27 +0100, M.-A. Lemburg wrote:
> On 20.03.2013 21:16, Richard Jones wrote:
> > On 20 March 2013 12:31, M.-A. Lemburg <mal at egenix.com> wrote:
> >> * I'm missing an option:
> >>
> >> [ ] Ask tools to scrape only the Download URL.
> > 
> > This is not part of the planned implementation. The download_url was
> > never well-specified, and only allows for one URL, hence the
> > implementation we have.
> 
> I know it's not in PEP 438 at the moment, but was one of the
> nits I mentioned to Holger last week. It's specified in the
> meta-data format 1.1 as "A string containing the URL from
> which this version of the package can be downloaded.":
> 
> http://www.python.org/dev/peps/pep-0314/
> 
> Having such an option would allow cleaning up the /simple/
> index pages a lot, without any changes on the tools side.
> 
> It would also be needed for my proposal of securing
> external downloads, where you point to a hashed download
> page with the download_url.

I think it's better to just go for a tool that a maintainer can use to
register external URLs (with hashes) by crawling and scraping the links
from an external page once.  This way client installers worldwide
do not need to visit and scrape that external page just to obtain
release file links.  As you have mostly automated your release process
do you foresee any issues with adding an automated step of registering
externals and putting your package hosting mode to "pypi-explicit"?
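
The hashing half of such a tool could be sketched as follows. This is only an illustration: the helper name external_link and the example URL are assumptions, and the step that actually registers the link with PyPI is left out because its interface is not described here.

```python
import hashlib


def external_link(url, content):
    """Return url with an "#md5=<hexdigest>" fragment computed from content."""
    digest = hashlib.md5(content).hexdigest()
    return "%s#md5=%s" % (url, digest)


# In a real release script the bytes would be read from the built archive.
link = external_link("http://example.com/foo-1.0.tar.gz", b"release file bytes")
print(link)
```

A maintainer would run this once per release file and register the resulting links, so installers never need to scrape the external page themselves.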

holger

> >> * Would it be possible to add a link to the corresponding
> >> /simple/ index page on the package menu (the one with files,
> >> urls, etc.) ?
> > 
> > I guess this could be added, yes.
> 
> Great.
> 
> >> * Could you add a link to the PKG-INFO file from
> >>   pypi?:action=display_pkginfo to the /simple/ page as
> >>   <version>-PKG-INFO (to match the other links) ?
> > 
> > We could think about it - what's the use-case?
> 
> This would allow tools to easily and safely access meta-data
> of a package release without downloading, extracting and
> running the release files' setup.py.


From r1chardj0n3s at gmail.com  Wed Mar 20 22:17:05 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Wed, 20 Mar 2013 14:17:05 -0700
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <514A1BBA.8010605@egenix.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
	<CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
	<514A1BBA.8010605@egenix.com>
Message-ID: <CAHrZfZAPpO2m+ZTkLyT8cLnkD4OsySZ+pq9YDzVHT_WDLkgoHA@mail.gmail.com>

On 20 March 2013 13:27, M.-A. Lemburg <mal at egenix.com> wrote:
> On 20.03.2013 21:16, Richard Jones wrote:
>> On 20 March 2013 12:31, M.-A. Lemburg <mal at egenix.com> wrote:
>>> * Will there be an RPC interface to register URLs with PyPI ?
>>>
>>> Doing this manually for a large number of files is, well,
>>> not ideal :-)
>>
>> It's just a HTTP POST and there's plans for a tool.
>
> Is this documented somewhere ? I'd like to add support for it
> to our release process.

I'll think about adding this to the PEP.


>>> * Adding URLs should do some more tests, I think:
>>
>> I thought about it, but didn't see any benefit. It's documented...
>
> Hmm, where ? :-)

In the HTML page just above the add form :-)


    Richard

From mal at egenix.com  Wed Mar 20 22:53:29 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 20 Mar 2013 22:53:29 +0100
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <20130320210250.GL9677@merlinux.eu>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
	<CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
	<514A1BBA.8010605@egenix.com> <20130320210250.GL9677@merlinux.eu>
Message-ID: <514A2FD9.9030702@egenix.com>

On 20.03.2013 22:02, holger krekel wrote:
> On Wed, Mar 20, 2013 at 21:27 +0100, M.-A. Lemburg wrote:
>> On 20.03.2013 21:16, Richard Jones wrote:
>>> On 20 March 2013 12:31, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> * I'm missing an option:
>>>>
>>>> [ ] Ask tools to scrape only the Download URL.
>>>
>>> This is not part of the planned implementation. The download_url was
>>> never well-specified, and only allows for one URL, hence the
>>> implementation we have.
>>
>> I know it's not in PEP 438 at the moment, but was one of the
>> nits I mentioned to Holger last week. It's specified in the
>> meta-data format 1.1 as "A string containing the URL from
>> which this version of the package can be downloaded.":
>>
>> http://www.python.org/dev/peps/pep-0314/
>>
>> Having such an option would allow cleaning up the /simple/
>> index pages a lot, without any changes on the tools side.
>>
>> It would also be needed for my proposal of securing
>> external downloads, where you point to a hashed download
>> page with the download_url.
> 
> I think it's better to just go for a tool which a maintainer can
> use to register external urls (with hashes) from crawling and scraping
> links once from an external page.  This way client installers worldwide
> do not need to visit and scrape that external page just to obtain
> release file links.  As you have mostly automated your release process
> do you foresee any issues with adding an automated step of registering
> externals and putting your package hosting mode to "pypi-explicit"?

I don't have a problem with adding support to our release
process (provided there's some stable way to access the
needed API).

I'm thinking about other package owners who have the
download_url already point to a page with the distribution
file(s) and don't have a release process they can easily
adapt.

For them, it would be a nice possibility to speed up
installation of their packages.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 20 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-03-13: Released eGenix pyOpenSSL 0.13 ...    http://egenix.com/go39

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From mal at egenix.com  Wed Mar 20 22:56:54 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 20 Mar 2013 22:56:54 +0100
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <CAHrZfZAPpO2m+ZTkLyT8cLnkD4OsySZ+pq9YDzVHT_WDLkgoHA@mail.gmail.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
	<CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
	<514A1BBA.8010605@egenix.com>
	<CAHrZfZAPpO2m+ZTkLyT8cLnkD4OsySZ+pq9YDzVHT_WDLkgoHA@mail.gmail.com>
Message-ID: <514A30A6.3080505@egenix.com>

On 20.03.2013 22:17, Richard Jones wrote:
> On 20 March 2013 13:27, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 20.03.2013 21:16, Richard Jones wrote:
>>> On 20 March 2013 12:31, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> * Will there be an RPC interface to register URLs with PyPI ?
>>>>
>>>> Doing this manually for a large number of files is, well,
>>>> not ideal :-)
>>>
>>> It's just a HTTP POST and there's plans for a tool.
>>
>> Is this documented somewhere ? I'd like to add support for it
>> to our release process.
> 
> I'll think about adding this to the PEP.
> 
> 
>>>> * Adding URLs should do some more tests, I think:
>>>
>>> I thought about it, but didn't see any benefit. It's documented...
>>
>> Hmm, where ? :-)
> 
> In the HTML page just above the add form :-)

Could you change "The URL must end with the MD5 hash of the file
contents" to "The URL must include the MD5 hash of the file contents" ?

(See my original test report for the reason :-))

-- 
Marc-Andre Lemburg
eGenix.com


From r1chardj0n3s at gmail.com  Wed Mar 20 23:01:21 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Wed, 20 Mar 2013 15:01:21 -0700
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <514A30A6.3080505@egenix.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
	<CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
	<514A1BBA.8010605@egenix.com>
	<CAHrZfZAPpO2m+ZTkLyT8cLnkD4OsySZ+pq9YDzVHT_WDLkgoHA@mail.gmail.com>
	<514A30A6.3080505@egenix.com>
Message-ID: <CAHrZfZDXKCxCXS7hmmfVo-jNZpOsi47DUe=QVtHMpWsVGwHNGQ@mail.gmail.com>

On 20 March 2013 14:56, M.-A. Lemburg <mal at egenix.com> wrote:
> Could you change "The URL must end with the MD5 hash of the file
> contents" to "The URL must include the MD5 hash of the file contents" ?
>
> (See my original test report for the reason :-))

Hm. The wording was passed by one of the pip maintainers so I'll defer
to them on what the URL format should be.


     Richard

From mal at egenix.com  Wed Mar 20 23:19:06 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 20 Mar 2013 23:19:06 +0100
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <CAHrZfZDXKCxCXS7hmmfVo-jNZpOsi47DUe=QVtHMpWsVGwHNGQ@mail.gmail.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
	<CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
	<514A1BBA.8010605@egenix.com>
	<CAHrZfZAPpO2m+ZTkLyT8cLnkD4OsySZ+pq9YDzVHT_WDLkgoHA@mail.gmail.com>
	<514A30A6.3080505@egenix.com>
	<CAHrZfZDXKCxCXS7hmmfVo-jNZpOsi47DUe=QVtHMpWsVGwHNGQ@mail.gmail.com>
Message-ID: <514A35DA.1020400@egenix.com>

On 20.03.2013 23:01, Richard Jones wrote:
> On 20 March 2013 14:56, M.-A. Lemburg <mal at egenix.com> wrote:
>> Could you change "The URL must end with the MD5 hash of the file
>> contents" to "The URL must include the MD5 hash of the file contents" ?
>>
>> (See my original test report for the reason :-))
> 
> Hm. The wording was passed by one of the pip maintainers so I'll defer
> to them on what the URL format should be.

The format should be defined in PEP 438. If we adopt the
hash tag extensions, then the URL fragment will just start with
the md5= part and not necessarily also end with it.

pip and easy_install will then have to implement the
extension mechanism; and package authors will have to
decide whether or not they want to stay compatible with
versions of those tools that don't have these implemented.

I was just asking for the text on the page to be in line with
what PyPI actually checks.
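
To illustrate the difference between "end with" and "include", here is a
minimal sketch of a check that finds an md5= entry anywhere in the URL
fragment. The function names and the "&" separator between fragment
entries are assumptions made for this example; neither is specified in
this thread, and this is not the check PyPI itself performs.

```python
import hashlib
import re
from urllib.parse import urlsplit

# Assumption: fragment entries are separated by "&". The separator is not
# specified in this thread.
_MD5_RE = re.compile(r"(?:^|&)md5=([0-9a-fA-F]{32})(?:&|$)")


def md5_from_url(url):
    """Return the lowercase MD5 hex digest named in the URL fragment, or None."""
    match = _MD5_RE.search(urlsplit(url).fragment)
    return match.group(1).lower() if match else None


def matches_md5(url, data):
    """True if the fragment names an MD5 digest and it matches the given bytes."""
    expected = md5_from_url(url)
    return expected is not None and hashlib.md5(data).hexdigest() == expected
```

With this shape, a URL whose fragment carries further entries after the
md5= part is still accepted, which an "ends with" rule would reject.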

-- 
Marc-Andre Lemburg
eGenix.com


From r1chardj0n3s at gmail.com  Wed Mar 20 23:19:19 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Wed, 20 Mar 2013 15:19:19 -0700
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <CAHrZfZDXKCxCXS7hmmfVo-jNZpOsi47DUe=QVtHMpWsVGwHNGQ@mail.gmail.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
	<CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
	<514A1BBA.8010605@egenix.com>
	<CAHrZfZAPpO2m+ZTkLyT8cLnkD4OsySZ+pq9YDzVHT_WDLkgoHA@mail.gmail.com>
	<514A30A6.3080505@egenix.com>
	<CAHrZfZDXKCxCXS7hmmfVo-jNZpOsi47DUe=QVtHMpWsVGwHNGQ@mail.gmail.com>
Message-ID: <CAHrZfZDVu9c+7NEDZRLpmKM8=jOvV0XEOVRMi5P7p9U2CtZ4Fw@mail.gmail.com>

On 20 March 2013 15:01, Richard Jones <r1chardj0n3s at gmail.com> wrote:
> On 20 March 2013 14:56, M.-A. Lemburg <mal at egenix.com> wrote:
>> Could you change "The URL must end with the MD5 hash of the file
>> contents" to "The URL must include the MD5 hash of the file contents" ?
>>
>> (See my original test report for the reason :-))
>
> Hm. The wording was passed by one of the pip maintainers so I'll defer
> to them on what the URL format should be.

Having discussed this further offline I've now modified the text as
above (with a tweak.)


     Richard

From mal at egenix.com  Wed Mar 20 23:20:37 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 20 Mar 2013 23:20:37 +0100
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <CAHrZfZDVu9c+7NEDZRLpmKM8=jOvV0XEOVRMi5P7p9U2CtZ4Fw@mail.gmail.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
	<CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
	<514A1BBA.8010605@egenix.com>
	<CAHrZfZAPpO2m+ZTkLyT8cLnkD4OsySZ+pq9YDzVHT_WDLkgoHA@mail.gmail.com>
	<514A30A6.3080505@egenix.com>
	<CAHrZfZDXKCxCXS7hmmfVo-jNZpOsi47DUe=QVtHMpWsVGwHNGQ@mail.gmail.com>
	<CAHrZfZDVu9c+7NEDZRLpmKM8=jOvV0XEOVRMi5P7p9U2CtZ4Fw@mail.gmail.com>
Message-ID: <514A3635.2000804@egenix.com>

On 20.03.2013 23:19, Richard Jones wrote:
> On 20 March 2013 15:01, Richard Jones <r1chardj0n3s at gmail.com> wrote:
>> On 20 March 2013 14:56, M.-A. Lemburg <mal at egenix.com> wrote:
>>> Could you change "The URL must end with the MD5 hash of the file
>>> contents" to "The URL must include the MD5 hash of the file contents" ?
>>>
>>> (See my original test report for the reason :-))
>>
>> Hm. The wording was passed by one of the pip maintainers so I'll defer
>> to them on what the URL format should be.
> 
> Having discussed this further offline I've now modified the text as
> above (with a tweak.)

Thanks.

-- 
Marc-Andre Lemburg
eGenix.com


From r1chardj0n3s at gmail.com  Wed Mar 20 23:28:16 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Wed, 20 Mar 2013 15:28:16 -0700
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <CAHrZfZAPpO2m+ZTkLyT8cLnkD4OsySZ+pq9YDzVHT_WDLkgoHA@mail.gmail.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
	<CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
	<514A1BBA.8010605@egenix.com>
	<CAHrZfZAPpO2m+ZTkLyT8cLnkD4OsySZ+pq9YDzVHT_WDLkgoHA@mail.gmail.com>
Message-ID: <CAHrZfZC32UjJUi8EPUqCJQfm0ykx5Lzr96cq5DncxhvwWk7K2Q@mail.gmail.com>

On 20 March 2013 14:17, Richard Jones <r1chardj0n3s at gmail.com> wrote:
> On 20 March 2013 13:27, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 20.03.2013 21:16, Richard Jones wrote:
>>> On 20 March 2013 12:31, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> * Will there be an RPC interface to register URLs with PyPI ?
>>>>
>>>> Doing this manually for a large number of files is, well,
>>>> not ideal :-)
>>>
>>> It's just a HTTP POST and there's plans for a tool.
>>
>> Is this documented somewhere ? I'd like to add support for it
>> to our release process.
>
> I'll think about adding this to the PEP.

This is now in the PEP.


     Richard

From mal at egenix.com  Thu Mar 21 00:23:50 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 21 Mar 2013 00:23:50 +0100
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <CAHrZfZC32UjJUi8EPUqCJQfm0ykx5Lzr96cq5DncxhvwWk7K2Q@mail.gmail.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
	<CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
	<514A1BBA.8010605@egenix.com>
	<CAHrZfZAPpO2m+ZTkLyT8cLnkD4OsySZ+pq9YDzVHT_WDLkgoHA@mail.gmail.com>
	<CAHrZfZC32UjJUi8EPUqCJQfm0ykx5Lzr96cq5DncxhvwWk7K2Q@mail.gmail.com>
Message-ID: <514A4506.70802@egenix.com>

On 20.03.2013 23:28, Richard Jones wrote:
> On 20 March 2013 14:17, Richard Jones <r1chardj0n3s at gmail.com> wrote:
>> On 20 March 2013 13:27, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 20.03.2013 21:16, Richard Jones wrote:
>>>> On 20 March 2013 12:31, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>> * Will there be an RPC interface to register URLs with PyPI ?
>>>>>
>>>>> Doing this manually for a large number of files is, well,
>>>>> not ideal :-)
>>>>
>>>> It's just a HTTP POST and there's plans for a tool.
>>>
>>> Is this documented somewhere ? I'd like to add support for it
>>> to our release process.
>>
>> I'll think about adding this to the PEP.
> 
> This is now in the PEP.

Hmm, looks like the PEP update process isn't working on the site:

http://www.python.org/dev/peps/pep-0438/

Last-Modified:	2013-03-15 22:51:25

-- 
Marc-Andre Lemburg
eGenix.com


From r1chardj0n3s at gmail.com  Thu Mar 21 00:32:03 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Wed, 20 Mar 2013 16:32:03 -0700
Subject: [Catalog-sig] PEP 438 implementation on testpypi
In-Reply-To: <514A4506.70802@egenix.com>
References: <CAHrZfZCJA+gPBG1MzkpAUueZB=udp56ghjKMF0j71AH-0wWJ=w@mail.gmail.com>
	<514A0E8B.5030500@egenix.com>
	<CAHrZfZBCv0CrhkX9sSJ7Ud3v8STHkc0HtTDEP5-Svj23+7gh8Q@mail.gmail.com>
	<514A1BBA.8010605@egenix.com>
	<CAHrZfZAPpO2m+ZTkLyT8cLnkD4OsySZ+pq9YDzVHT_WDLkgoHA@mail.gmail.com>
	<CAHrZfZC32UjJUi8EPUqCJQfm0ykx5Lzr96cq5DncxhvwWk7K2Q@mail.gmail.com>
	<514A4506.70802@egenix.com>
Message-ID: <CAHrZfZAS0AuBDJa4Fp1OWh41vNgne9G_-=ban4OJQ9BABRt_Kw@mail.gmail.com>

On 20 March 2013 16:23, M.-A. Lemburg <mal at egenix.com> wrote:
> On 20.03.2013 23:28, Richard Jones wrote:
>> On 20 March 2013 14:17, Richard Jones <r1chardj0n3s at gmail.com> wrote:
>>> On 20 March 2013 13:27, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> On 20.03.2013 21:16, Richard Jones wrote:
>>>>> On 20 March 2013 12:31, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>>> * Will there be an RPC interface to register URLs with PyPI ?
>>>>>>
>>>>>> Doing this manually for a large number of files is, well,
>>>>>> not ideal :-)
>>>>>
>>>>> It's just a HTTP POST and there's plans for a tool.
>>>>
>>>> Is this documented somewhere ? I'd like to add support for it
>>>> to our release process.
>>>
>>> I'll think about adding this to the PEP.
>>
>> This is now in the PEP.
>
> Hmm, looks like the PEP update process isn't working on the site:
>
> http://www.python.org/dev/peps/pep-0438/
>
> Last-Modified:  2013-03-15 22:51:25

It's being edited in a separate repos. I've not submitted the latest
from Holger's repos to the pep editors (yes, I have commit privs but
I'm not fully up to speed on the process so will leave it to those who
are.)


     Richard

From ct at gocept.com  Thu Mar 21 00:59:21 2013
From: ct at gocept.com (Christian Theune)
Date: Wed, 20 Mar 2013 16:59:21 -0700
Subject: [Catalog-sig] Replacement client for pep381client
Message-ID: <kidigl$lvt$1@ger.gmane.org>

Hi,

As you might be aware, I've done my share of bitching about my mirror 
(f.pypi.python.org) breaking.

I have picked pep381client apart yesterday and rebuilt it - mostly from 
ground up.

You can find a working version here:
https://bitbucket.org/ctheune/bandersnatch

The focus has been on making it a lot more robust and a lot easier to 
repair a mirror when it's known to be broken. To achieve that I:

- refactored the code, trying to make it more intentional, less mechanical
- stopped parsing the simple pages' HTML and made more use of the XML-RPC API
- added Tarek's worker/queue approach for parallelizing it
- kept as little state as possible on the client
- switched from timestamps to serial counters for checking what and how 
much to update
- handled locking of concurrent runs more gracefully
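
The serial-counter idea can be sketched roughly as follows. This is an
illustration only, not bandersnatch's actual code: the function name and
the shape of the changelog (pairs of package name and serial) are
assumptions made for the example.

```python
def packages_to_sync(changelog, last_serial):
    """Return (sorted package names, newest serial) for entries after last_serial.

    changelog is an iterable of (package_name, serial) pairs. This shape is
    a hypothetical stand-in for whatever the index actually reports.
    """
    todo = set()
    newest = last_serial
    for name, serial in changelog:
        if serial > last_serial:
            # The package changed since the last successful run.
            todo.add(name)
            newest = max(newest, serial)
    return sorted(todo), newest
```

The only state the client then has to keep between runs is the last
serial it has fully processed.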

I think I have a good grasp of what's going on now so that I can keep 
maintaining this in the future.

I'm currently re-initializing my own mirror. This basically can be run 
in-place by just removing the existing state data and calling my sync 
script (bsn-mirror) instead of pep381run with the same parameters.

Tomorrow I'll update the documentation, make it use a config file and 
put some lipstick on the main entry point. After that I should be ready 
for a release.

If you want to give it a try already, you just do this:

$ hg clone https://bitbucket.org/ctheune/bandersnatch
$ cd bandersnatch
$ virtualenv-2.7 .
$ bin/python bootstrap.py
$ bin/buildout
$ bin/bsn-mirror /my/mirror/path

Cheers,
Christian

From r1chardj0n3s at gmail.com  Thu Mar 21 01:30:09 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Wed, 20 Mar 2013 17:30:09 -0700
Subject: [Catalog-sig] Updated PEP 438
Message-ID: <CAHrZfZC+_b9d+PykJLjwj9fRdnDmf4G1yK=OarF8yuUpxi9Dpw@mail.gmail.com>

I've pushed the latest PEP to the repos. It has all the recent
clarifications and the API docs. Just need to wait for the website to
rebuild or something.

Unless there are any last-minute problems I'll accept the PEP in this
form and push the implementation to the production PyPI next week
after I fly home.


     Richard

From ct at gocept.com  Thu Mar 21 03:27:30 2013
From: ct at gocept.com (Christian Theune)
Date: Wed, 20 Mar 2013 19:27:30 -0700
Subject: [Catalog-sig] ResponseNotReady error while trying to do fresh
	sync
References: <CAG1ZdCBPNVUQ4TNkzTA=rdCcS-sXQrpSTyL5KSZVyw2pd89fPw@mail.gmail.com>
Message-ID: <kidr6e$sv3$1@ger.gmane.org>

On 2013-03-14 04:17:35 +0000, Qijiang Fan said:

> Hello,
> I'm maintaining e.pypi.python.org (with Aron Xu).
> We met some issues on our network attached storage, so we decided to
> do a fresh sync of pypi.
> We met an issue while doing that,

If you're interested: check out bandersnatch. I just had it recover my 
broken index nicely in about 2.5 hours.

Christian



From ct at gocept.com  Thu Mar 21 03:27:53 2013
From: ct at gocept.com (Christian Theune)
Date: Wed, 20 Mar 2013 19:27:53 -0700
Subject: [Catalog-sig] Replacement client for pep381client
References: <kidigl$lvt$1@ger.gmane.org>
Message-ID: <kidr75$sv3$2@ger.gmane.org>

On 2013-03-20 23:59:21 +0000, Christian Theune said:
> 
> I'm currently re-initializing my own mirror. This basically can be run 
> in-place by just removing the existing state data and calling my sync 
> script (bsn-mirror) instead of pep381run with the same parameters.

This worked nicely for me - I'm running my mirror on bandersnatch now.

Christian



From holger at merlinux.eu  Thu Mar 21 07:22:37 2013
From: holger at merlinux.eu (holger krekel)
Date: Thu, 21 Mar 2013 06:22:37 +0000
Subject: [Catalog-sig] Updated PEP 438
In-Reply-To: <CAHrZfZC+_b9d+PykJLjwj9fRdnDmf4G1yK=OarF8yuUpxi9Dpw@mail.gmail.com>
References: <CAHrZfZC+_b9d+PykJLjwj9fRdnDmf4G1yK=OarF8yuUpxi9Dpw@mail.gmail.com>
Message-ID: <20130321062237.GN9677@merlinux.eu>

Hi Richard, all,

On Wed, Mar 20, 2013 at 17:30 -0700, Richard Jones wrote:
> I've pushed the latest PEP to the repos. It has all the recent
> clarifications and the API docs. Just need to wait for the website to
> rebuild or something.

It's online now. Current references to PEP 438 (also inlined below):

    http://www.python.org/dev/peps/pep-0438/
    https://bitbucket.org/hpk42/pep-pypi/src/c0cbd3f3508991f5c47eb0fdb036c6e25ef45047/PEP-438.txt?at=default
 
> Unless there are any last-minute problems I'll accept the PEP in this
> form and push the implementation to the production PyPI next week
> after I fly home.

testpypi.python.org keeps 502ing on me - probably makes sense to first have
that stable and reviewed for a few days at least.

best and thanks everybody,

holger


PEP: 438
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel <holger at merlinux.eu>, Carl Meyer <carl at oddbird.net>
BDFL-Delegate: Richard Jones <richard at python.org>
Discussions-To: catalog-sig at python.org
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 15-Mar-2013
Post-History:


Abstract
========

This PEP proposes a backward-compatible two-phase transition process
to make installing from the pypi.python.org (PyPI) package index
faster, simpler and more robust.  To ease the transition and
minimize client-side friction, **no changes to distutils or existing
installation tools are required in order to benefit from the first
transition phase, which will result in faster, more reliable installs
for most existing packages**.

The first transition phase implements easy and explicit means for a
package maintainer to control which release file links are served to
present-day installation tools.  The first phase also includes the
implementation of analysis tools for present-day packages, to support
communication with package maintainers and the automated setting of
default modes for controlling release file links.  The first phase
also will default newly-registered projects on PyPI to only serve
links to release files which were uploaded to PyPI.

The second transition phase concerns end-user installation tools,
which shall default to only install release files that are hosted on
PyPI and tell the user if external release files exist, offering a
choice to automatically use those external files.  External release
files shall in the future be registered together with a checksum
hash so that installation tools can verify the integrity of the
eventual download (PyPI-hosted release files always carry such
a checksum).

Alternative PyPI server implementations should implement the new
simple index serving behaviour of transition phase 1 to avoid
installation tools treating their release links as external ones in
phase 2.


Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no
facility to host release files itself.  When hosting was added, no
automated downloading tool existed yet.  When Philip Eby implemented
automated downloading (through setuptools), he made the choice to
allow people to use download hosts of their choice.  The finding of
externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found
   by scraping them from that package's long_description metadata for
   any release. Links in the "Download-URL" and "Home-page" metadata
   fields are given ``rel=download`` and ``rel=homepage`` attributes,
   respectively.

#. Any of these links whose target is a file whose name appears to be
   in the form of an installable source or binary distribution, with
   name in the form "packagename-version.ARCHIVEEXT", is considered a
   potential installation candidate by installation tools.

#. Similarly, any links suffixed with an "#egg=packagename-version"
   fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
   crawled by installation tools and, if HTML, are themselves scraped
   for release-file links in the above formats.

See the easy_install documentation for a complete description of this
behavior. [1]_
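
As a rough illustration of rules 2 and 3 above, the following sketch
classifies a single link as a potential installation candidate.  It is
not the logic used by setuptools or any other installer: the function
name and the short list of archive extensions are assumptions made for
this example.

```python
import re
from urllib.parse import urlsplit

# Assumption: a small, non-exhaustive set of archive extensions.
ARCHIVE_EXTS = (".tar.gz", ".tar.bz2", ".tgz", ".zip", ".egg")


def is_install_candidate(url, package):
    """Apply simplified versions of rules 2 and 3 to one link."""
    parts = urlsplit(url)
    name = re.escape(package.lower())
    # Rule 3: an "#egg=packagename-version" fragment marks a candidate.
    if re.match("egg=" + name + "-.+", parts.fragment.lower()):
        return True
    # Rule 2: the file name has the form "packagename-version.ARCHIVEEXT".
    filename = parts.path.rsplit("/", 1)[-1].lower()
    exts = "|".join(re.escape(ext) for ext in ARCHIVE_EXTS)
    return re.match(name + "-.+(?:" + exts + ")$", filename) is not None
```

Because every link on a ``simple/`` page is tested this way, any link
that merely looks like a release file becomes a candidate, whether or
not the package owner intended it as one.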

Today, most packages indexed on PyPI host their release files on
PyPI. Out of 29,117 total projects on PyPI, only 2,581 (less than 10%)
include any links to installable files that are available only
off-PyPI. [2]_

There are many reasons [3]_ why people have chosen external
hosting. To cite just a few:

- release processes and scripts have been developed already and upload
  to external sites

- it takes too long to upload large files from some places in the
  world

- export restrictions e.g. for crypto-related software

- company policies which require offering open source packages through
  the company's own sites

- problems with integrating uploading to PyPI into one's release
  process (because of release policies)

- desiring download statistics different from those maintained by PyPI

- perceived poor reliability of PyPI

- not being aware that PyPI offers file hosting

Irrespective of the present-day validity of these reasons, there are
clear historical grounds for hosting files externally, and for some
time it was the only option.  This PEP takes the position that some
valid reasons for external hosting remain even today.

Problem
-------

**Today, Python package installers (pip, easy_install, buildout, and
others) often need to query many non-PyPI URLs even if there are no
externally hosted files**.  In addition to querying pypi.python.org's
simple index pages, an installer also crawls all homepages and download
pages ever specified for any release of a package.
The need for installers to crawl external sites slows down
installation and makes for a brittle and unreliable installation
process.  Those sites and packages also don't take part in the
:pep:`381` mirroring infrastructure, further decreasing reliability
and speed of automated installation processes around the world.

Most packages are hosted directly on pypi.python.org [2]_.  Even for
these packages, installers still crawl their homepage and
download-url, if specified.  Many package uploaders are not aware that
specifying the "homepage" or "download-url" in their package metadata
will needlessly slow down the installation process for all users.

Relying on third party sites also opens up more attack vectors for
injecting malicious packages into sites using automated installs.  A
simple attack might just involve getting hold of an old now-unused
homepage domain and placing malicious packages there.  Moreover,
performing a man-in-the-middle (MITM) attack between an installation
site and any of the download sites can inject malicious packages on
the installation site.  As many homepages and download locations are
using HTTP and not HTTPS, such attacks are not hard to launch.  Such
MITM attacks can easily happen even for packages which never intended
to host files externally as their homepages are contacted by
installers anyway.

There is currently no way for package maintainers to avoid
external-link crawling, other than removing all homepage/download url
metadata for all historic releases.  While a script [4]_ has been
written to perform this action, it is not a good general solution
because it removes useful metadata from PyPI releases.

Even if the sites referenced by "Homepage" and "Download-URL" links
were not scraped for further links, there is no obvious way under the
current system for a package owner to link to an installable file from
a long_description metadata field (which is shown as package
documentation on ``/pypi/PKG``) without installation tools
automatically considering that file a candidate for installation.
Conversely, there is no way to explicitly register multiple external
release files without putting them in metadata fields.


Goals
-----

These are the goals to be achieved by implementation of this PEP:

* Package owners should be able to explicitly control which files are
  presented by PyPI to installer tools as installation
  candidates. Installation should not be slowed and made less reliable
  by extensive and unnecessary crawling of links that package owners
  did not explicitly nominate as installation files.

* It should remain possible for package owners to choose to host their
  release files on their own hosting, external to PyPI. It should be
  easy for a user to request the installation of such releases using
  automated installer tools, especially if the external release files
  were registered together with a checksum hash.

* Automated installer tools should not install externally-hosted
  packages **by default**, but require explicit authorization to do so
  by the user. When tools refuse to install such a package by default,
  they should tell the user exactly which external link(s) the
  installer needs to follow, and what option(s) the user can provide
  to authorize the tool to follow those links. PyPI should provide all
  necessary metadata for installer tools to implement this easily and
  within a single request/reply interaction.

* Migration from the status quo to the above points should be gradual
  and minimize breakage. This includes tooling that makes it easy for
  package owners with an existing release process that uploads to
  non-PyPI hosting to also upload those release files to PyPI.


Solution / two transition phases
================================

The first transition phase introduces a "hosting-mode" field for each
project on PyPI, allowing package owners explicit control of which
release file links are served to present-day installation tools in the
machine-readable ``simple/`` index. The first transition will, after
successful hosting-mode manipulations by individual early-adopters,
set a default hosting mode for existing packages, based on automated
analysis.  **Maintainers will be notified one month ahead of any such
automated change**.  At completion of the first transition phase,
**all present-day existing release and installation processes and
tools are expected to continue working**.  Any remaining errors or
problems are expected to only relate to installation of individual
packages and can be easily corrected by package maintainers or PyPI
admins if maintainers are not reachable.

Also in the first phase, each link served in the ``simple/`` index
will be explicitly marked as ``rel="internal"`` if it is hosted by the
index itself (even if on a separate domain, which may be the case if
the index uses a CDN for file-serving). Any link not so marked will be
considered an external link.
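
As a rough illustration of how a client might separate the two kinds
of links, the following sketch parses one ``simple/`` page and splits
its anchors by their ``rel`` attribute.  The class name, the function
name and the example page content are assumptions made for this
illustration only; they are not part of any existing installer.

```python
from html.parser import HTMLParser


class SimpleIndexLinks(HTMLParser):
    """Collect (href, rel) pairs from the anchors of a simple/ page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" in attributes:
            self.links.append((attributes["href"], attributes.get("rel")))


def split_links(page_html):
    """Return (internal, external) lists of hrefs from one index page."""
    parser = SimpleIndexLinks()
    parser.feed(page_html)
    # Only links explicitly marked rel="internal" count as internal;
    # everything else (other rel values, or no rel at all) is external.
    internal = [href for href, rel in parser.links if rel == "internal"]
    external = [href for href, rel in parser.links if rel != "internal"]
    return internal, external


# Hypothetical page content, for illustration only.
EXAMPLE_PAGE = """
<html><body>
<a href="../../packages/source/e/example/example-1.0.tar.gz#md5=0123" rel="internal">example-1.0.tar.gz</a>
<a href="http://example.com/downloads/" rel="download">download page</a>
<a href="http://example.com/example-0.9.tar.gz">example-0.9.tar.gz</a>
</body></html>
"""

internal, external = split_links(EXAMPLE_PAGE)
```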

In the second transition phase, PyPI client installation tools shall
be updated to default to only install ``rel="internal"`` packages
unless a user specifies option(s) to permit installing from external
links. See `second transition phase`_ for details on how installers
should behave.

Maintainers of packages which currently host release files on non-PyPI
sites shall receive instructions and tools to ease "re-hosting" of
their historic and future package release files.  This re-hosting tool
MUST be available before automated hosting-mode changes are announced
to package maintainers.


Implementation
==============

Hosting modes
-------------

The foundation of the first transition phase is the introduction of
three "modes" of PyPI hosting for a package, affecting which links are
generated for the ``simple/`` index.  These modes are implemented
without requiring changes to installation tools via changes to the
algorithm for generating the machine-readable ``simple/`` index.

The modes are:

- ``pypi-scrape-crawl``: no change from the current situation of
  generating machine-readable links for installation tools, as
  outlined in the history_.

- ``pypi-scrape``: for a package in this mode, links to be added to
  the ``simple/`` index are still scraped from package
  metadata. However, the "Home-page" and "Download-url" links are
  given ``rel=ext-homepage`` and ``rel=ext-download`` attributes
  instead of ``rel=homepage`` and ``rel=download``. The effect of this
  (with no change in installation tools necessary) is that these links
  will not be followed and scraped for further candidate links by
  present-day installation tools: only installable files directly
  hosted from PyPI or linked directly from PyPI metadata will be
  considered for installation.  Installation tools MAY evolve to offer
  an option to use the new rel-attribution to crawl external pages but
  MUST NOT default to it.

- ``pypi-explicit``: for a package in this mode, only links to release
  files uploaded to PyPI, and external links to release files
  explicitly nominated by the package owner, will be added to the
  ``simple/`` index. PyPI will provide a new interface for package
  owners to supply external release-file URLs. These URLs MUST include
  a URL fragment in the form "#hashtype=hashvalue" specifying a hash
  of the externally-linked file which installer tools MUST use to
  validate that they have downloaded the intended file.

Thus the hope is that eventually all projects on PyPI can be migrated
to the ``pypi-explicit`` mode, while preserving the ability to install
release files hosted externally via installer tools. Deprecation of
hosting modes to eventually only allow the ``pypi-explicit`` mode is
NOT REGULATED by this PEP but is expected to become feasible some time
after successful implementation of the transition phases described in
this PEP.  It is expected that deprecation requires **a new process to
deal with abandoned packages** because of unreachable maintainers for
still popular packages.
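
The hash fragment required for external URLs in ``pypi-explicit``
mode could be checked by an installer roughly as follows.  This is a
minimal sketch: the function names and the set of accepted hash types
are assumptions for illustration, and the PEP itself does not
prescribe which hash types an installer has to support.

```python
import hashlib
from urllib.parse import urlsplit

# Assumption: the hash types this sketch is willing to accept.
ACCEPTED_HASH_TYPES = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def parse_hash_fragment(url):
    """Return (hashtype, hashvalue) from a '#hashtype=hashvalue' fragment.

    Returns None when the URL carries no such fragment.
    """
    fragment = urlsplit(url).fragment
    hashtype, sep, hashvalue = fragment.partition("=")
    if not sep or not hashtype or not hashvalue:
        return None
    return hashtype.lower(), hashvalue.lower()


def verify_download(url, data):
    """Check downloaded bytes against the hash embedded in the URL."""
    parsed = parse_hash_fragment(url)
    if parsed is None:
        raise ValueError("no hash fragment in %r" % url)
    hashtype, expected = parsed
    if hashtype not in ACCEPTED_HASH_TYPES:
        raise ValueError("unsupported hash type %r" % hashtype)
    actual = hashlib.new(hashtype, data).hexdigest()
    return actual == expected
```

An installer would call ``verify_download()`` with the link taken from
the ``simple/`` index and the bytes it fetched, and reject the file if
the result is false.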


First transition phase (PyPI)
-----------------------------

The proposed solution consists of multiple implementation and
communication steps:

#. Implement in PyPI the three modes described above, with an
   interface for package owners to select the mode for each package
   and register explicit external file URLs.

#. For packages in all modes, label links in the ``simple/`` index to
   index-hosted files with ``rel="internal"``, to make it easier for
   client tools to distinguish these links in the second phase.

#. Add an HTML tag ``<meta name="api-version" value="2">`` to all
   ``simple/`` index pages, to allow clients to distinguish between
   indexes providing the ``rel="internal"`` metadata and older ones
   that do not.

#. Default all newly-registered packages to ``pypi-explicit`` mode
   (package owners can still switch to the other modes as desired).

#. Determine (via automated analysis [2]_) which packages have all
   installable files available on PyPI itself (group A), which have
   all installable files on PyPI or linked directly from PyPI metadata
   (group B), and which have installable versions available that are
   linked only from external homepage/download HTML pages (group C).

#. Send mail to maintainers of projects in group A that their project
   will be automatically configured to ``pypi-explicit`` mode in one
   month, and similarly to maintainers of projects in group B that
   their project will be automatically configured to ``pypi-scrape``
   mode.  Inform them that this change is not expected to affect
   installability of their project at all, but will result in faster
   and safer installs for their users.  Encourage them to set this
   mode themselves sooner to benefit their users.

#. Send mail to maintainers of packages in group C that their package
   hosting mode is ``pypi-scrape-crawl``, list the URLs which
   currently are crawled, and suggest that they either re-host their
   packages directly on PyPI and switch to ``pypi-explicit``, or at
   least provide direct links to release files in PyPI metadata and
   switch to ``pypi-scrape``.  Provide instructions and tools to help
   with these transitions.
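
The grouping used in the last three steps can be sketched as a small
function.  The function name and its inputs are hypothetical: it is
assumed here that the analysis yields, per project, three sets of
installable file names according to where each file was found.

```python
def classify_project(pypi_hosted, metadata_linked, crawl_only):
    """Return (group, proposed_mode) for one project.

    Each argument is a set of installable file names found via one
    route: uploaded to PyPI, linked directly from PyPI metadata, or
    reachable only by crawling external homepage/download pages.
    """
    if crawl_only - pypi_hosted - metadata_linked:
        # Some file is reachable only through crawling: group C.
        return "C", "pypi-scrape-crawl"
    if metadata_linked - pypi_hosted:
        # Everything is on PyPI or linked directly from metadata: group B.
        return "B", "pypi-scrape"
    # Everything installable is hosted on PyPI itself: group A.
    return "A", "pypi-explicit"
```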


.. _`second transition phase`:

Second transition phase (installer tools)
-----------------------------------------

For the second transition phase, maintainers of installation tools are
asked to release two updates.

The first update shall warn clearly when externally-hosted release
files (that is, files whose link does not include
``rel="internal"``) are selected for download, state exactly which
projects and URLs are affected, and warn that in future versions
externally-hosted downloads will be disabled by default.

The second update should change the default mode to allow only
installation of ``rel="internal"`` package files, and allow
installation of externally-hosted packages only when the user supplies
an option.

The installer should distinguish between verifiable and non-verifiable
external links. A verifiable external link is a direct link to an
installable file from the PyPI ``simple/`` index that includes a hash
in the URL fragment ("#hashtype=hashvalue") which can be used to
verify the integrity of the downloaded file. A non-verifiable external
link is any link (other than those explicitly supplied by the user of
an installer tool) without a hash, scraped from external HTML, or
injected into the search via some other non-PyPI source
(e.g. setuptools' ``dependency_links`` feature).

Installers should provide a blanket option to allow
installing any verifiable external link. Non-verifiable external links
should only be installed if the user-provided option specifies exactly
which external domains can be used or for which specific package names
external links can be used.
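
One possible shape of this decision is sketched below.  The function
and option names are hypothetical and do not correspond to options of
any existing installer.  The sketch also assumes that naming a host or
a project permits verifiable links from that host or project as well,
which the text above does not state explicitly.

```python
from urllib.parse import urlsplit


def is_verifiable(url, from_simple_index):
    """True for a direct simple/ index link with a '#hashtype=hashvalue' fragment."""
    hashtype, sep, hashvalue = urlsplit(url).fragment.partition("=")
    return bool(from_simple_index and sep and hashtype and hashvalue)


def may_install_external(url, project, from_simple_index,
                         allow_all_verifiable=False,
                         allowed_hosts=(), allowed_projects=()):
    """Decide whether an external link may be used under the user's options."""
    # True when the user named this link's host or this project explicitly.
    named = (urlsplit(url).hostname in allowed_hosts
             or project in allowed_projects)
    if is_verifiable(url, from_simple_index):
        # The blanket option is enough for verifiable links.
        return allow_all_verifiable or named
    # Non-verifiable links need an explicitly named host or project.
    return named
```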

When download of an externally-hosted package is disallowed by the
default configuration, the user should be notified, with instructions
for how to make the install succeed and warnings about the implication
(that a file will be downloaded from a site that is not part of the
package index). The warning given for non-verifiable links should
clearly state that the installer cannot verify the integrity of the
downloaded file. The warning given for verifiable external links
should simply note that the file will be downloaded from an external
URL, but that the file integrity can be verified by checksum.

Alternative PyPI-compatible index implementations should upgrade to
begin providing the ``rel="internal"`` metadata and the ``<meta
name="api-version" value="2">`` tag as soon as possible. For
alternative indexes which do not yet provide the meta tag in their
``simple/`` pages, installation tools should provide
backwards-compatible fallback behavior (treat links as internal as in
pre-PEP times and provide a warning).
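
Detecting the meta tag on a ``simple/`` page might look like the
following sketch.  The class and function names are assumptions for
illustration; only the tag name and its ``name``/``value`` attributes
are taken from this PEP.

```python
from html.parser import HTMLParser


class ApiVersionFinder(HTMLParser):
    """Record the value of <meta name="api-version" value="...">, if any."""

    def __init__(self):
        super().__init__()
        self.api_version = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if attributes.get("name") == "api-version":
                self.api_version = attributes.get("value")


def index_api_version(page_html):
    """Return the advertised api-version string, or None if absent."""
    finder = ApiVersionFinder()
    finder.feed(page_html)
    return finder.api_version


def use_legacy_fallback(page_html):
    """True when the index gives no api-version and links should be
    treated as internal, as in pre-PEP times (with a warning)."""
    return index_api_version(page_html) is None
```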


API For Submitting External Distribution URLs
---------------------------------------------

New distribution URLs may be submitted by performing an HTTP POST to
the URL:

    https://pypi.python.org/pypi

With the following form-encoded data:

============== ================================
Name           Value
-------------- --------------------------------
:action        The string "urls"
name           The package name as a string
version        The release version as a string
new-url        The new URL to store
submit_new_url The string "yes"
============== ================================

The POST must be accompanied by an HTTP Basic Auth header encoding the
username and password of the user authorized to maintain the package
on PyPI.

The HTTP response to this request will be one of:

======= ============ ================================================
Code    Meaning      URL submission implications
------- ------------ ------------------------------------------------
200     OK           Everything worked just fine
400     Bad request  Data provided for submission was malformed
401     Unauthorised The username or password supplied was incorrect
403     Forbidden    User does not have permission to update the
                     package information (not Owner or Maintainer)
======= ============ ================================================


References
==========

.. [1] Philip Eby, easy_install 'Package Index "API"' documentation,
       http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api

.. [2] Donald Stufft, automated analysis of PyPI project links,
       https://github.com/dstufft/pypi.linkcheck

.. [3] Marc-Andre Lemburg, reasons for external hosting,
       http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [4] Holger Krekel, script to remove homepage/download metadata for
       all releases
       http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html


Acknowledgments
===============

Philip Eby for precise information and the basic ideas to implement
the transition via server-side changes only.

Donald Stufft for pushing away from external hosting and offering to
implement both a Pull Request for the necessary PyPI changes and the
analysis tool to drive the transition phase 1.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for
thinking through issues regarding getting rid of "external hosting".


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From r1chardj0n3s at gmail.com  Thu Mar 21 07:45:56 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Wed, 20 Mar 2013 23:45:56 -0700
Subject: [Catalog-sig] Updated PEP 438
In-Reply-To: <20130321062237.GN9677@merlinux.eu>
References: <CAHrZfZC+_b9d+PykJLjwj9fRdnDmf4G1yK=OarF8yuUpxi9Dpw@mail.gmail.com>
	<20130321062237.GN9677@merlinux.eu>
Message-ID: <CAHrZfZBpZ7d99GJauhaijxaJEVukRqja1An1BzfGO13c0A66hA@mail.gmail.com>

On 20 March 2013 23:22, holger krekel <holger at merlinux.eu> wrote:
> On Wed, Mar 20, 2013 at 17:30 -0700, Richard Jones wrote:
>> I've pushed the latest PEP to the repos. It has all the recent
>> clarifications and the API docs. Just need to wait for the website to
>> rebuild or something.
>
> It's online now. Current references to PEP438 (also inlined below):
>
>     http://www.python.org/dev/peps/pep-0438/
>     https://bitbucket.org/hpk42/pep-pypi/src/c0cbd3f3508991f5c47eb0fdb036c6e25ef45047/PEP-438.txt?at=default
>
>> Unless there's any last-minute problems I'll accept the PEP in this
>> form and push the implementation to the production PyPI next week
>> after I fly home.
>
> testpypi.python.org keeps 502ing on me - probably makes sense to first have
> that stable and reviewed for a few days at least.

Dammit, I don't know why but uwsgi just keeps bloody dying :-(


    Richard

From holger at merlinux.eu  Thu Mar 21 11:28:24 2013
From: holger at merlinux.eu (holger krekel)
Date: Thu, 21 Mar 2013 10:28:24 +0000
Subject: [Catalog-sig] Replacement client for pep381client
In-Reply-To: <kidr75$sv3$2@ger.gmane.org>
References: <kidigl$lvt$1@ger.gmane.org>
 <kidr75$sv3$2@ger.gmane.org>
Message-ID: <20130321102824.GQ9677@merlinux.eu>

On Wed, Mar 20, 2013 at 19:27 -0700, Christian Theune wrote:
> On 2013-03-20 23:59:21 +0000, Christian Theune said:
> >
> >I'm currently re-initializing my own mirror. This basically can be
> >run in-place by just removing the existing state data and calling
> >my sync script (bsn-mirror) instead of pep381run with the same
> >parameters.
> 
> This worked nicely for me - I'm running my mirror on bandersnatch now.

So far I got 3 errors like this one::

    2013-03-21 14:23:19,759 bandersnatch.package INFO: Downloading: https://pypi.python.org/packages/source/C/Clay/Clay-0.13.tar.gz
    2013-03-21 14:23:20,384 bandersnatch.package ERROR: Error syncing package: Coopr
    Traceback (most recent call last):
      File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 50, in sync
        self.sync_release_files()
      File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 68, in sync_release_files
        self.download_file(release_file['url'], release_file['md5_digest'])
      File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 144, in download_file
        url, existing_hash, md5sum))
    ValueError: https://pypi.python.org/packages/source/C/Coopr/Coopr-1.1.zip has hash 97cb7ae47656df10d243533c4f0c63c1 instead of 7ed6916702b2afccd254b423450ac4af

and the command terminates.  I can restart fine, though.  Will continue
to do so and see how far I get.  Seems to perform quickly, btw :)

holger

> Christian
> 
> 
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
> 

From christian at python.org  Thu Mar 21 13:06:07 2013
From: christian at python.org (Christian Heimes)
Date: Thu, 21 Mar 2013 13:06:07 +0100
Subject: [Catalog-sig] Access to Windows' cert store
Message-ID: <514AF7AF.7000304@python.org>

Hi,

the message is slightly off-topic but it might be interesting for pip,
setuptools and other developers that are working on HTTPS for PyPI.

A while ago I found C++ example code that shows how to dump CA and CRL
certs from Windows's system cert store. The system cert store contains
the certificates used by Windows, IE etc.

Yesterday I reimplemented the C++ code with Python and ctypes. I have
tested it with Python 2.6 to 3.3 (x86 and x86_64) on Windows 7. It
should work with Windows XP / Windows Server 2003 and all newer versions
of Windows. The output is usable by Python's SSL module but you have to
dump the certs to a file first.

I'm planning to add the feature to Python 3.4, too.
http://bugs.python.org/issue17134

You can download the code from

  https://bitbucket.org/tiran/wincertstore


Regards,
Christian

From mal at egenix.com  Thu Mar 21 13:58:34 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 21 Mar 2013 13:58:34 +0100
Subject: [Catalog-sig] Access to Windows' cert store
In-Reply-To: <514AF7AF.7000304@python.org>
References: <514AF7AF.7000304@python.org>
Message-ID: <514B03FA.7060605@egenix.com>

On 21.03.2013 13:06, Christian Heimes wrote:
> Hi,
> 
> the message is slightly off-topic but it might be interesting for pip,
> setuptools and other developers that are working on HTTPS for PyPI.
> 
> A while ago I found C++ example code that shows how to dump CA and CRL
> certs from Windows's system cert store. The system cert store contains
> the certificates used by Windows, IE etc.

Why not simply use the Firefox certs ?

We started adding these to our pyOpenSSL distribution with the last release:
https://cms.egenix.com/products/python/pyOpenSSL/doc/#Module_OpenSSL.ca_bundle

> Yesterday I reimplemented the C++ code with Python and ctypes. I have
> tested it with Python 2.6 to 3.3 (x86 and x86_64) on Windows 7. It
> should work with Windows XP / Windows Server 2003 and all newer versions
> of Windows. The output is usable by Python's SSL module but you have to
> dump the certs to a file first.

You can set up OpenSSL Contexts to validate based on in-memory
certificates as well: just add the certs one by one to the
Context using the X509Store object you can obtain using
context.get_cert_store().

> I'm planning to add the feature to Python 3.4, too.
> http://bugs.python.org/issue17134
> 
> You can download the code from
> 
>   https://bitbucket.org/tiran/wincertstore

I think this would be a useful addition for pyOpenSSL as well - if
it's possible to extract the Windows certificates without admin
rights.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 21 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-03-13: Released eGenix pyOpenSSL 0.13 ...    http://egenix.com/go39

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From christian at python.org  Thu Mar 21 14:32:24 2013
From: christian at python.org (Christian Heimes)
Date: Thu, 21 Mar 2013 14:32:24 +0100
Subject: [Catalog-sig] Access to Windows' cert store
In-Reply-To: <514B03FA.7060605@egenix.com>
References: <514AF7AF.7000304@python.org> <514B03FA.7060605@egenix.com>
Message-ID: <514B0BE8.6010006@python.org>

Am 21.03.2013 13:58, schrieb M.-A. Lemburg:
> Why not simply use the Firefox certs ?
> 
> We started adding these to our pyOpenSSL distribution with the last release:
> https://cms.egenix.com/products/python/pyOpenSSL/doc/#Module_OpenSSL.ca_bundle

Sure, that's another viable option. But IIRC some people have raised
license concerns.

> You can set up OpenSSL Contexts to validate based on in-memory
> certificates as well: just add the certs one by one to the
> Context using the X509Store object you can obtain using
> context.get_cert_store().

I assume you are talking about pyOpenSSL? I was referring to Python's
SSL module. It can only load CA certs from a file or directory. It would
be a useful feature for Python's SSL module, too.

> I think this would be a useful addition for pyOpenSSL as well - if
> it's possible to extract the Windows certificates without admin
> rights.

The code works without special privileges. The MSDN references don't
mention any restrictions, either. The code is rather simple -- I'm only
using four functions and three structs.

Christian

From donald at stufft.io  Thu Mar 21 14:40:15 2013
From: donald at stufft.io (Donald Stufft)
Date: Thu, 21 Mar 2013 09:40:15 -0400
Subject: [Catalog-sig] Access to Windows' cert store
In-Reply-To: <514B0BE8.6010006@python.org>
References: <514AF7AF.7000304@python.org> <514B03FA.7060605@egenix.com>
	<514B0BE8.6010006@python.org>
Message-ID: <CD4D0627-32ED-47D8-97B1-50FB4460A675@stufft.io>


On Mar 21, 2013, at 9:32 AM, Christian Heimes <christian at python.org> wrote:

> Am 21.03.2013 13:58, schrieb M.-A. Lemburg:
>> Why not simply use the Firefox certs ?
>> 
>> We started adding these to our pyOpenSSL distribution with the last release:
>> https://cms.egenix.com/products/python/pyOpenSSL/doc/#Module_OpenSSL.ca_bundle
> 
> Sure, that's another viable option. But IIRC some people have raised
> license concerns.

The Firefox bundle is released under the MPL, which only applies to the individual files and not the entire project.

> 
>> You can set up OpenSSL Contexts to validate based on in-memory
>> certificates as well: just add the certs one by one to the
>> Context using the X509Store object you can obtain using
>> context.get_cert_store().
> 
> I assume you are talking about pyOpenSSL? I was referring to Python's
> SSL module. It can only load CA certs from a file or directory. It would
> be a useful feature for Python's SSL module, too.
> 
>> I think this would be a useful addition for pyOpenSSL as well - if
>> it's possible to extract the Windows certificates without admin
>> rights.
> 
> The code works without special privileges. The MSDN references don't
> mention any restrictions, either. The code is rather simple -- I'm only
> using four functions and three structs.

I would love to see this added to Python Core. As it is right now, if OpenSSL is configured correctly, you can do `urllib.request.urlopen("?", cadefault=True)` and things will just work. This breaks down on Windows, though.

> 
> Christian


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From mal at egenix.com  Thu Mar 21 15:01:08 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 21 Mar 2013 15:01:08 +0100
Subject: [Catalog-sig] Access to Windows' cert store
In-Reply-To: <514B0BE8.6010006@python.org>
References: <514AF7AF.7000304@python.org> <514B03FA.7060605@egenix.com>
	<514B0BE8.6010006@python.org>
Message-ID: <514B12A4.4090105@egenix.com>

On 21.03.2013 14:32, Christian Heimes wrote:
> Am 21.03.2013 13:58, schrieb M.-A. Lemburg:
>> Why not simply use the Firefox certs ?
>>
>> We started adding these to our pyOpenSSL distribution with the last release:
>> https://cms.egenix.com/products/python/pyOpenSSL/doc/#Module_OpenSSL.ca_bundle
> 
> Sure, that's another viable option. But IIRC some people have raised
> license concerns.

I think the more problematic aspect is not being able to easily update
the CA list. Firefox and Windows do this automatically for you,
but for Python, this could only be done with patch level releases.

Still, it's better than not having access to any such CA list,
so would be a good fallback solution.

>> You can set up OpenSSL Contexts to validate based on in-memory
>> certificates as well: just add the certs one by one to the
>> Context using the X509Store object you can obtain using
>> context.get_cert_store().
> 
> I assume you are talking about pyOpenSSL? I was referring to Python's
> SSL module. It can only load CA certs from a file or directory. It would
> be a useful feature for Python's SSL module, too.

Ah, right.

>> I think this would be a useful addition for pyOpenSSL as well - if
>> it's possible to extract the Windows certificates without admin
>> rights.
> 
> The code works without special privileges. The MSDN references don't
> mention any restrictions, either. The code is rather simple -- I'm only
> using four functions and three structs.

Nice.

-- 
Marc-Andre Lemburg
eGenix.com


From solipsis at pitrou.net  Thu Mar 21 15:12:12 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 21 Mar 2013 14:12:12 +0000 (UTC)
Subject: [Catalog-sig] Access to Windows' cert store
References: <514AF7AF.7000304@python.org>
Message-ID: <loom.20130321T145846-707@post.gmane.org>

Christian Heimes <christian <at> python.org> writes:
> 
> I'm planning to add the feature to Python 3.4, too.
> http://bugs.python.org/issue17134
> 
> You can download the code from
> 
>   https://bitbucket.org/tiran/wincertstore

This is nice, but can you follow up on the bug tracker? It would be much
more appropriate than catalog-sig.

Also you shouldn't need to encode the certs into PEM format. AFAICT,
SSL_CTX_get_cert_store(), d2i_X509_AUX() and X509_STORE_add_cert() should
be sufficient.

Regards

Antoine.



From pje at telecommunity.com  Thu Mar 21 16:29:21 2013
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 21 Mar 2013 11:29:21 -0400
Subject: [Catalog-sig] Access to Windows' cert store
In-Reply-To: <514AF7AF.7000304@python.org>
References: <514AF7AF.7000304@python.org>
Message-ID: <CALeMXf5MphdFFY5WA9AzL4jEhDUcZDhxvmifY-Sbua4Kb-0sYA@mail.gmail.com>

On Thu, Mar 21, 2013 at 8:06 AM, Christian Heimes <christian at python.org> wrote:
> Hi,
>
> the message is slightly off-topic but it might be interesting for pip,
> setuptools and other developers that are working on HTTPS for PyPI.
>
> A while ago I found C++ example code that shows how to dump CA and CRL
> certs from Windows's system cert store. The system cert store contains
> the certificates used by Windows, IE etc.
>
> Yesterday I reimplemented the C++ code with Python and ctypes. I have
> tested it with Python 2.6 to 3.3 (x86 and x86_64) on Windows 7. It
> should work with Windows XP / Windows Server 2003 and all newer versions
> of Windows. The output is usable by Python's SSL module but you have to
> dump the certs to a file first.
>
> I'm planning to add the feature to Python 3.4, too.
> http://bugs.python.org/issue17134
>
> You can download the code from
>
>   https://bitbucket.org/tiran/wincertstore
>

Very nice!  I definitely would like to use this for setuptools, but I
actually want it for versions 2.3-2.5, which can't use requests or
urllib3 or anything like that.  So I hacked on the code a bit and got
it to work (or at least got the __main__ stub to spit out a bunch of
data) with Python 2.3 and ctypes 1.0.2 (the last standalone release
for which Windows binaries are available).  Would you like a patch?

(Note: absolute_import, decorators, and the actual use of "with:" and
generator expressions had to go, but this doesn't change any API or
semantics as far as I can tell, just a bit of appearance here and
there, and the code still runs with 2.4, 2.5, 2.7, 3.1, and 3.2 that I
tried.)

From christian at python.org  Thu Mar 21 17:11:45 2013
From: christian at python.org (Christian Heimes)
Date: Thu, 21 Mar 2013 17:11:45 +0100
Subject: [Catalog-sig] Access to Windows' cert store
In-Reply-To: <CALeMXf5MphdFFY5WA9AzL4jEhDUcZDhxvmifY-Sbua4Kb-0sYA@mail.gmail.com>
References: <514AF7AF.7000304@python.org>
	<CALeMXf5MphdFFY5WA9AzL4jEhDUcZDhxvmifY-Sbua4Kb-0sYA@mail.gmail.com>
Message-ID: <514B3141.8040504@python.org>

Am 21.03.2013 16:29, schrieb PJ Eby:
> Very nice!  I definitely would like to use this for setuptools, but I
> actually want it for versions 2.3-2.5, which can't use requests or
> urllib3 or anything like that.  So I hacked on the code a bit and got
> it to work (or at least got the __main__ stub to spit out a bunch of
> data) with Python 2.3 and ctypes 1.0.2 (the last standalone release
> for which Windows binaries are available).  Would you like a patch?
> 
> (Note: absolute_import, decorators, and the actual use of "with:" and
> generator expressions had to go, but this doesn't change any API or
> semantics as far as I can tell, just a bit of appearance here and
> there, and the code still runs with 2.4, 2.5, 2.7, 3.1, and 3.2 that I
> tried.)

Sure, send me your patch and I'll add it later. Feel free to include a
copy of the code in setuptools if you like. I don't mind as long as it
keeps our users happy. ;)

Christian



From christian at python.org  Thu Mar 21 17:22:19 2013
From: christian at python.org (Christian Heimes)
Date: Thu, 21 Mar 2013 17:22:19 +0100
Subject: [Catalog-sig] Access to Windows' cert store
In-Reply-To: <loom.20130321T145846-707@post.gmane.org>
References: <514AF7AF.7000304@python.org>
	<loom.20130321T145846-707@post.gmane.org>
Message-ID: <514B33BB.4070609@python.org>

Am 21.03.2013 15:12, schrieb Antoine Pitrou:
> This is nice, but can you follow up on the bug tracker? It would be much
> more appropriate than catalog-sig.
> 
> Also you shouldn't need to encode the certs into PEM format. AFAICT,
> SSL_CTX_get_cert_store(), d2i_X509_AUX() and X509_STORE_add_cert() should
> be sufficient.

The code is a proof-of-concept. I want to test the feature and provide
something that works without modification of Python stdlib code or a C
extension. It's the only viable option for pip and setuptools, as it
works out of the box.

For Python 3.4 I don't want to use ctypes or PEM. The crypt32 API
provides the certificates and CRLs either as PKCS#7 or DER binary data.
I'll update the ticket as soon as I'm done with testing.
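
If a consumer still insists on PEM, DER data can be converted with the
standard library alone. A minimal sketch, assuming Python 3.4 or later,
where ssl.enum_certificates() exists on Windows only; the helper names
der_to_pem and windows_root_certs_pem are made up for this illustration
and are not part of the proof-of-concept code:

```python
import ssl
import sys


def der_to_pem(der_bytes):
    """Wrap DER-encoded certificate bytes in PEM armor."""
    return ssl.DER_cert_to_PEM_cert(der_bytes)


def windows_root_certs_pem(store_name="ROOT"):
    """Return the X.509 certificates of a Windows system store as PEM.

    Returns an empty list on platforms without ssl.enum_certificates().
    """
    if sys.platform != "win32" or not hasattr(ssl, "enum_certificates"):
        return []
    return [
        ssl.DER_cert_to_PEM_cert(cert)
        for cert, encoding, trust in ssl.enum_certificates(store_name)
        if encoding == "x509_asn"  # skip PKCS#7 entries
    ]


if __name__ == "__main__":
    print(len(windows_root_certs_pem()), "certificates found")
```

The conversion is only base64 armoring, so it does not validate the
certificate itself.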

Christian

From richard at python.org  Thu Mar 21 18:31:20 2013
From: richard at python.org (Richard Jones)
Date: Thu, 21 Mar 2013 10:31:20 -0700
Subject: [Catalog-sig] Replacement client for pep381client
In-Reply-To: <kidr75$sv3$2@ger.gmane.org>
References: <kidigl$lvt$1@ger.gmane.org>
	<kidr75$sv3$2@ger.gmane.org>
Message-ID: <CAHrZfZCOycA+CitCMMqKu4qJdmUviGaF-cjLF2QVyxA3oQFX3w@mail.gmail.com>

On 20 March 2013 19:27, Christian Theune <ct at gocept.com> wrote:
> On 2013-03-20 23:59:21 +0000, Christian Theune said:
>>
>>
>> I'm currently re-initializing my own mirror. This basically can be run
>> in-place by just removing the existing state data and calling my sync script
>> (bsn-mirror) instead of pep381run with the same parameters.
>
>
> This worked nicely for me - I'm running my mirror on bandersnatch now.

Nice work, Christian, thanks!


     Richard

From ct at gocept.com  Thu Mar 21 18:18:15 2013
From: ct at gocept.com (Christian Theune)
Date: Thu, 21 Mar 2013 10:18:15 -0700
Subject: [Catalog-sig] Replacement client for pep381client
In-Reply-To: <20130321102824.GQ9677@merlinux.eu>
References: <kidigl$lvt$1@ger.gmane.org> <kidr75$sv3$2@ger.gmane.org>
	<20130321102824.GQ9677@merlinux.eu>
Message-ID: <23104ED9-7614-4165-8CE8-452553979BAC@gocept.com>


On Mar 21, 2013, at 3:28 AM, holger krekel <holger at merlinux.eu> wrote:

> On Wed, Mar 20, 2013 at 19:27 -0700, Christian Theune wrote:
>> On 2013-03-20 23:59:21 +0000, Christian Theune said:
>>> 
>>> I'm currently re-initializing my own mirror. This basically can be
>>> run in-place by just removing the existing state data and calling
>>> my sync script (bsn-mirror) instead of pep381run with the same
>>> parameters.
>> 
>> This worked nicely for me - I'm running my mirror on bandersnatch now.
> 
> So far I have got 3 errors like this one::
> 
>    2013-03-21 14:23:19,759 bandersnatch.package INFO: Downloading: https://pypi.python.org/packages/source/C/Clay/Clay-0.13.tar.gz
>    2013-03-21 14:23:20,384 bandersnatch.package ERROR: Error syncing package: Coopr
>    Traceback (most recent call last):
>      File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 50, in sync
>        self.sync_release_files()
>      File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 68, in sync_release_files
>        self.download_file(release_file['url'], release_file['md5_digest'])
>      File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 144, in download_file
>        url, existing_hash, md5sum))
>    ValueError: https://pypi.python.org/packages/source/C/Coopr/Coopr-1.1.zip has hash 97cb7ae47656df10d243533c4f0c63c1 instead of 7ed6916702b2afccd254b423450ac4af
> 
> and the command terminates.  I can restart fine, though.  I will keep
> going and see how far I get.  Seems to perform quickly, btw :)

This is an interesting case: the data downloaded from PyPI did not match the md5sum that was announced. This is a "should never happen" kind of error, but a subsequent run will retry gracefully.
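
The check behind that error can be sketched roughly as follows. This is an illustration only, not the actual bandersnatch code; verify_md5 and its parameters are hypothetical names:

```python
import hashlib


def verify_md5(data, expected_md5, url="<unknown>"):
    """Raise ValueError if data does not match the announced MD5 digest."""
    actual = hashlib.md5(data).hexdigest()
    if actual != expected_md5:
        raise ValueError(
            "%s has hash %s instead of %s" % (url, actual, expected_md5))
    return actual


if __name__ == "__main__":
    payload = b"example release file"
    announced = hashlib.md5(payload).hexdigest()
    verify_md5(payload, announced)  # passes silently
    try:
        verify_md5(payload + b"!", announced, url="example-0.1.zip")
    except ValueError as exc:
        print("would retry on the next run:", exc)
```

Raising instead of keeping the file means a bad download is never stored, so the next run simply fetches it again.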

Good to hear that it feels fast. :)

Christian

-- 
Christian Theune · ct at gocept.com
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · Tel +49 345 1229889-7
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations


From ct at gocept.com  Thu Mar 21 23:15:34 2013
From: ct at gocept.com (Christian Theune)
Date: Thu, 21 Mar 2013 15:15:34 -0700
Subject: [Catalog-sig] Replacement client for pep381client
References: <kidigl$lvt$1@ger.gmane.org>
Message-ID: <kig0q0$e3k$1@ger.gmane.org>

Hi,

I'm slowly wrapping up my sprint. Here's what happened today:

- fixed some errors reported by users
- allow running a non-deleting mirror (with the hint that official ones 
must not do this)
- add config file handling to avoid complicated command lines including 
some documentation how to handle them
- add test coverage
- add jenkins integration

I got one error regarding filesystem encoding where I noticed that we 
expect that the mirror runs with UTF-8 as the filesystem encoding.
I'm not sure whether just simply encoding the filenames myself is the 
right thing or whether I need to ask operators to tune their 
environment accordingly.

I *guess* that just encoding manually to UTF-8 would be the right thing 
here. Can someone agree or disagree with this?
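
A minimal sketch of what "encoding manually" could look like, assuming
Python 3 (where the os functions accept bytes paths); utf8_path and
write_file are hypothetical helpers, not bandersnatch functions:

```python
import os
import tempfile


def utf8_path(directory, name):
    """Join directory and name as a bytes path, always using UTF-8 for name."""
    return os.path.join(os.fsencode(directory), name.encode("utf-8"))


def write_file(directory, name, data):
    """Write data under a UTF-8 encoded file name and return the bytes path."""
    path = utf8_path(directory, name)
    with open(path, "wb") as handle:
        handle.write(data)
    return path


if __name__ == "__main__":
    target = tempfile.mkdtemp()
    print(write_file(target, "Pr\u00f6jekt-1.0.tar.gz", b"payload"))
```

With bytes paths the file name on disk no longer depends on the locale
the mirror process happens to run under.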

If you already started using bandersnatch then you need to adapt your 
command line calls once again (the last time) and create a config file.

Christian



From lists at zopyx.com  Fri Mar 22 05:44:29 2013
From: lists at zopyx.com (Andreas Jung)
Date: Fri, 22 Mar 2013 05:44:29 +0100
Subject: [Catalog-sig] Replacement client for pep381client
In-Reply-To: <kig0q0$e3k$1@ger.gmane.org>
References: <kidigl$lvt$1@ger.gmane.org> <kig0q0$e3k$1@ger.gmane.org>
Message-ID: <514BE1AD.2040202@zopyx.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



Christian Theune wrote:
> Hi,
> 
> I'm slowly wrapping up my sprint. Here's what happened today:
> 
> - fixed some errors reported by users
> - allow running a non-deleting mirror (with the hint that official
>   ones must not do this)
> - add config file handling to avoid complicated command lines including
>   some documentation how to handle them
> - add test coverage
> - add jenkins integration
> 
> I got one error regarding filesystem encoding where I noticed that
> we expect that the mirror runs with UTF-8 as the filesystem
> encoding. I'm not sure whether just simply encoding the filenames
> myself is the right thing or whether I need to ask operators to tune
> their environment accordingly.
> 
> I *guess* that just encoding manually to UTF-8 would be the right
> thing here. Can someone agree or disagree with this?

I don't know much about filesystem encodings but if a FS encoding like

>>> sys.getfilesystemencoding()
'ANSI_X3.4-1968'

is a system-wide setting, then it is unlikely that you can make an encoding
change a mandatory requirement. 'ANSI_X3.4-1968' is at least what is
returned on my CentOS and Ubuntu boxes.
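
For what it is worth, 'ANSI_X3.4-1968' is just another name for plain
ASCII, which is why non-ASCII file names fail under it. A small sketch
(the file name and the can_encode helper are made up for illustration):

```python
import codecs


def can_encode(name, encoding):
    """Return True if name can be represented in the given encoding."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    name = "Pr\u00f6jekt-1.0.tar.gz"  # hypothetical non-ASCII file name
    print(codecs.lookup("ANSI_X3.4-1968").name)
    print(can_encode(name, "ANSI_X3.4-1968"))
    print(can_encode(name, "utf-8"))
```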

Andreas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQGUBAEBAgAGBQJRS+GtAAoJEADcfz7u4AZjs4QLwKo6fPhQQLEwy5LeMQ/BY8Ow
Efh8ERnHxX+PJs684ie4w1ZUwj0hDx/TlK6NHVNIZarNKYo88M3+YJKD2NgHl2O+
FmFo3Pii/Lc0Wj5cX3wdl06Xn/YmDGFmxoBNOd9e2xnBkBhk9r6KtlJMAW1gnfAv
qIsAN37uWsGnfDFyDvQTDbkjr7HxRoQ8PFNL66DzhDntgrBSHwX3U7dGraVFPSlD
mRvxt+r+IlJEeE5GrD75t1N0MlrNZmcvGHyag1PSnmm1AAAqpflJKxAPZ8sV1KG2
BlxLRB0i4WboNWs0/OoNIH7fNdY0nng1mOCwNA5v5DEaWx1Gy59bK4LkbpNyB4kQ
yRUnjf340b4qUNr/KGb2A4ePoV4TNzSB3eli1JMxGpEdJzdm2nVfICEjRIDc3m2K
cRYjVC5FgGENPeQZ4kDteHmgA/Iu4Pxw6nFrxArKBBz9F6C9OWrPf6jiqWsKxZ0v
fYLssMbGT8XkQb38TOn5yEharguEXBk=
=SPdt
-----END PGP SIGNATURE-----

From techtonik at gmail.com  Fri Mar 22 08:37:18 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 22 Mar 2013 10:37:18 +0300
Subject: [Catalog-sig] API for uploading packages to PyPI
Message-ID: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>

Hi,

I understand that this will make PyPI a potential target for automated spam
bots, but it would still be awesome to have an API for uploading packages to
PyPI.

For example, I have code that extracts all the necessary metadata for a
package from the source file itself. It can even generate setup.py
from this data: https://bitbucket.org/techtonik/astdump The next logical
step in this chain is to teach it to upload to PyPI.

Now I think that this setup.py is an unnecessary complication. What I
ideally need is to upload just a single .py file, or a JSON file and a
.tar.gz, FWIW. Is there a straightforward API for things like that?

Please, CC.
-- 
anatoly t.

From ronaldoussoren at mac.com  Fri Mar 22 09:16:34 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 22 Mar 2013 09:16:34 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
Message-ID: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>


On 22 Mar, 2013, at 8:37, anatoly techtonik <techtonik at gmail.com> wrote:

> Hi,
> 
> I understand that this will make PyPI a potential target for automated spam bots, but still it will be awesome to have an API to upload packages to PyPI.
> 
> For example, I have a code that extract all necessary meta data for the package from the source file itself. It is even able to generate setup.py from this data. https://bitbucket.org/techtonik/astdump The next logical step in this chain is to teach it to upload stuff to PyPI.
> 
> Now I thought that this setup.py is an unnecessary complication. What I need, ideally is just upload single .py file, or a JSON and a .tar.gz FWIW. Is there a straightforward API for things like that? 

Several APIs are documented on pages linked directly from the PyPI homepage (the Infrastructure box)

Ronald
> 
> Please, CC.
> -- 
> anatoly t.


From techtonik at gmail.com  Fri Mar 22 09:26:25 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 22 Mar 2013 11:26:25 +0300
Subject: [Catalog-sig] PyPI web interface UX (Was: API for uploading
 packages to PyPI)
Message-ID: <CAPkN8xL5YGb=9P-FzQsLFfeK49bpMSv8HFwjCdfs8krS49Os0Q@mail.gmail.com>

OMG. I didn't even look at the boxes. IMHO somebody should reduce the
amount of duplication and the number of choices between the menu and the
boxes. The page is really, really overburdened. For example, there is no
need to say "use search above" when it is evident that you need to find the
button, or to offer that "browse all packages" link when it is already the
first item on the menu.

RSS should be moved out of the main menu. It is just "yikes!".
The whole menu section with links to the main Python web site should be
moved elsewhere (is there a quick way to measure how many users follow
these links?).

Yes, I can send a patch if everyone agrees.
-- 
anatoly t.


On Fri, Mar 22, 2013 at 11:16 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:

>
> On 22 Mar, 2013, at 8:37, anatoly techtonik <techtonik at gmail.com> wrote:
>
> Hi,
>
> I understand that this will make PyPI a potential target for automated
> spam bots, but still it will be awesome to have an API to upload packages
> to PyPI.
>
> For example, I have a code that extract all necessary meta data for the
> package from the source file itself. It is even able to generate setup.py
> from this data. https://bitbucket.org/techtonik/astdump The next logical
> step in this chain is to teach it to upload stuff to PyPI.
>
> Now I thought that this setup.py is an unnecessary complication. What I
> need, ideally is just upload single .py file, or a JSON and a .tar.gz FWIW.
> Is there a straightforward API for things like that?
>
>
> Several APIs are documented on pages linked directly from the PyPI
> homepage (the Infrastructure box)
>
> Ronald
>
>
> Please, CC.
> --
> anatoly t.

From techtonik at gmail.com  Fri Mar 22 09:32:00 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 22 Mar 2013 11:32:00 +0300
Subject: [Catalog-sig] PyPI Crediting
Message-ID: <CAPkN8x+WSGP7wBXwjWR41TKx=uEiV7J9qyFGMdv7aB1RzDzUdg@mail.gmail.com>

Does anybody think that the PyPI source code base should include the names
of the people who contributed to its development?
-- 
anatoly t.

From techtonik at gmail.com  Fri Mar 22 09:58:35 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 22 Mar 2013 11:58:35 +0300
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
Message-ID: <CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>

On Fri, Mar 22, 2013 at 11:16 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:

>
> On 22 Mar, 2013, at 8:37, anatoly techtonik <techtonik at gmail.com> wrote:
>
> Hi,
>
> I understand that this will make PyPI a potential target for automated
> spam bots, but still it will be awesome to have an API to upload packages
> to PyPI.
>
> For example, I have a code that extract all necessary meta data for the
> package from the source file itself. It is even able to generate setup.py
> from this data. https://bitbucket.org/techtonik/astdump The next logical
> step in this chain is to teach it to upload stuff to PyPI.
>
> Now I thought that this setup.py is an unnecessary complication. What I
> need, ideally is just upload single .py file, or a JSON and a .tar.gz FWIW.
> Is there a straightforward API for things like that?
>
>
> Several APIs are documented on pages linked directly from the PyPI
> homepage (the Infrastructure box)
>

Thanks for the pointer.

Some links are broken. I added redirects for the wiki pages, but it would be
better to fix the links too:
https://bitbucket.org/loewis/pypi/pull-request/4

Among those, it seems that only the OAuth API can be used to upload stuff.
-- 
anatoly t.

From ronaldoussoren at mac.com  Fri Mar 22 10:04:24 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 22 Mar 2013 10:04:24 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
Message-ID: <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>


On 22 Mar, 2013, at 9:58, anatoly techtonik <techtonik at gmail.com> wrote:

> On Fri, Mar 22, 2013 at 11:16 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
> 
> On 22 Mar, 2013, at 8:37, anatoly techtonik <techtonik at gmail.com> wrote:
> 
>> Hi,
>> 
>> I understand that this will make PyPI a potential target for automated spam bots, but still it will be awesome to have an API to upload packages to PyPI.
>> 
>> For example, I have a code that extract all necessary meta data for the package from the source file itself. It is even able to generate setup.py from this data. https://bitbucket.org/techtonik/astdump The next logical step in this chain is to teach it to upload stuff to PyPI.
>> 
>> Now I thought that this setup.py is an unnecessary complication. What I need, ideally is just upload single .py file, or a JSON and a .tar.gz FWIW. Is there a straightforward API for things like that? 
> 
> Several APIs are documented on pages linked directly from the PyPI homepage (the Infrastructure box)
> 
> Thanks for the pointer.
> 
> Some links are broken. I added redirects for wiki pages, but it will be better to fix links too.
The OAuth link appears to be broken, and that's likely part of the fallout of the wiki.python.org break-in.

> https://bitbucket.org/loewis/pypi/pull-request/4
> 
> Among those it seems that only OAuth API can be used to upload stuff.

I haven't looked at the code yet, but that's unlikely as distutils uses the HTTP API to upload files and AFAIK distutils doesn't implement OAuth.   IIRC OAuth was added fairly recently to make it possible for users to delegate some permissions to external web applications (such as pythonpackages.com) without storing their password in those applications.

Ronald


From mal at egenix.com  Fri Mar 22 10:14:15 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 22 Mar 2013 10:14:15 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
Message-ID: <514C20E7.5040101@egenix.com>

On 22.03.2013 10:04, Ronald Oussoren wrote:
> 
> On 22 Mar, 2013, at 9:58, anatoly techtonik <techtonik at gmail.com> wrote:
>> Some links are broken. I added redirects for wiki pages, but it will be better to fix links too.
> The OAuth link appears to be broken, and that's likely part of the fallout of the wiki.python.org breakin.

It is broken because of Anatoly's renaming.

The new name is http://wiki.python.org/moin/PyPiOauth

Anatoly: I don't consider such renaming for some perceived level of
consistency important enough to warrant the breakage you are introducing
to external links. Please don't !

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 22 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-03-13: Released eGenix pyOpenSSL 0.13 ...    http://egenix.com/go39

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From mal at egenix.com  Fri Mar 22 10:16:10 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 22 Mar 2013 10:16:10 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <514C20E7.5040101@egenix.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com>
Message-ID: <514C215A.9010107@egenix.com>



On 22.03.2013 10:14, M.-A. Lemburg wrote:
> On 22.03.2013 10:04, Ronald Oussoren wrote:
>>
>> On 22 Mar, 2013, at 9:58, anatoly techtonik <techtonik at gmail.com> wrote:
>>> Some links are broken. I added redirects for wiki pages, but it will be better to fix links too.
>> The OAuth link appears to be broken, and that's likely part of the fallout of the wiki.python.org breakin.
> 
> It is broken because of Anatoly's renaming.
> 
> The new name is http://wiki.python.org/moin/PyPiOauth

Sorry, that was the old name, which is now gone. The new name is
http://wiki.python.org/moin/PyPIOAuth

> Anatoly: I don't consider such renaming for some perceived level of
> consistency important enough to warrant the breakage you are introducing
> to external links. Please don't !
> 

-- 
Marc-Andre Lemburg
eGenix.com


From mal at egenix.com  Fri Mar 22 11:01:43 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 22 Mar 2013 11:01:43 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
Message-ID: <514C2C07.5090209@egenix.com>

On 22.03.2013 09:58, anatoly techtonik wrote:
> On Fri, Mar 22, 2013 at 11:16 AM, Ronald Oussoren <ronaldoussoren at mac.com>wrote:
> 
>>
>> On 22 Mar, 2013, at 8:37, anatoly techtonik <techtonik at gmail.com> wrote:
>>
>> Hi,
>>
>> I understand that this will make PyPI a potential target for automated
>> spam bots, but still it will be awesome to have an API to upload packages
>> to PyPI.
>>
>> For example, I have a code that extract all necessary meta data for the
>> package from the source file itself. It is even able to generate setup.py
>> from this data. https://bitbucket.org/techtonik/astdump The next logical
>> step in this chain is to teach it to upload stuff to PyPI.
>>
>> Now I thought that this setup.py is an unnecessary complication. What I
>> need, ideally is just upload single .py file, or a JSON and a .tar.gz FWIW.
>> Is there a straightforward API for things like that?

Yes: The distutils upload command implements the API. It essentially
uses the same HTML form interface as the PyPI UI.
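
To illustrate what "the same HTML form interface" means in practice, here
is a rough sketch of a multipart/form-data body such as an upload client
might POST. The form field names (:action, name, version, filetype,
md5_digest, content) are written from memory of what the distutils upload
command sends and should be checked against the distutils source;
encode_upload and the example values are hypothetical. Nothing is sent
over the network:

```python
import hashlib
import uuid


def encode_upload(fields, filename, payload):
    """Return (content_type, body) for a multipart/form-data POST."""
    boundary = uuid.uuid4().hex
    marker = b"--" + boundary.encode("ascii")
    lines = []
    # One part per plain form field.
    for name, value in fields.items():
        lines.append(marker)
        lines.append(
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"))
        lines.append(b"")
        lines.append(str(value).encode("utf-8"))
    # The file itself goes into a part named "content".
    lines.append(marker)
    lines.append(
        ('Content-Disposition: form-data; name="content"; filename="%s"'
         % filename).encode("utf-8"))
    lines.append(b"Content-Type: application/octet-stream")
    lines.append(b"")
    lines.append(payload)
    lines.append(marker + b"--")
    lines.append(b"")
    body = b"\r\n".join(lines)
    return "multipart/form-data; boundary=%s" % boundary, body


if __name__ == "__main__":
    payload = b"example archive bytes"
    fields = {
        ":action": "file_upload",  # assumed field names, see note above
        "name": "example",
        "version": "0.1",
        "filetype": "sdist",
        "md5_digest": hashlib.md5(payload).hexdigest(),
    }
    content_type, body = encode_upload(fields, "example-0.1.tar.gz", payload)
    print(content_type)
    print(len(body), "bytes")
```

A real client would send this body with HTTP basic authentication to the
upload URL that the index documents.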

-- 
Marc-Andre Lemburg
eGenix.com


From mal at egenix.com  Fri Mar 22 11:10:10 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 22 Mar 2013 11:10:10 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <514C215A.9010107@egenix.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com> <514C215A.9010107@egenix.com>
Message-ID: <514C2E02.5060707@egenix.com>

On 22.03.2013 10:16, M.-A. Lemburg wrote:
> 
> 
> On 22.03.2013 10:14, M.-A. Lemburg wrote:
>> On 22.03.2013 10:04, Ronald Oussoren wrote:
>>>
>>> On 22 Mar, 2013, at 9:58, anatoly techtonik <techtonik at gmail.com> wrote:
>>>> Some links are broken. I added redirects for wiki pages, but it will be better to fix links too.
>>> The OAuth link appears to be broken, and that's likely part of the fallout of the wiki.python.org breakin.
>>
>> It is broken because of Anatoly's renaming.
>>
>> The new name is http://wiki.python.org/moin/PyPiOauth
> 
> Sorry, that was the old name, which is now gone. The new name is
> http://wiki.python.org/moin/PyPIOAuth

I added a redirect now to keep the old URL working.

>> Anatoly: I don't consider such renaming for some perceived level of
>> consistency important enough to warrant the breakage you are introducing
>> to external links. Please don't !

-- 
Marc-Andre Lemburg
eGenix.com


From techtonik at gmail.com  Fri Mar 22 11:25:35 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 22 Mar 2013 13:25:35 +0300
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <514C20E7.5040101@egenix.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com>
Message-ID: <CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>

On Fri, Mar 22, 2013 at 12:14 PM, M.-A. Lemburg <mal at egenix.com> wrote:

> On 22.03.2013 10:04, Ronald Oussoren wrote:
> >
> > On 22 Mar, 2013, at 9:58, anatoly techtonik <techtonik at gmail.com> wrote:
> >> Some links are broken. I added redirects for wiki pages, but it will be
> >> better to fix links too.
> > The OAuth link appears to be broken, and that's likely part of the
> > fallout of the wiki.python.org breakin.
>
> It is broken because of Anatoly's renaming.
>
> The new name is http://wiki.python.org/moin/PyPiOauth
>
> Anatoly: I don't consider such renaming for some perceived level of
> consistency important enough to warrant the breakage you are introducing
> to external links. Please don't !
>

I renamed PyPIOAuth long before today and fixed all the links on the
wiki. I don't have any tools to monitor external links in MoinMoin. It
would be nice if you added this request to the internal backlog of tasks
for the next pydotorg redesign order from the PSF.

From ronaldoussoren at mac.com  Fri Mar 22 11:31:12 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 22 Mar 2013 11:31:12 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com>
	<CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>
Message-ID: <07251834-8243-418F-BFCC-2D01565CAF0F@mac.com>


On 22 Mar, 2013, at 11:25, anatoly techtonik <techtonik at gmail.com> wrote:

> On Fri, Mar 22, 2013 at 12:14 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 22.03.2013 10:04, Ronald Oussoren wrote:
> >
> > On 22 Mar, 2013, at 9:58, anatoly techtonik <techtonik at gmail.com> wrote:
> >> Some links are broken. I added redirects for wiki pages, but it will be better to fix links too.
> > The OAuth link appears to be broken, and that's likely part of the fallout of the wiki.python.org breakin.
> 
> It is broken because of Anatoly's renaming.
> 
> The new name is http://wiki.python.org/moin/PyPiOauth
> 
> Anatoly: I don't consider such renaming for some perceived level of
> consistency important enough to warrant the breakage you are introducing
> to external links. Please don't !
> 
> I've renamed PyPIOAuth this long before today and fixed all link on the wiki. I don't have any tools to monitor any external links in MoinMoin. It will be nice if you add this request to the internal backlog of tasks for the next order to pydotorg redesign from PSF.


How would the PSF change links on other websites? Changing page names shouldn't be done lightly because this can, and for projects as popular as Python almost certainly will, break links on other websites.

Ronald


From techtonik at gmail.com  Fri Mar 22 11:31:15 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 22 Mar 2013 13:31:15 +0300
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <514C2C07.5090209@egenix.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<514C2C07.5090209@egenix.com>
Message-ID: <CAPkN8xKpwqmAcY+2fw2sgoCmaLqG9kGNQ=gKziKC9dzQYGXkrA@mail.gmail.com>

On Fri, Mar 22, 2013 at 1:01 PM, M.-A. Lemburg <mal at egenix.com> wrote:

> On 22.03.2013 09:58, anatoly techtonik wrote:
> > On Fri, Mar 22, 2013 at 11:16 AM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
> >
> >>
> >> On 22 Mar, 2013, at 8:37, anatoly techtonik <techtonik at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I understand that this will make PyPI a potential target for automated
> >> spam bots, but still it will be awesome to have an API to upload packages
> >> to PyPI.
> >>
> >> For example, I have a code that extract all necessary meta data for the
> >> package from the source file itself. It is even able to generate setup.py
> >> from this data. https://bitbucket.org/techtonik/astdump The next logical
> >> step in this chain is to teach it to upload stuff to PyPI.
> >>
> >> Now I thought that this setup.py is an unnecessary complication. What I
> >> need, ideally is just upload single .py file, or a JSON and a .tar.gz FWIW.
> >> Is there a straightforward API for things like that?
>
> Yes: The distutils upload command implements the API. It essentially
> uses the same HTML form interface as the PyPI UI.


And where is this API defined?

From techtonik at gmail.com  Fri Mar 22 11:42:45 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 22 Mar 2013 13:42:45 +0300
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <07251834-8243-418F-BFCC-2D01565CAF0F@mac.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com>
	<CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>
	<07251834-8243-418F-BFCC-2D01565CAF0F@mac.com>
Message-ID: <CAPkN8x+acP6+egu1iTQK2bgL5FLsm5av0Yv6T=9oh9ufpG=nTw@mail.gmail.com>

On Fri, Mar 22, 2013 at 1:31 PM, Ronald Oussoren <ronaldoussoren at mac.com>wrote:

>
> On 22 Mar, 2013, at 11:25, anatoly techtonik <techtonik at gmail.com> wrote:
>
> On Fri, Mar 22, 2013 at 12:14 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>
>> On 22.03.2013 10:04, Ronald Oussoren wrote:
>> >
>> > On 22 Mar, 2013, at 9:58, anatoly techtonik <techtonik at gmail.com>
>> wrote:
>> >> Some links are broken. I added redirects for wiki pages, but it will
>> be better to fix links too.
>> > The OAuth link appears to be broken, and that's likely part of the
>> fallout of the wiki.python.org breakin.
>>
>> It is broken because of Anatoly's renaming.
>>
>> The new name is http://wiki.python.org/moin/PyPiOauth
>>
>> Anatoly: I don't consider such renaming for some perceived level of
>> consistency important enough to warrant the breakage you are introducing
>> to external links. Please don't !
>>
>
> I've renamed PyPIOAuth this long before today and fixed all link on the
> wiki. I don't have any tools to monitor any external links in MoinMoin. It
> will be nice if you add this request to the internal backlog of tasks for
> the next order to pydotorg redesign from PSF.
>
>
> How would the PSF change links on other websites? Changing page names
> shouldn't be done lightly because this can, and for projects as popular as
> python almost certainly will, break links on other websites.
>

1. I am the editor of my own changes, not some PSF person who owns all the
stuff out there and makes himself important by "taking responsibility" for
what I do. If I had had information about external sources linking to this
page, I would have considered contacting them to ask for an update; that
is, I would have written to this list earlier. =)
2. The change I am requesting is to enable tracking of incoming links for
MoinMoin pages. It serves exactly the purpose you mentioned: to remove any
fear, uncertainty and despair from people editing the wiki about whether
their change will break anything. Many wiki pages don't have any external
references at all and should be reorganized to give that pile of data a
somewhat logical structure.

From mal at egenix.com  Fri Mar 22 11:49:12 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 22 Mar 2013 11:49:12 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com>
	<CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>
Message-ID: <514C3728.5090106@egenix.com>

On 22.03.2013 11:25, anatoly techtonik wrote:
> On Fri, Mar 22, 2013 at 12:14 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> 
>> On 22.03.2013 10:04, Ronald Oussoren wrote:
>>>
>>> On 22 Mar, 2013, at 9:58, anatoly techtonik <techtonik at gmail.com> wrote:
>>>> Some links are broken. I added redirects for wiki pages, but it will be
>> better to fix links too.
>>> The OAuth link appears to be broken, and that's likely part of the
>> fallout of the wiki.python.org breakin.
>>
>> It is broken because of Anatoly's renaming.
>>
>> The new name is http://wiki.python.org/moin/PyPiOauth
>>
>> Anatoly: I don't consider such renaming for some perceived level of
>> consistency important enough to warrant the breakage you are introducing
>> to external links. Please don't !
>>
> 
> I've renamed PyPIOAuth this long before today and fixed all link on the
> wiki. I don't have any tools to monitor any external links in MoinMoin. It
> will be nice if you add this request to the internal backlog of tasks for
> the next order to pydotorg redesign from PSF.

There's no point in adding more work for everyone just because
you feel there's an inconsistency in naming. It's also quite impossible
to change all the links on the Internet pointing to our wiki
pages, even if you knew who to contact.

Again: Please don't do this.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 22 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-03-13: Released eGenix pyOpenSSL 0.13 ...    http://egenix.com/go39

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From ronaldoussoren at mac.com  Fri Mar 22 11:49:20 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 22 Mar 2013 11:49:20 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <CAPkN8x+acP6+egu1iTQK2bgL5FLsm5av0Yv6T=9oh9ufpG=nTw@mail.gmail.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com>
	<CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>
	<07251834-8243-418F-BFCC-2D01565CAF0F@mac.com>
	<CAPkN8x+acP6+egu1iTQK2bgL5FLsm5av0Yv6T=9oh9ufpG=nTw@mail.gmail.com>
Message-ID: <8E7AAF1A-E3E0-4A32-B45F-8EEBF8413633@mac.com>


On 22 Mar, 2013, at 11:42, anatoly techtonik <techtonik at gmail.com> wrote:

>> 
>> 
>> I've renamed PyPIOAuth this long before today and fixed all link on the wiki. I don't have any tools to monitor any external links in MoinMoin. It will be nice if you add this request to the internal backlog of tasks for the next order to pydotorg redesign from PSF.
> 
> 
> How would the PSF change links on other websites? Changing page names shouldn't be done lightly because this can, and for projects as popular as python almost certainly will, break links on other websites.
> 
> 1. I am the editor of my changes. Not that PSF guy who owns all the stuff out there and makes himself important by "taking responsibility" over what I do. If I had the information about external sources linking to this page, I'd considered contacting these source for update. I mean written the letter here earlier. =)

You do know how the internet works, don't you? It is possible to scrape logs for referer URLs, but that's a guesstimate at best and won't find referrals from locations that aren't websites (such as books pointing to a URL for more information, or links from desktop applications). Finding contact information for websites is non-trivial as well.
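
To make the log-scraping idea concrete, here is a minimal sketch that counts external referers for one page. It assumes the web server writes Apache-style "combined" log lines; external_referers and its parameters are hypothetical names, and the default host is only a guess at the wiki's host name. As noted above, it cannot see links from books or desktop applications.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Combined log format (assumed):
#   ... "GET /path HTTP/1.1" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def external_referers(lines, page_path, own_host="wiki.python.org"):
    """Count referer URLs on other hosts for requests to page_path."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match.group("path") != page_path:
            continue
        referer = match.group("referer")
        host = urlparse(referer).netloc
        # Skip empty referers ("-") and links from the wiki itself.
        if host and host != own_host:
            counts[referer] += 1
    return counts
```

Calling external_referers(open("access.log"), "/moin/SomePage") would give a rough list of sites to contact; the log file name and page path are placeholders.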

Ronald

From techtonik at gmail.com  Fri Mar 22 13:20:15 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 22 Mar 2013 15:20:15 +0300
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <514C3728.5090106@egenix.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com>
	<CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>
	<514C3728.5090106@egenix.com>
Message-ID: <CAPkN8x+JCq5psoERywjKAXUwc6R8L+5aKATXfRrSD0gBYj8=EQ@mail.gmail.com>

On Fri, Mar 22, 2013 at 1:49 PM, M.-A. Lemburg <mal at egenix.com> wrote:

> On 22.03.2013 11:25, anatoly techtonik wrote:
> > On Fri, Mar 22, 2013 at 12:14 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> >
> >> On 22.03.2013 10:04, Ronald Oussoren wrote:
> >>>
> >>> On 22 Mar, 2013, at 9:58, anatoly techtonik <techtonik at gmail.com>
> wrote:
> >>>> Some links are broken. I added redirects for wiki pages, but it will
> be
> >> better to fix links too.
> >>> The OAuth link appears to be broken, and that's likely part of the
> >> fallout of the wiki.python.org breakin.
> >>
> >> It is broken because of Anatoly's renaming.
> >>
> >> The new name is http://wiki.python.org/moin/PyPiOauth
> >>
> >> Anatoly: I don't consider such renaming for some perceived level of
> >> consistency important enough to warrant the breakage you are introducing
> >> to external links. Please don't !
> >>
> >
> > I've renamed PyPIOAuth this long before today and fixed all link on the
> > wiki. I don't have any tools to monitor any external links in MoinMoin.
> It
> > will be nice if you add this request to the internal backlog of tasks for
> > the next order to pydotorg redesign from PSF.
>
> There's no point in adding more work for everyone just because
> you feel there's an inconsistency in naming. It's also quite impossible
> to change all the links on the Internet pointing to our wiki
> pages, even if you knew who to contact.
>

hg clone https://bitbucket.org/loewis/pypi
cd pypi
hg pull https://bitbucket.org/techtonik/pypi-contents
hg push

Before changing these links, it should first be shown that they exist.
Anyway, I don't want to fix all the links on the internet, but since I've
already fixed those on PyPI, all it takes to apply the fix is to copy and
paste these 4 commands into the console. Not much work, really. ;)


> Again: Please don't do this.
>

I think you're not against renaming pages, but against renaming without
redirects. In fact, if MoinMoin automatically inserted #REDIRECT
directives when a page is renamed, there wouldn't be any problem like
this at all. I hope that pydotorg@ or infrastructure@ have this item on
their feature lists.


OT: On the subject of links and leaving them as they are: IMHO a clean
appearance has a direct influence on the attractiveness of a project.
Having some obvious stuff to fix on the main page motivates me (as a bad
coder who likes to hack) to go download the code and fix it. But
inconsistency in URL design and nits in the overall site image (no
credits, no license, no "fork me on github", strange layout and no solid
design, no reference to the framework used in the footer, and many other
subjective factors) directly affect the desire of more serious people to
contribute.

From mal at egenix.com  Fri Mar 22 13:26:30 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 22 Mar 2013 13:26:30 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <CAPkN8x+JCq5psoERywjKAXUwc6R8L+5aKATXfRrSD0gBYj8=EQ@mail.gmail.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com>
	<CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>
	<514C3728.5090106@egenix.com>
	<CAPkN8x+JCq5psoERywjKAXUwc6R8L+5aKATXfRrSD0gBYj8=EQ@mail.gmail.com>
Message-ID: <514C4DF6.7050803@egenix.com>

On 22.03.2013 13:20, anatoly techtonik wrote:
> On Fri, Mar 22, 2013 at 1:49 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> Again: Please don't do this.
>>
> 
> I think you're not against renaming pages, but against renaming without
> redirects. In fact, if MoinMoin could automatically insert #REDIRECT
> directives when a page is renamed, then there won't be any problem like
> this at all. I hope that pydotorg@ or infrastructure@ have this item on
> their feature lists.

You can add redirects from the page names you think are more
correct to the existing ones, but please don't rename the pages
themselves.

-- 
Marc-Andre Lemburg
eGenix.com


From techtonik at gmail.com  Fri Mar 22 13:38:51 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 22 Mar 2013 15:38:51 +0300
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <514C4DF6.7050803@egenix.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com>
	<CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>
	<514C3728.5090106@egenix.com>
	<CAPkN8x+JCq5psoERywjKAXUwc6R8L+5aKATXfRrSD0gBYj8=EQ@mail.gmail.com>
	<514C4DF6.7050803@egenix.com>
Message-ID: <CAPkN8xLue-2CXho=ikWO254a9Gp6FWg-SFyErS6OWROjFH7vkg@mail.gmail.com>

On Fri, Mar 22, 2013 at 3:26 PM, M.-A. Lemburg <mal at egenix.com> wrote:

> On 22.03.2013 13:20, anatoly techtonik wrote:
> > On Fri, Mar 22, 2013 at 1:49 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> >> Again: Please don't do this.
> >>
> >
> > I think you're not against renaming pages, but against renaming without
> > redirects. In fact, if MoinMoin could automatically insert #REDIRECT
> > directives when a page is renamed, then there won't be any problem like
> > this at all. I hope that pydotorg@ or infrastructure@ have this item on
> > their feature lists.
>
> You can add redirects from the page names you think are more
> correct to the existing ones, but please don't rename the pages
> themselves.


You need to expand on that, because I don't get it. Why do you want the
canonical page about the PyPI JSON API to bear the name PyPiJson? This name
is hard to reconstruct if you want to type it directly into the URL without
waiting for a page to load so that you can click a link or use the search
field.

From mal at egenix.com  Fri Mar 22 14:17:04 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 22 Mar 2013 14:17:04 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <CAPkN8xLue-2CXho=ikWO254a9Gp6FWg-SFyErS6OWROjFH7vkg@mail.gmail.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com>
	<CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>
	<514C3728.5090106@egenix.com>
	<CAPkN8x+JCq5psoERywjKAXUwc6R8L+5aKATXfRrSD0gBYj8=EQ@mail.gmail.com>
	<514C4DF6.7050803@egenix.com>
	<CAPkN8xLue-2CXho=ikWO254a9Gp6FWg-SFyErS6OWROjFH7vkg@mail.gmail.com>
Message-ID: <514C59D0.4040607@egenix.com>

On 22.03.2013 13:38, anatoly techtonik wrote:
> On Fri, Mar 22, 2013 at 3:26 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> 
>> On 22.03.2013 13:20, anatoly techtonik wrote:
>>> On Fri, Mar 22, 2013 at 1:49 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> Again: Please don't do this.
>>>>
>>>
>>> I think you're not against renaming pages, but against renaming without
>>> redirects. In fact, if MoinMoin could automatically insert #REDIRECT
>>> directives when a page is renamed, then there won't be any problem like
>>> this at all. I hope that pydotorg@ or infrastructure@ have this item on
>>> their feature lists.
>>
>> You can add redirects from the page names you think are more
>> correct to the existing ones, but please don't rename the pages
>> themselves.
> 
> 
> You need to expand that, because I don't get it. Why do you want the
> canonical pages about PyPI JSON API to bear the name of PyPiJson? This name
> is hard to synthesize if you want to type in directly into the URL without
> waiting for the page to load to click a link or use search field.

It's not about which name I want. It's about the name of the page
that was used to add content and which has been around long enough
to assume that others have linked to it.

With the redirect from the new name to the existing one,
you get what you want and all others can continue to use
the existing name.
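
A minimal sketch of what such redirect pages could contain, assuming MoinMoin's #REDIRECT processing instruction on the first line of a page. The alias name PyPIJSON is hypothetical and only stands in for whatever spelling is preferred; redirect_stub and build_redirects are illustrative helpers, not part of MoinMoin.

```python
def redirect_stub(existing_name):
    """Return the body of a wiki page that redirects to existing_name."""
    return "#REDIRECT {}\n".format(existing_name)


def build_redirects(aliases):
    """Map each alias page name to the text of its redirect page.

    aliases maps the new (preferred) name to the existing page name,
    so the existing page and every external link to it stay untouched.
    """
    return {alias: redirect_stub(target) for alias, target in aliases.items()}


# Hypothetical alias on the left, existing page name on the right.
pages = build_redirects({"PyPIJSON": "PyPiJson"})
```

The resulting text would still have to be saved as the alias page by hand or through whatever editing interface the wiki offers.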

-- 
Marc-Andre Lemburg
eGenix.com


From techtonik at gmail.com  Fri Mar 22 23:33:13 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Sat, 23 Mar 2013 01:33:13 +0300
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <514C59D0.4040607@egenix.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com>
	<CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>
	<514C3728.5090106@egenix.com>
	<CAPkN8x+JCq5psoERywjKAXUwc6R8L+5aKATXfRrSD0gBYj8=EQ@mail.gmail.com>
	<514C4DF6.7050803@egenix.com>
	<CAPkN8xLue-2CXho=ikWO254a9Gp6FWg-SFyErS6OWROjFH7vkg@mail.gmail.com>
	<514C59D0.4040607@egenix.com>
Message-ID: <CAPkN8xLALTgZAFbWDyPKR9KqFaoTN5W9niR6hs-iztPy=Hm0LQ@mail.gmail.com>

On Fri, Mar 22, 2013 at 4:17 PM, M.-A. Lemburg <mal at egenix.com> wrote:

> On 22.03.2013 13:38, anatoly techtonik wrote:
> > On Fri, Mar 22, 2013 at 3:26 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> >
> >> On 22.03.2013 13:20, anatoly techtonik wrote:
> >>> On Fri, Mar 22, 2013 at 1:49 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> >>>> Again: Please don't do this.
> >>>>
> >>>
> >>> I think you're not against renaming pages, but against renaming without
> >>> redirects. In fact, if MoinMoin could automatically insert #REDIRECT
> >>> directives when a page is renamed, then there won't be any problem like
> >>> this at all. I hope that pydotorg@ or infrastructure@ have this item
> on
> >>> their feature lists.
> >>
> >> You can add redirects from the page names you think are more
> >> correct to the existing ones, but please don't rename the pages
> >> themselves.
> >
> >
> > You need to expand that, because I don't get it. Why do you want the
> > canonical pages about PyPI JSON API to bear the name of PyPiJson? This
> name
> > is hard to synthesize if you want to type in directly into the URL
> without
> > waiting for the page to load to click a link or use search field.
>
> It's not about which name I want. It's about the name of the page
> that was used to add content and which has been around long enough
> to assume that others have linked to it.
>
> With the redirect from the new name to the existing one,
> you get what you want and all others can continue to use
> the existing name.


All right. So it is a matter of using the old name or the new name. But
both names lead to the same page, so the point of conflict here is what the
final name of this page should be. If you say that it is not about which
name you want, then tell me why this name should not be the name I want.

I want canonical names for pages: names that are consistent, whose
capitalization is easy to remember and reproduce. And I want people to
link to these names directly to avoid double redirects.

From mal at egenix.com  Sat Mar 23 13:11:17 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Sat, 23 Mar 2013 13:11:17 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To: <CAPkN8xLALTgZAFbWDyPKR9KqFaoTN5W9niR6hs-iztPy=Hm0LQ@mail.gmail.com>
References: <CAPkN8xKt9-VdwLd-JMV6FaxMib2PGCYd9reEn2N=rU8k+ut_Sg@mail.gmail.com>
	<2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>
	<CAPkN8xKb9Ezf_Y6y9RNyJoNipBNU-2qWtHsFsFo1io6zR1js1A@mail.gmail.com>
	<1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com>
	<514C20E7.5040101@egenix.com>
	<CAPkN8xLkNcdPtAUK_RFJGy7+9WOsYVbhRc2KPE+5Q0J864FTpA@mail.gmail.com>
	<514C3728.5090106@egenix.com>
	<CAPkN8x+JCq5psoERywjKAXUwc6R8L+5aKATXfRrSD0gBYj8=EQ@mail.gmail.com>
	<514C4DF6.7050803@egenix.com>
	<CAPkN8xLue-2CXho=ikWO254a9Gp6FWg-SFyErS6OWROjFH7vkg@mail.gmail.com>
	<514C59D0.4040607@egenix.com>
	<CAPkN8xLALTgZAFbWDyPKR9KqFaoTN5W9niR6hs-iztPy=Hm0LQ@mail.gmail.com>
Message-ID: <514D9BE5.5090200@egenix.com>

On 22.03.2013 23:33, anatoly techtonik wrote:
> On Fri, Mar 22, 2013 at 4:17 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> 
>> On 22.03.2013 13:38, anatoly techtonik wrote:
>>> On Fri, Mar 22, 2013 at 3:26 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>
>>>> On 22.03.2013 13:20, anatoly techtonik wrote:
>>>>> On Fri, Mar 22, 2013 at 1:49 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>>> Again: Please don't do this.
>>>>>>
>>>>>
>>>>> I think you're not against renaming pages, but against renaming without
>>>>> redirects. In fact, if MoinMoin could automatically insert #REDIRECT
>>>>> directives when a page is renamed, then there won't be any problem like
>>>>> this at all. I hope that pydotorg@ or infrastructure@ have this item
>> on
>>>>> their feature lists.
>>>>
>>>> You can add redirects from the page names you think are more
>>>> correct to the existing ones, but please don't rename the pages
>>>> themselves.
>>>
>>>
>>> You need to expand that, because I don't get it. Why do you want the
>>> canonical pages about PyPI JSON API to bear the name of PyPiJson? This
>> name
>>> is hard to synthesize if you want to type in directly into the URL
>> without
>>> waiting for the page to load to click a link or use search field.
>>
>> It's not about which name I want. It's about the name of the page
>> that was used to add content and which has been around long enough
>> to assume that others have linked to it.
>>
>> With the redirect from the new name to the existing one,
>> you get what you want and all others can continue to use
>> the existing name.
> 
> 
> All right. So it is the matter of using old name or the new name. But both
> names lead to the same page. So the point of conflict here is what should
> be the end name of this page. If you say that it is not about which name
> do you want, then say why this name should not be the name I want?

The person who created the pages got to choose. There's nothing much
to argue here.

> I want canonical names for pages. Names that are consistent, which
> capitalization is easy to remember and reproduce, and I want that people
> linked to these names directly to avoid double redirects.

That's fine: for pages that you create, you get to choose.

-- 
Marc-Andre Lemburg
eGenix.com


From ct at gocept.com  Mon Mar 25 18:45:26 2013
From: ct at gocept.com (Christian Theune)
Date: Mon, 25 Mar 2013 18:45:26 +0100
Subject: [Catalog-sig] Replacement client for pep381client
In-Reply-To: <514BE1AD.2040202@zopyx.com>
References: <kidigl$lvt$1@ger.gmane.org> <kig0q0$e3k$1@ger.gmane.org>
	<514BE1AD.2040202@zopyx.com>
Message-ID: <8C6E3EE5-4B41-456F-BD1C-1FC9B191EA01@gocept.com>

Hi,

On Mar 22, 2013, at 5:44 AM, Andreas Jung <lists at zopyx.com> wrote:

> 
> I don't know much about filesystem encodings but if a FS encoding like
> 
>>>> sys.getfilesystemencoding()
> 'ANSI_X3.4-1968'
> 
> is a system-wide setting then it is unlikely that you make an encoding
> change a mandatory requirement. 'ANSI_X3.4-1968' is at least returned
> on my CentOS and Ubuntu box.

Reading up on the VFS unicode handling it appears that we just need to treat everything as bytestrings and encode it ourselves. The locale setting is really just an environment variable influencing library behaviour (like glib) - the kernel doesn't seem to care except for '/' and '\0'.

However, you may also need to make sure that your web server treats the unicode URLs correctly and uses UTF-8 as the encoding for looking up the filenames.

I have applied a fix forcing the filenames to always be encoded as UTF-8.
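
A minimal sketch of the idea, not the actual fix: encode the name explicitly and hand bytes to the OS, so the result does not depend on what sys.getfilesystemencoding() reports. The helper names are hypothetical, and the sketch assumes a POSIX-style filesystem that stores file names as plain bytes.

```python
import os
import tempfile


def to_utf8_bytes(name):
    """Return name as UTF-8 bytes, regardless of the process locale."""
    if isinstance(name, bytes):
        return name
    return name.encode("utf-8")


def write_file(directory, filename, data):
    """Write data under directory using a UTF-8 encoded byte path."""
    path = os.path.join(to_utf8_bytes(directory), to_utf8_bytes(filename))
    with open(path, "wb") as handle:
        handle.write(data)
    return path


# Usage: the non-ASCII name is stored as UTF-8 even under an ASCII locale.
target_dir = tempfile.mkdtemp()
stored = write_file(target_dir, "stra\u00dfe-1.0.tar.gz", b"payload")
```

The web server then has to map request URLs to the same UTF-8 bytes when it looks the files up.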

Christian

-- 
Christian Theune · ct at gocept.com
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · Tel +49 345 1229889-7
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations


From chris at simplistix.co.uk  Tue Mar 26 15:02:21 2013
From: chris at simplistix.co.uk (Chris Withers)
Date: Tue, 26 Mar 2013 14:02:21 +0000
Subject: [Catalog-sig] error trying to upload by package
Message-ID: <5151AA6D.5080704@simplistix.co.uk>

Hi All,

I have a package called files: https://github.com/Simplistix/files

...but I get a 403 when I try to register it on PyPI.

Why is that?

cheers,

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
             - http://www.simplistix.co.uk

From donald at stufft.io  Tue Mar 26 15:04:50 2013
From: donald at stufft.io (Donald Stufft)
Date: Tue, 26 Mar 2013 10:04:50 -0400
Subject: [Catalog-sig] error trying to upload by package
In-Reply-To: <5151AA6D.5080704@simplistix.co.uk>
References: <5151AA6D.5080704@simplistix.co.uk>
Message-ID: <C7726850-FBEC-4576-874C-364AB4C399FE@stufft.io>


On Mar 26, 2013, at 10:02 AM, Chris Withers <chris at simplistix.co.uk> wrote:

> Hi All,
> 
> I have a package called files: https://github.com/Simplistix/files
> 
> ...but I get a 403 when I try to register it on PyPI.
> 
> Why is that?
> 
> cheers,
> 
> Chris
> 
> -- 
> Simplistix - Content Management, Batch Processing & Python Consulting
>            - http://www.simplistix.co.uk
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

Someone already has a package by that name.

https://pypi.python.org/pypi/files

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From jcea at jcea.es  Tue Mar 26 21:07:43 2013
From: jcea at jcea.es (Jesus Cea)
Date: Tue, 26 Mar 2013 21:07:43 +0100
Subject: [Catalog-sig] Suscribing to PYPI projects
Message-ID: <5152000F.6050308@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I wonder if it would be too difficult to make it possible to subscribe to
projects on PyPI, to be notified when a new version is available.

An option for pip & family to check local versions against PyPI versions
and report outdated ones would be useful too.

- -- 
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea at jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQCVAwUBUVIAD5lgi5GaxT1NAQJtfwQAhTeby09fEx/0smKy+FKKP+YacAHyfvY1
HvuxsipLanFaiCcRaxWzyzN9+2hUqD88BtUGzgNdqGS52ePxDg5dTC8u4IC0grMU
vk96tl0zMg3R4GraCzsShKGJm8arpdUfWJZXGy+FxMh7XYnrHWZkItUAHTWLuf7A
beTiCnZhuGY=
=s0VI
-----END PGP SIGNATURE-----

From richard at python.org  Wed Mar 27 05:26:49 2013
From: richard at python.org (Richard Jones)
Date: Wed, 27 Mar 2013 15:26:49 +1100
Subject: [Catalog-sig] Suscribing to PYPI projects
In-Reply-To: <5152000F.6050308@jcea.es>
References: <5152000F.6050308@jcea.es>
Message-ID: <CAHrZfZCJgvFs6cmhH2wL4B=93uirMj5uCD5CxxDUQgnt81G2kA@mail.gmail.com>

This does come up a fair bit but is not something that's planned for
the current incarnation of PyPI.


     Richard

On 27 March 2013 07:07, Jesus Cea <jcea at jcea.es> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I wonder if would be too difficult to be able to subscribe to projects
> in PYPI, to be notified if a new version is available.
>
> An option to PIP & family to verify local versions with PYPI versions,
> and report old version would be useful too.
>
> - --
> Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
> jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
> Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
> jabber / xmpp:jcea at jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
> "Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
> "My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
> "El amor es poner tu felicidad en la felicidad de otro" - Leibniz
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQCVAwUBUVIAD5lgi5GaxT1NAQJtfwQAhTeby09fEx/0smKy+FKKP+YacAHyfvY1
> HvuxsipLanFaiCcRaxWzyzN9+2hUqD88BtUGzgNdqGS52ePxDg5dTC8u4IC0grMU
> vk96tl0zMg3R4GraCzsShKGJm8arpdUfWJZXGy+FxMh7XYnrHWZkItUAHTWLuf7A
> beTiCnZhuGY=
> =s0VI
> -----END PGP SIGNATURE-----
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
>

From aclark at aclark.net  Wed Mar 27 16:27:50 2013
From: aclark at aclark.net (Alex Clark)
Date: Wed, 27 Mar 2013 11:27:50 -0400
Subject: [Catalog-sig] Suscribing to PYPI projects
References: <5152000F.6050308@jcea.es>
Message-ID: <kiv35i$kf3$1@ger.gmane.org>

On 2013-03-26 20:07:43 +0000, Jesus Cea said:

> I wonder if would be too difficult to be able to subscribe to projects
> in PYPI, to be notified if a new version is available.


Have you seen: https://bundlescout.com/


> 
> An option to PIP & family to verify local versions with PYPI versions,
> and report old version would be useful too.


-- 
Alex Clark · http://about.me/alex.clark



From lists at zopyx.com  Wed Mar 27 16:54:19 2013
From: lists at zopyx.com (Andreas Jung)
Date: Wed, 27 Mar 2013 16:54:19 +0100
Subject: [Catalog-sig] c.pypi.python.org - IP address change
Message-ID: <5153162B.5030103@zopyx.com>

Hi there,

I moved my c.pypi.python.org mirror to a new faster machine.

Please update the DNS entry to 176.9.146.29.

This mirror is running on top of Christian Theune's bandersnatch
implementation.

Andreas

-- 
ZOPYX Limited         | Python | Zope | Plone | MongoDB
Hundskapfklinge 33    | Consulting & Development
D-72074 Tübingen      | Electronic Publishing Solutions
www.zopyx.com         | Scalable Web Solutions
--------------------------------------------------
Produce & Publish - www.produce-and-publish.com

From r1chardj0n3s at gmail.com  Thu Mar 28 01:29:54 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Thu, 28 Mar 2013 11:29:54 +1100
Subject: [Catalog-sig] PEP 438 progress update
Message-ID: <CAHrZfZCeVHt-067Yj8QSTtp-aLCMgnuMQakzjR746YLj3meA=A@mail.gmail.com>

Hi all,

It was my intention to formally accept the PEP and deploy the
implementation to the production PyPI when I got back home this week,
but things have been quite hectic and I've not found the time to
perform the pre-deployment tasks needed (specifically steps 5, 6 and 7
of the first transition phase; determining the various email lists I
need to inform people of the changes.) At the moment I'm not sure when
I'll have time to do that; hopefully next week some time but it could
be as long as four weeks before I get sufficient tuits(round).


    Richard

From lists at zopyx.com  Thu Mar 28 11:33:43 2013
From: lists at zopyx.com (Andreas Jung)
Date: Thu, 28 Mar 2013 11:33:43 +0100
Subject: [Catalog-sig] c.pypi.python.org - IP address change
In-Reply-To: <5153162B.5030103@zopyx.com>
References: <5153162B.5030103@zopyx.com>
Message-ID: <51541C87.6090508@zopyx.com>

Will someone change the DNS entry?

Andreas

Andreas Jung wrote:
> Hi there,
> 
> I moved my c.pypi.python.org mirror to a new faster machine.
> 
> Please update the DNS entry to 176.9.146.29.
> 
> This mirror is running on top of Christian Theune's bandersnatch
> implementation.
> 
> Andreas

-- 
ZOPYX Limited         | Python | Zope | Plone | MongoDB
Hundskapfklinge 33    | Consulting & Development
D-72074 Tübingen      | Electronic Publishing Solutions
www.zopyx.com         | Scalable Web Solutions
--------------------------------------------------
Produce & Publish - www.produce-and-publish.com

From donald at stufft.io  Thu Mar 28 19:22:59 2013
From: donald at stufft.io (Donald Stufft)
Date: Thu, 28 Mar 2013 14:22:59 -0400
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
Message-ID: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>

Is there much point in keeping catalog-sig and distutils-sig separate?

It seems to me that most of the same people are on both lists, and the topics almost always have consequences for both sides of the coin. So much so that it's often hard to pick *which* of the two lists (or both) to post to. This is further confused by the fact that distutils is hopefully someday going to go away :)

I'm not sure if there's some official process for requesting it or not, but I think we should merge the two lists and create a packaging-sig as an umbrella for all packaging topics.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From jacob at jacobian.org  Thu Mar 28 19:26:05 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Thu, 28 Mar 2013 13:26:05 -0500
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
Message-ID: <CAK8PqJE+-namBPAcMT+O=sFkfX0mmy7m3WS59akc5N5huqpeVQ@mail.gmail.com>

As a mostly-lurker on both who would love to cut down on the number of
lists I have to follow: a hearty +1!

Jacob

On Thu, Mar 28, 2013 at 1:22 PM, Donald Stufft <donald at stufft.io> wrote:
> Is there much point in keeping catalog-sig and distutils-sig separate?
>
> It seems to me that most of the same people are on both lists, and the topics almost always have consequences to both sides of the coin. So much so that it's often hard to pick *which* of the two (or both) lists you post too. Further confused by the fact that distutils is hopefully someday going to go away :)
>
> Not sure if there's some official process for requesting it or not, but I think we should merge the two lists and just make packaging-sig to umbrella the entire packaging topics.
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From jim at zope.com  Thu Mar 28 19:28:35 2013
From: jim at zope.com (Jim Fulton)
Date: Thu, 28 Mar 2013 14:28:35 -0400
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
Message-ID: <CAPDm-FiLiqqOD6WZSwpV-NFHAx325W9MLgWJKJJptcSQqTBKSg@mail.gmail.com>

On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft <donald at stufft.io> wrote:
> Is there much point in keeping catalog-sig and distutils-sig separate?

Not IMO.

> It seems to me that most of the same people are on both lists, and the topics almost always have consequences to both sides of the coin. So much so that it's often hard to pick *which* of the two (or both) lists you post too. Further confused by the fact that distutils is hopefully someday going to go away :)
>
> Not sure if there's some official process for requesting it or not, but I think we should merge the two lists and just make packaging-sig to umbrella the entire packaging topics.

+1

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton

From holger at merlinux.eu  Thu Mar 28 20:11:44 2013
From: holger at merlinux.eu (holger krekel)
Date: Thu, 28 Mar 2013 19:11:44 +0000
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
Message-ID: <20130328191144.GL9677@merlinux.eu>

On Thu, Mar 28, 2013 at 14:22 -0400, Donald Stufft wrote:
> Is there much point in keeping catalog-sig and distutils-sig separate?
> 
> It seems to me that most of the same people are on both lists, and the topics almost always have consequences to both sides of the coin. So much so that it's often hard to pick *which* of the two (or both) lists you post too. Further confused by the fact that distutils is hopefully someday going to go away :)

+1

> Not sure if there's some official process for requesting it or not, but I think we should merge the two lists and just make packaging-sig to umbrella the entire packaging topics.
> 
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 





From fred at fdrake.net  Thu Mar 28 20:14:24 2013
From: fred at fdrake.net (Fred Drake)
Date: Thu, 28 Mar 2013 15:14:24 -0400
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
Message-ID: <CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>

On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft <donald at stufft.io> wrote:
> Is there much point in keeping catalog-sig and distutils-sig separate?

No.

The last time this was brought up, there were objections, but I don't
remember what they were.  I'll let people who think there's a point
worry about that.

> Not sure if there's some official process for requesting it or not, but
> I think we should merge the two lists and just make packaging-sig to
> umbrella the entire packaging topics.

There is the meta-sig, but the description is out-dated:

    http://mail.python.org/mailman/listinfo/meta-sig

and the last message in the archives is dated 2011, and sparked no
discussion:

    http://mail.python.org/pipermail/meta-sig/2011-June.txt

+1 on merging the lists.


  -Fred

-- 
Fred L. Drake, Jr.    <fred at fdrake.net>
"A storm broke loose in my mind."  --Albert Einstein

From qwcode at gmail.com  Thu Mar 28 20:25:59 2013
From: qwcode at gmail.com (Marcus Smith)
Date: Thu, 28 Mar 2013 12:25:59 -0700
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
Message-ID: <CAPYWazqGfWmWnbZ_0AbCDQp0c+-ANm6Yx9ugQOjXRYASLPoQcQ@mail.gmail.com>

+1

From pje at telecommunity.com  Thu Mar 28 20:39:38 2013
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 28 Mar 2013 15:39:38 -0400
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
Message-ID: <CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>

On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake <fred at fdrake.net> wrote:
> On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft <donald at stufft.io> wrote:
>> Is there much point in keeping catalog-sig and distutils-sig separate?
>
> No.
>
> The last time this was brought up, there were objections, but I don't
> remember what they were.  I'll let people who think there's a point
> worry about that.
>
>> Not sure if there's some official process for requesting it or not, but
>> I think we should merge the two lists and just make packaging-sig to
>> umbrella the entire packaging topics.
>
> There is the meta-sig, but the description is out-dated:
>
>     http://mail.python.org/mailman/listinfo/meta-sig
>
> and the last message in the archives is dated 2011, and sparked no
> discussion:
>
>     http://mail.python.org/pipermail/meta-sig/2011-June.txt
>
> +1 on merging the lists.

Can we do it by just dropping catalog-sig and keeping distutils-sig?
I'm afraid we might lose some important distutils-sig population if
the process involves renaming the list, resubscribing, etc.  I also
*really* don't want to invalidate archive links to the distutils-sig
archive.

All in all, +1 on not having two lists, but I'm really worried about
"breaking" distutils-sig.  We're still going to be talking about
"distribution utilities", after all.

From donald at stufft.io  Thu Mar 28 20:42:07 2013
From: donald at stufft.io (Donald Stufft)
Date: Thu, 28 Mar 2013 15:42:07 -0400
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
Message-ID: <3BF298C9-293D-40FF-A86F-76206A88D162@stufft.io>


On Mar 28, 2013, at 3:39 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake <fred at fdrake.net> wrote:
>> On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft <donald at stufft.io> wrote:
>>> Is there much point in keeping catalog-sig and distutils-sig separate?
>> 
>> No.
>> 
>> The last time this was brought up, there were objections, but I don't
>> remember what they were.  I'll let people who think there's a point
>> worry about that.
>> 
>>> Not sure if there's some official process for requesting it or not, but
>>> I think we should merge the two lists and just make packaging-sig to
>>> umbrella the entire packaging topics.
>> 
>> There is the meta-sig, but the description is out-dated:
>> 
>>    http://mail.python.org/mailman/listinfo/meta-sig
>> 
>> and the last message in the archives is dated 2011, and sparked no
>> discussion:
>> 
>>    http://mail.python.org/pipermail/meta-sig/2011-June.txt
>> 
>> +1 on merging the lists.
> 
> Can we do it by just dropping catalog-sig and keeping distutils-sig?
> I'm afraid we might lose some important distutils-sig population if
> the process involves renaming the list, resubscribing, etc.  I also
> *really* don't want to invalidate archive links to the distutils-sig
> archive.
> 
> All in all, +1 on not having two lists, but I'm really worried about
> "breaking" distutils-sig.  We're still going to be talking about
> "distribution utilities", after all.

I don't care how it's done. I don't know Mailman well enough to know what is possible or how easy things are. I thought packaging-sig sounded nice, but if you can't rename + redirect or merge or something in Mailman, I'm down for whatever.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From donald at stufft.io  Thu Mar 28 20:43:07 2013
From: donald at stufft.io (Donald Stufft)
Date: Thu, 28 Mar 2013 15:43:07 -0400
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
Message-ID: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>


On Mar 28, 2013, at 3:39 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake <fred at fdrake.net> wrote:
>> On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft <donald at stufft.io> wrote:
>>> Is there much point in keeping catalog-sig and distutils-sig separate?
>> 
>> No.
>> 
>> The last time this was brought up, there were objections, but I don't
>> remember what they were.  I'll let people who think there's a point
>> worry about that.
>> 
>>> Not sure if there's some official process for requesting it or not, but
>>> I think we should merge the two lists and just make packaging-sig to
>>> umbrella the entire packaging topics.
>> 
>> There is the meta-sig, but the description is out-dated:
>> 
>>    http://mail.python.org/mailman/listinfo/meta-sig
>> 
>> and the last message in the archives is dated 2011, and sparked no
>> discussion:
>> 
>>    http://mail.python.org/pipermail/meta-sig/2011-June.txt
>> 
>> +1 on merging the lists.
> 
> Can we do it by just dropping catalog-sig and keeping distutils-sig?
> I'm afraid we might lose some important distutils-sig population if
> the process involves renaming the list, resubscribing, etc.  I also
> *really* don't want to invalidate archive links to the distutils-sig
> archive.
> 
> All in all, +1 on not having two lists, but I'm really worried about
> "breaking" distutils-sig.  We're still going to be talking about
> "distribution utilities", after all.

Worst case, I'm sure subscribers can be transferred and the existing archive kept intact.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From holger at merlinux.eu  Thu Mar 28 21:04:19 2013
From: holger at merlinux.eu (holger krekel)
Date: Thu, 28 Mar 2013 20:04:19 +0000
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <3BF298C9-293D-40FF-A86F-76206A88D162@stufft.io>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3BF298C9-293D-40FF-A86F-76206A88D162@stufft.io>
Message-ID: <20130328200419.GM9677@merlinux.eu>

On Thu, Mar 28, 2013 at 15:42 -0400, Donald Stufft wrote:
> On Mar 28, 2013, at 3:39 PM, PJ Eby <pje at telecommunity.com> wrote:
> 
> > On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake <fred at fdrake.net> wrote:
> >> On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft <donald at stufft.io> wrote:
> >>> Is there much point in keeping catalog-sig and distutils-sig separate?
> >> 
> >> No.
> >> 
> >> The last time this was brought up, there were objections, but I don't
> >> remember what they were.  I'll let people who think there's a point
> >> worry about that.
> >> 
> >>> Not sure if there's some official process for requesting it or not, but
> >>> I think we should merge the two lists and just make packaging-sig to
> >>> umbrella the entire packaging topics.
> >> 
> >> There is the meta-sig, but the description is out-dated:
> >> 
> >>    http://mail.python.org/mailman/listinfo/meta-sig
> >> 
> >> and the last message in the archives is dated 2011, and sparked no
> >> discussion:
> >> 
> >>    http://mail.python.org/pipermail/meta-sig/2011-June.txt
> >> 
> >> +1 on merging the lists.
> > 
> > Can we do it by just dropping catalog-sig and keeping distutils-sig?
> > I'm afraid we might lose some important distutils-sig population if
> > the process involves renaming the list, resubscribing, etc.  I also
> > *really* don't want to invalidate archive links to the distutils-sig
> > archive.
> > 
> > All in all, +1 on not having two lists, but I'm really worried about
> > "breaking" distutils-sig.  We're still going to be talking about
> > "distribution utilities", after all.
> 
> Don't care how it's done. I don't know Mailman enough to know what is possible or how easy things are. I thought packaging-sig sounded nice but if you can't rename + redirect or merge or something in mailman I'm down for whatever.

I've moved lists, even from external sites, to python.org and renamed them
(the latest was pytest-dev).  That part works nicely, and people can continue
to use the old ML address.  Merging two lists, however, makes it harder
to get redirects for the old archives.  But why not just keep the distutils-sig
and catalog-sig archives, have all their mail arrive at
a new packaging-sig, and begin a new archive for the latter?

holger


> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 





From dholth at gmail.com  Thu Mar 28 21:08:44 2013
From: dholth at gmail.com (Daniel Holth)
Date: Thu, 28 Mar 2013 16:08:44 -0400
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <20130328200419.GM9677@merlinux.eu>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3BF298C9-293D-40FF-A86F-76206A88D162@stufft.io>
	<20130328200419.GM9677@merlinux.eu>
Message-ID: <CAG8k2+6ktsN7qQYvAdPNhU+TX_-Q-mvqt=+ycScDrq=+++Rnvg@mail.gmail.com>

That should work. Sounds like a plan.

On Thu, Mar 28, 2013 at 4:04 PM, holger krekel <holger at merlinux.eu> wrote:
> On Thu, Mar 28, 2013 at 15:42 -0400, Donald Stufft wrote:
>> On Mar 28, 2013, at 3:39 PM, PJ Eby <pje at telecommunity.com> wrote:
>>
>> > On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake <fred at fdrake.net> wrote:
>> >> On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft <donald at stufft.io> wrote:
>> >>> Is there much point in keeping catalog-sig and distutils-sig separate?
>> >>
>> >> No.
>> >>
>> >> The last time this was brought up, there were objections, but I don't
>> >> remember what they were.  I'll let people who think there's a point
>> >> worry about that.
>> >>
>> >>> Not sure if there's some official process for requesting it or not, but
>> >>> I think we should merge the two lists and just make packaging-sig to
>> >>> umbrella the entire packaging topics.
>> >>
>> >> There is the meta-sig, but the description is out-dated:
>> >>
>> >>    http://mail.python.org/mailman/listinfo/meta-sig
>> >>
>> >> and the last message in the archives is dated 2011, and sparked no
>> >> discussion:
>> >>
>> >>    http://mail.python.org/pipermail/meta-sig/2011-June.txt
>> >>
>> >> +1 on merging the lists.
>> >
>> > Can we do it by just dropping catalog-sig and keeping distutils-sig?
>> > I'm afraid we might lose some important distutils-sig population if
>> > the process involves renaming the list, resubscribing, etc.  I also
>> > *really* don't want to invalidate archive links to the distutils-sig
>> > archive.
>> >
>> > All in all, +1 on not having two lists, but I'm really worried about
>> > "breaking" distutils-sig.  We're still going to be talking about
>> > "distribution utilities", after all.
>>
>> Don't care how it's done. I don't know Mailman enough to know what is possible or how easy things are. I thought packaging-sig sounded nice but if you can't rename + redirect or merge or something in mailman I'm down for whatever.
>
> I've moved lists even from external sites to python.org and renamed them
> (latest was pytest-dev).  That part works nicely and people can continue
> to use the old ML address.  Merging two lists however makes it harder
> to get redirects for the old archives.  But why not just keep distutils-sig
> and catalog-sig archives, but have all their mail arrive at
> a new packaging-sig and begin a new archive for the latter?
>
> holger
>
>
>> -----------------
>> Donald Stufft
>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>>
>
>
>

From nawkboy at gmail.com  Thu Mar 28 20:57:09 2013
From: nawkboy at gmail.com (James Carpenter)
Date: Thu, 28 Mar 2013 14:57:09 -0500
Subject: [Catalog-sig] How to determine if archive is an sdist or bdist
Message-ID: <CAAndj4sB=4KC1VC_n3upAEWkJ=GV+K=-ht539xJ7KBU5putXzQ@mail.gmail.com>

Is there an easy way to programmatically tell whether an archive (tar.gz, zip,
etc.) in the dist directory is a binary distribution or an sdist? I would like to
post-process the contents of a dist directory and classify each build
artifact there (egg, sdist, bdist, etc.).

Currently the only approach I know of is to have my own command that is run
along with the relevant build command.  For example:

python setup.py sdist be_funky

or:
python setup.py sdist bdist bdist_egg be_funky

Using this approach, the tuples in self.distribution.dist_files provide the
command, Python version and file created. Unfortunately, this solution is
slightly more complicated in my use case than simply having an easy way to
classify each build artifact and extract its PKG-INFO.
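
One possible post-processing approach, sketched under assumptions rather than
taken from distutils itself: guess the artifact type from the archive's member
names. The heuristics are mine, not an official rule: an egg is assumed to
carry EGG-INFO/PKG-INFO, an sdist is assumed to unpack into a single
"<name>-<version>/" directory holding PKG-INFO, and any other tar/zip archive
is assumed to be a "dumb" bdist. The function names list_members and classify
are hypothetical.

```python
import tarfile
import zipfile


def list_members(path):
    """Return the member names of a .zip/.egg or tar archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    with tarfile.open(path) as archive:
        return archive.getnames()


def classify(filename, members):
    """Guess the artifact type from its file name and member names."""
    # bdist tarballs usually store paths as "./usr/lib/..."; drop the "./".
    names = [m[2:] if m.startswith("./") else m for m in members]

    # Assumption: eggs keep their metadata in EGG-INFO/PKG-INFO.
    if "EGG-INFO/PKG-INFO" in names:
        return "bdist_egg"

    # Assumption: an sdist has exactly one top-level directory,
    # and PKG-INFO sits directly inside it.
    tops = {n.split("/", 1)[0] for n in names if n}
    if len(tops) == 1:
        top = tops.pop()
        if top + "/PKG-INFO" in names:
            return "sdist"

    # Anything else that looks like an archive is treated as a dumb bdist.
    if filename.endswith((".tar.gz", ".tar.bz2", ".zip")):
        return "bdist"
    return "unknown"
```

Usage would be along the lines of classify(name, list_members(path)) for each
file in the dist directory. Once the PKG-INFO member is located, its contents
are RFC 822-style headers, so email.message_from_string() from the standard
library should be enough to read fields such as Name and Version.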

From pje at telecommunity.com  Thu Mar 28 21:32:26 2013
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 28 Mar 2013 16:32:26 -0400
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
Message-ID: <CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>

On Thu, Mar 28, 2013 at 3:43 PM, Donald Stufft <donald at stufft.io> wrote:
> On Mar 28, 2013, at 3:39 PM, PJ Eby <pje at telecommunity.com> wrote:
>> Can we do it by just dropping catalog-sig and keeping distutils-sig?
>> I'm afraid we might lose some important distutils-sig population if
>> the process involves renaming the list, resubscribing, etc.  I also
>> *really* don't want to invalidate archive links to the distutils-sig
>> archive.
>>
>> All in all, +1 on not having two lists, but I'm really worried about
>> "breaking" distutils-sig.  We're still going to be talking about
>> "distribution utilities", after all.
>
> Worst case I'm sure subscribers can be transferred and the existing archive kept intact.

That's a great way to have a bunch of people complaining that they
never subscribed to packaging-sig, not to mention the part where it
breaks everyone's mail filters.

I really don't see any gains for renaming the list.  It's not like we
can go and scrub the entire internet of references to distutils-sig.

From donald at stufft.io  Thu Mar 28 21:32:16 2013
From: donald at stufft.io (Donald Stufft)
Date: Thu, 28 Mar 2013 16:32:16 -0400
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <20130328200419.GM9677@merlinux.eu>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3BF298C9-293D-40FF-A86F-76206A88D162@stufft.io>
	<20130328200419.GM9677@merlinux.eu>
Message-ID: <4BDF3B12-B394-4823-9186-73D1E742E78F@stufft.io>


On Mar 28, 2013, at 4:04 PM, holger krekel <holger at merlinux.eu> wrote:

> On Thu, Mar 28, 2013 at 15:42 -0400, Donald Stufft wrote:
>> On Mar 28, 2013, at 3:39 PM, PJ Eby <pje at telecommunity.com> wrote:
>> 
>>> On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake <fred at fdrake.net> wrote:
>>>> On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft <donald at stufft.io> wrote:
>>>>> Is there much point in keeping catalog-sig and distutils-sig separate?
>>>> 
>>>> No.
>>>> 
>>>> The last time this was brought up, there were objections, but I don't
>>>> remember what they were.  I'll let people who think there's a point
>>>> worry about that.
>>>> 
>>>>> Not sure if there's some official process for requesting it or not, but
>>>>> I think we should merge the two lists and just make packaging-sig to
>>>>> umbrella the entire packaging topics.
>>>> 
>>>> There is the meta-sig, but the description is out-dated:
>>>> 
>>>>   http://mail.python.org/mailman/listinfo/meta-sig
>>>> 
>>>> and the last message in the archives is dated 2011, and sparked no
>>>> discussion:
>>>> 
>>>>   http://mail.python.org/pipermail/meta-sig/2011-June.txt
>>>> 
>>>> +1 on merging the lists.
>>> 
>>> Can we do it by just dropping catalog-sig and keeping distutils-sig?
>>> I'm afraid we might lose some important distutils-sig population if
>>> the process involves renaming the list, resubscribing, etc.  I also
>>> *really* don't want to invalidate archive links to the distutils-sig
>>> archive.
>>> 
>>> All in all, +1 on not having two lists, but I'm really worried about
>>> "breaking" distutils-sig.  We're still going to be talking about
>>> "distribution utilities", after all.
>> 
>> Don't care how it's done. I don't know Mailman enough to know what is possible or how easy things are. I thought packaging-sig sounded nice but if you can't rename + redirect or merge or something in mailman I'm down for whatever.
> 
> I've moved lists even from external sites to python.org and renamed them
> (latest was pytest-dev).  That part works nicely and people can continue
> to use the old ML address.  Merging two lists however makes it harder
> to get redirects for the old archives.  But why not just keep distutils-sig
> and catalog-sig archives, but have all their mail arrive at
> a new packaging-sig and begin a new archive for the latter?
> 
> holger
> 
> 
>> -----------------
>> Donald Stufft
>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>> 
> 
> 
> 
>> _______________________________________________
>> Catalog-SIG mailing list
>> Catalog-SIG at python.org
>> http://mail.python.org/mailman/listinfo/catalog-sig
> 


sounds good to me.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From pje at telecommunity.com  Thu Mar 28 21:36:22 2013
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 28 Mar 2013 16:36:22 -0400
Subject: [Catalog-sig] How to determine if archive is an sdist or bdist
In-Reply-To: <CAAndj4sB=4KC1VC_n3upAEWkJ=GV+K=-ht539xJ7KBU5putXzQ@mail.gmail.com>
References: <CAAndj4sB=4KC1VC_n3upAEWkJ=GV+K=-ht539xJ7KBU5putXzQ@mail.gmail.com>
Message-ID: <CALeMXf5kx++DG+LEzm3+syM_QoXS8aAiByJKycRyUbPoL8c6ag@mail.gmail.com>

On Thu, Mar 28, 2013 at 3:57 PM, James Carpenter <nawkboy at gmail.com> wrote:
> Is there an easy way to programmatically tell if an archive (tar.gz, zip,
> etc.) in the dist directory is a binary or sdist? I would like to
> post-process the contents of a dist directory and classify each build
> artifact there (egg, sdist, bdist, etc.).

An sdist always has a single subdirectory in the archive's root
directory, named for the package+version, and containing a PKG-INFO
and setup.py (plus a bunch of other stuff).

A bdist_dumb will not have such a subdirectory in the archive root;
instead it will have one or more directories like /usr, /opt, /Program
Files.

Other bdist formats?  Hard to say.
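
A minimal sketch of the check described above, assuming the archive is
a .zip or a tar-based file that the standard library can open. The
function names list_members and looks_like_sdist are hypothetical and
are not part of distutils or setuptools; the test is a heuristic only.

```python
import tarfile
import zipfile


def list_members(path):
    """Return the member names of a .zip or tar-based archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    # tarfile.open() in its default mode detects gzip/bz2 compression itself.
    with tarfile.open(path) as tf:
        return tf.getnames()


def _segments(name):
    """Split a member name into path segments, ignoring '', '.' and backslashes."""
    return [s for s in name.replace("\\", "/").split("/") if s not in ("", ".")]


def looks_like_sdist(names):
    """Heuristic: exactly one root directory, holding PKG-INFO and setup.py."""
    parts = [p for p in (_segments(n) for n in names) if p]
    roots = {p[0] for p in parts}
    if len(roots) != 1:
        return False
    # Files that sit directly inside the single root directory.
    depth_one = {p[1] for p in parts if len(p) == 2}
    return {"PKG-INFO", "setup.py"} <= depth_one
```

Usage would be along the lines of looks_like_sdist(list_members(path)).
A bdist_dumb archive fails the test because its root holds install
prefixes such as usr or opt rather than a PKG-INFO next to a setup.py.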

From jacob at jacobian.org  Thu Mar 28 22:15:56 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Thu, 28 Mar 2013 16:15:56 -0500
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
	<CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
Message-ID: <CAK8PqJELxd5NoFSFYfaVU4_jo3JUfe+MguBynoQQE8ignd30ng@mail.gmail.com>

C'mon, folks, we're arguing about a name. That's about as close to
literal bikeshedding as we could get.

How about we just let whoever has the keys make the change in whatever
way's easiest and most logical for them?

Jacob

From richard at python.org  Thu Mar 28 22:42:06 2013
From: richard at python.org (Richard Jones)
Date: Fri, 29 Mar 2013 08:42:06 +1100
Subject: [Catalog-sig] [Distutils]  Merge catalog-sig and distutils-sig
In-Reply-To: <CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
	<CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
Message-ID: <CAHrZfZCj7hY68065YN0WzBvmDuwPOS0EwFyiQXjN7Cjxcu_Qcw@mail.gmail.com>

I think I'm the only one on the list who probably would have objected
but I'm on both now so whatever :-)


    Richard

On 29 March 2013 07:32, PJ Eby <pje at telecommunity.com> wrote:
> On Thu, Mar 28, 2013 at 3:43 PM, Donald Stufft <donald at stufft.io> wrote:
>> On Mar 28, 2013, at 3:39 PM, PJ Eby <pje at telecommunity.com> wrote:
>>> Can we do it by just dropping catalog-sig and keeping distutils-sig?
>>> I'm afraid we might lose some important distutils-sig population if
>>> the process involves renaming the list, resubscribing, etc.  I also
>>> *really* don't want to invalidate archive links to the distutils-sig
>>> archive.
>>>
>>> All in all, +1 on not having two lists, but I'm really worried about
>>> "breaking" distutils-sig.  We're still going to be talking about
>>> "distribution utilities", after all.
>>
>> Worst case I'm sure subscribers can be transferred and the existing archive kept intact.
>
> That's a great way to have a bunch of people complaining that they
> never subscribed to packaging-sig, not to mention the part where it
> breaks everyone's mail filters.
>
> I really don't see any gains for renaming the list.  It's not like we
> can go and scrub the entire internet of references to distutils-sig.
> _______________________________________________
> Distutils-SIG maillist  -  Distutils-SIG at python.org
> http://mail.python.org/mailman/listinfo/distutils-sig

From donald at stufft.io  Thu Mar 28 22:57:11 2013
From: donald at stufft.io (Donald Stufft)
Date: Thu, 28 Mar 2013 17:57:11 -0400
Subject: [Catalog-sig] [Distutils]  Merge catalog-sig and distutils-sig
In-Reply-To: <kj2dni$8l6$1@ger.gmane.org>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
	<CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
	<kj2dni$8l6$1@ger.gmane.org>
Message-ID: <D8127498-8C8F-41E6-9D18-308C64ED78FC@stufft.io>


On Mar 28, 2013, at 5:42 PM, Tres Seaver <tseaver at palladion.com> wrote:

> Signed PGP part
> On 03/28/2013 04:32 PM, PJ Eby wrote:
> > On Thu, Mar 28, 2013 at 3:43 PM, Donald Stufft <donald at stufft.io>
> > wrote:
> >> On Mar 28, 2013, at 3:39 PM, PJ Eby <pje at telecommunity.com> wrote:
> >>> Can we do it by just dropping catalog-sig and keeping
> >>> distutils-sig? I'm afraid we might lose some important
> >>> distutils-sig population if the process involves renaming the
> >>> list, resubscribing, etc.  I also *really* don't want to
> >>> invalidate archive links to the distutils-sig archive.
> >>> 
> >>> All in all, +1 on not having two lists, but I'm really worried
> >>> about "breaking" distutils-sig.  We're still going to be talking
> >>> about "distribution utilities", after all.
> >> 
> >> Worst case I'm sure subscribers can be transferred and the existing
> >> archive kept intact.
> > 
> > That's a great way to have a bunch of people complaining that they 
> > never subscribed to packaging-sig, not to mention the part where it 
> > breaks everyone's mail filters.
> > 
> > I really don't see any gains for renaming the list.  It's not like we 
> > can go and scrub the entire internet of references to distutils-sig.
> 
> Not to mention breaking the gmane.org gateway, and those of us who sip
> the firehose there instead of trying to swallow it via e-mail.
> 
> 
> Tres.
> - -- 
> ===================================================================
> Tres Seaver          +1 540-429-0999          tseaver at palladion.com
> Palladion Software   "Excellence by Design"    http://palladion.com
> 
> 
> _______________________________________________
> Distutils-SIG maillist  -  Distutils-SIG at python.org
> http://mail.python.org/mailman/listinfo/distutils-sig

This problem is inherent no matter what name is picked. GMane will need to be updated and some messages will need to be sent to tell people about the new name. No matter what, at least one name isn't going to be used anymore.

It's not that big of a deal.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From aclark at aclark.net  Fri Mar 29 00:01:45 2013
From: aclark at aclark.net (Alex Clark)
Date: Thu, 28 Mar 2013 19:01:45 -0400
Subject: [Catalog-sig] [Distutils]  Merge catalog-sig and distutils-sig
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
	<CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
	<kj2dni$8l6$1@ger.gmane.org>
	<D8127498-8C8F-41E6-9D18-308C64ED78FC@stufft.io>
Message-ID: <kj2i4j$lik$1@ger.gmane.org>

On 2013-03-28 21:57:11 +0000, Donald Stufft said:

> 
> On Mar 28, 2013, at 5:42 PM, Tres Seaver <tseaver at palladion.com> wrote:
> 
>> Signed PGP part
>> On 03/28/2013 04:32 PM, PJ Eby wrote:
>>> On Thu, Mar 28, 2013 at 3:43 PM, Donald Stufft <donald at stufft.io>
>>> wrote:
>>>> On Mar 28, 2013, at 3:39 PM, PJ Eby <pje at telecommunity.com> wrote:
>>>>> Can we do it by just dropping catalog-sig and keeping
>>>>> distutils-sig? I'm afraid we might lose some important
>>>>> distutils-sig population if the process involves renaming the
>>>>> list, resubscribing, etc.  I also *really* don't want to
>>>>> invalidate archive links to the distutils-sig archive.
>>>>> 
>>>>> All in all, +1 on not having two lists, but I'm really worried
>>>>> about "breaking" distutils-sig.  We're still going to be talking
>>>>> about "distribution utilities", after all.
>>>> 
>>>> Worst case I'm sure subscribers can be transferred and the existing
>>>> archive kept intact.
>>> 
>>> That's a great way to have a bunch of people complaining that they
>>> never subscribed to packaging-sig, not to mention the part where it
>>> breaks everyone's mail filters.
>>> 
>>> I really don't see any gains for renaming the list.  It's not like we
>>> can go and scrub the entire internet of references to distutils-sig.
>> 
>> Not to mention breaking the gmane.org gateway, and those of us who sip
>> the firehose there instead of trying to swallow it via e-mail.
>> 
>> 
>> Tres.
>> - --
>> ===================================================================
>> Tres Seaver          +1 540-429-0999          tseaver at palladion.com
>> Palladion Software   "Excellence by Design"    http://palladion.com
>> 
>> 
>> _______________________________________________
>> Distutils-SIG maillist  -  Distutils-SIG at python.org
>> http://mail.python.org/mailman/listinfo/distutils-sig
> 
> This problem is inherent no matter what name is picked. GMane will need
> to be updated and some messages will need to be sent to tell people
> about the new name. No matter what, at least one name isn't going to be
> used anymore.
> 
> It's not that big of a deal.


FWIW: I am a GMANE-sipper and I'm willing to rejoin a new packaging-sig 
list (as well as register the new list with GMANE if no one else does). 
Seems to me another viable option is to simply turn off catalog-sig and 
distutils-sig (while preserving the archives forever, of course) and 
just start chatting on packaging-sig. Send an email to both lists "Last 
post, please join packaging-sig" and you are done.


> 
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 
> 
> 
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig


-- 
Alex Clark · http://about.me/alex.clark



From pje at telecommunity.com  Fri Mar 29 00:28:14 2013
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 28 Mar 2013 19:28:14 -0400
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <CAK8PqJELxd5NoFSFYfaVU4_jo3JUfe+MguBynoQQE8ignd30ng@mail.gmail.com>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
	<CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
	<CAK8PqJELxd5NoFSFYfaVU4_jo3JUfe+MguBynoQQE8ignd30ng@mail.gmail.com>
Message-ID: <CALeMXf7cXtFB22gSVQMP2oNKw+67ot1jJSJXLtfYiRusZ=3HSQ@mail.gmail.com>

On Thu, Mar 28, 2013 at 5:15 PM, Jacob Kaplan-Moss <jacob at jacobian.org> wrote:
> C'mon, folks, we're arguing about a name. That's about as close to
> literal bikeshedding as we could get.

I'm not arguing about the *name*.  I just don't see the point in
making everybody subscribe to a new list and change their mail filters
(and update every book and webpage out there that mentions the
distutils-sig), because a few people want to *change* the name -- a
change that AFAICT doesn't actually provide any tangible benefit to
anybody whatsoever.


> How about we just let whoever has the keys make the change in whatever way's easiest and most logical for them?

Because it's not up to just the person with the keys.  Neither SIG is
a mere mailing list, it's a Python special interest group, and SIGs
have their own formation and termination processes.

In particular, if you're going to start a new SIG, one of the
requirements to be met is "in particular, no other SIG nor the general
Python newsgroup is already more suitable" (per the Python SIG
Creation Guidelines).  It's hard to argue that distutils-sig isn't
already more suitable than whatever is being proposed to take its
place.

From donald at stufft.io  Fri Mar 29 00:45:55 2013
From: donald at stufft.io (Donald Stufft)
Date: Thu, 28 Mar 2013 19:45:55 -0400
Subject: [Catalog-sig] Merge catalog-sig and distutils-sig
In-Reply-To: <CALeMXf7cXtFB22gSVQMP2oNKw+67ot1jJSJXLtfYiRusZ=3HSQ@mail.gmail.com>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
	<CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
	<CAK8PqJELxd5NoFSFYfaVU4_jo3JUfe+MguBynoQQE8ignd30ng@mail.gmail.com>
	<CALeMXf7cXtFB22gSVQMP2oNKw+67ot1jJSJXLtfYiRusZ=3HSQ@mail.gmail.com>
Message-ID: <A5049F52-6A73-47A3-A749-FF23174296DF@stufft.io>


On Mar 28, 2013, at 7:28 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Thu, Mar 28, 2013 at 5:15 PM, Jacob Kaplan-Moss <jacob at jacobian.org> wrote:
>> C'mon, folks, we're arguing about a name. That's about as close to
>> literal bikeshedding as we could get.
> 
> I'm not arguing about the *name*.  I just don't see the point in
> making everybody subscribe to a new list and change their mail filters
> (and update every book and webpage out there that mentions the
> distutils-sig), because a few people want to *change* the name -- a
> change that AFAICT doesn't actually provide any tangible benefit to
> anybody whatsoever.
> 
> 
>> How about we just let whoever has the keys make the change in whatever way's easiest and most logical for them?
> 
> Because it's not up to just the person with the keys.  Neither SIG is
> a mere mailing list, it's a Python special interest group, and SIGs
> have their own formation and termination processes.
> 
> In particular, if you're going to start a new SIG, one of the
> requirements to be met is "in particular, no other SIG nor the general
> Python newsgroup is already more suitable" (per the Python SIG
> Creation Guidelines).  It's hard to argue that distutils-sig isn't
> already more suitable than whatever is being proposed to take its
> place.

A requirement for a SIG is also that it has a clear goal and a start and end date. distutils-sig's goal is the distutils module. And the "end date" requirement seems to be completely ignored these days, so arguing for strict adherence to the rules seems to be a wash.

I suggested packaging-sig because discussion jumps back and forth between distutils-sig and catalog-sig, and neither name nor stated goal really reflected what the SIG was actually about, which was packaging in Python in general. I also suggested packaging because it matched the other current SIGs, which are generic topics and not about a single module. But whatever, I hate the pointless duplication and just want to kill the overlap.


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From dennis.coldwell at gmail.com  Fri Mar 29 01:19:54 2013
From: dennis.coldwell at gmail.com (Dennis Coldwell)
Date: Thu, 28 Mar 2013 17:19:54 -0700
Subject: [Catalog-sig] [Distutils]  Merge catalog-sig and distutils-sig
In-Reply-To: <A5049F52-6A73-47A3-A749-FF23174296DF@stufft.io>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
	<CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
	<CAK8PqJELxd5NoFSFYfaVU4_jo3JUfe+MguBynoQQE8ignd30ng@mail.gmail.com>
	<CALeMXf7cXtFB22gSVQMP2oNKw+67ot1jJSJXLtfYiRusZ=3HSQ@mail.gmail.com>
	<A5049F52-6A73-47A3-A749-FF23174296DF@stufft.io>
Message-ID: <CAGBjFcVeo8O5tEKqOnmK6++-CVtZdMdedEXHTBpYTNJRs7P_OQ@mail.gmail.com>

> But whatever, I hate the pointless duplication and just want to kill the overlap.

Agree, +1 to merging into one list.


On Thu, Mar 28, 2013 at 4:45 PM, Donald Stufft <donald at stufft.io> wrote:

>
> On Mar 28, 2013, at 7:28 PM, PJ Eby <pje at telecommunity.com> wrote:
>
> > On Thu, Mar 28, 2013 at 5:15 PM, Jacob Kaplan-Moss <jacob at jacobian.org> wrote:
> >> C'mon, folks, we're arguing about a name. That's about as close to
> >> literal bikeshedding as we could get.
> >
> > I'm not arguing about the *name*.  I just don't see the point in
> > making everybody subscribe to a new list and change their mail filters
> > (and update every book and webpage out there that mentions the
> > distutils-sig), because a few people want to *change* the name -- a
> > change that AFAICT doesn't actually provide any tangible benefit to
> > anybody whatsoever.
> >
> >
> >> How about we just let whoever has the keys make the change in whatever way's easiest and most logical for them?
> >
> > Because it's not up to just the person with the keys.  Neither SIG is
> > a mere mailing list, it's a Python special interest group, and SIGs
> > have their own formation and termination processes.
> >
> > In particular, if you're going to start a new SIG, one of the
> > requirements to be met is "in particular, no other SIG nor the general
> > Python newsgroup is already more suitable" (per the Python SIG
> > Creation Guidelines).  It's hard to argue that distutils-sig isn't
> > already more suitable than whatever is being proposed to take its
> > place.
>
> A requirement for a SIG is also that it has a clear goal and a start and
> end date. distutils-sig's goal is the distutils module. And the "end date"
> requirement seems to be completely ignored these days, so arguing for
> strict adherence to the rules seems to be a wash.
>
> I suggested packaging-sig because discussion jumps back and forth between
> distutils-sig and catalog-sig, and neither name nor stated goal really
> reflected what the SIG was actually about, which was packaging in Python
> in general. I also suggested packaging because it matched the other
> current SIGs, which are generic topics and not about a single module. But
> whatever, I hate the pointless duplication and just want to kill the
> overlap.
>
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
>
> _______________________________________________
> Distutils-SIG maillist  -  Distutils-SIG at python.org
> http://mail.python.org/mailman/listinfo/distutils-sig
>
>

From barry at python.org  Fri Mar 29 03:59:11 2013
From: barry at python.org (Barry Warsaw)
Date: Thu, 28 Mar 2013 22:59:11 -0400
Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig
In-Reply-To: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
Message-ID: <20130328225911.513250fe@anarchist>

On Mar 28, 2013, at 02:22 PM, Donald Stufft wrote:

>Is there much point in keeping catalog-sig and distutils-sig separate?

Without yet reading the whole thread, I'll just mention that it's probably
easier to just retire one or the other mailing lists and divert all discussion
to the other one.  Of course, the archives for the retired list would be
retained for historical purposes.  In fact, sigs are *supposed* to be
periodically reviewed for renewal or retirement, though I think practically
speaking we haven't done that in a very long time.

If there's consensus on what you want to do, please contact postmaster@ and
let them know.  Let's say you just want to retire catalog-sig: we can set up
forwards to distutils-sig and let the former be an "acceptable alias" to the
latter so postings will be accepted when addressed to either.  Of course,
folks on the defunct list should manually subscribe to the good list
(i.e. opt-in).

-Barry

From tseaver at palladion.com  Fri Mar 29 04:45:52 2013
From: tseaver at palladion.com (Tres Seaver)
Date: Thu, 28 Mar 2013 23:45:52 -0400
Subject: [Catalog-sig] [Distutils]  Merge catalog-sig and distutils-sig
In-Reply-To: <D8127498-8C8F-41E6-9D18-308C64ED78FC@stufft.io>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
	<CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
	<kj2dni$8l6$1@ger.gmane.org>
	<D8127498-8C8F-41E6-9D18-308C64ED78FC@stufft.io>
Message-ID: <kj32o5$uq1$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/28/2013 05:57 PM, Donald Stufft wrote:
> 
> On Mar 28, 2013, at 5:42 PM, Tres Seaver <tseaver at palladion.com>
> wrote:
> 
>> On 03/28/2013 04:32 PM, PJ Eby wrote:

>>> I really don't see any gains for renaming the list.  It's not like
>>> we can go and scrub the entire internet of references to
>>> distutils-sig.
>> 
>> Not to mention breaking the gmane.org gateway, and those of us who
>> sip the firehose there instead of trying to swallow it via e-mail.

> This problem is inherent no matter what name is picked. GMane will
> need to be updated and some messages will need to be sent to tell
> people about the new name. No matter what, at least one name isn't
> going to be used anymore.
> 
> It's not that big of a deal.

If we leave the main list the 'distutils-sig', and just announce that
'catalog-sig' is retired, folks who want to follow the new list just
switch over.  All the archives (mailman / gmane / etc.) stay valid, but
the list goes into moderated mode.

Creating a third list and retiring both the existing ones is extra hassle
for no value, aside for a "cleanliness" issue on its name.


Tres.
- -- 
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iEYEARECAAYFAlFVDmoACgkQ+gerLs4ltQ4yDQCfZCEUpSTIhQKNNDilIYIRc6Jj
Fu0AoM6RKaflwbeek0VFGsX1USIzUhlC
=gWkJ
-----END PGP SIGNATURE-----


From richard at python.org  Fri Mar 29 10:47:48 2013
From: richard at python.org (Richard Jones)
Date: Fri, 29 Mar 2013 20:47:48 +1100
Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig
In-Reply-To: <kj32o5$uq1$1@ger.gmane.org>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
	<CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
	<kj2dni$8l6$1@ger.gmane.org>
	<D8127498-8C8F-41E6-9D18-308C64ED78FC@stufft.io>
	<kj32o5$uq1$1@ger.gmane.org>
Message-ID: <CAHrZfZAM+zr7MJE4Thbd1uX8L708osrvXX1eCj+H3fAOt=r6Lw@mail.gmail.com>

On 29 March 2013 14:45, Tres Seaver <tseaver at palladion.com> wrote:
> If we leave the main list the 'distutils-sig', and just announce that
> 'catalog-sig' is retired, folks who want to follow the new list just
> switch over.  All the archives (mailman / gmane / etc.) stay valid, but
> the list goes into moderated mode.

Whoever has the power to do this, do it please.


    Richard

From pje at telecommunity.com  Fri Mar 29 19:54:22 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 29 Mar 2013 14:54:22 -0400
Subject: [Catalog-sig] How to determine if archive is an sdist or bdist
In-Reply-To: <CAAndj4vFFVkudSui9c_0+nD1vPHuveZicn4pf9hXD_une_UQCA@mail.gmail.com>
References: <CAAndj4sB=4KC1VC_n3upAEWkJ=GV+K=-ht539xJ7KBU5putXzQ@mail.gmail.com>
	<CALeMXf5kx++DG+LEzm3+syM_QoXS8aAiByJKycRyUbPoL8c6ag@mail.gmail.com>
	<CAAndj4vFFVkudSui9c_0+nD1vPHuveZicn4pf9hXD_une_UQCA@mail.gmail.com>
Message-ID: <CALeMXf7p=s8uQE4Dz20d3tVQmTyC0XfDdq6LiAvg=oApYuB3tw@mail.gmail.com>

On Fri, Mar 29, 2013 at 11:00 AM, James Carpenter <nawkboy at gmail.com> wrote:
> Looks like the idea of using a custom command is a better approach then.

I'm not sure why you think that.  The only kinds of archives whose
file types are ambiguous from the name, are sdist, bdist_dumb, and
random raw source dumps.  Everything else has a unique extension like
.egg, .exe, .msi, .rpm, etc.  If you have a .zip, .tar.gz, .tgz, or
some other archive name, you can find out if it's an sdist by
inspecting its contents as I described.  And if it's not an sdist, you
can usually tell if it's a raw source dump by checking for a setup.py
in the archive root or a depth-1 subdirectory off the root.  (That's
what easy_install does, anyway, when it's given an archive it doesn't
know what to do with.)
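
A rough sketch of that two-step triage, assuming you already have the
file name and a list of member names (for example from
zipfile.ZipFile.namelist() or tarfile.TarFile.getnames()). The function
names and the label strings are hypothetical, the extension table only
covers the extensions mentioned above, and this is not easy_install's
actual code.

```python
# Extensions that identify a format on their own; the labels are illustrative.
UNAMBIGUOUS = {
    ".egg": "bdist_egg",
    ".exe": "bdist_wininst",
    ".msi": "bdist_msi",
    ".rpm": "bdist_rpm",
}

# Generic archive extensions whose contents have to be inspected.
AMBIGUOUS = (".zip", ".tar.gz", ".tgz")


def classify_by_name(filename):
    """Classify a build artifact from its file name alone, where possible."""
    lower = filename.lower()
    for ext, kind in UNAMBIGUOUS.items():
        if lower.endswith(ext):
            return kind
    if lower.endswith(AMBIGUOUS):
        return "needs-inspection"  # sdist, bdist_dumb, or a raw source dump
    return "unknown"


def has_setup_py_near_root(names):
    """True if setup.py is in the archive root or one directory below it."""
    for name in names:
        segs = [s for s in name.replace("\\", "/").split("/") if s not in ("", ".")]
        if segs and segs[-1] == "setup.py" and len(segs) <= 2:
            return True
    return False
```

Only archives that come back as "needs-inspection" have to be opened;
for those, a setup.py near the root without the sdist layout suggests a
raw source dump.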

>
> Is a custom command my only choice or can I register pre/post hooks to any
> given command?
>
>
> On Thu, Mar 28, 2013 at 3:36 PM, PJ Eby <pje at telecommunity.com> wrote:
>>
>> On Thu, Mar 28, 2013 at 3:57 PM, James Carpenter <nawkboy at gmail.com>
>> wrote:
>> > Is there an easy way to programmatically tell if an archive (tar.gz,
>> > zip,
>> > etc.) in the dist directory is a binary or sdist? I would like to
>> > post-process the contents of a dist directory and classify each build
>> > artifact there (egg, sdist, bdist, etc.).
>>
>> An sdist always has a single subdirectory in the archive's root
>> directory, named for the package+version, and containing a PKG-INFO
>> and setup.py (plus a bunch of other stuff).
>>
>> A bdist_dumb will not have such a subdirectory in the archive root;
>> instead it will have one or more directories like /usr, /opt, /Program
>> Files.
>>
>> Other bdist formats?  Hard to say.
>
>

From ncoghlan at gmail.com  Fri Mar 29 20:40:58 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 30 Mar 2013 05:40:58 +1000
Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig
In-Reply-To: <CAHrZfZAM+zr7MJE4Thbd1uX8L708osrvXX1eCj+H3fAOt=r6Lw@mail.gmail.com>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
	<CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
	<kj2dni$8l6$1@ger.gmane.org>
	<D8127498-8C8F-41E6-9D18-308C64ED78FC@stufft.io>
	<kj32o5$uq1$1@ger.gmane.org>
	<CAHrZfZAM+zr7MJE4Thbd1uX8L708osrvXX1eCj+H3fAOt=r6Lw@mail.gmail.com>
Message-ID: <CADiSq7eXCAFMgumrfb-VFpdbS987w5=oETsbb51gkjNt5rw2sw@mail.gmail.com>

On Fri, Mar 29, 2013 at 7:47 PM, Richard Jones <richard at python.org> wrote:
> On 29 March 2013 14:45, Tres Seaver <tseaver at palladion.com> wrote:
>> If we leave the main list the 'distutils-sig', and just announce that
>> 'catalog-sig' is retired, folks who want to follow the new list just
>> switch over.  All the archives (mailman / gmane / etc.) stay valid, but
>> the list goes into moderated mode.
>
> Whoever has the power to do this, do it please.

+1

distutils-sig it is. We're expanding the charter to "the distutils
standard library module, the Python Package Index and associated
interoperability standards", but that's a lot easier than forcing
everyone to rewrite their mail filters.

Besides, it's gonna be a *long* time before the default build system
in the standard library is anything other than distutils. Coupling the
build system to the language release cycle has proven to be a *bad
idea*, because the addition of new platform support needs to happen in
a more timely fashion than language releases. The incorporation of pip
bootstrapping into 3.4 will also make it a lot easier to recommend
more readily upgraded alternatives.

Cheers,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From donald at stufft.io  Fri Mar 29 20:43:06 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 29 Mar 2013 15:43:06 -0400
Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig
In-Reply-To: <CADiSq7eXCAFMgumrfb-VFpdbS987w5=oETsbb51gkjNt5rw2sw@mail.gmail.com>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
	<CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
	<kj2dni$8l6$1@ger.gmane.org>
	<D8127498-8C8F-41E6-9D18-308C64ED78FC@stufft.io>
	<kj32o5$uq1$1@ger.gmane.org>
	<CAHrZfZAM+zr7MJE4Thbd1uX8L708osrvXX1eCj+H3fAOt=r6Lw@mail.gmail.com>
	<CADiSq7eXCAFMgumrfb-VFpdbS987w5=oETsbb51gkjNt5rw2sw@mail.gmail.com>
Message-ID: <7C8C91B8-46D7-4E74-9FEE-0E52F30F1132@stufft.io>


On Mar 29, 2013, at 3:40 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Fri, Mar 29, 2013 at 7:47 PM, Richard Jones <richard at python.org> wrote:
>> On 29 March 2013 14:45, Tres Seaver <tseaver at palladion.com> wrote:
>>> If we keep 'distutils-sig' as the main list, and just announce that
>>> 'catalog-sig' is retired, folks who want to follow the new list just
>>> switch over.  All the archives (mailman / gmane / etc.) stay valid, but
>>> the list goes into moderated mode.
>> 
>> Whoever has the power to do this, do it please.
> 
> +1
> 
> distutils-sig it is. We're expanding the charter to "the distutils
> standard library module, the Python Package Index and associated
> interoperability standards", but that's a lot easier than forcing
> everyone to rewrite their mail filters.
> 
> Besides, it's gonna be a *long* time before the default build system
> in the standard library is anything other than distutils. Coupling the
> build system to the language release cycle has proven to be a *bad
> idea*, because the addition of new platform support needs to happen in
> a more timely fashion than language releases. The incorporation of pip
> bootstrapping into 3.4 will also make it a lot easier to recommend
> more readily upgraded alternatives.
> 
> Cheers,
> Nick.
> 
> 
> -- 
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

Sounds good to me. Whoever has the power, please do the needful.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From r1chardj0n3s at gmail.com  Sat Mar 30 23:16:38 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Sun, 31 Mar 2013 09:16:38 +1100
Subject: [Catalog-sig] Shutting down catalog-sig
Message-ID: <CAHrZfZA7MdvZGDErcBasz0pWix1nEMwkg=Pa-q1o6hshOQZG9A@mail.gmail.com>

Hi all,

We're about to merge the catalog-sig and distutils-sig by just removing the
catalog-sig mailing list. If you wish to remain in the discussions
regarding Python package cataloging then please subscribe to the distutils
SIG.

The catalog SIG archives will remain, but the mailing list will be deleted
and the SIG will be retired.

There's no real timeframe but it will be happening imminently.


     Richard

From richard at python.org  Sat Mar 30 23:20:15 2013
From: richard at python.org (Richard Jones)
Date: Sun, 31 Mar 2013 09:20:15 +1100
Subject: [Catalog-sig] [Distutils]  Merge catalog-sig and distutils-sig
In-Reply-To: <7C8C91B8-46D7-4E74-9FEE-0E52F30F1132@stufft.io>
References: <C232A3E2-3E7D-4824-A60F-0819F1E136C1@stufft.io>
	<CAFT4OTFQr_dC2QZ6TA_bUfBQraanbYO5wKsNLL_9hk02p09Wrw@mail.gmail.com>
	<CALeMXf6ZTUbZLJ0WX-cXFWdzjexPM4WonsFywMOb-UwRh3ko0g@mail.gmail.com>
	<3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io>
	<CALeMXf55DiOKDsj45pj=tdnEVwobK7cfpOzwFN3bk0A-zhEpbg@mail.gmail.com>
	<kj2dni$8l6$1@ger.gmane.org>
	<D8127498-8C8F-41E6-9D18-308C64ED78FC@stufft.io>
	<kj32o5$uq1$1@ger.gmane.org>
	<CAHrZfZAM+zr7MJE4Thbd1uX8L708osrvXX1eCj+H3fAOt=r6Lw@mail.gmail.com>
	<CADiSq7eXCAFMgumrfb-VFpdbS987w5=oETsbb51gkjNt5rw2sw@mail.gmail.com>
	<7C8C91B8-46D7-4E74-9FEE-0E52F30F1132@stufft.io>
Message-ID: <CAHrZfZBUBEDc0B-QmM64bO=uckkYK7qruPUyzap=45CfAdYL0Q@mail.gmail.com>

I've set the wheels in motion. I just need a little help from the pydotorg
volunteers (and some hits from the mailman cluebat).


    Richard


On 30 March 2013 06:43, Donald Stufft <donald at stufft.io> wrote:

>
> On Mar 29, 2013, at 3:40 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> > On Fri, Mar 29, 2013 at 7:47 PM, Richard Jones <richard at python.org> wrote:
> >> On 29 March 2013 14:45, Tres Seaver <tseaver at palladion.com> wrote:
> >>> If we keep 'distutils-sig' as the main list, and just announce that
> >>> 'catalog-sig' is retired, folks who want to follow the new list just
> >>> switch over.  All the archives (mailman / gmane / etc.) stay valid, but
> >>> the list goes into moderated mode.
> >>
> >> Whoever has the power to do this, do it please.
> >
> > +1
> >
> > distutils-sig it is. We're expanding the charter to "the distutils
> > standard library module, the Python Package Index and associated
> > interoperability standards", but that's a lot easier than forcing
> > everyone to rewrite their mail filters.
> >
> > Besides, it's gonna be a *long* time before the default build system
> > in the standard library is anything other than distutils. Coupling the
> > build system to the language release cycle has proven to be a *bad
> > idea*, because the addition of new platform support needs to happen in
> > a more timely fashion than language releases. The incorporation of pip
> > bootstrapping into 3.4 will also make it a lot easier to recommend
> > more readily upgraded alternatives.
> >
> > Cheers,
> > Nick.
> >
> >
> > --
> > Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>
> Sounds good to me. Whoever has the power, please do the needful.
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
>