[Catalog-sig] an immutable mirror of PyPI

M.-A. Lemburg mal at egenix.com
Fri Jul 15 17:59:31 CEST 2011

Martijn Faassen wrote:
> Hi there,
> On 07/06/2011 05:54 PM, M.-A. Lemburg wrote:
>>> This is not a good reason, IMHO. You can go on with new versions and a
>>> new name, maybe you could want to deprecate the old package, but it's
>>> not a good reason to remove it.
>> I think undoing mistakes in package names is a very
>> good reason to remove them. As package author you don't want such
>> mistakes to stay on the net forever, if you can avoid it.
> I don't understand the "you don't want such mistakes to stay on the net
> forever" line of argumentation. As a package author, once you release
> something to the internet, you can assume it'll stay on the net. This is
> already true. So this is an unfixable problem that we're not making
> worse by having an immutable mirror.

It's not unfixable, but it's not easy to fix either. In any case,
we shouldn't be making it harder for authors.

We shouldn't forget that we have Python users who are young,
not very good at naming things, unaware of the consequences their
naming scheme may have, careless, etc.

We don't want to make things hard for them, otherwise they'd
simply turn away from us, instead of learning how to work with
PyPI properly.

We do have a responsibility here as a community, and simply saying
"you uploaded it, so it's your fault that it'll stay up there",
blaming the author forever, does not align well with our Python
community spirit.

>>>>   * legal action (copyright, trademark, DMCA, license issues, etc)
>>>>   * removal of malicious packages (e.g. script kiddy stuff in
>>>>    setup.py)
>>>>   * seriously broken builds (e.g. that cause users to lose data)
>>> This would make a good reason for package removal, but not for
>>> version reassignment. I.E. if I delete version 0.4.3 because it
>>> deleted my /usr instead of /usr/share/mylib/content, then I would be
>>> right at removing it, but there's no point in allowing any other
>>> package to take back 0.4.3. It's simply gone.
>> I'm not sure I understand what you want to say. I wasn't talking
>> about version reassignment in the above cases.
> Well, it restricts the use cases. The use case "re-release Foo 3.0 but
> with different contents" doesn't seem to be necessary to tackle the
> above, just plain removal.
> So a friendly perpetual mirror would need to be able to handle delete
> requests but not re-upload requests.

It all depends ... again, it's best to keep the files around that
you've tested, rather than relying on copies stored on some external
file server.

>>>>   * reassigning package names (not sure whether that's possible with
>>>>    PyPI, but it certainly happens in the wild every now and then)
>>> I'm not sure about what you mean here.
>> Author A releases a package X, then drops the idea and removes
>> the package, freeing up the name for others to use. Later on,
>> author B uses the name X for something different and creates
>> a new package X with a new set of releases.
> I wonder by the way whether PyPI supports the "dropping package name
> forever" use case now.

Sure: You file a ticket and Martin or Richard removes the package.

> I think a lot is to be gained if you assume package names to be unique
> forever.
> For instance: allowing this would be a major security risk: I release
> package X. Package X is used by people. Then I drop the name. Someone
> else completely unrelated comes along, creates package with the same
> name, and uploads evil code. Hilarity ensues.
> I know the answer already: "Just don't use packages by people who will
> drop the name at a nebulous point in the future then!" :)

As PyPI grows, we'll sooner or later see popular names being reserved
on PyPI or names being used for PyPI/distutils/setuptools

Removal of the latter already happens today, e.g. with the linked
list packages described in a Python book for learning distutils.

> Yes, allowing this follows from the "developers should have total
> freedom" goal that is apparently the main driver behind PyPI's use
> cases, but there are also "security please" and "repeatability" use cases.

There's no security on PyPI and repeatability is a myth as well :-)

Just because we don't have malicious packages on PyPI today doesn't
mean it's going to stay like this forever. As for repeatability,
you're better off relying on files you've already downloaded and
tested, since it is possible to re-upload a package release file
with different content, or to change the metadata of a release
after its initial upload.
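
To make the repeatability point concrete, here is a minimal sketch
of recording digests for the files you have tested and checking
later copies against them. The helper names (sha256_of, record,
verify) and the manifest layout are made up for illustration; this
is not part of PyPI, distutils or setuptools.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large release files need little memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record(paths):
    """Build a {filename: digest} manifest for tested files."""
    return {Path(p).name: sha256_of(p) for p in paths}


def verify(path, manifest):
    """True only if the file matches the digest recorded earlier."""
    expected = manifest.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

A file that was re-uploaded with different content under the same
name would fail verify(), as would any file that was never recorded.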

I'd suggest you build a read-only PyPI mirror tool that
people can download and then use on their private net as
they see fit. This solves your use case and that of the other
proponents of the read-only PyPI idea, while leaving the
main PyPI index unchanged.
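
As a rough sketch of what the core of such a private mirror could
look like, the class below keeps release files in memory, honours
delete requests and refuses re-uploads. The class name
ImmutableMirror and its methods are hypothetical, and a real tool
would also have to fetch files from PyPI and store them on disk.

```python
class ImmutableMirror:
    """In-memory stand-in for a mirror that never alters a file."""

    def __init__(self):
        self._files = {}       # (name, version, filename) -> bytes
        self._removed = set()  # keys that were deleted for good

    def add(self, name, version, filename, content):
        """Store a new file; return False if it is already there."""
        key = (name, version, filename)
        if key in self._removed:
            raise ValueError("file was removed; no re-upload")
        if key in self._files:
            if self._files[key] != content:
                raise ValueError("file exists with other content")
            return False
        self._files[key] = content
        return True

    def remove(self, name, version, filename):
        """Honour a delete request and retire the key forever."""
        key = (name, version, filename)
        self._files.pop(key, None)
        self._removed.add(key)

    def get(self, name, version, filename):
        """Return the stored bytes; KeyError if absent or removed."""
        return self._files[(name, version, filename)]
```

Retiring the key on removal is one possible policy: it covers the
removal cases discussed above (legal action, malicious packages,
broken builds) without allowing "Foo 3.0" to come back with
different contents.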

Marc-Andre Lemburg


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
