[Catalog-sig] Proposal: close the PyPI file-replacement loophole

M.-A. Lemburg mal at egenix.com
Mon Jan 30 19:27:28 CET 2012

Tarek Ziadé wrote:
> On Mon, Jan 30, 2012 at 1:46 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> Donald Stufft wrote:
>>> On Monday, January 30, 2012 at 4:23 AM, M.-A. Lemburg wrote:
>>>> Richard Jones wrote:
>>>>> Hi catalog-sig,
>>>>> When we initially implemented file upload to PyPI it was our intention
>>>>> that the file be immutable once uploaded. The goal was to make things
>>>>> significantly simpler for end users - there would only ever be one
>>>>> file with a given name. If the content changed then so must the name
>>>>> (typically by creating a new release version.)
>>>>> After the upload facility was put in place we also added the ability
>>>>> to delete files uploaded to pypi. This created a loophole: if a
>>>>> package owner knew how, they could delete the file and re-upload,
>>>>> thus circumventing the replacement protection.
>>>>> I'm considering closing this loophole by retaining a record of the
>>>>> uploaded file (though not the contents) so that future uploads with
>>>>> the same name wouldn't be allowed. I understand that this is how the
>>>>> ruby gem archive handles deletion of files.
>>>>> Your thoughts?
>>>> I don't think that's a good idea, since it would require the
>>>> package author to issue a new release whenever something goes wrong
>>>> with an upload (e.g. missing files, corrupted archive, etc.).
>>>> Please leave the existing logic in place.
>>> And version numbers are a scarce resource?
>> No, but having to kick off the whole release process again
>> just because something went wrong when uploading release files
>> to PyPI causes plenty of trouble.
> It's the opposite that gets you into trouble: once you have uploaded
> something at PyPI, it potentially gets copied to mirrors.
> The mirror protocol, as far as I remember, does not deal with 'updates of
> existing files'
> IOW if the release is broken and you fix it in pypi, it might stay broken
> in mirrors, and the inconsistent state is much more trouble.

That shouldn't be a problem if the mirrors correctly implement
the protocol:

PEP 381:
Mirrors must reduce the amount of data transferred between the central server and the mirror. To
achieve that, they MUST use the changelog() PyPI XML-RPC call, and only refetch the packages that
have been changed since the last time. For each package P, they MUST copy documents /simple/P/ and
/serversig/P. If a package is deleted on the central server, they MUST delete the package and all
associated files. To detect modification of package files, they MAY cache the file's ETag, and MAY
request skipping it using the If-none-match header.

Note that the whole package (including all files) is refetched
whenever something changes. The only allowed optimization is
to look at the file's modification date, which will be different
for newly uploaded files of the same name.
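
As a rough illustration of the behaviour described above, here is a minimal
in-memory sketch of one sync pass. It is not taken from pep381client; the
function name sync_mirror, the dict-based "server" and "mirror" states, and
the sample ETag values are all hypothetical. The only assumption about PyPI
itself is that changelog entries are tuples whose first element is the
package name.

```python
def sync_mirror(mirror, server, changelog_entries):
    """Bring `mirror` in line with `server` for every changed package.

    `mirror` and `server` are hypothetical stand-ins for real state:
    dicts mapping package name -> {filename: etag}.
    `changelog_entries` is assumed to be a list of tuples whose first
    element is the package name, e.g. (name, version, timestamp, action).
    Returns the list of (package, filename) pairs that were (re)fetched.
    """
    downloaded = []
    # Only packages mentioned in the changelog are revisited.
    changed = sorted({entry[0] for entry in changelog_entries})
    for package in changed:
        if package not in server:
            # Package deleted centrally: drop it and all of its files.
            mirror.pop(package, None)
            continue
        remote_files = server[package]
        local_files = mirror.setdefault(package, {})
        # Remove files that no longer exist on the central server.
        for filename in list(local_files):
            if filename not in remote_files:
                del local_files[filename]
        # Refetch any file that is new or whose ETag differs, which
        # covers a delete-and-re-upload under the same file name.
        for filename, etag in remote_files.items():
            if local_files.get(filename) != etag:
                local_files[filename] = etag
                downloaded.append((package, filename))
    return downloaded
```

With a mirror holding an old foo-1.0.tar.gz and a server holding a
re-uploaded file of the same name, the differing ETag causes the file to be
fetched again, so the mirror does not keep the broken copy.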

See Martin's pep381client for details:


Marc-Andre Lemburg

