[Catalog-sig] Proposal: close the PyPI file-replacement loophole

M.-A. Lemburg mal at egenix.com
Mon Jan 30 11:27:11 CET 2012


Donald Stufft wrote:
> On Monday, January 30, 2012 at 4:46 AM, M.-A. Lemburg wrote:
>> Donald Stufft wrote:
>>>
>>>
>>> On Monday, January 30, 2012 at 4:23 AM, M.-A. Lemburg wrote:
>>>
>>>> Richard Jones wrote:
>>>>> Hi catalog-sig,
>>>>>
>>>>> When we initially implemented file upload to PyPI it was our intention
>>>>> that the file be immutable once uploaded. The goal was to make things
>>>>> significantly simpler for end users - there would only ever be one
>>>>> file with a given name. If the content changed then so must the name
>>>>> (typically by creating a new release version.)
>>>>>
>>>>> After the upload facility was put in place we also added the ability
>>>>> to delete files uploaded to PyPI. This created a loophole: if a
>>>>> package owner knew how, they could delete the file and re-upload,
>>>>> thus circumventing the replacement protection.
>>>>>
>>>>> I'm considering closing this loophole by retaining a record of the
>>>>> uploaded file (though not the contents) so that future uploads with
>>>>> the same name wouldn't be allowed. I understand that this is how the
>>>>> ruby gem archive handles deletion of files.
>>>>>
>>>>> Your thoughts?
>>>>
>>>> I don't think that's a good idea, since it would require the
>>>> package author to issue a new release whenever something goes wrong
>>>> with an upload (e.g. missing files, corrupted archive, etc.).
>>>>
>>>> Please leave the existing logic in place.
>>> And version numbers are a scarce resource? 
>>>
>>
>>
>> No, but having to kick off the whole release process again
>> just because something went wrong when uploading release files
>> to PyPI causes plenty of trouble.
>>
> I would assert that almost every time something goes wrong with "uploading to PyPI" it's actually "I didn't package my software correctly". A better solution to people failing to package correctly (missing MANIFEST, whatever) is to test your package prior to uploading to PyPI. Then you don't need mutable files, your release process becomes more robust, your releases become more robust, and the ecosystem in general becomes more robust.
>
> Furthermore, I would still argue that the benefits to the community outweigh the ability for people to skimp on the release process. Either you are doing your releases ad hoc, in which case you don't have much of a release process to begin with, so doing it over again to bump the version by one isn't a huge deal, or you have a large release process, and testing the package before distributing it should be a part of it.

Due to the way PyPI uploads through distutils work by default, it is not
always easy to apply those checks to the uploaded files (distutils
recreates the distribution files when running the upload command and
even though it is possible to have the command reuse an already
created distribution file, that process is tricky and not well known).
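
To illustrate the kind of check meant here, a minimal sketch that inspects an
already built sdist before it gets uploaded. The function name and the list of
required files are hypothetical, chosen only for this example; nothing here is
part of distutils or PyPI, and it assumes the usual sdist layout with a single
top-level directory.

```python
import tarfile


def missing_from_sdist(archive_path, required):
    """Return the required relative paths that are absent from an sdist.

    Assumes every member sits under one top-level directory, e.g. "pkg-1.0/".
    A corrupted archive makes tarfile.open() raise tarfile.ReadError.
    """
    with tarfile.open(archive_path, "r:*") as archive:
        names = archive.getnames()
    # Strip the top-level directory so "pkg-1.0/setup.py" compares as "setup.py".
    relative = {name.split("/", 1)[1] for name in names if "/" in name}
    return sorted(set(required) - relative)
```

Running this against the exact file that will be uploaded only helps if the
upload step reuses that file rather than rebuilding it.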

Besides, we're not talking about a common case here, just an emergency
exit that can be used if needed.

>>> (Even though I believe it would be acceptable to cover that particular use case by giving a grace period during which you can re-upload).
>>
>>
>> Can't we just leave dealing with that problem to the package authors ?
>> It's their responsibility, not PyPI's.
>>
>>
> 
> In my opinion, no. PyPI is acting as the central repository; it is its responsibility to make a reasonable effort to protect the people who depend on it. The current solution doesn't just put end developers at risk from a bad author breaking their well-tested software. It also puts the security of their software under the author's watch. If the credentials of a particular package's author get leaked/stolen/whatever, suddenly my software is possibly vulnerable to whatever the person who did it decides to upload.
> 
> It puts the integrity of my (proverbial my) software in the hands of a disparate group of authors who may or may not have the same stringent testing that I do. Any Python application that gets installed from PyPI is at risk of mysteriously breaking, even with a "known good" configuration. These bugs are often hard to track down, and it is confusing and difficult to determine why they occur when they never did before.

PyPI uploads get stored with a hash sum, so any such changes can
easily be recognized on the client side, if there's a need.
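
A minimal sketch of such a client-side check, assuming the client has recorded
the hex digest of the file it originally tested against. The function names
are hypothetical, and the default algorithm is only a placeholder: it has to
match whatever hash the index actually publishes for the file.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path, recorded_hex, algorithm="sha256"):
    """True if the file still has the digest that was recorded earlier."""
    return file_digest(path, algorithm) == recorded_hex.strip().lower()
```

A mismatch means the file under that name is no longer the one that was
originally downloaded, whatever the reason for the change.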

-- 
Marc-Andre Lemburg
eGenix.com


