Reviving PEP 470 - Removing External Hosting support on PyPI
While developing Warehouse, one of the things I wanted to get done was a final ruling on PEP 470. With that in mind I’d like to bring it back up for discussion and, hopefully, ultimately a ruling. There are two major differences in this version of PEP 470, and I’d like to point them out explicitly.

Removal of the “External Repository Discovery” feature. I’ve been thinking about this for a while, and I finally removed it. I’ve always been uncomfortable with this feature and I finally realized why. Essentially, the major use case for not hosting things on PyPI that I think PyPI can reasonably be expected to accommodate is people who cannot publish their software to the US for various reasons. At the time I came up with the solution I did, it was an attempt to placate the folks who were against PEP 470 while assuming very few people would ever actually use it, essentially a junk feature to push the PEP through. I think that the feature itself is a bad feature and I think it presents a poor experience for people who want to use it, so I’ve removed it from the PEP and instead focused the PEP on explicitly recommending that all installers should implement the ability to specify multiple repositories, and on deprecating and removing the ability to find anything but files hosted by the repository itself on /simple/.

I recognize this is a regression for anyone who *does* have concerns with uploading their projects to a server hosted in the US. If there is someone that has this concern, and is also willing to put in the effort and legwork required, I will happily collaborate with them to design a solution that both follows whatever legal requirements they might have and provides a good experience for people using PyPI and pip. I have some rough ideas on what this could look like, but I think it’s really a separate discussion, since I believe externally hosted files, as we had them, are an overall bad experience for people and are largely a historic accident of how PyPI and Python packaging have evolved. I don’t want to derail this thread or PEP exploring these ideas (some of which I don’t even know would satisfy the requirements, since it’s all dealing with legal jurisdictions other than my own), but I wanted to make explicit that someone who knows the legalities and is willing to put in the work can reach out to me.

The other major difference is that I’ve shortened the time schedule from 6 months to 3 months. Given that authors are either going to upload their projects to PyPI or not, and there is no longer a need to set up an external index, I think a shorter time schedule is fine, especially since they will be given a script they can run that will spider their projects for any installable files and upload them to PyPI in a quick one-shot deal that would require very little effort from them.

Everything else in the PEP is basically the same except for rewordings.

I do need a BDFL Delegate for this PEP. Richard does not have the time to do it, and the other logical candidate for a PyPI-centric PEP is myself, but I don’t feel it’s appropriate to BDFL Delegate my own PEP.

You can see the PEP online at https://www.python.org/dev/peps/pep-0470/ (make sure it’s updated and you see the one that has Aug 26 2015 in its Post History). The PEP has also been inlined below.
-----------------

Abstract
========

This PEP proposes the deprecation and removal of support for hosting files externally to PyPI as well as the deprecation and removal of the functionality added by PEP 438, particularly the rel information used to classify different types of links and the meta tag used to indicate the API version.

Rationale
=========

Historically PyPI did not have any method of hosting files nor any method of automatically retrieving installables; it was instead focused on providing a central registry of names, to prevent naming collisions, and as a means of discovery for finding projects to use. In the course of time setuptools began to scrape these human facing pages, as well as pages linked from those pages, looking for things it could automatically download and install. Eventually this became the "Simple" API, which used a similar URL structure but eliminated the extraneous links and information to make the API more efficient. Additionally PyPI grew the ability for a project to upload release files directly to PyPI, enabling PyPI to act as a repository in addition to an index.

This gives PyPI two equally important roles that it plays in the Python ecosystem: that of index, to enable easy discovery of Python projects, and that of central repository, to enable easy hosting, download, and installation of Python projects. Due to the history behind PyPI and the very organic growth it has experienced, the lines between these two roles are blurry, and this blurring has caused confusion for the end users of both of these roles. This has in turn caused ire between people attempting to use PyPI in different capacities, most often when end users want to use PyPI as a repository but the author wants to use PyPI solely as an index.

This confusion comes down to end users of projects not realizing whether a project is hosted on PyPI or whether it relies on an external service. It often manifests itself when the external service is down but PyPI is not. People will see that PyPI works, and other projects work, but this one specific one does not. They oftentimes do not realize who they need to contact in order to get this fixed or what their remediation steps are.

PEP 438 attempted to solve this issue by allowing projects to explicitly declare whether they were using the repository features or not, and, if they were not, having the installers classify the links they found as either "internal", "verifiable external" or "unverifiable external". PEP 438 was accepted and implemented in pip 1.4 (released on Jul 23, 2013) with the final transition implemented in pip 1.5 (released on Jan 2, 2014).

PEP 438 was successful in bringing more people to utilize PyPI's repository features, an altogether good thing given that the global CDN powering PyPI provides speed ups for a lot of people, however it did so by introducing a new point of confusion and pain for both the end users and the authors.

By moving to explicit multiple repositories we can make the lines between these two roles much more explicit and remove the "hidden" surprises caused by the current implementation of handling people who do not want to use PyPI as a repository.

Key User Experience Expectations
--------------------------------

#. Easily allow external hosting to "just work" when appropriately configured at the system, user or virtual environment level.
#. Eliminate any and all references to the confusing "verifiable external" and "unverifiable external" distinction from the user experience (both when installing and when releasing packages).
#. The repository aspects of PyPI should become *just* the default package hosting location (i.e. the only one that is treated as opt-out rather than opt-in by most client tools in their default configuration). Aside from that aspect, hosting on PyPI should not otherwise provide an enhanced user experience over hosting your own package repository.
#. Do all of the above while providing default behaviour that is secure against most attackers below the nation state adversary level.

Why Additional Repositories?
----------------------------

The two common installer tools, pip and easy_install/setuptools, both support the concept of additional locations to search for files to satisfy the installation requirements and have done so for many years. This means that there is no need to "phase in" a new flag or concept, and the solution to installing a project from a repository other than PyPI will function regardless of how old (within reason) the end user's installer is.

Not only has this concept existed in the Python tooling for some time, but it is a concept that exists across languages and even extends to the OS level, with OS package tools almost universally using multiple repository support, making it extremely likely that someone is already familiar with the concept.

Additionally, the multiple repository approach is useful outside of the narrow scope of allowing projects that wish to be included on the index portion of PyPI but do not wish to utilize the repository portion of PyPI. This includes places where a company may wish to host a repository that contains their internal packages, or where a project may wish to have multiple "channels" of releases, such as alpha, beta, release candidate, and final release. This could also be used by projects wishing to host files which cannot be uploaded to PyPI, such as multi-gigabyte data files or, currently at least, Linux Wheels.

Why Not PEP 438 or Similar?
---------------------------

While the additional search location support has existed in pip and setuptools for quite some time, support for PEP 438 has only existed in pip since the 1.4 version, and it has yet to be implemented in setuptools. The design of PEP 438 did mean that users still benefited for projects which did not require external files even with older installers, however for projects which *did* require external files, users are still silently being given either potentially unreliable or, even worse, unsafe files to download. This system is also unique to Python, as it arises out of the history of PyPI; this means it is almost certain that this concept will be foreign to most, if not all, users until they encounter it while attempting to use the Python toolchain.

Additionally, the classification system proposed by PEP 438 has, in practice, turned out to be extremely confusing to end users, so much so that it is a position of this PEP that the situation as it stands is completely untenable. The common pattern for a user with this system is to attempt to install a project, possibly get an error message (or maybe not, if the project ever uploaded something to PyPI but later switched without removing old files), see that the error message suggests ``--allow-external``, reissue the command adding that flag, most likely get another error message, see that this time the error message suggests also adding ``--allow-unverified``, and again issue the command a third time, this time finally getting the thing they wish to install.
This UX failure exists for several reasons.

#. If pip can locate files at all for a project on the Simple API it will simply use those instead of attempting to locate more. This is generally the right thing to do, as attempting to locate more would erase a large part of the benefit of PEP 438. It means that if a project *ever* uploaded a file that matches what the user has requested for install, that file will be used regardless of how old it is.

#. PEP 438 makes an implicit assumption that most projects would either upload themselves to PyPI or would update themselves to directly link to release files. While a large number of projects did ultimately decide to upload to PyPI, some of them did so only because the UX around PEP 438 was so bad that they felt forced to do so. More concerning, however, is the fact that very few projects have opted to directly and safely link to files; instead they still simply link to pages which must be scraped in order to find the actual files, thus rendering the safe variant (``--allow-external``) largely useless.

#. Even if an author wishes to directly link to their files, doing so safely is non-obvious. It requires the inclusion of a MD5 hash (for historical reasons) in the fragment of the URL. If they do not include this then their files will be considered "unverified".

#. PEP 438 takes a security centric view and disallows any form of a global opt in for unverified projects. While this is generally a good thing, it creates extremely verbose and repetitive command invocations such as::

    $ pip install --allow-external myproject --allow-unverified myproject myproject
    $ pip install --allow-all-external --allow-unverified myproject myproject

Multiple Repository/Index Support
=================================

Installers SHOULD implement, or continue to offer, the ability to point the installer at multiple URL locations. The exact mechanism for a user to indicate they wish to use an additional location is left up to each individual implementation. Additionally, the mechanism for discovering an installation candidate when multiple repositories are being used is also up to each individual implementation, however once configured an implementation should not discourage, warn, or otherwise cast a negative light upon the use of a repository simply because it is not the default repository. Currently both pip and setuptools implement multiple repository support by using the best installation candidate they can find from any of the repositories, essentially treating the set as if it were one large repository.

Installers SHOULD also implement some mechanism for removing or otherwise disabling use of the default repository. The exact specifics of how that is achieved are up to each individual implementation.

Installers SHOULD also implement some mechanism for whitelisting and blacklisting which projects a user wishes to install from a particular repository. The exact specifics of how that is achieved are up to each individual implementation.

Deprecation and Removal of Link Spidering
=========================================

A new hosting mode will be added to PyPI. This hosting mode will be called ``pypi-only`` and will be in addition to the three that PEP 438 has already given us, which are ``pypi-explicit``, ``pypi-scrape``, and ``pypi-scrape-crawl``. This new hosting mode will modify a project's simple API page so that it only lists the files which are directly hosted on PyPI and will not link to anything else.
Upon acceptance of this PEP and the addition of the ``pypi-only`` mode, all new projects will default to the PyPI only mode and will be locked to this mode, unable to change this particular setting.

An email will then be sent out to all of the projects which are hosted only on PyPI, informing them that in one month their project will be automatically converted to the ``pypi-only`` mode. A month after these emails have been sent, any of those projects which were emailed and which are still hosted only on PyPI will have their mode set permanently to ``pypi-only``.

At the same time, an email will be sent to projects which rely on hosting external to PyPI. This email will warn these projects that externally hosted files have been deprecated on PyPI and that, 3 months from the time of that email, all external links will be removed from the installer APIs. This email **MUST** include instructions for converting their projects to be hosted on PyPI and **MUST** include links to a script or package that will enable them to enter their PyPI credentials and package name and have it automatically download and re-host all of their files on PyPI. This email **MUST** also include instructions for setting up their own index page. This email must also contain a link to the Terms of Service for PyPI, as many users may have signed up a long time ago and may not recall what those terms are. Finally this email must also contain a list of the links registered with PyPI where we were able to detect that an installable file was located.

Two months after the initial email, another email must be sent to any projects still relying on external hosting. This email will include all of the same information that the first email contained, except that the removal date will be one month away instead of three.

Finally, a month later, all projects will be switched to the ``pypi-only`` mode and PyPI will be modified to remove the externally linked files functionality.

Summary of Changes
==================

Repository side
---------------

#. Deprecate and remove the hosting modes as defined by PEP 438.
#. Restrict the simple API to only list the files that are contained within the repository.

Client side
-----------

#. Implement multiple repository support.
#. Implement some mechanism for removing/disabling the default repository.
#. Deprecate / remove PEP 438.

Impact
======

To determine impact, we've looked at all projects using a method of searching PyPI which is similar to what pip and setuptools use, and searched for all files available on PyPI, safely linked from PyPI, unsafely linked from PyPI, and finally unsafely available outside of PyPI. When the same file was found in multiple locations it was deduplicated and only counted in one location based on the following preferences: PyPI > Safely Off PyPI > Unsafely Off PyPI. This gives us the broadest possible definition of impact; it means that any single file for a project may no longer be visible by default, however that file could be years old, or it could be a binary file while an sdist is available on PyPI. This means that the *real* impact will likely be much smaller, but in an attempt not to miscount we take the broadest possible definition.

At the time of this writing there are 65,232 projects hosted on PyPI and of those, 59 of them rely on external files that are safely hosted outside of PyPI and 931 of them rely on external files which are unsafely hosted outside of PyPI.
This shows us that 1.5% of projects will be affected in some way by this change while 98.5% will continue to function as they always have. In addition, only 5% of the projects affected are using the features provided by PEP 438 to safely host outside of PyPI, while 95% of them are exposing their users to Remote Code Execution via a Man In The Middle attack.

Data Sovereignty
================

In the discussions around previous versions of this PEP, one of the key use cases for wanting to host files externally to PyPI was data sovereignty requirements for people living in jurisdictions outside of the USA, where PyPI is currently hosted. The author of this PEP is not blind to these concerns and realizes that this PEP represents a regression for the people that have them, however the current situation presents an extremely poor user experience and the feature is only being used by a small percentage of projects. In addition, the data sovereignty problem requires familiarity with the laws outside of the home jurisdiction of the author of this PEP, who is also the principal developer and operator of PyPI.

For these reasons, a solution for the problem of data sovereignty has been deferred and is considered outside of the scope of this PEP. If someone for whom the issue of data sovereignty matters wishes to put forth the effort, then at that time a system can be designed, implemented, and ultimately deployed and operated that would satisfy both the needs of non-US users who cannot upload their projects to a system on US soil and the quality of user experience that PyPI attempts to provide.

Rejected Proposals
==================

Allow easier discovery of externally hosted indexes
---------------------------------------------------

A previous version of this PEP included a new feature added to both PyPI and installers that would allow project authors to enter into PyPI a list of URLs that would instruct installers to ignore any files uploaded to PyPI and instead return an error telling the end user about these extra URLs that they can add to their installer to make the installation work.

This idea is rejected because it provides a similarly painful end user experience where people will first attempt to install something, get an error, then have to re-run the installation with the correct options.

Keep the current classification system but adjust the options
--------------------------------------------------------------

This PEP rejects several related proposals which attempt to fix some of the usability problems with the current system while still keeping the general gist of PEP 438. This includes:

* Default to allowing safely externally hosted files, but disallow unsafely hosted.
* Default to disallowing safely externally hosted files with only a global flag to enable them, but disallow unsafely hosted.
* Continue on the suggested path of PEP 438 and remove the option to unsafely host externally but continue to allow the option to safely host externally.

These proposals are rejected because:

* The classification system introduced in PEP 438 is an entirely unique concept to PyPI which is not generically applicable even in the context of Python packaging. Adding additional concepts comes at a cost.

* The classification system itself is non-obvious to explain, and to pre-determine what classification of link a project will require entails inspecting the project's ``/simple/<project>/`` page, and possibly any URLs linked from that page.
* The ability to host externally while still being linked for automatic discovery is mostly a historic relic which causes a fair amount of pain and complexity for little reward.

* The installer's ability to optimize or clean up the user interface is limited due to the nature of the implicit link scraping which would need to be done. This extends to the ``--allow-*`` options as well as the inability to determine if a link is expected to fail or not.

* The mechanism paints with a very broad brush when enabling an option, while PEP 438 attempts to limit this with per package options. However a project that has existed for an extended period of time may oftentimes have several different URLs listed in its simple index. It is not unusual for at least one of these to no longer be under the control of the project. While an unregistered domain will sit there relatively harmless most of the time, pip will continue to attempt to install from it on every discovery phase. This means that an attacker simply needs to look at projects which rely on unsafe external URLs and register expired domains to attack users.

Implement this PEP, but Do Not Remove the Existing Links
---------------------------------------------------------

This is essentially the backwards compatible version of this PEP. It attempts to allow people using older clients, or clients which do not implement this PEP, to continue on as if nothing had changed. This proposal is rejected because the vast bulk of those scenarios are unsafe uses of the deprecated features. It is the opinion of this PEP that silently allowing unsafe actions to take place on behalf of end users is simply not an acceptable solution.

Copyright
=========

This document has been placed in the public domain.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
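As a purely illustrative aside to the "Multiple Repository/Index Support" and "Client side" items above, this is roughly what the behaviour the PEP asks installers to keep offering already looks like with pip's existing options; the PEP deliberately leaves the exact mechanism to each implementation, and https://pypi.example.com/simple/ is a placeholder URL used only for this sketch:

    # Search an additional repository alongside the default one; candidates
    # from both are pooled as if they were one large repository:
    $ pip install --extra-index-url https://pypi.example.com/simple/ myproject

    # Replace the default repository entirely (one way of "disabling" it):
    $ pip install --index-url https://pypi.example.com/simple/ myproject

    # Or persist either choice in pip's configuration file, e.g. ~/.pip/pip.conf:
    [global]
    extra-index-url = https://pypi.example.com/simple/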
Donald Stufft <donald@stufft.io> writes:
Removal of the “External Repository Discovery” feature. […] I think that the feature itself is a bad feature and I think it presents a poor experience for people who want to use it, so I’ve removed it from the PEP and instead focused the PEP on explicitly recommending that all installers should implement the ability to specify multiple repositories and deprecating and removing the ability for finding anything but files hosted by the repository itself on /simple/.
+1, thank you for this improvement.

--
“Human reason is snatching everything to itself, leaving nothing for faith.” —Bernard of Clairvaux, 1090–1153 CE

Ben Finney
On Wed, 26 Aug 2015 21:24:05 -0400 Donald Stufft <donald@stufft.io> wrote:
At the time of this writing there are 65,232 projects hosted on PyPI and of those, 59 of them rely on external files that are safely hosted outside of PyPI and 931 of them rely on external files which are unsafely hosted outside of PyPI. This shows us that 1.5% of projects will be affected in some way by this change while 98.5% will continue to function as they always have. In addition, only 5% of the projects affected are using the features provided by PEP 438 to safely host outside of PyPI while 95% of them are exposing their users to Remote Code Execution via a Man In The Middle attack.
Out of curiosity, have you tried to determine if those Unsafely Off PyPI projects were either still active or "popular"?

The PEP looks fine anyway, good job :)

Regards

Antoine.
On August 27, 2015 at 4:26:28 AM, Antoine Pitrou (solipsis@pitrou.net) wrote:
Out of curiosity, have you tried to determine if those Unsafely Off PyPI projects were either still active or "popular" ?
10 months ago I attempted to figure out how popular or active those projects were. I didn’t redo those numbers, but I can if people think it should be in the PEP. I felt the previous versions of the PEP were a bit too much of “well if you look at the data this way you get X, but if you look at it this way you get Y”, and I tried to narrow it down to just a single measure of impact that took the broadest interpretation of what impact would mean.

Anyways, 10 months ago I parsed the log files for a single day and I looked at how often the /simple/foo/ pages got hit for every project on PyPI. This isn’t a great metric, since people running an install on multiple machines or using something like tox to run it multiple times on the same machine will be counted multiple times, however I got data that looked like this:

Top Externally Hosted Projects by Requests
------------------------------------------

This is determined by looking at the number of requests the ``/simple/<project>/`` page had gotten in a single day. The total number of requests during that day was 10,623,831.

============================== ========
Project                        Requests
============================== ========
PIL                            63869
Pygame                         2681
mysql-connector-python         1562
pyodbc                         724
elementtree                    635
salesforce-python-toolkit      316
wxPython                       295
PyXML                          251
RBTools                        235
python-graph-core              123
cElementTree                   121
============================== ========

Top Externally Hosted Projects by Unique IPs
--------------------------------------------

This is determined by looking at the IP addresses of requests the ``/simple/<project>/`` page had gotten in a single day. The total number of unique IP addresses during that day was 124,604.

============================== ==========
Project                        Unique IPs
============================== ==========
PIL                            4553
mysql-connector-python         462
Pygame                         202
pyodbc                         181
elementtree                    166
wxPython                       126
RBTools                        114
PyXML                          87
salesforce-python-toolkit      76
pyDes                          76
============================== ==========

I don’t know what those numbers look like today, but I suspect that they’d be even more weighted towards PIL being the primary project that actual users are pulling down that isn’t hosted on PyPI.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 27 August 2015 at 02:24, Donald Stufft <donald@stufft.io> wrote:
While developing Warehouse, one of the things I wanted to get done was a final ruling on PEP 470. With that in mind I’d like to bring it back up for discussion and hopefully ultimately a ruling.
There are two major differences in this version of PEP 470, and I’d like to point them out explicitly.
Removal of the “External Repository Discovery” feature. I’ve been thinking about this for a while, and I finally removed it. I’ve always been uncomfortable with this feature and I finally realized why. Essentially, the major use case for not hosting things on PyPI that I think PyPI can reasonably be expected to accommodate is people who cannot publish their software to the US for various reasons. At the time I came up with the solution I did, it was an attempt to placate the folks who were against PEP 470 while assuming very few people would ever actually use it, essentially a junk feature to push the PEP through. I think that the feature itself is a bad feature and I think it presents a poor experience for people who want to use it, so I’ve removed it from the PEP and instead focused the PEP on explicitly recommending that all installers should implement the ability to specify multiple repositories and deprecating and removing the ability for finding anything but files hosted by the repository itself on /simple/.
+1 on the proposal. Agreed that while the removal of the external hosting/discovery feature is a regression, it's one we need to make in order to provide a clean baseline and a good user experience.

Encouraging people to set up an external index if they have off-PyPI hosting requirements seems entirely reasonable. But I would say (having tried to do this for testing and personal use in the past) it's not easy to find good documentation on how to set up an external index (when you're starting from a simple web host). Having a "how to set up a PyPI-style simple index" document, linked from the PEP (and ultimately from the docs), would be a useful resource for people in general, and a good starting point for any discussion with people who have requirements for not hosting on PyPI. Probably putting such a document in https://packaging.python.org/en/latest/distributing.html would make sense.

Paul
On August 27, 2015 at 4:28:34 AM, Paul Moore (p.f.moore@gmail.com) wrote:
Encouraging people to set up an external index if they have off-PyPI hosting requirements seems entirely reasonable. But I would say (having tried to do this for testing and personal use in the past) it's not easy to find good documentation on how to set up an external index (when you're starting from a simple web host). Having a "how to set up a PyPI-style simple index" document, linked from the PEP (and ultimately from the docs) would be a useful resource for people in general, and a good starting point for any discussion with people who have requirements for not hosting on PyPI. Probably putting such a document in https://packaging.python.org/en/latest/distributing.html would make sense.
We can add documentation, it’s basically “stick some files in a directory, run python -m http.server”, adjusted for someone’s need for performance and “production” readiness. We make it pretty easy to make one with --find-links; any old web server with an auto index and a directory full of files will do it.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
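A minimal sketch of the --find-links setup described above, assuming a hypothetical project "myproject" and host "pkgs.example.com"; any web server that produces an auto-generated directory listing should behave the same way:

    $ mkdir -p /srv/packages
    $ cp dist/myproject-1.0.tar.gz /srv/packages/
    $ cd /srv/packages && python -m http.server 8000    # any auto-indexing web server will do

    # elsewhere, point pip at the directory listing:
    $ pip install --find-links http://pkgs.example.com:8000/ myproject

Adding --no-index to that last command makes pip consult only the extra location instead of searching it alongside PyPI.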
On 27 August 2015 at 12:53, Donald Stufft <donald@stufft.io> wrote:
We can add documentation, it’s basically “stick some files in a directory, run python -m http.server”, adjusted for someone’s need for performance and “production” readiness. We make it pretty easy to make one with --find-links; any old web server with an auto index and a directory full of files will do it.
The devil's in the details, though.

* Do I need to use canonical names for packages in my index? Assuming so, what *are* the rules for a "canonical name"?
* I need a main page with package name -> package page links on it. The link text is what needs to be the package name, yes?
* All that matters on the package page is the targets of all the links - these should point to downloadable files, yes?

It shouldn't be hard to write these up (and as I said, the PEP proposes to do so in the email sent to package owners; all I'm suggesting is that we store that information somewhere permanent as well).

And one other point - the way the PEP talks, I believe we're suggesting people set something up that works for --extra-index-url, *not* --find-links. Pip has two different discovery mechanisms here, and I think we need to be careful over that. The PEP talks about an external *index*, which I interpret as --extra-index-url.

Maybe there's another point here - the PEP should say "installers may have additional discovery methods, but they *MUST* clearly state which one corresponds to the index specification method described in this PEP".

Paul
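To make the three questions above concrete, here is a rough sketch of the layout a static "simple"-style index needs for --extra-index-url to work. It is only an illustration: the hostname, project name, and hash are placeholders, and the exact name normalization pip applies (lowercasing, with dashes/underscores/dots treated alike) is worth verifying against pip's documentation rather than taking from this sketch:

    # Directory layout served at https://pkgs.example.com/
    simple/
        index.html          # root page: <a href="my-project/">my-project</a>  (link text = project name)
        my-project/
            index.html      # project page: <a href="../../files/my_project-1.0.tar.gz#md5=abc123">my_project-1.0.tar.gz</a>
    files/
        my_project-1.0.tar.gz

    # pip resolves the project by fetching <index-url>/<normalized-name>/:
    $ pip install --extra-index-url https://pkgs.example.com/simple/ my-project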
On 30 August 2015 at 05:41, Paul Moore <p.f.moore@gmail.com> wrote:
On 27 August 2015 at 12:53, Donald Stufft <donald@stufft.io> wrote:
We can add documentation, it’s basically “stick some files in a directory, run python -m http.server”, adjusted for someone’s need for performance and “production” readiness. We make it pretty easy to make one with --find-links; any old web server with an auto index and a directory full of files will do it.
The devil's in the details, though.
* Do I need to use canonical names for packages in my index? Assuming so, what *are* the rules for a "canonical name"?
This is a good point - even if folks are hosting externally, we still want them to claim at least a top-level name on the global index. If they'd like to just claim a single name, and not worry about PyPI beyond that, then a zc-style usage of namespace packages likely makes sense, with additional names claimed on PyPI only if they want to promote a package out of their custom namespace and into the global one.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 30 August 2015 at 05:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
The devil's in the details, though.
* Do I need to use canonical names for packages in my index? Assuming so, what *are* the rules for a "canonical name"?
This is a good point - even if folks are hosting externally, we still want them to claim at least a top-level name on the global index.
Although what you say is *also* a good point, my original question was whether I need to make my links use all lowercase and whichever of dash or underscore it is that pip treats as the canonical form (I can never remember). The point I was making is to make sure that people with an existing non-PyPI file structure can set up an index easily. They'll quite likely need to build some sort of static webpage for that (consider a project hosted on something like SourceForge, setting up an index to replace their current external links on a cheap provider that only offers static webpage hosting). If it's an older project, it's quite possible they will *not* use the canonical form of the project name in their filenames, so they'll need to fix the name up or they'll get obscure "pip can't find your project" errors.

Regarding your point, though, looking at the wider picture there are *three* classes of project to consider:

1. Projects registered and hosted on PyPI
2. Projects registered on PyPI but hosted elsewhere
3. Projects neither hosted nor registered on PyPI

(There are also projects in category 1 with some (presumably historical) files hosted elsewhere, but I'll ignore those for now, as for most purposes they can be treated as category 1.)

Category 3 could quite easily be massive (private indexes used by companies, for example) but is irrelevant for the purposes of this PEP. Category 1 is straightforward - the PEP is a 100% clear win there, as the overhead of unneeded link scraping is removed.

The problem is existing category 2 projects, and new projects that want to join that category. We need to ensure that hosting off PyPI remains a viable option for such projects, which is why it's important to document how to create an index. But as you point out, we *also* need to make sure people don't think "what's the point in registering on PyPI if I'm setting up my own index anyway?" (and hence choose category 3 rather than category 2).

Maybe there should be a further FAQ in the PEP - "If I'm setting up my own index for my files, why should I bother registering my project on PyPI at all?" I suspect this is the real question at the root of a lot of the objections to the PEP. For people hosting off PyPI, the current arrangement (ignoring the UX issues) means that "pip install foo" works for them. We're now proposing to remove that benefit, and while it's not the *only* benefit of being registered on PyPI, maybe a reminder of what the other benefits are would put this into perspective.

Paul
On August 30, 2015 at 7:48:46 AM, Paul Moore (p.f.moore@gmail.com) wrote:
Maybe there should be a further FAQ in the PEP - "If I'm setting up my own index for my files, why should I bother registering my project on PyPI at all?" I suspect this is the real question at the root of a lot of the objections to the PEP. For people hosting off PyPI, the current arrangement (ignoring the UX issues) means that "pip install foo" works for them. We're now proposing to remove that benefit, and while it's not the *only* benefit of being registered on PyPI, maybe a reminder of what the other benefits are would put this into perspective.
Added to PEP 470 - https://hg.python.org/peps/rev/2794fe98567d ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 27.08.2015 03:24, Donald Stufft wrote:
While developing Warehouse, one of the things I wanted to get done was a final ruling on PEP 470. With that in mind I’d like to bring it back up for discussion and hopefully ultimately a ruling.
There are two major differences in this version of PEP 470, and I’d like to point them out explicitly.
Removal of the “External Repository Discovery” feature. I’ve been thinking about this for a while, and I finally removed it. I’ve always been uncomfortable with this feature and I finally realized why. Essentially, the major use case for not hosting things on PyPI that I think PyPI can reasonably be expected to accommodate is people who cannot publish their software to the US for various reasons. At the time I came up with the solution I did, it was an attempt to placate the folks who were against PEP 470 while assuming very few people would ever actually use it, essentially a junk feature to push the PEP through. I think that the feature itself is a bad feature and I think it presents a poor experience for people who want to use it, so I’ve removed it from the PEP and instead focused the PEP on explicitly recommending that all installers should implement the ability to specify multiple repositories and deprecating and removing the ability for finding anything but files hosted by the repository itself on /simple/.
This feature was part of a compromise to reach consensus on the removal of external hosting. While I don't think the details of the repository discovery need to be part of PEP 470, I do believe that the PEP should continue to support the idea of having a way for package managers to easily find external indexes for a particular package, and not outright reject it. Instead, the PEP should highlight this compromise and defer it to a separate PEP.

More comments:

* The user experience argument you give in PEP 470 for rejecting the idea is not really sound: the purpose of the discovery feature is to provide a *better user experience* than an error message saying that a package cannot be found and requiring the user to do research on the web to find the right URLs. Package managers can use the information about the other indexes they receive from PyPI to either present them to the user to use/install, or to even go there directly to find the packages.

* The section on data sovereignty should really be removed or reworded. PEPs should be neutral and not imply political views, in particular not make it look like the needs of US users of PyPI are more important than those of non-US users. Using "poor user experience" as an argument here is really not appropriate. PyPI is a central part of the Python community infrastructure and should be regarded as a resource for the world-wide community. There is no reason to assume that we cannot have several PyPI installations around the world to address any such issues.

* It is rather unusual to have a PEP switch from a compromise solution to a rejection of the compromise this late in the process.

I will soon be starting a PSF working group to address some of the reasons why people cannot upload packages to PyPI. The intent is to work on the PyPI terms to make them more package author friendly. Anyone interested to join?
I recognize this is a regression for anyone who *does* have concerns with uploading their projects to a server hosted in the US. If there is someone that has this concern, and is also willing to put in the effort and legwork required, I will happily collaborate with them to design a solution that both follows whatever legal requirements they might have, as well as provides a good experience for people using PyPI and pip. I have some rough ideas on what this could look like, but I think it’s really a separate discussion since I believe externally hosted files, as we had them, are an overall bad experience for people and are largely a historic accident of how PyPI and Python packaging have evolved. I don’t want to derail this thread or PEP exploring these ideas (some of which I don’t even know if they would satisfy the requirements since it’s all dealing with legal jurisdictions other than my own), but I wanted to make explicit that someone who knows the legalities and is willing to put in the work can reach out to me.
We can start a separate thread about discovery, using a separate PEP to formalize it. This could be as simple as having a flag saying "use the download URL as index and offer this to the user trying to find the package distribution files".
The other major difference is that I’ve shortened the time schedule from 6 months to 3 months. Given that authors are either going to upload their projects to PyPI or not and there is no longer a need to setup an external index I think a shorter time schedule is fine, especially since they will be given a script they can run that will spider their projects for any installable files and upload them to PyPI for them in a quick one shot deal that would require very little effort for them.
It would be good to have both PEP 470 and the discovery PEP available at the same time.
Everything else in the PEP is basically the same except for rewordings.
I do need a BDFL Delegate for this PEP, Richard does not have the time to do it and the other logical candidate for a PyPI centric PEP is myself, but I don’t feel it’s appropriate to BDFL Delegate my own PEP.
You can see the PEP online at https://www.python.org/dev/peps/pep-0470/ (make sure it’s updated and you see the one that has Aug 26 2015 in its Post History).
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 27 2015)
Python Projects, Coaching and Consulting ... http://www.egenix.com/ mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
2015-08-19: Released mxODBC 3.3.5 ... http://egenix.com/go82 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
On August 27, 2015 at 6:57:17 AM, M.-A. Lemburg (mal@egenix.com) wrote:
This feature was part of a compromise to reach consensus on the removal of external hosting. While I don't think the details of the repository discovery need to be part of PEP 470, I do believe that the PEP should continue to support the idea of having a way for package managers to easily find external indexes for a particular package and not outright reject it.
Instead, the PEP should highlight this compromise and defer it to a separate PEP.
I’ve never thought that particular API was actually a good idea. I think it’s a poor end user experience because it invokes the same sort of “well if you knew what I needed to type to make it work, why didn’t you just do it for me” reaction as PEP 438 does. The user experience will be something like:

    $ pip install something-not-hosted-on-pypi
    ...
    ERROR: Cannot find something-not-hosted-on-pypi, it is not hosted on PyPI; its author has indicated that it can be found at:

        * https://pypi.example.com/all/ : All Platforms
        * https://pypi.example.com/ubuntu-trust/ : Ubuntu Trusty

    To enable, please invoke pip adding --extra-index-url <a URL from above>

    $ pip install --extra-index-url https://pypi.example.com/all/ something-not-hosted-on-pypi

This leaves the user feeling annoyed that we didn’t just search those locations by default. I truly think it is a bad experience and I only ever added it because I wanted the discussion to be over with and I was trying to placate people by giving them a bad feature. Instead, I think that we can design a solution that works by default and will work without the end user needing to do anything at all.

However, I’m not an expert in the US laws (the country I live in and have lived in all my life) and I’m especially not an expert in the laws of countries other than the US. This means I don’t fully understand the issue that needs to be solved. In addition to that, I only have so many hours in the day and I need to find a way to prioritize what I’m working on. The data sovereignty issue may only affect people who do not live in the US, however it does not affect everyone who is outside of the US. Most projects from authors outside of the US are perfectly fine with hosting their projects within the US, and it is a minority of projects who cannot or will not for one reason or another.

I am happy to work with someone impacted by the removal of offsite hosting to design and implement a solution to these issues that provides an experience to those people that matches the experience for people willing or able to host in the US. If the PSF wants to hire someone to do this instead of relying on someone affected to volunteer, I’m also happy to work with them. However, I do not think it’s fair to everyone else, inside and outside of the US, to continue to confuse and infuriate them while we wait for someone to step forward. I’m one person and I’m the only person who gets paid dedicated time to work on Python’s packaging, but I’m spread thin and I have a backlog a mile long. If I don’t prioritize the things that affect most people over the things that affect a small number of people, and leave the edge case features up to the people who need them, then things that are already blocked on me are going to languish even further.

Finally, I wasn’t sure if this should be a new PEP or if it should continue as PEP 470; I talked to Nick and he suggested it would be fine to just continue on using PEP 470 for this.
More comments:
* The user experience argument you give in the PEP 470 for rejecting the idea is not really sound: the purpose of the discovery feature is to provide a *better user experience* than an error message saying that a package cannot be found and requiring the user to do research on the web to find the right URLs. Package managers can use the information about the other indexes they receive from PyPI to either present them to the user to use/install them or to even directly go there to find the packages.
It’s a slightly better user experience than a flat out error, yes, but it’s a much worse user experience than what people using PyPI deserve. If we’re going to solve a problem, I’d much rather do it correctly in a way that doesn’t frustrate end users and gives them a great experience than with something that is going to annoy them and be a “death of a thousand cuts” to people hosting off of PyPI.

I think that the compromise feature I added in PEP 470 will be the same sort of compromise feature we had in PEP 438: something that on the tin looks like a compromise, enough to get the folks who need/want it to think the PEP is supporting their use case, but in reality just another cut in a “death of a thousand cuts” to hosting outside of the US (or off of PyPI). I don’t want to continue to implement half solutions that I know are going to be painful to end users, with the idea in mind that the pain is going to drive people away from those solutions and towards the one good option we currently have. I’d rather be honest with everyone involved about what is truly supported and focus on making that a great experience for everyone.
* The section on data sovereignty should really be removed or reworded. PEPs should be neutral and not imply political views, in particular not make it look like the needs of US users of PyPI are more important than those of non-US users. Using "poor user experience" as an argument here is really not appropriate.
I’m perhaps not great at wording things. I don’t think it’s US users vs Non-US users, since plenty of people outside of the US are perfectly happy and able to upload their projects to PyPI. I think it’s more of “the needs of the many outweigh the needs of the few”, but with an explicit callout that if one of those few want to come forward and work with me, we can get something in place that really solves that problem in a user friendly way. Perhaps you could suggest a rewording that you think says the above? I don’t see a political view being implied nor do I see the needs of US users being prioritized over the needs of non-US users.
PyPI is a central part of the Python community infrastructure and should be regarded as a resource for the world-wide community. There is no reason to assume that we cannot have several PyPI installations around the world to address any such issues.
I don’t assume that we can’t do something like that; one of my ideas for solving this issue looks something like that, in fact. However, without someone who cares willing to step forward and bring to the table an expertise in what will or will not satisfy those legalities, and with a willingness to pitch in and contribute to help make such a solution a reality, I don’t feel comfortable spending any time on a solution that may not even actually solve the problem at hand.

I don’t think the solution that was in the PEP is that solution though. I think it was a poison pill that I fully expected to provide a terrible experience, which would just force people to host on PyPI or have their project suffer. Repositories hosted by random people end up making people’s installs extremely unreliable. We’ve made great strides in making it so that ``pip install <foo>`` rarely fails because something is down, and I think blessing a feature, or continuing to support one, that doesn’t aid in making that more the case is harmful to the community as a whole.

Honestly, if someone comes forward now-ish, or the PSF tells me now-ish they will hire someone to figure out the intricacies of this matter, I’ll even put this PEP back on hold while we design this feature. I have no particular animosity towards hosting outside of the US, I just don’t have the expertise or time to design and implement it on my own.
* It is rather unusual to have a PEP switch from a compromise solution to a rejection of the compromise this late in the process.
I will soon be starting a PSF working group to address some of the reasons why people cannot upload packages to PyPI. The intent is to work on the PyPI terms to make them more package author friendly. Anyone interested to join ?
I don’t have any particular insight into the ToS nor do I really care what it says as long as it grants PyPI everything it needs to function. I should probably be a part of the WG though since it involves PyPI and I’m really the only person working on PyPI. I wouldn’t want to adopt a ToS for PyPI without Van’s approval though. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 27 August 2015 at 12:51, Donald Stufft <donald@stufft.io> wrote:
Perhaps you could suggest a rewording that you think says the above? I don’t see a political view being implied nor do I see the needs of US users being prioritized over the needs of non-US users.
Just for perspective, I didn't read that section as prioritising US users or authors in any way. It does reflect the reality that PyPI is hosted in the US, but I think it's both fair and necessary to point that out and explain the implications. It may be that the comment "If someone for whom the issue of data sovereignty matters to them wishes to put forth the effort..." reflects a little too much your frustration with not being able to get anywhere with a solution to this issue. Maybe rephrase that as something along the lines of "The data sovereignty issue will need to be addressed by someone with an understanding of the restrictions and constraints involved. As the author of this PEP does not have that expertise, it should be addressed in a separate PEP"? Paul
On August 27, 2015 at 8:19:25 AM, Paul Moore (p.f.moore@gmail.com) wrote:
Just for perspective, I didn't read that section as prioritising US users or authors in any way. It does reflect the reality that PyPI is hosted in the US, but I think it's both fair and necessary to point that out and explain the implications.
Right, and the reality is also that the only people who are currently working on it consistently (ever since Richard stepped back to pursue things more interesting to him) are also based in the US. Basically just me, with occasional contributions from other people, but primarily me. I don’t have metrics on legacy PyPI because it’s in Mercurial, but you can explore Stackalytics [1][2] for Warehouse.
It may be that the comment "If someone for whom the issue of data sovereignty matters to them wishes to put forth the effort..." reflects a little too much your frustration with not being able to get anywhere with a solution to this issue. Maybe rephrase that as something along the lines of "The data sovereignty issue will need to be addressed by someone with an understanding of the restrictions and constraints involved. As the author of this PEP does not have that expertise, it should be addressed in a separate PEP”?
Thanks, I’ve switched to your suggested wording [3]. [1] http://stackalytics.com/?release=all&project_type=pypa-group&metric=loc&module=warehouse [2] http://stackalytics.com/?release=all&project_type=pypa-group&metric=commits&module=warehouse [3] https://hg.python.org/peps/rev/6c8a8f29a798 ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 08/27/2015 07:51 AM, Donald Stufft wrote:
This leaves the user feeling annoyed that we didn’t just search those locations by default. I truly think it is a bad experience and I only ever added it because I wanted the discussion to be over with and I was trying to placate people by giving them a bad feature
I don't understand the sensibility here: an error message which tells me "not hosted on PyPI, try 'pip...' instead" seems like a *good* UX to me. Having a tool which respects its default policy ("trust only PyPI") while giving me the information I need to off-road when needed is a good balance.

Tres.

--
===================================================================
Tres Seaver          +1 540-429-0999          tseaver@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
On August 27, 2015 at 10:33:15 AM, Tres Seaver (tseaver@palladion.com) wrote:
On 08/27/2015 07:51 AM, Donald Stufft wrote:
This leaves the user feeling annoyed that we didn’t just search those locations by default. I truly think it is a bad experience and I only ever added it because I wanted the discussion to be over with and I was trying to placate people by giving them a bad feature
I don't understand the sensibility here: an error message which tells me "not hosted on PyPI, try 'pip...' instead" seems like a *good* UX to me. Having a tool which respects its default policy ("trust only PyPI") while giving me the information I need to off-road when needed is a good balance.
Given my experience dealing with pip’s users and the fallout of PEP 438, the very next question we’d get after implementing that UI will either be “If pip knows what I need to do, why can’t it just do it for me instead of making me type it out again” OR “Give me a flag so I can just automatically accept every externally hosted index”. Both of these asks are completely logical from an end user who doesn’t understand why the situation is the way it is, but they are also essentially “let me be insecure all the time implicitly” flags.

On the other hand, if we just remove it then we can explain that we used to support an insecure method of finding links, but that we no longer support it. The difference here is that there is no bread crumb of “here’s some information that pip obviously knows, because it’s telling it to you” to lead people to ask for something to opt into a “global insecure” flag. We have a clear answer that doesn’t leave room for argument: “We no longer get that information from PyPI so we cannot use it”.

I think it’s a bad API because I think it’s going to cause frustration with people, particularly that pip is making them do extra leg work to approve or type in a repository URL. In addition, the discovery mechanism will only be in new versions of pip, however only about a third of our users upgrade quickly after a release (see: https://caremad.io/2015/04/a-year-of-pypi-downloads/), so the error case is going to be happening with the vast bulk of users anyways (the “Unknown” in that graph is older than 1.4).

I also think it’s a bad experience because you’re mandating that users lower the “uptime” of any particular installation set that includes an external repository unless every repository has 100% uptime. This is because you’re adding new single points of failure into the system. PyPI has had 99.94% uptime over the last year, which corresponds to 5 1/2 hours of downtime. Let’s assume that’s a rough average for what we can expect. If someone adds a single additional repository then the uptime of the system as a whole becomes 99.88% (or X hours of downtime), a third repository brings it to 99.82% (X hours), a fourth brings it to 99.76% (X hours). I think this is a conservative estimate of what the effects of the downtime would be.

On the other hand, here’s what I consider a good experience, which is possible if my assumptions about what is acceptable for data sovereignty are correct:

Project “Foo” doesn’t want to host their projects in the US for $REASONS, so they go to https://pypi.python.org/ and register their project; when registering they select to have their uploads hosted in the EU. Anytime they upload their files to https://pypi.python.org, instead of storing them in a bucket in us-west-2, PyPI checks their preferences, sees they have selected the EU, and stores their files in eu-west-1 (Ireland) instead.

User “Jill” wants to install Project “Foo” and she is using pip 1.5.6 from her Debian operating system. When she types in ``pip install Foo``, pip goes to https://pypi.python.org/simple/foo/ and gets a list of files which have been hosted in the EU. Without any updates or changes required on her end, pip downloads these files and installs them.

Here’s the thing though, which I’ve been saying: I don’t know the laws and I don’t think it’s reasonable to expect me to learn the laws for all these other countries. There are open questions on how to actually implement this. For example, what exactly are we trying to achieve?
If we're trying to protect against the US government compelling the hosting company to do something, then you're pretty much boned, because even if the files were hosted in the EU you still have the fact that it'd be a service controlled by a US non-profit, run by volunteers that live in the US, developed by someone who lives in the US who is employed by someone who lives in the US.

If we're trying to comply with some sort of data locality laws like https://en.wikipedia.org/wiki/Data_Protection_Directive, does OSS even count as "personal data"? If it does, then does uploading it to https://pypi.python.org/, which is located in the US, but storing and hosting it from the EU satisfy the requirements? What about putting it behind Fastly (another US company)? When a US user requests those files, can it route and cache them in a US datacenter? Is it OK to have it linked from https://pypi.python.org/ (again, hosted in the US) or do we need a whole separate repository to handle these files?

I think we can make this a great experience, but it is its own discussion and it needs to include stakeholders who actually know what the requirements are. I need someone who can put forth some effort into making it a reality instead of expecting me to do it all. If nobody wants to put in any effort to make it happen, maybe it's not actually that important to them?
On 27.08.2015 13:51, Donald Stufft wrote:
On August 27, 2015 at 6:57:17 AM, M.-A. Lemburg (mal@egenix.com) wrote:
This feature was part of a compromise to reach consensus on the removal of external hosting. While I don't think the details of the repository discovery need to be part of PEP 470, I do believe that the PEP should continue to support the idea of having a way for package managers to easily find external indexes for a particular package and not outright reject it.
Instead, the PEP should highlight this compromise and defer it to a separate PEP.
I’ve never thought that particular API was actually a good idea, I think it’s a poor end user experience because it invokes the same sort of “well if you knew what I needed to type to make it work, why didn’t you just do it for me” reaction as PEP 438 does. The user experience will be something like:
$ pip install something-not-hosted-on-pypi
...
ERROR: Cannot find something-not-hosted-on-pypi; it is not hosted on PyPI, and its author has indicated that it can be found at:
 * https://pypi.example.com/all/ : All Platforms
 * https://pypi.example.com/ubuntu-trust/ : Ubuntu Trusty
To enable, please invoke pip by adding --extra-index-url <a URL from above>
Uhm, no :-) This would be a more user friendly way of doing it:

$ pip install something-not-hosted-on-pypi
...
I'm sorry, but we cannot find something-not-hosted-on-pypi on the available configured trusted indexes:

 * https://pypi.python.org/

However, the author has indicated that it can be found at:

 * https://pypi.example.com/

Should we add this PyPI index to the list of trusted indexes ? (y/n)
y
Thank you. We added https://pypi.example.com/ to the list of indexes. You are currently using these indexes as trusted indexes:

 * https://pypi.python.org/
 * https://pypi.example.com/

We will now retry the installation.
...
something-not-hosted-on-pypi installed successfully.
$
$ pip install --extra-index-url https://pypi.example.com/all/ something-not-hosted-on-pypi
This leaves the user feeling annoyed that we didn’t just search those locations by default. I truly think it is a bad experience and I only ever added it because I wanted the discussion to be over with and I was trying to placate people by giving them a bad feature.
There's nothing bad in the above UI. A user will go through the discovery process once and the next time around, everything will just work.
Instead, I think that we can design a solution that works by default and will work without the end user needing to do anything at all. However, I'm not an expert in the laws of the US (the country I live in and have lived in all my life) and I'm especially not an expert in the laws of countries other than the US. This means I don't fully understand the issue that needs to be solved. In addition to that, I only have so many hours in the day and I need to find a way to prioritize what I'm working on. The data sovereignty issue may only affect people who do not live in the US; however, it does not affect everyone who is outside of the US. Most projects from authors outside of the US are perfectly fine with hosting their projects within the US, and it is a minority of projects who cannot or will not for one reason or another.
All Linux distros I know and use have repositories distributed all over the planet, and many also provide official and less official ones, for the users to choose from, so there is more than enough evidence that a federated system for software distribution works better than a centralized one. I wonder why we can't agree on this?
I am happy to work with someone impacted by the removal of offsite hosting to design and implement a solution to these issues that provides an experience to those people that matches the experience for people willing or able to host in the US. If the PSF wants to hire someone to do this instead of relying on someone affected to volunteer, I'm also happy to work with them. However, I do not think it's fair to everyone else, inside and outside of the US, to continue to confuse and infuriate them while we wait for someone to step forward. I'm one person and I'm the only person who gets paid dedicated time to work on Python's packaging, but I'm spread thin and I have a backlog a mile long. If I don't prioritize the things that affect most people over the things that affect a small number of people, and leave the edge-case features to the people who need them, then the things that are already blocked on me are going to languish even further.
I'm happy to help write a PEP for the discovery feature and I'd also love to help with the implementation. My problem is that no one is paying me to work on this and so my time investment into this has to stay way behind of what I'd like to invest.
Finally, I wasn’t sure if this should be a new PEP or if it should continue as PEP 470, I talked to Nick and he suggested it would be fine to just continue on using PEP 470 for this.
If you don't feel comfortable with the discovery feature, I think it's better to split it off into a separate PEP.
More comments:
* The user experience argument you give in PEP 470 for rejecting the idea is not really sound: the purpose of the discovery feature is to provide a *better user experience* than an error message saying that a package cannot be found and requiring the user to do research on the web to find the right URLs. Package managers can use the information about the other indexes they receive from PyPI either to present those indexes to the user for use/installation, or even to go there directly to find the packages.
It's a slightly better user experience than a flat out error, yes, but it's a much worse user experience than what people using PyPI deserve. If we're going to solve a problem, I'd much rather do it correctly in a way that doesn't frustrate end users and gives them a great experience, rather than something that is going to annoy them and be a "death of a thousand cuts" to people hosting off of PyPI. I think that the compromise feature I added in PEP 470 will be the same sort of compromise feature we had in PEP 438: something that on the tin looks like a compromise, enough to get the folks who need/want it to think the PEP is supporting their use case, but in reality is just another cut in a "death of a thousand cuts" to hosting outside of the US (or off of PyPI).
I think you are forgetting that the worst user experience is one where users are left without installable Python packages :-) No matter how much you try to get people to host everything on pypi.python.org, there are always going to be some who don't want to do this and would rather stick with their own PyPI index server for whatever reason.
I don’t want to continue to implement half solutions that I know are going to be painful to end users with the idea in mind that I know the pain is going to drive people away from those solutions and towards the one good option we have currently. I’d rather be honest to everyone involved about what is truly supported and focus on making that a great experience for everyone.
* The section on data sovereignty should really be removed or reworded. PEPs should be neutral and not imply political views, in particular not make it look like the needs of US users of PyPI are more important than those of non-US users. Using "poor user experience" as argument here is really not appropriate.
I’m perhaps not great at wording things. I don’t think it’s US users vs Non-US users, since plenty of people outside of the US are perfectly happy and able to upload their projects to PyPI. I think it’s more of “the needs of the many outweigh the needs of the few”, but with an explicit callout that if one of those few want to come forward and work with me, we can get something in place that really solves that problem in a user friendly way.
Perhaps you could suggest a rewording that you think says the above? I don’t see a political view being implied nor do I see the needs of US users being prioritized over the needs of non-US users.
I'd just remove the whole section. Splitting the user base into US and non-US users, even if just to explain that you cannot cover all non-US views or requirements is not something we should put into an official Python document.
PyPI is a central part of the Python community infrastructure and should be regarded as a resource for the world-wide community. There is no reason to assume that we cannot have several PyPI installations around the world to address any such issues.
I don't assume that we can't do something like that; one of my ideas for solving this issue looks something like that, in fact. However, without someone who cares being willing to step forward, bring to the table expertise in what will or will not satisfy those legalities, and pitch in and contribute to help make such a solution a reality, I don't feel comfortable spending any time on a solution that may not even actually solve the problem at hand.
I don’t think the solution that was in the PEP is that solution though. I think it was a poison pill that I fully expected to have a terrible experience which would just force people to host on PyPI or have their project suffer. Repositories hosted by random people end up making people’s installs extremely unreliable. We’ve made great strides in making it so that ``pip install <foo>`` rarely fails because something is down, and I think blessing a feature or continuing to support one that doesn’t aid in making that more the case is harmful to the community as a whole.
Honestly, if someone comes forward now-ish or the PSF tells me now-ish they will hire someone to figure out the intricacies of this matter, I'll even put this PEP back on hold while we design this feature. I have no particular animosity towards hosting outside of the US, I just don't have the expertise or time to design and implement it on my own.
Fully understood; alas, I don't have more cycles to spare to help with designing a full blown distributed PyPI system in my free time. As for the PSF: the situation is somewhat similar. The money is available for such projects, but we don't have enough people to provide management and oversight :-( While we're slowly working on changing this, it won't happen overnight.
* It is rather unusual to have a PEP switch from a compromise solution to a rejection of the compromise this late in the process.
I will soon be starting a PSF working group to address some of the reasons why people cannot upload packages to PyPI. The intent is to work on the PyPI terms to make them more package author friendly. Anyone interested in joining?
I don’t have any particular insight into the ToS nor do I really care what it says as long as it grants PyPI everything it needs to function. I should probably be a part of the WG though since it involves PyPI and I’m really the only person working on PyPI. I wouldn’t want to adopt a ToS for PyPI without Van’s approval though.
Since PyPI is legally run by the PSF, the PSF board will have to approve the new terms. Having you on board for the WG would certainly be very useful, since there may well be technical details that come into play. Thanks,
On 28 Aug 2015 9:00 am, "M.-A. Lemburg" <mal@egenix.com> wrote:
All Linux distros I know and use have repositories distributed all over the planet, and many also provide official and less official ones, for the users to choose from, so there is more than enough evidence that a federated system for software distribution works better than a centralized one.
None of them provide cross-repository discovery except Conary, ttbomk. And its discovery is inherited, so it's a different UX. So that's a difference. Rob
On 28 Aug 2015 07:31, "Robert Collins" <robertc@robertcollins.net> wrote:
On 28 Aug 2015 9:00 am, "M.-A. Lemburg" <mal@egenix.com> wrote:
All Linux distros I know and use have repositories distributed all over the planet, and many also provide official and less official ones, for the users to choose from, so there is more than enough evidence that a federated system for software distribution works better than a centralized one.
None of them provide cross-repository discovery except Conary, ttbomk. And its discovery is inherited, so it's a different UX.
So that's a difference.
Right, the distro model is essentially the one Donald is proposing - centrally controlled default repos, ability to enable additional repos on client systems. Geographically distributed mirrors are different, as those are just redistributing signed content from the main repos.

Hosting in multiple regions and/or avoiding selected regions would definitely be a nice service to offer, and it would be good to have a straightforward way to deploy and run an external repo (e.g. a devpi Docker image), but the proposed core model is itself a tried and tested one. Reducing back to that, and restarting the exploration of multi-index support from there with a clear statement of objectives, would be a good way to go.

If we need to manually whitelist some external repos for transition management purposes, then that's likely a better option than settling for a nominally general purpose feature we'd prefer people didn't actually use.

Regards, Nick.
On 29.08.2015 15:57, Nick Coghlan wrote:
On 28 Aug 2015 07:31, "Robert Collins" <robertc@robertcollins.net> wrote:
On 28 Aug 2015 9:00 am, "M.-A. Lemburg" <mal@egenix.com> wrote:
All Linux distros I know and use have repositories distributed all over the planet, and many also provide official and less official ones, for the users to choose from, so there is more than enough evidence that a federated system for software distribution works better than a centralized one.
None of them provide cross-repository discovery except Conary, ttbomk. And its discovery is inherited, so it's a different UX.
So that's a difference.
Right, the distro model is essentially the one Donald is proposing - centrally controlled default repos, ability to enable additional repos on client systems. Geographically distributed mirrors are different, as those are just redistributing signed content from the main repos.
Hosting in multiple regions and/or avoiding selected regions would definitely be a nice service to offer, and it would be good to have a straightforward way to deploy and run an external repo (e.g. a devpi Docker image), but the proposed core model is itself a tried and tested one. Reducing back to that, and restarting the exploration of multi-index support from there with a clear statement of objectives would be a good way to go.
If we need to manually whitelist some external repos for transition management purposes, then that's likely a better option than settling for a nominally general purpose feature we'd prefer people didn't actually use.
There are quite a few systems out there that let you search for repos with the packages you need, but they are usually web based and not integrated into the package managers, e.g. rpmfind, various PPA search tools (e.g. Launchpad or open build service), etc.

There's also another difference: Linux repos are usually managed by a single entity owning the packages, very much unlike PyPI which is merely a hosting platform and index to point to packages owned by the authors.

So it's natural for PyPI to let package manager users know about where to find packages which are not hosted on PyPI, and the user experience (which people always bring up as the number one argument for all sorts of things on this list ;-)) is much better when providing this information to the user directly, rather than saying "I couldn't find any distribution files for you - go look on PyPI for instructions where to find them...".
On 31 Aug 2015, at 10:44, M.-A. Lemburg <mal@egenix.com> wrote:
There's also another difference: Linux repos are usually managed by a single entity owning the packages, very much unlike PyPI which is merely a hosting platform and index to point to packages owned by the authors.
That is probably true for public repositories. However, there are also a huge number of organisations who have internal repositories for deb/rpm packages, and many of those contain third party packages. I have a couple, and most of them contain a combination of our own packages as well as a collection of backports and custom packages for software that hasn't been packaged by anyone else. Wichert.
On 31.08.2015 11:05, Wichert Akkerman wrote:
On 31 Aug 2015, at 10:44, M.-A. Lemburg <mal@egenix.com> wrote:
There's also another difference: Linux repos are usually managed by a single entity owning the packages, very much unlike PyPI which is merely a hosting platform and index to point to packages owned by the authors.
That is probably true for public repositories. However, there are also a huge number of organisations who have internal repositories for deb/rpm packages, and many of those contain third party packages. I have a couple, and most of them contain a combination of our own packages as well as collection of backports and custom packages for software that hasn’t been packaged by anyone else.
True, but for those, I think explicitly adding the index URL to the package installer search path is the better approach. Or perhaps I misunderstood and you meant something like: "If the package is not in my internal repo, I don't want pip to look it up on PyPI or anywhere else." That's a valid use case, but it seems orthogonal to the question of making public repositories for specific packages more easily configurable for package manager users.
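As an aside on the "explicitly adding the index URL" route: pip can already be told about extra indexes persistently rather than on every command line. A minimal sketch, assuming a user-level pip.conf and reusing the placeholder example URL from earlier in the thread (not a real repository):

    ; ~/.pip/pip.conf (or a per-virtualenv pip.conf); pypi.example.com is a placeholder
    [global]
    extra-index-url = https://pypi.example.com/simple/

The same option can also live at the top of a requirements file:

    --extra-index-url https://pypi.example.com/simple/
    something-not-hosted-on-pypi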
On 31 Aug 2015, at 11:35, M.-A. Lemburg <mal@egenix.com> wrote:
On 31.08.2015 11:05, Wichert Akkerman wrote:
On 31 Aug 2015, at 10:44, M.-A. Lemburg <mal@egenix.com> wrote:
There's also another difference: Linux repos are usually managed by a single entity owning the packages, very much unlike PyPI which is merely a hosting platform and index to point to packages owned by the authors.
That is probably true for public repositories. However, there are also a huge number of organisations who have internal repositories for deb/rpm packages, and many of those contain third party packages. I have a couple, and most of them contain a combination of our own packages as well as collection of backports and custom packages for software that hasn’t been packaged by anyone else.
True, but for those, I think explicitly adding the index URL to the package installer search path is the better approach.
Sure.
Or perhaps I misunderstood and you meant something like:
"If the package is not in my internal repo, I don't want pip to look it up on PyPI or anywhere else.”
I just wanted to add a bit of context, since your statement seemed to reflect a slightly different reality than mine. You do bring up a good point though. I really like the apt-preferences approach. That allows you to define some rules to set which repository should be used. You can do things like always prefer a specific repository, or do that only for specific packages, with a default rule to use whichever repository has the latest version. Very, very useful. Wichert.
On 31 August 2015 at 10:43, Wichert Akkerman <wichert@wiggy.net> wrote:
I just wanted to add a bit of context, since your statement seemed to reflect a slightly different reality than mine. You do bring up a good point though. I really like the apt-preferences approach. That allows you to define some rules to set which repository should be used. You can do things like always prefer a specific repository, or do that only for specific packages, with a default rule to use whichever repository has the latest version. Very, very useful.
There have been a few posts now about how the new system is or is not like Linux package management systems. As a Windows user, my view of Linux package management is pretty limited. To me it seems like the basic approach is: if a package is in the official repo, you can just do apt-get install (or yum install) and it works. If the package is elsewhere, you need to find out where (usually manually, as far as I can see) and then do a bit of config, and then the package is available just like standard ones. That's pretty much the same as the proposed solution for PyPI/pip.

If there's any additional functionality that Linux systems provide, could someone summarise it from an end user POV for me? (And maybe also point out why I'd never noticed it as a naive Linux user!) To me, that's a key to whether this PEP is missing something important relative to those systems.

Paul
On 31 Aug 2015, at 12:36, Paul Moore <p.f.moore@gmail.com> wrote:
On 31 August 2015 at 10:43, Wichert Akkerman <wichert@wiggy.net> wrote:
I just wanted to add a bit of context, since your statement seemed to reflect a slightly different reality than mine. You do bring up a good point though. I really like the apt-preferences approach. That allows you to define some rules to set which repository should be used. You can do things like always prefer a specific repository, or do that only for specific packages, with a default rule to use whichever repository has the latest version. Very, very useful.
There's been a few posts now about how the new system is or is not like Linux package management systems. As a Windows user, my view of Linux package management is pretty limited. To me it seems like the basic approach is, if a package is in the official repo, you can just do apt-get install (or yum install) and it works. If the package is elsewhere, you need to find out where (usually manually, as far as I can see) and then do a bit of config, and then the package is available just like standard ones. That's pretty much the same as the proposed solution for PyPI/pip.
If there's any additional functionality that Linux systems provide, could someone summarise it from an end user POV for me? (And maybe also point out why I'd never noticed it as a naive Linux user!) To me, that's a key to whether this PEP is missing something important relative to those systems.
Sure. My knowledge of rpm is 20 years out of date, so I am going to focus on the deb/dpkg/apt world only.

The whole packaging system is built around archives. The packaging tools themselves do not have anything hardcoded there; they pick up the archives from a configuration file (/etc/apt/sources.list). That file lists all archives. For example:

# Hetzner APT-Mirror
deb http://mirror.hetzner.de/ubuntu/packages trusty main restricted universe multiverse
deb-src http://mirror.hetzner.de/ubuntu/packages trusty main restricted universe multiverse

deb http://de.archive.ubuntu.com/ubuntu/ trusty multiverse

deb http://security.ubuntu.com/ubuntu trusty-security main restricted
deb-src http://security.ubuntu.com/ubuntu trusty-security main restricted

There are two things of note here: you can have multiple archive types ("deb" and "deb-src" in this case), and different URL types. Besides the standard http there is also a cdrom scheme which can mount cdroms, there is a tor scheme now, etc. These are pluggable and handled by binaries in /usr/lib/apt/methods/. When you do an install of a new computer the installer will add some default repositories there, generally a local mirror and the security archive. https://wiki.debian.org/SourcesList has some more detailed information (which is probably not relevant here).

There are some convenient tools available to register extra archives. For example add-apt-repository, which allows you to do this:

# add-apt-repository ppa:saltstack/salt
# add-apt-repository "deb http://nl.archive.ubuntu.com/ubuntu trusty universe"

This will try to download and configure the correct GPG key to check archive signatures as well.

Each archive has an index which lists all its packages with metadata. You download these to your system (using "apt-get update" or a GUI), and the local copy is used by all other tools. That results in fast searches, and solvers having fast access to the complete database of available packages.

When installing a package you normally get the latest version, independent of which archive has that version. You can ask the system which versions are available:

# apt-cache policy salt-minion
salt-minion:
  Installed: 2015.5.3+ds-1trusty1
  Candidate: 2015.5.3+ds-1trusty1
  Version table:
 *** 2015.5.3+ds-1trusty1 0
        500 http://ppa.launchpad.net/saltstack/salt/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
     0.17.5+ds-1 0
        500 http://mirror.hetzner.de/ubuntu/packages/ trusty/universe amd64 Packages
        500 http://de.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages

In this case there are two versions available: 2015.5.3+ds-1trusty1, which is currently installed and came from a ppa, and 0.17.5+ds-1, which is available from two mirrors.

This can result in somewhat interesting behaviour when multiple archives have the same packages and are updated independently. To handle that you can define preferences to tell the system how you want to handle it. For example, if you want salt to always be installed from the ppa you can define this preference:

Package: salt*
Pin: origin "LP-PPA-saltstack-salt"
Pin-Priority: 900

This makes sure packages from the salt ppa have a higher priority, so they are always preferred. You can also use this to make an archive of backports available on your system, but only track a few specific packages from there.
There are more options available there; see https://wiki.debian.org/AptPreferences and https://www.debian.org/doc/manuals/debian-reference/ch02.en.html#_tweaking_c... for more information. Wichert.
Thank you for this description of APT!

* debtorrent adds a bittorrent APT transport (to download packages from *more than one index*)

I, myself, have always wished that APT had a configuration file schema (JSON, YAML) and was written in Python. Dnf 1.1.0 just fixed package download caching... More packaging notes: https://westurner.org/tools/#packages

I see that you would remove the rel= links, because of external traversal and allow_hosts. -- A somewhat relevant suggestion would be to add schema.org RDFa to the pages (so search engines can offload the search part) (and include the install_requires edges in the ./JSON-LD, as well).
On 31 August 2015 at 20:59, Wichert Akkerman <wichert@wiggy.net> wrote:
Sure. My knowledge of rpm is 20 years out of date, so I am going to focus the deb/dpkg/apt world only.
For the purposes of this discussion, we can consider the two ecosystems essentially equivalent. There are differences around what's builtin and what's a plugin, but the general principles are the same:

* default repos are defined by each distro, not by the tool developers
* source repos and binary repos are separate from each other
* users can add additional repos of both kinds via config files
* config files can also set relative priorities between repos

That last one is the main defence against malicious repos - if you set your distro repos to a high priority, then third party repos can't override distro packages, but if you set a particular third party repo to a high priority then it can override *anything*, not just the packages you're expecting it to replace.

The key *differences* that are relevant to the current discussion are:

* PyPA are the maintainers of both the default repo *and* the installation tools
* we don't currently have any kind of repo priority mechanism

This is why I think it's important to be clear that we *want* to improve the off-PyPI hosting experience, as that's something we consider reasonable for people to want to do, but it's going to take time and development effort. The question is thus whether it makes sense to delay significant improvements for the common case (i.e. hosting on PyPI), while we work on handling the minority case, and I don't believe it does.

It *may* be worth hacking in special case handling for the packages already hosted externally, but we can do that as exactly that (i.e. a special case providing a legacy bridging mechanism until a better solution is available), rather than as an indefinitely supported feature.

Regards, Nick.
On 31 August 2015 at 14:32, Nick Coghlan <ncoghlan@gmail.com> wrote:
This is why I think it's important to be clear that we *want* to improve the off-PyPI hosting experience, as that's something we consider reasonable for people to want to do, but it's going to take time and development effort.
+1 It's important also that we get input from people who need off-PyPI hosting, as they are the people with the knowledge of what's required.
The question is thus whether it makes sense to delay significant improvements for the common case (i.e. hosting on PyPI), while we work on handling the minority case, and I don't believe it does.
Agreed again. IMO, our initial responsibility is to get a solid, stable baseline functionality. The principle behind the PEP is that getting packages from anywhere other than PyPI must be an opt-in process by the user. That's a basic consequence of the idea that users should know who provides the services they use, combined with the fact that PyPI is the official Python repository. We've provided an option based on those principles for people who host off-PyPI, and there's no reason we couldn't improve that solution, but the principle should remain - users choose which package providers to use and trust.
It *may* be worth hacking in special case handling for the packages already hosted externally, but we can do that as exactly that (i.e. a special case providing a legacy bridging mechanism until a better solution is available), rather than as an indefinitely supported feature.
Hmm, I'm not aware of any concrete suggestions along those lines. According to the PEP, 3 months after it is implemented all projects on PyPI will be in pypi-only mode, and all of the other legacy modes will be unavailable, and unused. No external links will be visible, no link-scraping will occur, and the relevant code could be removed from installer tools such as pip. Are you suggesting that this shouldn't occur? Paul
On 1 September 2015 at 00:17, Paul Moore <p.f.moore@gmail.com> wrote:
On 31 August 2015 at 14:32, Nick Coghlan <ncoghlan@gmail.com> wrote:
It *may* be worth hacking in special case handling for the packages already hosted externally, but we can do that as exactly that (i.e. a special case providing a legacy bridging mechanism until a better solution is available), rather than as an indefinitely supported feature.
Hmm, I'm not aware of any concrete suggestions along those lines. According to the PEP, 3 months after it is implemented all projects on PyPI will be in pypi-only mode, and all of the other legacy modes will be unavailable, and unused. No external links will be visible, no link-scraping will occur, and the relevant code could be removed from installer tools such as pip.
Are you suggesting that this shouldn't occur?
I'm saying we can look at the numbers at the end of the grace period and decide what to do then :) Cheers, Nick.
On August 31, 2015 at 10:26:04 AM, Paul Moore (p.f.moore@gmail.com) wrote:
On 31 August 2015 at 15:21, Nick Coghlan wrote:
Are you suggesting that this shouldn't occur?
I'm saying we can look at the numbers at the end of the grace period and decide what to do then :)
Sounds reasonable to me :-)
FWIW, 10 months ago PIL was an order of magnitude the biggest user (based on hits to /simple/<X>/) of this feature, both by sheer number of requests AND by unique IP addresses. I removed these numbers from the updated PEP because I felt the PEP was getting heavy on the "weasel" factor by trying to be "well if you look at it X way, the impact is A, if you look at it Y way the impact is B". I've included the numbers from 10 months ago below, but if we think these numbers are useful I can redo them now again.

Top Externally Hosted Projects by Requests
------------------------------------------

This is determined by looking at the number of requests the ``/simple/<project>/`` page had gotten in a single day. The total number of requests during that day was 10,623,831.

============================== ========
Project                        Requests
============================== ========
PIL                            63869
Pygame                         2681
mysql-connector-python         1562
pyodbc                         724
elementtree                    635
salesforce-python-toolkit      316
wxPython                       295
PyXML                          251
RBTools                        235
python-graph-core              123
cElementTree                   121
============================== ========

Top Externally Hosted Projects by Unique IPs
--------------------------------------------

This is determined by looking at the IP addresses of requests the ``/simple/<project>/`` page had gotten in a single day. The total number of unique IP addresses during that day was 124,604.

============================== ==========
Project                        Unique IPs
============================== ==========
PIL                            4553
mysql-connector-python         462
Pygame                         202
pyodbc                         181
elementtree                    166
wxPython                       126
RBTools                        114
PyXML                          87
salesforce-python-toolkit      76
pyDes                          76
============================== ==========
On August 31, 2015 at 10:31:35 AM, Donald Stufft (donald@stufft.io) wrote:
FWIW, 10 months ago PIL was an order of magnitude the biggest user (based on hits to /simple/<X>/) of this feature, both by sheer number of requests AND by unique IP addresses.
Oh, and also you can see that, at least 10 months ago, the actual use of this feature drops sharply the further away from PIL you get.
On 31 Aug 2015, at 16:31, Donald Stufft <donald@stufft.io> wrote:
Top Externally Hosted Projects by Requests ------------------------------------------
This is determined by looking at the number of requests the ``/simple/<project>/`` page had gotten in a single day. The total number of requests during that day was 10,623,831.
============================== ========
Project                        Requests
============================== ========
PIL                            63869
Pygame                         2681
mysql-connector-python         1562
pyodbc                         724
elementtree                    635
salesforce-python-toolkit      316
wxPython                       295
PyXML                          251
RBTools                        235
python-graph-core              123
cElementTree                   121
============================== ========
Looking very briefly at that list, all of those are obsolete/replaced and have not seen a release in years, or they are now hosted on PyPI. Wichert.
On Mon, Aug 31, 2015 at 9:31 AM, Donald Stufft <donald@stufft.io> wrote:
On August 31, 2015 at 10:26:04 AM, Paul Moore (p.f.moore@gmail.com) wrote:
On 31 August 2015 at 15:21, Nick Coghlan wrote:
Are you suggesting that this shouldn't occur?
I'm saying we can look at the numbers at the end of the grace period and decide what to do then :)
Sounds reasonable to me :-)
FWIW, 10 months ago PIL was an order of magnitude the biggest user (based on hits to /simple/<X>/ of this feature both by sheer number of requests AND by unique IP addresses. I removed these numbers from the updated PEP because I felt the PEP was getting heavy on the “weasel” factor by trying to be “well if you look at it X way, the impact is A, if you look at it Y way the impact is B”. I’ve included the numbers from 10 months ago below, but if we think this numbers are useful I can redo them now again.
Status Quo:

* pypi, warehouse httpd logs?

Opportunities:

* JOIN "/simple/<pkg>/....ext" w/ package metadata (for faceted queries)
* offload to bigquery
* overload warehouse
* PIL -> pillow ?
Top Externally Hosted Projects by Requests ------------------------------------------
This is determined by looking at the number of requests the ``/simple/<project>/`` page had gotten in a single day. The total number of requests during that day was 10,623,831.
============================== ========
Project                        Requests
============================== ========
PIL                            63869
Pygame                         2681
mysql-connector-python         1562
pyodbc                         724
elementtree                    635
salesforce-python-toolkit      316
wxPython                       295
PyXML                          251
RBTools                        235
python-graph-core              123
cElementTree                   121
============================== ========
Top Externally Hosted Projects by Unique IPs --------------------------------------------
This is determined by looking at the IP addresses of requests the ``/simple/<project>/`` page had gotten in a single day. The total number of unique IP addresses during that day was 124,604.
============================== ==========
Project                        Unique IPs
============================== ==========
PIL                            4553
mysql-connector-python         462
Pygame                         202
pyodbc                         181
elementtree                    166
wxPython                       126
RBTools                        114
PyXML                          87
salesforce-python-toolkit      76
pyDes                          76
============================== ==========
On Aug 31, 2015 7:31 AM, "Donald Stufft" <donald@stufft.io> wrote:
On August 31, 2015 at 10:26:04 AM, Paul Moore (p.f.moore@gmail.com) wrote:
On 31 August 2015 at 15:21, Nick Coghlan wrote:
Are you suggesting that this shouldn't occur?
I'm saying we can look at the numbers at the end of the grace period and decide what to do then :)
Sounds reasonable to me :-)
FWIW, 10 months ago PIL was an order of magnitude the biggest user (based
on hits to /simple/<X>/ of this feature both by sheer number of requests AND by unique IP addresses. I removed these numbers from the updated PEP because I felt the PEP was getting heavy on the “weasel” factor by trying to be “well if you look at it X way, the impact is A, if you look at it Y way the impact is B”. I’ve included the numbers from 10 months ago below, but if we think this numbers are useful I can redo them now again.
Top Externally Hosted Projects by Requests
------------------------------------------

This is determined by looking at the number of requests the ``/simple/<project>/`` page had gotten in a single day. The total number of requests during that day was 10,623,831.

============================== ========
Project                        Requests
============================== ========
PIL                            63869
Maybe what we need is a special case rule that's similar to the ones discussed already, except where if people try to install PIL then instead of printing an external repository url, it prints the URL of an explanation of why they want Pillow instead...
Pygame                         2681
mysql-connector-python         1562
pyodbc                         724
elementtree                    635
salesforce-python-toolkit      316
wxPython                       295
PyXML                          251
RBTools                        235
python-graph-core              123
cElementTree                   121
============================== ========
Top Externally Hosted Projects by Unique IPs --------------------------------------------
This is determined by looking at the IP addresses of requests the ``/simple/<project>/`` page had gotten in a single day. The total number of unique IP addresses during that day was 124,604.
============================== ==========
Project                        Unique IPs
============================== ==========
PIL                            4553
mysql-connector-python         462
Pygame                         202
pyodbc                         181
elementtree                    166
wxPython                       126
RBTools                        114
PyXML                          87
salesforce-python-toolkit      76
pyDes                          76
============================== ==========
On August 31, 2015 at 10:31:35 AM, Donald Stufft (donald@stufft.io) wrote:
I can redo them now again.
So, I went ahead and ran all of the numbers using the data from 2015-08-18 (chosen because, when I looked at the last couple weeks of log files, it was the largest log file). I think the difference in just ~10 months supports the idea that use of this feature is declining and that the model in the PEP will be a cleaner and easier to understand model.

On this day, there were 20,398,771 total requests to /simple/<X>/ which resulted in either a 200 or a 304 response code. Out of these ~20 million requests, 80,622 went to projects which do not have any files hosted on PyPI but do have files hosted off of PyPI. This represents ~0.4% of the total traffic for that particular day.

The top packages look a bit different than they did 10 months ago. Surprisingly to me, PIL has severely dropped off from where it had ~63k requests 10 months ago and now has 5.5k, while pygame has risen from 2.6k 10 months ago to ~32k. The total number of requests has doubled between now and 10 months ago and it appears that the numbers for the top packages have more or less done the same, with the exception of the very top package, which has been cut in half. Similar to 10 months ago, we see the numbers rapidly drop by orders of magnitude.

Overall, the top 10 in this list together represented 70,691 requests 10 months ago, and now they represent 41,703. That's roughly 60% of what they were 10 months ago while the total number of requests increased by 100%, so it's really more like 30% of what they were previously when adjusted for the traffic increase.

============================== ========
Project                        Requests
============================== ========
Pygame                         32238
PIL                            5548
mysql-connector-python         5152
RBTools                        3723
python-apt                     3028
meliae                         1679
elementtree                    1576
which                          457
salesforce-python-toolkit      454
pywbem                         400
wxPython                       359
pyDes                          301
PyXML                          300
robotframework-seleniumlibrary 282
basemap                        255
============================== ========

Is any of this information useful for the PEP? I removed it because I thought it was too much, but I'm happy to add it back in if it'd be useful.
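For reference, the kind of tally described above can be reproduced from an access log with a few lines of Python. This is only a rough sketch: the log line format, the regular expression, and the set of externally hosted projects below are assumptions for illustration, not the actual PyPI/Fastly log layout or project list:

    import collections
    import re

    # Assumed Apache/nginx-style access log lines; the real PyPI/Fastly logs differ.
    SIMPLE_RE = re.compile(
        r'"GET /simple/(?P<project>[^/]+)/ HTTP/[^"]*" (?P<status>\d{3})')

    # Illustrative subset only; the real list would come from the PyPI database.
    EXTERNALLY_HOSTED = {"pygame", "pil", "mysql-connector-python"}

    def tally(log_lines):
        total = 0
        counts = collections.Counter()
        for line in log_lines:
            match = SIMPLE_RE.search(line)
            if not match or match.group("status") not in ("200", "304"):
                continue
            total += 1  # every 200/304 request to /simple/<X>/
            project = match.group("project").lower()
            if project in EXTERNALLY_HOSTED:
                counts[project] += 1
        return total, counts

    # Usage sketch:
    # with open("access.log") as f:
    #     total, counts = tally(f)
    #     print(total, counts.most_common(15))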
On 1 September 2015 at 05:53, Donald Stufft <donald@stufft.io> wrote:
The top packages look a bit different than it did 10 months ago, surprisingly to me PIL has severely dropped off from where it had ~63k 10 months ago and it now has 5.5k, however pygame has risen from 2.6k 10 months ago to ~32k.
I can provide plausible explanations for both of those:

* Pillow adoption progressively displacing PIL usage
* Pygame Zero lowering barriers to entry for PyGame usage in education

Cheers, Nick.

P.S. If folks haven't seen PyGame Zero, it's a pretty neat concept: https://pygame-zero.readthedocs.org/en/latest/
On September 1, 2015 at 12:15:56 AM, Nick Coghlan (ncoghlan@gmail.com) wrote:
On 1 September 2015 at 05:53, Donald Stufft wrote:
The top packages look a bit different than it did 10 months ago, surprisingly to me PIL has severely dropped off from where it had ~63k 10 months ago and it now has 5.5k, however pygame has risen from 2.6k 10 months ago to ~32k.
I can provide plausible explanations for both of those:
* Pillow adoption progressively displacing PIL usage * Pygame Zero lowering barriers to entry for PyGame usage in education
Seems reasonable; Pygame doesn't seem to have had a release since 2009. This is a common theme amongst most of the projects still using external hosting: they were added at a time when PyPI either didn't have file uploading or its file uploading was unreliable, and it made more sense to host externally. I think that also explains why almost none of them switched away from the unverifiable method to the verifiable method, and why the amount of traffic per project drops sharply after the first handful, because it's largely older projects that may or may not even actually work anymore.

In the case of Pygame, I see that the pygame zero instructions say to install from https://bitbucket.org/pygame/pygame. I don't know if that's an official pygame repository or if someone forked it or what, but if that's owned by the same people, maybe we can get them to make a release to PyPI.
On Mon, Aug 31, 2015 at 11:12 PM, Donald Stufft <donald@stufft.io> wrote:
On September 1, 2015 at 12:15:56 AM, Nick Coghlan (ncoghlan@gmail.com) wrote:
On 1 September 2015 at 05:53, Donald Stufft wrote:
The top packages look a bit different than it did 10 months ago, surprisingly to me PIL has severely dropped off from where it had ~63k 10 months ago and it now has 5.5k, however pygame has risen from 2.6k 10 months ago to ~32k.
I can provide plausible explanations for both of those:
* Pillow adoption progressively displacing PIL usage * Pygame Zero lowering barriers to entry for PyGame usage in education
Seems reasonable, Pygame doesn’t seem to have had a release since 2009. This is a common theme amongst most of the projects still using external hosting, they were added at a time that PyPI either didn’t have file uploading or it’s file uploading was unreliable and it made more sense to host externally. I think that also explains why almost none of them switched away from the unverifiable method to the verifiable method and why the amount of traffic per project drops sharply after the first handful, because it’s largely older projects that may or may not even actually work anymore.
In the case of Pygame, I see that the pygame zero instructions say to install from https://bitbucket.org/pygame/pygame. I don’t know of that’s an official pygame repository or if someone forked it or what, but if that’s owned by the same people, maybe we can get them to make a release to PyPI.
Looking at the download URL that PyPI points to (http://www.pygame.org/download.shtml), I get the impression that this page was designed to be read by humans only, and if pip/easy_install ever do anything useful with it then it's by pure luck only. The whole page is about how you should go about choosing and obtaining the correct platform-specific binary package. (Also, what does it mean that https://pypi.python.org/pypi/Pygame is 404?) -n -- Nathaniel J. Smith -- http://vorpus.org
On September 1, 2015 at 2:27:03 AM, Nathaniel Smith (njs@pobox.com) wrote:
(Also, what does it meant that https://pypi.python.org/pypi/Pygame is 404?)
It means they've hidden all of their releases on PyPI. This is a UI-only thing; the project still shows up on /simple/pygame/, but it's plausible they thought it would hide their releases from pip too. They wouldn't be the first to think that.
On 1 September 2015 at 07:26, Nathaniel Smith <njs@pobox.com> wrote:
Looking at the download url that PyPI points to:
http://www.pygame.org/download.shtml
then I get the impression that this page was designed to be read by humans only, and if pip/easy_install ever do anything useful with it then it's by pure luck only. The whole page is about how you should go about choosing and obtaining the correct platform-specific binary package.
There are a number of pygame issues open about packaging:

* https://bitbucket.org/pygame/pygame/issues/59/pygame-has-no-pypi-page-and-ca... is a general "get the pypi page working" one,
* https://bitbucket.org/pygame/pygame/issues/222/wheel-format-for-pip-and-pypi for providing wheels,
* https://bitbucket.org/pygame/pygame/issues/223/new-version-scheme-to-match-p... for PEP 440 versioning,
* https://bitbucket.org/pygame/pygame/issues/224/fix-usage-of-raw_input-in-set... fix interactive prompting from setup.py,

and I've just added one asking them to host on PyPI: https://bitbucket.org/pygame/pygame/issues/276/host-release-files-on-pypi

They seem to be discussing the various packaging issues, so with luck they will put something in place for the next release, whenever that occurs. Paul
On 9/1/15 12:15 AM, Nick Coghlan wrote:
On 1 September 2015 at 05:53, Donald Stufft <donald@stufft.io> wrote:
The top packages look a bit different than it did 10 months ago, surprisingly to me PIL has severely dropped off from where it had ~63k 10 months ago and it now has 5.5k, however pygame has risen from 2.6k 10 months ago to ~32k.
I can provide plausible explanations for both of those:
* Pillow adoption progressively displacing PIL usage
Wow, that's amazing, thanks for those numbers Donald \o/. To correlate (apropos of nothing), it looks like Pillow's PyPI numbers are slightly up over the past year (Pillow is released quarterly):

$ vanity -q pillow==2.7.0
Pillow 2.7.0 has been downloaded 934,119 times!

$ vanity -q pillow=={2.8.0,2.8.1,2.8.2}
Pillow 2.8.0 has been downloaded 41,433 times!
Pillow 2.8.1 has been downloaded 788,788 times!
Pillow 2.8.2 has been downloaded 500,565 times!
Pillow, Pillow and Pillow have been downloaded 1,330,786 times!

$ vanity -q pillow==2.9.0
Pillow 2.9.0 has been downloaded 1,083,447 times!
* Pygame Zero lowering barriers to entry for PyGame usage in education
Cheers, Nick.
P.S. If folks haven't seen PyGame Zero, it's a pretty neat concept: https://pygame-zero.readthedocs.org/en/latest/
"~0.4% of the total traffic for that particular day." Thanks. So, to clarify, It will not be possible to host releases from external links to GitHub [1], But it will be possible to host /simple/ packages index with e.g. gh-pages and compoze [2]? [1] https://developer.github.com/v3/repos/releases/#upload-a-release-asset [2] http://docs.repoze.org/compoze/narr.html#consolidating-package-indexes On Aug 31, 2015 2:53 PM, "Donald Stufft" <donald@stufft.io> wrote:
On August 31, 2015 at 10:31:35 AM, Donald Stufft (donald@stufft.io) wrote:
I can redo them now again.
So, I went ahead and ran all of the numbers using the data from 2015-08-18 (chosen because when I looked at the last couple weeks of log files, it was the largest log file). I think the difference in just ~10 months supports the idea that use of this feature is declining, and that the model in the PEP will be cleaner and easier to understand.
On this day, there were 20,398,771 total requests to /simple/<X>/ which resulted in either a 200 or a 304 response code; out of these ~20 million requests, 80,622 went to projects which do not have any files hosted on PyPI but do have files hosted off of PyPI. This represents ~0.4% of the total traffic for that particular day.
The top packages look a bit different than they did 10 months ago; surprisingly to me, PIL has dropped off severely, from ~63k requests 10 months ago to 5.5k now, while pygame has risen from 2.6k 10 months ago to ~32k. The total number of requests has doubled over those 10 months, and the numbers for the top packages have more or less done the same, with the exception of the very top package, which has been cut in half. Similar to 10 months ago, we see the numbers rapidly drop by orders of magnitude.
Overall, the top 10 in this list together represented 70,691 requests 10 months ago, and now they represent 41,703. That's roughly 60% of what they were 10 months ago, while the total number of requests increased by 100%, so adjusted for the traffic increase it's really more like 30% of what they were previously.
==============================  ========
Project                         Requests
==============================  ========
Pygame                          32238
PIL                             5548
mysql-connector-python          5152
RBTools                         3723
python-apt                      3028
meliae                          1679
elementtree                     1576
which                           457
salesforce-python-toolkit       454
pywbem                          400
wxPython                        359
pyDes                           301
PyXML                           300
robotframework-seleniumlibrary  282
basemap                         255
==============================  ========
Is any of this information useful for the PEP? I removed it because I thought it was too much, but I'm happy to add it back in if it'd be useful.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
devpi may be the packaging infrastructure component to integrate with / document? search("docker devpi") http://doc.devpi.net/latest/
*index inheritance:* Each index can inherit packages from another index, including the pypi cache root/pypi. This allows to have development indexes that also contain all releases from a production index. All privately uploaded packages will by default inhibit lookups from pypi, allowing to stay safe from an attacker who could otherwise upload malicious release files to the public PyPI index.
On Aug 31, 2015 8:33 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
On 31 August 2015 at 20:59, Wichert Akkerman <wichert@wiggy.net> wrote:
Sure. My knowledge of rpm is 20 years out of date, so I am going to focus on the deb/dpkg/apt world only.
Status Quo:
* Default: [pypi, ]
Proposed in the PEP (IIUC):
* Default: []
Suggested here:
For the purposes of this discussion, we can consider the two ecosystems essentially equivalent. There are differences around what's builtin and what's a plugin, but the general principles are the same:
* default repos are defined by each distro, not by the tool developers
* source repos and binary repos are separate from each other
* users can add additional repos of both kinds via config files
* config files can also set relative priorities between repos
That last one is the main defence against malicious repos - if you set your distro repos to a high priority, then third party repos can't override distro packages, but if you set a particular third party repo to a high priority then it can override *anything*, not just the packages you're expecting it to replace.
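(An illustrative aside, not from the thread: apt expresses those relative priorities through /etc/apt/preferences entries, roughly like the sketch below; the third-party origin name is a placeholder.)

    Package: *
    Pin: release o=Debian
    Pin-Priority: 990

    Package: *
    Pin: origin "third-party.example.org"
    Pin-Priority: 400

With priorities like these the distro archive wins version conflicts; flip the numbers and the third-party repo can override anything, which is exactly the risk described above.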
How do I diff these (ordered) graph solutions (w/ versions, extras_requires, and potentially _ABI_feat_x_)? There are defined graph algorithms for JSON-LD (RDFa), which would make it much easier to correlate a version+[sdist,bdist,wheel-<...>] of a package with a URI with a package catalog with a URI served by a repo with a URI
The key *differences* that are relevant to the current discussion are:
* PyPA are the maintainers of both the default repo *and* the installation tools
* we don't currently have any kind of repo priority mechanism
Is it traversed in a list, or does config parser OrderedDict?
This is why I think it's important to be clear that we *want* to improve the off-PyPI hosting experience, as that's something we consider reasonable for people to want to do, but it's going to take time and development effort. The question is thus whether it makes sense to delay significant improvements for the common case (i.e. hosting on PyPI), while we work on handling the minority case, and I don't believe it does.
It *may* be worth hacking in special case handling for the packages already hosted externally, but we can do that as exactly that (i.e. a special case providing a legacy bridging mechanism until a better solution is available), rather than as an indefinitely supported feature.
Is there like a bigquery githubarchive of these, for large queries?
Regards, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 31 August 2015 at 15:20, Wes Turner <wes.turner@gmail.com> wrote:
Status Quo: * Default: [pypi, ]
Proposed in the PEP (IIUC): * Default: []
No. The PEP doesn't propose any change here - pip looks only at PyPI by default at the moment (but users can set config or options to include other indexes) and that will remain the case. I didn't understand the rest of your email, I'm afraid, so I can't comment. Sorry. Paul
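(As an editorial aside, a hedged sketch of the configuration Paul refers to; the index URL and package name are made up:)

    $ pip install --extra-index-url https://pypi.example.org/simple/ somepackage

or, persistently, in a pip configuration file such as ~/.pip/pip.conf:

    [global]
    extra-index-url = https://pypi.example.org/simple/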
On 31 August 2015 at 11:59, Wichert Akkerman <wichert@wiggy.net> wrote:
Sure. My knowledge of rpm is 20 years out of date, so I am going to focus on the deb/dpkg/apt world only. The whole packaging system is built around archives. The packaging tools themselves do not have anything hardcoded there; they pick up the archives from a configuration file (/etc/apt/sources.list). That file lists all archives. [... Useful explanation omitted...]
Thanks, that was very helpful. From that I understand that the key differences are:
1. deb doesn't have a hard-coded "official" repository, everything is in the config files.
2. There are tools to manage the config, rather than editing the config files by hand.
3. There is pluggable support for archive and URL types.
In the context of the PEP I don't think these are significant differences, so it seems to me that the PyPI solution proposed in the PEP matches the deb approach pretty closely. Which is what I thought, but it's nice to have it confirmed. Thanks, Paul
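(For readers who haven't seen it, an illustrative /etc/apt/sources.list; the mirror URL and suite name are placeholders:)

    deb http://mirror.example.org/debian stable main contrib
    deb-src http://mirror.example.org/debian stable main

Everything apt knows about comes from entries like these (or files under /etc/apt/sources.list.d/); there is no repository baked into the tool itself, which is the first difference listed above.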
On August 31, 2015 at 6:36:42 AM, Paul Moore (p.f.moore@gmail.com) wrote:
If there's any additional functionality that Linux systems provide, could someone summarise it from an end user POV for me? (And maybe also point out why I'd never noticed it as a naive Linux user!) To me, that's a key to whether this PEP is missing something important relative to those systems.
Wichert provided a much more in-depth summary, but I’d just say that the primary differences from an end user POV are that the defaults aren’t baked into the tool itself but rather laid down in a config file by the system installer, and that they have more features for controlling how multiple repositories are combined into the list of packages that installs can come from (e.g. only get package X from repository Y, or such), but that their default behavior is roughly equivalent to what pip and setuptools do. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On August 27, 2015 at 5:00:27 PM, M.-A. Lemburg (mal@egenix.com) wrote:
Uhm, no :-) This would be a more user friendly way of doing it:
Well, except we’d have to throw away all of the work we’ve done for discovering things up until that point, so we’d need to essentially restart the entire process anyway; I don’t think there has ever been any effort to make setuptools or pip re-entrant in that regard.

The mechanism in the older PEP 470 also supported more than one index, to support people who wanted to host binary builds of their project in cases where the compatibility information in wheels isn’t enough to adequately differentiate when wheels are compatible or not. If we’re going to support a discoverable index like that then we should support the other major reason people might not host on PyPI too, but that means we can’t automagically add things to the list of repositories because we don’t know which list of repositories is accurate.

The other problem is that end users should really be adding the configuration to their requirements.txt or other files in the situations where they are using those, so that installs work in situations where they don’t have an interactive console to prompt them (for example, deploying to Heroku). If we’re automagically adding it to the list on a prompt then we make it less obvious that they need to do anything else, and we just push the pain off until they attempt to deploy their project.

Finally, additional repositories are similar to additional CAs in the browser ecosystem: you want to carefully consider which ones you trust, because they get the ability to essentially execute whatever arbitrary code they want on your system. There *should* be some level of deliberation and thought behind a user adding a new repository. Allowing a new one with a simple prompt is as dangerous as a browser running into an HTTPS certificate it doesn’t trust and going “Well, I see you don’t trust this CA, do you want to add it to your list and reload?”, a UI that most (or all) browsers are moving away from and hiding as much as possible.
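(To illustrate the requirements.txt point with a hedged, hypothetical example: pip accepts index options directly in requirements files, so non-interactive deployments pick them up automatically. The URL and package are made up.)

    # requirements.txt
    --extra-index-url https://pypi.example.org/simple/
    somepackage==1.2.3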
All Linux distros I know and use have repositories distributed all over the planet, and many also provide official and less official ones, for the users to choose from, so there is more than enough evidence that a federated system for software distribution works better than a centralized one.
I wonder why we can't we agree on this ?
Sure, and people are more than welcome to not host on PyPI and all of the tools support a federated system. However those tools also don’t have any sort of meta links between repositories that will automatically suggest that you add additional repositories to your list. What you have configured is what you have configured. The latest update to PEP 470 represents moving to the exact same system that all the Linux distros you know and use have.
I'm happy to help write a PEP for the discovery feature and I'd also love to help with the implementation. My problem is that no one is paying me to work on this, and so my time investment into this has to stay way behind what I'd like to invest.
Sure, I mean I don’t expect other people to have near the amount of time I do, since my entire job is working on Python packaging. Part of why I’m bringing this up now instead of closer to when Warehouse is ready to launch is to give plenty of time for discussion, implementation, and migration.
No matter how much you try to get people to host everything on pypi.python.org, there are always going to be some who don't want to do this and would rather stick with their own PyPI index server for whatever reason.
I don’t care if people host things off of PyPI, I just don’t think we need to, or should, complicate the API to try and provide a seamless experience for people hosting things outside of PyPI. If you’re going off on your own you should expect there to be some level of “not on by default”-ness. Honestly, if someone wanted to set up an additional repositor(y|ies) I wouldn’t even be personally opposed to adding it to the default list of repositories in pip, assuming some basic guidelines/rules were followed. I don’t speak for all of the other maintainers so they might be opposed to it, but I’d think something like:
* It is being operated by a known and trusted entity (e.g. Joe Nobody doesn’t get to do this).
* It agrees to consider PyPI the central authority for who owns a particular name (e.g. just because you host a repository doesn’t mean you get to make Django releases).
* There is some plan for how they intend to operate it with regard to keeping the uptime high.
I'd just remove the whole section. Splitting the user base into US and non-US users, even if just to explain that you cannot cover all non-US views or requirements is not something we should put into an official Python document.
Okay.
Since PyPI is legally run by the PSF, the PSF board will have to approve the new terms.
Having you on board for the WG, would certainly be very useful, since there may well be technical details that come into play.
Ok, sure sign me up. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 27 August 2015 at 02:24, Donald Stufft <donald@stufft.io> wrote:
I do need a BDFL Delegate for this PEP, Richard does not have the time to do it and the other logical candidate for a PyPI centric PEP is myself, but I don’t feel it’s appropriate to BDFL Delegate my own PEP.
I've agreed to be BDFL-Delegate for this PEP.

Overall, I believe the PEP is looking good. I've reviewed the latest version of the PEP, and as much as I can of the discussion - but the history of the whole issue is pretty messy and I may well have missed something. If so, please speak up!

I do have some specific points I'd like to see addressed by the PEP:

1. While the migration process talks about how we will assist existing users of the off-PyPI hosting option to migrate, the PEP doesn't include any provision for *new* projects who prefer not to host on PyPI. In the spirit of ensuring that off-PyPI hosting is seen as a fully supported option, I'd like to see the PEP include a statement that the process for setting up an external index will be added to the packaging user guide (that documentation is already being included in the emails being sent out, let's just also make sure it's published somewhere permanent as well).

2. The PEP says very little on how users should find out when an external index is required, and how to configure it. I'd suggest a couple of explicit points get included here - in "Multiple Repository/Index Support" a note should be added saying that projects that need an external index SHOULD document the index location prominently in their PyPI index page (i.e. near the top of their long_description), and installers SHOULD provide an example of configuring an external index in their documentation.

3. In the section "Allow easier discovery of externally hosted indexes", maybe replace the paragraph "This idea is rejected because it provides a similar painful end user experience where people will first attempt to install something, get an error, then have to re-run the installation with the correct options." with "This feature has been removed from the scope of the PEP because it proved too difficult to develop a solution that avoided UX issues similar to those that caused so many problems with the PEP 438 solution. If needed, a future PEP could revisit this idea."

4. The "Opposition" section still feels unsatisfying to me. It seems to me that the point of this section is to try to address specific points that came up in the discussion, so why not make that explicit and turn it into a "Frequently Asked Questions" section? Something along the lines of the following:

* I can't host my project on PyPI because of <X>, what should I do? (Data sovereignty would be one question in this category). The answer would be to host externally and instruct users to add your index to their config.

* But this provides a worse user experience for my users than the current situation - how do I explain that to my users? There are two aspects here. On the one hand, you have to explain to your users why you don't host on PyPI. That has to be a project-specific issue, and the PEP can't offer any help with that. On the other hand, the existing transparent use of external links has been removed for security, reliability and user friendliness reasons that have been covered elsewhere in the PEP.

* Switching my current hosting to an index-style structure breaks my workflow/doesn't fit my hosting provider's rules/... I believe the answer here was to host an index on pythonhosted.org pointing to the existing files. But it's a fair question, and one the PEP should probably cover in a bit more detail.

* Why don't you provide <X>? Generally, the answer here is that the PEP authors don't have sufficient experience with the subject matter behind X.
This PEP is intended to be a straightforward, easily understood baseline, similar to existing models such as Linux distribution repositories. Additional PEPs covering extra functionality to address further specialised requirements are welcomed, but would require someone with a good understanding of the underlying issue to develop.

If anyone has any further points to raise, now is the time to do so! I don't see any need for another extended debate; hopefully most of the issues have already been discussed, and I'm going to assume unless told otherwise that people are happy they are covered properly in the PEP. In particular, if anyone wants to vote an explicit -1 on the current proposal, then please do so now.

Paul
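(A hedged sketch of what "setting up an external index", as mentioned in point 1 above, amounts to: the Simple layout installers expect is just HTML pages of links to the release files, served over HTTPS. The domain, project, and file names below are hypothetical.)

    https://pypi.example.org/simple/myproject/  serving something like:

        <html><body>
        <a href="myproject-1.0.tar.gz">myproject-1.0.tar.gz</a>
        </body></html>

    $ pip install --extra-index-url https://pypi.example.org/simple/ myproject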
On August 29, 2015 at 2:46:40 PM, Paul Moore (p.f.moore@gmail.com) wrote:
I do have some specific points I'd like to see addressed by the PEP:
Ok, I’ve gone ahead and addressed (I think) everything you’ve pointed out, you can see the diff at https://hg.python.org/peps/rev/8ddbde2dfd45 or see the entire PEP at https://www.python.org/dev/peps/pep-0470/ once it updates. If there are any additional changes that need to be made, let me know! ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
participants (13)
- Alex Clark
- Antoine Pitrou
- Ben Finney
- Donald Stufft
- M.-A. Lemburg
- Nathaniel Smith
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Robert Collins
- Tres Seaver
- Wes Turner
- Wichert Akkerman