On Feb 15, 2020, at 09:00, Senthil Kumaran <senthil@uthcode.com> wrote:
As we have to make a decision here, my vote is to revert the patch in 3.8.2 and 3.7.7. I have gone back and forth on this, and it seems a revert would address some definite complaints we have received. The problem is contained to a single version, and users can upgrade to the next one.
On Fri, Feb 14, 2020 at 8:14 AM Łukasz Langa <lukasz@langa.pl> wrote:
Ned, what are you doing with this for 3.7.7? Reverting?
Ugh!
As others have noted, urlparse is a big can of worms. I am certainly not a subject expert but, from some investigation and thinking about it, it seems to me that we kinda brought this on ourselves by allowing the scheme part (e.g. "https:" or "ftp:" or "any-old-scheme:" etc) of the urlstring parameter to be optional:
urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)
thereby introducing the ambiguity of whether a string like "localhost:80" denotes a relative url with a user-defined scheme of "localhost" and a path of "80" (as it now does with the changes for bpo-27657 introduced in 3.8.1 and 3.7.6):
>>> urlparse("localhost:80")
ParseResult(scheme='localhost', netloc='', path='80', params='', query='', fragment='')
or denotes a relative url with no scheme and a path of "localhost:80" (as happened in previous releases):
>>> urlparse("localhost:80")
ParseResult(scheme='', netloc='', path='localhost:80', params='', query='', fragment='')
With an explicit scheme, in either case you get what you would expect - an absolute url:
>>> urlparse("http://localhost:80")
ParseResult(scheme='http', netloc='localhost:80', path='', params='', query='', fragment='')
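To see which of the two schemeless behaviors a given interpreter has, something like the following rough sketch can be run. The helper name describe() is made up for illustration; only urllib.parse.urlparse is the real API. The schemeless result depends on the Python version, so only the explicit-scheme case is checked for an exact value.

```python
from urllib.parse import urlparse


def describe(urlstring):
    """Return (scheme, netloc, path) as parsed by the running interpreter."""
    parts = urlparse(urlstring)
    return parts.scheme, parts.netloc, parts.path


# Ambiguous input: the result differs between releases with and without
# the bpo-27657 change, so it is only printed here, not asserted.
schemeless = describe("localhost:80")

# Unambiguous input: an explicit scheme followed by "//" marks the netloc.
explicit = describe("http://localhost:80")

print("schemeless:", schemeless)
print("explicit:  ", explicit)
```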
AFAICT the intent of the original RFCs was to require an explicit scheme in a urlstring, thus avoiding any ambiguity. But the now universal practice of web browsers supplying a default http: or https: scheme for (partial) urls typed into a location bar has understandably changed user expectations to often be that schemes are optional when the scheme is clear in context.
So it seems to me that there is no one obviously correct behavior here. Judging from the comments and the reports of broken packages, many users are clearly used to using urlparse with schemeless urlstrings, even if they aren't truly conformant URLs and even though the way urlparse parsed them was, at first glance, unintuitive; for example, there is this snippet in the third-party requests package:
# urlparse is a finicky beast, and sometimes decides that there isn't a
# netloc present. Assume that it's being over-cautious, and switch netloc
# and path if urlparse decided there was no netloc.
if not netloc:
    netloc, path = path, netloc
OTOH, there are also undoubtedly users who want a urlparser that more strictly parses schemeless URLs, which is now the behavior as of 3.8.1 and 3.7.6, again, even if the new behavior is also unintuitive.
I don't see how we can satisfy both use cases without changing the API somehow. And there may be other use cases.
The good news is that, AFAICT from a quick survey, the change didn't affect urllib.urlopen or third-party urllib3 or requests. But from the "me-toos" on the bpo issue and the PR, it's clear that we broke stuff downstream and it seems that most of those users are waiting for a resolution from us and likely would prefer to stick to the previous behavior.
So my take is that we should revert the 3.7 changes (bpo-27657 / PR 16837 / 82b5f6b16e051f8a2ac6e87ba86b082fa1c4a77f). Senthil, please go ahead and do so for the 3.7 branch. Thanks!
While it's not my call, I think we should also revert for 3.8.2.
For 3.9.0, I recommend we reconsider this change (temporarily reverting it) and consider whether an API change to accommodate the various use cases would be better; perhaps something like adding a new parameter to urlparse to indicate whether urlstrings should be parsed like web browser "urls" (and defining exactly what that means). We should also review the many remaining open urlparse bpo issues to look for commonalities. (Perhaps that could be a post-3.9 GSoC project?)
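To make the "new parameter" idea a bit more concrete, here is a minimal sketch of what such a mode might look like as a wrapper around the existing function. Everything about it is an assumption for discussion: the name urlparse_compat, the browser_style keyword, and the choice of "http" as the fallback scheme are all hypothetical, and the rule used (anything without "://" is treated as host[:port][/path]) is only one possible definition of browser-like parsing.

```python
from urllib.parse import urlparse


def urlparse_compat(urlstring, scheme="", allow_fragments=True, *,
                    browser_style=False):
    """Hypothetical wrapper; browser_style is NOT a real urlparse parameter.

    With browser_style=True, a string lacking "://" is treated the way a
    location bar might treat it: as a network location, not a relative path.
    """
    if browser_style and "://" not in urlstring:
        # A leading "//" tells urlparse that a netloc follows.
        urlstring = "//" + urlstring.lstrip("/")
        # Assumed fallback scheme, used only if the caller gave none.
        scheme = scheme or "http"
    return urlparse(urlstring, scheme, allow_fragments)


result = urlparse_compat("localhost:80", browser_style=True)
print(result)
```

Without browser_style the call is passed straight through, so existing callers would see whatever the underlying urlparse does. One known weakness of this particular rule is that non-hierarchical strings such as "mailto:..." would be misread as hosts in browser_style mode, which is exactly the kind of detail that would need to be pinned down.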
Thoughts?
In any case, ugh!
-- Ned Deily nad@python.org -- []