[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

Tue May 4 22:19:35 EDT 2021

Gregory P. Smith <greg at krypto.org> added the comment:

Both Django and Botocore issues appear to be in the category of: "depending on invalid data being passed through our urlsplit API so that they could look for it later"  Not much sympathy.  We never guaranteed we'd pass invalid data through.  They're depending on an implementation detail (Hyrum's law).  Invalid data causes other people who don't check for it problems.  There is no valid solution on our end within the stdlib that won't frustrate somebody.

We chose to move towards safer (undoubtedly not perfect) by default.

Instead of the patches as you see them, we could've raised an exception.  I'm sure that would also also have tripped up existing code depending on the undesirable behavior.

If one wants to reject invalid data as an application/library/framework, they need a validator.  The Python stdlib does not provide a URL validation API.  I'm not convinced we would even want to (though that could be something issue43883 winds up providing) given how perilous that is to get right: Who's version of right? which set of standards? when and why? Conclusion: The web... such a mess.

----------
versions: +Python 3.11

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43882>
_______________________________________