[Python-ideas] Re: Universal parsing library in the stdlib to alleviate security issues

22 Jul 2019

On Jul 21, 2019, at 14:13, Barry wrote:
On 21 Jul 2019, at 19:03, Steven D'Aprano wrote:
On Sun, Jul 21, 2019 at 08:48:49AM +0100, Barry Scott wrote:
I took a very quick look at bpo30500 and was struck by the comment 
that the code was working on a URL that had not been validated.
Validation of the URL would reject the URL before the parsing happens 
in this case. Was that the case?
Sorry, can you elaborate on that? How do you validate a URL without 
attempting to parse it? You're surely not talking about looking it up in 
a whitelist are you?
I was thinking about ensuring that the characters in the URL are from the subset that is allowed. \n is not allowed, for example. Yes, I agree you have to try to parse it.
For a spec that has different sets of restricted characters for different parts, that kind of prevalidation doesn’t seem like it would get you very far. At least a priori, if there are attacks that involve using illegal characters in the netloc or the path or the scheme or whatever, they could just as easily be characters that are legal elsewhere in the URL as characters that happen to not be legal anywhere.

(If you’re just talking about mitigating one particular attack after it’s been discovered, that’s a different story. If checking for \n patches things without waiting for the library fix, obviously it’s worth doing.)
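
To make the distinction concrete, here is a minimal sketch. It is not code from bpo30500 or from urllib; the names URI_CHARS, SCHEME_CHARS, passes_global_check and scheme_is_valid are made up for illustration, and the character sets are my reading of RFC 3986. A whole-string check catches \n, but it happily accepts a character that is illegal in one component as long as it is legal somewhere else:

```python
import string

# Assumption: every character RFC 3986 allows *somewhere* in a URI
# (unreserved + gen-delims + sub-delims + the percent sign).
URI_CHARS = set(string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%")

# Assumption: the narrower set RFC 3986 allows in the scheme component.
SCHEME_CHARS = set(string.ascii_letters + string.digits + "+-.")


def passes_global_check(url):
    """Prevalidation: reject characters that are not legal anywhere in a URI."""
    return all(ch in URI_CHARS for ch in url)


def scheme_is_valid(url):
    """Per-component check: needs a (crude) split before it can validate."""
    scheme, sep, _rest = url.partition(":")
    return (
        bool(sep)
        and scheme[:1].isalpha()
        and all(ch in SCHEME_CHARS for ch in scheme)
    )


newline_url = "http://example.com/a\nb"
odd_scheme_url = "my_scheme://example.com/"  # '_' is legal in a path, not in a scheme

print(passes_global_check(newline_url))     # the global check does catch \n
print(passes_global_check(odd_scheme_url))  # but it lets this one through
print(scheme_is_valid(odd_scheme_url))      # only the per-part check rejects it
```

The second function has to split the string before it can check anything, which is the sense in which validating each part already means trying to parse.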


Andrew Barnert