
On Jul 21, 2019, at 14:13, Barry <barry@barrys-emacs.org> wrote:
On 21 Jul 2019, at 19:03, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Jul 21, 2019 at 08:48:49AM +0100, Barry Scott wrote:
I took a very quick look at bpo30500 and was struck by the comment that the code was working on a URL that had not been validated.
Validation of the URL would reject the URL before the parsing happens in this case. Was that the case?
Sorry, can you elaborate on that? How do you validate a URL without attempting to parse it? You're surely not talking about looking it up in a whitelist, are you?
I was thinking about ensuring that the characters in the URL are from the subset that is allowed. \n is not allowed, for example. Yes, I agree you have to try to parse it.
For a spec that has different sets of restricted characters for different parts, that kind of prevalidation doesn’t seem like it would get you very far. At least a priori, if there are attacks that involve using illegal characters in the netloc or the path or the scheme or whatever, they could just as easily be characters that are legal elsewhere in the URL as characters that happen to not be legal anywhere. (If you’re just talking about mitigating one particular attack after it’s been discovered, that’s a different story. If checking for \n patches things without waiting for the library fix, obviously it’s worth doing.)
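
A minimal Python sketch of the point, under stated assumptions: the character set is roughly the union of everything RFC 3986 permits anywhere in a URI, prevalidate is a hypothetical helper (not anything in urllib), and the second URL is only an illustration of a character that is legal in one part of a URL showing up in another.

```python
import string

# Assumption: approximate union of the characters RFC 3986 allows
# somewhere in a URI (unreserved + gen-delims + sub-delims + "%").
URL_CHARS = frozenset(
    string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
)


def prevalidate(url):
    """Hypothetical whole-string check: True if every character is
    legal *somewhere* in a URL. It knows nothing about which part of
    the URL each character sits in."""
    return all(ch in URL_CHARS for ch in url)


# A character that is illegal everywhere (newline) is caught.
print(prevalidate("http://example.com/\nHost: other"))  # False

# "#" and "@" are legal in some parts of a URL, so this passes, even
# though "#" sits where a host would be expected.
print(prevalidate("http://127.0.0.1#@example.com/"))    # True
```

Deciding whether the "#" in the second URL is acceptable requires knowing where the netloc ends, which is exactly the parsing step the prevalidation was supposed to come before.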