On 18 Jul 2019, at 05:23, Nam Nguyen <bitsink@gmail.com> wrote:

On Wed, Jul 17, 2019 at 12:38 AM Barry Scott <barry@barrys-emacs.org> wrote:
But if your use cases call for performance, it is perfectly fine to understand the tradeoffs, and opt in to the more appropriate solutions. And, of course, maybe there is a solution that could satisfy *both*.

Generally speaking, though, do you see 1 millisecond spent on parsing a URL as a deal breaker? I sense that some web frameworks might not like that very much, but I don't have any concrete use case to quote.

Yes 1ms would be a serious issue.

I guess what I'm concerned about is the use of a universal parser, for a benefit I'm not convinced exists, having a terrible effect on the speed of code that is secure and correct.

I hope what you are doubting is *my* library, which has not proven itself yet, rather than the benefit of a proper parsing package. If it is the latter, perhaps these links can convince you that URL parsing has simply not been "secure and correct":

https://www.cvedetails.com/cve/CVE-2019-10160/ - urlsplit and urlparse regress from the fix for the CVE below.
https://www.cvedetails.com/cve/CVE-2019-9636/ - urlsplit and urlparse vs. Unicode normalization (a short sketch of this one follows below).
https://bugs.python.org/issue30500 - urlparse's handling of '#' in passwords.
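
To make the normalization issue concrete, here is a minimal sketch of the class of bug behind CVE-2019-9636. It is my own illustration using only the stdlib; "evil.example" is a made-up host, not a proof of concept against any real service:

    import unicodedata
    from urllib.parse import urlsplit

    # U+FF03 is the fullwidth number sign, which NFKC-normalizes to '#'.
    url = "https://example.com\uFF03@evil.example/path"

    try:
        # Unpatched: everything before '@' is userinfo, so the host is
        # 'evil.example' before normalization...
        before = urlsplit(url).hostname
        # ...but 'example.com' after, because '#' now ends the netloc.
        after = urlsplit(unicodedata.normalize("NFKC", url)).hostname
        print("host before normalization:", before)
        print("host after normalization: ", after)
    except ValueError:
        # Patched interpreters reject netlocs whose characters change
        # under NFKC normalization.
        print("rejected by patched urlsplit()")

A validator that checks the raw string and a consumer that fetches the normalized one will disagree about which host they are talking to, which is exactly how this gets exploited.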

That is correlation; as Chris said, show causation.

I'm working on the impact of CVE-2019-9636 as part of my day job.

I'd be interested in your analysis of how your parsing proposal would have avoided this problem before it was described.
I add the "before it was described" because I think you are claiming that the universal parser will prevent this class of
security issue by the nature of its design.


And many more related to HTTP header, cookie, email... These things are tricky.

Parsing of HTTP headers is not that hard. Have a look at the code in the Twisted library, which handles all this pretty well.
Where HTTP gets complex is in the semantic meaning of the headers. Where performance matters is in handling
very large HTTP messages. We see 1 GiB POSTs to backup servers that contain 1 million small files MIME-encoded
in them, and Python can handle that well enough. But not with a 70x slowdown.
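
As a first approximation, splitting header fields is only a few lines. This is my own sketch, not Twisted's implementation; it ignores obsolete line folding and, deliberately, all of the semantics:

    # Each field is "Name: value" terminated by CRLF; a blank line ends
    # the header block. The syntax really is this simple.
    def parse_headers(raw):
        headers = {}
        for line in raw.split(b"\r\n"):
            if not line:          # blank line terminates the headers
                break
            name, sep, value = line.partition(b":")
            if not sep:
                raise ValueError("malformed header line: %r" % line)
            headers[name.strip().lower()] = value.strip()
        return headers

    print(parse_headers(b"Host: example.com\r\nContent-Length: 42\r\n\r\n"))
    # {b'host': b'example.com', b'content-length': b'42'}

Everything difficult, like deciding what Content-Length means when it conflicts with Transfer-Encoding, happens after this step.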


It is fair to use the numbers I currently have (~900 µs vs 13 µs per parse) to ballpark the impact, but I'm pretty confident that performance will sort itself out acceptably, eventually (even with an entirely different library).

If 1ms is a deal breaker, what is a more palatable latency?

You need to be very close to the 13 µs mark. For correctness and security I will take the hit of the patch for CVE-2019-9636, which should cost a few microseconds at most.
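
The 13 µs figure is easy to reproduce with a quick timeit sketch; the URL here is a made-up example, and the exact numbers will of course vary by machine and Python version:

    import timeit

    n = 100_000
    t = timeit.timeit(
        "urlsplit('https://user:pw@example.com:8042/over/there?name=ferret#nose')",
        setup="from urllib.parse import urlsplit",
        number=n,
    )
    print("%.1f usec per parse" % (t / n * 1e6))

Run the same loop against any proposed replacement and the slowdown factor falls straight out.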

How much latency would you trade for security?

Please show that your approach can deliver that extra security.

Barry



Thanks.
Nam