<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 10.04.2019 7:30, Karthikeyan wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAAjsFLR9OjFtutWsCQW5oOF-EUBfJHbm-M2ar-geFE_no_Lyeg@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="ltr">

        <div dir="auto">Thanks Gregory. I think it's a good tradeoff to

          ensure this validation only for URLs of http scheme. 

          <div dir="auto"><br>

          </div>

          <div dir="auto">I also agree handling newline is little

            problematic over the years and the discussion over the level

            at which validation should occur also prolongs some of the

            patches. <a href="https://bugs.python.org/issue35906"

              target="_blank" moz-do-not-send="true">https://bugs.python.org/issue35906</a>

            is another similar case where splitlines is used but it's

            better to raise an error and the proposed fix could be used

            there too. Victor seemed to wrote a similar PR like linked

            one for other urllib functions only to fix similar attack in

            ftplib to reject newlines that was eventually fixed only in

            ftplib</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">* <a

              href="https://bugs.python.org/issue30713" target="_blank"

              moz-do-not-send="true">https://bugs.python.org/issue30713</a> </div>

          <div dir="auto">* <a href="https://bugs.python.org/issue29606"

              target="_blank" moz-do-not-send="true">https://bugs.python.org/issue29606</a></div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">Search also brings multiple issues with one

            duplicate over another that makes these attacks scattered

            over the tracker and some edge case missing. Slightly off

            topic, the last time I reported a cookie related issue where

            the policy can be overriden by third party library I was

            asked to fix it in stdlib itself since adding fixes to

            libraries causes maintenance burden to downstream libraries

            to keep up upstream. With urllib being a heavily used module

            across ecosystem it's good to have a fix landing in stdlib

            that secures downstream libraries encouraging users to

            upgrade Python too.<br>

            <br>

            <div class="gmail_quote" dir="auto">

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> </blockquote>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <p>Validation should occur whenever user data crosses a trust

      boundary -- i.e. when the library starts to assume that an

      extracted chunk now contains something valid.</p>

    <p><a class="moz-txt-link-freetext"

        href="https://tools.ietf.org/html/rfc3986">https://tools.ietf.org/html/rfc3986</a>

      defines valid syntax (incl. valid characters) for every part of a

      URL -- _of any scheme_ (FYI, \r\n are invalid everywhere and the

      test code for     `data:' that Karthikeyan referred to is raw data

      to compare to rather than a part of a URL). It also obsoletes all

      the RFCs that the current code is written against.</p>

    <p>AFAICS, urllib.split*

      fns (obsoleted as public in 3.8) are used by both urllib and

      urllib2 to parse URLs. They can be made to each validate the chunk

      that they split off. urlparse can validate the entire URL

      altogether.<br>

    </p>

    <p>Also, all modules ought to use the same code (urlparse looks like

      the best candidate) to parse URLs -- this will minimize the attack

      surface.<br>

    </p>

    <p>I think I can look into this later this week.</p>

    <p>Fixing this is going to break code that relies on the current

      code accepting invalid URLs. But the docs have never said that

      e.g. in urlopen, anything apart from a (valid) URL is accepted (in

      particular, this implies that the user is responsible for escaping

      stuff properly before passing it). So I would say that we are

      within our right here and whoever is relying on those quirks is

      and has always been on unsupported territory. <br>

      Determining which of those quirks are exploitable and which are

      not to fix just the former is an incomparably larger, more

      error-prone and avoidable work. If anything, the history of the

      issue referenced to by previous posters clearly shows that this is

      too much to ask from the Python team.<br>

    </p>

    <p>I also see other undocumented behavior like accepting

      '<URL:<url>>' (also obsoleted as public in 3.8) which

      I would like to but it's of no harm.<br>

      <br>

    </p>

    <p>-- </p>

    <pre class="moz-signature" cols="140">Regards,

Ivan</pre>

  </body>

</html>