On Fri, Sep 17, 2010 at 9:43 AM, And Clover <span dir="ltr">&lt;<a href="mailto:and-py@doxdesk.com">and-py@doxdesk.com</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<div class="im">On 09/17/2010 02:03 PM, Armin Ronacher wrote:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

In case we change the spec as Ian mentioned above, I am all for<br>

a &quot;wsgi.guessed_encoding&quot; = True flag or something like that.<br>

</blockquote>

<br></div>

Yes, I&#39;d like to see that. I believe going with *only* a raw-or-reconstructed path_info, rather than having both path_info and PATH_INFO, is probably best, for the middleware-duplication reasons PJE mentioned.<br>

<br>

A more in-depth possibility might be:<br>

<br>

wsgi.path_accuracy =<br>

<br>

    0: script_name/path_info have been crudely reconstructed from<br>

    SCRIPT_NAME/PATH_INFO from an unknown source. Beware!<br>

    If there is to be backwards compatibility with WSGI1, this<br>

    would be seen as the &#39;default value&#39; given a missing path_accuracy.<br>

<br>

    1: script_name/path_info have been reconstructed, but it is known<br>

    that path_info is accurate, other than %2F and non-ASCII issues.<br>

    That is, it&#39;s known that the path doesn&#39;t come from IIS&#39;s broken<br>

    PATH_INFO, or the IIS error has been detected and compensated for.<br>

<br>

    2: script_name/path_info have been reconstructed using known-good<br>

    encodings for the env. The only way in which they may differ from<br>

    the original request path is that a slash might originally have<br>

    been a %2F. (This is good enough for the vast majority of<br>

    applications.)<br>

<br>

    3: script_name/path_info come directly from the request path<br>

    without any intervening mangling.</blockquote><div><br>path_accuracy is certainly a better name than encoding; nothing here actually relates to encoding (except insofar as attempts to encode or reencode values corrupt the path).  Personally I wouldn&#39;t want to split it up this much; I&#39;d rather have a simple flag to indicate that something was guessed, as opposed to an accurate request.  The only real value I see in it is to help people debug problems.  Maybe.  I&#39;m not sure it&#39;s that realistic to imagine this will be noticed by people deploying software and encountering problems.  A helpful application could use it to warn the deployer of potential problems.<br>
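<br>As a rough sketch of that last point, an application could check the flag at request time and warn when the gateway reports less accuracy than the application needs. The key name wsgi.path_accuracy and its levels 0 to 3 are only the proposal quoted above, not part of any published WSGI spec, and the helper names here are made up for illustration.<br>

```python
import warnings

# Hypothetical key: taken from the proposal in this thread, not from any
# published WSGI specification.
PATH_ACCURACY_KEY = "wsgi.path_accuracy"

# One-line summaries of the four proposed levels.
PATH_ACCURACY_NOTES = {
    0: "paths crudely reconstructed from an unknown source",
    1: "paths reconstructed; %2F and non-ASCII characters may be wrong",
    2: "paths reconstructed with known-good encodings; %2F may appear as /",
    3: "paths taken directly from the request",
}


def path_accuracy(environ):
    """Return the reported level, treating a missing key as level 0."""
    return environ.get(PATH_ACCURACY_KEY, 0)


def warn_if_guessed(environ, required=2):
    """Warn the deployer when the gateway reports less than `required`.

    Returns True when the reported level is sufficient, False otherwise.
    """
    level = path_accuracy(environ)
    if level >= required:
        return True
    note = PATH_ACCURACY_NOTES.get(level, "unknown accuracy level")
    warnings.warn(
        "gateway reports path accuracy %r (%s); this application expects "
        "at least %r" % (level, note, required)
    )
    return False
```

Treating a missing key as level 0 follows the backwards-compatibility suggestion in the quoted proposal; a simple guessed/accurate flag would reduce this to a single boolean check.<br>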


 <br>It seems that it would be possible to create a WSGI application and client library that together can detect and help resolve these issues.  E.g., the application always returns the values of script_name, path_info, and query_string, and the client fires off a bunch of different requests to see how they get interpreted.  It could suggest corrections until everything passes.<br>
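<br>A minimal sketch of the application half, plus the comparison the client would run on each response. It reads the existing WSGI 1 keys SCRIPT_NAME, PATH_INFO and QUERY_STRING because the lowercase keys are still only proposed; the names echo_app and mismatches and the JSON response format are assumptions for illustration.<br>

```python
import json


def echo_app(environ, start_response):
    """Report the path values exactly as the gateway handed them over."""
    seen = {
        "script_name": environ.get("SCRIPT_NAME", ""),
        "path_info": environ.get("PATH_INFO", ""),
        "query_string": environ.get("QUERY_STRING", ""),
    }
    body = json.dumps(seen).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def mismatches(expected, body):
    """Compare what the client sent with what the application saw.

    Returns a dict mapping each differing key to (expected, seen).
    """
    seen = json.loads(body.decode("utf-8"))
    return {
        key: (want, seen.get(key))
        for key, want in expected.items()
        if seen.get(key) != want
    }
```

A client would request a series of awkward URLs (an encoded slash, non-ASCII bytes, a trailing slash) through the real gateway and pass each response body to mismatches; an empty result means the gateway preserved that request.<br>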


<br>I would really like to see concerns over bad gateways not be used to keep valuable information out of the spec.  We want people to use well-configured gateways that accurately represent requests.  There are limits, e.g., in environments where information is lost.  The only really problematic example is losing the distinction between %2f and /, and I think it&#39;s reasonable to suggest that applications should avoid making that distinction in the path if they want to be easily deployed in different environments.<br clear="all">


<br></div></div><br>-- <br>Ian Bicking  |  <a href="http://blog.ianbicking.org">http://blog.ianbicking.org</a><br>