On Fri, Sep 17, 2010 at 9:43 AM, And Clover <span dir="ltr"><<a href="mailto:and-py@doxdesk.com">and-py@doxdesk.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On 09/17/2010 02:03 PM, Armin Ronacher wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
In case we change the spec as Ian mentioned above, I am all for<br>
a "wsgi.guessed_encoding" = True flag or something like that.<br>
</blockquote>
<br></div>
Yes, I'd like to see that. I believe going with *only* a raw-or-reconstructed path_info, rather than having both path_info and PATH_INFO, is probably best, for the middleware-duplication reasons PJE mentioned.<br>
<br>
A more in-depth possibility might be:<br>
<br>
wsgi.path_accuracy =<br>
<br>
0: script_name/path_info have been crudely reconstructed from<br>
SCRIPT_NAME/PATH_INFO from an unknown source. Beware!<br>
If there is to be backwards compatibility with WSGI 1, this<br>
would be treated as the default value when path_accuracy is missing.<br>
<br>
1: script_name/path_info have been reconstructed, but it is known<br>
that path_info is accurate, other than %2F and non-ASCII issues.<br>
That is, it's known that the path doesn't come from IIS's broken<br>
PATH_INFO, or the IIS error has been detected and compensated for.<br>
<br>
2: script_name/path_info have been reconstructed using known-good<br>
encodings for the env. The only way in which they may differ from<br>
the original request path is that a slash might originally have<br>
been a %2F. (This is good enough for the vast majority of<br>
applications.)<br>
<br>
3: script_name/path_info come directly from the request path<br>
without any intervening mangling.</blockquote><div><br>path_accuracy is certainly a better name than encoding; nothing here actually relates to encoding (except insofar as attempts to encode or re-encode values corrupt the path). Personally, I wouldn't want to split it up this much; I'd rather have a simple flag indicating whether something was guessed or the request is represented accurately. The only real value I see in it is helping people debug problems. Maybe. I'm not sure it's realistic to expect people who deploy software and run into problems to notice this. A helpful application could use it to warn the deployer of potential problems.<br>
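A minimal Python sketch of how a helpful application might act on such a flag. It assumes the proposed, hypothetical environ key wsgi.path_accuracy from the quoted message (it is not part of PEP 333 or PEP 3333), treats a missing key as level 0 as suggested there, and uses an arbitrary warning threshold. The names warn_on_guessed_path and path_accuracy are illustrative only.<br>

```python
import warnings

# Hypothetical environ key from the proposal in this thread; it is NOT part
# of PEP 333 or PEP 3333.
PATH_ACCURACY_KEY = "wsgi.path_accuracy"

# Meanings of the proposed levels, summarised from the quoted message.
LEVEL_NOTES = {
    0: "path crudely reconstructed from SCRIPT_NAME/PATH_INFO of unknown origin",
    1: "path accurate except for %2F and non-ASCII issues",
    2: "path accurate except that a '/' may originally have been '%2F'",
    3: "path taken directly from the request without mangling",
}


def path_accuracy(environ):
    """Return the proposed accuracy level; a missing key counts as 0 (WSGI 1 default)."""
    return environ.get(PATH_ACCURACY_KEY, 0)


def warn_on_guessed_path(app, minimum=2):
    """Wrap a WSGI app and warn the deployer when the path may be unreliable.

    `minimum` is an illustrative threshold: any level below it triggers a
    RuntimeWarning before the request is passed on unchanged.
    """
    def wrapper(environ, start_response):
        level = path_accuracy(environ)
        if level < minimum:
            warnings.warn(
                "request path may be unreliable (%s=%d: %s)"
                % (PATH_ACCURACY_KEY, level,
                   LEVEL_NOTES.get(level, "unknown level")),
                RuntimeWarning,
                stacklevel=2,
            )
        return app(environ, start_response)
    return wrapper
```

A simple boolean "guessed" flag would fit the same pattern: the threshold check just becomes a single truth test.<br>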
<br>It seems it would be possible to create a WSGI application and client library that together detect and help resolve these issues. For example, the application always returns the values of script_name, path_info, and query_string, and the client fires off a range of different requests to see how each one is interpreted. It could then suggest corrections until everything passes.<br>
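A rough Python sketch of the server half of such a tool. It echoes the standard WSGI 1 keys SCRIPT_NAME, PATH_INFO and QUERY_STRING, plus the proposed (hypothetical) lower-case keys, back as JSON. echo_app and compare are illustrative names, and the client that actually sends the probe requests is left out.<br>

```python
import json

# Keys echoed back to the probing client. The upper-case names are standard
# WSGI 1 environ keys; the lower-case ones are the proposed, hypothetical
# names discussed in this thread and will usually be absent (reported as null).
ECHO_KEYS = ("SCRIPT_NAME", "PATH_INFO", "QUERY_STRING",
             "script_name", "path_info", "wsgi.path_accuracy")


def echo_app(environ, start_response):
    """Report how the gateway presented the request, as a JSON object."""
    report = {key: environ.get(key) for key in ECHO_KEYS}
    body = json.dumps(report, sort_keys=True).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def compare(expected, reported):
    """Client side: list the keys whose reported value differs from the expected one."""
    return sorted(key for key, value in expected.items()
                  if reported.get(key) != value)
```

A client might request, say, /app/a%20b?x=1 and expect PATH_INFO to come back as "/a b"; compare() then lists whichever keys the gateway presented differently.<br>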
<br>I would really like concerns about bad gateways not to be used to keep valuable information out of the spec. We want people to use well-configured gateways that accurately represent requests. There are limits, e.g., in environments where information is lost. The only really problematic example is losing the distinction between %2f and /, and I think it's reasonable to suggest that applications should avoid making that distinction in the path if they want to be easily deployed in different environments.<br clear="all">
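To make the %2f problem concrete, a small Python sketch that approximates what a typical WSGI 1 gateway does with the raw request path: percent-decode it into PATH_INFO, exposed as a latin-1 "native string" per PEP 3333. The function names are illustrative, and real gateways differ in their details.<br>

```python
from urllib.parse import unquote


def gateway_path_info(raw_path):
    """Approximate how a typical WSGI 1 gateway builds PATH_INFO.

    The raw path is percent-decoded and the bytes are exposed as latin-1
    characters. After this step a slash that arrived as %2F cannot be told
    apart from a literal slash.
    """
    return unquote(raw_path, encoding="latin-1")


def path_segments(path_info):
    """Split PATH_INFO into non-empty segments, as a simple router might."""
    return [segment for segment in path_info.split("/") if segment]


for raw in ("/files/a%2Fb", "/files/a/b"):
    print(raw, "->", path_segments(gateway_path_info(raw)))
```

Both requests yield ['files', 'a', 'b'], so an application behind such a gateway cannot tell them apart.<br>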
<br></div></div><br>-- <br>Ian Bicking | <a href="http://blog.ianbicking.org">http://blog.ianbicking.org</a><br>