On Tue, May 5, 2009 at 10:14 PM, Graham Dumpleton <span dir="ltr"><<a href="mailto:graham.dumpleton@gmail.com">graham.dumpleton@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
2009/5/6 Ian Bicking <<a href="mailto:ianb@colorstudy.com">ianb@colorstudy.com</a>>:<br>
<div class="im">> Philip Jenvey brought this to my attention:<br>
><br>
> <a href="http://www.python.org/dev/peps/pep-0383/" target="_blank">http://www.python.org/dev/peps/pep-0383/</a><br>
><br>
> It's a UTF-8 decoding and encoding scheme that represents illegal bytes in such<br>
> a way that you can encode the string back to get the original bytes object, and<br>
> thus transcode to another encoding. It's intended for cases exactly like WSGI.<br>
<br>
</div>Care to explain then how that would in practice be used while I try<br>
and reread it a few times to try and understand it myself? :-)<br>
</blockquote><div><br>I don't know exactly, but I think you'd do something like:<br><br>environ['PATH_INFO'] = urllib.parse.unquote_to_bytes(http_byte_path).decode('utf8', 'python-escape')<br>
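A minimal sketch of that decoding step. It uses Python 3's name for the handler, 'surrogateescape' (the name under which the proposed 'python-escape' handler shipped in Python 3.1). The sample path and the name `raw_path` are made up for illustration. Each undecodable byte becomes a lone surrogate code point, and encoding with the same codec and handler gives back the exact original bytes:

```python
from urllib.parse import unquote_to_bytes

# Hypothetical raw request path: %C3%A9 is valid UTF-8 (é), %FF is not.
raw_path = b"/caf%C3%A9/%FF"

# Percent-decode to bytes first, then decode as UTF-8. The undecodable
# byte 0xFF becomes the lone surrogate U+DCFF instead of raising an error.
path_info = unquote_to_bytes(raw_path).decode("utf-8", "surrogateescape")
assert path_info == "/caf\u00e9/\udcff"

# Encoding with the same codec and handler restores the original bytes.
assert path_info.encode("utf-8", "surrogateescape") == b"/caf\xc3\xa9/\xff"
```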
<br>Then, if the encoding was wrong, you could transcode like this:<br><br>environ['PATH_INFO'] = environ['PATH_INFO'].encode('utf8', 'python-escape').decode('latin1', 'python-escape')<br>
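A sketch of the transcoding step. It uses 'surrogateescape', the name under which the proposed 'python-escape' handler shipped in Python 3.1. The helper name `transcode` is hypothetical. The idea is to undo the first decode to recover the exact bytes, then decode those bytes with the intended codec:

```python
def transcode(text, wrong_encoding, right_encoding):
    """Re-interpret text that was decoded with the wrong codec (hypothetical helper)."""
    # Undo the first decode; surrogateescape turns escaped bytes back into raw bytes.
    raw = text.encode(wrong_encoding, "surrogateescape")
    # Decode the recovered bytes with the intended codec.
    return raw.decode(right_encoding, "surrogateescape")


# b"/caf\xe9" is Latin-1 data; decoding it as UTF-8 escapes the 0xE9 byte.
path_info = b"/caf\xe9".decode("utf-8", "surrogateescape")
assert path_info == "/caf\udce9"

# Transcoding recovers the intended text.
assert transcode(path_info, "utf-8", "latin-1") == "/caf\u00e9"
```

Every byte is valid Latin-1, so the handler on the final decode only matters when the target codec can itself fail.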
<br>Note that you need to know both the encoding that was used (UTF-8 in this case) and that python-escape was the error handler. It has been suggested that the server should put the encoding it used into the environ; when transcoding, that value should be updated as well.<br>
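One way the "record the encoding in the environ" suggestion might look, sketched under assumptions. The key `example.path_info_encoding` is hypothetical and is not defined by WSGI. The function names are also made up. The handler is spelled 'surrogateescape', its name in Python 3.1+. The server records the codec it used, and the transcoding step keeps that record in sync:

```python
# HYPOTHETICAL environ key: not part of WSGI, used only for illustration.
ENCODING_KEY = "example.path_info_encoding"


def set_path_info(environ, raw_path, encoding="utf-8"):
    """Server side: decode the (already percent-decoded) path and record the codec."""
    environ["PATH_INFO"] = raw_path.decode(encoding, "surrogateescape")
    environ[ENCODING_KEY] = encoding


def transcode_path_info(environ, new_encoding):
    """Application side: re-decode PATH_INFO and update the recorded codec."""
    old_encoding = environ[ENCODING_KEY]
    raw = environ["PATH_INFO"].encode(old_encoding, "surrogateescape")
    environ["PATH_INFO"] = raw.decode(new_encoding, "surrogateescape")
    environ[ENCODING_KEY] = new_encoding


environ = {}
set_path_info(environ, b"/caf\xe9")       # Latin-1 bytes, decoded as UTF-8 by default
transcode_path_info(environ, "latin-1")   # the application knows better
assert environ["PATH_INFO"] == "/caf\u00e9"
assert environ[ENCODING_KEY] == "latin-1"
```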
<br></div></div>It's not clear what python-escape is going to do; I don't think that's been determined. Probably it'll put \x00 or something similar in the unicode string to mark raw bytes.<br><br>-- <br>Ian Bicking | <a href="http://blog.ianbicking.org">http://blog.ianbicking.org</a><br>