[Python-Dev] PEP 3333: wsgi_string() function

Nick Coghlan ncoghlan at gmail.com
Mon Jan 10 18:55:09 CET 2011

On Tue, Jan 11, 2011 at 3:24 AM, Ian Bicking <ianb at colorstudy.com> wrote:
> The kind of object PJE was referring to is more like Ruby's strings, which
> do not embed the encoding inside the bytes themselves but have the encoding
> as a kind of annotation on the bytes, and do lazy transcoding when combining
> strings of different encodings.  The goal with respect to WSGI is that you
> could annotate bytes with an encoding but also change or fix that encoding
> if other out-of-band information implied that you got the encoding wrong
> (e.g., some data is submitted with the encoding of the page the browser was
> on, and so nothing inside the request itself will indicate the encoding of
> the data).  Latin1 is kind of the poor man's version of this -- it's a good
> guess at an encoding, that at worst requires transcoding that can be done in
> a predictable way.  (Personally I think Latin1 gets us 99% of the way there,
> and so bytes-of-a-known-encoding are not really that important to the WSGI
> case.)
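
To make the Latin1 point concrete, here is a minimal sketch of why the
guess is safe: decoding as Latin1 never fails and never loses
information, so the original bytes can always be recovered and decoded
again once the real encoding is known. The helper name reinterpret() is
made up for illustration and is not part of WSGI or PEP 3333.

```python
def reinterpret(native: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: undo a Latin1 guess and decode with the real encoding."""
    # Latin1 maps every byte 0-255 to the code point with the same number,
    # so encoding back to Latin1 reproduces the original bytes exactly.
    return native.encode("latin-1").decode(encoding)


raw = "café".encode("utf-8")      # bytes as they arrived on the wire
guessed = raw.decode("latin-1")   # the "poor man's" guess; cannot raise

# The guess is lossless: the original bytes are still recoverable.
assert guessed.encode("latin-1") == raw

# Once out-of-band information says the data was UTF-8, fix it up.
fixed = reinterpret(guessed, "utf-8")
assert fixed == "café"
```

The transcoding step only fails if the corrected encoding is itself
wrong for the data, which is the same failure a bytes-with-annotation
type would have to report.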

Having done the upgrade to urllib to support direct manipulation of
byte sequences, I don't think such a type would help as much as people
hoped anyway. Converting to Unicode, manipulating as text and
converting back really *is* the right way to do text manipulation
(however, providing bytes-in-bytes-out APIs that do the conversions
for you can also be quite convenient).
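
As a rough sketch of that pattern (not the actual urllib code), a
bytes-in-bytes-out helper can decode, do the work as text, and encode
the result on the way out. The function name collapse_whitespace() and
its default encoding are assumptions made for the example; the
urlsplit() call at the end just shows the existing urllib.parse
behaviour of returning bytes when given bytes.

```python
from urllib.parse import urlsplit


def collapse_whitespace(data: bytes, encoding: str = "latin-1") -> bytes:
    """Hypothetical bytes-in-bytes-out helper: the caller never sees str."""
    text = data.decode(encoding)       # convert to Unicode
    text = " ".join(text.split())      # manipulate as text
    return text.encode(encoding)       # convert back to bytes


value = collapse_whitespace(b"  text/html;   charset=utf-8 ")
assert value == b"text/html; charset=utf-8"

# urllib.parse follows the same idea: bytes in, bytes out.
parts = urlsplit(b"http://example.com/path?q=1")
assert parts.path == b"/path"
assert parts.query == b"q=1"
```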


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
