[Python-Dev] email package status in 3.X

Sun Jun 20 22:10:00 CEST 2010

On Sun, Jun 20, 2010 at 2:40 PM, P.J. Eby <pje at telecommunity.com> wrote:
> At 10:57 AM 6/20/2010 -0700, Guido van Rossum wrote:
>>
>> The problem comes exactly where you find it: when *porting* existing
>> code that uses aforementioned ways to alleviate the pain, you find
>> that the hacks no longer work and a properly layered design is needed
>> that clearly distinguishes between which variables contain bytes and
>> which text.
>
> Actually, I would say that it's more that (in the network protocol case) we
> *have* bytes, some of which we would like to *treat* as text, yet do not
> wish to constantly convert back and forth to full-blown unicode --
> especially since the protocols themselves designate ASCII or latin-1 at the
> transport layer (sometimes with odder encodings above, but these already
> have to be explicitly dealt with by existing code).
>
> While reading over this thread, I'm wondering whether at least my
> (WSGI-related) problems in this area would be solved by the availability of
> a type (say "bstr") that was simply a wrapper providing string-like behavior
> over an underlying bytes, byte array, or memoryview, that would produce
> objects of compatible type when combined with strings (by encoding them to
> match).
>
> Then, I could wrap bytes with it to pass them to string operations, and then
> feed them back into everything else.  The bstr type ideally would be
> directly compatible with bytes I/O, or at least have a .bytes attribute that
> would be.
>
> It seems like that would reduce WSGI porting issues quite a bit, since it
> would mostly consist of throwing extra bstr() calls in where things are
> breaking, and maybe grabbing the .bytes attribute for I/O.
>
> This approach would still be explicit as to what types you're working with,
> but would not require O(n) *conversions* at every interaction boundary.  It
> would be limited, of course, to single-byte encodings with all characters
> (0-255) valid.
>
> OTOH, maybe there should just be a bytestrings module with bytestrings.ascii
> and bytestrings.latin1, and between the two that should cover the network
> protocol needs quite well.
>
> Actually, if the Python 3 str() constructor could do O(1) conversion for the
> latin-1 case (i.e., just wrapped the underlying bytes), I would just put,
> "bstr = lambda x: str(x,'latin-1')" at the top of my programs and have
> roughly the same effect.
>
> This idea is still a bit half-baked, but a more baked version might be just
> the ticket for porting stuff that used str to work with bytes in 2.x, if
> only because writing, e.g.:
>
>     newurl = bstr(urljoin(bstr(base), 'subdir'))
>
> seems so much saner than writing *this* everywhere:
>
>     newurl = urljoin(str(base, 'latin-1'), 'subdir').encode('latin-1')
>
> It is perhaps a bit late to propose this idea, since ideally we would also
> want to use it in 2.x to aid porting.  But I'm curious if any other people
> here experiencing byte/unicode woes in relation to network protocols would
> find this a solution to their chief frustration.  (i.e., that the stdlib
> often insists now on strings, where effectively bytes were usable before,
> and thus one must do conversions both coming and going.)
>

I hate to reply with a simple +1, but I've heard this pain and
proposal from a frightening number of people. Something that allowed
you to use bytes with some of the string methods would go a really long
way toward solving a lot of people's Python 3 pain. I don't relish the
idea that once people start moving over, there might be a billion
implementations of "things like this".
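
To make the idea concrete, here is a rough sketch of what such a wrapper
might look like. Everything in it is an assumption for illustration: the
class name bstr and the .bytes attribute come from the quoted proposal,
but the methods chosen and the default encoding are hypothetical, and no
such type exists in the stdlib. It also copies its input, so it does not
deliver the O(1) wrapping the proposal hopes for.

```python
class bstr:
    """Hypothetical str-like wrapper over bytes in a single-byte encoding."""

    def __init__(self, data, encoding='latin-1'):
        if isinstance(data, bstr):
            data = data.bytes
        elif isinstance(data, str):
            # Text operands are encoded to match the wrapped bytes.
            data = data.encode(encoding)
        # bytes() accepts bytes, bytearray and memoryview (and copies them).
        self.bytes = bytes(data)
        self.encoding = encoding

    def _raw(self, other):
        # Coerce str, bytes-like or bstr operands to plain bytes.
        return bstr(other, self.encoding).bytes

    def __add__(self, other):
        return bstr(self.bytes + self._raw(other), self.encoding)

    def __radd__(self, other):
        return bstr(self._raw(other) + self.bytes, self.encoding)

    def startswith(self, prefix):
        return self.bytes.startswith(self._raw(prefix))

    def split(self, sep=None):
        sep = None if sep is None else self._raw(sep)
        return [bstr(part, self.encoding) for part in self.bytes.split(sep)]

    def __str__(self):
        return self.bytes.decode(self.encoding)

    def __repr__(self):
        return 'bstr(%r)' % (self.bytes,)


# Mixing wrapped bytes with ordinary strings yields bstr objects again.
header = bstr(b'Content-Type') + ': text/html'
name, value = header.split(': ')
```

The .bytes attribute is what would be handed to socket or file I/O; the
string-flavored methods only exist to keep the calling code readable.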

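For comparison, the plain latin-1 route mentioned in the quoted message
can be sketched as below. It works because latin-1 maps each byte value
0-255 to the code point with the same number, so decoding and then
re-encoding is lossless. The helper name bstr follows the quoted lambda,
and the example URL is a placeholder. Both conversions still copy the
data, which is the cost the proposal wants to avoid.

```python
from urllib.parse import urljoin


def bstr(data):
    # Hypothetical helper mirroring the lambda in the quoted message.
    return str(data, 'latin-1')


# Every possible byte value survives the decode/encode round trip.
raw = bytes(range(256))
assert bstr(raw).encode('latin-1') == raw

base = b'http://example.com/app/'  # placeholder URL, bytes off the wire
# Decode for the str-only API, then encode the result back for I/O.
newurl = urljoin(bstr(base), 'subdir').encode('latin-1')
```
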
jesse