tl;dr: At the end I'm volunteering to look at real code that is having porting problems.

On Sat, 11 Jan 2014 17:33:17 +0100, "M.-A. Lemburg" <mal@egenix.com> wrote:
> asciistr is interesting in that it coerces to bytes instead of to Unicode (as is the case in Python 2).
>
> At the moment it doesn't cover the more common case bytes + str, just str + bytes, but let's assume it would, then you'd write:
>
>     ...
>     headers += asciistr('Length: %i bytes\n' % 123)
>     headers += b'\n\n'
>     body = b'...'
>     socket.send(headers + body)
>     ...
>
> With PEP 460, you could write the above as:
>
>     ...
>     headers += b'Length: %i bytes\n' % 123
>     headers += b'\n\n'
>     body = b'...'
>     socket.send(headers + body)
>     ...
>
> IMO, that's more readable.
>
> Both variants essentially do the same thing: they implicitly coerce ASCII text strings to bytes, so conceptually, there's little difference.
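Here is how the two quoted snippets play out on a modern Python 3. Plain `str + bytes` raises `TypeError`, which is the gap asciistr tries to fill. The PEP 460 style uses bytes %-interpolation, which eventually landed in Python 3.5 via PEP 461, so this sketch assumes 3.5 or later. `build_message` is a hypothetical helper, and the socket call is dropped so the example runs standalone:

```python
def build_message(length, body):
    """Hypothetical helper: assemble header and body entirely as bytes."""
    headers = b''
    # bytes %-interpolation: available from Python 3.5 (PEP 461).
    headers += b'Length: %i bytes\n' % length
    headers += b'\n\n'
    return headers + body


# Without a coercion helper, Python 3 refuses to mix text and bytes.
try:
    'Length: 3 bytes\n' + b'...'
except TypeError as exc:
    print('TypeError:', exc)

print(build_message(123, b'...'))  # b'Length: 123 bytes\n\n\n...'
```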
And if we are explicit:

    headers = u'Length: %i bytes\n' % 123
    headers += u'\n\n'
    body = b'...'
    socket.send(headers.encode('ascii') + body)

(I included the 'u' prefix only because we are talking about shared-codebase python2/python3 code.)

That looks pretty readable to me, and it is explicit about what parts are text and what parts are binary.

But of course we'd never do exactly that in any but the simplest of protocols and scripts. Instead we'd write a library with one or more objects that model our wire/file protocol. For the text parts, the API would accept input as text strings; for the binary parts, it would accept input as bytes. Then, when reading or writing the data stream, we perform the appropriate conversions on the appropriate parts. Our library does a more complex analog of "socket.send(headers.encode('ascii') + body)": one that understands the various parts and glues them together, encoding the text parts to the appropriate encoding (often, but not always, ascii) as it does so.

And yes, I have written code that does this in Python3. What I haven't done is write that code to run in both Python3 and Python2. I *think* the only missing thing I would need to back-port it is the surrogateescape error handler, but I haven't tried it. And I could probably conditionalize the code to use latin1 on python2 instead and get away with it.

And please note that email is probably the messiest of messy binary wire protocols. Not only do you have bytes and text mixed in the same data stream, with internal markers (in the text parts) that specify how to interpret the binary, including what encoding each part of that binary data is in for cases where that matters; you *also* have to deal with the possibility of *invalid* binary data mixed in with the ostensibly text parts, which you are nevertheless expected to both preserve and parse around.
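Here is a minimal sketch of such a library object, under clearly stated assumptions. The `WireMessage` class, its `from_bytes`/`to_bytes` methods, and the toy wire format (ASCII `Name: value` header lines, a blank line, then a binary body) are all hypothetical. They are not taken from the email package or any real protocol. Text parts are exposed as `str` and the body stays `bytes`. The surrogateescape error handler lets invalid bytes in the header block be handled as text and still be written back unchanged:

```python
class WireMessage:
    """Hypothetical protocol model: text headers followed by a binary body."""

    def __init__(self, headers, body):
        # Text parts of the protocol are accepted as str...
        if not all(isinstance(k, str) and isinstance(v, str)
                   for k, v in headers.items()):
            raise TypeError('header names and values must be str')
        # ...and the binary part as bytes.
        if not isinstance(body, bytes):
            raise TypeError('body must be bytes')
        self.headers = dict(headers)
        self.body = body

    @classmethod
    def from_bytes(cls, data):
        # Split the header block from the body at the first blank line.
        head, _, body = data.partition(b'\n\n')
        # Decode headers as ASCII; invalid bytes become lone surrogates
        # (surrogateescape) instead of raising or being lost.
        text = head.decode('ascii', 'surrogateescape')
        headers = {}
        for line in text.split('\n'):
            if not line:
                continue
            name, _, value = line.partition(': ')
            headers[name] = value
        return cls(headers, body)

    def to_bytes(self):
        # Build the header block as text, then encode it once at the boundary;
        # surrogateescape maps preserved invalid bytes back to their originals.
        text = ''.join('%s: %s\n' % item for item in self.headers.items()) + '\n'
        return text.encode('ascii', 'surrogateescape') + self.body


data = b'Type: data\nNote: caf\xe9\n\n\x00\x01\x02'
msg = WireMessage.from_bytes(data)
print(msg.to_bytes() == data)  # True
```

The round trip returns identical bytes even though `\xe9` is not valid ASCII. A header value that is genuinely non-ASCII text, such as `'café'`, still raises `UnicodeEncodeError` on output, because surrogateescape only maps back the lone surrogates it created during decoding.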
When I started adding back binary support to the email package, I was really annoyed by the lack of certain string features in the bytes type. But in the end, it turned out to be really simple to instead think of the text-with-invalid-bytes parts as *text*-with-invalid-bytes (surrogateescaped bytes). Now, if I were designing from the ground up, I'd store the stuff that was really binary as bytes in the model object instead of storing it as surrogateescaped text. But that problem is a consequence of how we got from there to here (python2-email to python3-email-that-didn't-handle-8bit-data to python3-email-that-works), rather than a problem with the python3 core data model.

So it seems like I'm with Nick and Antoine and company here. The byte-interpolation proposed by Antoine seems reasonable, but I don't see the *need* for the other stuff. I think that programs will be cleaner if the text parts of the protocol are handled *as text*.

On the other hand, Ethan's point that bytes *does* have text methods is true. However, other than the perfectly-sensible-for-bytes split, strip, and endswith/startswith, I don't think I actually use any of them.

But! Our goal should be to help people convert to Python3. So how can we find out what specific problems real-world programs are facing, look at the *actual code*, and help each project figure out the best way to make that code work in both python2 and python3? That seems like the best way to find out what needs to be added to python3 or PyPI: help port the actual code of the developers who are running into problems.

Yes, I'm volunteering to help with this, though of course I can't promise exactly how much time I'll have available.

--David