For not caring much, your own stubbornness is quite notable throughout this discussion. Stones and glass houses. :)

That said:

Twisted and Mercurial aren't the only ones hurt by this, not at all. I'm aware of at least two other projects that are actively hindered in their support of, or migration to, Python 3 by the bytes type lacking some basic functionality that "strings" had in 2.x.

The purity crowd in here has brought up that it was an important and serious decision to split Text from Bytes in Py3, and I actually agree with that. However, that position is missing some very real and very concrete use-cases -- there are multiple situations where a byte stream has a known text subset that code really, really does need to operate on.

There have been a number of examples given: PDF, HTTP, network streams that switch inline from text-ish to binary and back again. But we can focus that down to a very narrow and not at all uncommon situation in the latter.

Look at the HTTP Content-Length header. HTTP headers are fuzzy. My understanding is that, per the RFCs, their body can be arbitrary octets to the exclusion of line feeds and DELs -- my understanding may be a bit off here, and please feel free to correct me -- but the relevant specifications are a bit fuzzy to begin with.

To my understanding of the spec, the header field name is essentially an ASCII text field (sans separator), and the body is... anything, or nearly anything. This is HTTP, which is surely one of the most used protocols in the world.

The need to be able to assemble and disassemble such streams is a real, valid use-case.
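
To make the disassembly half concrete, here is a rough sketch of splitting one header line while leaving the value as raw bytes. The name split_header_line is made up for illustration, and the parsing is deliberately much simpler than what the RFCs actually require.

```python
def split_header_line(line):
    """Split b"Name: value\\r\\n" into (name, value), both kept as bytes."""
    # Hypothetical helper: real HTTP parsing has more rules than this.
    name, sep, value = line.rstrip(b"\r\n").partition(b":")
    if not sep:
        raise ValueError("not a header line: %r" % (line,))
    # The field name is the ASCII-ish part. The value may hold arbitrary
    # octets, so it is never decoded here; only ASCII whitespace is trimmed.
    return name.strip(), value.strip()


name, value = split_header_line(b"Content-Length: 1234\r\n")
```

Nothing here ever touches an encoding, yet every step is a "string-like" operation on bytes.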

With that in mind, look again at the Content-Length header I mentioned. It seems those who are declaring a purity priority in bytes/string separation think it reasonable to do things like:

headers.append((b"Content-Length", ("%d" % len(content)).encode("ascii")))

Or something. In the middle of processing a stream, you need to convert this number into a string then encode it into bytes to just represent the number as the extremely common, widely-accessible 7-bit ascii subset of its numerical value. This isn't some rare, grandiose or fiendish undertaking, or trying to merge Strings and Bytes back together: this is the simple practical recognition that representing a number as its ascii-numerical value is actually not at all uncommon.
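
For anyone who wants to see the whole round trip, here is a minimal sketch of assembling a header block this way on Python 3.3. The helper names content_length_header and render_headers, and the example host and payload, are invented for illustration and don't come from any of the projects mentioned.

```python
def content_length_header(content):
    # Hypothetical helper: int -> str -> bytes, just to get ASCII digits.
    length = str(len(content)).encode("ascii")
    return (b"Content-Length", length)


def render_headers(headers):
    # Join (name, value) pairs of bytes into a header block ending in a blank line.
    lines = [name + b": " + value + b"\r\n" for name, value in headers]
    return b"".join(lines) + b"\r\n"


headers = [(b"Host", b"example.org")]
content = b"\x00\x01binary payload\xff"  # body that is not text at all
headers.append(content_length_header(content))
wire = render_headers(headers) + content
```

The only place text enters the picture is the detour through str to spell out a number.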

This position seems utterly astonishing in its ridiculousness to me. That representing the number 123 as b"123" is treated as a controversial thing surprises me, considering how often I see it in real life.

There is a LOT of code out there which needs a little bit of a middle ground between bytes and strings; it doesn't mean you are giving way and allowing strings and bytes to merge and giving up on the Edict of Separation. But there are real world use-cases where you simply need to be able to do many basic "String" like operations on byte-streams.

The removal of the ability to use interpolation to construct such byte strings was a major regression in Python 3 and is a big hurdle for more than a few projects to upgrade.

I mean, it's not like the "bytes" type lacks knowledge of the subset of bytes that happen to be 7-bit ascii-compatible and can't perform text-ish operations on them--

Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> b"stephen hansen".title()
b'Stephen Hansen'

How is this not a practical recognition that yes, while bytes are byte streams and not text, a huge subset of bytes are text-y, and as long as we maintain the barrier between higher characters and implicit conversion therein, we're fine?
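
A few more of the same kind, as a small sketch using only methods that already exist on bytes in Python 3; the variable names are just for illustration.

```python
# ASCII-aware operations that bytes already supports today.
titled = b"content-length".title()   # title-cases the ASCII letters
digits_ok = b"12345".isdigit()       # checks for ASCII digits
number = int(b"12345")               # int() parses ASCII digits straight from bytes
mixed = b"\xe9cole".upper()          # only ASCII letters change; 0xE9 is left alone
```

Going from b'12345' to 12345 already works; it is only the reverse direction that requires the detour through str.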

I don't see the difference here. There is a very real, practical need to interpolate bytes. This very real, practical need includes the very real recognition that converting 12345 to b'12345' is not something weird, unusual, and subject to the thorny issues of Encodings. It is not violating the doctrine of separation of powers between Text and Bytes.

Personally, I won't be converting my day job's codebase to Python 3 anytime soon (where 'soon' is defined as 'within five years', assuming a best-case scenario in which a number of third-party issues are resolved). But! I'm aware of and involved with other projects, and this has bitten two of them specifically. I'm sure there are others who are not aware of this list or don't feel comfortable talking on it (as it is, I encouraged one of those projects' coders to speak up, but they thought the question was a lost one due to previous responses on the original issue ticket and gave up).

On Fri, Jan 10, 2014 at 6:04 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

On Fri, 10 Jan 2014 20:53:09 -0500
"Eric V. Smith" <eric@trueblade.com> wrote:
>
> So, I'm -1 on the PEP. It doesn't address the cases laid out in issue
> 3982. See for example http://bugs.python.org/issue3982#msg180432 .

Then we might as well not do anything, since any attempt to advance
things is met by stubborn opposition in the name of "not far enough".

(I don't care much personally, I think the issue is quite overblown
anyway)

Regards

Antoine.
