On 11.01.2014 14:54, Georg Brandl wrote:
On 11.01.2014 14:49, Georg Brandl wrote:
On 11.01.2014 10:44, Stephen Hansen wrote:
I mean, it's not like the "bytes" type lacks knowledge of the subset of bytes that happen to be 7-bit ASCII-compatible and can't perform text-ish operations on them--
Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> b"stephen hansen".title()
b'Stephen Hansen'
How is this not a practical recognition that yes, while bytes are byte streams and not text, a huge subset of bytes are text-y, and as long as we maintain the barrier between higher characters and implicit conversion therein, we're fine?
I don't see the difference here. There is a very real, practical need to interpolate bytes. This very real, practical need includes the very real recognition that converting 12345 to b'12345' is not something weird, unusual, and subject to the thorny issues of Encodings. It is not violating the doctrine of separation of powers between Text and Bytes.
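For illustration, here is a minimal sketch of what such interpolation looks like with the tools already available in Python 3.3: the integer is formatted as text and then encoded as ASCII. The helper name `ascii_bytes` and the sample header are made up for this example and are not part of any proposal in this thread.

```python
def ascii_bytes(value):
    """Hypothetical helper: render an int as ASCII digits in a bytes object."""
    # str() of an int contains only ASCII characters ('-' and 0-9),
    # so encoding to ASCII cannot fail here.
    return str(value).encode("ascii")

# Build a bytes message around a number; no non-ASCII data is involved.
header = b"Content-Length: " + ascii_bytes(12345) + b"\r\n"
```

The round trip through str is the workaround being discussed; direct interpolation into bytes would only remove that extra step.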
This. Exactly. Thanks for putting it so nicely, Stephen.
To elaborate: if the bytes type didn't have all this ASCII-aware functionality already, I think we would have (and be using) a dedicated "asciistr" type right now. But it has the functionality, and it's way too late to remove it.
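As a rough sketch of the ASCII-aware functionality mentioned above, the following uses only standard bytes methods. The sample data is invented for illustration; the point is that the ASCII range is handled while bytes above 0x7F pass through unchanged.

```python
# Split and inspect an ASCII-compatible header line without decoding it.
raw = b"Content-Length: 42"
name, value = raw.split(b":", 1)
value = value.strip()          # strips ASCII whitespace
is_number = value.isdigit()    # checks for ASCII digits only

# Case conversion affects ASCII letters only; the Latin-1 byte 0xE9
# (outside the ASCII range) is left untouched.
shouted = b"caf\xe9 au lait".upper()
```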
I think we need to step back a little from the purist view of things and put more emphasis on the "practicality beats purity" Zen.

I completely agree with Stephen that bytes are in fact often an encoding of text. If that text is ASCII compatible, I don't see any reason why we should not continue to expose the C lib standard string APIs available for text manipulations on bytes. We don't have to be pedantic about the bytes/text separation. It doesn't help in real life.

If you give programmers the choice, they will, most of the time, do the right thing. If you don't give them the tools, they'll work around the missing features in a gazillion different ways, many of which will probably miss a few edge cases.

bytes objects already have most of the 8-bit string methods from Python 2, so it doesn't hurt to add some more of the missing features from Python 2 on top to make life easier for people dealing with data in multiple or unknown encodings.

BTW: I don't know why so many people keep asking for use cases. Isn't it obvious that text data without a known (but ASCII compatible) encoding, or with multiple different encodings in a single data chunk, is part of life? Most HTTP packets fall into this category, many email messages as well.

And let's not forget that we don't live in a perfect world. Broken encodings are everywhere around you - just have a look at your spam folder for a decent chunk of example data :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jan 11 2014)