[Python-Dev] Python 3.x and bytes
Guido van Rossum
guido at python.org
Thu May 19 19:43:02 CEST 2011
On Thu, May 19, 2011 at 1:43 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> OK, summarising the thread so far from my point of view.
> 1. There are some aspects of the behavior of bytes() objects that
> tempt people to think of them as string-like objects (primarily the
> b'' literals and their use in repr(), along with the fact that they
> fill roles that were filled by str in its "arbitrary binary data"
> incarnation in Python 2.x). The mental model this creates in the
> reader is incorrect, as bytes() are far closer to array.array('c') in
> their underlying behaviour (and deliberately so - cf. PEP 358, 3112,
I think most of this "wrong mental model" is actually due to people
not having completely internalized the Python 3 way.
> One proposal for addressing this is to add a x'deadbeef' literal and
> using that in repr() rather than the bytestring. Another would be to
> escape all characters, even printable ASCII, in the bytes()
> representation. Both of these are undesirable, as they miss the
> original purpose of this behaviour: making it easier to work with the
> many ASCII based wire protocols that are in widespread use.
Indeed, -1 on both.
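
To make the trade-off concrete, here is a minimal sketch comparing the current repr() of a bytes object with an all-hex rendering. The request line is a made-up example, and binascii.hexlify only stands in for what a hypothetical x'...' display might look like; neither proposal exists in Python.

```python
import binascii

# Hypothetical sample message from an ASCII based wire protocol.
request = b"GET /index.html HTTP/1.1\r\n"

# Current behaviour: printable ASCII is shown as-is, other bytes are escaped.
current_repr = repr(request)

# Stand-in for an all-hex display (an assumption, not real literal syntax).
hex_form = binascii.hexlify(request).decode("ascii")

print(current_repr)
print(hex_form)
```

The first line stays readable to anyone who knows the protocol, which is the purpose the quoted text describes; the second hides it.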
> To be honest, I don't think there is a lot we can do here except to
> further emphasise in the documentation and elsewhere that *bytes is
> not a string type* (regardless of any API similarities retained to
> ease transition from the 2.x series). For example, if we have any
> lingering references to "byte strings" they should be replaced with
> "byte sequences" or "bytes objects" (depending on context, as the
> former phrasing also encompasses bytearray objects).
> 2. As a concrete usability issue, it is awkward to programmatically
> check the value of a specific byte when working with an ASCII based
> protocol:
> data[i] == b'a' # Intuitive, but always False due to type mismatch
> data[i:i+1] == b'a' # Works, but clumsy
> data[i] == b'a'[0] # Ditto (but at least susceptible to compiler
> const-expression optimisation)
> data[i] == ord('a') # Clumsy and slow
> data[i] == 97 # Hard to read
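
The alternatives listed above can be checked directly. This is a small sketch in which `data` and `i` are placeholder values chosen for illustration; the indexed form b'a'[0] is the variant that yields an integer from a literal.

```python
data = b"abc"   # hypothetical protocol payload
i = 0

results = {
    # Indexing bytes gives an int, so int == bytes is always False.
    "data[i] == b'a'": data[i] == b"a",
    # Slicing gives bytes, so this compares bytes with bytes.
    "data[i:i+1] == b'a'": data[i:i + 1] == b"a",
    # Indexing the literal gives an int on both sides.
    "data[i] == b'a'[0]": data[i] == b"a"[0],
    "data[i] == ord('a')": data[i] == ord("a"),
    "data[i] == 97": data[i] == 97,
}

for expr, value in results.items():
    print(expr, "->", value)
```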
> Proposals to address this include:
> - introduce a "character" literal to allow c'a' as an alternative to ord('a')
-1; the result is not a *character* but an integer. I'm personally
favoring using b'a'[0] and possibly hiding this in a constant
> Potentially workable, but leaves the intuitive answer above
> silently producing an unexpected answer
I'm not convinced that that problem is any worse than other
comparison-related problems. E.g. b'a' == 'a' also always returns
False (most likely it'll be disguised by at least one operand being a
variable of course.)
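
A minimal sketch of both points, assuming the integer value is hidden behind a module-level constant. The names LOWER_A and starts_with_a are hypothetical and used only for illustration.

```python
# Hypothetical constant that hides the indexing of the bytes literal.
LOWER_A = b"a"[0]   # an int, not a bytes object


def starts_with_a(data):
    """Return True if the first byte of data is ASCII 'a'."""
    return len(data) > 0 and data[0] == LOWER_A


print(starts_with_a(b"abc"))

# bytes compared with str is also silently False rather than an error.
mixed = b"a" == "a"
print(mixed)
```

Running the interpreter with the -b option makes comparisons like the last one emit a BytesWarning, which can help track down such mismatches.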
> - allow 1-element byte sequences to compare equal to the corresponding
> integer values.
> - would require reworking of bytes.__hash__ to use the hash of the
> contained element when the data length is exactly 1
> - transitivity of equality would recommend also supporting
> equivalences such as b'a' == 97.0
> - backwards compatibility concerns arise due to introduction of
> new key collisions in dictionaries and sets and other value based
> containers
> - yet more string-like behaviour in a type that is *not* a string
> (further reinforcing the mistaken impression from point 1)
> - One thing that *isn't* a concern from my point of view is the
> fact that we have ample precedent in decimal.Decimal for supporting
> implicit coercion in comparison operations while disallowing them in
> arithmetic operations (Decimal("1") == 1.0 is allowed, but
> Decimal("1") + 1.0 will raise TypeError).
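
The Decimal precedent mentioned above can be reproduced as follows. This sketch assumes Python 3.2 or later and the default decimal context (no FloatOperation trap enabled).

```python
from decimal import Decimal

# Comparison between Decimal and float is permitted.
equal = Decimal("1") == 1.0

# Arithmetic between Decimal and float is rejected.
try:
    Decimal("1") + 1.0
    raised = False
except TypeError:
    raised = True

print(equal, raised)
```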
> For point 2, I'm personally +0 on the idea of having 1-element bytes
> and bytearray objects delegate hashing and comparison operations to
> the corresponding integer object. We have the power to make the
> obvious code correct code, so let's do that. However, the implications
> of the additional key collisions in value based containers may need to
> be explored further.
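
The key-collision concern can be sketched with today's behaviour. The int/float pair below is existing precedent for what cross-type equality does to dictionary keys; applying the same to b'a' and 97 is the hypothetical change under discussion, not current behaviour.

```python
# Today a 1-element bytes object and the matching int are unrelated keys.
table = {b"a": "bytes key", 97: "int key"}
distinct_today = len(table) == 2

# Existing precedent: 97 == 97.0 and their hashes agree, so the keys merge.
merged = {97: "int", 97.0: "float"}
same_hash = hash(97) == hash(97.0)

print(distinct_today, len(merged), same_hash)
```

If 1-element bytes objects compared and hashed like integers, the first dictionary would collapse to a single entry in the same way the second one does.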
My gut feeling about this is that this will probably introduce some
confusing or unintended side effect elsewhere, and I am -1 on this
--Guido van Rossum (python.org/~guido)