[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]
Barry Warsaw
barry at python.org
Tue Feb 14 05:59:03 CET 2006
On Feb 13, 2006, at 7:29 PM, Guido van Rossum wrote:
> There's one property that bytes, str and unicode all share: type(x[0])
> == type(x), at least as long as len(x) >= 1. This is perhaps the
> ultimate test for string-ness.
But not perfect, since of course other containers can contain objects
of their own type too. But it leads to an interesting issue...
> Or should b[0] be an int, if b is a bytes object? That would change
> things dramatically.
This makes me think I want an unsigned byte type, which b[0] would
return. In another thread I think someone mentioned something about
fixed width integral types, such that you could have an object that
was guaranteed to be 8-bits wide, 16-bits wide, etc. Maybe you also
want signed and unsigned versions of each. This may seem like YAGNI
to many people, but as I've been working on a tightly embedded/
extended application for the last few years, I've definitely had
occasions where I wish I could more closely and more directly model
my C values as Python objects (without using the standard workarounds
or writing my own C extension types).
But anyway, without hyper-generalizing, it's still worth asking
whether a bytes type is just a container of byte objects, where the
contained objects would be distinct, fixed 8-bit unsigned integral
types.
> There's also the consideration for APIs that, informally, accept
> either a string or a sequence of objects. Many of these exist, and
> they are probably all being converted to support unicode as well as
> str (if it makes sense at all). Should a bytes object be considered as
> a sequence of things, or as a single thing, from the POV of these
> types of APIs? Should we try to standardize how code tests for the
> difference? (Currently all sorts of shortcuts are being taken, from
> isinstance(x, (list, tuple)) to isinstance(x, basestring).)
I think bytes objects are very much like string objects today --
they're the photons of Python since they can act like either
sequences or scalars, depending on the context. For example, we have
code that needs to deal with situations where an API can return
either a scalar or a sequence of those scalars. So we have a utility
function like this:
    def thingiter(obj):
        try:
            it = iter(obj)
        except TypeError:
            yield obj
        else:
            for item in it:
                yield item
Maybe there's a better way to do this, but the most obvious problem
is that (for our use cases), this fails for strings because in this
context we want strings to act like scalars. So we add a little test
just before the "try:" like "if isinstance(obj, basestring): yield
obj". But that's yucky.
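For concreteness, the special-cased version might look roughly like the
sketch below. The string_types name is only an illustration, not something
from our code base: it falls back to (str, bytes) where basestring does not
exist, on the assumption that bytes should be treated as a scalar here too.
The explicit "return" after the first yield keeps the function from going on
to iterate over the string's characters.

```python
try:
    string_types = (basestring,)  # Python 2 spelling, as used in this message
except NameError:
    # Assumption: on interpreters without basestring, treat str and bytes
    # as the "string-like scalars".
    string_types = (str, bytes)


def thingiter(obj):
    """Yield obj itself if it is a string or a non-iterable, else its items."""
    # Special case: strings are iterable, but here they should act as scalars.
    if isinstance(obj, string_types):
        yield obj
        return
    try:
        it = iter(obj)
    except TypeError:
        # Not iterable at all, so treat it as a single scalar value.
        yield obj
    else:
        for item in it:
            yield item
```

With this, list(thingiter("abc")) gives a one-element list holding the whole
string, while list(thingiter(["a", "b"])) still gives the two items.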
I don't know what the solution is, or even whether there /is/ one short
of special-case tests like the one above, but I think the key observation is
that sometimes you want your string to act like a sequence and
sometimes you want it to act like a scalar. I suspect bytes objects
will be the same way.
-Barry