[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Guido van Rossum guido at python.org
Wed Feb 15 00:13:41 CET 2006


On 2/13/06, Barry Warsaw <barry at python.org> wrote:
> This makes me think I want an unsigned byte type, which b[0] would
> return.  In another thread I think someone mentioned something about
> fixed width integral types, such that you could have an object that
> was guaranteed to be 8-bits wide, 16-bits wide, etc.   Maybe you also
> want signed and unsigned versions of each.  This may seem like YAGNI
> to many people, but as I've been working on a tightly embedded/
> extended application for the last few years, I've definitely had
> occasions where I wish I could more closely and more directly model
> my C values as Python objects (without using the standard workarounds
> or writing my own C extension types).

So I take it that the specific property you want to model is the
overflow behavior, right? N-bit unsigned is defined as arithmetic mod
2**N; N-bit signed is a bit trickier to define but similar. These
never overflow but instead just throw away bits in an exactly
specified manner (2's complement arithmetic).
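Concretely, with 8-bit values (the numbers here are arbitrary examples,
not from anyone's proposal):

    # 8-bit unsigned: results wrap modulo 2**8.
    N = 8
    x, y = 200, 100
    unsigned_sum = (x + y) % 2**N        # 300 mod 256 == 44

    # 8-bit signed (2's complement): same bit pattern, reinterpreted.
    def to_signed(value, bits=8):
        """Reinterpret an N-bit unsigned value as 2's complement signed."""
        value %= 2**bits
        return value - 2**bits if value >= 2**(bits - 1) else value

    print(to_signed(unsigned_sum))   # 44 -- high bit clear, still positive
    print(to_signed(0xFF))           # -1 -- high bit set flips the sign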

While I personally am comfortable with writing (x+y) & 0xFFFF (for
16-bit unsigned), I can see that someone who spends a lot of time
doing arithmetic in this field might want specialized types.
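A specialized type along those lines might look like the sketch below
(uint16 is just an illustrative name, not a concrete proposal):

    class uint16(int):
        """Toy 16-bit unsigned integer: all arithmetic wraps mod 2**16."""
        MASK = 0xFFFF

        def __new__(cls, value=0):
            return int.__new__(cls, int(value) & cls.MASK)

        def __add__(self, other):
            return uint16(int(self) + int(other))

        def __sub__(self, other):
            return uint16(int(self) - int(other))

        def __mul__(self, other):
            return uint16(int(self) * int(other))

    print(uint16(0xFFFF) + 1)   # 0 -- wraps around instead of growing
    print(uint16(3) - 5)        # 65534, i.e. -2 mod 2**16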

But I'm not sure that that's what the Numeric folks want -- I believe
they're more interested in saving space, not in the mod 2**N
properties. So (here I'm to some extent guessing) they have different
array types whose elements are ints or floats of various widths; I'm
guessing they also have scalars of those widths for consistency or to
guide the creation of new arrays from scalars. I wouldn't be surprised
if, rather than requiring N-bit 2's complement, they would prefer more
flexible control over overflow -- e.g. ignore, warn, error, turn into
NaN, etc.
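Sketching what such per-operation control could look like (the policy
names are invented here; NaN only makes sense for floats, so this
integer sketch clamps instead):

    import warnings

    def add_with_overflow_policy(x, y, bits=8, policy="wrap"):
        """Add two unsigned N-bit values under a chosen overflow policy."""
        limit = 2 ** bits
        result = x + y
        if result < limit:
            return result
        if policy == "wrap":          # silently throw away the high bits
            return result % limit
        if policy == "warn":          # wrap, but tell the user
            warnings.warn("overflow in %d-bit addition" % bits)
            return result % limit
        if policy == "error":         # refuse to produce a wrong answer
            raise OverflowError("result does not fit in %d bits" % bits)
        if policy == "saturate":      # clamp to the largest representable value
            return limit - 1
        raise ValueError("unknown policy: %r" % (policy,))

    print(add_with_overflow_policy(200, 100, policy="wrap"))      # 44
    print(add_with_overflow_policy(200, 100, policy="saturate"))  # 255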

> But anyway, without hyper-generalizing, it's still worth asking
> whether a bytes type is just a container of byte objects, where the
> contained objects would be distinct, fixed 8-bit unsigned integral
> types.

There's certainly a point to treating bytes as ints; I don't know if
it's more compelling than treating them as unit bytes. But if we
decide that the bytes type contains ints, b[0] should return a plain
int (whose value is necessarily in range(0, 256)), not some new
unsigned-8-bit type. And creating a bytes object from a list of ints
should accept any input values as long as their __index__ value is in
that same range.

I.e. bytes([1, 2L]) should be the same as bytes([1L, 2]); and
bytes([-1]) should raise a ValueError.
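In other words, the acceptance check would behave roughly like this
sketch (bytes_from_ints is only an illustrative stand-in for the real
constructor):

    import operator

    def bytes_from_ints(values):
        """Collect byte values, accepting anything with a valid __index__."""
        result = []
        for v in values:
            i = operator.index(v)          # uses __index__; rejects floats etc.
            if not 0 <= i <= 255:
                raise ValueError("byte value out of range(0, 256): %r" % (v,))
            result.append(i)
        return result

    print(bytes_from_ints([1, 2]))    # [1, 2]
    bytes_from_ints([-1])             # raises ValueError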

> > There's also the consideration for APIs that, informally, accept
> > either a string or a sequence of objects. Many of these exist, and
> > they are probably all being converted to support unicode as well as
> > str (if it makes sense at all). Should a bytes object be considered as
> > a sequence of things, or as a single thing, from the POV of these
> > types of APIs? Should we try to standardize how code tests for the
> > difference? (Currently all sorts of shortcuts are being taken, from
> > isinstance(x, (list, tuple)) to isinstance(x, basestring).)
>
> I think bytes objects are very much like string objects today --
> they're the photons of Python since they can act like either
> sequences or scalars, depending on the context.  For example, we have
> code that needs to deal with situations where an API can return
> either a scalar or a sequence of those scalars.  So we have a utility
> function like this:
>
> def thingiter(obj):
>      try:
>          it = iter(obj)
>      except TypeError:
>          yield obj
>      else:
>          for item in it:
>              yield item
>
> Maybe there's a better way to do this, but the most obvious problem
> is that (for our use cases), this fails for strings because in this
> context we want strings to act like scalars.  So we add a little test
> just before the "try:" like "if isinstance(obj, basestring): yield
> obj".  But that's yucky.
>
> I don't know what the solution is -- if there /is/ a solution short
> of special case tests like above, but I think the key observation is
> that sometimes you want your string to act like a sequence and
> sometimes you want it to act like a scalar.  I suspect bytes objects
> will be the same way.

I agree it's icky, and I'd rather not design APIs like that -- but I
can't help it that others continue to want to use that idiom. I also
agree that most likely we'll want to treat bytes the same as strings
here. But not via basestring (bytes are mutable and don't behave like
sequences of characters).
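A sketch of the special-cased version Barry describes, written with a
str-or-bytes check standing in for the basestring test:

    def thingiter(obj):
        """Yield obj itself if it's a scalar, else yield its items.

        Strings and bytes are iterable, but for this API we want them
        treated as scalars, hence the explicit type check.
        """
        if isinstance(obj, (str, bytes)):
            yield obj
            return
        try:
            it = iter(obj)
        except TypeError:
            yield obj
        else:
            for item in it:
                yield item

    print(list(thingiter([1, 2, 3])))   # [1, 2, 3]
    print(list(thingiter("abc")))       # ['abc'] -- the whole string, not letters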

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

