[Python-Dev] Unifying Long Integers and Integers: baseint

Thu Aug 12 16:30:06 CEST 2004

> Guido van Rossum wrote:
> > Anyway, if we really do have enough use cases for byte array
> > literals, we might add them.  I still think that would be
> > confusing though, because byte arrays are most useful if they are
> > mutable: and then we'd have mutable literals -- blechhhh!

Martin:
> I see. How would you like byte array displays then?

Not as characters (not by default anyway), because more often than not
they will contain binary or encoded gibberish!

> This is the approach taken in the other languages: Every time the
> array display is executed, a new array is created. There is then no
> problem with that being mutable.

The downside is that, for performance reasons, you might then end up
having to move bytes literals out of expressions when they are in
fact used read-only (which the compiler can't know but the user can).
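
A rough sketch of that trade-off, written against the mutable
bytearray type of later Python 3 releases rather than anything
proposed in this thread. The names fresh_get, GET and starts_with_get
are made up for the example.

```python
def fresh_get():
    # Evaluated on every call, like an array display would be:
    # each caller gets its own mutable array.
    return bytearray([71, 69, 84])


# Hoisted out of the expression by hand: built once and shared, so it
# has to be treated as read-only by convention.
GET = bytearray([71, 69, 84])


def starts_with_get(data):
    # Compares against the shared array without rebuilding it.
    return data[:3] == GET


a = fresh_get()
b = fresh_get()
a[0] = 80  # mutating one fresh copy leaves the other untouched
```

The fresh version is safe to mutate but pays for a new object on each
evaluation; the hoisted version avoids that cost only because the
user knows it is never modified.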

> Of course, if the syntax is too similar to string literals, people
> might be tricked into believing they are actually literals. Perhaps
> 
>    bytes('G','E','T')
> 
> would be sufficient, or even
> 
>    bytes("GET")
> 
> which would implicitly convert each character to Latin-1.

The first form would also need an encoding, since Python doesn't have
character literals!  I don't think we should use Latin-1, for the same
reasons that the default encoding is ASCII.  Better would be to have a
second argument that's the encoding, so you can write

    bytes(u"<chinese text>", "utf-8")
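
For illustration, this two-argument form can be tried in later
Python 3 releases, where bytes(text, encoding) exists. That is an
assumption about a Python much newer than this message, and the
two-character sample string below is only a stand-in for the
placeholder above.

```python
# Two CJK characters (U+4E2D, U+6587), a hypothetical sample text.
text = "\u4e2d\u6587"

# Explicit encoding as the second argument.
encoded = bytes(text, "utf-8")

# The same bytes as calling the string's own encode() method.
same = text.encode("utf-8")

# Viewed as small unsigned ints, three bytes per character here.
values = list(encoded)  # [228, 184, 173, 230, 150, 135]

# ASCII refuses anything outside 0..127 instead of guessing.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```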

Hm, u"<chinese text>".encode("utf-8") should probably return a bytes
array, and that might be sufficient.  Perhaps bytes should by default
be considered as arrays of tiny unsigned ints, so we could use

    bytes(map(ord, "GET"))

and it would display itself as

    bytes([71, 69, 84])

Very long ones should probably use ellipses rather than print a
million numbers.
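
A minimal sketch of such a display, assuming the bytes type of later
Python 3 releases (which accepts an iterable of small ints). The
helper show_bytes and its limit parameter are hypothetical and only
mimic the display suggested above; this is not how any real repr
behaves.

```python
def show_bytes(data, limit=8):
    """Render data as a list of ints, eliding everything past limit."""
    head = list(data[:limit])  # only the first few values are touched
    if len(data) <= limit:
        return "bytes(%r)" % (head,)
    shown = ", ".join(str(v) for v in head)
    return "bytes([%s, ...])" % shown


get = bytes(map(ord, "GET"))
short = show_bytes(get)                       # 'bytes([71, 69, 84])'
long_ = show_bytes(bytes(range(20)), limit=4)  # 'bytes([0, 1, 2, 3, ...])'
```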

--Guido van Rossum (home page: http://www.python.org/~guido/)