Does Python need a '>>>' operator?

Mon Jun 10 07:11:31 EDT 2002

On 2002-06-10, Ken Seehof wrote:

> Beni Cherniavsky <cben at tx.technion.ac.il> wrote:
> > I just got another idea: use 0x1234 for 0-filled numbers and 1xABCD for
> > 1-filled ones.  That way you impose no restrictions on what follows the
> > prefix and keep backward compatibility.  0xFFFFFFFF stays a 2^n-1
> > _positive_ number, as it should be.  The look of 1x is weird at first but
> > it is very logical...
>
> Hey, that's pretty clever.  I like it.  One objection I can foresee is
> that it does add another little thing to learn.  And yes, one must
> consider such a feature as making python more complex if it requires
> a paragraph to be added to the main documentation, even if the feature
> will only be used by "advanced" programmers.  The standard argument
> is that all python programmers have to be able to read your code.
>
> However, I don't see this as making life much more complicated for
> beginners, since we already have 0x1234, which is only familiar to
> experienced programmers.  People who are not familiar with 0x1234 can
> learn about 1xABCD at the same time.  People who are familiar with
> 0x1234 (e.g. from programming in C/C++), can probably handle looking up
> the proposed syntax in the documentation, if they can't guess what it
> means by experimenting in the interactive interpreter.
>
After some thought I discovered that this is actually very consistent:

~1x0123456789ABCDEF == 0xFEDCBA9876543210

One only needs to learn a little about 2's complement to understand:

-1x0123456789ABCDEF == 0xFEDCBA9876543211 # 1 added

if he wishes to...
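
A minimal sketch of how the two prefixes could be read, to check the
identities above.  parse_filled_hex is a hypothetical helper written for
illustration; the 1x prefix is only a proposal here, not syntax that any
Python interpreter accepts.

```python
import string


def parse_filled_hex(text):
    """Parse '0x...' (0-filled) or '1x...' (1-filled) text into an int.

    Hypothetical helper: '1x' is the prefix proposed in this thread,
    not real Python syntax.
    """
    prefix, digits = text[:2].lower(), text[2:]
    if prefix not in ("0x", "1x"):
        raise ValueError("expected a 0x or 1x prefix: %r" % (text,))
    if not digits or any(c not in string.hexdigits for c in digits):
        raise ValueError("expected hex digits after the prefix: %r" % (text,))
    value = int(digits, 16)
    if prefix == "1x":
        # Every bit to the left of the written digits is 1, which is the
        # same as subtracting 16 ** (number of digits written).
        value -= 16 ** len(digits)
    return value


n = parse_filled_hex("1x0123456789ABCDEF")
print(~n == 0xFEDCBA9876543210)  # complement flips every bit
print(-n == 0xFEDCBA9876543211)  # negation is the complement plus one
print(parse_filled_hex("1xF"))   # the all-ones number
```

The number of digits written after 1x does not change the value as long
as the extra leading digits are F, just as leading zeros do not change a
0x number.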

> In particular, I like:
>
>   1xF == -1
>
> Seems more pythonic than 0xFFFFFFFF, since the latter implies knowledge
> that we are using 32 bits.  I don't know of any current way to express
> 1 filled binary numbers cleanly.
>
Currently you use the ~ or the - unary operators.  So another idea, in the
spirit of repr(), is to print -1 as ~0x0.  When you want bit-manipulation,
~ is more natural than -.  However, I personally like the 1x prefix more.

> Unfortunately, I don't see an easy way to clean up this blemish:
>
> >>> 0xffff
> 65535

This is correct behaviour.

> >>> 0xffffffff
> -1

While currently it is like this, it cannot be considered standard.  It
depends on the machine's word size so any code relying on it being equal
to -1 is already broken!

> >>> 0xfffffffff
> 68719476735L
>
Correct too.

> I suppose in a perfect world, 0xffffffff would be 4294967295L, but
> we'd have serious compatibility issues with that (note that one
> would use 1xf instead of 0xffffffff to represent -1 in the perfect
> world).
>
> BTW, note that 0xffffffff == 4294967295L would be consistent with
> C/C++ with an unsigned int (32 or more bits).  The idea is that
> all numbers 0x... are non-negative, while number 1x... are negative.
>
Sure.  That's the idea.  To make the 32bit boundary invisible one must
ensure that a given written number represents the same mathematical
integer, on any machine.
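
A minimal sketch of making the word size explicit instead of implicit.
It assumes a Python 3 interpreter, where a hex literal always denotes a
non-negative integer; as_signed is a hypothetical helper, not a standard
function.

```python
def as_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's complement int.

    Hypothetical helper: the caller states the word size, so the result
    does not depend on the machine.
    """
    value &= (1 << bits) - 1        # keep only the low `bits` bits
    if value >> (bits - 1):         # top bit set: negative at this width
        value -= 1 << bits
    return value


print(0xFFFFFFFF)                   # 4294967295 under Python 3
print(as_signed(0xFFFFFFFF, 32))    # -1
print(as_signed(0xFFFFFFFF, 64))    # 4294967295
```

The same written constant gives -1 or 4294967295 depending only on the
width passed in, which is the dependence that the old 0xffffffff == -1
behaviour hid inside the interpreter.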

> I'm quite certain that any proposed solution (such as issuing a
> warning for 0xXXXXXXXX) will receive flames, so I think I will
> stop here, hoping it's not too late :-)
>
I would propose the 1x for a read syntax.  ~ and - are already accepted
automagically (since python's only read syntax is the eval syntax).  The
only remaining trouble is non-long constants with the msb set.  They should
trigger a warning anyway, to help people detect word-size sensitivity
bugs...  Maybe a __future__ should be provided if serious enough
compatibility problems arise.  This point itself is probably not worth a
__future__, as the new semantics of 0x... always being positive is
compatible with any bug-free code ;)

I do not propose to replace hex() now; one can provide a newhex() [better
name desperately needed, maybe hex1?].  In the long run, it might be
reasonable to support the different combinations in format strings.

In C and python1.5 (sadly there is no up-to-date installation around), %x
always treats the data as unsigned; python1.5 refuses to %x longs but
accepts them in hex(), which I presume has already been fixed.  So we
already have, or should have:

"%x"  % +2  =>   2
"%x"  % -2  =>  -2
"%+x" % +2  =>  +2
"%+x" % -2  =>  -2
"%04x" % +2 =>   0002
"%04x" % -2 =>  -0002
"%0#4x" %+2 =>   0x02
"%0#4x" %-2 =>  -0x02  # Is this so indeed?

Or maybe it's still always unsigned in '%x' to mirror C?  I don't think
that can be kept, because the signed->unsigned coercion exposes the word
length; do (x & 0xffff) when you want to coerce, so that your results will
be predictable...
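
A short sketch of the masking advice.  It assumes a Python 3 interpreter,
where %x formats the mathematical value with a sign; the interpreters of
2002 discussed here may print something different for the unmasked case.

```python
x = -2

# Unmasked: Python 3 prints the signed value, with no word size involved.
print("%x" % x)                  # -2

# Masked: the caller picks the width, so the output is the same everywhere.
print("%x" % (x & 0xffff))       # fffe
print("%x" % (x & 0xffffffff))   # fffffffe

# With zero padding, the field width counts the sign character as well.
print("%04x" % x)                # -002
```

Note that under this assumption the padded negative case comes out one
digit shorter than a fixed count of four hex digits, because the sign
takes one column of the requested width.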

Proposed:

"%~x" % +2  =>   2
"%~x" % -2  =>  ~1
"%#~x" % +2 =>   0x2
"%#~x" % -2 =>   1xE
# Like the proposal by I-don't-remember-who (sorry) but with a determined
# msb!  No guessing!
"%0~4x" % +2 =>  0x02
"%0~4x" % -2 => ~0x01
"%0#~4x" %+2 =>  0x02
"%0#~4x" %-2 =>  1xFE

Now this is probably overkill.  But one must introduce a new flag symbol:
# is already used for adding the 0x, and it can't be 1 (that would mean a
width).  It would also be logical to do 1xE only when # is also specified,
which brought me to all these combinations.

But the two ideas of ~0x1 and 1xE both serve exactly the same purpose so
only one should suffice IMHO.  ~0x1 is easy to construct from the existing
facilities, 1xE needs playing around with the output's first char...
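
A minimal sketch of both output forms, to make the comparison concrete.
tilde_hex and filled_hex are hypothetical names chosen for illustration,
and the choice of the fewest possible digits after 1x is an assumption,
not something settled in this thread.

```python
def tilde_hex(n):
    """Render n as 0x... when non-negative, else as ~0x... of its complement."""
    return hex(n) if n >= 0 else "~" + hex(~n)


def filled_hex(n):
    """Render n with the proposed 0x (0-filled) or 1x (1-filled) prefix.

    Hypothetical helper; uses the fewest digits that can hold the value.
    """
    if n >= 0:
        return "0x%X" % n
    digits = 1
    while n < -(16 ** digits):      # grow until n fits in this many digits
        digits += 1
    # Adding 16 ** digits gives the written digits below the infinite 1s.
    return "1x%0*X" % (digits, n + 16 ** digits)


for n in (2, -1, -2, -17):
    print(n, tilde_hex(n), filled_hex(n))

# The ~0x form is already valid read syntax, so it round-trips through eval.
print(eval(tilde_hex(-2)) == -2)
```

The ~0x form needs nothing beyond hex() and the unary operator, while the
1x form needs its own digit-count logic on output and a matching parser
on input.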

> - Ken Seehof
>

-- 
Beni Cherniavsky <cben at tx.technion.ac.il>