[Python-bugs-list] [ python-Bugs-795791 ] bool() violates a numeric invariant

Thu Aug 28 14:46:28 EDT 2003

Bugs item #795791, was opened at 2003-08-26 23:56
Message generated for change (Comment added) made by tim_one
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=795791&group_id=5470

Category: Python Interpreter Core
Group: Python 2.3
Status: Open
Resolution: None
Priority: 1
Submitted By: David Albert Torpey (dtorp)
Assigned to: Nobody/Anonymous (nobody)
Summary: bool() violates a numeric invariant

Initial Comment:
The Liskov Substitution Principle states:   If for each 
object o1 of type S there is an object o2 of type T such 
that for all programs P defined in terms of T, the 
behavior of P is unchanged when o1 is substituted for o2
then S is a subtype of T.

The current implementation of bool() violates this rule 
by breaking an invariant for its parent, int() and its 
related numeric types, float(), complex(), and long():

>>> for typ in (int, long, complex, float, bool):
...     print typ(0), typ(str(typ(0)))

0 0
0 0
0j 0j
0.0 0.0
False True

In theory, polymorphism requires that a subclass object 
be substitutable into any code that would accept an 
instance of a parent class.  Here, the False instance is 
clearly not substitutable for zero in serializer code.
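
For readers on a current interpreter, the check above can be reproduced with a sketch like the following. It assumes Python 3, where long has been merged into int and print is a function; the results dictionary is only an illustrative name and is not part of the original report.

```python
# Round-trip each numeric type's zero through str() and back.
# Assumes Python 3: `long` no longer exists, so it is omitted.
results = {}
for typ in (int, float, complex, bool):
    zero = typ(0)
    results[typ.__name__] = (zero, typ(str(zero)))

for name, (original, rebuilt) in results.items():
    print(name, original, rebuilt)
```

Only the bool entry fails to come back as the value it started from.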

This is not just a theoretical problem, it arises in 
practice and caused problems for a comp.lang.python 
participant who was writing an XML serializer that 
depended on numeric types being able to reconstruct 
themselves from their string representations.

Another poster pointed out that the programmer could 
have used eval(repr(typ(0))), but that is not 
satisfactory because it is unsafe to run eval() on 
serialized data from an external source.

One possible fix is to make a special case so that 
bool("False") returns False.  While this is a bit strange in 
that all other non-empty strings evaluate to True, it has 
the advantage of producing the expected behavior and 
is damaging only in the highly unlikely and probably 
erroneous case where someone relied on bool("False") 
evaluating to True.

A more conservative fix is to issue a warning whenever 
encountering bool("False").  This will at least keep a 
probable error from passing silently into the good night.
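
Both proposals can be approximated without touching the built-in. The following is only a sketch of the idea, assuming Python 3: bool_from_str and its special_case flag are hypothetical names, and the wording of the warning is an assumption about how the conservative variant might behave.

```python
import warnings

def bool_from_str(text, special_case=True):
    """Hypothetical helper mirroring the two fixes proposed above.

    special_case=True  -> the string "False" becomes False.
    special_case=False -> bool()'s current answer, plus a warning.
    """
    if text == "False":
        if special_case:
            return False
        warnings.warn('bool("False") is True; was False intended?')
    # Every other string keeps the usual rule: only the empty string is false.
    return bool(text)
```

Calling bool_from_str("False") gives False, while bool_from_str("False", special_case=False) keeps today's True and emits a warning.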

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2003-08-28 16:46

Message:
Logged In: YES 
user_id=31435

Armin, in 2.3 a simple, safe and fast way to "un repr" a string 
(or unicode) S is via

S.decode('string-escape')

IIRC, MvL added the string-escape codec.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-08-28 09:53

Message:
Logged In: YES 
user_id=38388

Please take these discussions to comp.lang.python; the SF
bug tracker is not the right place for this.

Thanks.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2003-08-28 09:44

Message:
Logged In: YES 
user_id=4771

Wouldn't it make sense to have a safe version of 'eval()' ?
One that can be used to read back any repr() of a built-in type.

The functionality would be similar to that of the parser
reading a literal from Python source, so maybe it is already
available. (However I think that the parser doesn't do
anything special with False and True.)

In any case I have sometimes wished I would have a simple
way to unquote a string quoted by repr() without all the
overhead and dangers of eval().
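
For what it is worth, later Python releases grew something close to this: ast.literal_eval, added to the standard library in Python 2.6, evaluates a string containing only a literal (numbers, strings, tuples, lists, dicts, booleans, None) and rejects anything else. A minimal sketch, assuming a Python 3 interpreter; the sample values and the is_rejected helper are made up for illustration.

```python
import ast

# Arbitrary sample values; each should survive repr() followed by literal_eval().
samples = [0, 0.0, 0j, False, True, None, "it's a \"quoted\" string"]
restored = [ast.literal_eval(repr(value)) for value in samples]

def is_rejected(source):
    """Return True if literal_eval refuses to evaluate the given source text."""
    try:
        ast.literal_eval(source)
    except (ValueError, SyntaxError):
        return True
    return False
```

Here is_rejected("__import__('os').getcwd()") is True, because a function call is not a literal.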

----------------------------------------------------------------------

Comment By: Steven Taschuk (staschuk)
Date: 2003-08-27 18:08

Message:
Logged In: YES 
user_id=666873

I concur with Christos that this is desirable behaviour, not a bug.

It seems very important to me that
    if x:
be equivalent to
    if bool(x):
for all objects x.  Thus if we adopt the proposal that 
bool('False') == False, it seems to me that we'd also have to 
have
    if 'False':
*not* run the if-block.  This would break existing code which 
expects strings to be false only when empty.  That's lots of 
code.

Roundtripping through str() strikes me as a very minor 
consideration.  str() is expected to lose information in general 
and is not in any way intended to be used for serialization.  If 
one insists on using it for serializing ints, then
    if type(x) is int:
        write(str(x))
    else:
        write(serialize_sensibly(x))
will avoid the problem with bools.  The violation of Liskov 
(which prevents us from using isinstance(x, int) here) is 
unfortunate but not imho nearly a big enough deal to merit the 
change.
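
A runnable sketch of that exact-type check follows, assuming Python 3. The "int:" and "bool:" tag format and the serialize/deserialize names are hypothetical, chosen only to make the example self-contained; they stand in for whatever serialize_sensibly() would really do.

```python
def serialize(x):
    # `type(x) is int` matches int only; isinstance(x, int) would also
    # accept True and False, because bool is a subclass of int.
    if type(x) is int:
        return "int:" + str(x)
    if type(x) is bool:
        return "bool:" + ("1" if x else "0")
    raise TypeError("unsupported type: %s" % type(x).__name__)

def deserialize(text):
    # Split off the (hypothetical) type tag and rebuild the value from it.
    tag, _, payload = text.partition(":")
    if tag == "int":
        return int(payload)
    if tag == "bool":
        return payload == "1"
    raise ValueError("unknown tag: %r" % tag)
```

With the type recorded explicitly, the reader never has to call bool() on the text "False".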

----------------------------------------------------------------------

Comment By: Christos Georgiou (tzot)
Date: 2003-08-27 10:56

Message:
Logged In: YES 
user_id=539787

Addendum, correcting myself: yes, if reading the LSP in math 
lingo context, I can see that one can read it backwards (if S 
is subtype of T, then S() can be used in place of T() 
everywhere, a la <=>); I was interpreting it non-
mathematically, so David is correct about context.

Does it matter, though, if I create a subclass whose 
instances are not substitutable for its parent class 
instances?  Python (and most other "OO" computer languages 
I know of) allows me to do that; should I stop calling 
this 'subclassing'?

----------------------------------------------------------------------

Comment By: Christos Georgiou (tzot)
Date: 2003-08-27 10:41

Message:
Logged In: YES 
user_id=539787

The comments might not be helpful to your request, but they 
are on topic (apart from the last paragraph, which was 
intended to be humorous).
I still can't swap the if-then parts of the LSP, even in 
context; that might be a problem of my way of thinking, 
though.

You seem to confuse the purpose of str() (saying "There is a 
reason that all of these..."); its intended use is not 
serialising.  repr() and eval() combined are intended for this 
purpose (even if repr() doesn't/can't always do that 
practically), and in this context you already know that bool 
works fine.  You can't expect str() to behave correctly for 
bools when it doesn't for floats:

>>> x = 5**.5
>>> x == float(str(x))
False
>>> x == eval(repr(x))
True

Is float broken too?
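
The False above comes from Python 2's str(), which formats a float to 12 significant digits, while repr() keeps enough digits to reproduce the value. On Python 3 the two agree for floats, so this sketch uses "%.12g" as a stand-in for the old str() behaviour; that substitution is an assumption made purely for illustration.

```python
x = 5 ** .5

short_form = "%.12g" % x   # mimics Python 2's str(x): 12 significant digits
full_form = repr(x)        # enough digits to round-trip exactly

lossy = float(short_form) == x   # comparison after the truncated form
exact = float(full_form) == x    # comparison after the full repr() form
```

The truncated form does not compare equal to x, whereas the repr() form does.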

A hammer is heavy and can be used as a paperweight, but 
99% of the time people use hammers for hammering and 
paperweights to keep papers down.  __repr__ is intended for 
unambiguous representation (and consequently can be used for 
serialisation too), and __str__ is intended to coerce to str 
(just as __unicode__, __float__, __int__ etc. do for their types).
The burden of security (i.e. not passing arbitrary arguments to 
eval()) is what the programmer must carry in this case.  
repr() and eval() are the way to go, IMHO.

I will comply and won't answer your rhetorical question.

I would suggest you bring up this subject in the newsgroup 
(or mailing list), which you might have done already without 
my noticing it.  You can calculate percentages counting the 
responses, and if only 1% is against it, I am sure the BDFL will 
take this into account; the issue brought up in bug 795791 is 
not a bug, but a request for partial redesign.

----------------------------------------------------------------------

Comment By: David Albert Torpey (dtorp)
Date: 2003-08-27 10:05

Message:
Logged In: YES 
user_id=681258

While quite funny, the comments are unhelpful.  There is a 
pickle protocol but not for bool. The original post also pointed 
out that bool('False')==False would break code; however, 
bool() is brand new in Py2.3 and the *only* code affected 
would be in the extremely improbable case of someone 
expecting that bool('False')==True.  It is a nit of strict 
reading that the LSP was originally stated as an inverted 
if/then because it was being presented in the context of 
mathematical proofs -- the correct reading in the current 
context is that child instances *must* be fully substitutable 
for parent instances.

Leaving bool() as it stands breaks an important invariant for 
int, long, float, and complex.  There is a reason that all of 
these have a provision for being able to coerce their str() 
forms back into the original -- that is a key feature and its 
application is not limited to pickling.  The PEP clearly states 
that bools should behave like ints -- for instance, it is 
intentional that True + 3 == 4.

If integer addition worked in every case except for 5+6, would 
you fix the C code for integer addition or would you have 
every user of addition insert a "simple if" to create a custom 
work-around?  Don't answer, it is a rhetorical question.

I realize that there is no clean solution as there is a tension 
between the expected invariant and the usual meaning of all 
non-empty strings as True.  The correct balance comes from 
looking at the use cases and asking what outcome the 
programmer expects when typing bool("False").  IMHO, the 
answer falls 99% towards False and 1% towards True.

----------------------------------------------------------------------

Comment By: Christos Georgiou (tzot)
Date: 2003-08-27 07:29

Message:
Logged In: YES 
user_id=539787

There is the pickle protocol with its __getstate__ and 
__setstate__ methods, which seems more appropriate than 
str() and eval(); however, these methods are not defined in 
the base types for XML serialisation.
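
A rough sketch of the pickle route on a current interpreter, where bools survive the round trip with their type intact. It assumes Python 3's pickle module and arbitrary sample values; note that unpickling data from an untrusted source is unsafe in much the same way eval() is.

```python
import pickle

# Zero (or false) values of the types discussed in this report.
values = [0, 0.0, 0j, False, True]

# dumps() produces a byte string; loads() rebuilds an equal object from it.
restored = [pickle.loads(pickle.dumps(v)) for v in values]
```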

Changing bool('False') to evaluate as False instead of True for 
all Python programs would introduce breakage; why do that 
instead of a simple special-cased 'if' in a program that does 
XML serialisation?

BTW I don't see any violation of the Liskov Substitution 
Principle; as stated, it works one way (ie in this case it fails, 
so based on the principle you *cannot* prove that bool is a 
subtype of int, but we know that bool *is* a subtype of int --
it's in the source).  The LSP is a method to understand if S is 
a subtype of T, not a necessary step for the definition of S.

And a little tongue in cheek: the Christos Similarity Principle.  
If for each person P there is another person B who has an 
astonishing similarity in appearance, has almost exactly the 
same age and was born in the same location as person P, 
then person B is a sibling of person P.  Does that mean that 
my brother is not my sibling just because he doesn't look like 
me, is 14 years younger and wasn't born in the same hospital 
as me? :)

----------------------------------------------------------------------
