Comparisons and sorting of a numeric class....
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Tue Jan 6 17:25:38 CET 2015
Andrew Robinson wrote:
> Hi,
> I'm building a custom numeric class that works with values that have
> uncertainty and am wanting to make it as compatible with floating point
> objects as possible -- so as to be usable in legacy code with minimal
> rewites; but but I am having trouble understanding how to handle
> magnitude comparison return values for __lt__ __gt__ etc.
>
> The problem with magnitude comparisons in my class is that the
> comparison may fail to be True for various reasons, and in various ways,
> but it's important for programming to retain the general reason a
> comparison wasn't strictly 'True'.
It sounds like you want some sort of multi-valued or fuzzy logic here.
Skipping ahead, you wrote:
> What I would like is for operators like '>' to return a boolean True or
> False equivalent, which also encodes the reason for failure in one of
> five ways:
>
> True: Sufficient information and confidence exists that the
> comparison is thoroughly True.
> PartTrue: The comparison is uncertain for any single sample, but True
> is at least minimally more probable.
> Unbiased: The comparison is uncertain for any single sample.
> PartFalse: The comparison is uncertain for any single sample, but
> False is at least minimally more probable.
> False: Sufficient information and confidence exists that the
> comparison is thoroughly False.
>
> By default, only True would evaluate in conditional statement as a
> logical True. All other values would be equivalent to False,
[...]
Before you spend too much time on this, I recommend that you research some
of the existing theory on multi-value logic. There are a number of standard
types that you may be able to use for your needs, or possibly even find an
existing library for. Or there is fuzzy logic, where truth values vary
continuously between 0.0 and 1.0, with 0.0 being "absolutely false" and 1.0
being "absolutely true".
Start here: http://en.wikipedia.org/wiki/Many-valued_logic
Assuming nothing standard suits, you can't actually subclass bool, as you
have discovered. But that doesn't mean that you can't still use bools! You
just need some other values to use as well.
For simplicity, let's just create three singleton values. We're going to
have to bootstrap the instances into existence, so this will be a bit
messy:
# untested
class MultiLogic(object):
PartTrue = Unbiased = PartFalse = None
def __new__(cls):
raise TypeError('cannot create more instances')
def __repr__(self):
if self is type(self).PartTrue:
return "PartTrue"
if self is type(self).PartFalse:
return "PartFalse"
if self is type(self).Unbiased:
return "Unbiased"
# Python 2
def __nonzero__(self):
return False
# Python 3
def __bool__(self):
return False
# Bootstrap the instances into existence.
PartTrue = MultiLogic.PartTrue = object.__new__(MultiLogic)
PartFalse = MultiLogic.PartFalse = object.__new__(MultiLogic)
Unbiased = MultiLogic.Unbiased = object.__new__(MultiLogic)
There's other ways to handle this, of course, but what we have now is
three "singleton" (tripleton?) values, PartTrue, PartFalse and Unbiased.
You can access them either by the global name, or via the MultiLogic.*
class attribute.
Now, let's use these in your numeric class:
# I don't think we should inherit from float.
class MyNumber(object):
# See the source code for decimal.Decimal or fractions.Fraction
# for ways to program a numeric type.
def __eq__(self, other):
if not isinstance(other, MyNumber):
# Signal that we don't know how to do this comparison.
return NotImplemented
if some_condition:
# Certainly equal.
return True
elif another_condition:
# Probably equal.
return PartTrue
elif a_third_condition:
# Probably not equal.
return PartFalse
elif yet_another_condition:
# Definitely not equal.
return False
else:
# Unknown.
return Unbiased
# Now do the same for __ne__ __lt__ __gt__ __le__ __ge__
# (not equal, less than, greater than, less than or equal,
# greater than or equal)
You probably will want a helper method to avoid too much duplicate code in
your comparison operators.
By the way, you seem to be working on a form of fuzzy number or interval
arithmetic:
http://en.wikipedia.org/wiki/Fuzzy_number
http://en.wikipedia.org/wiki/Interval_arithmetic
> For example, to 1 significant figure, I can write two values
> a=mynum(0.1) and b=mynum(0.01) ; as written both of them have an
> uncertainty of 1 least significant figure, so it is possible both values
> are really a 0. But, there is a bias/mean/average which suggests that
> more often than not -- 0.1 will be bigger than 0.01. So, what is the
> proper response of a>b ?, the answer is that a>b depends on the context
> of how the values were obtained and what is being done with them,
> although strictly speaking 'a' is not greater than 'b' in this case,
> although 'a' has a much better chance of being greater than 'b' on
> average which may be important for the user to know.
>
> Where I'm getting stuck is how to encode the return values of the
> comparison operators to be compatible with python floating point object
> return values (and python's sort algorithms for lists of floats)
With this design, you probably will be able to sort lists of MyNumbers. But
sorting probably won't be stable! I expect that it will depend on the order
that you give the numbers. That is:
sorted([a, b, c, d] != sorted([b, c, a, d]
in general. The reason is that your comparison operators may not be
transitive. For real numbers, we have the rule:
if a > b and b > c
then a > c
and similar for the other comparisons, but once you add uncertainties those
may no longer apply. If they don't, then sorting will be unpredictable. If
they remain transitive, then sorting will behave sensibly.
Note: the programming language APL uses intransitive comparisons, including
fuzzy equality. In APL, if you had a == b, and b == c, that did not
necessarily mean that a == c.
[...]
> but
> still give extra functionality that is needed for uncertainty ... and as
> I'm writing in python 2.xx but with an eye toward python 3 in the
> future, I've become concerned that __cmp__ has been discontinued so that
> I need to write this using only __gt__() and friends. I don't know how
> to do it.
I don't think you can solve this problem with __cmp__ since it can only
represent three values, "less", "equal" or "greater".
[...]
> For sorting, it would be ideal if the sorting algorithms min(), max(),
> etc. could automatically recognize that:
> False < PartFalse < Unknown < PartTrue < True.
I don't think that is possible with the standard Python sort, but you could
write your own sort routines. It may be possible to come up with a clever
key function or cmp routine for sort that will do that.
> But, even if that can't be done -- if there were a way, or method I
> could add to my classes, which would intercept sort functions and
> replace an absolute certain compare function with a merely Unbiased
> detecting one; sort functions would all operate properly.
>
> However, I'm not sure how to create these return values nor do I know
> how to get the sort() functions to use them.
>
> I've tried subclassing float() to see if I could actually make a
> subclass that inherited all that float has, and be able to add extra
> methods for my own use -- but Python doesn't seem to allow subclassing
> float. I either am not doing it right, or it can't be done.
Unless you are using a truly ancient version of Python, you can subclass
float. This has been possible since version 2.2.
If you show us what you tried, we can tell you what you've done wrong.
> So, I'm not sure I can subclass boolean either because that too is a
> built in class ...
You cannot subclass bool.
> but I'm not sure how else to make an object that
> acts as boolean False, but can be differentiated from false by the 'is'
> operator.
That is easy. Here is a trivial example:
class X:
# Like False, but different.
def __nonzero__(self): # for Python 2
return False
def __bool__(self): # for Python 3
return False
x = X()
x now acts like boolean False, but can be differentiated by the 'is'
operator.
> It's frustrating -- what good is subclassing, if one cant
> subclass all the base classes the language has?
Just because you can't subclass bool doesn't mean that subclassing (say)
list, dict, float, str, Exception, set and others isn't useful.
> What other approaches can I take?
>
> 'False' is a singleton, so I was pretty sure this wouldn't work -- but I
> tried it...
> PartTrue = False
> if (1>2) is PartTrue: print "This is an obvious failure... False is not
> the object PartTrue."
>
> And I am seriously stumped....
Hopefully I've given you some good pointers. Feel free to ask many more
questions, although it is late here now (technically, it is very early) so
I'll be going to bed soon and may not get to answer until some time
tomorrow. But others may chime in with their own suggestions.
--
Steven
More information about the Python-list
mailing list