[Numpy-discussion] PEP 31XX: A Type Hierarchy for Numbers (and other algebraic entities)

Wed Apr 25 04:36:06 EDT 2007

Here's a draft of the numbers ABCs PEP. The most up to date version
will live in the darcs repository at
http://jeffrey.yasskin.info/darcs/PEPs/pep-3141.txt (unless the number
changes) for now. Naming a PEP about numbers 3.141 seems cute, but of
course, I don't get to pick the number. :) This is my first PEP, so
apologies for any obvious mistakes.

I'd particularly like the numpy people's input on whether I've gotten
floating-point support right.

Thanks,
Jeffrey Yasskin

-------------------------------
PEP: 3141
Title: A Type Hierarchy for Numbers (and other algebraic entities)
Version: $Revision: 54928 $
Last-Modified: $Date: 2007-04-23 16:37:29 -0700 (Mon, 23 Apr 2007) $
Author: Jeffrey Yasskin <jyasskin at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Apr-2007
Post-History: Not yet posted

Abstract
========

This proposal defines a hierarchy of Abstract Base Classes (ABCs)
[#pep3119] to represent numbers and other algebraic entities similar
to numbers.  It proposes:

* A hierarchy of algebraic concepts, including monoids, groups, rings,
  and fields with successively more operators and constraints on their
  operators. This will be added as a new library module named
  "algebra".

* A hierarchy of specifically numeric types, which can be converted to
  and from the native Python types. This will be added as a new
  library module named "numbers".

Rationale
=========

Functions that take numbers as arguments should be able to determine
the properties of those numbers, and if and when overloading based on
types is added to the language, should be overloadable based on the
types of the arguments. This PEP defines some abstract base classes
that are useful in numerical calculations. A function can check that
variable is an instance of one of these classes and then rely on the
properties specified for them. Of course, the language cannot check
these properties, so where I say something is "guaranteed", I really
just mean that it's one of those properties a user should be able to
rely on.

This PEP tries to find a balance between providing fine-grained
distinctions and specifying types that few people will ever use.

Specification
=============

Although this PEP uses terminology from PEP3119, the hierarchy is
meaningful for any systematic method of defining sets of
classes. **Todo:** link to the Interfaces PEP when it's ready. I'm
also using the extra notation from [#pep3107] (annotations) to specify
some types.

Object oriented systems have a general problem in constraining
functions that take two arguments. To take addition as an example,
``int(3) + int(4)`` is defined, and ``vector(1,2,3) + vector(3,4,5)``
is defined, but ``int(3) + vector(3,4,5)`` doesn't make much sense. So
``a + b`` is not guaranteed to be defined for any two instances of
``AdditiveGroup``, but it is guaranteed to be defined when ``type(a)
== type(b)``. On the other hand, ``+`` does make sense for any sorts
of numbers, so the ``Complex`` ABC refines the properties for plus so
that ``a + b`` is defined whenever ``isinstance(a,Complex) and
isinstance(b,Complex)``, even if ``type(a) != type(b)``.

Monoids (http://en.wikipedia.org/wiki/Monoid) consist of a set with an
associative operation, and an identity element under that
operation. **Open issue**: Is a @classmethod the best way to define
constants that depend only on the type?::

    class MonoidUnderPlus(Abstract):
	"""+ is associative but not necessarily commutative and has an
	identity given by plus_identity().

	Subclasses follow the laws:

	  a + (b + c) === (a + b) + c
	  a.plus_identity() + a === a === a + a.plus_identity()

	Sequences are monoids under plus (in Python) but are not
	AdditiveGroups.
	"""
	@abstractmethod
	def __add__(self, other):
	    raise NotImplementedError

	@classmethod
	@abstractmethod
	def plus_identity(cls):
	    raise NotImplementedError

I skip ordinary non-commutative groups here because I don't have any
common examples of groups that use ``+`` as their operator but aren't
commutative. If we find some, the class can be added later.::

    class AdditiveGroup(MonoidUnderPlus):
	"""Defines a commutative group whose operator is +, and whose inverses
	are produced by -x.
	See http://en.wikipedia.org/wiki/Abelian_group.

	Where a, b, and c are instances of the same subclass of
	AdditiveGroup, the operations should follow these laws, where
	'zero' is a.__class__.zero().

	      a + b === b + a
	(a + b) + c === a + (b + c)
	   zero + a === a
	   a + (-a) === zero
	      a - b === a + -b

	Some abstract subclasses, such as Complex, may extend the
	definition of + to heterogenous subclasses, but AdditiveGroup only
	guarantees it's defined on arguments of exactly the same types.

	Vectors are AdditiveGroups but are not Rings.
	"""
	@abstractmethod
	def __add__(self, other):
	    """Associative commutative operation, whose inverse is negation."""
	    raise NotImplementedError

**Open issue:** Do we want to give people a choice of which of the
following to define, or should we pick one arbitrarily?::

	def __neg__(self):
	    """Must define this or __sub__()."""
	    return self.zero() - self

	def __sub__(self, other):
	    """Must define this or __neg__()."""
	    return self + -other

	@classmethod
	@abstractmethod
	def zero(cls):
	    """A better name for +'s identity as we move into more mathematical
	    domains."""
	    raise NotImplementedError

	@classmethod
	def plus_identity(cls):
	    return cls.zero()

Including Semiring (http://en.wikipedia.org/wiki/Semiring) would help
a little with defining a type for the natural numbers. That can be
split out once someone needs it (see ``IntegralDomain`` for how).::

    class Ring(AdditiveGroup):
	"""A mathematical ring over the operations + and *.
	See http://en.wikipedia.org/wiki/Ring_%28mathematics%29.

	In addition to the requirements of the AdditiveGroup superclass, a
	Ring has an associative but not necessarily commutative
	multiplication operation with identity (one) that distributes over
	addition. A Ring can be constructed from any integer 'i' by adding
	'one' to itself 'i' times. When R is a subclass of Ring, the
	additive identity is R(0), and the multiplicative identity is
	R(1).

	Matrices are Rings but not Commutative Rings or Division
	Rings. The quaternions are a Division Ring but not a
	Field. The integers are a Commutative Ring but not a Field.
	"""
	@abstractmethod
	def __init__(self, i:int):
	    """An instance of a Ring may be constructed from an integer.

	    This may be a lossy conversion, as in the case of the integers
	    modulo N."""
	    pass

	@abstractmethod
	def __mul__(self, other):
	    """Satisfies:
	    a * (b * c) === (a * b) * c
		one * a === a
		a * one === a
	    a * (b + c) === a * b + a * c

	    where one == a.__class__(1)
	    """
	    raise NotImplementedError

	@classmethod
	def zero(cls):
	    return cls(0)

	@classmethod
	def one(cls):
	    return cls(1)

I'm skipping both CommutativeRing and DivisionRing here.

    class Field(Ring):
	"""The class Field adds to Ring the requirement that * be a
	commutative group operation except that zero does not have an
	inverse.
	See http://en.wikipedia.org/wiki/Field_%28mathematics%29.

	Practically, that means we can define division on a Field. The
	additional laws are:

	a * b === b * a
	a / a === a.__class_(1)  # when a != a.__class__(0)

	Division lets us construct a Field from any Python float,
	although the conversion is likely to be lossy. Some Fields
	include the real numbers, rationals, and integers mod a
	prime. Python's ``float`` resembles a Field closely.
	"""
	def __init__(self, f:float):
	    """A Field should be constructible from any rational number, which
	    includes Python floats."""
	    pass

	@abstractmethod
	def __div__(self, divisor):
	    raise NotImplementedError

Division is somewhat complicated in Python. You have both __floordiv__
and __div__, and ints produce floats when they're divided. For the
purposes of this hierarchy, ``__floordiv__(a, b)`` is defined by
``floor(__div__(a, b))``, and, since int is not a subclass of Field,
it's allowed to do whatever it wants with __div__.

There are four more reasonable classes that I'm skipping here in the
interest of keeping the initial library simple. They are:

``Algebraic``
    Rational powers of its elements are defined (and maybe a few other
    operations)
    (http://en.wikipedia.org/wiki/Algebraic_number). Complex numbers
    are the most well-known algebraic set. Real numbers are _not_
    algebraic, but Python does define these operations on floats,
    which makes defining this class somewhat difficult.

``Trancendental``
    The elementary functions
    (http://en.wikipedia.org/wiki/Elementary_function) are
    defined. These are basically arbitrary powers, trig functions, and
    logs, the contents of ``cmath``.

The following two classes can be reasonably combined with ``Integral``
for now.
``IntegralDomain``
    Defines __divmod__.
    (http://darcs.haskell.org/numericprelude/docs/html/Algebra-IntegralDomain.html#t%3AC)

``PrincipalIdealDomain``
    Defines gcd and lcm.
    (http://darcs.haskell.org/numericprelude/docs/html/Algebra-PrincipalIdealDomain.html#t%3AC)

If someone needs to split them later, they can use code like::
    import numbers
    class IntegralDomain(Ring): ...
    numbers.Integral.__bases__ = (IntegralDomain,) + numbers.Integral.__bases__

Finally, we get to numbers. This is where we switch from the "algebra"
module to the "numbers" module.::

    class Complex(Ring, Hashable):
        """The ``Complex`` ABC indicates that the value lies somewhere
        on the complex plane, not that it in fact has a complex
        component: ``int`` is a subclass of ``Complex``. Because these
        actually represent complex numbers, they can be converted to
        the ``complex`` type.

        ``Complex`` finally gets around to requiring its subtypes to
        be immutable so they can be hashed in a standard way.

        ``Complex`` also requires its operations to accept
        heterogenous arguments. Subclasses should override the
        operators to be more accurate when they can, but should fall
        back on the default definitions to handle arguments of
        different (Complex) types.

        **Open issue:** __abs__ doesn't fit here because it doesn't
        exist for the Gaussian integers
        (http://en.wikipedia.org/wiki/Gaussian_integer). In fact, it
        only exists for algebraic complex numbers and real numbers. We
        could define it in both places, or leave it out of the
        ``Complex`` classes entirely and let it be a custom extention
        of the ``complex`` type.

        The Gaussian integers are ``Complex`` but not a ``Field``.
	"""
	@abstractmethod
	def __complex__(self):
	    """Any Complex can be converted to a native complex object."""
	    raise NotImplementedError

	def __hash__(self):
	    return hash(complex(self))

	@abstractmethod
	def real(self) => Real:
	    raise NotImplementedError

	@abstractmethod
	def imag(self) => Real:
	    raise NotImplementedError

	@abstractmethod
	def __add__(self, other):
	    """The other Ring operations should be implemented similarly."""
	    if isinstance(other, Complex):
		return complex(self) + complex(other)
	    else:
		return NotImplemented

``FractionalComplex(Complex, Field)`` might fit here, except that it
wouldn't give us any new operations.

    class Real(Complex, TotallyOrdered):
	"""Numbers along the real line. Some subclasses of this class
	may contain NaNs that are not ordered with the rest of the
	instances of that type. Oh well. **Open issue:** what problems
	will that cause? Is it worth it in order to get a
	straightforward type hierarchy?
	"""
	@abstractmethod
	def __float__(self):
	    raise NotImplementedError
	def __complex__(self):
	    return complex(float(self))
	def real(self) => self.__class__:
	    return self
	def imag(self) => self.__class__:
	    return self.__class__(0)
        def __abs__(self) => self.__class__:
            if self < 0: return -self
            else: return self

    class FractionalReal(Real, Field):
	"""Rationals and floats. This class provides concrete
	definitions of the other four methods from properfraction and
	allows you to convert fractional reals to integers in a
	disciplined way.
	"""
	@abstractmethod
	def properfraction(self) => (int, self.__class__):
	    """Returns a pair (n,f) such that self == n+f, and:
	      * n is an integral number with the same sign as self; and
	      * f is a fraction with the same type and sign as self, and with
		absolute value less than 1.
	      """
	    raise NotImplementedError
	def floor(self) => int:
            n, r = self.properfraction()
            if r < 0 then n - 1 else n
	def ceiling(self) => int: ...
	def __trunc__(self) => int: ...
	def round(self) => int: ...

**Open issue:** What's the best name for this class? RealIntegral? Integer?::

    class Integral(Real):
	"""Integers!"""
	@abstractmethod
	def __int__(self):
	    raise NotImplementedError
	def __float__(self):
	    return float(int(self))

	@abstractmethod
	def __or__(self, other):
	    raise NotImplementedError
	@abstractmethod
	def __xor__(self, other):
	    raise NotImplementedError
	@abstractmethod
	def __and__(self, other):
	    raise NotImplementedError
	@abstractmethod
	def __lshift__(self, other):
	    raise NotImplementedError
	@abstractmethod
	def __rshift__(self, other):
	    raise NotImplementedError
	@abstractmethod
	def __invert__(self):
	    raise NotImplementedError

Floating point values may not exactly obey several of the properties
you would expect from their superclasses. For example, it is possible
for ``(large_val + -large_val) + 3 == 3``, but ``large_val +
(-large_val + 3) == 0``. On the values most functions deal with this
isn't a problem, but it is something to be aware of. Types like this
inherit from ``FloatingReal`` so that functions that care can know to
use a numerically stable algorithm on them. **Open issue:** Is this
the proper way to handle floating types?::

    class FloatingReal:
	"""A "floating" number is one that is represented as
	``mantissa * radix**exponent`` where mantissa, radix, and
	exponent are all integers. Subclasses of FloatingReal don't
	follow all the rules you'd expect numbers to follow. If you
	really care about the answer, you have to use numerically
	stable algorithms, whatever those are.

        **Open issue:** What other operations would be useful here?

	These include floats and Decimals.
	"""
	@classmethod
	@abstractmethod
	def radix(cls) => int:
	    raise NotImplementedError

	@classmethod
	@abstractmethod
	def digits(cls) => int:
	    """The number of significant digits of base cls.radix()."""
	    raise NotImplementedError

	@classmethod
	@abstractmethod
	def exponentRange(cls) => (int, int):
	    """A pair of the (lowest,highest) values possible in the exponent."""
	    raise NotImplementedError

	@abstractmethod
	def decode(self) => (int, int):
	    """Returns a pair (mantissa, exponent) such that
	    mantissa*self.radix()**exponent == self."""
	    raise NotImplementedError

Inspiration
===========
http://hackage.haskell.org/trac/haskell-prime/wiki/StandardClasses
http://repetae.net/john/recent/out/classalias.html

References
==========

.. [#pep3119] Introducing Abstract Base Classes
   (http://www.python.org/dev/peps/pep-3119/)

.. [#pep3107] Function Annotations
   (http://www.python.org/dev/peps/pep-3107/)

.. [3] Possible Python 3K Class Tree?, wiki page created by Bill Janssen
   (http://wiki.python.org/moin/AbstractBaseClasses)

.. [#numericprelude] NumericPrelude: An experimental alternative
hierarchy of numeric type classes
   (http://darcs.haskell.org/numericprelude/docs/html/index.html)

Acknowledgements
----------------

Thanks to Neil Norwitz for helping me through the PEP process.

The Haskell Numeric Prelude [#numericprelude] nicely condensed a lot
of experience with the Haskell numeric hierarchy into a form that was
relatively easily adaptable to Python.

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: