prePEP "Decimal data type" v0.2

Mon Dec 29 12:10:31 EST 2003

Here is the second version of the prePEP for the Decimal data type.

Sorry for the delay, but I restructured it from scratch (thanks, list), and
moving to a new house and deploying GSM in Argentina are not very
time-freeing tasks, :p

Anyway, I'll appreciate any suggestion. Thank you!

.	Facundo

------------------------------------------------------------

PEP: 9999
Title: Decimal data type
Version: $Revision: 0.2 $
Last-Modified: $Date: 2003/12/29 13:35:00 $
Author: Facundo Batista <facundo at taniquetil.com.ar>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 17-Oct-2003
Python-Version: 2.3.4

Abstract
========

The idea is to have a Decimal data type, for every use where decimals are
needed but binary floating point is too inexact.

The Decimal data type will support the Python standard functions and
operations, and must comply with the decimal arithmetic ANSI standard
X3.274-1996.

Decimal will be floating point (as opposed to fixed point) and will have
bounded precision (the precision is the upper limit on the quantity of
significant digits in a result).

This work is based on code and test functions written by Eric Price, Aahz
and Tim Peters.  Currently I'm working on the Decimal.py code in the sandbox
(at python/nondist/sandbox/decimal in SourceForge).  Many of the explanations
in this PEP are taken from Cowlishaw's work and from comp.lang.python.

Motivation
==========

Here I'll explain why I think a Decimal data type is needed and why the
other numeric data types are not enough.

I wanted a Money data type, and after I proposed a prePEP in
comp.lang.python, the community agreed to have a numeric data type with the
needed arithmetic behaviour, and then build Money over it: all the
considerations about quantity of digits after the decimal point, rounding,
etc., will be handled through Money.  It is not the purpose of this PEP to
have a data type that can be used as Money without further effort.

One of the biggest advantages of implementing a standard is that someone
has already thought through all the creepy cases for you.  And GvR
redirected me to a standard: the Decimal specification is Mike Cowlishaw's
work at http://www2.hursley.ibm.com/decimal/.  This document defines a
general purpose decimal arithmetic.  A correct implementation of this
specification will conform to the decimal arithmetic defined in ANSI/IEEE
standard 854-1987, except for some minor restrictions, and will also provide
unrounded decimal arithmetic and integer arithmetic as proper subsets.

The problem with binary float
-----------------------------

In decimal math, there are many numbers that can't be represented with a
fixed number of decimal digits, e.g. 1/3 = 0.3333333333.......

In base 2 (the way that standard floating point is calculated), 1/2 = 0.1,
1/4 = 0.01, 1/8 = 0.001, etc.  But 0.2 equals 2/10 equals 1/5, which in
binary is the repeating fraction 0.001100110011001...  As you can see, the
problem is that some decimal numbers can't be represented exactly in binary,
resulting in small roundoff errors.

So we need a data type that represents decimal numbers exactly: instead of a
binary data type, we need a decimal one.
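
A quick check makes the roundoff visible.  This is only a minimal sketch in
Python 3 syntax; it assumes the ``fractions`` and ``decimal`` modules that
later shipped in the standard library, whose API is not necessarily the one
proposed in this draft::

    from decimal import Decimal
    from fractions import Fraction

    # The binary float nearest to 0.2 is not exactly 1/5.
    stored = Fraction(0.2)            # exact value held by the float
    print(stored == Fraction(1, 5))   # False

    # The roundoff shows up in ordinary arithmetic.
    print(0.1 + 0.2 == 0.3)           # False

    # Decimal values built from strings are held exactly.
    print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True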

Why floating point?
-------------------

So we go to decimal, but why *floating point*?

Floating point numbers use a fixed quantity of digits (precision) to
represent a number, working with an exponent when the number gets too big or
too small.  For example, with a precision of 5::

       1234 ==>   1234e0
      12345 ==>  12345e0
     123456 ==>  12345e1
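
The table above can be reproduced with the ``decimal`` module that later
shipped with Python (an assumption of this sketch, not the API proposed
here).  Dropping the sixth digit as shown corresponds to ``round-down``;
``round-half-up`` would give ``12346e1`` for the last row::

    from decimal import Context, ROUND_DOWN

    # Precision of 5 significant digits, truncating any extra digits.
    ctx = Context(prec=5, rounding=ROUND_DOWN)

    for n in (1234, 12345, 123456):
        d = ctx.create_decimal(n)          # the context is applied here
        sign, digits, exponent = d.as_tuple()
        coefficient = "".join(map(str, digits))
        print("%7d ==> %se%d" % (n, coefficient, exponent))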

In contrast, we have the example of a ``long`` number, whose precision is
infinite, meaning that you can have the number as big as you want, and
you'll never lose anything.

So, why can't we have an infinite precision decimal?  It's not so easy,
because of inexact divisions.  E.g.: 1/3 = 0.3333333333333... ad infinitum.
In this case you would have to store an infinite amount of 3s, which takes
too much memory, ;).

John Roth proposed to eliminate the division operator and force the user to
use an explicit method, just to avoid this kind of trouble.  This generated
adverse reactions in comp.lang.python, as everybody wants to have ``"/"``
support in a numeric data type.

Having said this, maybe you're thinking "Hey! Can't we just store the 1 and
the 3 as numerator and denominator?", which takes us to the next point.

Why not rational
----------------

Rational numbers are stored using two integers, the numerator and the
denominator.  This implies that the arithmetic operations can't be executed
directly (e.g. to add two rational numbers you first need to calculate the
common denominator).
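
As an illustration of that extra work, here is a small sketch in Python 3
syntax.  The ``add_rationals`` helper is hypothetical and written only for
this example; ``fractions.Fraction`` is the rational type that later shipped
in the standard library::

    from fractions import Fraction
    from math import gcd

    def add_rationals(n1, d1, n2, d2):
        """Add n1/d1 + n2/d2; return a reduced (numerator, denominator)."""
        numerator = n1 * d2 + n2 * d1   # bring both to the denominator d1*d2
        denominator = d1 * d2
        common = gcd(numerator, denominator)
        return numerator // common, denominator // common

    print(add_rationals(1, 3, 1, 7))    # (10, 21)

    # Denominators tend to grow as more terms are accumulated.
    total = Fraction(0)
    for k in range(1, 21):
        total += Fraction(1, k)
    print(total.denominator)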

Quoting Alex Martelli: "The performance implications of the fact that
summing two rationals (which take O(M) and O(N) space respectively) gives a
rational which takes O(M+N) memory space is just too troublesome.  There are
excellent Rational implementations in both pure Python and as extensions
(e.g., gmpy), but they'll always be a "niche market" IMHO.  Probably worth
PEPping, not worth doing without Decimal -- which is the right way to
represent sums of money, a truly major use case in the real world."

Anyway, if you're interested in this data type, you may want to take a look
at PEP 239: Adding a Rational Type to Python.

So, what do we get?
-------------------

The result is a Decimal data type, with bounded precision and floating
point.

Will it be useful?  I can't say it better than Alex Martelli: "Python (out
of the box) doesn't let you have binary floating point numbers *with
whatever precision you specify*: you're limited to what your hardware
supplies.  Decimal, be it used as a fixed or floating point number, should
suffer from no such limitation: whatever bounded precision you may specify
on number creation (your memory permitting) should work just as well.  Most
of the expense of programming simplicity can be hidden from application
programs and placed in a suitable decimal arithmetic type.  As per
http://www2.hursley.ibm.com/decimal/ , *a single data type can be used for
integer, fixed-point, and floating-point decimal arithmetic* -- and for
money arithmetic which doesn't drive the application programmer crazy."

There are several uses for such a data type.  As I said before, I will use
it as the base for Money.  In this case the bounded precision is not an
issue; quoting Tim Peters: "A precision of 20 would be way more than enough
to account for total world economic output, down to the penny, since the
beginning of time."

General Decimal Arithmetic Specification
========================================

Here I'll include information and descriptions that are part of the
specification (the structure of the number, the context, etc.).  The
requirements included in this section are not up for discussion, as they
come from the standard, and this PEP is just for implementing the standard.

Anyway, if you think that something here should change, just propose it
(maybe I misplaced the item), but you've been warned, ;)

This is a very trimmed version of the original document: for the precise
wording, check it at http://www2.hursley.ibm.com/decimal/.

The Arithmetic Model
--------------------

The specification is based on a model of decimal arithmetic which is a
formalization of the decimal system of numeration (Algorism) as further
defined and constrained by the relevant standards (IEEE 854, ANSI X3-274,
and the proposed revision of IEEE 754).

There are three components to the model:

    - Numbers: which represent the values that can be manipulated by, or be
      the results of, the core operations defined in the specification.

    - Operations: the core operations (such as addition, multiplication,
      etc.) which can be carried out on numbers.

    - Context: which represents the user-selectable parameters and rules
      that govern the results of arithmetic operations (for example, the
      precision to be used).

Numbers
-------

Numbers may be finite numbers (numbers whose value can be represented
exactly) or they may be special values (infinities and other values which
are not finite numbers).

Finite numbers are defined by three integer parameters:

    - Sign: a value which must be either 0 or 1, where 1 indicates that the
      number is negative or is the negative zero and 0 indicates that the
      number is zero or positive.

    - Coefficient: an integer which is zero or positive.

    - Exponent: a signed integer which indicates the power of ten by which
      the coefficient is multiplied.

The numerical value of a finite number is given by::

    (-1)**sign * coefficient * 10**exponent
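
The formula can be evaluated exactly with a few lines of Python 3.  The
``finite_value`` function is a hypothetical helper written for this
illustration only, and it uses ``fractions.Fraction`` (a later standard
library addition) to avoid any binary rounding::

    from fractions import Fraction

    def finite_value(sign, coefficient, exponent):
        """Exact value of a finite number from its three parameters."""
        if sign not in (0, 1) or coefficient < 0:
            raise ValueError("sign must be 0 or 1; coefficient must be >= 0")
        return (-1) ** sign * coefficient * Fraction(10) ** exponent

    print(finite_value(1, 3225, -2))   # -129/4, that is -32.25
    print(finite_value(0, 12345, 1))   # 123450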

Numbers must also be able to represent one of three named special values:

    - Infinity: a value representing a number whose magnitude is infinitely
      large.

    - Quiet NaN: a value representing undefined results (*Not a Number*)
      which does not cause an Invalid operation condition.

    - Signaling NaN: a value representing undefined results (*Not a Number*)
      which will cause an Invalid operation condition if used in any
      operation defined in the specification.

All special values may have a sign, as for finite numbers.  The sign of an
infinity is significant and the sign of a NaN has no meaning.
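
The difference between the two NaNs can be observed with the ``decimal``
module that later shipped in the standard library.  This is a sketch under
that assumption (Python 3 syntax), with the invalid-operation trap disabled
so that the defined results are returned instead of exceptions::

    from decimal import Decimal, InvalidOperation, localcontext

    with localcontext() as ctx:
        ctx.traps[InvalidOperation] = False   # return results, don't raise
        ctx.clear_flags()

        quiet = Decimal("NaN") + 1            # quiet NaN propagates silently
        quiet_flagged = bool(ctx.flags[InvalidOperation])

        signaling = Decimal("sNaN") + 1       # signaling NaN sets the flag
        signaling_flagged = bool(ctx.flags[InvalidOperation])

    print(quiet.is_nan(), quiet_flagged)            # True False
    print(signaling.is_nan(), signaling_flagged)    # True True

    # The sign of an infinity is significant.
    print(Decimal("-Infinity") < Decimal("Infinity"))   # True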

Context
-------

The context represents the user-selectable parameters and rules which govern
the results of arithmetic operations (for example, the precision to be
used).

The context gets that name because it surrounds the Decimal numbers.  It's
up to the implementation to work with one or several contexts, but
definitely the idea is not to have a context per Decimal number.

These definitions don't affect the internal storage of the Decimal numbers,
just the way that the arithmetic operations are performed.

The context is defined by the following parameters:

    - Precision: An integer which must be greater than 0.  This sets the
      maximum number of significant digits that can result from an arithmetic
      operation.

    - Rounding: A named value which indicates the algorithm to be used when
      rounding is necessary.  Rounding is applied when a result coefficient
      has more significant digits than the value of precision; in this case
      the result coefficient is shortened to precision digits and may then be
      incremented by one (which may require a further shortening), depending
      on the rounding algorithm selected and the remaining digits of the
      original coefficient.  The exponent is adjusted to compensate for any
      shortening.

    - Flags and trap-enablers: The exceptional conditions are grouped into
      signals, which can be controlled individually.  The context contains a
      flag and a trap-enabler (both are either 0 or 1) for each signal.  For
      each of the signals, the corresponding flag is set to 1 when the signal
      occurs.  It is only reset to 0 by explicit user action.  For each of
      the signals, the corresponding trap-enabler indicates which action is
      to be taken when the signal occurs.  If 0, a defined result is
      supplied, and execution continues.  If 1, the execution of the
      operation is ended or paused and control passes to a "trap handler",
      which will have access to the defined result.

The signals are: clamped, division-by-zero, inexact, invalid-operation,
overflow, rounded, subnormal and underflow.
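
To make these parameters concrete, here is a sketch that uses the
``Context`` class of the ``decimal`` module that later shipped with Python.
That module is an assumption of this example and may differ from the
implementation discussed in this draft::

    from decimal import (Context, Decimal, DivisionByZero, Inexact, Rounded,
                         ROUND_HALF_UP)

    # Precision, rounding and one trap-enabler; all flags start cleared.
    ctx = Context(prec=5, rounding=ROUND_HALF_UP, traps=[DivisionByZero])

    third = ctx.divide(Decimal(1), Decimal(3))
    print(third)                                   # 0.33333
    print(bool(ctx.flags[Inexact]), bool(ctx.flags[Rounded]))   # True True

    # Flags stay set until the user resets them explicitly.
    ctx.clear_flags()
    print(bool(ctx.flags[Inexact]))                # False

    # An enabled trap turns the signal into an exception.
    try:
        ctx.divide(Decimal(1), Decimal(0))
        trapped = False
    except DivisionByZero:
        trapped = True
    print(trapped)                                 # True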

Default Contexts
----------------

The specification defines two default contexts, which define suitable
settings for basic arithmetic and for the extended arithmetic defined by
IEEE 854.  It is recommended that the default contexts be easily selectable
by the user.

In the basic default context, the parameters are set as follows:

    - flags: all set to 0
    - trap-enablers: inexact, rounded, and subnormal are set to 0; all
      others are set to 1
    - precision: is set to 9
    - rounding: is set to round-half-up

In the extended default context, the parameters are set as follows:

    - flags: all set to 0
    - trap-enablers: all set to 0
    - precision: is set to the designated single precision
    - rounding: is set to round-half-even
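
For comparison, the ``decimal`` module that later shipped with Python
exposes two predefined contexts with these names.  The sketch below only
inspects them; in that module the "designated single precision" happens to
be 9 digits, which is a property of that implementation rather than
something stated in this draft::

    from decimal import BasicContext, ExtendedContext

    for name, ctx in (("basic", BasicContext), ("extended", ExtendedContext)):
        # Collect the names of the signals whose trap-enabler is set.
        enabled = sorted(sig.__name__ for sig, on in ctx.traps.items() if on)
        print(name, ctx.prec, ctx.rounding, enabled)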

Exceptional Conditions
----------------------

This section lists, in the abstract, the exceptional conditions that may
arise during the operations defined in the specification.

For each condition, the corresponding signal in the context is given,
along with the defined result:

    - Clamped: This occurs and signals ``clamped`` if the exponent of a
      result has been altered in order to fit the constraints of a specific
      concrete representation.

    - Conversion syntax: This occurs and signals ``invalid-operation`` if a
      string is being converted to a number and it does not conform to the
      numeric string syntax.  The result is [0,qNaN].

    - Division by zero: This occurs and signals ``division-by-zero`` if
      division of a non-zero finite number by zero was attempted.  The
      result of the operation is [sign,inf], where sign is the *exclusive
      or* of the signs of the operands for divide.

    - Division impossible: This occurs and signals ``invalid-operation`` if
      the integer result of a divide-integer or remainder operation had too
      many digits.  The result is [0,qNaN].

    - Division undefined: This occurs and signals ``invalid-operation`` if
      division by zero was attempted, and the dividend is also zero.  The
      result is [0,qNaN].

    - Inexact: This occurs and signals ``inexact`` whenever the result of an
      operation is not exact (that is, it needed to be rounded and any
      discarded digits were non-zero), or if an overflow or underflow
      condition occurs.  The result in all cases is unchanged.

    - Insufficient storage: For many implementations, storage is needed for
      calculations and intermediate results, and on occasion an arithmetic
      operation may fail due to lack of storage.  The result is [0,qNaN].

    - Invalid context: This occurs and signals ``invalid-operation`` if an
      invalid context was detected during an operation.  The result is
      [0,qNaN].

    - Invalid operation: This occurs and signals ``invalid-operation`` in a
      variety of cases (an operand to an operation is a signaling NaN, an
      attempt is made to add [0,inf] to [1,inf] or to multiply 0 by [0,inf]
      or [1,inf], etc.).  The result of the operation after any of the
      invalid operations is [0,qNaN] except when the cause is a signaling
      NaN, in which case the result is [s,qNaN] or [s,qNaN,d] where the sign
      and diagnostic are copied from the signaling NaN.

    - Overflow: This occurs and signals ``overflow`` if the adjusted
      exponent of a result, after rounding, would be greater than the
      largest value that can be handled.  The result depends on the rounding
      mode (largest finite number, [0,inf] or [1,inf]).  In all cases,
      Inexact and Rounded will also be raised.

    - Rounded: This occurs and signals ``rounded`` whenever the result of an
      operation is rounded (that is, some zero or non-zero digits were
      discarded from the coefficient), or if an overflow or underflow
      condition occurs.  The result in all cases is unchanged.

    - Subnormal: This occurs and signals ``subnormal`` whenever the result
      of a conversion or operation is subnormal (that is, its adjusted
      exponent is less than the smallest value that can be handled, before
      any rounding).  The result in all cases is unchanged.

    - Underflow: This occurs and signals ``underflow`` if a result is both
      inexact and subnormal.
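
Two of the conditions listed above can be checked with the ``decimal``
module that later shipped in the standard library (an assumption of this
sketch).  With every trap disabled, the defined results are returned and
only the flags record what happened::

    from decimal import Context, Decimal, DivisionByZero, InvalidOperation

    ctx = Context(prec=9, traps=[])     # no traps: defined results returned

    # Non-zero / zero: signed infinity, division-by-zero is flagged.
    q = ctx.divide(Decimal(-1), Decimal(0))
    print(q, bool(ctx.flags[DivisionByZero]))      # -Infinity True

    # Zero / zero: quiet NaN, invalid-operation is flagged.
    ctx.clear_flags()
    u = ctx.divide(Decimal(0), Decimal(0))
    print(u, bool(ctx.flags[InvalidOperation]))    # NaN True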

Different Kinds of Rounding
---------------------------

``round-down``: The discarded digits are ignored; the result is unchanged
(round toward 0, truncate)::

    1.123 --> 1.12
    1.128 --> 1.12
    1.125 --> 1.12
    1.135 --> 1.13

``round-half-up``: If the discarded digits represent greater than or equal
to half (0.5) then the result should be incremented by 1; otherwise the
discarded digits are ignored::

    1.123 --> 1.12
    1.128 --> 1.13
    1.125 --> 1.13
    1.135 --> 1.14

``round-half-even``: If the discarded digits represent greater than half
(0.5) then the result coefficient is incremented by 1; if they represent
less than half, then the result is not adjusted; otherwise the result is
unaltered if its rightmost digit is even, or incremented by 1 if its
rightmost digit is odd (to make an even digit)::

    1.123 --> 1.12
    1.128 --> 1.13
    1.125 --> 1.12
    1.135 --> 1.14

``round-ceiling``: If all of the discarded digits are zero or if the sign is
negative the result is unchanged; otherwise, the result is incremented by
1::

    1.123 --> 1.13
    1.128 --> 1.13
    -1.123 --> -1.12
    -1.128 --> -1.12

``round-floor``: If all of the discarded digits are zero or if the sign is
positive the result is unchanged; otherwise, the absolute value of the
result is incremented by 1::

    1.123 --> 1.12
    1.128 --> 1.12
    -1.123 --> -1.13
    -1.128 --> -1.13

``round-half-down``: If the discarded digits represent greater than half
(0.5) then the result is incremented by 1; otherwise the discarded digits
are ignored::

    1.123 --> 1.12
    1.128 --> 1.13
    1.125 --> 1.12
    1.135 --> 1.13

``round-up``: If all of the discarded digits are zero the result is
unchanged, otherwise the result is incremented by 1 (round away from 0)::

    1.123 --> 1.13
    1.128 --> 1.13
    1.125 --> 1.13
    1.135 --> 1.14
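
The seven modes can be compared side by side with the rounding constants of
the ``decimal`` module that later shipped with Python.  This is a sketch
under that assumption; ``round_all`` is a hypothetical helper written for
this example::

    from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                         ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP,
                         ROUND_UP)

    MODES = {
        "round-down": ROUND_DOWN,
        "round-half-up": ROUND_HALF_UP,
        "round-half-even": ROUND_HALF_EVEN,
        "round-ceiling": ROUND_CEILING,
        "round-floor": ROUND_FLOOR,
        "round-half-down": ROUND_HALF_DOWN,
        "round-up": ROUND_UP,
    }
    CENT = Decimal("0.01")   # keep two digits after the decimal point

    def round_all(text):
        """Return {mode name: rounded value as a string} for one input."""
        value = Decimal(text)
        return {name: str(value.quantize(CENT, rounding=mode))
                for name, mode in MODES.items()}

    for sample in ("1.125", "1.135", "-1.123"):
        print(sample, round_all(sample))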

Rationale
=========

I must separate the requirements into two sections.  The first is to comply
with the ANSI standard.  All the requirements for this are specified in
Mike Cowlishaw's work.  He also provided a **comprehensive** suite of test
cases.

The second section of requirements (standard Python functions support,
usability, etc.) is detailed from here on, where I'll include all the
decisions made and why, and all the subjects still being discussed.

Explicit construction
---------------------

The explicit construction is not affected by the context (there is no
rounding, no limits by the precision, etc.), because the context affects
just the results of operations.

**From int or long**: There's no loss and no need to specify any other
information::

    Decimal(35)
    Decimal(-124)

**From string**: Strings with floats in normal and engineering notation will
be supported.  In this transformation there is no loss of information, as
the string is converted directly to Decimal (there is no intermediate
conversion through float)::

    Decimal("-12")
    Decimal("23.2e-7")
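
The rule that construction ignores the context while operations obey it can
be seen with the ``decimal`` module that later shipped in the standard
library.  This is only a sketch under that assumption, in Python 3 syntax::

    from decimal import Decimal, localcontext

    with localcontext() as ctx:
        ctx.prec = 3
        d = Decimal("1.23456")   # construction keeps every digit
        rounded = +d             # unary plus is an operation: context applies

    print(d)         # 1.23456
    print(rounded)   # 1.23
    print(Decimal(35), Decimal(-124), Decimal("23.2e-7"))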

**From float**: The initial discussion on this item was what should happen
when passing floating point to the constructor:

    1. ``Decimal(1.1) == Decimal('1.1')``

    2. ``Decimal(1.1) ==
       Decimal('110000000000000008881784197001252...e-51')``

    3. an exception is raised

Several people argued that (1) is the better option here, because it's what
you expect when writing ``Decimal(1.1)``.  And quoting John Roth, it's easy
to implement: "It's not at all difficult to find where the actual number
ends and where the fuzz begins.  You can do it visually, and the algorithms
to do it are quite well known".

But if I *really* want my number to be
``Decimal('110000000000000008881784197001252...e-51')``, why can't I write
``Decimal(1.1)``?  Why should I expect Decimal to be "rounding" it?
Remember that ``1.1`` *is* binary floating point, so I can predict the
result.  It's not intuitive to a beginner, but that's the way it is.

Anyway, Paul Moore showed that (1) can't be, because::

    (1) says  D(1.1) == D('1.1')
    but       1.1 == 1.1000000000000001
    so        D(1.1) == D(1.1000000000000001)
    together: D(1.1000000000000001) == D('1.1')  

which is wrong, because ``Decimal('1.1')`` is exactly 1.1, not
``1.1000000000000001``.  He also proposed to have an explicit conversion
to float.  bokr says you need to put the precision in the constructor and
mwilson has the idea to::

    d = Decimal (1.1, 1)  # take float value to 1 decimal place
    d = Decimal (1.1)  # gets `places` from pre-set context

But Alex Martelli says that "Constructing with some specified precision
would be fine.  Thus, I think *construction from float with some default
precision* runs a substantial risk of tricking naive users."

So, I think that the best solution is to have a parameter that says at which
position after the decimal point a round-half-up rounding is applied.  If
you do not specify this parameter, you get an exact conversion.  In this
way::

    Decimal(1.1, 2) == Decimal('1.1')
    Decimal(1.1, 16) == Decimal('1.1000000000000001')
    Decimal(1.1) == Decimal('110000000000000008881784197001252...e-51')
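
The proposed behaviour can be approximated today with a small wrapper.  The
``decimal_from_float`` function below is hypothetical; it is built on the
``decimal`` module that later shipped with Python, whose constructor accepts
a float and converts it exactly but takes no ``decimal_digits`` argument::

    from decimal import Context, Decimal, ROUND_HALF_UP

    # Generous precision so quantize() does not reject long results.
    _WIDE = Context(prec=1200, rounding=ROUND_HALF_UP)

    def decimal_from_float(value, decimal_digits=None):
        """Hypothetical stand-in for the proposed Decimal(float, digits)."""
        exact = Decimal(value)              # exact binary-to-decimal value
        if decimal_digits is None:
            return exact
        # Quantum with the wanted exponent, e.g. 1E-2 for two places.
        quantum = Decimal((0, (1,), -decimal_digits))
        return exact.quantize(quantum, context=_WIDE)

    print(decimal_from_float(1.1, 2))    # 1.10
    print(decimal_from_float(1.1, 16))   # 1.1000000000000001
    print(decimal_from_float(1.1))       # the full exact expansion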

**From tuples**: Aahz suggested constructing from tuples: it makes
``eval()``'s round trip easier to implement and "someone who has numeric
values representing a Decimal does not need to convert them to a string."

The structure will be a tuple of three elements: sign, number and exponent.
The sign is 1 or 0, the number is a tuple of decimal digits and the exponent
is a signed int or long::

    Decimal((1, (3, 2, 2, 5), -2))     # for -32.25
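
The same three-element layout is accepted by the ``decimal`` module that
later shipped in the standard library, so the example can be tried directly
(a sketch under that assumption, in Python 3 syntax)::

    from decimal import Decimal

    d = Decimal((1, (3, 2, 2, 5), -2))
    print(d)                             # -32.25
    print(d.as_tuple())                  # sign, digits and exponent again
    print(Decimal(d.as_tuple()) == d)    # True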

**From Decimal**: No mystery here, just a copy.

**Syntax for all the cases**::

    Decimal(value, [decimal_digits])

where ``value`` can be any of the data types just mentioned and
``decimal_digits`` is allowed only when value is float.

Implicit construction
---------------------

As the implicit construction is the consequence of an operation, it will be
affected by the context as detailed in each point.

John Roth suggested that "The other type should be handled in the same way
the decimal() constructor would handle it".  But Alex Martelli thinks that
"this total breach with Python tradition would be a terrible mistake.
23+"43" is NOT handled in the same way as 23+int("45"), and a VERY good
thing that is too.  It's a completely different thing for a user to
EXPLICITLY indicate they want construction (conversion) and to just happen
to sum two objects one of which by mistake could be a string."

So, here I define the behaviour again for each data type.

**From int or long**: Aahz suggested the need for an explicit conversion
from int, but also thinks it's ok to convert implicitly as long as the
precision in the current Context is not exceeded; if it is exceeded,
ValueError is raised.  Votes in comp.lang.python agreed with this.

**From string**: Everybody agrees to raise an exception here.

**From float**: Aahz is strongly opposed to interacting with float: "The
problem is that Decimal is capable of greater precision, accuracy, and range
than float", suggesting an explicit conversion.

But in Python it's ok to do ``35 + 1.1``, so why can't I do
``Decimal(35) + 1.1``?  We agree that a naive user who writes ``1.1``
doesn't know that he's being inexact, but that happens in both of the
examples I just mentioned.

So, what should we do?  I propose to allow the interaction with float,
making an exact conversion and raising ValueError if the result exceeds the
precision in the current context (this is maybe too tricky, because for
example with a precision of 9, ``Decimal(35) + 1.5`` is ok but
``Decimal(35) + 1.1`` raises an error).
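
A sketch of that rule follows, in Python 3 syntax.  The ``implicit_operand``
helper is hypothetical and relies on the exact float conversion of the
``decimal`` module that later shipped with Python.  A float such as ``1.5``,
which binary can hold exactly, passes a precision of 9, while ``1.1`` needs
far more digits and is rejected::

    from decimal import Decimal

    def implicit_operand(value, precision):
        """Hypothetical rule: convert exactly, refuse if too many digits."""
        exact = Decimal(value)            # exact for int and float alike
        if len(exact.as_tuple().digits) > precision:
            raise ValueError("%r needs more than %d digits"
                             % (value, precision))
        return exact

    print(Decimal(35) + implicit_operand(1.5, 9))    # 36.5
    try:
        implicit_operand(1.1, 9)
    except ValueError as exc:
        print("rejected:", exc)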

**From Decimal**: There isn't any issue here.

Use of Context
--------------

In the last prePEP I said that "The Context must be omnipresent, meaning
that changes to it affects all the current and future Decimal instances".  I
was wrong.  In response, John Roth said that "The context should be
selectable for the particular usage.  That is, it should be possible to have
several different contexts in play at one time in an application."

In comp.lang.python, Aahz explained that the idea is to have a "context per
thread".  So, all the instances of a thread belong to a context, and you can
change a context in thread A (and the behaviour of the instances of that
thread) without changing anything in thread B.

Also, and again correcting me, he said that the "Context applies only to
operations, not to Decimal instances; changing the Context does not affect
existing instances if there are no operations on them".

Discussing special cases where operations must be performed with rules other
than those of the current context, Tim Peters said that the context will
have the operations as methods.  This way, the user "can create whatever
private context object(s) it needs, and spell arithmetic as explicit method
calls on its private context object(s), so that the default thread context
object is neither consulted nor modified".
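
Both ideas, a context per thread and private context objects with the
operations as methods, can be tried with the ``decimal`` module that later
shipped in the standard library.  The sketch below assumes that module and
is not the implementation discussed in this draft::

    import threading
    from decimal import Context, Decimal, getcontext

    main_prec = getcontext().prec
    seen = {}

    def worker():
        getcontext().prec = 5                 # affects this thread only
        seen["worker"] = Decimal(1) / Decimal(3)

    t = threading.Thread(target=worker)
    t.start()
    t.join()

    # A private context: arithmetic is spelled as explicit method calls.
    private = Context(prec=3)
    seen["private"] = private.divide(Decimal(1), Decimal(3))

    print(seen["worker"])                     # 0.33333
    print(seen["private"])                    # 0.333
    print(getcontext().prec == main_prec)     # True: main thread untouched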

Python Usability
----------------

- Decimal should support the basic arithmetic (``+, -, *, /, //, **, %,
  divmod``) and comparison (``==, !=, <, >, <=, >=, cmp``) operators in the
  following cases (check `Implicit Construction`_ to see what types
  otherType could be, and what happens in each case):

    - Decimal op Decimal
    - Decimal op otherType
    - otherType op Decimal
    - Decimal op= Decimal
    - Decimal op= otherType

- Decimal should support unary operators (``-, +, abs``).

- Decimal should support the following standard functions:

        - min, max
        - float, int, long
        - str, repr
        - hash
        - copy, deepcopy
        - bool (0 is false, otherwise true)

- Calling repr() should round-trip, meaning that::

       m = Decimal(...)
       m == eval(repr(m))

- Decimal should be immutable.
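
Several of these requirements can be checked against the ``decimal`` module
that later shipped with Python.  This is a sketch in Python 3 syntax, where
``cmp`` and ``long`` no longer exist, so it covers only part of the list::

    from decimal import Decimal

    m = Decimal("-32.25")

    print(eval(repr(m)) == m)                   # True: repr() round trip
    # Equal values hash alike, whatever their trailing zeros.
    print(hash(Decimal("1.50")) == hash(Decimal("1.5")))   # True
    print(bool(Decimal(0)), bool(Decimal("0.1")))          # False True
    print(min(m, Decimal(2)), abs(m), int(m))   # -32.25 32.25 -32
    print(divmod(Decimal(7), Decimal(2)))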

Reference Implementation
========================

To be included later:

    - code
    - test code
    - documentation

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:
