[Python-Dev] Rewrite of cmath module?

Sun Mar 18 05:03:13 CET 2007

The current cmath module has some significant numerical problems,
particularly in the inverse trig and inverse hyperbolic trig
functions.  I've listed a few of these below, but for now I'll just
note the following upsetting result:

>>> asin(-1e8)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
OverflowError: math range error

(The true value is somewhere around -pi/2 +/- 19.1138j, which is
hardly deserving of an overflow error.)

I'm wondering whether anyone would be interested in a rewrite of the
cmath module.  I have a drop-in replacement written in pure Python,
based largely on the formulas given by Kahan in his `Much Ado about
Nothing's Sign Bit' article, which I believe eliminates the vast
majority of the current numerical problems.  Besides the numerical
fixes, the only significant difference from the current behaviour is
that the branch cuts for asinh have been moved. (The docs suggest that
the current branch cuts for asinh should be regarded as buggy).  I'm
offering to translate this into C and add appropriate documentation
and test cases, but I have a couple of questions before I start.

(1) Is there actually any interest in fixing this module?  I guess
that the cmath module can't be used that much, otherwise these
problems would have surfaced before.

(2) (Disregard if the answer to question 1 is `No'.)  What should be
done about branch cuts?  The value of a function on a branch cut is
going to depend on signs of zeros, so it'll be pretty much impossible
to make any guarantees about the behaviour.  Even so, there are two
approaches that seem feasible:

(a) One can ensure that on a platform that happens to have IEEE-754
compliant hardware and a signed-zero-friendly math library, the
complex functions `do the right thing': i.e., are continuous on both
sides of each of the branch cuts, using the sign of zero on a branch
cut to determine which side of the branch cut the user is interested
in.  Doing this portably in C89 involves some trickery, but it seems
possible.  For other platforms behaviour on the branch cuts would be
undefined.  Alternatively:

(b) Force collapsing of all signed zeros to `positive' zero, so that
the functions are continuous only on one side of the branch cuts, and
the continuity is exactly as specified in the current documentation
(though probably including the suggested changes for atan and atanh.)
This is mostly what happens with the current implementation, though
it's not clear to me whether that's by design or by accident.

(c) Some other approach that I haven't considered.

Which of these approaches is preferable?  The Python module that I
have currently does (a).

(3) Is it worth trying to be consistent in exceptional cases?  The
current math module documentation notes that math.log(0) might produce
-inf on one platform and raise ValueError on another, so being consistent
about whether a non-representable result gives a NaN, an infinity or a
Python exception seems as though it would be tricky.  I should also note that
I've made no effort to do the right thing when the argument to any of the
functions contains a NaN or an infinity in either the real or imaginary part.

Here are a few examples of the current cmath problems, on my machine
(OS X 10.4.8/PowerPC).

(*) As already mentioned in the docs, the current branch cuts for
asinh are somewhat unconventional, and should probably be fixed.  I
believe that this is a fairly trivial fix.

(*) asinh gives inaccurate answers in the portion of the complex plane
between the current branch cuts, when the real part is large and
negative.

>>> asinh(-1e7+0.9j)
(-16.811242831518271+8.9158318101199366e-08j)
>>> asinh(-1e8+0.9j)
(-19.113827924512311+0j)
(The imaginary parts here should be approximately 8.999999999999e-8 and
8.999999999999e-9 respectively.)

(*) asinh gives inaccurate results for small x:

>>> asinh(1e-12)
(1.000088900582091e-12+0j)
>>> asinh(1e-20)
(4.4408920985006257e-16+0j)

(asinh(x) ~ x - 1/6 x^3 + O(x^5) for small x, so the correct values
are 1e-12 and 1e-20 to within machine precision.)

(*) acos behaves badly for large reals:

>>> acos(1e7)
16.805431370234086j # true value ~ 16.811242831518262597j
>>> acos(1e8)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
OverflowError: math range error

(*) tan and tanh overflow unnecessarily, even though tan(z) and
tanh(z) should be representable for all representable complex arguments.

>>> tan(1000j)
(nannanj)
(Actual value: 1j, to within machine precision.)

(*) log and sqrt overflow unnecessarily for arguments close to the
limits of double precision:

>>> z = 1.4e308+1.4e308j
>>> sqrt(z)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
OverflowError: math range error
>>> log(z)
(inf+0.78539816339744828j)

(*) sqrt gives inaccurate answers for very small arguments (where the
real and imaginary parts are denormalized numbers):

>>> z = .5e-323 + .5e-323j
>>> sqrt(z)
(2.2227587494850775e-162+0j)

(Correct value for sqrt(z) is close to 2.442109e-162 + 1.011554e-162j.)

(*) exp, cosh and sinh overflow unnecessarily for some values. For
example:

>>> exp(710+pi/4j)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
OverflowError: math range error

The correct value is around 1.579673e308 + 1.579673e308j, which should
be representable in IEEE double precision.

This last problem is almost not worth fixing: these functions are
going to overflow eventually anyway, and whether that happens when the
real part of the argument is 709.0 or 710.0 isn't going to make a huge
difference to anyone's life.  In contrast, functions like acos, log
and sqrt should have representable output for *any* finite
representable complex number (with the obvious exception of 0 for
log), so it's a bit embarrassing if they fail for some inputs.

One more thing: since this is my first post to python-dev I should
probably introduce myself.  I'm a mathematician, working mostly in
number theory.  I learnt programming and numerical analysis the hard
way, coding service-life prediction algorithms for rocket motors in
Fortran 77 during several long summers.  I've been using Python for
more than ten years, mainly for teaching but also occasionally for
research purposes.

Mark Dickinson