[ python-Bugs-1746088 ] long.__str__ is quadratic time
SourceForge.net
noreply at sourceforge.net
Tue Jul 3 20:23:52 CEST 2007
Bugs item #1746088, was opened at 2007-07-01 11:42
Message generated for change (Comment added) made by dbenbenn
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746088&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Benbennick (dbenbenn)
Assigned to: Nobody/Anonymous (nobody)
Summary: long.__str__ is quadratic time
Initial Comment:
In the 2.5.1 source code, Objects/longobject.c:long_format() is used to convert a long int to a string. When base is not a power of 2, it is quadratic time in the length of the output string. Of course, this is fine on small numbers, but is catastrophic on huge numbers.
Suppose base is 10. The problem is that the algorithm basically does the following: take the number mod 10 to get a digit, stick that digit on the output, divide the number by 10, and repeat.
To print an n digit number, there is an O(n log n) algorithm, using divide-and-conquer. You break the number into 2 pieces, each n/2 digits long, and iterate on both pieces.
Converting string to long has the same quadratic time problem, in PyLong_FromString(). The solution is the same, in reverse: break the string in half, convert each piece to a long, and combine the two longs into one.
Alternatively, Python could just use GMP (GNU MP Bignum Library, http://gmplib.org/) to provide long integers. That would make other operations, such as * and /, more efficient, too. But it would require a much bigger change.
----------------------------------------------------------------------
>Comment By: David Benbennick (dbenbenn)
Date: 2007-07-03 14:23
Message:
Logged In: YES
user_id=95581
Originator: YES
> rely on having an FFT-based fast multiplication algorithm, together
with
> some form of divide-and-conquer division---is this right?
Yes, that's true: fast str() relies on fast division. I had assumed
Python already had fast division; if it doesn't, I'd consider that a bug,
too.
> What's your use-case for printing huge integers fast?
Note that it's not a question of printing them *fast*. With a quadratic
time algorithm, it's infeasible to print huge numbers *at all*. My
personal use case is doing computations in Thompson's group F; an element
of F is a list of humongous fractions. But I expect it's a problem that
often comes up in mathematical programming. I'll admit it isn't a very
important bug, since anyone who is harmed by it will either use a different
language, or use gmpy, or print in hex. But it's still a bug.
> Regarding GMP, I believe there are licensing issues: it's not legal to
> include GMP in core Python and release Python under its current non-GPL
> license, or something like that---I don't know anything about the
details.
I don't see what the problem would be. Python's LICENSE file says that
Python's license is GPL compatible. And in any case, GMP is LGPL, not GPL,
so any program can link to it.
----------------------------------------------------------------------
Comment By: Mark Dickinson (marketdickinson)
Date: 2007-07-03 10:22
Message:
Logged In: YES
user_id=703403
Originator: NO
I'd call this a feature request rather than a bug.
If I understand correctly, an O(n^(1+epsilon)) printing algorithm would
rely on having an FFT-based fast multiplication algorithm, together with
some form of divide-and-conquer division---is this right? These algorithms
are nontrivial to implement efficiently, and even then the crossover point
(where the FFT-based method becomes faster than the quadratic method) is
likely to be in the thousands of digits. So I can't imagine there's much
demand for this---even a 4096-bit RSA key is only 1233 (or 1234) digits
long. If you just want subquadratic printing (O(n^1.585) or so) then you'd
still need a subquadratic division (Python already has Karatsuba
multiplication for large integers); here I guess the crossover would be
smaller. A subquadratic division for Python might well be of interest to
the developers, if someone could be persuaded to write and test one, and
demonstrate a significant positive impact on performance.
What's your use-case for printing huge integers fast? It doesn't seem
like a very common need.
Regarding GMP, I believe there are licensing issues: it's not legal to
include GMP in core Python and release Python under its current non-GPL
license, or something like that---I don't know anything about the details.
But have you encountered Martelli's gmpy?
http://code.google.com/p/gmpy/
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746088&group_id=5470
More information about the Python-bugs-list
mailing list