[Python-Dev] int(string) (was: DRAFT: python-dev Summary for 2005-09-01 through 2005-09-16)
tim.peters at gmail.com
Sat Oct 22 04:52:36 CEST 2005
> Speeding up list append calls
> A `comp.lang.python message from Tim Peters`_ prompted Neal Norwitz
> to investigate how the code that Tim posted could be sped up. He
> hacked the code to replace var.append() with the LIST_APPEND opcode,
Someone want a finite project that would _really_ help their Uncle
Timmy in his slow-motion crusade to get Python on the list of "solved
it!" languages for each problem on that magnificent site?
It turns out that many of the problems there have input encoded as
vast quantities of integers (stdin is a mass of decimal integers on
one or more lines). Most infamous for Python is this tutorial problem
(you don't get points for solving it), which is _trying_ to test
whether your language of choice can read from stdin "fast enough":
The input begins with two positive integers n k (n, k<=10**7). The
next n lines of input contain one positive integer t_i, not greater
than 10**9, each.
Write a single integer to output, denoting how many integers t_i are
divisible by k.
There's an 8-second time limit, and I believe stdin is about 8MB
(you're never allowed to see the actual input they use). They have a
slower machine than you use ;-), so it's harder than it sounds. To
date, 975 people have submitted a program that passed, but only a few
managed to do it using Python. I did, and it required every trick in
the book, including using psyco.
Turns out it's _not_ input speed that's the problem here, and not even
mainly the speed of integer mod: the bulk of the time is spent in
int(string) (and, yes, that's also far more important to the problem
Neal was looking at than list.append time). If you can even track all
the levels of C function calls that ends up invoking <wink>, you find
yourself in PyOS_strtoul(), which is a nifty all-purpose routine that
accepts inputs in bases 2 thru 36, can auto-detect base, and does
platform-independent overflow checking at the cost of a division per
digit. All those are features, but it makes for sloooow conversion.
I assume it's the overflow-checking that's the major time sink, and
it's not correct anyway: it does the check slightly differently for
base 10 than for any other base, explained only in the checkin comment
for rev 2.13, 8 years ago:
For base 10, cast unsigned long to long before testing overflow.
This prevents 4294967296 from being an acceptable way to spell zero!
So what are the odds that base 10 was the _only_ base that had a "bad
input" case for the overflow-check method used? If you thought
"slim", you were right ;-) Here are other bad cases, under all Python
versions to date (on a 32-bit box; if sizeof(long) == 8, there are
different bad cases):
int('102002022201221111211', 3) = 0
int('32244002423141', 5) = 0
int('1550104015504', 6) = 0
int('211301422354', 7) = 0
int('12068657454', 9) = 0
int('1904440554', 11) = 0
int('9ba461594', 12) = 0
int('535a79889', 13) = 0
int('2ca5b7464', 14) = 0
int('1a20dcd81', 15) = 0
int('a7ffda91', 17) = 0
int('704he7g4', 18) = 0
int('4f5aff66', 19) = 0
int('3723ai4g', 20) = 0
int('281d55i4', 21) = 0
int('1fj8b184', 22) = 0
int('1606k7ic', 23) = 0
int('mb994ag', 24) = 0
int('hek2mgl', 25) = 0
int('dnchbnm', 26) = 0
int('b28jpdm', 27) = 0
int('8pfgih4', 28) = 0
int('76beigg', 29) = 0
int('5qmcpqg', 30) = 0
int('4q0jto4', 31) = 0
int('3aokq94', 33) = 0
int('2qhxjli', 34) = 0
int('2br45qb', 35) = 0
int('1z141z4', 36) = 0
IOW, the only bases that _aren't_ "bad" are powers of 2, and 10
because it's special-cased (BTW, I'm not sure that base 10 doesn't
have a different bad case now, but don't care enough to prove it one
way or the other).
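Each of those strings is just 2**32 spelled in the given base --
exactly the smallest value that wraps a 32-bit unsigned long around to
0. Easy to confirm (a throwaway sketch; `to_base` is mine):

```python
DIGITS = '0123456789abcdefghijklmnopqrstuvwxyz'

def to_base(n, base):
    # Spell n in the given base, using int()'s own digit alphabet.
    out = ''
    while n:
        n, r = divmod(n, base)
        out = DIGITS[r] + out
    return out or '0'

# e.g. the base-36 and base-3 entries from the list above:
assert to_base(2 ** 32, 36) == '1z141z4'
assert to_base(2 ** 32, 3) == '102002022201221111211'
```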
Now fixing that is easy: the problem comes from being too clever,
doing both a multiply and an addition before checking for overflow.
Check each operation on its own and it would be bulletproof, without
special-casing. But that might be even slower (it would remove the
branch special-casing 10, but add a cheap integer addition overflow
check with its own branch).
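Sketched in Python (simulating a 32-bit unsigned long; this is my
reconstruction of the idea, not the literal C), the too-clever check
and the bulletproof per-operation version look like:

```python
M = 1 << 32  # assuming sizeof(long) == 4

def clever_check(s, base):
    # The old idea: do the multiply AND the add, then try to detect
    # overflow after the fact, at the cost of a division per digit.
    # (Base 10 got a signed-cast special case to paper over its hole.)
    result = 0
    for ch in s:
        c = int(ch, base)
        prev = result
        result = (result * base + c) % M    # C unsigned wraparound
        if (result - c) % M // base != prev:
            raise OverflowError("int too large")
    return result

def safe_check(s, base):
    # Check each operation on its own: no wraparound can slip through,
    # and no base needs special-casing.
    ULONG_MAX = M - 1
    result = 0
    for ch in s:
        c = int(ch, base)
        if result > ULONG_MAX // base:      # result * base would overflow
            raise OverflowError("int too large")
        result *= base
        if c > ULONG_MAX - result:          # result + c would overflow
            raise OverflowError("int too large")
        result += c
    return result
```

`clever_check('1z141z4', 36)` happily returns 0 (2**32 wrapped around),
while `safe_check` raises on the same input.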
The challenge (should you decide to accept it <wink>) is to replace
the overflow-checking with something both correct _and_ much faster
than doing n integer divisions for an n-character input. For example,
36**6 < 2**32-1, so whenever the input has no more than 6 digits
overflow is impossible regardless of base and regardless of platform.
That's simple and exploitable. For extra credit, make int(string) go
faster than preparing your taxes ;-)
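The digit-count trick is easy to sketch (again in Python, again
assuming a 32-bit unsigned long; the names are mine): precompute, per
base, the most digits an input can have while overflow is provably
impossible, and only pay for checking beyond that.

```python
ULONG_MAX = (1 << 32) - 1  # assuming sizeof(long) == 4

# cutoff[b]: most digits a base-b numeral can have while its largest
# possible value (b**n - 1) is still guaranteed to fit.  cutoff[36] is
# 6, since 36**6 - 1 < 2**32 - 1; cutoff[2] is 32.
cutoff = [0, 0] + [max(n for n in range(1, 64) if b ** n - 1 <= ULONG_MAX)
                   for b in range(2, 37)]

def fast_int(s, base):
    if len(s) <= cutoff[base]:
        # Short input: overflow is impossible, skip all checking.
        result = 0
        for ch in s:
            result = result * base + int(ch, base)
        return result
    # Long input: rare, so an expensive checked path is fine here
    # (leaning on Python's own bignums just for the sketch).
    result = int(s, base)
    if result > ULONG_MAX:
        raise OverflowError("int too large")
    return result
```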
BTW, Python as-is can be used to solve many (I'd bet most) of these
problems in the time limit imposed, although it may take some effort,
and it may not be possible without using psyco. A Python triumph I'm
particularly fond of:
The legend at the bottom:
Warning: large Input/Output data, be careful with certain languages
seems to be a euphemism for "don't even think about using Python" <0.9 wink>.
But there's a big difference in this one: it's a _hard_ problem,
requiring graph analysis, delicate computation, more than
double-precision accuracy (in the end), and it can hugely benefit from
preprocessing a batch of queries to plan out and minimize the number
of operations needed. Five people have solved it to date (click on
"Best Solutions"), and you'll see that my Python entry is the
second-fastest so far, beating 3 C++ entries by 3 excellent C++
programmers. I don't know what they did, but I suspect I was far more
willing to code up an effective but tedious "plan out and minimize"
phase _because_ I was using Python. I sure didn't beat them on
reading the mass quantities of integers from stdin <wink>.