Google Code Jam language usage

Bearophile bearophileHUGS at lycos.com
Tue Sep 15 09:04:00 CEST 2009


(If you see any error in what I have written here, please tell me.)

So far I've never done a Google Code Jam. There are 12 problems there
(G.C.Jam 2009), and the best solutions are 9 in C++ and 3 in C (I
think they are the best solutions, but I am not sure).
The code of all such best solutions is tidy, well formatted, and
written in a quite compact style, with little extra space, few lines
of code, and short variable names (often 1 or 2 chars long).

The C++ code by TripleM shows he/she is also a topcoder, such code is
written in an aggressive style: they show a standard header where all
data structures from C++ STL are imported, plus some handy and common
templates/functions (with vector, matrix, etc, operations). I think it
may be positive to add similar handy templates/functions/data
structures to Boost or the STL. For example I've lost count of how
many times I have seen reimplemented a vector class/struct with dot
products, etc.

Generally in such contests programs are not that similar to real-world
ones, for example inputs ranges/sizes are perfectly known. But
probably this makes such contests "better".

The plots show that Python programmers seem to produce a higher
percentage of working programs compared to C ones, curiously Pascal
programmers have an higher percentage still. As usual Haskell
programmers look smart.

In other similar contests Python often doesn't allow to produce the
best solutions because it's too much slow at runtime, even using
Psyco. For example in the Code Jam the large problems may require lot
of processing. But experience shows that even for such algorithmic
problems the choice of language isn't the most important thing. The
most important is the choice of a good algorithm, the choice of good
data structures, the use of as little dynamic memory allocations as
possible, and then only now the language becomes important for the
running time.

Problems in such contests are often designed with well defined input
ranges/sizes, and dynamic allocations are one of the slower things in
such programs. So when possible it's better to use static memory, for
example a statically allocated matrix in C.

Python encourages the programmer to use a programming style where
dynamic allocations of memory are very common, but when performance
matters it's better to reduce them to the minimun in the inner loops.

For example this is (I think) the best Python program for the "Welcome
to Code Jam" qualification problem:

# By ZIBADA
import sys
import psyco
psyco.full()
def isset(a, b): return ((a >> b) & 1) != 0
def dbg(a): sys.stderr.write(str(a))
def readint(): return int(raw_input())
def readfloat(): return float(raw_input())
def readarray(foo): return [foo(x) for x in raw_input().split()]

def doit(i, j):
	if (i, j) in mem: return mem[(i, j)]
	if (j >= len(b)): return 1
	if (i >= len(a)): return 0
	res = doit(i + 1, j)
	if (a[i] == b[j]):
		res += doit(i + 1, j + 1)
	res %= 10000
	mem[(i, j)] = res
	return res

def run_test(test):
	#dbg("Test %d\n" % (test + 1))
	global a, b, mem
	a = raw_input()
	b = "welcome to code jam"
	mem = {}

	print "Case #%d: %04d" % (test + 1, doit(0, 0))

for test in range(readint()):
	run_test(test)


It uses dynamic allocation and recursive calls. It processes the large
input in 0.56 s on my old PC.
The following is instead the starting part of (I think) the best
overall solution, that is in C (the algorithm used is a basic and well
known one, no recursions used. Run time is very fast, I can't time
it):

// By LayCurse
char a[512];
char b[32] = "welcome to code jam";
int d[512][32];
int main() {
    ...
    for (t = 1;t <= tc; t++) {
        fgets(a, 505, fp);
        memset(d, 0, sizeof(d));
...


It uses a static matrix, and the number of columns is 32 instead of 19
(= len("welcome to code jam")) so the compiler can use just a shift to
find the right row start. One memset clears the whole matrix for each
input string.

This is the same C code by LayCurse translated (with few small
improvements) to Python+Psyco, it runs in about 0.14 seconds here on
the same dataset. Using the right algorithm, right data structures,
and minimizing memory allocations, with Psyco you can often go fast
enough:


from array import array
def main():
    b = "welcome to code jam"
    len_b = len(b)
    d = [array('l', [0]) * len_b for _ in xrange(512)]
    for t in xrange(int(raw_input())):
        a = raw_input()
        for row in d: # set d to 0
            for j in xrange(len_b):
                row[j] = 0
        if a[0] == b[0]:
            d[0][0] = 1
        for i in xrange(1, len(a)):
            d[i][0] += d[i - 1][0]
            if b[0] == a[i]:
                d[i][0] += 1
            for j in xrange(1, len_b):
                d[i][j] += d[i - 1][j]
                if b[j] == a[i]:
                    d[i][j] += d[i - 1][j - 1]
                d[i][j] %= 10000
        print "Case #%d: %04d" % (t+1, d[len(a) - 1][len_b - 1])
import psyco; psyco.full()
main()

(There can be ways to speed up this Python code, I have not tried to
use a 1D matrix with shifts to find the right starting of the rows as
in C, and often in such dynamic programming algorithms you can just
keep 2 rows to avoid storing the whole dynamic matrix, this saves
memory and speed up code because there are less cache misses. Using
Psyco with array.array CPU cache misses can be felt, in normal Python
code they are almost invisible, lost in the noise).

In Python I'd like to have a built-in list&array method to assign all
items to a fixed value:

a1 = [1, 2, 3]
a2 = [array.array('l', [0]) * 5

a1.set(0)
a2.set(0)

That equals to the following, but it's faster:
for i in xrange(len(a1)):
    a1[i] = 0
for i in xrange(len(a2)):
    a2[i] = 0


I'd also like to have 64 bit signed/unsigned integers in array.array.


Such online contests usually ignore compilation times, but in
languages that have ways to perform computations at compile time (C++
with its templates, and even more in D that has better templates and
CTFE http://en.wikipedia.org/wiki/Compile_time_function_execution )
such times can become important, because some precomputations can be
moved to compile time. If compilation time counts too, then in such
contests Python will improve its stats (a slow compilation time can be
bad for the testing of the code, so it's a disadvantage for the
programmer too).

Bye,
bearophile



More information about the Python-list mailing list