CONTEST: PyWin crashes with pure Python code!

Christian Tismer tismer at appliedbiometrics.com
Fri Sep 3 10:07:38 EDT 1999


Hi folks,

I'm starting a contest to solve

   *** A SERIOUS PROBLEM ***

There seems to be an old bug in one of the win32 modules which
PythonWin uses. Please solve it and get a free web site
(or visit me and get invited if you prefer :).

I want to encourage everybody to work on this, since in
principle this bug could invalidate any output of a
PythonWin program. Some stray pointer somewhere corrupts
the heap structures at some point.

Where is the error?
-------------------
I don't even know whether a win32 module is guilty or whether
it is a hidden native Python bug which just doesn't show
up without PythonWin. Anyway, the fact that PythonWin
exposes the bug suggests that it is caused by a module in
the win32 extensions which PythonWin makes use of. If I am
wrong about this, I apologize in advance. Also my best to
Mark Hammond, who knows about this bug's existence but,
like me, has not been able to find it yet.

How to produce the error?
-------------------------

If you run the attached script on a large text file (my XML
file is 28 MB), it will crash under PyWin but run fine under
the plain Python shell.

Some weeks ago, I saw something similar with large Excel files,
and I thought there was a bug in the COM code.

Now that I see the same effect without any reference to COM,
I believe the bug is somewhere else. *Some*thing is trampling
on the small-block heap structures, and this shows up when my
program uses a lot of memory. It seems to be related to the
allocation of many small structures, like tuples, dict keys
and so on.

This error occurs both with standard PythonWin and with
PyWin 2.b2, which both depend on win32all-125. There
must be a nasty old bug, believe me.

If you are unable to reproduce the error yourself, drop
me a note, and I will give you the URL of a big file.
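
Alternatively, a big input can be synthesized. The sketch below
writes roughly 28 MB of whitespace-separated, XML-like tokens;
make_big_file and its token mix are hypothetical, and random
tokens may not stress the heap the same way a real XML file does.

```python
import random


def make_big_file(path, target_bytes=28 * 1024 * 1024, seed=0):
    """Write at least target_bytes of whitespace-separated tokens to path.

    Hypothetical helper: the tokens are random XML-ish tags and words,
    not real data. Returns the number of bytes written.
    """
    rng = random.Random(seed)  # fixed seed, so the file is reproducible
    tags = ["<item>", "</item>", "<name>", "</name>", "<value>", "</value>"]
    written = 0
    # newline="\n" keeps the on-disk size equal to the count below,
    # even on Windows (no "\r\n" translation).
    with open(path, "w", newline="\n") as f:
        while written < target_bytes:
            words = [rng.choice(tags) if rng.random() < 0.3
                     else "w%d" % rng.randrange(200_000)
                     for _ in range(12)]
            line = " ".join(words) + "\n"
            f.write(line)
            written += len(line)  # all ASCII, so chars == bytes
    return written
```

For example, call make_big_file("d:/tmp/big.txt") and then run the
attached script's test() on that path (the path is only an example).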


CONTEST
-------

Whoever completely solves this problem first will get a free
top-level domain and plenty of web space on starship.skyport.net
from me. Be warned: this is hard to find and will cost
you many hours of debugging. The patch should make my
attached script run under PythonWin, and it should document
what the error was and how it was found.

ciao - chris

p.s.: I originally sent this message to the win32all list, but I
think this has to be solved ASAP. The main list gives me a larger
audience, and I cannot solve it alone.

-- 
Christian Tismer             :^)   <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101   :    *Starship* http://starship.python.net
10553 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     we're tired of banana software - shipped green, ripens at home
-------------- next part --------------
"""
PythonWin crashes in the small block heap.
Usage:
Take a very big file, read it with readlines(),
and insert every token into the code_finder.
Under PythonWin it will crash. My file was 28 MB.
"""

class code_finder:
    def __init__(self):
        self.codes = {}
    def add_token(self, codestr, weight=1):
        oldweight = self.codes.get(codestr, 0)
        self.codes[codestr] = oldweight + weight
    def token_list(self):
        res = []
        for key, weight in self.codes.items():
            res.append( (weight, key) )
        res.sort()
        return res
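    def calculate(self):
        # huff_merge, huff_traverse and huff_table are not part of this
        # attachment; test() never calls calculate(), so the crash
        # does not depend on them.
        tree = huff_merge(self.token_list())
        return huff_table(huff_traverse(tree))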

def test(bigfile = "d:/tmp/praepalloid.xml"):
    print "reading data"
    lines = open(bigfile).readlines()
    print "inserting tokens"
    import string, sys
    cf = code_finder()
    n = 0
    for line in lines:
        for tok in string.split(line):
            cf.add_token(tok)
        n = n+1
        if n % 100 == 0:
            print n,
            sys.stdout.flush()
            if n % 2500 == 0: print

if __name__ == "__main__":
    print "if on PyWin, I will crash"
    print "but with pure Python, I will run"
    test()

    

