Python hits the spot

Sat Jun 22 13:53:26 EDT 2002

On Sat, 22 Jun 2002 15:59:10 +0200, Siegfried Gonzi <siegfried.gonzi at kfunigraz.ac.at> wrote:

>A few weeks ago I complaint here that Python on Windows XP sucks. I had
>a problem which would took theoretically about 1.5 hours; but in reality
>it takes 6 hours on Windows XP and Python 2.2. After that 6 hours I must
>reboot my system in order to work on.
If you have the funds, and Tim is available and willing, perhaps you could
hire him to solve the problem. If the problem is a bug in Python, it would
be great to get it fixed ;-)

If the problem is in the way your code uses Python, e.g., consuming memory
resources and not releasing them for reuse, it would be good to identify
the pattern as a Python programming pitfall. If it is a problem with your
use of SWIG, or F2PY, likewise it would be good to know what the real problem is.
Also you could get your program fixed, and the FUD from this unresolved
problem could be dispelled.

If somehow your program has exposed a problem shared by Windows and Linux,
that would be interesting too ;-)

>
>In the following I got the advice to install a real user system on my
>laptop. Okay, I bought Linux SuSE 8.0 and installed it on my laptop (256
>MB RAM, 1000 MHz Celeron). I also installed Python 2.2 (the ActivePython
>version, otherwise I couldn't install idle due to the annoying
>"free_software_garbage_2_weeks_installation_nightmare").
>
>SWIG helped me to make my external Python C function (with gcc); and
>F2PY helped me to create my external Python Fortran 90 function (with
>the Intel Fortran 95 compiler).
Did you run individual tests on these? Are you mixing support libraries
and compiled code from different environments??
>
>The directories and all the other programs and data are the same as on
>Windows.
>
>
>1. After starting the process from within idle the complete system
>freezes after the first wavelength (after about 20 minutes). I couldn't
>kill a process or open a different session (via F2) in order to kill
>python. The last resort: turn off the power supply.
What did you determine about why it "froze"??
>
>2. I thought it is better to start the process from the command line and
>in a X terminal: python forcing.py
>The same behavior as aforementioned: last resort: power supply. That is
>really good and fine because a Linux system really is greedy for "shut
>down immediately".
If there were a problem with your code, why wouldn't it move to the
new context? ISTM you have decided the problem is in the context, and
are desperately trying to find a context that will make your code work,
but what if the problem is in the code?

>
>3. I thought it has to do with Gnome and I logged in to a KDE session.
>Same behavior as in point 1. and 2.
Sounds like looking for another part of the context to be at fault.
(Do you actually need a GUI, BTW?)
>
>
>My conclusion: I will rewrite my simulation in Fortran 90 and will
>abadon Python. Python is maybe good for some small scripts but not
>appropiate for serious programming.
Rewriting in Fortran may solve your problem, because it will change
your code, not just the context.
>
>Thank you Billy Boy for a forgiving user system Windows XP which in turn
>relieves Python's "wreak havoc" behavior.
?? ISTM you are pointing fingers, but you have not yet really diagnosed
the problem (I may have missed some posts, but I would think you'd mention
in this post if you knew what the real cause of the problem was).

>
>I cannot believe that Linux is responsible for Python's ill behavior
>under Linux. Before you start ranting about note the following: In a
No, I doubt whether the context is responsible, especially since you
have tried two. But I suspect it's not Python per se either.

>first step it has nothing to do with my programming style (admittedly
>not very good and famous) it has to do that the same calculation cannot
>be performed on Linux and Python 2.2!
The "same calculation" is evidently not "the same." ;-)

Have you never chased a bug that only shows up in secondary effects?
In C code, the most common way for that to happen is storing something
through a bad pointer, but instead of getting a memory fault, it corrupts
some data structure or data value. The value could be used 20 minutes later
and cause a GPF then, or, e.g., it could be data that's part of a calculation,
but maybe it's only overwriting the least significant half of a floating
point number, so you get the effect of systematic bias or random error injected
into your calculation, but nothing else. If you don't look for explanations
of last-decimal errors, this kind of thing can persist a long time. OTOH,
you could be corrupting numbers that are used in an iterative calculation
that mysteriously starts to take longer to converge, and does so in a systematically
increasing way, because the corrupting data varies systematically with time.
Or you could be corrupting data structures and sabotaging memory management
without causing immediate GPF, etc. etc.
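
To get a feel for how quiet that kind of corruption can be, here is a minimal sketch in modern Python 3 (not the 2.2 of this thread). It only simulates what a stray C store might do: the helper name clobber_low_half and the garbage value are made up for illustration, and it assumes 64-bit IEEE doubles.

```python
import struct

def clobber_low_half(x, garbage=0xDEADBEEF):
    """Simulate a stray 32-bit store landing on the low half of a double.

    Hypothetical helper: keeps the high 32 bits (sign, exponent, top of
    the mantissa) and replaces the low 32 mantissa bits with garbage.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    bits = (bits & 0xFFFFFFFF00000000) | (garbage & 0xFFFFFFFF)
    (y,) = struct.unpack("<d", struct.pack("<Q", bits))
    return y

x = 1234.56789
y = clobber_low_half(x)
rel_err = abs(y - x) / abs(x)

# The value still looks plausible; only the trailing digits are wrong.
print(repr(x), repr(y), rel_err)
```

The relative error stays below 2**-20 (about one part in a million), so nothing crashes and nothing looks obviously wrong unless you compare against a trusted reference.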

The classic thing then is to compile a debug version, and have it run just fine,
because memory layout is different, and the bad pointer usage just throws a value
into an unused space (e.g., space at the end of an input line buffer that is never
used with max length lines). (Combine this with multi threading and multiprocessing,
and you may have a _really_ challenging debugging problem ;-)

If you move from Windows to Linux, the chances are something is laid out differently ;-)
That's just an example. There are lots of ways to generate secondary-symptom bugs.

Another possibility that could lead to bogging down is a sensitive iterative algorithm.
If your problem is running in a very sensitive region, just the differences in a few
least significant bits returned by different library routines for the same function
can do it. Or differences due to different order of evaluation between different compilers.
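
A small sketch of the order-of-evaluation point, in modern Python 3. The chaotic map used to amplify the difference is a textbook toy (the logistic map), chosen only as an assumption for illustration; it is not meant to resemble the actual simulation.

```python
# Floating-point addition is not associative, so two compilers that
# group the same sum differently can disagree in the last bit.
a, b, c = 0.1, 0.2, 0.3
left_to_right = (a + b) + c
right_to_left = a + (b + c)
print(repr(left_to_right), repr(right_to_left))
print("identical:", left_to_right == right_to_left)

def logistic(x, steps):
    """Iterate the logistic map x -> 4x(1-x), a toy sensitive iteration."""
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
    return x

# Feed both sums into the same sensitive iteration and compare.
for steps in (10, 30, 60):
    print(steps, logistic(left_to_right, steps), logistic(right_to_left, steps))
```

In a well-conditioned calculation a last-bit difference stays in the last bit. In a sensitive one it grows with every pass, which is one way "the same calculation" on two platforms stops being the same.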

Your result could be bouncing around chaotically and only satisfying convergence
criteria with some systematically decreasing probability, instead of deterministically
converging. Hopefully you would know if you are operating on the hairy edge, but such things
can be hidden in the way an algorithm is implemented. If C is involved, float instead of double
somewhere, in conjunction with some unhappy choice of coordinate system origins, could
easily cause problems.  BTW, if you have iterative algorithms, have you printed out iteration
counts, for starters? Anything surprising there?
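
For the iteration counts, something along these lines would do as a start. It is a sketch in modern Python 3 under obvious assumptions: solve_with_count and newton_sqrt2 are hypothetical stand-ins, since I haven't seen the real iteration or its convergence test.

```python
def solve_with_count(step, x0, tol=1e-12, max_iter=1000):
    """Run a fixed-point iteration; report the result AND how long it took.

    Returns (value, iterations_used, converged_flag) so the caller can
    log the count instead of silently spinning.
    """
    x = x0
    for n in range(1, max_iter + 1):
        x_new = step(x)
        if abs(x_new - x) <= tol * max(1.0, abs(x_new)):
            return x_new, n, True
        x = x_new
    return x, max_iter, False

def newton_sqrt2(x):
    """Stand-in iteration: one Newton step toward sqrt(2)."""
    return 0.5 * (x + 2.0 / x)

root, iterations, converged = solve_with_count(newton_sqrt2, 1.0)
print("root=%r iterations=%d converged=%s" % (root, iterations, converged))
```

If the count creeps up from one wavelength to the next, or the converged flag starts coming back false, that points at the numerics rather than at Python or the OS.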

Do you have some _exact_ test problems that you've used for unit tests? _ANY_ deviation from
_exactly_ correct results needs an explanation.

What happens if you systematically stub out major routines? Can the program run
to completion without slowing down? Can you identify the routine that is causing
the problem (or inducing Python to cause the problem)? How about just timing various
parts, to see which are constant and which are bogging down?
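
A rough sketch of the timing idea, again in modern Python 3 (time.perf_counter did not exist in 2.2). The names timed_steps and fake_wavelength_step are invented for illustration; the real per-wavelength routine would go where the fake one is, and a do-nothing stub can be passed in the same way to see whether the slowdown disappears.

```python
import time

def timed_steps(step, inputs):
    """Call step(item) for each input and record wall-clock time per call."""
    timings = []
    for item in inputs:
        start = time.perf_counter()
        step(item)
        timings.append((item, time.perf_counter() - start))
    return timings

def fake_wavelength_step(wavelength):
    """Placeholder workload standing in for one wavelength of the real run."""
    return sum(i * wavelength for i in range(10000))

results = timed_steps(fake_wavelength_step, [0.4, 0.5, 0.6])
for wavelength, seconds in results:
    print("wavelength=%s took %.6f s" % (wavelength, seconds))
```

Steps that should cost the same but take steadily longer are the ones to look at first.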

You apparently have a lot of different stuff glued together. Have you tested the
separate pieces separately? Do you have some test problems that you can use to validate
numerical results *exactly*? (Trailing decimal differences need an explanation. They
can be benign/acceptable or not.)
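
One way to make "exactly" concrete is to count how many representable doubles separate a computed result from the reference. This is a sketch in modern Python 3 with hypothetical helper names (ulp_distance, check), and it assumes finite values of the same sign stored as 64-bit IEEE doubles.

```python
import struct

def ulp_distance(a, b):
    """Count representable doubles between a and b.

    Assumes both are finite and have the same sign, so their bit
    patterns order the same way as the values themselves.
    """
    (ia,) = struct.unpack("<q", struct.pack("<d", a))
    (ib,) = struct.unpack("<q", struct.pack("<d", b))
    return abs(ia - ib)

def check(name, computed, expected):
    """Print an exact comparison and return the distance in ULPs."""
    d = ulp_distance(computed, expected)
    status = "exact" if d == 0 else "off by %d ULP" % d
    print("%s: %r vs %r -> %s" % (name, computed, expected, status))
    return d

check("identity", 1.0, 1.0)
check("sum", 0.1 + 0.2, 0.3)
```

A one-ULP difference like the second check is usually benign rounding. A difference of thousands of ULPs between the Windows and Linux runs of the same test problem would be worth tracking down.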

I hate to see Python (or anything) bashed without proof. So far I haven't seen the proof,
nor an explanation.

If your code is open source, maybe someone will be interested in discovering why
it bogs Python down (or Python bogs it down ;-)

Regards,
Bengt Richter