Python too slow for real world

Christian Tismer tismer at appliedbiometrics.com
Sat Apr 24 09:58:57 EDT 1999


"Magnus L. Hetland" wrote:
> 
> Christian Tismer <tismer at appliedbiometrics.com> writes:
> 
> > Just did a little more cleanup to the code.
> > This it is:
> 
> Hm. This code is nice enough (although not very intuitive...) But
> isn't it a bit troublesome that this sort of thing (which in many ways
> is a natural application for Python) is so much simpler to implement
> (in an efficient enough way) in Perl?

Well, Python pays a price for its generality: the object
protocols, the stack machine and the name lookups all apply
even to the simplest problems, like Arne's.
This leads to non-intuitive optimization tricks like the ones
I showed, although my buffering technique applies to other
languages as well. The brain-damaging concept is running over
big, partial chunks of memory, trying to process them efficiently
without much object creation, and making sure that the parts glue
together correctly, that the last record isn't missing, and so on.
The real work is hidden somewhere in between, like a side effect.
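
A minimal sketch of this buffering idea, written for present-day
Python 3 and not the code posted earlier in this thread. The name
iter_records, the chunk size and the sample data are assumptions
made for illustration. It reads big chunks, splits them on the
delimiter, and carries the cut-off tail over so that the parts
glue together and the last record is not lost:

```python
import io


def iter_records(stream, delimiter=">", chunk_size=64 * 1024):
    """Yield delimiter-separated records from a text stream, read in big chunks."""
    tail = ""  # incomplete record carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Glue the leftover onto the new chunk before splitting.
        parts = (tail + chunk).split(delimiter)
        # The last piece may be cut off mid-record, so hold it back.
        tail = parts.pop()
        for record in parts:
            yield record
    if tail:
        # The final record has no trailing delimiter; don't lose it.
        yield tail


# Tiny chunk size on purpose, so records straddle chunk boundaries.
sample = io.StringIO(">seq1\nACGT\n>seq2\nTTGA\n")
records = [r for r in iter_records(sample, chunk_size=5) if r]
```

Most of the function is bookkeeping for the tail; the actual
per-record work would go where the records are yielded.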

> Can something be done about it? Perhaps a buffering parameter to
> fileinput? In that case, a lot of the code could be put in that
> module, as part of the standard distribution... Even so -- you would
> somehow have to be able to treat the buffers as blocks... Hm.

I think something can be done.
First, I think I can set up a framework for this class of
problems, which takes a line-oriented algorithm and spits
out such a convoluted thing that does the same.

Another thing which appears worthwhile is generalizing the
readline function. I used that in my own buffered files,
but this would be twice as fast if readline/readlines could
do this by themselves.

What I need is a variable line delimiter which can be set
as a property of a file object. In this case, I would
use ">" as the delimiter. For a fast XML scanner (which just
gets the partitioning of XML pieces right, nothing else),
I would use "<" as the delimiter, read such chunks and break
them on ">", with a little repair code for comments,
">" appearing in attributes etc.
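
A rough sketch of that XML partitioning step, assuming the whole
text is already in memory. The function name split_markup and the
("tag", ...) / ("text", ...) tuples are made up for illustration,
and the repair code for comments and for ">" inside attribute
values is deliberately left out:

```python
def split_markup(text):
    """Partition text into ("tag", ...) and ("text", ...) pieces.

    Splits on "<", then breaks each chunk on the first ">".
    No repair for comments, CDATA or ">" inside attributes.
    """
    chunks = text.split("<")
    pieces = []
    if chunks[0]:
        # Anything before the first "<" is plain text.
        pieces.append(("text", chunks[0]))
    for chunk in chunks[1:]:
        tag, found, rest = chunk.partition(">")
        if not found:
            # Stray "<" without a closing ">": keep it as text.
            pieces.append(("text", "<" + chunk))
            continue
        pieces.append(("tag", tag))
        if rest:
            pieces.append(("text", rest))
    return pieces


pieces = split_markup("<a href='x'>hello</a> tail")
```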

Conclusion:
My readline would be parameterized by a delimiter string.
I would *not* leave it attached to the returned line (like
the CR's); instead I would return the delimiter as the EOF
indicator.
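
One possible reading of this proposal, as a sketch in present-day
Python 3. The class DelimitedReader and its method readrecord are
hypothetical and not part of any file object. It assumes that
"return the delimiter as EOF indicator" means: records come back
without their delimiter, so an empty string is a legitimate empty
record, and the delimiter itself (which can never be a record) is
returned once the input is exhausted:

```python
import io


class DelimitedReader:
    """Hypothetical wrapper giving a stream a configurable record delimiter."""

    def __init__(self, stream, delimiter, chunk_size=8192):
        if not delimiter:
            raise ValueError("delimiter must be a non-empty string")
        self.stream = stream
        self.delimiter = delimiter
        self.chunk_size = chunk_size
        self._buffer = ""
        self._eof = False

    def readrecord(self):
        """Return the next record without its delimiter, or the delimiter at EOF."""
        while True:
            head, found, rest = self._buffer.partition(self.delimiter)
            if found:
                self._buffer = rest
                return head
            if self._eof:
                break
            chunk = self.stream.read(self.chunk_size)
            if chunk:
                self._buffer += chunk
            else:
                self._eof = True
        # No delimiter left: hand out the leftover text once, then signal EOF.
        if self._buffer:
            head, self._buffer = self._buffer, ""
            return head
        return self.delimiter


reader = DelimitedReader(io.StringIO("a>>b"), ">", chunk_size=3)
records = []
while True:
    record = reader.readrecord()
    if record == reader.delimiter:  # EOF indicator
        break
    records.append(record)
```

With this convention the empty record between the two ">" in the
sample is kept, and the loop still has an unambiguous stop signal.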

> (And... How about builtin regexes in P2?)

No. Noo! Please never! :-)
I really dislike their design, and they shouldn't influence
Python in any way. What I like much better is Marc Lemburg's
tagging engine, which could have been used for this problem.
One should think of a nicer interface, which allows one to
build readable, efficient tagging engines from Python code,
since at the moment, this is a little at the assembly level :-)
All in all, I'd like to express little engines in Python,
but not these ugly, undebuggable, unreadable fly-dirt strings
which they call "regexen".
But that's my private opinion, which is not meant as an attack
on anybody. I just prefer little machines which can interact
with Python directly.

ciao - chris

-- 
Christian Tismer             :^)   <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101   :    *Starship* http://starship.python.net
10553 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     we're tired of banana software - shipped green, ripens at home



