[Tutor] script too slow
Jeff Shannon
jeff@ccvcorp.com
Mon Feb 24 14:34:01 2003
Paul Tremblay wrote:
>My script is running so slow that it might prove usesless, and I
>wondered if there was a way to speed it up.
>[....]
>
> # now use the dictionaries to process the tokens
> def process_cw(self, token, str_token, space):
> """Change the value of the control word by determing what dictionary
> it belongs to"""
>
> if token_changed == '*':
> pass
> elif self.needed_bool.has_key(token_changed):
> token_changed = self.needed_bool[token_changed]
> elif self.styles_1.has_key(token_changed):
> token_changed = self.styles_1[token_changed]
> elif self.styles_2.has_key(token_changed):
> token_changed = self.styles_2[token_changed]
> num = self.divide_num(num,2)
>
> # ect. There are around a dozen such statements
>
>
>It is this last function, the "def process_cw", that eats up all the
>clock time. If I skip over this function, then I chop around 30 seconds
>off the script.
>
There are *much* better ways to use dictionaries. First, instead of
using 'if self.[some dict].has_key(token):' you should probably be using
'if token in self.[some dict]:' -- this is easier to read, and should be
slightly faster.
However, I suspect that there's a better algorithm for what you're
trying to do -- though it's not terribly clear what it is that you *are*
trying to do. The bit of dict that you showed being initialized in your
sample code looks like it should be used as straight substitution. In
that case you should just say 'token = self.xml_sub.get(token, token)'.
If token is in self.xml_sub, this will return the value associated with
it. If it's *not* there, then it'll return token itself as a default.
If you can isolate needed actions into a set of functions, then you
could put those functions into the dictionary along with your text values.
self.sub_dict = { '&' : ('&', func1), [...] }
################
token, action = self.sub_dict.get(token, (token, None))
if action:
action()
Now, every value in the dictionary is a (string, function) pair. If
token is found in the dictionary, then the string is substituted for
token and the function (if it exists) is called. If a particular token
doesn't require an action, then None can be used in place of the
function and you have a straight substitution case like the previous
example I gave. If token is *not* in the dictionary, then it's left
alone, and action becomes None so no function is called either.
Depending on just what you're doing with your dictionaries, you may be
able to consolidate them into fewer (or possibly even only one)
dictionary and simply rely on the associated function to differentiate
items. If the only difference between, say, your style_1 and style_2
dictionaries is the action that you want to take if a token is found,
then you can simply have a single style dictionary and use different
functions. (This can apply even if the action for one of the
dictionaries is 'do nothing special' -- just use None as the function.)
The fewer dictionaries you need to search through, the faster your
program will be. Of course, you may not be able to do this if you need
the segregation of dictionaries for other purposes, but even there, you
could set up all of your dictionaries to work this way and then simply
perform the same on each in turn:
for current_dict in [self.xml_sub, self.styles_1, self.styles_2, ...]:
token, action = current_dict.get(token, (token, None))
if action:
action()
This would replace your forest of elifs, and while it might not save
much time, it'll be a lot easier to read. (You could possibly extend
this to shortcircuit out of the loop if a match is found...)
>I am wondering if I should make the dictionaries a part of the class
>property rather than a property of the actual instance?
>
Yes, if the dictionaries are static (that is, you don't change them
during the program), then it would be sensible to make them a class
attribute rather than an instance attribute. This won't save you much
time or memory, unless you're creating a lot of instances of the class,
but it's a more appropriate place for it -- data that's the same for all
instances of a class should be part of the class, not the instance.
>The dictionary part of the scrpt seems so slow, that I am guessing I am
>doing something wroing, that Python has to read in the dictionary each
>time it starts the function.
>
Python is rebuilding the dictionaries each time that you call
initiate_dictionaries(), which may take some time, but if you're only
using a single instance of ProcessTokens then it's a one-time cost. I
suspect that the problem is simply that you're not making full use of
the strengths of dictionaries, though. If you write your Python in Perl
style (or C style, or Lisp style), then you're going to end up with slow
and ugly code because you're not going to be using the real strengths of
Python.
I'm also not entirely convinced that regular expressions are the best
choice for lexing (breaking the file up into tokens). They're almost a
default solution in Perl, but that doesn't mean they're the most
efficient method in Python. I'm not completely certain what the
requirements for that process are, though, so I can't speak any more
specifically other than to suggest considering other options.
Hope that this helps give you a few ideas...
Jeff Shannon
Technician/Programmer
Credit International