[Tutor] script too slow

Mon Feb 24 14:34:01 2003

Paul Tremblay wrote:

>My script is running so slow that it might prove usesless, and I
>wondered if there was a way to speed it up.
>[....]
>
>    # now use the dictionaries to process the tokens
>    def process_cw(self, token, str_token, space):
>        """Change the value of the control word by determing what dictionary
>        it belongs to"""
>
>        if token_changed == '*':
>            pass
>        elif self.needed_bool.has_key(token_changed):
>            token_changed = self.needed_bool[token_changed]
>        elif self.styles_1.has_key(token_changed):
>            token_changed = self.styles_1[token_changed]
>        elif self.styles_2.has_key(token_changed):
>            token_changed = self.styles_2[token_changed]
>            num = self.divide_num(num,2)
>
>	# ect. There are around a dozen such statements
>
>
>It is this last function, the "def process_cw", that eats up all the
>clock time. If I skip over this function, then I chop around 30 seconds
>off the script.
>

There are *much* better ways to use dictionaries.  First, instead of 
using 'if self.[some dict].has_key(token):' you should probably be using 
'if token in self.[some dict]:' -- this is easier to read, and should be 
slightly faster.  

However, I suspect that there's a better algorithm for what you're 
trying to do -- though it's not terribly clear what it is that you *are* 
trying to do.  The bit of dict that you showed being initialized in your 
sample code looks like it should be used as straight substitution.  In 
that case you should just say 'token = self.xml_sub.get(token, token)'. 
 If token is in self.xml_sub, this will return the value associated with 
it.  If it's *not* there, then it'll return token itself as a default.  

If you can isolate needed actions into a set of functions, then you 
could put those functions into the dictionary along with your text values.

    self.sub_dict = {  '&' :    ('&amp;', func1), [...] }

    ################

    token, action = self.sub_dict.get(token, (token, None))
    if action:
        action()

Now, every value in the dictionary is a (string, function) pair.  If 
token is found in the dictionary, then the string is substituted for 
token and the function (if it exists) is called.  If a particular token 
doesn't require an action, then None can be used in place of the 
function and you have a straight substitution case like the previous 
example I gave.  If token is *not* in the dictionary, then it's left 
alone, and action becomes None so no function is called either.

Depending on just what you're doing with your dictionaries, you may be 
able to consolidate them into fewer (or possibly even only one) 
dictionary and simply rely on the associated function to differentiate 
items.  If the only difference between, say, your style_1 and style_2 
dictionaries is the action that you want to take if a token is found, 
then you can simply have a single style dictionary and use different 
functions.  (This can apply even if the action for one of the 
dictionaries is 'do nothing special' -- just use None as the function.) 
 The fewer dictionaries you need to search through, the faster your 
program will be.  Of course, you may not be able to do this if you need 
the segregation of dictionaries for other purposes, but even there, you 
could set up all of your dictionaries to work this way and then simply 
perform the same on each in turn:

for current_dict in [self.xml_sub, self.styles_1, self.styles_2, ...]:
    token, action = current_dict.get(token, (token, None))
    if action:
        action()

This would replace your forest of elifs, and while it might not save 
much time, it'll be a lot easier to read.  (You could possibly extend 
this to shortcircuit out of the loop if a match is found...)

>I am wondering if I should make the dictionaries a part of the class
>property rather than a property of the actual instance? 
>

Yes, if the dictionaries are static (that is, you don't change them 
during the program), then it would be sensible to make them a class 
attribute rather than an instance attribute.  This won't save you much 
time or memory, unless you're creating a lot of instances of the class, 
but it's a more appropriate place for it -- data that's the same for all 
instances of a class should be part of the class, not the instance.

>The dictionary part of the scrpt seems so slow, that I am guessing I am
>doing something wroing, that Python has to read in the dictionary each
>time it starts the function.
>

Python is rebuilding the dictionaries each time that you call 
initiate_dictionaries(), which may take some time, but if you're only 
using a single instance of ProcessTokens then it's a one-time cost.  I 
suspect that the problem is simply that you're not making full use of 
the strengths of dictionaries, though.  If you write your Python in Perl 
style (or C style, or Lisp style), then you're going to end up with slow 
and ugly code because you're not going to be using the real strengths of 
Python.

I'm also not entirely convinced that regular expressions are the best 
choice for lexing (breaking the file up into tokens).  They're almost a 
default solution in Perl, but that doesn't mean they're the most 
efficient method in Python.  I'm not completely certain what the 
requirements for that process are, though, so I can't speak any more 
specifically other than to suggest considering other options.

Hope that this helps give you a few ideas...

Jeff Shannon
Technician/Programmer
Credit International