[Tutor] script too slow

Sun Feb 23 18:38:06 2003

On Sun, Feb 23, 2003 at 10:52:53AM -0800, Sean 'Shaleh' Perry wrote:
> On Sunday 23 February 2003 09:57, Paul Tremblay wrote:
> >
> > I have completed two parts of the script. The first part uses regular
> > expressions to break each line into tokens.
> >
> > perl => 20 seconds
> > python => 45 seconds
> >
> > Not surprisingly, python ran slower than perl, which is designed around
> > regular expressions. However, the next part proved very disappointing to
> > me. This part reads each token, and determines if it is in a dictionary,
> > and takes action if it is.
> >
> 
> if you precompile the regex the two often come much closer.  Especially if you 
> use the same ones over and over.
> 
> is_entity = re.compile(r'&\w+;')
> 
> if is_entity.search(input):
>   handle_entity(input)

Right. When I create my instance, I compile all my expressions:

	def __init__(self,file):
		self.compile_reg_exp()

	def compile_reg_exp(self):
		self.__split_exp = re.compile(r"(\\\\|\\{|\\}|...ect))

	def get_tokens(self, my_string):
		tokens = re.split(self.__split_exp, my_string)
		return tokens

The difference in speed between python and perl with regular expressions
doesn't suprise or disappoint me. After all, perl is faster than even
C++ when it comes to regular expressions. However, this difference in
reading from a dictionary, versus reading from an hash in perl, is
really baffling. 

Okay, I just checked this website out, which compares how long different
computer languages take for differnet tasks:

http://www.bagley.org/~doug/shootout/bench/hash/

According to this benchmark, Python is actually a bit faster than perl
with associative arrays (dictionary in python, hash in perl). So there
is something strange going on in my code.

>Perhaps you could reorder the if statements so that the most commonly hit
>cases are the first ones checked.  If you average 6 compares and can reduce
>that to 2 you should see a decent improvement.
>
>Seeing the perl code may also help us see why the run time is so different.

I'll try your first suggestion. Also, I wonder if I change this line
from :

elif self.needed_bool.has_key(token_changed):
            token_changed = self.needed_bool[token_changed]

to:

elif self.needed_bool.get(token_changed):
            token_changed = self.needed_bool[token_changed]

The second line only performs one lookup, whereas the first has to
perform two.

In addition, I wonder if I should use lists in some way. For example, I
could create one giant list.

allowed = ['\par', '\pard', etc]

if token in allowed:
	# check the dictionaries

That way, the dictionaries are checked only for around half of the
tokens. However, that means if the token is in allowed, there has to be
two lookups--one for the list, and one for the dictionary. I know that
in perl, lists (called arrays in perl) are much faster than hashses.

Here is the perl code:

if ($token_changed eq "*"){
		#do nothing
	}
	elsif ($token_changed eq ":"){
		#do nothing
	}
	elsif(exists $styles_bool{$token_changed}){
		$token_changed = $styles_bool{$token_changed};
		if ($num eq "0"){
			$num = "false";
			$second_field = "sf"; #style is false
		}
	}
	elsif(exists $needed_bool{$token_changed}){
		$token_changed = $needed_bool{$token_changed};
		#could be inline, but might not be
	}

Thanks for the help

Paul

-- 

************************
*Paul Tremblay         *
*phthenry@earthlink.net*
************************