why huge speed difference btwn 1.52 and 2.1?

Tue Jun 5 04:39:45 EDT 2001

rsenior at hotmail.com (Robin Senior) wrote in
news:3b1bcf7a.12528354 at news.ccs.queensu.ca: 

> On 3 Jun 2001 08:07:56 -0700, aahz at panix.com (Aahz Maruch) wrote:
> 
>>In article <9f8fgs$ooo$1 at knot.queensu.ca>,
>>robin senior <rsenior at hotmail.com> wrote:
>>>
>>>I have a pretty simple script for processing a flat file db, running
>>>on Python 2.1; I tried running it under 1.52 for kicks, and to my
>>>surprise it ran almost 10 times as fast! Could someone let me know why
>>>2.1 would be so much slower?

Probably the regular expressions.
Why are you compiling your regular expressions every time round the loop?
Why are you using two regular expressions when one would do?
Why are you using regular expressions at all?
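
To illustrate the first question: a minimal sketch, in present-day Python,
of compiling a pattern once outside the loop instead of on every iteration.
The pattern and the names STATE_ABBREV and matching_lines are hypothetical;
the original script's expressions are not reproduced here.

```python
import re

# Hypothetical pattern: a word ending in the abbreviation "AZ".
# Compiled once, when the module is loaded, not once per input line.
STATE_ABBREV = re.compile(r"AZ\b")


def matching_lines(lines):
    """Return the lines in which the precompiled pattern matches."""
    return [line for line in lines if STATE_ABBREV.search(line)]
```

Calling matching_lines on the upper-cased input lines reuses the same
compiled object for every line, so the cost of building the pattern is paid
only once.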

As far as I can tell, your code reads a line in and then looks to see 
whether the line contains a word that ends in a state name or a state 
abbreviation. So if the line "Today waz blowy, tomorrow may be better" 
is in the input, it will be copied to the output files for Arizona and 
Wyoming. Is this correct?

I would be tempted to rewrite the code, either to not use regular 
expressions at all, or to use a single regular expression for everything. 
If you build one big regular expression that matches all states and state 
abbreviations, then you can extract the match out of the line and use what 
matched as a dictionary key to find the right filename (provided you first 
build a dictionary with both state names and abbreviations as keys mapping 
to the filenames).
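
A rough sketch of the single-expression approach, written in present-day
Python. The two-state table, the filenames, and the names STATE_FILES,
STATE_RE and filenames_for are all made up for illustration. It also assumes
that only whole words should match (a word boundary on both sides), which is
stricter than the "word that ends in" behaviour described above.

```python
import re

# Hypothetical table: both full names and abbreviations map to a filename.
STATE_FILES = {
    "ARIZONA": "arizona.txt",
    "AZ": "arizona.txt",
    "WYOMING": "wyoming.txt",
    "WY": "wyoming.txt",
}

# One alternation covering every key, built and compiled once.
# Keys are escaped and sorted longest first so full names are tried
# before shorter alternatives.
_alternatives = sorted((re.escape(key) for key in STATE_FILES),
                       key=len, reverse=True)
STATE_RE = re.compile(r"\b(" + "|".join(_alternatives) + r")\b")


def filenames_for(line):
    """Return the sorted, de-duplicated filenames for states named in line."""
    # Upper-casing the input means no case-insensitive flag is needed.
    matches = STATE_RE.findall(line.upper())
    return sorted({STATE_FILES[match] for match in matches})
```

Each matched string is used directly as the dictionary key, so there is one
pass over the line and no per-state loop. With the whole-word assumption,
"Today waz blowy" no longer selects any file.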

Oh, and you upper-cased the input, so you don't need a case-insensitive 
search.

-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?