timeit module for comparing the performance of two scripts
John Machin
sjmachin at lexicon.net
Tue Jul 11 18:14:50 EDT 2006
On 12/07/2006 1:33 AM, Phoe6 wrote:
> Hi,
Hi,
I'm a little astonished that anyone would worry too much (if at all!)
about how long it took to read a config file. Generally, one would
concentrate on correctness, and legibility of source code. There's not
much point IMHO in timing your pyConfig.py in its current form. Please
consider the following interspersed comments.
Also, the functionality of the two modules that you are comparing is
somewhat different ;-)
Cheers,
John
> Following are my files.
> ----
>
> pyConfig.py
> ----
> import re
>
> def pyConfig():
> fhandle = open( 'config1.txt', 'r' )
> arr = fhandle.readlines()
> fhandle.close()
>
> hash = {}
>
> for item in arr:
There is no need of readlines(). Use:
for item in fhandle:
> txt = item.strip()
str.strip() removes leading and trailing whitespace. Hence if the line
is visually empty, txt will be "" i.e. a zero-length string.
> if re.search( '^\s*$', txt ):
For a start, your regex is constrained by the ^ and $ to matching a
whole string, so you should use re.match, not re.search. Read the
section on this topic in the re manual. Note that re.search is *not*
smart enough to give up if the test at the beginning fails. Second
problem: you don't need re at all! Your regex says "whole string is
whitespace", but the strip() will have reduced that to an empty string.
All you need do is:
if not txt: # empty line
> continue
> if re.search( '^#.*$', txt ):
Similar to the above. All you need is:
if txt.startswith('#'): # line is comment only
or (faster but less explicit):
if txt[0] == '#': # line is comment only
> continue
> if not re.search( '^\s*(\w|\W)+\s*=\s*(\w|\W)+\s*$', txt ):
(1) search -> match, lose the ^
(2) lose the first and last \s* -- strip() means they are redundant.
(3) What are you trying to achieve with (\w|\W) ??? Where I come from,
"select things that are X or not X" means "select everything". So the
regex matches 'anything optional_whitespace = optional_whitespace
anything'. However 'anything' includes whitespace. You probably intended
something like 'word optional_whitespace = optional_whitespace
at_least_1_non-whitespace':
if not re.match('\w+\s*=\s*\S+.*$'):
but once you've found a non-whitespace after the =, it's pointless
continuing, so:
if not re.match('\w+\s*=\s*\S'):
> continue
Are you sure such lines can be silently ignored? Might they be errors?
> hash[txt.split( '=' )[0].strip().lower()] = txt.split( '='
> )[1].strip()
Best to avoid splitting twice. Also it's a little convoluted. Also
beware of multiple '=' in the line.
left, right = txt.split('=', 1)
key = left.strip().lower()
if key in hash:
# does this matter?
value = right.strip()
hash[key] = value
>
> print hash['dbrepository']
> ----
Oh, yeah, that hash thing, regexes everywhere ... it's left me wondering
could this possibly be a translation of a script from another language :-)
More information about the Python-list
mailing list