Evaluate my first python script, please

Duncan Booth duncan.booth at invalid.invalid
Fri Mar 5 10:00:07 EST 2010


Jean-Michel Pichavant <jeanmichel at sequans.com> wrote:

> And tell me how not using regexp will ensure the /etc/hosts processing
> is correct? The non-regexp solutions provided in this thread did not
> handle what you rightfully pointed out about host lists and commented
> lines.

It won't make it automatically correct, but I'd guess that writing it without 
being so dependent on regexes might have made someone point out those 
deficiencies sooner. The point is that casual readers of the code won't 
take the time to decode the regex; they'll glance over it and assume it 
does something or other sensible.

If I were writing that code, I'd read each line, strip off comments and 
leading whitespace (so you can use re.match instead of re.search), split on 
whitespace and take all but the first field. I might check that the field 
I'm ignoring is something like a numeric IP address, but if I did want to 
do that then I'd include range checking for valid octets, so still no regex.

The whole of that I'd wrap in a generator so what you get back is a 
sequence of host names.
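A minimal sketch of that approach, assuming Python 3.7+ and ordinary hosts-file lines (address first, then names, with '#' starting a comment). The helper names `looks_like_ipv4` and `hostnames` are hypothetical, and skipping lines whose first field isn't a dotted-quad IPv4 address is an assumption of this sketch (it drops IPv6 entries such as `::1`).

```python
from typing import Iterable, Iterator


def looks_like_ipv4(field: str) -> bool:
    """Return True if field is a dotted-quad IPv4 address (no regex)."""
    parts = field.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # str.isdigit() also accepts non-ASCII digits, so require ASCII too.
        if not (part.isascii() and part.isdigit()):
            return False
        # Range check each octet.
        if int(part) > 255:
            return False
    return True


def hostnames(lines: Iterable[str]) -> Iterator[str]:
    """Yield every host name from hosts-file style lines."""
    for line in lines:
        # Drop everything from '#' onwards, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        fields = line.split()
        address, names = fields[0], fields[1:]
        if not looks_like_ipv4(address):
            continue  # assumption: ignore lines not starting with IPv4
        yield from names  # all but the first field


sample = [
    "127.0.0.1   localhost\n",
    "# a comment line\n",
    "192.168.1.10  server1 server1.example.com  # trailing comment\n",
    "\n",
    "999.1.1.1   bogus\n",
]
print(list(hostnames(sample)))  # ['localhost', 'server1', 'server1.example.com']
```

With a real file you could pass the open file object directly, e.g. `with open("/etc/hosts") as f: names = list(hostnames(f))`, since iterating over a file yields its lines.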

However that's just me. I'm not averse to regular expressions, I've written 
some real mammoths from time to time, but I do avoid them when there are 
simpler clearer alternatives.

> And FYI, the OP pattern does match '192.168.200.1 (foo123)'
> ...
> Ok that's totally unfair :D You're right, I made a mistake. Still, the 
> comment is absolutely required (provided it's correct).
> 
Yes, the comment would have been good had it been correct. I'd also go for 
a named group as that provides additional context within the regex.
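For instance, a hypothetical pattern for an already-stripped hosts line might name its groups like this (the pattern and sample line are illustrative only, not the OP's):

```python
import re

# Named groups say what each part of the match is meant to be.
HOSTS_LINE = re.compile(r"(?P<address>\S+)\s+(?P<names>.+)")

m = HOSTS_LINE.match("192.168.1.10  server1 server1.example.com")
if m:
    print(m.group("address"))        # 192.168.1.10
    print(m.group("names").split())  # ['server1', 'server1.example.com']
```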

Also, if there are several similar regular expressions in the code, or if 
they get too complex, I'd build them up in parts, e.g.

# longest alternatives first, so an unanchored match can't stop at the '2' of '255'
OCTET = r'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)'
ADDRESS = (OCTET + r'\.') * 3 + OCTET
HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*'
  # could use \S+ but my Linux manual says
  # alphanumeric, dash and dots only
... and so on ...

which provides another way of documenting the intentions of the regex.
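A sketch of how such parts might be combined, assuming lines holding one IPv4 address followed by host names; `HOSTS_ENTRY` and `parse_entry` are hypothetical names. The OCTET alternatives are ordered longest-first here: with the order `\d|[1-9]\d|...`, an unanchored `re.match(ADDRESS, '10.0.0.255')` matches only `10.0.0.2`, because the first alternative that succeeds wins.

```python
import re

# Building blocks (OCTET alternatives ordered longest-first).
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)"
ADDRESS = (OCTET + r"\.") * 3 + OCTET
HOSTNAME = r"[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*"

# Composite: an address, then one or more whitespace-separated names.
HOSTS_ENTRY = re.compile(
    r"(?P<address>" + ADDRESS + r")"
    + r"(?P<names>(?:\s+" + HOSTNAME + r")+)"
)


def parse_entry(line):
    """Return (address, [names]) for a valid entry, else None."""
    m = HOSTS_ENTRY.fullmatch(line.strip())
    if m is None:
        return None
    return m.group("address"), m.group("names").split()


print(parse_entry("192.168.1.255 server1 server1.example.com"))
# ('192.168.1.255', ['server1', 'server1.example.com'])
```

`fullmatch` anchors both ends, so a line with an out-of-range octet or an underscore in a name is rejected rather than partly matched.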

BTW, I'm not advocating that here; the above patterns would be overkill, 
but in more complex situations that's what I'd do.

-- 
Duncan Booth http://kupuguy.blogspot.com
