Evaluate my first python script, please
duncan.booth at invalid.invalid
Fri Mar 5 16:00:07 CET 2010
Jean-Michel Pichavant <jeanmichel at sequans.com> wrote:
> And tell me how not using regexp will ensure the /etc/hosts processing
> is correct ? The non regexp solutions provided in this thread did not
> handled what you rightfully pointed out about host list and commented
It won't make it automatically correct, but I'd guess that code written
without being so dependent on regexes might have made someone point out those
deficiencies sooner. The point being that casual readers of the code won't
take the time to decode the regex, they'll glance over it and assume it
does something or other sensible.
If I was writing that code, I'd read each line, strip off comments and
leading whitespace (so you can use re.match instead of re.search), split on
whitespace and take all but the first field. I might check that the field
I'm ignoring is something like a numeric ip address, but if I did want to
do that then I'd include range checking for valid octets so still no regex.
The whole of that I'd wrap in a generator so what you get back is a
sequence of host names.
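
A minimal sketch of that approach, for illustration only. The names
hostnames() and is_ipv4() are made up for this example, and skipping any
line whose first field is not a dotted-quad IPv4 address (which also drops
IPv6 entries) is an assumption of the sketch, not something the thread
settled on:

```python
def is_ipv4(text):
    """Return True if text looks like a dotted quad with octets in 0..255."""
    parts = text.split('.')
    return len(parts) == 4 and all(
        p.isascii() and p.isdigit() and int(p) <= 255 for p in parts)


def hostnames(lines):
    """Yield every host name and alias found in /etc/hosts-style lines."""
    for line in lines:
        # Strip off any comment, then leading and trailing whitespace.
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        # First field is the address, all remaining fields are names.
        fields = line.split()
        address, names = fields[0], fields[1:]
        if not is_ipv4(address):
            continue  # assumption: silently skip anything that is not IPv4
        for name in names:
            yield name
```

Since an open file iterates over its lines, it could be used as
list(hostnames(open('/etc/hosts'))).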
However that's just me. I'm not averse to regular expressions, I've written
some real mammoths from time to time, but I do avoid them when there are
simpler clearer alternatives.
> And FYI, the OP pattern does match '192.168.200.1 (foo123)'
> Ok that's totally unfair :D You're right I made a mistake. Still the
> comment is absolutely required (provided it's correct).
Yes, the comment would have been good had it been correct. I'd also go for
a named group as that provides additional context within the regex.
Also if there are several similar regular expressions in the code, or if
they get too complex I'd build them up in parts. e.g.
OCTET = r'(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])'
ADDRESS = (OCTET + r'\.') * 3 + OCTET
HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*'
# could use \S+ but my Linux manual says
# alphanumeric, dash and dots only
... and so on ...
which provides another way of documenting the intentions of the regex.
BTW, I'm not advocating that here; the above patterns would be overkill,
but in more complex situations that's what I'd do.
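
For illustration, here is one way the parts might be assembled into a
pattern for a whole line, using named groups. HOSTS_LINE and parse_line()
are hypothetical names for this sketch, and the exact line layout (an
address, one or more names, an optional trailing comment) is an assumption:

```python
import re

OCTET = r'(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])'
ADDRESS = (OCTET + r'\.') * 3 + OCTET
HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*'

# Hypothetical full-line pattern built from the parts above; the named
# groups say what each piece of the match is meant to be.
HOSTS_LINE = re.compile(
    r'\s*(?P<address>' + ADDRESS + r')'
    r'\s+(?P<names>' + HOSTNAME + r'(?:\s+' + HOSTNAME + r')*)'
    r'\s*(?:#.*)?$')


def parse_line(line):
    """Return (address, [names]) for a matching line, otherwise None."""
    m = HOSTS_LINE.match(line)
    if m is None:
        return None
    return m.group('address'), m.group('names').split()
```

With this pattern a line such as '192.168.200.1 (foo123)' is rejected,
because '(' is not allowed in HOSTNAME.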
Duncan Booth http://kupuguy.blogspot.com