Evaluate my first python script, please

Pete Emerson pemerson at gmail.com
Fri Mar 5 10:55:00 EST 2010


On Mar 5, 7:00 am, Duncan Booth <duncan.bo... at invalid.invalid> wrote:
> Jean-Michel Pichavant <jeanmic... at sequans.com> wrote:
> > And tell me how not using regexp will ensure the /etc/hosts processing
> > is correct? The non-regexp solutions provided in this thread did not
> > handle what you rightfully pointed out about host lists and commented
> > lines.
>
> It won't make it automatically correct, but I'd guess that code written
> without being so dependent on regexes might have made someone point out those
> deficiencies sooner. The point being that casual readers of the code won't
> take the time to decode the regex, they'll glance over it and assume it
> does something or other sensible.
>
> If I was writing that code, I'd read each line, strip off comments and
> leading whitespace (so you can use re.match instead of re.search), split on
> whitespace and take all but the first field. I might check that the field
> I'm ignoring is something like a numeric ip address, but if I did want to
> do that then I'd include range checking for valid octets so still no regex.
>
> The whole of that I'd wrap in a generator so what you get back is a
> sequence of host names.
>
> However that's just me. I'm not averse to regular expressions, I've written
> some real mammoths from time to time, but I do avoid them when there are
> simpler clearer alternatives.
>
> > And FYI, the OP pattern does match '192.168.200.1 (foo123)'
> > ...
> > Ok that's totally unfair :D You're right I made a mistake.  Still the
> > comment is absolutely required (provided it's correct).
>
> Yes, the comment would have been good had it been correct. I'd also go for
> a named group as that provides additional context within the regex.
>
> Also if there are several similar regular expressions in the code, or if
> they get too complex I'd build them up in parts. e.g.
>
> OCTET = r'(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])'
> ADDRESS = (OCTET + r'\.') * 3 + OCTET
> HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*'
>   # could use \S+ but my Linux manual says
>   # alphanumeric, dash and dots only
> ... and so on ...
>
> which provides another way of documenting the intentions of the regex.
>
> BTW, I'm not advocating that here, the above patterns would be overkill,
> but in more complex situations that's what I'd do.
>
> --
> Duncan Booth http://kupuguy.blogspot.com
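
For reference, here is a rough sketch of the regex-free approach Duncan describes above: strip comments, split on whitespace, range-check the octets of the first field, and yield the rest from a generator. The names hostnames and is_ipv4 are made up for illustration, and skipping any line whose first field is not a dotted quad (which includes IPv6 entries) is an assumption of this sketch, not something stated in the thread.

```python
def is_ipv4(text):
    """Return True for a dotted quad whose four octets are all 0-255."""
    parts = text.split('.')
    return len(parts) == 4 and all(
        p.isascii() and p.isdigit() and int(p) <= 255 for p in parts)


def hostnames(lines):
    """Yield every host name and alias found in hosts-file style lines."""
    for line in lines:
        # Drop anything after '#', then split the rest on whitespace.
        fields = line.split('#', 1)[0].split()
        # Assumption: skip blank lines, lines without a name, and lines
        # whose first field is not an IPv4 address (IPv6 entries too).
        if len(fields) < 2 or not is_ipv4(fields[0]):
            continue
        for name in fields[1:]:
            yield name


sample = [
    "127.0.0.1   localhost",
    "# 10.0.0.1 commented.example",
    "192.168.200.1 foo123 foo123.example.com  # trailing comment",
    "999.1.1.1 bogus",
    "",
]
print(list(hostnames(sample)))
# prints ['localhost', 'foo123', 'foo123.example.com']
```

Passing open('/etc/hosts') instead of the sample list should work the same way, since the generator only needs an iterable of lines.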

All good comments here. The takeaway for my lazy style of regexes
(which makes them harder for non-regex fiends to read, regardless of the
language) is that there are ways to make regexes much more readable to
the untrained eye. Duncan, I like your method of defining sections of
the regex outside the regex itself, even if it's for one-time use.
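
As a sketch of how those pieces might be assembled: OCTET, ADDRESS and HOSTNAME below are copied from Duncan's message, while ENTRY, parse_entry and the exact shape of the combined pattern (named groups, an optional trailing comment) are my own guesses at the "and so on" part rather than anything he posted.

```python
import re

# Building blocks as given in Duncan's message.
OCTET = r'(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])'
ADDRESS = (OCTET + r'\.') * 3 + OCTET
HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*'

# Hypothetical combined pattern: an address, one or more whitespace-separated
# names, then an optional trailing comment. Named groups document each part.
ENTRY = re.compile(
    r'\s*(?P<address>%s)(?P<names>(?:\s+%s)+)\s*(?:#.*)?$' % (ADDRESS, HOSTNAME)
)


def parse_entry(line):
    """Return (address, [names]) for a hosts-style line, or None."""
    m = ENTRY.match(line)
    if m is None:
        return None
    return m.group('address'), m.group('names').split()


print(parse_entry('192.168.200.1 foo123 foo123.example.com  # comment'))
# prints ('192.168.200.1', ['foo123', 'foo123.example.com'])
print(parse_entry('192.168.200.1 (foo123)'))
# prints None
```

Because the names must follow the address, the regex engine backtracks through the OCTET alternatives until the whole octet is consumed, so the short-first ordering does not cause trouble in this particular pattern.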


