Evaluate my first python script, please

sjdevnull at yahoo.com sjdevnull at yahoo.com
Thu Mar 4 17:55:07 EST 2010


On Mar 4, 1:39 pm, Pete Emerson <pemer... at gmail.com> wrote:
> I've written my first python program, and would love suggestions for
> improvement.
>
> I'm a perl programmer and used a perl version of this program to guide
> me. So in that sense, the python is "perlesque"
>
> This script parses /etc/hosts for hostnames, and based on terms given
> on the command line (argv), either prints the list of hostnames that
> match all the criteria, or uses ssh to connect to the host if the
> number of matches is unique.
>
> I am looking for advice along the lines of "an easier way to do this"
> or "a more python way" (I'm sure that's asking for trouble!) or
> "people commonly do this instead" or "here's a slick trick" or "oh,
> interesting, here's my version to do the same thing".
>
> I am aware that there are performance improvements and error checking
> that could be made, such as making sure the file exists and is
> readable and precompiling the regular expressions and not calculating
> how many sys.argv arguments there are more than once. I'm not hyper
> concerned with performance or idiot proofing for this particular
> script.
>
> Thanks in advance.
>
> ########################################################
> #!/usr/bin/python
>
> import sys, fileinput, re, os

'Some people, when confronted with a problem, think "I know, I’ll use
regular expressions." Now they have two problems.' — Jamie Zawinski

Seriously, regexes can be very useful but there's no need for them
here.  Simpler is usually better, and easier to understand.

> filename = '/etc/hosts'
>
> hosts = []
>
> for line in open(filename, 'r'):
>         match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)
>         if match is None or re.search('^(?:float|localhost)\.', line):
> continue
>         hostname = match.group(1)

I find this much clearer without regexes:

try:
    ip, hostname = line.strip().split(None, 1)
except IndexError:
    continue
# I think this is equivalent to your re, but I'm not sure it's what
# you actually meant...
#if line.startswith("float.") or line.startswith("localhost."):
#    continue
# I'm going with:
if hostname.startswith("float.") or hostname.startswith("localhost"):
    continue


>         count = 0
>         for arg in sys.argv[1:]:
>                 for section in hostname.split('.'):
>                         if section == arg:
>                                 count = count + 1
>                                 break
>         if count == len(sys.argv) - 1:
>                 hosts.append(hostname)

A perfect application of sets.

#initialize at program outset
args = set(sys.argv[1:])
...
hostparts = set(hostname.split("."))
if hostparts & args:
    hosts.append(hostname)


Full program:

import sys
import os
filename = '/etc/hosts'

hosts = []
args = set(sys.argv[1:])
for line in open(filename, 'r'):
    # Parse line into ip address and hostname, skipping bogus lines
    try:
        ipaddr, hostname = line.strip().split(None, 1)
    except ValueError:
        continue
    if hostname.startswith("float.") or
hostname.startswith("localhost."):
            continue

    # Add to hosts if it matches at least one argument
    hostparts = set(hostname.split("."))
    if hostparts & args:
        hosts.append(hostname)

# If there's only one match, ssh to it--otherwise print out the
matches
if len(hosts) == 1:
    os.system("ssh -A %s"%hosts[0])
else:
    for host in hosts:
        print host



More information about the Python-list mailing list