Evaluate my first python script, please

MRAB python at mrabarnett.plus.com
Thu Mar 4 14:30:55 EST 2010


Pete Emerson wrote:
> I've written my first python program, and would love suggestions for
> improvement.
> 
> I'm a perl programmer and used a perl version of this program to guide
> me. So in that sense, the python is "perlesque"
> 
> This script parses /etc/hosts for hostnames, and based on terms given
> on the command line (argv), either prints the list of hostnames that
> match all the criteria, or uses ssh to connect to the host if the
> number of matches is unique.
> 
> I am looking for advice along the lines of "an easier way to do this"
> or "a more python way" (I'm sure that's asking for trouble!) or
> "people commonly do this instead" or "here's a slick trick" or "oh,
> interesting, here's my version to do the same thing".
> 
> I am aware that there are performance improvements and error checking
> that could be made, such as making sure the file exists and is
> readable and precompiling the regular expressions and not calculating
> how many sys.argv arguments there are more than once. I'm not hyper
> concerned with performance or idiot proofing for this particular
> script.
> 
> Thanks in advance.
> 
> ########################################################
> #!/usr/bin/python
> 
> import sys, fileinput, re, os
> 
> filename = '/etc/hosts'
> 
> hosts = []
> 
> for line in open(filename, 'r'):
> 	match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)

Use 'raw' strings for regular expressions.

'Normal' Python string literals use backslashes in escape sequences (eg,
'\n'), but so do regular expressions, and regular expressions are passed
to the 're' module in strings, so that can lead to a profusion of
backslashes!

You could either escape the backslashes:

  	match = re.search('\\d+\\.\\d+\\.\\d+\\.\\d+\s+(\\S+)', line)

or use a raw string:

  	match = re.search(r'\d+\.\d+\.\d+\.\d+\s+(\S+)', line)

A raw string literal is just like a normal string literal, except that
backslashes aren't special.

> 	if match is None or re.search('^(?:float|localhost)\.', line):
> continue

It would be more 'Pythonic' to say "not match".

	if not match or re.search(r'^(?:float|localhost)\.', line):

> 	hostname = match.group(1)
> 	count = 0
> 	for arg in sys.argv[1:]:
> 		for section in hostname.split('.'):
> 			if section == arg:
> 				count = count + 1

Shorter is:

				count += 1

> 				break
> 	if count == len(sys.argv) - 1:
> 		hosts.append(hostname)
> 
> if len(hosts) == 1:
> 	os.system("ssh -A " + hosts[0])
> else:
> 	print '\n'.join(hosts)

You're splitting the hostname repeatedly. You could split it just once,
outside the outer loop, and maybe make it into a set. You could then
have:

         host_parts = set(hostname.split('.'))
	count = 0
	for arg in sys.argv[1:]:
		if arg in host_parts:
			count += 1

Incidentally, the re module contains a cache, so the regular expressions
won't be recompiled every time (unless you have a lot of them and the re
module flushes the cache).



More information about the Python-list mailing list