Python vs. C/C++/Java: quantitative data?

GerritM gmuller at worldonline.nl
Wed Mar 20 19:19:51 CET 2002


<..snip...>
> > For comparison of utilization level still other types of programs are
> > needed. In this type of comparison I expect even more discussion on the
> > "rules"; Are you allowed to use the plain language only, or also the
> > batteries which are included or everything on CPAN like archives; Are
> > packages which are used included in the line count; etcetera.
>
> I agree. Even things like a strong, helpful community for support are
> important for being productive. Python certainly has that. But I'm not
> sure how to measure that fairly :-)
>
> cheers,
> doug
> --
> http://www.bagley.org/~doug/contact.shtml

In a recent thread (see below), the wc utility was shown in both C and
Python. Such utilities are relatively well defined and one step larger than
the current shootout programs, so they might be useful for benchmarking
purposes.
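
As a rough illustration of how such a utility could be timed, here is a
minimal sketch in Python 3. The function names wc_counts and time_wc and the
synthetic input are assumptions made for this example; they come from neither
the shootout nor the thread below.

```python
import timeit


def wc_counts(text):
    """Return (lines, words, chars) for a string, roughly as wc would."""
    return text.count("\n"), len(text.split()), len(text)


def time_wc(text, repeat=3, number=10):
    """Return the best per-call time in seconds over several runs."""
    timings = timeit.repeat(lambda: wc_counts(text), repeat=repeat, number=number)
    # The minimum is the run least disturbed by other system activity.
    return min(timings) / number


if __name__ == "__main__":
    sample = "the quick brown fox\n" * 10000  # synthetic input
    print(wc_counts(sample))
    print("%.6f s per call" % time_wc(sample))
```

Timing in-memory text keeps disk I/O out of the measurement; a fair
comparison against a C implementation would have to fix that rule too.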

regards Gerrit

www.extra.research.philips.com/natlab/sysarch

---begin included message---
Re: Book Draft Available: Text Processing in Python
From: jimd at vega.starshine.org (Jim Dennis)
In article <mailman.1015993462.32701.python-list at python.org>,
 David Mertz, Ph.D. wrote:

> Pythonistas:

> As some folks know, I am working on a book called _Text Processing in
> Python_.  With the gracious permission of my publisher, Addison Wesley,
> I am hereby making in-progress drafts of the book available to the
> Python community.  It's about half done right now, but that will
> increase over time.

> Take a look at the book URL: http://gnosis.cx/TPiP/

> I welcome any comments or feedback the Python community has about my
> book.

> Yours, David...

 I was glancing through it and stopped when I read your word
 counter (with no support for the command line options).  I just
 had to do one to emulate the GNU wc utility as closely as I can
 in one quick session.

 Below is a somewhat more faithful rendering of the GNU wc command.
 Although it's about 120 lines long, almost forty of those are
 blank lines, docstrings, or comments.  In most cases it gives output
 that is identical to GNU wc (including the character spacing).
 The only discrepancies I've seen are in the -L (--max-line-length)
 calculation (particularly on binary files).
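
 One plausible source of that discrepancy, going by the current GNU
 coreutils documentation rather than the textutils release discussed
 here, is that -L reports a display width: tabs advance to the next
 multiple of eight columns and non-printable characters count as zero.
 A minimal Python 3 sketch of that rule follows; the name display_width
 is hypothetical, and wide characters are ignored.

```python
def display_width(line, tabstop=8):
    """Approximate display width of one line, expanding tabs to tab stops."""
    width = 0
    for ch in line.rstrip("\n"):
        if ch == "\t":
            # Advance to the next multiple of tabstop.
            width += tabstop - (width % tabstop)
        elif ch.isprintable():
            width += 1
        # Non-printable characters contribute no width in this sketch.
    return width
```

 For "a\tb\n" this gives 9, whereas len(line) - 1 gives 3.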

 Its pedagogical value is more in the use of the getopt module
 and possibly in file iteration (for line in file: ...).  The text
 processing being done here is trivial.  There's also a little bit
 of exception handling, and a minimal amount of error avoidance
 (since Python will allow me to open a directory but will complain if
 I try to read lines therefrom).
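
 A minimal sketch of the idioms named above, written for Python 3
 rather than the python2.2 the script targets.  The function names
 parse_args and count_lines are illustrative only and do not appear
 in the script.

```python
import getopt
import os


def parse_args(argv):
    """Split argv into (flags, filenames) using wc-style options."""
    short = "clLw"
    long_opts = ["bytes", "chars", "lines", "max-line-length", "words"]
    # Raises getopt.GetoptError on an unrecognized option.
    opts, args = getopt.getopt(argv, short, long_opts)
    # getopt returns (option, value) pairs; keep only the option names.
    flags = [opt for opt, _value in opts]
    return flags, args


def count_lines(path):
    """Return the number of lines in path, or None if it is unreadable."""
    # Check up front, as the script does, rather than relying on
    # how open() behaves on a directory.
    if os.path.isdir(path):
        return None
    try:
        with open(path, "rb") as handle:
            # A file object is its own line iterator.
            return sum(1 for _line in handle)
    except OSError:
        return None
```

 parse_args(["-l", "--words", "a.txt"]) returns
 (["-l", "--words"], ["a.txt"]).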

 It is mildly interesting that this Python implementation of wc
 is only about a third the length of the GNU version from the
 textutils package (wc.c is 371 lines).  Measured in words and
 characters rather than lines, the Python version is only about
 half the length.
 (Glancing at the sources I see that I missed support for the
 POSIXLY_CORRECT environment variable -- which modifies, or uglifies
 if you prefer, the output format; I could add that in a few lines).

 David, you're welcome to use this script as an example.  Perhaps
 you could list this as an example of how 14 lines of simple, focused
 code grows to 140 lines by the time we add option handling, help
 and error messages, exception handling and error avoidance, and all
 that other stuff.  (If you really want to scare people you could
 include the wc.c from the GNU textutils package by way of comparison).
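
 For a sense of how small that focused core is, here is a sketch of
 just the counting step in Python 3.  The name count_text is
 hypothetical; it follows the logic of count() in the script but takes
 any iterable of lines, and unlike the script it strips the line
 terminator explicitly before measuring line length.

```python
def count_text(lines_iter):
    """Return (lines, words, chars, maxline) for an iterable of text lines."""
    lines = words = chars = maxline = 0
    for line in lines_iter:
        lines += 1
        chars += len(line)
        # split() with no arguments splits on runs of whitespace.
        words += len(line.split())
        # Measure the line without its trailing newline, if it has one.
        maxline = max(maxline, len(line.rstrip("\n")))
    return lines, words, chars, maxline
```

 count_text(["one two\n", "three\n"]) returns (2, 3, 14, 7).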

#!/usr/bin/env python2.2
""" wc: Emulate GNU wc (word count) """
import sys, os

help = '''Usage: wc [OPTION]... [FILE]...
Print line, word, and byte counts for each FILE, and a total line if
more than one FILE is specified.  With no FILE, or when FILE is -,
read standard input.
  -c, --bytes, --chars   print the byte counts
  -l, --lines            print the newline counts
  -L, --max-line-length  print the length of the longest line
  -w, --words            print the word counts
      --help             display this help and exit
      --version          output version information and exit
Report bugs to <jimd+python at starshine.org>'''

version = """Python word count: wc(1) emulation by James T. Dennis
 version 0.1"""

def options():
    """Process command line options"""
    import getopt
    short = "clLw"
    long  = ('bytes', 'chars', 'lines', 'max-line-length',
             'words', 'help', 'version')
    try:
        opts, args = getopt.getopt(sys.argv[1:], short, long)
    except getopt.GetoptError,err:
        msg = "wc: invalid option \nTry `wc --help' for more information."
        print >> sys.stderr, sys.argv[0], err
        print >> sys.stderr, msg
        sys.exit(1)
    return opts, args

def count(f=None):
    """Count and return lines, words, chars, and maxlength"""
    # We count them all, since that's much easier than a lot of
    # conditional logic to decide what to count.
    # We return it all, and main() can decide what to print.
    lines = words = chars = maxline = 0
    if f == None: file = sys.stdin
    else:
        if os.path.isdir(f):
            print >> sys.stderr, "wc: %s: Is a directory" % f
            return lines, words, chars, maxline
        try:
            file = open(f,'r')
        except IOError:
            print >> sys.stderr, "Error opening:", f
            return lines, words, chars, maxline
    # If we get this far, we can count stuff
    for line in file:
        length = len(line)
        lines += 1
        chars += length
        words += len(line.split())
        if length - 1 > maxline: maxline = length - 1
        # GNU wc doesn't count line terminator in maxlength?
        # +++ binary files have much different line length semantics!
    return lines, words, chars, maxline

def printcount(flags, totals, filename=None):
    """Print counts for each file and for the grand totals
       takes two 4-tuples, the flags for which items to print, and
       the total lines, words, characters, and max-line-length
       and an optional filename"""
    if filename == None: filename = ""
    dolines, dochars, dowords, domaxln = flags
    l, w, c, m = totals
    print "", # GNU wc prints one leading space?
    if dolines: print "%6d" % l,
    if dowords: print "%7d" % w,
    if dochars: print "%7d" % c,
    if domaxln: print "%7d" % m,
    print filename

if __name__ == "__main__":
    opts, args = options()
    dolines = dochars = dowords = domaxln = 0
    for opt,arg in opts:
        if opt == '--help':
            print help
            sys.exit()
        elif opt == '--version':
            print version
            sys.exit()
        elif opt in ('-l', '--lines'): dolines = 1
        elif opt in ('-c', '--chars', '--bytes'): dochars = 1
        elif opt in ('-w', '--words'): dowords = 1
        elif opt in ('-L', '--max-line-length'): domaxln = 1

    if dolines + dochars + dowords + domaxln == 0:
        # None specified so default is to do lines, chars, and words
        dolines = dochars = dowords = 1
    # Else we do only the ones that are specified
    # GNU wc always prints the stats in the same order, regardless
    # of the order of the options/switches.
    printflags = (dolines, dochars, dowords, domaxln)

    if not args:
        # No files named: so just do stdin
        # No grand totals. and no filename
        l, w, c, m = count()
        printcount (printflags, (l,w,c,m))
    else: # Else we do each file and keep track of grand totals
        all_lines = all_words = all_chars = longest_line = 0
        files_processed = 0
        for i in args:
            if i == '-': l, w, c, m = count()
            else: l, w, c, m = count(i)
            all_lines += l
            all_words += w
            all_chars += c
            if m > longest_line: longest_line = m
            printcount (printflags, (l,w,c,m), i)
            files_processed += 1
        if files_processed > 1: # Print totals
            totals = (all_lines, all_words, all_chars, longest_line)
            printcount (printflags, totals, "total")
---end included message---




