Newbie with sort text file question

Andrew Dalke adalke at
Sun Jul 13 23:12:12 CEST 2003

Bob Gailer:
> [Pipeline]

Huh.  Hadn't heard of that one before.  Thanks for the pointer.
(And overall, nice post!)

> The Python version:

Some stylistic comments

> input = file('c:\input.txt')

Since 'input' is a builtin, I use 'infile'.  That's only a preference of
mine.  For the OP, you'll need 'c:\\input.txt' because the '\' has special
meaning inside of a string so must be escaped.
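To make the escaping point concrete, here is a small sketch.  The path is
made up for illustration (it is not the OP's file); I picked one where the
problem actually bites, since '\t' and '\n' are both escape sequences.  A
raw string (the r prefix) is another way to spell the same thing.

```python
# Hypothetical path, chosen because it contains '\t' and '\n'.
escaped = 'c:\\temp\\new.txt'    # backslashes doubled
raw = r'c:\temp\new.txt'         # raw string: backslashes left alone
unescaped = 'c:\temp\new.txt'    # '\t' becomes a tab, '\n' a newline

print(escaped == raw)            # both spell the same path
print(repr(unescaped))           # shows the tab and newline that crept in
```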

> fruits = {} # a dictionary to hold each fruit and its count
> lines = input.readlines()
> for line in lines:

Since you are using Python 2.2 (later you use "if fruit in fruits",
and "__contains__" support for dicts wasn't added until Python 2.2, I
think, and the 'file' usage is also new), this is best written as

  for line in input:
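A quick sketch of iterating over the file object directly.  I'm using
io.StringIO as a stand-in for a real file so the snippet runs anywhere, and
the sample text is invented; an object returned by open() behaves the same
way in a for loop.

```python
import io

# io.StringIO stands in for a real file; the contents are made up.
infile = io.StringIO("apple_1\npear_2\napple_3\n")

names = []
for line in infile:              # one line at a time, no readlines() needed
    names.append(line.split("_", 1)[0])

print(names)
```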

>    fruit = line.split('_', 1)[0]

>    if fruit in fruits:
>      fruits[fruit] += 1 # increment count
>    else:
>      fruits[fruit] = 1 # add to dictionary with count of 1

Here's a handy idiom for what you want

      fruits[fruit] = fruits.get(fruit, 0) + 1
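Here is that idiom in a complete loop, as a minimal sketch.  The sample
lines are hypothetical; I'm only assuming the "fruit_something" shape that
the split above implies.

```python
# Hypothetical sample lines in the "fruit_something" shape.
lines = ["apple_1", "pear_2", "apple_3"]

fruits = {}
for line in lines:
    fruit = line.split("_", 1)[0]
    # get() returns 0 when the fruit has not been seen yet,
    # so the first sighting stores 0 + 1.
    fruits[fruit] = fruits.get(fruit, 0) + 1

print(fruits)
```

It replaces the whole if/else with one line, and there is no separate
"first time" case to get wrong.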

> output1 = file('c:\output1.txt', 'w')
> for key, value in fruits.items():
>    output1.write("%s occurs %s\n" % (key, value))
> output1.close()
> output2 = file('c:\output2.txt', 'w')
> output2.write("Total occurrences is %s\n" % len(lines))
> output2.close()

That's missing some sorts, so I don't think it meets the OP's requirements.

How about this?

infile = open("input.txt")
lines = []
counts = {}
for line in infile:
  lines.append(line)
  fruit = line.split("_", 1)[0]
  counts[fruit] = counts.get(fruit, 0) + 1
infile.close()

# Sort by name.  In ASCII, "_" sorts after every uppercase letter
# (but before every lowercase one), so "PLUM_" will be placed
# *after* "PLUMBAGO_", which is probably not what you want.
# Left as an exercise :)
lines.sort()
outfile = open("output1.txt", "w")
for line in lines:
  outfile.write(line)
outfile.close()

# Print counts from highest count to lowest
count_data = [(n, fruit) for (fruit, n) in counts.items()]
count_data.sort()
count_data.reverse()
outfile = open("output2.txt", "w")
total = 0
for n, fruit in count_data:
  outfile.write("%s occurs %s\n" % (fruit, n))
  total += n
outfile.write("\nTotal occurrences: %s\n" % total)
outfile.close()
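For anyone who wants a hint on the exercise, one possible approach (a
sketch, not the only way) is to sort on the name and the rest of the line
separately instead of on the raw line.  The sample lines and the sort_key
helper are my own inventions; the sample uses uppercase names because that
is where a plain sort puts the "_" in the wrong place.

```python
# Hypothetical sample lines; the real input is assumed to look like "NAME_rest".
lines = ["PLUM_2\n", "PLUMBAGO_1\n", "APPLE_3\n", "PLUM_1\n"]

def sort_key(line):
    # Compare the fruit name first, then whatever follows the first "_".
    # partition() also copes with a line that has no "_" at all.
    fruit, _sep, rest = line.partition("_")
    return (fruit, rest)

plain = sorted(lines)                  # raw string comparison
by_name = sorted(lines, key=sort_key)  # name-aware comparison

print(plain)
print(by_name)
```

The key= argument needs Python 2.4 or later, so on 2.2 you would build a
list of (key, line) tuples and sort that instead, much like count_data.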

                    dalke at
