Newbie with sort text file question

Bengt Richter bokr at oz.net
Sun Jul 13 00:47:03 CEST 2003


On 12 Jul 2003 12:46:51 -0700, stuart_clemons at us.ibm.com (stuartc) wrote:

>Hi:
>
>I'm not a total newbie, but I'm pretty green.  I need to sort a text
>file and then get a total for the number of occurances for a part of
>the string. Hopefully, this will explain it better:
>
>Here's the text file: 
>
>banana_c \\yellow
>apple_a \\green
>orange_b \\yellow
>banana_d \\green
>orange_a \\orange
>apple_w \\yellow
>banana_e \\green
>orange_x \\yellow
>orange_y \\orange
>
>I would like two output files:
>
>1) Sorted like this, by the fruit name (the name before the dash)
>
>apple_a \\green
>apple_w \\yellow
>banana_c \\yellow
>banana_d \\green
>banana_e \\green
>orange_a \\orange
>orange_b \\yellow
>orange_x \\yellow
>orange_y \\orange
>
>2) Then summarized like this, ordered with the highest occurances
>first:
>
>orange occurs 4
>banana occurs 3
>apple occurs 2
>
>Total occurances is 9
>
>Thanks for any help !

===< stuartc.py >========================================================
import StringIO
textf = StringIO.StringIO(r"""
banana_c \\yellow
apple_a \\green
orange_b \\yellow
banana_d \\green
orange_a \\orange
apple_w \\yellow
banana_e \\green
orange_x \\yellow
orange_y \\orange
""")

# I would like two output files:
# (actually two files ?? Ok)

# 1) Sorted like this, by the fruit name (the name before the dash)

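# Split each non-blank line at the first underscore into [fruit, rest];
# sorting these pairs orders the lines by fruit name, then by the rest.
fruitlist = [line.split('_',1) for line in textf if line.strip()]
fruitlist.sort()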

# apple_a \\green
# apple_w \\yellow
# banana_c \\yellow
# banana_d \\green
# banana_e \\green
# orange_a \\orange
# orange_b \\yellow
# orange_x \\yellow
# orange_y \\orange

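outfile_1 = StringIO.StringIO()
# The second element of each pair still ends with its newline, so rejoining
# each pair with '_' and concatenating reproduces the original lines.
outfile_1.write(''.join(['_'.join(pair) for pair in fruitlist]))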

# 2) Then summarized like this, ordered with the highest occurances
# first:

# orange occurs 4
# banana occurs 3
# apple occurs 2

outfile_2 = StringIO.StringIO()
fruitfreq = {}
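# Count how often each fruit name occurs.
for fruit, dummyvar in fruitlist:
    fruitfreq[fruit] = fruitfreq.get(fruit, 0)+1
# Build (count, name) tuples so sort() orders by count; reverse() puts the highest first.
fruitfreqlist = [(occ,name) for name,occ in fruitfreq.items()]
fruitfreqlist.sort()
fruitfreqlist.reverse()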
outfile_2.write('\n'.join(['%s occurs %s'%(name,occ) for occ,name in fruitfreqlist]+['']))

# Total occurances is 9
print >> outfile_2,"Total occurances [sic] is [sic] %s" % reduce(int.__add__, fruitfreq.values())

## show results
print '\nFile 1:\n------------\n%s------------' % outfile_1.getvalue()
print '\nFile 2:\n------------\n%s------------' % outfile_2.getvalue()
=========================================================================
executed:

[15:52] C:\pywk\clp>stuartc.py

File 1:
------------
apple_a \\green
apple_w \\yellow
banana_c \\yellow
banana_d \\green
banana_e \\green
orange_a \\orange
orange_b \\yellow
orange_x \\yellow
orange_y \\orange
------------

File 2:
------------
orange occurs 4
banana occurs 3
apple occurs 2
Total occurances [sic] is [sic] 9
------------
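
The listing above is Python 2 (the StringIO module, print >>, and the
built-in reduce). For Python 3, a minimal sketch of the same two steps might
look like the following. It assumes collections.Counter is acceptable for the
counting; the names SAMPLE, sort_by_fruit and summarize are made up for
illustration and are not part of the original script.

```python
from collections import Counter

# Sample input copied from the question (raw string keeps both backslashes).
SAMPLE = r"""
banana_c \\yellow
apple_a \\green
orange_b \\yellow
banana_d \\green
orange_a \\orange
apple_w \\yellow
banana_e \\green
orange_x \\yellow
orange_y \\orange
"""

def sort_by_fruit(text):
    """Return the non-blank lines, sorted by fruit name and then by the rest."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The key splits at the first underscore, as the original listing does.
    return sorted(lines, key=lambda line: line.split('_', 1))

def summarize(lines):
    """Return summary lines, most frequent fruit first, followed by a total."""
    counts = Counter(line.split('_', 1)[0] for line in lines)
    # Sort on (count, name) in reverse, matching the sort()/reverse() on
    # (count, name) tuples in the Python 2 listing.
    ranked = sorted(counts.items(), key=lambda item: (item[1], item[0]),
                    reverse=True)
    summary = ['%s occurs %d' % (name, occ) for name, occ in ranked]
    summary.append('Total occurrences is %d' % sum(counts.values()))
    return summary

sorted_lines = sort_by_fruit(SAMPLE)
summary_lines = summarize(sorted_lines)
print('\n'.join(sorted_lines))
print('\n'.join(summary_lines))
```

Here sum() replaces reduce(int.__add__, ...), and Counter replaces the
dict.get() counting loop; the ordering logic is otherwise unchanged.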

Is that what you wanted?
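
If two real files are wanted rather than the StringIO stand-ins, something
along these lines could work in Python 3. This is only a sketch: the function
name write_fruit_reports and the file names fruit.txt, fruit_sorted.txt and
fruit_summary.txt are hypothetical, and the demonstration writes a shortened
input into a temporary directory so that it does not touch any existing files.

```python
import os
import tempfile
from collections import Counter

def write_fruit_reports(in_path, sorted_path, summary_path):
    """Read in_path, then write the sorted lines and the summary to two files."""
    with open(in_path) as infile:
        lines = [line.rstrip('\n') for line in infile if line.strip()]
    # Sort by fruit name (text before the first underscore), then by the rest.
    lines.sort(key=lambda line: line.split('_', 1))
    counts = Counter(line.split('_', 1)[0] for line in lines)
    # Highest count first; ties fall back to reverse name order.
    ranked = sorted(counts.items(), key=lambda item: (item[1], item[0]),
                    reverse=True)
    with open(sorted_path, 'w') as out:
        out.writelines(line + '\n' for line in lines)
    with open(summary_path, 'w') as out:
        for name, occ in ranked:
            out.write('%s occurs %d\n' % (name, occ))
        out.write('\nTotal occurrences is %d\n' % len(lines))

# Demonstration in a throwaway directory (hypothetical file names).
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, 'fruit.txt')
sorted_path = os.path.join(workdir, 'fruit_sorted.txt')
summary_path = os.path.join(workdir, 'fruit_summary.txt')
with open(in_path, 'w') as f:
    f.write(r'orange_b \\yellow' + '\n')
    f.write(r'apple_a \\green' + '\n')
    f.write(r'orange_a \\orange' + '\n')

write_fruit_reports(in_path, sorted_path, summary_path)
with open(summary_path) as f:
    print(f.read())
```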

Regards,
Bengt Richter



