Newbie with sort text file question

stuartc stuart_clemons at us.ibm.com
Mon Jul 14 00:30:58 CEST 2003


Hi Bengt:

Thank you. Your code worked perfectly on the text file I
provided.

Unfortunately for me, my real text file has one slight variation that
I did not account for.  That is, the fruit name does not always have
an "_" after it.  For example, apple below does not have an "_"
attached to it.

banana_c \\yellow
apple   \\green
orange_b \\yellow


This variation in my text file caused a problem with the program.
Here's the error.

Traceback (most recent call last):
  File "G:/Python22/Sort_Fruit.py", line 47, in ?
    for fruit, dummyvar in fruitlist: fruitfreq[fruit] =
fruitfreq.get(fruit, 0)+1
ValueError: unpack list of wrong size

I tried to debug and fix this myself, but I wasn't able to.  I did
notice that your split breaks each line in the file into two fields,
as long as there's an "_" in the fruit name.  If the fruit name does
not have an "_", then the split does not occur. I think this is
related to the problem, but I couldn't figure out how to fix it.
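
Here is a minimal sketch of what I think is going on. It is written
for a current Python 3 rather than the Python 2.2 I am running, the
two sample lines are copied from my file, and the name unpack_failed
is just something I made up for the illustration. A line with no "_"
seems to come back from split('_', 1) as a one-item list, so it cannot
be unpacked into two names:

```python
# Two sample lines from the text file; the second has no "_".
lines = [r"banana_c \\yellow", r"apple   \\green"]

# Same split as in the quoted script: at most one split on "_".
pieces = [line.split('_', 1) for line in lines]

# Show how many fields each line produced.
for piece in pieces:
    print(len(piece), piece)

# Unpacking each item into two names fails on the one-item list.
unpack_failed = False  # hypothetical flag, only for this illustration
try:
    for fruit, dummyvar in pieces:
        pass
except ValueError as err:
    unpack_failed = True
    print("ValueError:", err)
```

The first line gives two fields and the second gives only one, which
would explain why the loop on line 47 stops with a ValueError.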

Any help will be greatly appreciated. Thanks.

- Stuart



bokr at oz.net (Bengt Richter) wrote in message news:<beq357$thj$0 at 216.39.172.122>...
> On 12 Jul 2003 12:46:51 -0700, stuart_clemons at us.ibm.com (stuartc) wrote:
> 
> >Hi:
> >
> >I'm not a total newbie, but I'm pretty green.  I need to sort a text
> >file and then get a total for the number of occurances for a part of
> >the string. Hopefully, this will explain it better:
> >
> >Here's the text file: 
> >
> >banana_c \\yellow
> >apple_a \\green
> >orange_b \\yellow
> >banana_d \\green
> >orange_a \\orange
> >apple_w \\yellow
> >banana_e \\green
> >orange_x \\yellow
> >orange_y \\orange
> >
> >I would like two output files:
> >
> >1) Sorted like this, by the fruit name (the name before the dash)
> >
> >apple_a \\green
> >apple_w \\yellow
> >banana_c \\yellow
> >banana_d \\green
> >banana_e \\green
> >orange_a \\orange
> >orange_b \\yellow
> >orange_x \\yellow
> >orange_y \\orange
> >
> >2) Then summarized like this, ordered with the highest occurances
> >first:
> >
> >orange occurs 4
> >banana occurs 3
> >apple occurs 2
> >
> >Total occurances is 9
> >
> >Thanks for any help !
> 
> ===< stuartc.py >========================================================
> import StringIO
> textf = StringIO.StringIO(r"""
> banana_c \\yellow
> apple_a \\green
> orange_b \\yellow
> banana_d \\green
> orange_a \\orange
> apple_w \\yellow
> banana_e \\green
> orange_x \\yellow
> orange_y \\orange
> """)
> 
> # I would like two output files:
> # (actually two files ?? Ok)
> 
> # 1) Sorted like this, by the fruit name (the name before the dash)
> 
> fruitlist = [line.split('_',1) for line in textf if line.strip()]
> fruitlist.sort()
> 
> # apple_a \\green
> # apple_w \\yellow
> # banana_c \\yellow
> # banana_d \\green
> # banana_e \\green
> # orange_a \\orange
> # orange_b \\yellow
> # orange_x \\yellow
> # orange_y \\orange
> 
> outfile_1 = StringIO.StringIO()
> outfile_1.write(''.join(['_'.join(pair) for pair in fruitlist]))
> 
> # 2) Then summarized like this, ordered with the highest occurances
> # first:
> 
> # orange occurs 4
> # banana occurs 3
> # apple occurs 2
> 
> outfile_2 = StringIO.StringIO()
> fruitfreq = {}
> for fruit, dummyvar in fruitlist: fruitfreq[fruit] = fruitfreq.get(fruit, 0)+1
> fruitfreqlist = [(occ,name) for name,occ in fruitfreq.items()]
> fruitfreqlist.sort()
> fruitfreqlist.reverse()
> outfile_2.write('\n'.join(['%s occurs %s'%(name,occ) for occ,name in fruitfreqlist]+['']))
> 
> # Total occurances is 9
> print >> outfile_2,"Total occurances [sic] is [sic] %s" % reduce(int.__add__, fruitfreq.values())
> 
> ## show results
> print '\nFile 1:\n------------\n%s------------' % outfile_1.getvalue()
> print '\nFile 2:\n------------\n%s------------' % outfile_2.getvalue()
> =========================================================================
> executed:
> 
> [15:52] C:\pywk\clp>stuartc.py
> 
> File 1:
> ------------
> apple_a \\green
> apple_w \\yellow
> banana_c \\yellow
> banana_d \\green
> banana_e \\green
> orange_a \\orange
> orange_b \\yellow
> orange_x \\yellow
> orange_y \\orange
> ------------
> 
> File 2:
> ------------
> orange occurs 4
> banana occurs 3
> apple occurs 2
> Total occurances [sic] is [sic] 9
> ------------
> 
> Is that what you wanted?
> 
> Regards,
> Bengt Richter
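
One possible way to make the approach above tolerate names without an
"_" is sketched below. This is not Bengt's code: it is a rough Python 3
rewrite, the helper fruit_name is a made-up name, and the sample data
is the original file with apple_a replaced by a bare apple. The idea
is to take the first whitespace-delimited field and cut it at the
first "_" only if one is present, instead of unpacking the result of
the split.

```python
from collections import Counter

# Sample data (an assumption): the original file with "apple_a"
# replaced by a bare "apple" to reproduce the problem case.
text = r"""
banana_c \\yellow
apple   \\green
orange_b \\yellow
banana_d \\green
orange_a \\orange
apple_w \\yellow
banana_e \\green
orange_x \\yellow
orange_y \\orange
"""

def fruit_name(line):
    """Return the first field of the line, cut at the first "_" if any."""
    first_field = line.split()[0]
    return first_field.split('_', 1)[0]

# Skip blank lines, as the quoted script does.
lines = [line for line in text.splitlines() if line.strip()]

# Output 1: lines sorted by fruit name, then by the whole line.
sorted_lines = sorted(lines, key=lambda line: (fruit_name(line), line))

# Output 2: counts per fruit, highest count first, plus a total.
counts = Counter(fruit_name(line) for line in lines)
summary = ["%s occurs %d" % (name, n) for name, n in counts.most_common()]
summary.append("Total occurrences is %d" % sum(counts.values()))

print("\n".join(sorted_lines))
print("\n".join(summary))
```

Because the name is taken by indexing the split result rather than by
unpacking it, a line with no "_" no longer raises ValueError. Writing
the two results to real files instead of printing them is left out of
the sketch.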



