Newbie with sort text file question
stuartc
stuart_clemons at us.ibm.com
Sun Jul 13 18:30:58 EDT 2003
Hi Bengt:
Thank you. Your code worked perfectly based on the text file I
provided.
Unfortunately for me, my real text file has one slight variation that
I did not account for. That is, the fruit name does not always have
an "_" after its name. For example, apple below does not have an "_"
attached to it.
banana_c \\yellow
apple \\green
orange_b \\yellow
This variation in my text file caused a problem with the program.
Here's the error.
Traceback (most recent call last):
  File "G:/Python22/Sort_Fruit.py", line 47, in ?
    for fruit, dummyvar in fruitlist: fruitfreq[fruit] = fruitfreq.get(fruit, 0)+1
ValueError: unpack list of wrong size
I tried to debug and fix this variation, but I wasn't able to. I did
notice that your split breaks each line in the file into two fields,
as long as there's an "_" in the fruit name. If the fruit name does
not have an "_", then the split does not occur and the line stays in
one piece. I think this is related to the problem, but I couldn't
figure out how to fix it.
Any help will be greatly appreciated. Thanks.
- Stuart
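
The ValueError most likely comes from the unpacking in
"for fruit, dummyvar in fruitlist": line.split('_', 1) returns a
one-element list when a line has no "_", so there is nothing to bind
to the second name. Below is a minimal sketch of one way around it,
written for Python 3 rather than the Python 2.2 used in this thread.
The helper name fruit_name is hypothetical, and it assumes the fruit
name is the part of the first whitespace-separated token that comes
before any "_".

```python
# Sample lines taken from the message above (raw strings keep the backslashes).
lines = [
    r"banana_c \\yellow",
    r"apple \\green",
    r"orange_b \\yellow",
]

def fruit_name(line):
    """Return the fruit name, whether or not it is followed by '_'.

    Hypothetical helper, not part of the original script.
    """
    # First whitespace-separated token, e.g. "banana_c" or "apple".
    token = line.split()[0]
    # split("_", 1)[0] is the whole token when no "_" is present,
    # so indexing [0] never fails the way two-name unpacking does.
    return token.split("_", 1)[0]

# Pair each non-blank line with its fruit name; sorting the pairs
# orders by fruit name first, then by the full line.
pairs = sorted((fruit_name(line), line) for line in lines if line.strip())

for fruit, line in pairs:
    print(fruit, "->", line)
```

Keeping the whole line as the second element of each pair means the
sorted file can be written back out unchanged, and an "_" that
appears later in a line (for example in the colour field) is never
mistaken for the name separator.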
bokr at oz.net (Bengt Richter) wrote in message news:<beq357$thj$0 at 216.39.172.122>...
> On 12 Jul 2003 12:46:51 -0700, stuart_clemons at us.ibm.com (stuartc) wrote:
>
> >Hi:
> >
> >I'm not a total newbie, but I'm pretty green. I need to sort a text
> >file and then get a total for the number of occurances for a part of
> >the string. Hopefully, this will explain it better:
> >
> >Here's the text file:
> >
> >banana_c \\yellow
> >apple_a \\green
> >orange_b \\yellow
> >banana_d \\green
> >orange_a \\orange
> >apple_w \\yellow
> >banana_e \\green
> >orange_x \\yellow
> >orange_y \\orange
> >
> >I would like two output files:
> >
> >1) Sorted like this, by the fruit name (the name before the dash)
> >
> >apple_a \\green
> >apple_w \\yellow
> >banana_c \\yellow
> >banana_d \\green
> >banana_e \\green
> >orange_a \\orange
> >orange_b \\yellow
> >orange_x \\yellow
> >orange_y \\orange
> >
> >2) Then summarized like this, ordered with the highest occurances
> >first:
> >
> >orange occurs 4
> >banana occurs 3
> >apple occurs 2
> >
> >Total occurances is 9
> >
> >Thanks for any help !
>
> ===< stuartc.py >========================================================
> import StringIO
> textf = StringIO.StringIO(r"""
> banana_c \\yellow
> apple_a \\green
> orange_b \\yellow
> banana_d \\green
> orange_a \\orange
> apple_w \\yellow
> banana_e \\green
> orange_x \\yellow
> orange_y \\orange
> """)
>
> # I would like two output files:
> # (actually two files ?? Ok)
>
> # 1) Sorted like this, by the fruit name (the name before the dash)
>
> fruitlist = [line.split('_',1) for line in textf if line.strip()]
> fruitlist.sort()
>
> # apple_a \\green
> # apple_w \\yellow
> # banana_c \\yellow
> # banana_d \\green
> # banana_e \\green
> # orange_a \\orange
> # orange_b \\yellow
> # orange_x \\yellow
> # orange_y \\orange
>
> outfile_1 = StringIO.StringIO()
> outfile_1.write(''.join(['_'.join(pair) for pair in fruitlist]))
>
> # 2) Then summarized like this, ordered with the highest occurances
> # first:
>
> # orange occurs 4
> # banana occurs 3
> # apple occurs 2
>
> outfile_2 = StringIO.StringIO()
> fruitfreq = {}
> for fruit, dummyvar in fruitlist: fruitfreq[fruit] = fruitfreq.get(fruit, 0)+1
> fruitfreqlist = [(occ,name) for name,occ in fruitfreq.items()]
> fruitfreqlist.sort()
> fruitfreqlist.reverse()
> outfile_2.write('\n'.join(['%s occurs %s'%(name,occ) for occ,name in fruitfreqlist]+['']))
>
> # Total occurances is 9
> print >> outfile_2,"Total occurances [sic] is [sic] %s" % reduce(int.__add__, fruitfreq.values())
>
> ## show results
> print '\nFile 1:\n------------\n%s------------' % outfile_1.getvalue()
> print '\nFile 2:\n------------\n%s------------' % outfile_2.getvalue()
> =========================================================================
> executed:
>
> [15:52] C:\pywk\clp>stuartc.py
>
> File 1:
> ------------
> apple_a \\green
> apple_w \\yellow
> banana_c \\yellow
> banana_d \\green
> banana_e \\green
> orange_a \\orange
> orange_b \\yellow
> orange_x \\yellow
> orange_y \\orange
> ------------
>
> File 2:
> ------------
> orange occurs 4
> banana occurs 3
> apple occurs 2
> Total occurances [sic] is [sic] 9
> ------------
>
> Is that what you wanted?
>
> Regards,
> Bengt Richter
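
For comparison, the counting and ordering steps in the quoted script
could be sketched in Python 3 with collections.Counter. This is not
Bengt's code, only an illustration under stated assumptions: the
function name summarize is hypothetical, the fruit name is taken as
the part of the first token before any "_" (so names without "_" are
also counted), and ties are broken alphabetically, which differs from
the sort-then-reverse order used above.

```python
from collections import Counter

def summarize(lines):
    """Return report lines: 'name occurs N' by descending count, then a total.

    Hypothetical helper for illustration only.
    """
    counts = Counter(
        line.split()[0].split("_", 1)[0]  # fruit name, with or without "_"
        for line in lines
        if line.strip()                   # skip blank lines
    )
    # Highest count first; equal counts fall back to alphabetical order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    report = ["%s occurs %d" % (name, occ) for name, occ in ordered]
    report.append("Total occurrences is %d" % sum(counts.values()))
    return report

# The nine sample lines from the original question.
sample = r"""
banana_c \\yellow
apple_a \\green
orange_b \\yellow
banana_d \\green
orange_a \\orange
apple_w \\yellow
banana_e \\green
orange_x \\yellow
orange_y \\orange
""".splitlines()

print("\n".join(summarize(sample)))
```

Returning a list of strings rather than writing to a file keeps the
sketch easy to check; the caller can join the lines and write them to
whichever output file is wanted.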