[Tutor] Expanding a Python script to include a zcat and awk pre-process

Dave Angel davea at ieee.org
Sat Jan 9 18:25:48 CET 2010


galaxywatcher at gmail.com wrote:
> <div class="moz-text-flowed" style="font-family: -moz-fixed">After 
> many more hours of reading and testing, I am still struggling to 
> finish this simple script, which bear in mind, I already got my 
> desired results by preprocessing with an awk one-liner.
>
> I am opening a zipped file properly, so I did make some progress, but 
> simply assigning num1 and num2 to the first 2 columns of the file 
> remains elusive. Num3 here gets assigned, not to the 3rd column, but 
> the rest of the entire file. I feel like I am missing a simple strip() 
> or some other incantation that prevents the entire file from getting 
> blobbed into num3. Any help is appreciated in advance.
>
> #!/usr/bin/env python
>
> import string
> import re
> import zipfile
> highflag = flagcount = sum = sumtotal = 0
> f = file("test.zip")
> z = zipfile.ZipFile(f)
> for f in z.namelist():
>     ranges = z.read(f)
This reads the whole file into ranges.  In your earlier version, you 
looped over the file one line at a time.  To do the equivalent here, 
split the text into lines and add one more level of loop nesting:
    lines = z.read(f).split("\n")    #build a list of text lines
    for ranges in lines:    #here, ranges is a single line

and of course, indent the remainder.
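
For reference, here is a minimal, self-contained sketch of the same idea. 
It is written for Python 3 rather than the Python 2 of the original script, 
and it builds a small zip archive in memory so it can run without test.zip. 
The names SAMPLE, make_sample_zip and range_sizes, and the member name 
ranges.txt, are made up for illustration. It also skips blank lines, since 
splitting on newlines usually leaves an empty string at the end, and that 
would break the three-way unpacking.

```python
import io
import re
import zipfile

# Hypothetical sample data, copied from the first rows of the zcat output.
SAMPLE = (
    '134873600, 134873855, "32787 Protex Technologies, Inc."\n'
    '135338240, 135338495, 40597\n'
    '135338496, 135338751, 40993\n'
)


def make_sample_zip():
    """Build an in-memory zip archive holding one text member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("ranges.txt", SAMPLE)
    buf.seek(0)
    return buf


def range_sizes(zip_source):
    """Return num2 - num1 for every non-blank line of every member."""
    sizes = []
    with zipfile.ZipFile(zip_source) as z:
        for name in z.namelist():
            # z.read() returns the whole member as bytes, not one line.
            text = z.read(name).decode("utf-8")
            for line in text.splitlines():
                line = line.strip()
                if not line:
                    continue  # skip blank lines so the unpacking cannot fail
                # maxsplit=2 leaves the rest of this one line in the third field.
                num1, num2, _rest = re.split(r"\W+", line, maxsplit=2)
                sizes.append(int(num2) - int(num1))
    return sizes


sizes = range_sizes(make_sample_zip())
print(sizes)
```

Because the split now happens per line, the third field only ever holds the 
remainder of that one line, not the rest of the file.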
>     ranges = ranges.strip()
>     num1, num2, num3 = re.split('\W+', ranges, 2)  ## This line is the root of the problem.
>     sum = int(num2) - int(num1)
>     if sum > 10000000:
>         flag1 = " !!!!"
>         flagcount += 1
>     else:
>         flag1 = ""
>     if sum > highflag:
>         highflag = sum
>     print str(num2) + " - " + str(num1) + " = " + str(sum) + flag1
>     sumtotal = sumtotal + sum
>
> print "Total ranges = ", sumtotal
> print "Total ranges over 10 million: ", flagcount
> print "Largest range: ", highflag
>
> ======
> $ zcat test.zip
> 134873600, 134873855, "32787 Protex Technologies, Inc."
> 135338240, 135338495, 40597
> 135338496, 135338751, 40993
> 201720832, 201721087, "12838 HFF Infrastructure & Operations"
> 202739456, 202739711, "1623 Beseau Regional de la Region Languedoc Roussillon"
>
>
> </div>
>

