[Tutor] how to sort the data inside the file.

Mon Dec 31 20:45:21 CET 2007

On Monday 31 December 2007 10:36, Chris Fuller wrote:
> lin = re.findall('\s*([^\s]+)\s+([^\s]+)\s+(\d+)( [kM])?bytes', s)

This is incorrect.  The first version of the script I wrote split the file 
into records by calling split('bytes').  I wrongly assumed I would get 
the desired results by simply adding "bytes" to the RE.  The original RE 
could have been written so that this would have worked (and it would have 
been a little "cleaner"), but it wasn't.  The space should be obligatory, and 
not included in the [kM] group.
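For comparison, here is a sketch of the RE with the space made obligatory and kept out of the [kM] group, as described above. It assumes the same record layout the split('bytes') version relies on: two whitespace-free fields, a number, a space, an optional k or M, then "bytes". The sample string is made up for illustration, and the pattern is written as a raw string with \S in place of [^\s] (same meaning).

```python
import re

# Assumed record layout: <field> <field> <number> [k|M]bytes
# The space before the unit is required and sits outside the group,
# so group 4 holds only the optional prefix ('' when it is absent).
RECORD_RE = re.compile(r'\s*(\S+)\s+(\S+)\s+(\d+) ([kM])?bytes')

s = "alpha beta 12 kbytes gamma delta 3 bytes eps zeta 7 Mbytes"
lin = RECORD_RE.findall(s)
print(lin)
```

findall() returns one tuple per record, with an empty string in the last position for records that have no k or M.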

I tried some of Kent's suggestions and compared the run times.  On this data, 
the nested split() calls were faster than the REs!  Python isn't as slow as you'd think :)
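The timings themselves aren't in this post. A rough way to run such a comparison yourself is sketched below with the standard timeit module. The sample data is synthetic (generated in the assumed "<a> <b> <n> [k|M]bytes" layout), and the numbers will vary with machine and Python version, so no particular result is implied.

```python
import re
import timeit

# Synthetic sample: 1000 records, cycling through no prefix, 'k' and 'M'
UNITS = ('', 'k', 'M')
s = " ".join(f"file{i} dir{i} {i} {UNITS[i % 3]}bytes" for i in range(1000))

RECORD_RE = re.compile(r'\s*(\S+)\s+(\S+)\s+(\d+) ([kM])?bytes')

def with_split(text):
    # nested split(): first on 'bytes', then on whitespace
    return [rec.split() for rec in text.split('bytes')[:-1]]

def with_re(text):
    return RECORD_RE.findall(text)

# Both approaches should find the same number of records
assert len(with_split(s)) == len(with_re(s))

for func in (with_split, with_re):
    t = timeit.timeit(lambda: func(s), number=200)
    print(f"{func.__name__}: {t:.3f} s for 200 runs")
```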

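   # s holds the file contents; lout collects the parsed records
   lout = []

   # separate into records ([:-1] drops whatever trails the last 'bytes')
   lin = [i.split() for i in s.split('bytes')[:-1]]

   for fields in lin:
      # an optional fourth field is the unit prefix, 'k' or 'M'
      try:
         if   fields[3] == 'M':
            mul = 1000000
         elif fields[3] == 'k':
            mul = 1000
         else:
            mul = 1   # unknown prefix: don't reuse mul from the last record
      except IndexError:
         mul = 1      # no prefix: the size is already in bytes

      lout.append( (fields[0], fields[1], int(fields[2])*mul) )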
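Pulled together into a self-contained function, the split() approach might look like the sketch below. The name parse_sizes and the sample string are hypothetical. It swaps the try/except for a dict lookup, which also guarantees mul is set afresh for every record. Since the thread is about sorting, the last lines show one way to sort the resulting tuples by size with sorted() and a key function; that step is an assumption about what comes next, not part of the script above.

```python
# Multipliers for the optional unit prefix (decimal, as in the script above)
MULTIPLIERS = {'k': 1000, 'M': 1000000}

def parse_sizes(s):
    """Hypothetical helper: return (field0, field1, size_in_bytes) tuples."""
    lout = []
    # Split into records; [:-1] drops whatever trails the last 'bytes'
    for record in s.split('bytes')[:-1]:
        fields = record.split()
        # A fourth field, if present, is the unit prefix; otherwise use 1
        mul = MULTIPLIERS.get(fields[3], 1) if len(fields) > 3 else 1
        lout.append((fields[0], fields[1], int(fields[2]) * mul))
    return lout

data = "alpha beta 12 kbytes gamma delta 3 bytes eps zeta 7 Mbytes"
records = parse_sizes(data)

# Sort by size (third element of each tuple), largest first
by_size = sorted(records, key=lambda rec: rec[2], reverse=True)
for name1, name2, size in by_size:
    print(name1, name2, size)
```

Because sorted() returns a new list, records keeps the original file order if you still need it.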

Cheers