FW: [Tutor] Finding items in list of lists.

Tue Mar 18 09:47:02 2003

Much food for thought. Thanks for the clarification regarding collating.

-----Original Message-----
From: Bob Gailer [mailto:bgailer@alum.rpi.edu]
Sent: Monday, March 17, 2003 5:08 PM
To: Shawhan, Doug (CAP, ITS, US); tutor@python.org
Subject: Re: FW: [Tutor] Finding items in list of lists.

At 03:32 PM 3/17/2003 -0500, Doug.Shawhan@gecits.ge.com wrote:
>------------------------snip-------------------------
>
>import string
>import xreadlines
># Grab data from disk
>f=open("\\tmp\\sample.txt","r")
>rawData=[]
>for line in xreadlines.xreadlines(f):
>         rawData.append(string.split(line,'\t'))
># Get rid of the top "row", since it contains no useful data by default
>del rawData[0]
>#       We want to sort by the shared value which is in the tenth "column"

First, it looks like you are collating rather than sorting. Sorting implies 
putting in order and all I see this code doing is creating dictionary
entries.
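
A minimal sketch of the difference, using made-up (shared value, item type) records rather than anything from the sample file. Sorting leaves one flat list in order; collating gathers records under a key, which is what the dictionary code is doing:

```python
# Made-up (shared value, item type) records, for illustration only.
records = [('B2', 'LAP'), ('A1', 'PRT'), ('B2', 'KBD'), ('A1', 'DPLX')]

# Sorting puts the records in order but keeps a single flat list.
in_order = sorted(records)

# Collating gathers the item types under their shared value,
# preserving the order in which they were read.
collated = {}
for shared, kind in records:
    collated.setdefault(shared, []).append(kind)
```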

>db = {}
>gather = []
>for lines in rawData:
>         parentItem = lines[9]
>         for line in rawData:
>                 if line[9] == parentItem:
>                         gather.append(line)
>                 db[parentItem]=gather
>         gather = []

Immediate observation and refinement: once a parentItem is found we don't 
need to find and process it again, so after

parentItem = lines[9]

add

if parentItem not in db:

then continue with:

   for line in rawData:
     etc.

Also you could use list comprehension:

          db[parentItem] = [line for line in rawData
                            if line[9] == parentItem]
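
Putting the two refinements together might look like the sketch below. The make_row helper and the sample rows are hypothetical stand-ins for the lines read from sample.txt; only the column positions (8, 9 and 12) come from the original code:

```python
# Hypothetical helper: build a 13-column row with only the columns
# the original code looks at filled in.
def make_row(tag, shared, kind):
    row = [''] * 13
    row[8], row[9], row[12] = tag, shared, kind
    return row

rawData = [
    make_row('p1', 'A1', 'PRT'),
    make_row('l1', 'B2', 'LAP'),
    make_row('d1', 'A1', 'DPLX'),
]

db = {}
for lines in rawData:
    parentItem = lines[9]
    if parentItem not in db:  # skip shared values already collated
        db[parentItem] = [line for line in rawData
                          if line[9] == parentItem]
```

Each shared value is now scanned for only once, however many rows carry it.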

>#       Now we take the data that have been sorted by parentItem and
>#       further sort them by what type of item they are. For example, if
>#       the line has both a printer and a duplex unit therein, the printer
>#       and duplex are sorted out and given an entry of their own. This
>#       enables the items to be uploaded into DAM with no issues.
>
>cookedData = {} # <-- new dictionary for the second sort.
>for each in db.keys():
>         sortdb = {} # <-- new dictionary for the item sort
>         for item in db[each]:
>                 sortdb[item[12]] = item
>                 #       filter out the Printer/Duplex combinations
>         if sortdb.has_key('DPLX') and sortdb.has_key('PRT'):
>                 print '%s printer // duplexer match'%each
>                 filtered=[sortdb['PRT'], sortdb['DPLX']]
>                 signify = sortdb['PRT']
>                 signify = signify[8]
>                 cookedData[signify]=filtered
>                 del sortdb['PRT']
>                 del sortdb['DPLX']
>                 #       and the Laptop/Keyboard combinations
>         elif sortdb.has_key('KBD') and sortdb.has_key('LAP'):
>                 print '%s laptop // keyboard match'%each
>                 filtered=[sortdb['LAP'], sortdb['KBD']]
>                 signify = sortdb['LAP']
>                 signify = signify[8]
>                 cookedData[signify]=filtered
>                 del sortdb['LAP']
>                 del sortdb['KBD']
>                 #       now sort out the leftover items (usually
>                 #       Cpu/Monitor combinations)
>         else:
>                 old_potato = [] # <--A type of leftover (I crack me up.)
>                 for leftover in sortdb.keys():
>                         old_potato.append(sortdb[leftover])
>         # and finally add the leftovers to the cookedData.
>         cookedData[item[8]]=old_potato
>
># Now we place the various data into a single long string suitable for
># DAM to ingest
>for item in cookedData.keys():
>         print item, cookedData[item]
>
>--------------------snip-----------------------
>
>Any suggestions for cleanup or concision are welcomed!

An idea (untested). It assumes there will be a pair of records for each
shared value. If there could be fewer or more, some modifications are needed.

# copy the major and minor sort items to the front of each list.
sortableData = [[x[9], x[12]] + x for x in rawData]
# do the desired major/minor sort; all items of one shared value will now
# be together and the types within each shared value will be in order.
sortableData.sort()
types = {'DPLX': ('PRT', '%s printer // duplexer match', 0),
         'KBD': ('LAP', '%s laptop // keyboard match', 0),
         # etc.
        }
# key is the alphabetically earlier of the types
# 1st element of each tuple is the alphabetically later of the types
# 2nd element of each tuple is the message to print
# 3rd element of each tuple is the significantOffset. If 'DPLX' were the
# signifying item instead of 'PRT' then this offset would be -1
cookedData = {}
sharevalue = None
index = 0
old_potato = []
# a while loop instead of a for loop, so we can access more than one item
while index < len(sortableData):
   item = sortableData[index]
   if item[0] != sharevalue: # start processing first or next shared value
     sharevalue = item[0]
     if old_potato: # left over from previous shared value set
       cookedData[sortableData[index-1][10]] = old_potato
     if item[1] in types:
       expect, msg, significantOffset = types[item[1]]
       old_potato = []
     else: # must be a leftover
       old_potato = [item]
   else: # continue with next item of current shared value
     if old_potato: # add next leftover
       old_potato.append(item)
     else:
       if item[1] == expect: # we have a pair
         print(msg % item[0])
         # keep in mind that the shared value and type appear at the
         # head of the list
         filtered = sortableData[index-1:index+1]
         signify = sortableData[index + significantOffset]
         signify = signify[10]
         cookedData[signify] = filtered
       else: # deal with unmatched pair
         pass # nothing is done with these yet
   index += 1
if old_potato: # left over from last shared value set
   cookedData[sortableData[index-1][10]] = old_potato
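
The decorate-then-sort step at the top of this idea can be checked on its own. The sketch below uses hypothetical rows (the make_row helper and the tag values are made up; only the column positions match the original code) to show that each shared value ends up contiguous, with the alphabetically earlier type first, and that the old column 8 moves to position 10:

```python
# Hypothetical helper: column 8 = item tag, 9 = shared value, 12 = type.
def make_row(tag, shared, kind):
    row = [''] * 13
    row[8], row[9], row[12] = tag, shared, kind
    return row

rawData = [
    make_row('p1', 'A1', 'PRT'),
    make_row('k1', 'B2', 'KBD'),
    make_row('d1', 'A1', 'DPLX'),
    make_row('l1', 'B2', 'LAP'),
]

# Decorate: copy the major (shared value) and minor (type) keys to the
# front of each row, then sort on them.
sortableData = [[x[9], x[12]] + x for x in rawData]
sortableData.sort()

# The first two elements of each decorated row, in sorted order.
order = [(row[0], row[1]) for row in sortableData]
print(order)
```

Because two elements were added at the front, what was column 8 in rawData is read as index 10 in the loop above.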

Bob Gailer
PLEASE NOTE NEW EMAIL ADDRESS bgailer@alum.rpi.edu
303 442 2625
