[Tutor] sql-like join on two lists or dictionaries

street.sweeper at mailworks.org street.sweeper at mailworks.org
Sat Feb 14 02:24:37 CET 2015


Hello all,

Basically what I have here is header and line data for sales or purchase
orders, and I'm trying to do a sql-like join to bring them together
(which ultimately I did because I couldn't figure this out :)).  I've
managed to get the files into python using string slicing, that's not a
problem.

headers - h.dat

B134542    Bob      ZQ775235
B875432    Joe      ZQ987656
B567943    Steve    ZQ256222

lines - l.dat

B134542       112342   0012
B134542       176542   0001
B875732       76542    0003
B567943       654565   0001
B567943       900011   0001

desired result - hl.dat

B134542       112342   0012    Bob    ZQ775235
B134542       176542   0001    Bob    ZQ775235
B875732       76542    0003    Joe    ZQ987656
B567943       654565   0001    Steve  ZQ256222
B567943       900011   0001    Steve  ZQ256222



in python3 on linux:

#!/usr/bin/env python3

import os

basepath=os.path.join(os.path.expanduser('~'),'temp',)
linefile=os.path.join(basepath,'l.dat')
headerfile=os.path.join(basepath,'h.dat')

with open(headerfile) as h, open(linefile) as l:
  lines = l.readlines()
  headers = h.readlines()

llist = [[linedata[0:7],
          linedata[14:23],
          linedata[23:27]] for linedata in lines]

hlist = [[headerdata[0:7],
          headerdata[11:19],
          headerdata[19:28]] for headerdata in headers]

ldict = [{linedata[0:7]:
              [linedata[14:23],
               linedata[23:27]]} for linedata in lines]

hdict = [{headerdata[0:7]:
              [headerdata[11:19],
               headerdata[19:28]]} for headerdata in headers]

# :)

quit()



Details on the data are that it's a one or many lines to one header
relationship, at least one of each will exist in each file, and
performance probably isn't an issue as it will only be a few tens to
about 100 lines maximum in the lines file.  The match string will be the
0:7 slice.

You can probably guess my questions: should I be making lists or
dictionaries out of this data, and then of course, what should I do with
them to arrive at the combined file?  I saw some examples of joining two
two-item lists, or dictionaries with a single string as the value, but I
couldn't seem to adapt them to what I'm doing here. I also ran across
the dict.extend method, but looking at the result, I didn't think that
was going to go anywhere, particularly with the one to many
headers:lines relationship.

After a while I pulled this into a sqlite file in memory and did the
join.  Using writelines I think I'll be able to get it out to a file,
but it seems to me that there's probably a way to do this without
resorting to sql.  Or is there?

Thanks!


More information about the Tutor mailing list