[Tutor] sql-like join on two lists or dictionaries
street.sweeper at mailworks.org
street.sweeper at mailworks.org
Sat Feb 14 02:24:37 CET 2015
Hello all,
Basically what I have here is header and line data for sales or purchase
orders, and I'm trying to do a sql-like join to bring them together
(which ultimately I did because I couldn't figure this out :)). I've
managed to get the files into python using string slicing, that's not a
problem.
headers - h.dat
B134542 Bob ZQ775235
B875432 Joe ZQ987656
B567943 Steve ZQ256222
lines - l.dat
B134542 112342 0012
B134542 176542 0001
B875732 76542 0003
B567943 654565 0001
B567943 900011 0001
desired result - hl.dat
B134542 112342 0012 Bob ZQ775235
B134542 176542 0001 Bob ZQ775235
B875732 76542 0003 Joe ZQ987656
B567943 654565 0001 Steve ZQ256222
B567943 900011 0001 Steve ZQ256222
in python3 on linux:
#!/usr/bin/env python3
import os
basepath=os.path.join(os.path.expanduser('~'),'temp',)
linefile=os.path.join(basepath,'l.dat')
headerfile=os.path.join(basepath,'h.dat')
with open(headerfile) as h, open(linefile) as l:
lines = l.readlines()
headers = h.readlines()
llist = [[linedata[0:7],
linedata[14:23],
linedata[23:27]] for linedata in lines]
hlist = [[headerdata[0:7],
headerdata[11:19],
headerdata[19:28]] for headerdata in headers]
ldict = [{linedata[0:7]:
[linedata[14:23],
linedata[23:27]]} for linedata in lines]
hdict = [{headerdata[0:7]:
[headerdata[11:19],
headerdata[19:28]]} for headerdata in headers]
# :)
quit()
Details on the data are that it's a one or many lines to one header
relationship, at least one of each will exist in each file, and
performance probably isn't an issue as it will only be a few tens to
about 100 lines maximum in the lines file. The match string will be the
0:7 slice.
You can probably guess my questions: should I be making lists or
dictionaries out of this data, and then of course, what should I do with
them to arrive at the combined file? I saw some examples of joining two
two-item lists, or dictionaries with a single string as the value, but I
couldn't seem to adapt them to what I'm doing here. I also ran across
the dict.extend method, but looking at the result, I didn't think that
was going to go anywhere, particularly with the one to many
headers:lines relationship.
After a while I pulled this into a sqlite file in memory and did the
join. Using writelines I think I'll be able to get it out to a file,
but it seems to me that there's probably a way to do this without
resorting to sql. Or is there?
Thanks!
More information about the Tutor
mailing list