[Tutor] Matching 3 tables with almost identical columns

Elaina Ann Hyde elainahyde at gmail.com
Mon Aug 13 09:24:16 CEST 2012


Greetings,
   I have a rather monolithic database I am re-building at the moment, and
I want to join 3 tables with semi-matching content, I have several sets of
these but they all come in groups of three.  Here is the situation:

--note all tables are in ascii format, space deliminated, with readable
#Header format---

T1_01= Table 1 =
1 + 'stuff1' +additional content (only on some lines)
2  ""
3  +'stuff1'+ a bunch of empty-ness
....400

T1_02= Table 2 =
1 + "different stuff"
2 ""
3 ""
... 400

T1_03 = Table 3 =
5 cols yet other stuff + 001 + stuff
5 cols yet other stuff + 003    ""
5 cols yet other stuff + 007    ""
...
5 cols yet other stuff + 400   some rows are skipped, varies which ones

what I want, for each 'group' I have 3 tables, since these are grouped in a
handy fashion, ie T1_01, T1_02, T1_03 would be table 1,2,3 for group 1, and
again for T2_01,T2_02,T2_03.  I need to do this about 60 times in total and
the table output I am hoping for is:


T1_0123=

1 + 'stuff1' + additional content 1 1 + "different stuff" + 5 cols yet
other stuff + 001 + additional content 3
2 + 'stuff1' + additional content 1 2 + "different stuff" +  "something to
fill in the empty spaces, like a set of -99.9 values"
3 + 'stuff1' + "something to fill in empty spaces as no additional content
available for this line" 1 3 + "different stuff" + additional content 2 5
cols yet other stuff + 003 + additional content 3
...
400 ""

now I was hoping to run all 3 tables in a go then do something like a batch
job on all the sets.
just a note:
since all my tables are ascii tables with headers, for my first 'group' I
have
# T1_1A T1_1B....
# T1_2A T1_2B...
# T1_3A T1_3B...

------------------------
x1=open(sys.argv[1])
dat1=asciitable.read(x1,Reader=asciitable.CommentedHeader,
fill_values=[('','-99.9'),('...','-99.9')])
 ------------------------


The first big problem is:
I cannot read file T1_01  with asciitable, I thought that fill_values would
take care of the empty space but I get the following error
------------------------------------------------
Traceback (most recent call last):
  File "RunAElist.py", line 25, in <module>
    dat1=asciitable.read(x1,Reader=asciitable.CommentedHeader,
fill_values=[('','-99.9'),('...','-99.9')])
  File
"/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/asciitable-0.8.0-py2.7.egg/asciitable/ui.py",
line 131, in read
    dat = _guess(table, new_kwargs)
  File
"/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/asciitable-0.8.0-py2.7.egg/asciitable/ui.py",
line 205, in _guess
    raise core.InconsistentTableError('\n'.join(lines))
asciitable.core.InconsistentTableError:
ERROR: Unable to guess table for with the guesses listed below:
Reader:CommentedHeader fill_values: [('', '-99.9'), ('...', '-99.9')]
Reader:CommentedHeader delimiter: '|' fill_values: [('', '-99.9'), ('...',
'-99.9')] quotechar: '"'
Reader:CommentedHeader delimiter: '|' fill_values: [('', '-99.9'), ('...',
'-99.9')] quotechar: "'"
Reader:CommentedHeader delimiter: ',' fill_values: [('', '-99.9'), ('...',
'-99.9')] quotechar: '"'
Reader:CommentedHeader delimiter: ',' fill_values: [('', '-99.9'), ('...',
'-99.9')] quotechar: "'"
Reader:CommentedHeader delimiter: ' ' fill_values: [('', '-99.9'), ('...',
'-99.9')] quotechar: '"'
Reader:CommentedHeader delimiter: ' ' fill_values: [('', '-99.9'), ('...',
'-99.9')] quotechar: "'"
Reader:CommentedHeader delimiter: '\\s' fill_values: [('', '-99.9'),
('...', '-99.9')] quotechar: '"'
Reader:CommentedHeader delimiter: '\\s' fill_values: [('', '-99.9'),
('...', '-99.9')] quotechar: "'"
Reader:CommentedHeader fill_values: [('', '-99.9'), ('...', '-99.9')]
ERROR: Unable to guess table for with the guesses listed above.
Check the table and try with guess=False and appropriate arguments to read()
---------------------------------

However, I CAN read file T1_02 and T1_03, if I do this and ignore T01, I
can make a joined file, but the problem comes in writing the empty spaces
as 'something'.  There will be several T1_02 with no match to T1_03, and
this is still without being able to read T1_01.  In theory if I could read
T1_01, since T1_01 and T1_02 will be the same length, I just declare
'fiber' to be the matching variable 1,2,3...400 as shown in the table above
and what I am thinking is a variation on:

fopen=open('Megalith1.list','w')
for i in xrange(len(varT1_02)):
    if fiber1[i] == fiber2[i]:
        for j in xrange(len(varT1_03)):
            if fiber2[i] == fiber3[j]:
                fopen.write("   ".join([str(k) for k in fiber1])+"   "+"
".join([str(k) for k in fiber2])+"   "+"   ".join([str(k) for k in
fiber3])+"\n")
            else:
                fopen.write("   ".join([str(k) for k in fiber1])+"   "+"
".join([str(k) for k in fiber2])+"\n")

However, I don't resolve the problem of the empty spaces, and I still can't
find a good way to read my T1_01... and I'm additionally not sure how to
make a batch job out of all this table goodness.
Any suggestions are welcome.  Thanks in advance!
~Elaina
-- 
PhD Candidate
Department of Physics and Astronomy
Faculty of Science
Macquarie University
North Ryde, NSW 2109, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20120813/4edca9c1/attachment-0001.html>


More information about the Tutor mailing list