[Tutor] Matching 3 tables with almost identical columns
Steven D'Aprano
steve at pearwood.info
Tue Aug 14 18:53:18 CEST 2012
Hi Elaina,
My comments interleaved with yours. If you read nothing else, scroll down and read my last paragraph.
Good luck!
On 13/08/12 17:24, Elaina Ann Hyde wrote:
> Greetings,
> I have a rather monolithic database I am re-building at the moment, and
> I want to join 3 tables with semi-matching content, I have several sets of
> these but they all come in groups of three. Here is the situation:
>
> --note all tables are in ascii format, space deliminated, with readable
> #Header format---
Does this mean that the data is in a great big flat text file?
> T1_01= Table 1 =
> 1 + 'stuff1' +additional content (only on some lines)
> 2 ""
> 3 +'stuff1'+ a bunch of empty-ness
> ....400
So this is *one* table, correct? And you want to grab three tables at a time?
I don't follow which parts are part of the table format and which are your comments to us. This is my *guess* as to the format:
Each table starts with a single header line, e.g.:
T1_01= Table 1 =
followed by a maximum of 400 records, numbered from 1 to 400. Each record is a single line. Empty records look like this:
23 ""
(that is, the record number followed by an empty double-quoted string). Non-empty records look like this:
42 + 'content'
(that is, the record number followed by a plus-sign followed by a single-quoted string with the data). Some non-empty records have free form text after the field:
42 + 'content' +free form text goes here
or "a bunch of empty-ness" (whatever that means???)
42 + 'content' + [what goes here???]
Am I correct?
What do you mean by "a bunch of empty-ness"?
> T1_03 = Table 3 =
> 5 cols yet other stuff + 001 + stuff
> 5 cols yet other stuff + 003 ""
> 5 cols yet other stuff + 007 ""
> ...
> 5 cols yet other stuff + 400 some rows are skipped, varies which ones
Okay, now I'm completely lost. I can't begin to decipher that. Why does every record start with the same record number "5"? What are the 001, 003, etc. parts?
> what I want, for each 'group' I have 3 tables, since these are grouped in a
> handy fashion, ie T1_01, T1_02, T1_03 would be table 1,2,3 for group 1, and
> again for T2_01,T2_02,T2_03. I need to do this about 60 times in total and
> the table output I am hoping for is:
>
>
> T1_0123=
>
> 1 + 'stuff1' + additional content 1 1 + "different stuff" + 5 cols yet
> other stuff + 001 + additional content 3
> 2 + 'stuff1' + additional content 1 2 + "different stuff" + "something to
> fill in the empty spaces, like a set of -99.9 values"
> 3 + 'stuff1' + "something to fill in empty spaces as no additional content
> available for this line" 1 3 + "different stuff" + additional content 2 5
> cols yet other stuff + 003 + additional content 3
> ...
> 400 ""
Can you simplify your explanation? As it stands, it is just a big ball of mud. Please give a SHORT, SIMPLE example of the data you have to work with, using SIMPLE and DISTINCT values as placeholders.
E.g.:
T1_01= Table 1 =
1 + 'a'
2 + 'b' + foo
3 + 'c' + bar
T1_02= Table 2 =
1 + 'A'
2 + 'B' + baz
3 + 'C'
T1_03= Table 3 =
1 + 'x'
2 + 'y'
3 + 'z'
And the merged table:
T1_0123= Table 123 =
1 + 'a' 'A' 'x'
2 + 'b' + foo 'B' + baz 'y'
3 + 'c' + bar 'C' 'z'
Am I close? What bits am I missing?
> now I was hoping to run all 3 tables in a go then do something like a batch
> job on all the sets.
> just a note:
> since all my tables are ascii tables with headers, for my first 'group' I
> have
> # T1_1A T1_1B....
> # T1_2A T1_2B...
> # T1_3A T1_3B...
And I don't understand this either. Are the hash marks part of the file you are working with?
> ------------------------
> x1=open(sys.argv[1])
> dat1=asciitable.read(x1,Reader=asciitable.CommentedHeader,
> fill_values=[('','-99.9'),('...','-99.9')])
> ------------------------
What is asciitable?
If that is not a standard Python library, it is very likely that nobody here will have used it before. You might have to contact the author of the library, or ask on a forum specially for asciitable (if there is one). If all else fails, you could try the mailing list python-list at python.org (also available as a newsgroup, comp.lang.python) and see if anyone there has used asciitable.
--
Steven
More information about the Tutor
mailing list