[Tutor] Matching 3 tables with almost identical columns

Steven D'Aprano steve at pearwood.info
Tue Aug 14 18:53:18 CEST 2012


Hi Elaina,

My comments interleaved with yours. If you read nothing else, scroll down and read my last paragraph.

Good luck!



On 13/08/12 17:24, Elaina Ann Hyde wrote:
> Greetings,
>     I have a rather monolithic database I am re-building at the moment, and
> I want to join 3 tables with semi-matching content, I have several sets of
> these but they all come in groups of three.  Here is the situation:
>
> --note all tables are in ascii format, space deliminated, with readable
> #Header format---


Does this mean that the data is in a great big flat text file?


> T1_01= Table 1 =
> 1 + 'stuff1' +additional content (only on some lines)
> 2  ""
> 3  +'stuff1'+ a bunch of empty-ness
> ....400

So this is *one* table, correct? And you want to grab three tables at a time?

I don't follow which parts are part of the table format and which are your comments to us. This is my *guess* as to the format:

Each table starts with a single header line, e.g.:

     T1_01= Table 1 =

followed by a maximum of 400 records, numbered from 1 to 400. Each record is a single line. Empty records look like this:

     23 ""

(that is, the record number followed by an empty double-quoted string). Non-empty records look like this:

     42 + 'content'

(that is, the record number followed by a plus-sign followed by a single-quoted string with the data). Some non-empty records have free form text after the field:

     42 + 'content' +free form text goes here

or "a bunch of empty-ness" (whatever that means???)

     42 + 'content' + [what goes here???]



Am I correct?

What do you mean by "a bunch of empty-ness"?



> T1_03 = Table 3 =
> 5 cols yet other stuff + 001 + stuff
> 5 cols yet other stuff + 003    ""
> 5 cols yet other stuff + 007    ""
> ...
> 5 cols yet other stuff + 400   some rows are skipped, varies which ones

Okay, now I'm completely lost. I can't begin to decipher that. Why does every record start with the same record number "5"? What are the 001, 003, etc. parts?



> what I want, for each 'group' I have 3 tables, since these are grouped in a
> handy fashion, ie T1_01, T1_02, T1_03 would be table 1,2,3 for group 1, and
> again for T2_01,T2_02,T2_03.  I need to do this about 60 times in total and
> the table output I am hoping for is:
>
>
> T1_0123=
>
> 1 + 'stuff1' + additional content 1 1 + "different stuff" + 5 cols yet
> other stuff + 001 + additional content 3
> 2 + 'stuff1' + additional content 1 2 + "different stuff" +  "something to
> fill in the empty spaces, like a set of -99.9 values"
> 3 + 'stuff1' + "something to fill in empty spaces as no additional content
> available for this line" 1 3 + "different stuff" + additional content 2 5
> cols yet other stuff + 003 + additional content 3
> ...
> 400 ""


Can you simplify your explanation? As it stands, it is just a big ball of mud. Please give a SHORT, SIMPLE example of the data you have to work with, using SIMPLE and DISTINCT values as placeholders.

E.g.:

T1_01= Table 1 =
1 + 'a'
2 + 'b' + foo
3 + 'c' + bar

T1_02= Table 2 =
1 + 'A'
2 + 'B' + baz
3 + 'C'

T1_03= Table 3 =
1 + 'x'
2 + 'y'
3 + 'z'


And the merged table:

T1_0123= Table 123 =
1 + 'a' 'A' 'x'
2 + 'b' + foo 'B' + baz 'y'
3 + 'c' + bar 'C' 'z'


Am I close? What bits am I missing?


> now I was hoping to run all 3 tables in a go then do something like a batch
> job on all the sets.
> just a note:
> since all my tables are ascii tables with headers, for my first 'group' I
> have
> # T1_1A T1_1B....
> # T1_2A T1_2B...
> # T1_3A T1_3B...


And I don't understand this either. Are the hash marks part of the file you are working with?


> ------------------------
> x1=open(sys.argv[1])
> dat1=asciitable.read(x1,Reader=asciitable.CommentedHeader,
> fill_values=[('','-99.9'),('...','-99.9')])
>   ------------------------

What is asciitable?

If that is not a standard Python library, it is very likely that nobody here will have used it before. You might have to contact the author of the library, or ask on a forum specially for asciitable (if there is one). If all else fails, you could try the mailing list python-list at python.org (also available as a newsgroup, comp.lang.python) and see if anyone there has used asciitable.



-- 
Steven


More information about the Tutor mailing list