Transforming ascii file (pseudo database) into proper database

Albert van der Horst albert at spenarnc.xs4all.nl
Mon Jan 28 04:50:48 EST 2008


In article <9b6a9a56-2ea6-4dd6-9420-afe9a2fdc8d8 at e32g2000prn.googlegroups.com>,
p. <ppetrick at gmail.com> wrote:
>I need to take a series of ascii files and transform the data
>contained therein so that it can be inserted into an existing
>database. The ascii files are just a series of lines, each line
>containing fields separated by '|' character. Relations amongst the
>data in the various files are denoted through an integer identifier, a
>pseudo key if you will. Unfortunately, the relations in the ascii file
>do not match up with those in the database in which I need to insert
>the data, i.e., I need to transform the data from the files before
>inserting into the database. Now, this would all be relatively simple
>if not for the following fact: The ascii files are each around 800MB,
>so pulling everything into memory and matching up the relations before
>inserting the data into the database is impossible.

In this case good old-fashioned batch processing (line by line)
may be appropriate.
Read up on tools like sort and join.

These tools are present on all Unix-like systems, and on Windows
in open-source toolkits.
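
If you would rather stay inside Python, the idea behind sort can be
sketched as below: sort the file in chunks that fit in memory, write
each chunk to a temporary file, then merge the sorted chunks line by
line. This is only a minimal sketch under assumptions of mine: the
names external_sort and record_key are made up for illustration, the
integer identifier is assumed to be the first '|'-separated field, and
every line is assumed to be non-empty and to carry such an identifier.

```python
import heapq
import itertools
import os
import tempfile


def record_key(line, key_field=0):
    """Integer identifier of a '|'-separated line (field position is an assumption)."""
    return int(line.split("|")[key_field])


def external_sort(in_path, out_path, key_field=0, chunk_lines=1_000_000):
    """Sort a large text file by its integer key while holding one chunk in memory."""
    def key(line):
        return record_key(line, key_field)

    chunk_paths = []
    with open(in_path) as src:
        while True:
            # Read at most chunk_lines lines; an empty list means end of file.
            chunk = list(itertools.islice(src, chunk_lines))
            if not chunk:
                break
            # Make sure a final line without a newline cannot fuse with its neighbour.
            chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
            chunk.sort(key=key)
            fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
            with os.fdopen(fd, "w") as tmp:
                tmp.writelines(chunk)
            chunk_paths.append(path)

    files = [open(p) for p in chunk_paths]
    try:
        with open(out_path, "w") as dst:
            # heapq.merge is lazy: it keeps only one pending line per chunk file.
            dst.writelines(heapq.merge(*files, key=key))
    finally:
        for f in files:
            f.close()
        for p in chunk_paths:
            os.remove(p)
```

Something like external_sort("a.txt", "a.sorted.txt") would then give
a file that can be processed strictly front to back. Pick chunk_lines
to suit the memory you have; the value above is a guess, not a
recommendation.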

>
>My questions are:
>1. Has anyone done anything like this before, and if so, do you have
>any advice?

Puzzling question. Computers weren't invented for GUIs. They were
invented for precisely this kind of thing. So, yes, it is a sure bet.

>2. In the abstract, can anyone think of a way of amassing all the
>related data for a specific identifier from all the individual files
>without pulling all of the files into memory and without having to
>repeatedly open, search, and close the files over and over again?

As long as you don't use Excel; it is not up to it ;-)
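
To make that concrete for question 2: once every file is sorted on the
identifier, all files can be read in parallel, each opened once and
read front to back, and the lines sharing an identifier come out
together. A minimal sketch, assuming each file is already sorted
numerically on its first '|'-separated field; the function names and
the dict of labels to paths are hypothetical, chosen only for this
example.

```python
import heapq
from itertools import groupby


def tagged_records(path, source, key_field=0):
    """Yield (key, source, fields) for each line of one pre-sorted file."""
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split("|")
            yield int(fields[key_field]), source, fields


def records_by_id(paths, key_field=0):
    """Yield (key, {source: [fields, ...]}) for one identifier at a time.

    paths maps a label of your choosing to the path of a file that is
    already sorted numerically on the key field.
    """
    streams = [tagged_records(p, name, key_field) for name, p in paths.items()]
    # Lazy k-way merge: only one pending record per file is held in memory.
    merged = heapq.merge(*streams, key=lambda rec: rec[0])
    for key, group in groupby(merged, key=lambda rec: rec[0]):
        related = {}
        for _, source, fields in group:
            related.setdefault(source, []).append(fields)
        yield key, related
```

Looping over records_by_id({"people": "people.sorted", "orders":
"orders.sorted"}) (made-up file names) hands you one identifier and
its related rows per iteration, which is the point where the rows can
be reshaped and inserted into the real database. Memory use is bounded
by the largest group for a single identifier, not by the file size.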

Groetjes Albert

-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- like all pyramid schemes -- ultimately falters.
albert at spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst


