Transforming ascii file (pseudo database) into proper database
bdesth.quelquechose at free.quelquepart.fr
Mon Jan 21 23:15:40 CET 2008
p. wrote:
> I need to take a series of ascii files and transform the data
> contained therein so that it can be inserted into an existing
> database. The ascii files are just a series of lines, each line
> containing fields separated by '|' character. Relations amongst the
> data in the various files are denoted through an integer identifier, a
> pseudo key if you will. Unfortunately, the relations in the ascii file
> do not match up with those in the database in which I need to insert
> the data, i.e., I need to transform the data from the files before
> inserting into the database. Now, this would all be relatively simple
> if not for the following fact: The ascii files are each around 800MB,
> so pulling everything into memory and matching up the relations before
> inserting the data into the database is impossible.
> My questions are:
> 1. Has anyone done anything like this before,
More than once, yes.
> and if so, do you have
> any advice?
1/ use the csv module (with delimiter='|') to parse your text files
2/ use a temporary database (whose schema mimics that of the flat
files), so you can work with the appropriate tools, i.e. the RDBMS will
take care of disk/memory management, and you'll have a specialized,
high-level language (namely, SQL) to reassemble your data the right way.
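A minimal sketch of those two steps, using only the standard csv and sqlite3 modules. The file layout, table names and columns (customers/orders, file_id, customer_file_id, amount) are hypothetical stand-ins for the real flat files, and an on-disk SQLite file is just one possible choice of temporary database:

```python
import csv
import sqlite3


def load_pipe_file(conn, path, table, columns):
    """Stream a '|'-delimited file into a staging table without reading it all into memory."""
    # table/columns come from our own code, not user input, so formatting them in is acceptable
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="|")
        # executemany pulls rows from the generator one at a time; blank lines are skipped
        conn.executemany(sql, (row for row in reader if row))
    conn.commit()


def build_staging_db(db_path, customers_path, orders_path):
    """Create a hypothetical staging schema that mirrors the flat files, then load them."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE customers (file_id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (file_id INTEGER, customer_file_id INTEGER, amount TEXT)"
    )
    load_pipe_file(conn, customers_path, "customers", ["file_id", "name"])
    load_pipe_file(conn, orders_path, "orders", ["file_id", "customer_file_id", "amount"])
    # Index the pseudo keys after the bulk load so joins on them are cheap
    conn.execute("CREATE INDEX idx_customers_id ON customers (file_id)")
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_file_id)")
    conn.commit()
    return conn


def joined_rows(conn):
    """Let SQL reassemble the relations; the cursor yields rows lazily."""
    return conn.execute(
        "SELECT c.name, o.amount "
        "FROM orders AS o JOIN customers AS c ON c.file_id = o.customer_file_id "
        "ORDER BY o.file_id"
    )
```

Building the indexes after the bulk load rather than before is usually faster. The final SELECT is where the file-level pseudo keys would be mapped onto whatever the target database expects, and its rows can be fed straight into the real INSERTs.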
> 2. In the abstract, can anyone think of a way of amassing all the
> related data for a specific identifier from all the individual files
> without pulling all of the files into memory and without having to
> repeatedly open, search, and close the files over and over again?
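Once the files sit in a staging database as suggested above, amassing everything for one identifier no longer means rescanning files. One way to sketch it (an assumption, not the only approach) is a single ordered join consumed with itertools.groupby, so only one identifier's rows are held in memory at a time. The tables and columns are the same hypothetical ones as before:

```python
import itertools
import sqlite3


def grouped_by_key(conn):
    """Yield (customer_file_id, name, [order amounts]) one identifier at a time.

    Assumes hypothetical staging tables customers(file_id, name) and
    orders(file_id, customer_file_id, amount). ORDER BY makes rows for the
    same key adjacent, so groupby can stream them group by group.
    """
    cursor = conn.execute(
        "SELECT c.file_id, c.name, o.amount "
        "FROM customers AS c LEFT JOIN orders AS o ON o.customer_file_id = c.file_id "
        "ORDER BY c.file_id, o.file_id"
    )
    for (key, name), rows in itertools.groupby(cursor, key=lambda r: (r[0], r[1])):
        # LEFT JOIN yields a NULL amount for customers without orders; drop it
        amounts = [amount for _, _, amount in rows if amount is not None]
        yield key, name, amounts
```

With an index on orders.customer_file_id, a single identifier can likewise be fetched with a plain `WHERE customer_file_id = ?` query.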