[Tutor] merging 2 files.
Martin A. Brown
martin at linux-ip.net
Thu Feb 24 11:41:50 CET 2011
Hi Nitin,
: currently the data in both the file is 6 - 10,000 rows max.
Many ways to skin this cat. You say that the files are 6-10,000
lines. These are small files. Load them into memory. Learn how to
use csv.reader.
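As a minimal sketch of "load them into memory", assuming Python 3 and a hypothetical helper name load_rows (not something from the original post), reading a whole CSV file into a list might look like this:

```python
import csv

def load_rows(path):
    """Read every row of a CSV file into a list of lists of strings."""
    # newline='' is what the csv module documentation recommends
    # when handing a file object to csv.reader.
    with open(path, newline='') as handle:
        return list(csv.reader(handle))
```

Each row comes back as a list of strings, so rows[0][0] would be the first column of the first line. At a few thousand rows the whole list fits comfortably in memory.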
: PROBLEM : I need to pick the "first coloum" from test.csv AND
: SEARCH in jhun.csv "second coloum" , IF matches read that row
: from jhun.csv, break it into individual values , concat with the
: first file, test.csv, individual values and write to a third
: file, eg. merged2.csv
Always break your problem into its parts and examine your data.
There's probably a data structure that suits your needs. You have a
lookup table, 'jhun.csv' (your second file). Given your problem
description, it seems like the first column in 'jhun.csv' has your
unique identifiers.
If that's accurate, then read that second file into some sort of
in-memory lookup table. A key, perhaps in a dictionary, would you
say?
Then, you can simply read your other file (test.csv) and print to
output. This is one quick and dirty solution:
  import csv

  # -- build the lookup table, keyed on the first column of jhun.csv
  #
  lookup = dict()
  file0 = csv.reader(open('jhun.csv','r'))
  for row in file0:
      lookup[ row[0] ] = row

  # -- now, read through the other file (test.csv)
  #
  file1 = csv.reader(open('test.csv','r'))
  for row in file1:
      exists = lookup.get( row[0], None )
      if exists:
          print(row, exists)  # -- print out only what you want
      else:
          pass  # -- do you need to do something if no lookup entry?
At 10^4 lines in the lookup file, you could easily do this in
memory.
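The quoted problem statement asks for a match against the second column of jhun.csv and for the result to be written to merged2.csv, whereas the quick example above keys on the first column and only prints. Here is a hedged sketch of that variant, assuming Python 3; the function names merge_rows and merge_files and the key_in_test / key_in_jhun parameters are made up for illustration, and the column positions should be checked against the real data:

```python
import csv

def merge_rows(test_rows, jhun_rows, key_in_test=0, key_in_jhun=1):
    """Yield each test row concatenated with the jhun row sharing its key."""
    # Build the lookup table, keyed on the chosen column of jhun.csv
    # (index 1 is the second column; this is an assumption about the data).
    lookup = {}
    for row in jhun_rows:
        if len(row) > key_in_jhun:       # skip blank or short lines
            lookup[row[key_in_jhun]] = row

    # Walk the other file and emit only the rows that have a match.
    for row in test_rows:
        if len(row) <= key_in_test:
            continue
        match = lookup.get(row[key_in_test])
        if match is not None:
            yield row + match

def merge_files(test_path='test.csv', jhun_path='jhun.csv',
                out_path='merged2.csv'):
    """Write the merged rows to out_path using csv.writer."""
    with open(test_path, newline='') as f_test, \
         open(jhun_path, newline='') as f_jhun, \
         open(out_path, 'w', newline='') as f_out:
        merged = merge_rows(csv.reader(f_test), csv.reader(f_jhun))
        csv.writer(f_out).writerows(merged)
```

Keeping the matching logic in merge_rows, separate from the file handling, makes it easy to try on a few hand-written rows before pointing it at the real files.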
There are many tools for dealing with structured data, even loosely
structured data such as csv. When faced with a problem like this in
the future, ask yourself not only what tools, like csv.reader, you
have at your disposal, but also what data structures suit the
questions you are asking of your data.
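One example of choosing the structure to fit the question: a plain dictionary keeps only the last row seen for each key, which is fine if the identifiers really are unique. If they might repeat, a dictionary of lists keeps every matching row. A small sketch, assuming Python 3, with build_multi_lookup and key_column as hypothetical names:

```python
from collections import defaultdict

def build_multi_lookup(rows, key_column=0):
    """Group rows by the value in key_column, keeping every duplicate."""
    lookup = defaultdict(list)
    for row in rows:
        if len(row) > key_column:        # skip blank or short lines
            lookup[row[key_column]].append(row)
    return lookup
```

Using lookup.get(key, []) to query it returns an empty list for unknown keys without adding them to the table.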
: I am in need of the solution as client breathing down my neck.
They always do. Wear a scarf.
-Martin
--
Martin A. Brown
http://linux-ip.net/