Compare list entry from csv files

Dave Angel d at davea.name
Mon Nov 26 17:08:37 EST 2012


On 11/26/2012 04:08 PM, Anatoli Hristov wrote:
> Hello,
>
> I'm trying to complete a namebook CSV file with missing phone numbers
> which are in another CSV file.
> the namebook file is structured:
> First name;Lastname; Address; City; Country; Phone number, where the
> phone number is missing.
>
> The phonebook file is structured as:
> Name; phone, where the name shows first and last name and sometimes
> they are written together like "BillGates" or "Billgatesmicrosoft".
>
> I'm importing the files as lists, e.g. phonelist = [["First name", "Last
> name", "address", "City", "Country", "phone"], [etc...]]
> in the loop I can compare the entry for ex. "Bill Gates" in the field
> "BillGatesmicrosoft" but I can't index it so I can only take the phone
> number from the file with the phones and insert it to field in the
> Namebook. Can you please give me an advice?
>
> Thanks
>
>
> import csv
>
> origf = open('c:/Working/Test_phonebook.csv', 'rt')
> phonelist = []
>
> try:
>     reader = csv.reader(origf, delimiter=';')
>     for row in reader:
>         phonelist.append(row)
> finally:
>     origf.close()
>
> secfile = open('c:/Working/phones.csv', 'rt')
> phones = []
>
> try:
>     readersec = csv.reader(secfile, delimiter=';')
>     for row in readersec:
>         phones.append(row)
> finally:
>     secfile.close()

You're trying to merge information from a second file into a first one,
where the shared key is only a little bit similar.  Good luck.

For example, in the first file it might say  Susan; Gatley  and in the
other file it might say Mom.   Good luck coming up with an algorithm to
match those.

Now if you are assured that the two will be identical except for spaces,
then you could reduce both keys to the same format and then match them.
Or you could decide that two names match when they're within a Soundex
definition of each other.  Or you could claim that they'll have the same
words in them, but not necessarily in the same order.
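
Here is a rough sketch of the first and third ideas, assuming the names
really do differ only in spacing, case, or word order.  The helper names
normalize_key and word_set are made up for illustration, and Soundex is
left out because it isn't in the standard library.

```python
def normalize_key(name):
    """Reduce a name to a comparable form: lowercase, all whitespace removed."""
    return "".join(name.split()).lower()


def word_set(name):
    """Return the lowercased words of a name, ignoring their order."""
    return frozenset(name.lower().split())


# Identical except for spaces and case.
same_modulo_spaces = normalize_key("Bill Gates") == normalize_key("BillGates")

# Same words, different order.
same_words = word_set("Gates Bill") == word_set("Bill Gates")

# A trailing company name needs a looser prefix test, which is an extra
# assumption and can produce false matches on short names.
prefix_match = normalize_key("Billgatesmicrosoft").startswith(
    normalize_key("Bill Gates"))

print(same_modulo_spaces, same_words, prefix_match)
```

None of these will ever pair Susan; Gatley with Mom, which is why the
rules you pick have to come from what you know about the data.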

But if these files are really as randomly connected as you say, then the
best you can probably do is to write two programs.  The first takes
the names from each file and produces a 3rd file associating the
ones that are obvious (according to some algorithm), plus a list
of exceptions.  Then allow a human being to edit that file.  Then the
second program uses it to merge the first two files for your final pass.
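
A minimal sketch of that two-program split, working on in-memory rows
rather than real files.  The function names, the sample rows, and the
"ignore spaces and case" matching rule are all assumptions for
illustration.  It also assumes the namebook rows have the phone in the
sixth column and the phone rows are name, phone.

```python
def normalize_key(name):
    """Assumed matching rule: ignore case and whitespace."""
    return "".join(name.split()).lower()


def build_associations(namebook, phones):
    """First program: pair the obvious matches, list everything else."""
    by_key = {normalize_key(row[0]): row[0] for row in phones}
    matched, exceptions = [], []
    for row in namebook:
        full_name = row[0] + " " + row[1]
        phone_name = by_key.get(normalize_key(full_name))
        if phone_name is None:
            exceptions.append(full_name)      # left for a human to resolve
        else:
            matched.append((full_name, phone_name))
    return matched, exceptions


def merge(namebook, phones, associations):
    """Second program: fill in phones using the (human-reviewed) pairs."""
    phone_of = {row[0]: row[1] for row in phones}
    name_map = dict(associations)
    merged = []
    for row in namebook:
        new_row = list(row) + [""] * (6 - len(row))   # pad short rows
        phone_name = name_map.get(row[0] + " " + row[1])
        if phone_name in phone_of:
            new_row[5] = phone_of[phone_name]
        merged.append(new_row)
    return merged


# Made-up sample data in the layout described in the question.
namebook = [["Bill", "Gates", "1 Main St", "Redmond", "USA", ""],
            ["Susan", "Gatley", "2 Elm St", "Boston", "USA", ""]]
phones = [["BillGates", "555-0100"], ["Mom", "555-0199"]]

matched, exceptions = build_associations(namebook, phones)
# A person looks at the exceptions and adds the pairs only they can know.
reviewed = matched + [("Susan Gatley", "Mom")]
result = merge(namebook, phones, reviewed)
```

In practice the associations would be written out with csv.writer so
they can be edited by hand, then read back in by the second program.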


-- 

DaveA



