[Tutor] Comparing two CSV files using Python

Mark Lawrence breamoreboy at yahoo.co.uk
Sat Feb 14 02:32:59 CET 2015


On 13/02/2015 01:06, andy van wrote:
> Hi, I'm trying to compare two CSV files (and many more like these below). I
> tried many ways, using lists, dictreader and more but nothing gave me the
> output I require. I want to compare all those rows that have same
> !Sample_title and !Sample_geo_accession values (whose positions vary). I've
> been struggling with this for three days now and couldn't come to a
> solution. I highly appreciate any help.
>
> CSV1:
>
> !Sample_title,!Sample_geo_accession,!Sample_status,!Sample_type,!Sample_source_name_ch1
> body,GSM501443,Public on july 22 2010,ribonucleic acid,FB_50_12wk
> foreign,GSM501445,Public on july 22 2010,ribonucleic acid,FB_0_12wk
> HJCENV,GSM501446,Public on july 22 2010,ribonucleic acid,FB_50_12wk
> AsDW,GSM501444,Public on july 22 2010,ribonucleic acid,FB_0_12wk
>
> CSV2:
>
> !Sample_title,!Sample_type,!Sample_source_name_ch1,!Sample_geo_accession
> AsDW,ribonucleic acid,FB_0,GSM501444
> foreign,ribonucleic acid,FB,GSM501449
> HJCENV,RNA,12wk,GSM501446
>
> Desired output (with respect to CSV2):
>
> Added:
> {!Sample_status:{HJCENV:Public on july 22 2010,AsDW:Public on july 22
> 2010}} #Added columns, not rows.
>
> Deleted:
> {} #Since nothing's deleted with respect to CSV2
>
> Changed:
>
> {!Sample_title:AsDW,!Sample_source_name_ch1:(FB_0_12wk,FB_0),!Sample_geo_accession:GSM501444
> !Sample_title:HJCENV,!Sample_type:(ribonucleic
> acid,RNA),!Sample_source_name_ch1:(FB_50_12wk,12wk),!Sample_geo_accession:GSM501446}
> #foreign,ribonucleic acid,FB,GSM501449 doesn't come here since the
> !Sample_geo_accession column value didn't match.
>
> -AN
>

If you're looking at serious data work then I'd recommend pandas 
http://pandas.pydata.org/
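
As a rough sketch of what that could look like for the two samples quoted
above: read both files as strings, inner-merge on the two key columns so
that only rows matching on both !Sample_title and !Sample_geo_accession
survive, then compare the rest column by column. The compare() helper, the
"_csv1"/"_csv2" suffixes and the shape of the returned dicts are my own
choices for illustration, not anything pandas prescribes, and the inline
strings stand in for your real files.

```python
import io

import pandas as pd

# Sample data copied from the question; real code would pass file paths
# to pd.read_csv instead of io.StringIO objects.
CSV1 = """\
!Sample_title,!Sample_geo_accession,!Sample_status,!Sample_type,!Sample_source_name_ch1
body,GSM501443,Public on july 22 2010,ribonucleic acid,FB_50_12wk
foreign,GSM501445,Public on july 22 2010,ribonucleic acid,FB_0_12wk
HJCENV,GSM501446,Public on july 22 2010,ribonucleic acid,FB_50_12wk
AsDW,GSM501444,Public on july 22 2010,ribonucleic acid,FB_0_12wk
"""

CSV2 = """\
!Sample_title,!Sample_type,!Sample_source_name_ch1,!Sample_geo_accession
AsDW,ribonucleic acid,FB_0,GSM501444
foreign,ribonucleic acid,FB,GSM501449
HJCENV,RNA,12wk,GSM501446
"""

KEYS = ["!Sample_title", "!Sample_geo_accession"]


def compare(df1, df2, keys=KEYS):
    """Hypothetical helper: return (added, deleted, changed) relative to df2."""
    added_cols = [c for c in df1.columns if c not in df2.columns]
    deleted_cols = [c for c in df2.columns if c not in df1.columns]
    shared = [c for c in df1.columns if c in df2.columns and c not in keys]

    # The inner merge keeps only rows whose key columns match in both files,
    # regardless of where those columns sit. Shared non-key columns get a
    # suffix; columns present in only one file keep their plain name.
    merged = df1.merge(df2, on=keys, how="inner", suffixes=("_csv1", "_csv2"))

    # Columns present in only one file, labelled by the first key column.
    added = {c: dict(zip(merged[keys[0]], merged[c])) for c in added_cols}
    deleted = {c: dict(zip(merged[keys[0]], merged[c])) for c in deleted_cols}

    # Per matched row, the shared columns whose values differ: (csv1, csv2).
    changed = {}
    for row in merged.to_dict(orient="records"):
        diffs = {
            c: (row[c + "_csv1"], row[c + "_csv2"])
            for c in shared
            if row[c + "_csv1"] != row[c + "_csv2"]
        }
        if diffs:
            changed[tuple(row[k] for k in keys)] = diffs
    return added, deleted, changed


# dtype=str keeps every cell as text so values compare exactly as written.
df1 = pd.read_csv(io.StringIO(CSV1), dtype=str)
df2 = pd.read_csv(io.StringIO(CSV2), dtype=str)
added, deleted, changed = compare(df1, df2)

print("Added:", added)
print("Deleted:", deleted)
print("Changed:", changed)
```

The "foreign" row drops out of the merge because its accession differs
between the two files, which is the behaviour you asked for. Cells that are
empty in either file would come through as NaN and need extra handling that
this sketch leaves out.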

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


