Help me with the script? How to find items in csv file A and not in file B and vice versa

Tue Jun 18 05:01:49 EDT 2013

Hi Peter,
First off - many (many!) thanks.

There's some error I don't understand.
Here's the amended script I used:

import csv

#open CSV's and read first column with product IDs into variables pointing to lists
with open("Afile.csv", "rb") as f: 
    a = {row[0] for row in csv.reader(f)}
with open("Bfile.csv", "rb") as g: 
    b = {row[0] for row in csv.reader(g)} 

#create variables pointing to lists with unique product IDs in A and B respectively 
in_a_not_b = a-b 
in_b_not_a = b-a 

print in_a_not_b
print in_b_not_a

with open("inAnotB.csv", "wb") as f: 
    writer = csv.writer(f) 
    writer.writerows([item] for item in_a_not_b)

with open("inBnotA.csv", "wb") as g:
    writer = csv.writer(g) 
    writer.writerows([item] for item in_b_not_a)

print "done!" 

When I run it I get an "invalid syntax" error, and (as a true newbie I used a GUI) in_a_not_b is highlighted in the
with open("inAnotB.csv", "wb") as f: 
    writer = csv.writer(f) 
    writer.writerows([item] for item in_a_not_b)

part.

Could you please point out what I'm doing wrong?

Thanks again :)

On Tuesday, June 18, 2013 11:39:41 AM UTC+3, Peter Otten wrote:
> Alan Newbie wrote:
> 
> > Hello,
> > Let's say I want to compare two csv files: file A and file B. They are
> > both similarly built - the first column has product IDs (one product per
> > row) and the columns provide some stats about the products such as sales
> > in # and $.
> > 
> > I want to compare these files - see which product IDs appear in the first
> > column of file A and not in B, and which in B and not A. Finally, it would
> > be very great if the result could be written into two new CSV files - one
> > product ID per row in the first column. (no other data in the other
> > columns needed)
> > 
> > This is the script I tried:
> > ==========================
> > 
> > import csv
> > 
> > #open CSV's and read first column with product IDs into variables pointing
> > #to lists
> > A = [line.split(',')[0] for line in open('Afile.csv')]
> > B = [line.split(',')[0] for line in open('Bfile.csv')]
> > 
> > #create variables pointing to lists with unique product IDs in A and B
> > #respectively
> > inAnotB = list(set(A)-set(B))
> > inBnotA = list(set(B)-set(A))
> > 
> > print inAnotB
> > print inBnotA
> > 
> > c = csv.writer(open("inAnotB.csv", "wb"))
> > c.writerow([inAnotB])
> > 
> > d = csv.writer(open("inBnotA.csv", "wb"))
> > d.writerow([inBnotA])
> > 
> > print "done!"
> > 
> > =====================================================
> > 
> > But it doesn't produce the required results.
> > It prints IDs in this format:
> > 247158132\n
> 
> Python reads lines from a file with the trailing newline included, and
> line.split(",") on a line with only one column (i.e. no comma) keeps the
> whole line, newline and all. As you already know about the csv module you
> should use it to read your data, e.g. instead of
> 
> > A = [line.split(',')[0] for line in open('Afile.csv')]
> 
> try
> 
> with open("Afile.csv", "rb") as f:
>     a = {row[0] for row in csv.reader(f)}
> ...
> 
> I used {...} instead of [...], so a is already a set and you can proceed:
> 
> in_a_not_b = a - b
> 
> Finally as a shortcut for
> 
> for item in in_a_not_b:
>     writer.writerow([item])
> 
> use the writerows() method to write your data:
> 
> with open("inAnotB.csv", "wb") as f:
>     writer = csv.writer(f)
>     writer.writerows([item] for item in in_a_not_b)
> 
> Note that I'm wrapping every item in the set rather than the complete set as
> a whole. If you wanted to be clever you could spell that even more succinctly
> as
> 
>     writer.writerows(zip(in_a_not_b))
> 
> > and nothing to the csv files.
> > 
> > You could probably tell I'm a newbie.
> > Could you help me out?
> > 
> > here's some dummy data:
> > 
> > https://docs.google.com/file/d/0BwziqsHUZOWRYU15aEFuWm9fajA/edit?usp=sharing
> > 
> > https://docs.google.com/file/d/0BwziqsHUZOWRQVlTelVveEhsMm8/edit?usp=sharing
> > 
> > Thanks a bunch in advance! :)
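
A minimal sketch of the whole comparison, assuming Python 3 (the scripts quoted above are Python 2, where the files would be opened with "rb" and "wb" instead of newline=""). The helper names first_column, write_ids and compare are made up for illustration, and skipping empty rows and sorting the output are extra assumptions, not something the thread asks for. Note that the generator expression reads "for item in sorted(ids)": the "in" keyword has to stand on its own before the name being looped over, which is what "for item in_a_not_b" is missing.

```python
import csv


def first_column(path):
    """Return the set of first-column values found in a CSV file."""
    # newline="" is what the csv module docs recommend for Python 3 files.
    with open(path, newline="") as f:
        # "if row" skips completely empty lines (an assumption, see above).
        return {row[0] for row in csv.reader(f) if row}


def write_ids(path, ids):
    """Write one ID per row, sorted so the output order is repeatable."""
    with open(path, "w", newline="") as f:
        # Each ID is wrapped in a list so it becomes a one-column row.
        csv.writer(f).writerows([item] for item in sorted(ids))


def compare(a_path, b_path, a_only_path, b_only_path):
    """Write IDs unique to each input file and return both sets."""
    a = first_column(a_path)
    b = first_column(b_path)
    in_a_not_b = a - b  # set difference: in A but not in B
    in_b_not_a = b - a  # set difference: in B but not in A
    write_ids(a_only_path, in_a_not_b)
    write_ids(b_only_path, in_b_not_a)
    return in_a_not_b, in_b_not_a
```

Calling compare("Afile.csv", "Bfile.csv", "inAnotB.csv", "inBnotA.csv") should write the two result files and return both sets, so they can still be printed as in the original script.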