Query regarding set()?
__peter__ at web.de
Fri Jul 10 08:04:35 EDT 2009
> I'm constructing a simple compare-script and thought I would use set()
> to generate the difference output. But I'm obviously doing
> something wrong.
> file1 contains 410 rows.
> file2 contains 386 rows.
> I want to know what rows are in file1 but not in file2.
> This is my script:
> s1 = set(open("file1"))
> s2 = set(open("file2"))
Remove the following three lines:
> s3 = set()
> s1temp = set()
> s2temp = set()
> s1temp = set(i.strip() for i in s1)
> s2temp = set(i.strip() for i in s2)
> s3 = s1temp-s2temp
> print len(s3)
> Output is 119. AFAIK 410-386=24. What am I doing wrong here?
You are probably misinterpreting len(s3). s3 contains lines occurring in
"file1" but not in "file2". Duplicate lines are only counted once, and the
order doesn't matter.
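A small illustration of why the two numbers need not agree. The lists below are made-up stand-ins for the contents of the two files, not your actual data:

```python
# Hypothetical stand-ins for the lines of "file1" and "file2".
lines1 = ["a\n", "b\n", "b\n", "c\n", "d\n"]  # 5 rows, "b" appears twice
lines2 = ["a\n", "x\n", "y\n", "z\n"]         # 4 rows

s1 = set(line.strip() for line in lines1)
s2 = set(line.strip() for line in lines2)
diff = s1 - s2  # distinct lines in lines1 that never occur in lines2

print(len(lines1) - len(lines2))  # 1
print(sorted(diff))               # ['b', 'c', 'd']
print(len(diff))                  # 3
```

The row counts differ by 1, yet the set difference has 3 elements, because the files share only one line.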
So there are 119 lines that occur at least once in "file1", but not in "file2".
If that is not what you want you have to tell us what exactly you are
looking for.
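If duplicates should count, one possible approach is a multiset difference with collections.Counter (available from Python 2.7 and 3.1). This is only a sketch under that assumption; extra_lines is a hypothetical helper name, and the sample lists are invented:

```python
from collections import Counter

def extra_lines(lines_a, lines_b):
    """Count lines that occur more often in lines_a than in lines_b."""
    count_a = Counter(line.strip() for line in lines_a)
    count_b = Counter(line.strip() for line in lines_b)
    # Counter subtraction keeps only entries with a positive count.
    return count_a - count_b

# Hypothetical sample data standing in for the two files.
lines1 = ["a\n", "b\n", "b\n", "c\n"]
lines2 = ["a\n", "b\n"]

extra = extra_lines(lines1, lines2)
print(sorted(extra.elements()))  # ['b', 'c']
print(sum(extra.values()))       # 2
```

With real files you could pass open("file1") and open("file2") directly, since file objects iterate over their lines.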