[Tutor] 1 to N searches in files
Steven D'Aprano
steve at pearwood.info
Sun Dec 2 10:34:24 CET 2012
On 02/12/12 19:53, Spectral None wrote:
> However, it seems that the results do not correctly reflect the
> matched/unmatched lines. As an example, if FileA contains "string1"
> and FileB contains multiple occurrences of "string1", it seems that
> the first occurrence matches correctly but subsequent "string1"s
> are treated as unmatched strings.
>
> I am thinking perhaps I don't understand Differ() that well and that
> it is not doing what I hoped to do? Is Differ() comparing first line
> to first line and second line to second line etc in contrast to what
> I wanted to do?
No, and yes.
No, it is not comparing first line to first line.
And yes, it is acting in contrast to what you hope to do, otherwise you
wouldn't be asking the question :-)
Unfortunately, you don't explain what it is that you hope to do, so I'm
going to have to guess. See below.
difflib is used to find the differences between two files. It will try to
find a set of changes which will turn file A into file B, e.g.:
insert this line here
delete this line there
...
and repeated as many times as needed. Except that difflib.Differ uses
a shorthand of "+" and "-" to indicate adding and deleting lines.
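To make the shorthand concrete, here is a minimal sketch using two made-up lists of lines (the names a, b and diff_lines are mine, not from your code). Each line that compare() produces starts with a two-character code: "- " for a line only in A, "+ " for a line only in B, "  " for a line common to both, and occasionally "? " for a hint line that is in neither file.

```python
import difflib

# Two made-up "files", each represented as a list of lines.
a = ["spam", "ham", "eggs"]
b = ["spam", "eggs", "cheese"]

# compare() returns a generator of strings, one per output line.
diff_lines = list(difflib.Differ().compare(a, b))

for line in diff_lines:
    print(line)

# Applying the changes: keeping the common lines plus the "+" lines
# gives back file B; keeping the common lines plus the "-" lines
# gives back file A.
rebuilt_b = [line[2:] for line in diff_lines if line[:2] in ("  ", "+ ")]
rebuilt_a = [line[2:] for line in diff_lines if line[:2] in ("  ", "- ")]
```

Note that the output describes an edit script, so where a line appears matters just as much as whether it appears.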
You can find out more about difflib and Differ objects by reading the
Fine Manual. Open a Python interactive shell, and do this:
import difflib
help(difflib.Differ)
If you have any questions, please feel free to ask.
In the code sample you give, you say you do this:
mydiff = difflib.Differ()
results = mydiff(a,b)
but that doesn't work, because Differ objects are not callable. Please do
not paraphrase your code. Copy and paste the exact code you have actually
run; don't try to type it out from memory.
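For what it's worth, the method that does the work is compare(). A small sketch of the difference, with made-up lists standing in for your a and b (I don't know what yours actually contain):

```python
import difflib

# Made-up stand-ins for the poster's a and b.
a = ["string1\n", "string2\n"]
b = ["string1\n", "string3\n", "string1\n"]

mydiff = difflib.Differ()

# Calling the Differ object itself raises TypeError.
try:
    results = mydiff(a, b)
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# The comparison is done by the compare() method, which returns a
# generator; wrap it in list() if you want to keep the lines around.
results = list(mydiff.compare(a, b))
```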
Now, I *guess* that what you are trying to do is something like this...
given files A and B:
# file A
spam
ham
eggs
tomato
# file B
tomato
spam
eggs
cheese
spam
spam
you want to generate three lists:
# lines in B that were also in A:
tomato
spam
eggs
# lines in B that were not in A:
cheese
# lines in A that were not found in B:
ham
Am I close?
If not, please explain with an example what you are trying
to do.
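If that guess is right, you don't need difflib at all, since the position of a line no longer matters, only whether it appears in the other file. Here is a rough sketch using sets, assuming each file has already been read into a list of lines and that you want each distinct line reported once, in the order it first appears. The function name classify_lines is just something I made up for illustration.

```python
def classify_lines(a_lines, b_lines):
    """Return (in_both, only_in_b, only_in_a), each without duplicates."""
    a_set = set(a_lines)
    b_set = set(b_lines)

    in_both = []     # lines in B that were also in A
    only_in_b = []   # lines in B that were not in A
    seen = set()     # lines of B already reported
    for line in b_lines:
        if line in seen:
            continue  # repeated lines such as "spam" are reported once
        seen.add(line)
        if line in a_set:
            in_both.append(line)
        else:
            only_in_b.append(line)

    # lines in A that were not found in B, in order, without duplicates
    only_in_a = [line for line in dict.fromkeys(a_lines) if line not in b_set]
    return in_both, only_in_b, only_in_a


file_a = ["spam", "ham", "eggs", "tomato"]
file_b = ["tomato", "spam", "eggs", "cheese", "spam", "spam"]

in_both, only_in_b, only_in_a = classify_lines(file_a, file_b)
print(in_both)    # ['tomato', 'spam', 'eggs']
print(only_in_b)  # ['cheese']
print(only_in_a)  # ['ham']
```

If you read the lines straight from real files, remember that each one ends with a newline, so you may want to strip that off before comparing.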
--
Steven