[Tutor] File Compare
Peter Otten
__peter__ at web.de
Fri Jan 9 13:55:25 CET 2015
Crusier wrote:
> Hi Danny,
>
> Thanks for your suggestion.
>
> The ideal output of this program is to show whether any new number
> has been added to the new file.
>
> In other words, the content of file1 [0001.hk, 0002.hk, 0003.hk,
> 0004.hk] is compared with the content of file2 [0001.hk,
> 0002.hk, 0003.hk, 0005.hk].
>
> The result should be +0005.hk, -0004.hk
>
>> Ah. One other thing. Can you explain what you're intending to do
>> with this statement?
>>
>> A = file_contentsA.split(',')
>
> My thinking is that I want to turn both files into lists, so I can
> compare the two files. However, as you can see, it is only wishful
> thinking.
As Dave already mentioned, you need to split on whitespace; calling split() with no argument does that:
>>> file_contentsA = "0001.hk 0002.hk 0003.hk"
>>> file_contentsA.split()
['0001.hk', '0002.hk', '0003.hk']
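For comparison, here is a small sketch of why split(',') does not give the list you hoped for. It assumes the entries are separated by spaces, as in the sample string above; the variable names are only illustrative.

```python
file_contents = "0001.hk 0002.hk 0003.hk"

# The string contains no comma, so split(',') returns the whole
# text as a single list item.
by_comma = file_contents.split(',')

# split() with no argument splits on runs of whitespace
# (spaces, tabs, newlines).
by_whitespace = file_contents.split()

print(by_comma)       # ['0001.hk 0002.hk 0003.hk']
print(by_whitespace)  # ['0001.hk', '0002.hk', '0003.hk']
```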
The easiest way to get the added/removed entries is set arithmetic:
>>> file_contentsB = "0001.hk 0002.hk 0005.hk 0006.hk"
>>> entriesA = set(file_contentsA.split())
>>> entriesB = set(file_contentsB.split())
>>> entriesA - entriesB # removed items:
{'0003.hk'}
>>> entriesB - entriesA # added items:
{'0005.hk', '0006.hk'}
Now let's work on the output format:
>>> added = entriesB - entriesA
>>> removed = entriesA - entriesB
>>> added_or_removed = added | removed # union of both sets
>>> for entry in sorted(added_or_removed):
...     if entry in added:
...         print("+" + entry)
...     else:
...         print("-" + entry)
...
-0003.hk
+0005.hk
+0006.hk
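The steps above could be combined into one function that reads the two files from disk. This is only a sketch: the function name compare_files and its parameters are my own invention, and it assumes both files are plain text with whitespace-separated entries.

```python
from pathlib import Path


def compare_files(old_path, new_path):
    """Return '+entry' / '-entry' strings for entries added or removed."""
    # Read each file and split its text on whitespace.
    old_entries = set(Path(old_path).read_text().split())
    new_entries = set(Path(new_path).read_text().split())

    added = new_entries - old_entries
    removed = old_entries - new_entries

    # Sort the union so the report has a stable order.
    return [
        ("+" if entry in added else "-") + entry
        for entry in sorted(added | removed)
    ]
```

With Crusier's two example files, this should return ['-0004.hk', '+0005.hk'].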
Limitations of this approach:
- information about duplicate entries is lost, because a set keeps each entry only once
- the original order of entries is lost, because sets are unordered
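If duplicates and order matter, one possible workaround is to count entries with collections.Counter instead of using sets. The following is a rough sketch under that assumption, not the only way to do it; the function name diff_entries is hypothetical. It reports removed entries in the order they appear in the old text, followed by added entries in the order they appear in the new text.

```python
from collections import Counter


def diff_entries(old_text, new_text):
    """Return '-entry' / '+entry' strings, keeping duplicates and order."""
    old = old_text.split()
    new = new_text.split()

    # Counter subtraction keeps only positive counts, so these hold
    # how many copies of each entry were removed or added.
    removed = Counter(old) - Counter(new)
    added = Counter(new) - Counter(old)

    result = []
    # Walk the old entries in their original order and report each
    # surplus copy once.
    for entry in old:
        if removed[entry] > 0:
            result.append("-" + entry)
            removed[entry] -= 1
    # Do the same for the new entries.
    for entry in new:
        if added[entry] > 0:
            result.append("+" + entry)
            added[entry] -= 1
    return result


print(diff_entries("0001.hk 0002.hk 0002.hk 0004.hk",
                   "0001.hk 0002.hk 0005.hk 0003.hk"))
# ['-0002.hk', '-0004.hk', '+0005.hk', '+0003.hk']
```

Here the second copy of 0002.hk shows up as removed, which the set-based version cannot detect.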