[Tutor] File Compare

Peter Otten __peter__ at web.de
Fri Jan 9 13:55:25 CET 2015


Crusier wrote:

> Hi Danny,
> 
> Thanks for your suggestion.
> 
> The ideal output of this program shows whether any new number has
> been added to the new file.
> 
> In other words, the content of file1 [0001.hk, 0002.hk, 0003.hk,
> 0004.hk] is compared with the content of file2 [0001.hk, 0002.hk,
> 0003.hk, 0005.hk].
> 
> The result should be +0005.hk, -0004.hk
> 
> > Ah.  One other thing.  Can you explain what you're intending to do
> > with this statement?
> > 
> >    A = file_contentsA.split(',')
> 
> My thinking is that I want to turn both files into lists so I can
> compare them. However, as you can see, that is only wishful thinking
> on my part.

As Dave already mentioned, you need to split on whitespace rather than on 
commas:

>>> file_contentsA = "0001.hk 0002.hk 0003.hk"
>>> file_contentsA.split()
['0001.hk', '0002.hk', '0003.hk']
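
The same call works on text read from a real file. Here is a minimal sketch, 
assuming the entries are separated by spaces or newlines; the helper name 
read_entries is made up for illustration and is not part of the original 
script:

```python
def read_entries(path):
    """Return the whitespace-separated entries of a text file as a list."""
    with open(path) as f:
        # split() without arguments splits on any run of whitespace
        # (spaces, tabs, newlines) and ignores leading/trailing whitespace.
        return f.read().split()
```

With that in place, something like set(read_entries("file1.txt")) would give 
you the set used in the examples below (the file name is a placeholder).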

The easiest way to get the added/removed entries is set arithmetic:

>>> file_contentsB = "0001.hk 0002.hk 0005.hk 0006.hk"
>>> entriesA = set(file_contentsA.split())
>>> entriesB = set(file_contentsB.split())
>>> entriesA - entriesB # removed items:
{'0003.hk'}
>>> entriesB - entriesA # added items:
{'0005.hk', '0006.hk'}

Now let's work on the output format:

>>> added = entriesB - entriesA
>>> removed = entriesA - entriesB
>>> added_or_removed = added | removed # union of both sets
>>> for entry in sorted(added_or_removed):
...     if entry in added:
...         print("+" + entry)
...     else:
...         print("-" + entry)
... 
-0003.hk
+0005.hk
+0006.hk
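
If you want to reuse this, the steps above can be wrapped in a function. This 
is only a sketch; the name format_changes is my own invention, and the sample 
data is taken from the question:

```python
def format_changes(old_text, new_text):
    """Return sorted '+entry' / '-entry' lines describing the changes."""
    old = set(old_text.split())
    new = set(new_text.split())
    added = new - old      # entries only in the new text
    removed = old - new    # entries only in the old text
    return [
        ("+" if entry in added else "-") + entry
        for entry in sorted(added | removed)
    ]


# Sample data from the original question.
for line in format_changes(
    "0001.hk 0002.hk 0003.hk 0004.hk",
    "0001.hk 0002.hk 0003.hk 0005.hk",
):
    print(line)
```

For the sample data this prints -0004.hk followed by +0005.hk, because the 
entries are sorted by name rather than grouped into added and removed.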

Limitations of this approach:
- information about duplicate entries is lost (a set stores each entry once)
- the original order of entries is lost (sets are unordered, which is why the 
  output above is sorted)
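
If either limitation matters to you, here are two possible workarounds, 
sketched under the assumption that the entries are already split into lists. 
The function names are hypothetical. collections.Counter keeps track of how 
often each entry occurs, and a list comprehension with a set lookup keeps the 
entries in file order:

```python
from collections import Counter


def count_changes(old_entries, new_entries):
    """Return (added, removed) as Counters that respect duplicates."""
    old_counts = Counter(old_entries)
    new_counts = Counter(new_entries)
    # Counter subtraction drops entries whose count would be zero or less.
    return new_counts - old_counts, old_counts - new_counts


def ordered_changes(old_entries, new_entries):
    """Return (added, removed) as lists in their original file order."""
    old_set, new_set = set(old_entries), set(new_entries)
    added = [e for e in new_entries if e not in old_set]
    removed = [e for e in old_entries if e not in new_set]
    return added, removed
```

Note that ordered_changes() still ignores how often an entry occurs; it only 
preserves the order in which the differing entries appear in each file.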


