What is the most efficient way to find similarities and differences between the contents of two lists?
nn
pruebauno at latinmail.com
Mon Jun 13 11:26:36 EDT 2011
On Jun 13, 11:06 am, Zachary Dziura <zcdzi... at gmail.com> wrote:
> Hi all.
>
> I'm writing a Python script that will be used to compare two database
> tables. Currently, those two tables are dumped into .csv files,
> whereby my code goes through both files and makes comparisons. Thus
> far, I only have functionality coded to make comparisons on the
> headers to check for similarities and differences. Here is the code
> for that functionality:
>
> similar_headers = 0
> different_headers = 0
> source_headers = sorted(source_mapping.headers)
> target_headers = sorted(target_mapping.headers)
>
> # Check if the headers between the two mappings are the same
> if set(source_headers) == set(target_headers):
>     similar_headers = len(source_headers)
> else:
>     # We're going to do two run-throughs of the tables, to find the
>     # different and similar header names. Start with the source
>     # headers...
>     for source_header in source_headers:
>         if source_header in target_headers:
>             similar_headers += 1
>         else:
>             different_headers += 1
>     # Now check target headers for any differences
>     for target_header in target_headers:
>         if target_header in source_headers:
>             pass
>         else:
>             different_headers += 1
>
> As you can probably tell, I make two iterations: one for the
> 'source_headers' list, and another for the 'target_headers' list.
> During the first iteration, if a specific header (mapped to a variable
> 'source_header') exists in both lists, then the 'similar_headers'
> variable is incremented by one. Otherwise, if it is missing from the
> target list, 'different_headers' is incremented by one. The second
> iteration only checks for headers missing from the source list.
>
> My code works as expected and there are no bugs; however, I get the
> feeling that I'm not doing this comparison in the most efficient way
> possible. Is there another way that I can make this same comparison
> while making my code more Pythonic and efficient? I would prefer not
> to have to install an external module from elsewhere, though if I have
> to then I will.
>
> Thanks in advance for any and all answers!
How about:

# Count the shared and differing headers with set operations
source_headers_set = set(source_headers)
target_headers_set = set(target_headers)
# Intersection: headers present in both mappings
similar_headers = len(source_headers_set & target_headers_set)
# Symmetric difference: headers present in exactly one mapping
different_headers = len(source_headers_set ^ target_headers_set)
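
If you also want to see which headers differ, not just how many, the same set operators can return the names themselves. Below is a minimal sketch, assuming the headers are plain strings. The function name `compare_headers` and the sample header lists are made up for illustration and are not from the original post.

```python
def compare_headers(source_headers, target_headers):
    """Return (shared, only_in_source, only_in_target) as sorted lists."""
    source_set = set(source_headers)
    target_set = set(target_headers)
    shared = sorted(source_set & target_set)          # in both mappings
    only_in_source = sorted(source_set - target_set)  # missing from target
    only_in_target = sorted(target_set - source_set)  # missing from source
    return shared, only_in_source, only_in_target


# Hypothetical sample data, for illustration only
source = ["id", "name", "email", "created"]
target = ["id", "name", "phone"]

shared, only_in_source, only_in_target = compare_headers(source, target)
similar_headers = len(shared)
different_headers = len(only_in_source) + len(only_in_target)

print(shared, only_in_source, only_in_target)
print(similar_headers, different_headers)
```

One caveat: converting to sets collapses duplicate header names, so the counts can differ from the loop version if a header appears more than once in either list. Set membership tests are also constant time on average, whereas `in` on a list scans the whole list.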