What is the most efficient way to find similarities and differences between the contents of two lists?

nn pruebauno at latinmail.com
Mon Jun 13 11:26:36 EDT 2011


On Jun 13, 11:06 am, Zachary Dziura <zcdzi... at gmail.com> wrote:
> Hi all.
>
> I'm writing a Python script that will be used to compare two database
> tables. Currently, those two tables are dumped into .csv files,
> whereby my code goes through both files and makes comparisons. Thus
> far, I only have functionality coded to make comparisons on the
> headers to check for similarities and differences. Here is the code
> for that functionality:
>
> similar_headers = 0
> different_headers = 0
> source_headers = sorted(source_mapping.headers)
> target_headers = sorted(target_mapping.headers)
>
> # Check if the headers between the two mappings are the same
> if set(source_headers) == set(target_headers):
>     similar_headers = len(source_headers)
> else:
>     # We're going to do two run-throughs of the tables, to find the
>     # different and similar header names. Start with the source
>     # headers...
>     for source_header in source_headers:
>         if source_header in target_headers:
>             similar_headers += 1
>         else:
>             different_headers += 1
>     # Now check target headers for any differences
>     for target_header in target_headers:
>         if target_header in source_headers:
>             pass
>         else:
>             different_headers += 1
>
> As you can probably tell, I make two iterations: one for the
> 'source_headers' list, and another for the 'target_headers' list.
> During the first iteration, if a specific header (mapped to a variable
> 'source_header') exists in both lists, then the 'similar_headers'
> variable is incremented by one. Otherwise, if it exists only in the
> source list, 'different_headers' is incremented by one. The second
> iteration only counts headers that are missing from the source list.
>
> My code works as expected and there are no bugs; however, I get the
> feeling that I'm not doing this comparison in the most efficient way
> possible. Is there another way that I can make this same comparison
> while making my code more Pythonic and efficient? I would prefer not
> to have to install an external module from elsewhere, though if I have
> to then I will.
>
> Thanks in advance for any and all answers!

How about using set operations instead of the two loops:

# Count the headers the two mappings share and the ones they don't
source_headers_set = set(source_headers)
target_headers_set = set(target_headers)

similar_headers = len(source_headers_set & target_headers_set)
different_headers = len(source_headers_set ^ target_headers_set)
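
If you also want to know which headers differ, and on which side, the same idea extends with set difference. Here is a minimal sketch; the sample header lists are made up for illustration and stand in for source_mapping.headers and target_mapping.headers from your post:

```python
# Hypothetical sample data; in the real script these would come from
# source_mapping.headers and target_mapping.headers.
source_headers = ["id", "name", "email", "created"]
target_headers = ["id", "name", "phone", "created", "updated"]

source_headers_set = set(source_headers)
target_headers_set = set(target_headers)

# Headers present in both tables.
common = source_headers_set & target_headers_set
# Headers present on one side only.
only_in_source = source_headers_set - target_headers_set
only_in_target = target_headers_set - source_headers_set

similar_headers = len(common)
# Same value as len(source_headers_set ^ target_headers_set).
different_headers = len(only_in_source) + len(only_in_target)

print("similar:", similar_headers, sorted(common))
print("only in source:", sorted(only_in_source))
print("only in target:", sorted(only_in_target))
print("different:", different_headers)
```

One caveat: a set collapses duplicates, so if a header name could appear twice in the same list, these counts will not match the loop version, which counts every occurrence. For column names that are unique within a table, the two approaches should agree.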


