A sets algorithm
Tim Chase
python.list at tim.thechases.com
Sun Feb 7 19:20:50 EST 2016
On 2016-02-08 00:05, Paulo da Silva wrote:
> At 22:17 on 07-02-2016, Tim Chase wrote:
>> all_files = list(generate_MyFile_objects())
>> interesting = [
>>     (my_file1, my_file2)
>>     for i, my_file1 in enumerate(all_files, 1)
>>     for my_file2 in all_files[i:]
>>     if my_file1 == my_file2
>>     ]
>
> "my_file1 == my_file2" can be implemented in the MyFile class, taking
> advantage of cached sizes (files of different sizes are different),
> hashes, or even content (for small files) or file headers (first n
> bytes). However, this seems to have a problem:
> all_files: a b c d e ...
> If a == b, then comparing b with c, d, e is useless.
That depends on what the OP wants to happen when more than two input
files are equal, i.e., a == b == c. Does one just want "a has
duplicates" (and optionally "and here's one of them"), or does one
want "a == b", "a == c" and "b == c" in the output?
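
To make the two output styles concrete, here is a minimal sketch. It
assumes each file can be reduced to a hashable key (here, its content),
and it uses hypothetical (name, content) tuples as stand-ins for the
MyFile objects; group_duplicates and duplicate_pairs are names made up
for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def group_duplicates(items, key=lambda x: x):
    """Return lists of items sharing a key, keeping only groups of 2+."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return [members for members in groups.values() if len(members) > 1]


def duplicate_pairs(items, key=lambda x: x):
    """Return every pair of equal items: a==b, a==c, b==c, ..."""
    pairs = []
    for members in group_duplicates(items, key):
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical stand-ins for MyFile objects: (name, content) tuples.
files = [("a", "X"), ("b", "X"), ("c", "X"), ("d", "Y")]
by_content = lambda f: f[1]

groups = group_duplicates(files, by_content)  # "a has duplicates" style
pairs = duplicate_pairs(files, by_content)    # "a == b", "a == c", ... style
```

Grouping first also sidesteps the "comparing b with c, d, e is useless"
concern: each item is keyed once, and pairs are only generated within a
group if they are wanted at all.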
> Another solution I thought of could be defining some methods (I
> still don't know which ones) in MyFile so that I could use set
> intersection. Would this be a faster solution?
Adding __hash__ (together with a consistent __eq__) would allow for
the set operations, but would require (as ChrisA points out) knowing
how to create a hash function that encompasses the information you
want to compare.
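
As a rough sketch of what that could look like: the class below is one
possible MyFile, not the OP's actual class. It assumes that "equal" means
"same size and same SHA-256 digest of the content", that files don't
change after the object is created, and that a digest collision is
acceptable as equality. The hash uses only the cheap size, so the digest
is computed lazily and only when two files of the same size meet.

```python
import hashlib
import os
import tempfile


class MyFile:
    """Hypothetical file wrapper: equality by size, then content digest."""

    def __init__(self, path):
        self.path = path
        self.size = os.path.getsize(path)  # cached once; assumes no later writes
        self._digest = None

    @property
    def digest(self):
        # Computed on first use and cached, so unique sizes never read content.
        if self._digest is None:
            h = hashlib.sha256()
            with open(self.path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            self._digest = h.hexdigest()
        return self._digest

    def __eq__(self, other):
        if not isinstance(other, MyFile):
            return NotImplemented
        return self.size == other.size and self.digest == other.digest

    def __hash__(self):
        # Equal files always have equal sizes, so this is consistent with __eq__.
        return hash(self.size)


# Demo with throwaway files; all comparisons happen while the files exist.
with tempfile.TemporaryDirectory() as tmp:
    def make(name, data):
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(data)
        return MyFile(path)

    a, b, c = make("a", b"spam"), make("b", b"spam"), make("c", b"eggs")
    common = {a, c} & {b}          # set intersection across two collections
    n_common = len(common)
    a_in_common = a in common      # True either way, since a == b
    n_unique = len({a, b, c})      # a and b collapse into one entry
```

Note that a set keeps only one representative of each group of equal
files, so this answers "which content appears in both collections" rather
than listing every matching pair.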
-tkc