A sets algorithm
Paulo da Silva
p_s_d_a_s_i_l_v_a_ns at netcabo.pt
Sun Feb 7 19:05:16 EST 2016
On 07-02-2016 at 22:17, Tim Chase wrote:
> On 2016-02-07 21:46, Paulo da Silva wrote:
> If the MyFile objects can be unique but compare for equality
> (e.g. two files on the file-system that have the same SHA1 hash, but
> you want to know the file-names), you'd have to do a paired search
> which would have worse performance and would need to iterate over the
> data multiple times:
>
>   all_files = list(generate_MyFile_objects())
>   interesting = [
>     (my_file1, my_file2)
>     for i, my_file1
>       in enumerate(all_files, 1)
>     for my_file2
>       in all_files[i:]
>     if my_file1 == my_file2
>     ]

"my_file1 == my_file2" can be implemented in the MyFile class, taking
advantage of cached sizes (files of different sizes are different),
hashes, or even content (for small files) or file headers (first n bytes).
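
A minimal sketch of such a comparison, cheapest check first. The class
body, the HEADER_BYTES value and the choice of SHA-1 are assumptions made
for illustration, and functools.cached_property needs Python 3.8 or later:

```python
import hashlib
import os
from functools import cached_property


class MyFile:
    """Hypothetical stand-in for the MyFile class discussed above."""

    HEADER_BYTES = 4096  # assumed value for "first n bytes"

    def __init__(self, path):
        self.path = path

    @cached_property
    def size(self):
        return os.path.getsize(self.path)

    @cached_property
    def header(self):
        with open(self.path, "rb") as f:
            return f.read(self.HEADER_BYTES)

    @cached_property
    def digest(self):
        h = hashlib.sha1()
        with open(self.path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def __eq__(self, other):
        if not isinstance(other, MyFile):
            return NotImplemented
        # "and" short-circuits, so the full hash is only computed when
        # both the size and the header already match.
        return (self.size == other.size
                and self.header == other.header
                and self.digest == other.digest)
```

Each cached value is computed at most once per object, so repeated
comparisons against the same file do not read it again. Defining __eq__
without __hash__ leaves these objects unhashable.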
However, this seems to have a problem:
all_files: a b c d e ...
If a==b, then comparing b with c, d, e is useless, because a has already
been compared with each of them.
Maybe use several steps with dicts: group by size first, then by hash for
files of the same size, etc.
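
The multi-step dict idea could look roughly like this. It is a sketch
only: the function names sha1_of and duplicate_groups are made up here,
and it works on plain paths rather than MyFile objects:

```python
import hashlib
import os
from collections import defaultdict


def sha1_of(path, chunk_size=1 << 20):
    """SHA-1 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(paths):
    """Return lists of paths whose size and SHA-1 both match."""
    # Step 1: bucket by size, which needs no file reads at all.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        # Step 2: hash only the files that share a size.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[sha1_of(path)].append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Each file is examined once per step instead of once per pair, and a file
with a unique size is never opened.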
Another solution I thought of could be defining some methods (I still
don't know which ones) in MyFile so that I could use set intersection.
Would this be a faster solution?
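
For sets, the methods Python looks at are __eq__ and __hash__. A hedged
sketch, assuming equality means "same size and same SHA-1" (the class
body and the helper common_files are hypothetical, and the file is read
in one go only to keep the example short):

```python
import hashlib
import os


class MyFile:
    """Hypothetical MyFile that is hashable by content."""

    def __init__(self, path):
        self.path = path
        self._key = None  # cached (size, sha1) pair

    def _content_key(self):
        if self._key is None:
            with open(self.path, "rb") as f:
                digest = hashlib.sha1(f.read()).hexdigest()
            self._key = (os.path.getsize(self.path), digest)
        return self._key

    def __eq__(self, other):
        if not isinstance(other, MyFile):
            return NotImplemented
        return self._content_key() == other._content_key()

    def __hash__(self):
        # Objects that compare equal must return the same hash.
        return hash(self._content_key())


def common_files(files_a, files_b):
    """Files from files_a whose content also appears in files_b."""
    seen = set(files_b)
    return [f for f in files_a if f in seen]
```

With these two methods, set(files_a) & set(files_b) also works, but a set
keeps only one object per group of equal files, so the other file names
are lost. Set lookups avoid the pairwise loop, at the price of hashing
every file once.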