Beginner Question : Iterators and zip

Terry Reedy tjreedy at udel.edu
Sun Jul 13 01:57:38 CEST 2008



moogyd at yahoo.co.uk wrote:
> Hi group,
> 
> I have a basic question on the zip built in function.
> 
> I am writing a simple text file comparison script, that compares line
> by line and character by character. The output is the original file,
> with an X in place of any characters that are different.
> 
> I have managed a solution for a fixed (3) number of files, but I want
> a solution of any number of input files.
> 
> The outline of my solution:
> 
>         for vec in zip(vec_list[0],vec_list[1],vec_list[2]):
>             res = ''
>             for entry in zip(vec[0],vec[1],vec[2]):
>                 if len(set(entry)) > 1:
>                     res = res+'X'
>                 else:
>                     res = res+entry[0]
>             outfile.write(res)
> 
> So vec is a tuple containing a line from each file, and then entry is
> a tuple containing a character from each line.
> 
> 2 questions
> 1) What is the general solution? Using zip in this way looks wrong. Is
> there another function that does what I want?

zip(*vec_list) will zip together all entries in vec_list, however many 
there are.  The same works for the inner loop: zip(*vec).
Do be aware that zip stops on the shortest iterable.  So if vec[1] is 
shorter than vec[0] and matches otherwise, your output line will be 
truncated.  Or if vec[1] is longer and vec[0] matches as far as it goes, 
the extra characters are silently ignored and there will be no signal 
either.
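
A minimal sketch of both points, assuming Python 3, where 
itertools.zip_longest is available (in 2.6 the equivalent is 
itertools.izip_longest).  The sample strings and the fill value are made 
up for illustration.

```python
from itertools import zip_longest

# Hypothetical sample data: one line taken from each of three files.
lines = ["abcdef", "abXdef", "abc"]

# zip(*lines) is the same as zip(lines[0], lines[1], lines[2]),
# but works for any number of lines.
columns = list(zip(*lines))
print(len(columns))  # as many tuples as the shortest line has characters

# zip_longest pads the shorter inputs with fillvalue, so a length
# mismatch shows up as a difference instead of being dropped.
padded = list(zip_longest(*lines, fillvalue=None))
print(len(padded))   # as many tuples as the longest line has characters
```

With zip_longest, the set test in your loop flags the padded positions 
too, because None differs from any real character.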

res=res+whatever can be written as res+=whatever

> 2) I am using set to remove any repeated characters. Is there a
> "better" way ?

I might have written a third loop to compare vec[0] to vec[1]..., but 
your set solution is easier and prettier.

If speed is an issue, don't rebuild the output line char by char.  Just 
change what is needed in a mutable copy.  I like this better anyway.

res = list(vec[0]) # if all ascii, in 3.0 use bytearray
for n, entry in enumerate(zip(vec[0],vec[1],vec[2])):
   if len(set(entry)) > 1:
       res[n] = 'X'
# write once per line, after the character loop has finished
outfile.write(''.join(res)) # in 3.0, write(res)
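
Putting the two suggestions together for any number of files might look 
like the sketch below.  It assumes Python 3 and str lines; the names 
mask_differences and compare are hypothetical helpers, not anything 
from your script.  Plain zip is used, so the shortest-iterable caveat 
above still applies.

```python
def mask_differences(lines, marker="X"):
    """Return lines[0] with marker wherever the lines disagree."""
    res = list(lines[0])  # mutable copy of the first line
    for n, entry in enumerate(zip(*lines)):
        if len(set(entry)) > 1:  # more than one distinct character
            res[n] = marker
    return "".join(res)


def compare(line_sources):
    """Yield one masked line per group of corresponding input lines."""
    for vec in zip(*line_sources):  # vec holds one line from each source
        yield mask_differences(vec)


# Usage with in-memory data; open file objects would work the same way.
file_a = ["hello\n", "world\n"]
file_b = ["hallo\n", "world\n"]
file_c = ["hello\n", "worlt\n"]
result = list(compare([file_a, file_b, file_c]))
print(result)
```

Each item from compare() can be passed straight to outfile.write().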

tjr






