Beginner Question : Iterators and zip
tjreedy at udel.edu
Sun Jul 13 01:57:38 CEST 2008
moogyd at yahoo.co.uk wrote:
> Hi group,
> I have a basic question on the zip built in function.
> I am writing a simple text file comparison script, that compares line
> by line and character by character. The output is the original file,
> with an X in place of any characters that are different.
> I have managed a solution for a fixed (3) number of files, but I want
> a solution of any number of input files.
> The outline of my solution:
> for vec in zip(vec_list[0],vec_list[1],vec_list[2]):
>     res = ''
>     for entry in zip(vec[0],vec[1],vec[2]):
>         if len(set(entry)) > 1:
>             res = res+'X'
>         else:
>             res = res+entry[0]
> So vec is a tuple containing a line from each file, and then entry is
> a tuple containing a character from each line.
> 2 questions
> 1) What is the general solution. Using zip in this way looks wrong. Is
> there another function that does what I want
zip(*vec_list) will zip together all entries in vec_list
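
As a small illustration of the star form, here is a sketch with made-up data standing in for vec_list (one list of lines per file). It assumes Python 3, where zip returns an iterator, hence the list() calls.

```python
# Hypothetical stand-in for the poster's vec_list: one list of lines per file.
vec_list = [
    ["abc", "def"],  # lines of file 1
    ["abc", "dxf"],  # lines of file 2
    ["abc", "def"],  # lines of file 3
]

# zip(*vec_list) behaves like zip(vec_list[0], vec_list[1], vec_list[2]),
# but works for any number of files.
line_groups = list(zip(*vec_list))

# The same trick applies one level down: zip(*vec) groups the characters
# of one tuple of lines column by column.
char_groups = list(zip(*line_groups[1]))
```

Each element of line_groups is a tuple holding the same-numbered line from every file, and each element of char_groups is a tuple holding one character from each of those lines.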
Do be aware that zip stops on the shortest iterable. So if vec[1] is
shorter than vec[0] and matches otherwise, your output line will be
truncated. Or if vec[1] is longer and vec[0] matches as far as it goes,
there will be no signal either.
res=res+whatever can be written as res+=whatever
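
One possible way around the truncation caveat above, offered as a sketch and not as the only fix, is itertools.zip_longest (Python 3; the equivalent in Python 2.6 is itertools.izip_longest). It pads the shorter inputs with a fill value, so a length mismatch shows up as a difference. The sample lines are made up.

```python
from itertools import zip_longest

# Made-up lines; the second one is shorter than the others.
lines = ["abcd", "ab", "abcd"]

# Plain zip stops at the shortest line, so the mismatch goes unnoticed.
plain = list(zip(*lines))

# zip_longest pads the short line with None (the default fillvalue).
padded = list(zip_longest(*lines))

# With padding, the set test flags every column where a line ran out.
flags = ["X" if len(set(col)) > 1 else col[0] for col in padded]
```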
> 2) I am using set to remove any repeated characters. Is there a
> "better" way ?
I might have written a third loop to compare vec[0] to vec[1]..., but
your set solution is easier and prettier.
If speed is an issue, don't rebuild the output line char by char. Just
change what is needed in a mutable copy. I like this better anyway.
res = list(vec[0]) # if all ascii, in 3.0 use bytearray
for n, entry in enumerate(zip(vec[0],vec[1],vec[2])):
    if len(set(entry)) > 1:
        res[n] = 'X'
outfile.write(''.join(res)) # in 3.0, write(res)
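
Putting the pieces together for any number of files, a minimal sketch might look like the following. The function name mask_differences and its marker parameter are hypothetical, not from the original post. It assumes Python 3, a non-empty list of lines, and that trailing newlines have already been stripped (otherwise a newline in a shorter line would itself be flagged).

```python
from itertools import zip_longest

def mask_differences(lines, marker="X"):
    """Return the first line with `marker` wherever the lines disagree.

    Hypothetical helper: `lines` holds one line per file, newlines stripped.
    """
    # Mutable copy of the first line, padded to the longest line's length.
    longest = max(len(line) for line in lines)
    res = list(lines[0].ljust(longest))
    # zip_longest pads short lines with None, so extra characters in a
    # longer line are flagged instead of being silently dropped.
    for n, col in enumerate(zip_longest(*lines)):
        if len(set(col)) > 1:
            res[n] = marker
    return "".join(res)

# Example with three made-up lines that differ in the third column.
masked = mask_differences(["abcd", "abxd", "abcd"])
```

Per-file usage would then be roughly one call per tuple from zip(*vec_list), with the same shortest-input caveat applying to files of different lengths.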