Beginner Question : Iterators and zip

bruno.desthuilliers at gmail.com
Sat Jul 12 15:50:25 EDT 2008


On 12 Jul, 20:55, moo... at yahoo.co.uk wrote:
> Hi group,
>
> I have a basic question on the zip built in function.
>
> I am writing a simple text file comparison script, that compares line
> by line and character by character. The output is the original file,
> with an X in place of any characters that are different.
>
> I have managed a solution for a fixed (3) number of files, but I want
> a solution of any number of input files.
>
> The outline of my solution:
>
>         for vec in zip(vec_list[0],vec_list[1],vec_list[2]):
>             res = ''
>             for entry in zip(vec[0],vec[1],vec[2]):
>                 if len(set(entry)) > 1:
>                     res = res+'X'
>                 else:
>                     res = res+entry[0]
>             outfile.write(res)
>
> So vec is a tuple containing a line from each file, and then entry is
> a tuple containing a character from each line.
>
> 2 questions
> 1) What is the general solution? Using zip in this way looks wrong. Is
> there another function that does what I want?

zip is (mostly) ok. What you're missing is how to use it with an
arbitrary number of sequences: put the sequences in a list and unpack
that list into zip's arguments with the * operator. Try this instead:

>>> lists = [range(5), range(5,11), range(11, 16)]
>>> lists
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
>>> for item in zip(*lists):
...     print item
...
(0, 5, 11)
(1, 6, 12)
(2, 7, 13)
(3, 8, 14)
(4, 9, 15)
>>> lists = [range(5), range(5,11), range(11, 16), range(16, 20)]
>>> for item in zip(*lists):
...     print item
...
(0, 5, 11, 16)
(1, 6, 12, 17)
(2, 7, 13, 18)
(3, 8, 14, 19)
>>>
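
Applied to the comparison script from the question, the same unpacking
works at both levels (lines, then characters). What follows is only a
minimal sketch, not the original poster's code: the function name
mask_differences and the sample data are made up for illustration, and
it is written so that it should also run on Python 3, where zip returns
an iterator rather than a list.

```python
def mask_differences(lines_per_file):
    """Yield one output line per group of corresponding input lines.

    lines_per_file is a list with one sequence of lines per input file
    (hypothetical name, chosen for this sketch).
    """
    # zip(*lines_per_file) yields a tuple holding one line from each file.
    for lines in zip(*lines_per_file):
        # zip(*lines) yields a tuple holding one character from each line.
        yield ''.join(
            chars[0] if len(set(chars)) == 1 else 'X'
            for chars in zip(*lines)
        )


# Three "files", given here as lists of lines instead of open file objects.
files = [
    ["abc\n", "def\n"],
    ["abc\n", "dxf\n"],
    ["abd\n", "def\n"],
]
result = list(mask_differences(files))
print(result)
```

With real files, a list of open file objects could be passed instead of
lists of lines, since iterating over a file yields its lines.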

The only caveat with zip() is that it stops as soon as the shortest
sequence is exhausted, i.e.:

>>> zip(range(3), range(10))
[(0, 0), (1, 1), (2, 2)]
>>> zip(range(30), range(10))
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9)]
>>>

So you'd better pad your sequences to make them all as long as the
longest one. There are idioms for doing this using the itertools
module's chain and repeat iterators, but I'll leave a concrete example
as an exercise to the reader !-)
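
For readers who want a starting point for that exercise, here is one
possible sketch of the chain/repeat idiom. The helper name zip_padded
and its fill parameter are assumptions made for this example, not
something from the standard library, and other answers are just as
valid.

```python
from itertools import chain, repeat


def zip_padded(sequences, fill=' '):
    """Zip sequences after padding the shorter ones with fill.

    Hypothetical helper for illustration only.
    """
    # Materialize each input so its length is known.
    sequences = [list(seq) for seq in sequences]
    longest = max(len(seq) for seq in sequences) if sequences else 0
    # chain() appends exactly as many fill items as each sequence lacks.
    padded = [
        chain(seq, repeat(fill, longest - len(seq)))
        for seq in sequences
    ]
    return list(zip(*padded))


pairs = zip_padded(["abc", "a"], fill='-')
print(pairs)
```

Python 2.6 and later also ship itertools.izip_longest (renamed
zip_longest in Python 3), which takes a fillvalue argument and does the
padding for you.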

> 2) I am using set to remove any repeated characters. Is there a
> "better" way ?

That's probably what I'd do too.

> Any other comments/suggestions appreciated.

There's a difflib module in the standard library. Did you give it a try?
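
As a rough idea of what difflib offers, the sketch below uses
difflib.SequenceMatcher to mask the parts of one line that differ from
another. It only compares two strings at a time, so it is not a drop-in
replacement for the N-file script, and the function name
mask_with_difflib is invented for this example.

```python
import difflib


def mask_with_difflib(reference, other):
    """Return reference with every differing stretch replaced by X's.

    Hypothetical helper for illustration only.
    """
    matcher = difflib.SequenceMatcher(None, reference, other)
    pieces = []
    # Each opcode describes how reference[i1:i2] relates to other[j1:j2].
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            pieces.append(reference[i1:i2])
        else:
            # 'replace' and 'delete' span part of reference; 'insert'
            # spans none of it (i1 == i2), so it adds nothing here.
            pieces.append('X' * (i2 - i1))
    return ''.join(pieces)


masked = mask_with_difflib("hello world", "hello wurld")
print(masked)
```

Unlike the zip-based approach, SequenceMatcher aligns the two strings
first, so an inserted or deleted character does not mark everything
after it as different.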


