Finding duplicate file names and modifying them based on elements of the path
larry.martell at gmail.com
Fri Jul 20 03:01:36 CEST 2012
On Jul 19, 1:43 pm, Paul Rubin <no.em... at nospam.invalid> wrote:
> "Larry.Mart... at gmail.com" <larry.mart... at gmail.com> writes:
> > Thanks for the reply Paul. I had not heard of itertools. It sounds
> > like just what I need for this. But I am having 1 issue - how do you
> > know how many items are in each group?
> Simplest is:
> for key, group in groupby(xs, lambda x: x[-1]):  # group by last path element
>     gs = list(group)  # convert iterator to a list
>     n = len(gs)       # this is the number of elements
> there is some theoretical inelegance in that it requires each group to
> fit in memory, but you weren't really going to have billions of files
> with the same basename.
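A minimal, self-contained sketch of that grouping-and-counting approach (the `paths` list and the use of `os.path.basename` as the key are illustrative assumptions, not from the thread; note that groupby only merges *adjacent* items, so the input must be sorted by the same key first):

```python
from itertools import groupby
import os

# Hypothetical list of full paths (for illustration only).
paths = ["/a/x.txt", "/b/y.txt", "/c/x.txt"]

# groupby only groups adjacent items, so sort by the grouping key first.
paths.sort(key=os.path.basename)

counts = {}
for base, group in groupby(paths, key=os.path.basename):
    gs = list(group)        # convert the group iterator to a list
    counts[base] = len(gs)  # len() of that list is the group size

print(counts)  # {'x.txt': 2, 'y.txt': 1}
```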
> If you're not used to iterators and itertools, note there are some
> subtleties to using groupby to iterate over files, because an iterator
> actually has state. It bumps a pointer and maybe consumes some input
> every time you advance it. In a situation like the above, you've got
> some nested iterators (the groupby iterator generating groups, and the
> individual group iterators that come out of the groupby) that wrap the
> same file handle, so bad confusion can result if you advance both
> iterators without being careful (one can consume file input that you
> thought would go to another).
It seems that if you do a list(group) you have consumed the group's
iterator. This screwed me up for a while, and seems very
counter-intuitive.
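A tiny sketch of that pitfall (the data is made up): once list(group) has drained a group, reading it again yields nothing, because the group is a one-shot iterator, not a list.

```python
from itertools import groupby

data = ["aa", "ab", "ba"]  # already sorted by first letter
groups = groupby(data, key=lambda s: s[0])

key, group = next(groups)  # first group: key 'a'
first = list(group)        # consumes the group's iterator
again = list(group)        # the iterator is now exhausted

print(first)  # ['aa', 'ab']
print(again)  # []
```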
> This isn't as bad as it sounds once you get used to it, but it can be
> a source of frustration at first.
> BTW, if you just want to count the elements of an iterator (while
> consuming it),
> n = sum(1 for x in xs)
> counts the elements of xs without having to expand it into an
> in-memory list.
> Itertools really makes Python feel a lot more expressive and clean,
> despite little kinks like the above.
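The counting idiom above can be sketched with any lazy iterator (the generator here is just an example):

```python
# A lazy generator; nothing is materialized in memory.
gen = (c for c in "hello" if c != "l")

# Counts the elements while consuming the generator,
# without ever building a list of them.
n = sum(1 for x in gen)
print(n)  # 3
```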