Using enumerate to get line-numbers with itertools grouper?

Peter Otten __peter__ at web.de
Wed Sep 2 15:09:29 CEST 2015


Victor Hooi wrote:

> Hi Peter,
> 
> Hmm, are you sure that will work?

If you want the starting line for the batch, yes:

$ cat tmp.txt
alpha (line #1)
beta (line #2)
gamma (line #3)
delta (line #4)
epsilon (line #5)
zeta (line #6)
eta (line #7)
theta (line #8)
iota (line #9)
kappa (line #10)
$ cat grouper_demo.py
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # n references to the *same* iterator: each output tuple pulls
    # n consecutive items from it.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


_BATCH_SIZE = 3

with open("tmp.txt", 'r') as f:
    for index, chunk in enumerate(grouper(f, _BATCH_SIZE)):
        print("batch starting at line", index * _BATCH_SIZE + 1)
        print(chunk)

$ python3 grouper_demo.py 
batch starting at line 1
('alpha (line #1)\n', 'beta (line #2)\n', 'gamma (line #3)\n')
batch starting at line 4
('delta (line #4)\n', 'epsilon (line #5)\n', 'zeta (line #6)\n')
batch starting at line 7
('eta (line #7)\n', 'theta (line #8)\n', 'iota (line #9)\n')
batch starting at line 10
('kappa (line #10)\n', None, None)
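
If you also need the last line of each batch, one option is to drop the
None padding before counting. A minimal sketch, not part of the script
above: batch_line_ranges() is a name made up for illustration, and an
in-memory list stands in for the file.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def batch_line_ranges(lines, batch_size):
    """Yield (first_line, last_line, batch) for each batch.

    Hypothetical helper. Assumes no real item is None, which holds
    for lines read from a text file.
    """
    for index, chunk in enumerate(grouper(lines, batch_size)):
        # Remove the fill values that pad the final, short batch.
        batch = [line for line in chunk if line is not None]
        first = index * batch_size + 1
        yield first, first + len(batch) - 1, batch


# Stand-in for a ten-line file.
lines = ["line %d\n" % i for i in range(1, 11)]
ranges = [(first, last) for first, last, _ in batch_line_ranges(lines, 3)]
print(ranges)
```

With ten lines and a batch size of 3 the last range covers only line 10.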


> The indexes returned by enumerate will start from zero.
> 
> Also, I've realised line_number is a bit of a misnomer here - it's
> actually the index for the chunks that grouper() is returning.
> 
> So say I had a 10-line textfile, and I was using a _BATCH_SIZE of 50.
> 
> If I do:
> 
>     print(line_number * _BATCH_SIZE)
> 
> I'd just get (0 * 50) = 0 printed out 10 times.
> 
> Even if I add one:
> 
>     print((line_number + 1) * _BATCH_SIZE)
> 
> I will just get 50 printed out 10 times.

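So you are trying to solve a slightly different problem: you want the number
of every line, not just the first line of each batch. You can attack that by
moving the enumerate() call inside grouper(), so that each line is numbered
before it is batched: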

$ cat grouper_demo2.py
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # n references to the *same* iterator: each output tuple pulls
    # n consecutive items from it.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


_BATCH_SIZE = 3

with open("tmp.txt", 'r') as f:
    for chunk in grouper(
            enumerate(f, 1), _BATCH_SIZE, fillvalue=(None, None)):
        print("--- batch ---")
        for index, line in chunk:
            if index is None:
                break
            print(index, line, end="")
        print()
$ python3 grouper_demo2.py
--- batch ---
1 alpha (line #1)
2 beta (line #2)
3 gamma (line #3)

--- batch ---
4 delta (line #4)
5 epsilon (line #5)
6 zeta (line #6)

--- batch ---
7 eta (line #7)
8 theta (line #8)
9 iota (line #9)

--- batch ---
10 kappa (line #10)

$ 
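
If the (None, None) padding gets in the way, a variant without fill values
is possible with itertools.islice(). This is only a sketch of an
alternative, not what grouper() does: numbered_batches() is a hypothetical
helper, and the sample list stands in for the open file.

```python
from itertools import islice


def numbered_batches(lines, batch_size, start=1):
    """Yield lists of (line_number, line) pairs; the last may be short.

    Hypothetical helper, shown as an alternative to grouper().
    """
    numbered = enumerate(lines, start)
    while True:
        # islice() takes up to batch_size items from the shared iterator.
        batch = list(islice(numbered, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for the open file.
sample = ["alpha\n", "beta\n", "gamma\n", "delta\n"]
batches = list(numbered_batches(sample, 3))
for batch in batches:
    print("--- batch ---")
    for index, line in batch:
        print(index, line, end="")
```

Because the final batch is simply shorter, the inner loop needs no
"index is None" check.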



