Using enumerate to get line-numbers with itertools grouper?
Peter Otten
__peter__ at web.de
Wed Sep 2 09:09:29 EDT 2015
Victor Hooi wrote:
> Hi Peter,
>
> Hmm, are you sure that will work?
If you want the starting line for the batch, yes:
$ cat tmp.txt
alpha (line #1)
beta (line #2)
gamma (line #3)
delta (line #4)
epsilon (line #5)
zeta (line #6)
eta (line #7)
theta (line #8)
iota (line #9)
kappa (line #10)
$ cat grouper_demo.py
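from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # n references to the *same* iterator, so zip_longest pulls
    # n consecutive items into each tuple
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

_BATCH_SIZE = 3

with open("tmp.txt", 'r') as f:
    # index counts batches from 0; convert it to a 1-based line number
    for index, chunk in enumerate(grouper(f, _BATCH_SIZE)):
        print("batch starting at line", index * _BATCH_SIZE + 1)
        print(chunk)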
$ python3 grouper_demo.py
batch starting at line 1
('alpha (line #1)\n', 'beta (line #2)\n', 'gamma (line #3)\n')
batch starting at line 4
('delta (line #4)\n', 'epsilon (line #5)\n', 'zeta (line #6)\n')
batch starting at line 7
('eta (line #7)\n', 'theta (line #8)\n', 'iota (line #9)\n')
batch starting at line 10
('kappa (line #10)\n', None, None)
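The start-line arithmetic can also be folded into a small generator. This is only a minimal sketch: the name numbered_batches and the in-memory sample list are made up for illustration and are not part of the script above.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # Same recipe as above: n references to one shared iterator.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def numbered_batches(lines, batch_size):
    """Yield (first_line_number, chunk) pairs (hypothetical helper)."""
    for index, chunk in enumerate(grouper(lines, batch_size)):
        # Batch 0 starts at line 1, batch 1 at line batch_size + 1, ...
        yield index * batch_size + 1, chunk

# A list stands in for the open file here (assumption for the demo).
sample = ["a\n", "b\n", "c\n", "d\n"]
result = list(numbered_batches(sample, 3))
```

The last chunk is still padded with None, exactly as in the output above.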
> The indexes returned by enumerate will start from zero.
>
> Also, I've realised line_number is a bit of a misnomer here - it's
> actually the index for the chunks that grouper() is returning.
>
> So say I had a 10-line textfile, and I was using a _BATCH_SIZE of 50.
>
> If I do:
>
> print(line_number * _BATCH_SIZE)
>
> I'd just get (0 * 50) = 0 printed out 10 times.
>
> Even if I add one:
>
> print((line_number + 1) * _BATCH_SIZE)
>
> I will just get 50 printed out 10 times.
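So you are trying to solve a slightly different problem: you want a number
for every line, not one per batch. You can attack that by moving the
enumerate() call so that it numbers the lines before grouper() batches them: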
$ cat grouper_demo2.py
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

_BATCH_SIZE = 3

with open("tmp.txt", 'r') as f:
    # enumerate() yields (line_number, line) pairs, so the fill value
    # must be a pair as well to keep the unpacking below working
    for chunk in grouper(
            enumerate(f, 1), _BATCH_SIZE, fillvalue=(None, None)):
        print("--- batch ---")
        for index, line in chunk:
            if index is None:
                # padding reached: the file ended inside this batch
                break
            print(index, line, end="")
        print()
$ python3 grouper_demo2.py
--- batch ---
1 alpha (line #1)
2 beta (line #2)
3 gamma (line #3)

--- batch ---
4 delta (line #4)
5 epsilon (line #5)
6 zeta (line #6)

--- batch ---
7 eta (line #7)
8 theta (line #8)
9 iota (line #9)

--- batch ---
10 kappa (line #10)

$
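If the (None, None) padding and the "index is None" check get in the way, one alternative is to cut the batches with itertools.islice() instead of zip_longest(). This swaps in a different technique from the grouper() recipe used above, and the name numbered_chunks is hypothetical; treat it as a sketch, not a drop-in replacement.

```python
from itertools import islice

def numbered_chunks(iterable, n, start=1):
    """Yield lists of up to n (line_number, line) pairs, without padding."""
    numbered = enumerate(iterable, start)
    while True:
        # islice() takes at most n items; the final chunk may be shorter.
        chunk = list(islice(numbered, n))
        if not chunk:
            return
        yield chunk

# A list stands in for the open file here (assumption for the demo).
sample = ["alpha\n", "beta\n", "gamma\n", "delta\n"]
for chunk in numbered_chunks(sample, 3):
    print("--- batch ---")
    for index, line in chunk:
        print(index, line, end="")
```

The trade-off is that every batch is materialised as a list, and the last batch is simply shorter rather than padded to the batch size.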