Using enumerate to get line-numbers with itertools grouper?

Victor Hooi victorhooi at gmail.com
Mon Sep 14 05:44:16 CEST 2015


On Thursday, 3 September 2015 03:49:05 UTC+10, Terry Reedy  wrote:
> On 9/2/2015 6:04 AM, Victor Hooi wrote:
> > I'm using grouper() to iterate over a textfile in groups of lines:
> >
> > def grouper(iterable, n, fillvalue=None):
> >      "Collect data into fixed-length chunks or blocks"
> >      # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
> >      args = [iter(iterable)] * n
> >      return zip_longest(fillvalue=fillvalue, *args)
> >
> > However, I'd also like to know the line-number that I'm up to, for printing out in informational or error messages.
> >
> > Is there a way to use enumerate with grouper to achieve this?
> 
> Without a runnable test example, it is hard to be sure what you want. 
> However, I believe replacing 'iter(iterable)' with 'enumerate(iterable, 
> 1)', and taking into account that you will get (line_number, line) 
> tuples instead of lines, will do what you want.
> 
> -- 
> Terry Jan Reedy

Hi,

Hmm, I've tried that suggestion, but it doesn't seem to unpack the values correctly. In the loop below, line_number and chunk just give me two successive items from the iterable.

Below is the complete code I'm running:

    #!/usr/bin/env python3
    from datetime import datetime
    from itertools import zip_longest
    
    def grouper(iterable, n, fillvalue=None):
        "Collect data into fixed-length chunks or blocks"
        # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
        args = [enumerate(iterable, 1)] * n
        return zip_longest(fillvalue=fillvalue, *args)
    
    
    def parse_iostat(lines):
        """Parse lines of iostat information, yielding iostat blocks.
    
        lines should be an iterable yielding separate lines of output
        """
        block = None
        for line in lines:
            line = line.strip()
            try:
                if ' AM' in line or ' PM' in line: # What happens if their device names have AM or PM?
                    tm = datetime.strptime(line, "%m/%d/%Y %I:%M:%S %p")
                else:
                    tm = datetime.strptime(line, "%m/%d/%y %H:%M:%S")
                if block: yield block
                block = [tm]
            except ValueError:
                # It's not a new timestamp, so add it to the existing block
                # We ignore the iostat startup lines (which deals with random restarts of iostat), as well as empty lines
                if '_x86_64_' not in line:
                    block.append(line)
        if block: yield block
    
    with open('iostat_sample_12hr_time', 'r') as f:
        f.__next__() # Skip the "Linux..." line
        f.__next__() # Skip the blank line
        for line_number, chunk in grouper(parse_iostat(f), 2):
            print("Line Number: {}".format(line_number))
            print("Chunk: {}".format(chunk))
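
For reference, here is a minimal sketch of what the modified grouper() yields when n=2. The list of strings is a hypothetical stand-in for the parsed iostat blocks, not real data. Each group is a tuple of n (index, item) pairs, so unpacking a group into two names binds each name to one whole pair, and the index counts items from the iterable rather than lines in the file.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # Same shape as the modified grouper: one shared enumerate
    # iterator repeated n times, so each group draws n pairs from it.
    args = [enumerate(iterable, 1)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# Hypothetical stand-ins for the blocks yielded by parse_iostat().
blocks = ["block-a", "block-b", "block-c"]
groups = list(grouper(blocks, 2))

for first, second in groups:
    # first and second are (index, item) pairs, or the fillvalue
    # when the iterable runs out part-way through a group.
    print(first, second)
```

This seems to match the behaviour described: the two loop variables receive successive items, each already wrapped with its own counter.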


Here is the input file:

    Linux 3.19.0-20-generic (ip-172-31-12-169)      06/25/2015      _x86_64_        (2 CPU)
    
    06/25/2015 07:37:04 AM
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.02    0.00    0.02    0.00    0.00   99.95
    
    Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    xvdap1            0.00     0.04    0.03    0.07     0.00     0.00    84.96     0.00   30.36    2.74   42.83   0.53   0.01
    xvdb              0.00     0.00    0.00    0.00     0.00     0.00    11.62     0.00    0.23    0.19    2.13   0.16   0.00
    xvdf              0.00     0.00    0.00    0.00     0.00     0.00    10.29     0.00    0.41    0.41    0.73   0.38   0.00
    xvdg              0.00     0.00    0.00    0.00     0.00     0.00     9.12     0.00    0.36    0.35    1.20   0.34   0.00
    xvdh              0.00     0.00    0.00    0.00     0.00     0.00    33.35     0.00    1.39    0.41    8.91   0.39   0.00
    dm-0              0.00     0.00    0.00    0.00     0.00     0.00    11.66     0.00    0.46    0.46    0.00   0.37   0.00
    
    06/25/2015 07:37:05 AM
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.50    0.00    0.50    0.00    0.00   99.01
    
    Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    xvdg              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    xvdh              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Essentially, in the full code, what I'd like to be able to do is process an "iostat" file, which contains "blocks" of iostat output, and know at any point what line number I'm up to in the original file.
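
One possible way to get there, sketched under some assumptions: number the raw lines before they are grouped into blocks, and have the parser yield the line number alongside each block. The names parse_timestamp and parse_iostat_numbered are hypothetical, and this sketch drops empty lines, which the code above does not do.

```python
from datetime import datetime

def parse_timestamp(line):
    """Return a datetime if line is an iostat timestamp, else None."""
    for fmt in ("%m/%d/%Y %I:%M:%S %p", "%m/%d/%y %H:%M:%S"):
        try:
            return datetime.strptime(line, fmt)
        except ValueError:
            continue
    return None

def parse_iostat_numbered(lines, start=1):
    """Yield (line_number, block) pairs.

    line_number is the position of the block's timestamp line,
    counting the first line of `lines` as `start`.
    """
    block = None
    block_start = None
    for line_number, line in enumerate(lines, start):
        line = line.strip()
        tm = parse_timestamp(line)
        if tm is not None:
            # A new timestamp closes the previous block, if any.
            if block:
                yield block_start, block
            block, block_start = [tm], line_number
        elif block is not None and line and '_x86_64_' not in line:
            # Assumption: skip blank lines and the iostat startup line.
            block.append(line)
    if block:
        yield block_start, block

# Shortened sample in the same layout as the input file.
sample = [
    "Linux 3.19.0-20-generic (ip-172-31-12-169)      06/25/2015      _x86_64_        (2 CPU)",
    "",
    "06/25/2015 07:37:04 AM",
    "avg-cpu:  %user   %nice %system %iowait  %steal   %idle",
    "",
    "06/25/2015 07:37:05 AM",
    "avg-cpu:  %user   %nice %system %iowait  %steal   %idle",
]
results = list(parse_iostat_numbered(sample))
for line_number, block in results:
    print("Line Number: {}".format(line_number))
    print("Chunk: {}".format(block))
```

If the first two lines are skipped with next(f) before parsing, passing start=3 should keep the numbers aligned with the original file. The numbered blocks can still be passed through the original grouper() (the version using iter(iterable)), since each item already carries its own line number.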

Cheers,
Victor

