There must be a better way

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Apr 23 10:15:21 EDT 2013


On 23 April 2013 14:36, Neil Cerutti <neilc at norwich.edu> wrote:
> On 2013-04-22, Colin J. Williams <cjw at ncf.ca> wrote:
>> Since I'm only interested in one or two columns, the simpler
>> approach is probably better.
>
> Here's a sketch of how one of my projects handles that situation.
> I think the index variables are invaluable documentation, and
> make it a bit more robust. (Python 3, so not every bit is
> relevant to you).
>
> with open("today.csv", encoding='UTF-8', newline='') as today_file:
>     reader = csv.reader(today_file)
>     header = next(reader)

I once had a bug that took a long time to track down and was caused by
using next() without an enclosing try/except StopIteration (or the
optional default argument to next).
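For reference, here is a minimal sketch of the two ways of calling next() mentioned above. It uses a throwaway two-item iterator (the names are illustrative only). When a default argument is given, next() returns it once the iterator is exhausted. Without one, next() raises StopIteration, and the caller has to catch it explicitly.

```python
numbers = iter([1, 2])

first = next(numbers)         # 1
second = next(numbers)        # 2
third = next(numbers, None)   # iterator exhausted: the default None is returned

try:
    next(numbers)             # no default given: StopIteration is raised
    raised = False
except StopIteration:
    raised = True

print(first, second, third, raised)  # 1 2 None True
```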

This is a sketch of how you can get the bug that I had:

$ cat next.py
#!/usr/bin/env python

def join(iterables):
    '''Join iterable of iterables, stripping first item'''
    for iterable in iterables:
        iterator = iter(iterable)
        header = next(iterator)  # Here's the problem
        for val in iterator:
            yield val

data = [
    ['foo', 1, 2, 3],
    ['bar', 4, 5, 6],
    [], # Whoops! Who put this empty iterable here?
    ['baz', 7, 8, 9],
]

for x in join(data):
    print(x)

$ ./next.py
1
2
3
4
5
6

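The values 7, 8 and 9 are not printed, yet no error message is shown.
Calling next() on the iterator over the empty list raises a
StopIteration that is not caught inside the join generator. That
StopIteration ends the generator and is then "caught" by the for loop
iterating over join(), which treats it as the normal end of iteration
and terminates prematurely. Since the exception is caught and cleared
by the for loop, there's no practical way to get a debugger to hook
into the event that causes it. (This describes Python 2 and Python 3
before 3.7. Since Python 3.7, PEP 479 converts a StopIteration that
escapes from a generator into a RuntimeError, so this particular
example now fails with a traceback instead of silently truncating the
output.)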
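Here is one possible repair of join(), as a minimal sketch. It assumes that an empty iterable should simply be skipped. Raising an explicit error such as ValueError might be the better choice if an empty iterable indicates bad data. The StopIteration from next() is caught inside the generator, so it can no longer end the outer iteration.

```python
def join(iterables):
    '''Join iterable of iterables, stripping first item.

    Empty iterables are skipped instead of ending the whole join.
    '''
    for iterable in iterables:
        iterator = iter(iterable)
        try:
            next(iterator)  # discard the first item (the header)
        except StopIteration:
            continue        # empty iterable: nothing to strip or yield
        for val in iterator:
            yield val

data = [
    ['foo', 1, 2, 3],
    ['bar', 4, 5, 6],
    [],
    ['baz', 7, 8, 9],
]

print(list(join(data)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```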

In my case this happened somewhere in the middle of a long-running
process. It was difficult to pin down the cause because the iteration
was over non-constant data and I didn't know what I was looking for.
Because of the time I spent fixing it, whenever I call next() I'm now
always careful to think about what a StopIteration would do in that
context.

In Neil's example, next(reader) raises StopIteration if the csv file is empty:

>>> import csv
>>> with open('test.csv', 'w'): pass
...
>>> with open('test.csv') as csvfile:
...     reader = csv.reader(csvfile)
...     header = next(reader)
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
StopIteration

If that code were called from a generator then it would most likely be
susceptible to the problem I'm describing. The fix is to use
next(reader, None) or try/except StopIteration.
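The sketch below applies the next(reader, None) fix to the original use case of pulling one column out of a csv file by its header name. The helper read_column is hypothetical, and io.StringIO stands in for a file opened with newline=''. Returning an empty list for an empty file is an assumption. A caller that treats an empty file as an error could raise instead. A missing column name still raises ValueError from header.index().

```python
import csv
import io

def read_column(csvfile, column):
    '''Return the values in the named column, or [] if the file is empty.'''
    reader = csv.reader(csvfile)
    header = next(reader, None)  # None instead of StopIteration on an empty file
    if header is None:
        return []
    index = header.index(column)  # ValueError if the column is missing
    return [row[index] for row in reader]

print(read_column(io.StringIO(''), 'value'))                         # []
print(read_column(io.StringIO('name,value\r\na,1\r\nb,2\r\n'), 'value'))  # ['1', '2']
```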


Oscar



More information about the Python-list mailing list