string_join overrides TypeError exception thrown in generator
Hi,

An interesting problem was pointed out to me, which I have distilled to this test case:

    def gen():
        raise TypeError, "I am a TypeError"
        yield 1

    def one():
        return ''.join( x for x in gen() )

    def two():
        return ''.join([x for x in gen()])

    for x in one, two:
        try:
            x()
        except TypeError, e:
            print e

Expected output is:

    """
    I am a TypeError
    I am a TypeError
    """

Actual output is:

    """
    sequence expected, generator found
    I am a TypeError
    """

Upon looking at the implementation of 'string_join' in stringobject.c [1], it's quite obvious what's gone wrong: an exception has been triggered in PySequence_Fast, and string_join overrides that exception. It assumes that the only TypeErrors thrown by PySequence_Fast are caused by 'orig' being an invalid sequence type, and ignores the possibility that a TypeError could be thrown while exhausting a generator.

    seq = PySequence_Fast(orig, "");
    if (seq == NULL) {
        if (PyErr_ExceptionMatches(PyExc_TypeError))
            PyErr_Format(PyExc_TypeError,
                         "sequence expected, %.80s found",
                         orig->ob_type->tp_name);
        return NULL;
    }

I can't see an obvious solution, but perhaps generators should get special treatment regardless. Reading over this code, it looks like the generator is exhausted all at once, instead of incrementally.

-- 
Stephen Thorne
Development Engineer

[1] http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Objects/stringobject.c?rev=2.231&view=markup
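The masking described in this message can be modelled in pure Python. What follows is only a sketch in modern Python 3 syntax, not CPython's actual code: `masking_join`, `careful_join` and `message` are hypothetical names, and `list(orig)` stands in for the work PySequence_Fast does. The second function shows one possible way to tell the two failures apart, by checking iterability separately from consuming the iterator. It is not presented as the fix that should be (or was) applied to string_join.

```python
def gen():
    raise TypeError("I am a TypeError")
    yield 1


def masking_join(sep, orig):
    """Model of the reported behaviour: every TypeError is replaced."""
    try:
        seq = list(orig)  # stands in for PySequence_Fast(orig, "")
    except TypeError:
        # The original exception is discarded, whatever its cause.
        raise TypeError(
            "sequence expected, %.80s found" % type(orig).__name__
        ) from None
    return sep.join(seq)


def careful_join(sep, orig):
    """Hypothetical alternative: only rewrite the 'not iterable' error."""
    try:
        it = iter(orig)  # fails here only if orig cannot be iterated
    except TypeError:
        raise TypeError(
            "sequence expected, %.80s found" % type(orig).__name__
        ) from None
    # Errors raised while consuming the iterator propagate unchanged.
    seq = list(it)
    return sep.join(seq)


def message(func, *args):
    """Return the TypeError message raised by func(*args), or None."""
    try:
        func(*args)
    except TypeError as e:
        return str(e)
    return None


print(message(masking_join, "", gen()))
print(message(careful_join, "", gen()))
print(message(careful_join, "", 42))
```

One limitation of this sketch: a TypeError raised inside a user-defined `__iter__` method would still be rewritten by `careful_join`, since it surfaces from the `iter()` call.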
Stephen Thorne wrote:
> I can't see an obvious solution, but perhaps generators should get special treatment regardless. Reading over this code it looks like the generator is exhausted all at once, instead of incrementally.
Indeed - str.join uses a multipass approach to build the final string, so it needs to ensure it has a reiterable to play with. PySequence_Fast achieves that, at the cost of dumping a generator into a sequence rather than building a string from it directly.

Unicode.join uses PySequence_Fast too, and has the same problem with masking the TypeError from the generator.

The calling code simply can't tell if the NULL return was set directly by PySequence_Fast, or was relayed by PySequence_List (which got it from _PyList_Extend, which got it from listextend, which got it from iternext, etc).

This is the kind of problem that PEP 344 is designed to solve :)

This also shows that argument validation is one of the cases where using an iterable instead of a generator is a good thing, since errors get raised where the generator is created, instead of where it is first used:

    class gen(object):
        def __init__(self):
            raise TypeError, "I am a TypeError"
        def __iter__(self):
            yield 1

    def one():
        return ''.join( x for x in gen() )

    def two():
        return ''.join([x for x in gen()])

    for x in one, two:
        try:
            x()
        except TypeError, e:
            print e

Hmm, makes me think of a neat little decorator:

    def step_on_creation(gen):
        def start_gen(*args, **kwds):
            g = gen(*args, **kwds)
            g.next()
            return g
        start_gen.__name__ = gen.__name__
        start_gen.__doc__ = gen.__doc__
        start_gen.__dict__ = gen.__dict__
        return start_gen

    @step_on_creation
    def gen():
        # Setup executed at creation time
        raise TypeError, "I am a TypeError"
        yield None
        # The actual iteration steps
        yield 1

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.blogspot.com
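For readers on current Python, the step_on_creation idea can be sketched roughly as follows. This is an adaptation, not the code from the message: it assumes Python 3, uses `next(g)` in place of `g.next()`, lets `functools.wraps` copy the name, docstring and attributes, and the `checked_numbers` generator is a hypothetical example invented for illustration.

```python
import functools


def step_on_creation(gen_func):
    """Run a generator's setup code when it is created, not on first use."""
    @functools.wraps(gen_func)  # copies __name__, __doc__, __dict__, etc.
    def start_gen(*args, **kwds):
        g = gen_func(*args, **kwds)
        next(g)  # advance to the first yield, executing the setup section
        return g
    return start_gen


@step_on_creation
def checked_numbers(limit):
    """Yield the numbers below limit as strings (hypothetical example)."""
    # Setup executed at creation time
    if not isinstance(limit, int):
        raise TypeError("limit must be an int")
    yield None  # end of setup; this value is consumed by the decorator
    # The actual iteration steps
    for i in range(limit):
        yield str(i)


# Valid arguments: the generator works as usual once created.
print("".join(checked_numbers(3)))

# Invalid arguments: the error appears at creation, before any join call.
try:
    numbers = checked_numbers("3")
except TypeError as e:
    print(e)
```

The cost of this design is that every decorated generator needs a placeholder `yield` to mark where setup ends, and a generator that finishes before reaching it would raise StopIteration from the wrapper.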