[Cython] Redundant Cython exception message strings

Robert Bradshaw robertwb at math.washington.edu
Sat May 28 18:14:04 CEST 2011


On Sat, May 28, 2011 at 1:15 AM, Vitja Makarov <vitja.makarov at gmail.com> wrote:
> 2011/5/28 Robert Bradshaw <robertwb at math.washington.edu>:
>> On Fri, May 27, 2011 at 3:32 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>> Hi,
>>>
>>> I recently stumbled over a tradeoff question with AttributeError, and now
>>> found the same situation for UnboundLocalError in Vitja's control flow
>>> branch. So here it is.
>>>
>>> When we raise an exception several times in different parts of the code with
>>> a message that only differs slightly each time (usually something like
>>> "'NoneType' has no attribute X", or "local variable X referenced before
>>> assignment"), we have three choices to handle this:
>>>
>>> 1) Optimise for speed: create a Python string object at module
>>> initialisation time and call PyErr_SetObject(exc_type, msg_str_obj).
>>>
>>> 2) Current way: let CPython create the string object when raising the
>>> exception and just call PyErr_SetString(exc_type, "complete message").
>>>
>>> 3) Trade speed for size and allow the C compiler to reduce the storage
>>> redundancy: write only the message template and the names as C char*
>>> constants by calling PyErr_Format(exc_type, "message template %s", "X").
>>>
>>> Assuming that exceptions should be exceptional, I'm leaning towards 3). This
>>> would allow the C compiler to collapse multiple usages of the same C string
>>> into one data constant, thus reducing a bit of redundancy in the shared
>>> library size and the memory footprint. However, it would (slightly?) slow
>>> down the exception raising due to the additional string formatting, even
>>> when compared to the need to build a Python string object that it shares
>>> with 2). While 1) would obviously be the fastest way to raise an exception
>>> (no memory allocation, only refcounting), I think it's not worth it for
>>> exceptions as it increases both the runtime memory overhead and the module
>>> startup time.
>>>
>>> Thoughts?
>>
>> Any back-of-the-envelope calculations on how much the savings would
>> be? I think I'm leaning towards 3 as well, certainly not option 1.
>>
>
> For UnboundLocalError and NameError I used option 2):
>
> https://github.com/vitek/cython/commit/1fe86b85d965753244cd09db38b1089b40f09a58
>
> So maybe I should add functions like __Pyx_RaiseUnboundLocalError and
> __Pyx_RaiseClosureNameError that use option 3).
> What do you think of the put_error_if_unbound CCodeWriter method? Is that
> the right place for it?

I don't think abstracting it out to a function really saves anything
here given that Python already has PyErr_Format.
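
For illustration only, here is a minimal sketch of what option 3) amounts
to. It is plain C so that it compiles without the Python headers: snprintf
stands in for PyErr_Format, and the helper names (format_unbound_local,
bytes_full_messages, bytes_template_and_names) are made up for this example
and exist in neither Cython nor CPython. The two byte counters are only a
back-of-the-envelope estimate of string constant storage, assuming every
literal is stored exactly once and nothing else is shared.

```c
#include <stdio.h>
#include <string.h>

/* Option 3): one shared template; the variable name is filled in at raise time. */
static const char UNBOUND_LOCAL_TEMPLATE[] =
    "local variable '%s' referenced before assignment";

/* Hypothetical stand-in for
 *   PyErr_Format(PyExc_UnboundLocalError, UNBOUND_LOCAL_TEMPLATE, name);
 * Writes the message into buf and returns the length reported by snprintf. */
static int format_unbound_local(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, UNBOUND_LOCAL_TEMPLATE, name);
}

/* Rough storage estimate for option 2): one complete message literal
 * per variable name, each with its terminating NUL. */
static size_t bytes_full_messages(const char *const *names, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        /* template text without the two-character "%s", plus name, plus NUL */
        total += (sizeof(UNBOUND_LOCAL_TEMPLATE) - 1) - 2 + strlen(names[i]) + 1;
    }
    return total;
}

/* Rough storage estimate for option 3): the template once, plus each
 * name as its own NUL-terminated constant. */
static size_t bytes_template_and_names(const char *const *names, size_t count)
{
    size_t total = sizeof(UNBOUND_LOCAL_TEMPLATE); /* includes the NUL */
    for (size_t i = 0; i < count; i++) {
        total += strlen(names[i]) + 1;
    }
    return total;
}
```

With the two names "x" and "total", this estimate gives 57 bytes for
option 3) against 100 bytes for option 2). The gap grows with every further
variable that needs its own message.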

- Robert

