[Python-Dev] Encoding of PyFrameObject members
Xavier de Gaye
xdegaye at gmail.com
Sat Feb 7 11:13:00 CET 2015
On 02/06/2015 11:48 PM, Francis Giraldeau wrote:
> 2015-02-06 6:04 GMT-05:00 Armin Rigo:
>
> Hi,
>
> On 6 February 2015 at 08:24, Maciej Fijalkowski <fijall at gmail.com> wrote:
> > I don't think it's safe to assume f_code is properly filled by the
> > time you might read it, depending a bit where you find the frame
> > object. Are you sure it's not full of garbage?
>
>
> Yes, before discussing how to do the utf8 decoding, we should realize
> that it is really unsafe code starting from the line before. From a
> signal handler you're only supposed to read data that was written to
> "volatile" fields. So even PyEval_GetFrame(), which is done by
> reading the thread state's "frame" field, is not safe: this is not a
> volatile. This means that the compiler is free to do crazy things
> like *first* write into this field and *then* initialize the actual
> content of the frame. The uninitialized content may be garbage, not
> just NULLs.
>
>
> Thanks for these comments. Of course, accessing frames within a signal handler is racy. I confirm that code encoded in non-ASCII is not accessible from the utf8 buffer pointer. However, a call
> to PyUnicode_AsUTF8() encodes the data and caches it in the unicode object. Later accesses return the cached byte buffer without memory allocation or re-encoding.
>
> I think it is possible to solve both safety problems by registering a handler with PyEval_SetProfile(). On function entry, the handler will call PyUnicode_AsUTF8() on the required frame members to
> make sure the utf8 encoded string is available. Then, we increment the refcount of the frame and assign it to a thread local pointer. On function return, the refcount is decremented. These operations
> occur in the normal context and they are not racy. The signal handler will use the thread local frame pointer instead of calling PyEval_GetFrame(). Does that sound good?
You could call Py_AddPendingCall() from your signal handler and access the
frame members from the function scheduled by Py_AddPendingCall().
Xavier
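
As an illustration of the deferral idea only, and not of the C extension under
discussion: Python's own signal module follows the same pattern. The low-level
handler just sets a flag, and the Python-level handler runs later in the main
thread, where it receives the interrupted frame as an ordinary object. The
sketch below assumes a POSIX platform with SIGUSR1; record_frame,
busy_function and sample_once are hypothetical names chosen for the example.

```python
import os
import signal

samples = []  # filled by the deferred Python-level handler only


def record_frame(signum, frame):
    """Runs in the main thread, not inside the C-level signal handler."""
    code = frame.f_code
    # co_filename and co_name are ordinary str objects at this point, so no
    # manual UTF-8 decoding of PyFrameObject members is needed.
    samples.append((code.co_filename, code.co_name, frame.f_lineno))


def busy_function():
    """Hypothetical workload that is sampled once."""
    os.kill(os.getpid(), signal.SIGUSR1)  # send the signal to this process
    total = 0
    for i in range(1000):  # keep executing bytecode inside this frame
        total += i
    return total


def sample_once():
    """Install the handler, run the workload, restore the old handler."""
    previous = signal.signal(signal.SIGUSR1, record_frame)
    try:
        busy_function()
    finally:
        signal.signal(signal.SIGUSR1, previous)
    return samples[-1]
```

Calling sample_once() from the main thread returns a (filename, function name,
line number) tuple for the frame that was running when the signal arrived. A C
extension that schedules its own callback with Py_AddPendingCall() would read
the frame in a comparable safe context, but that part is not shown here.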