[Python-Dev] String views

skip@pobox.com skip at pobox.com
Fri Sep 2 05:14:52 CEST 2005


    >> I'm skeptical about performance as well, but not for that reason.  A
    >> string object can have a referent field.  If not NULL, it refers to
    >> another string object which is INCREFed in the usual way.  At string
    >> deallocation, if the referent is not NULL, the referent is DECREFed.
    >> If the referent is NULL, ob_sval is freed.

    Michael> Won't work. A string may have multiple referents, so a single
    Michael> referent field isn't sufficient.

Hmmm...  I implemented it last night (though it has yet to be tested).  I
suspect it will work.  Here's my PyStringObject struct:

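    typedef struct {
        PyObject_VAR_HEAD
        long ob_shash;          /* cached hash, -1 if not yet computed */
        int ob_sstate;          /* interning state */
        PyObject *ob_referent;  /* string this is a view onto, or NULL */
        char *ob_sval;          /* first character of this string's data */
    } PyStringObject;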

(minus the invariants which I have yet to check).  Suppose url is a string
object whose value is "http://www.python.org/", and that it has a reference
count of 1 and isn't a view onto another string.  Its ob_referent field
would be NULL.  (Maybe it would be better named "ob_target".)  If we then
execute

    before, sep, after = url.partition(":")

upon return, before, sep and after would be string objects whose
ob_referent field refers to url, and url's reference count would be 4 (its
original reference plus one INCREF per view).  Their ob_sval fields would
point to the start of their piece of url.  When the reference counts of
before, sep and after reach zero, they are reclaimed.  Since each has a
non-NULL ob_referent field, its target (url) is DECREFed, but its ob_sval
field is not freed.  When url's own reference count later reaches zero, its
ob_referent field is NULL, so its ob_sval buffer is freed.
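To make the reference-counting walk-through concrete, here is a toy Python model of the scheme. It is a simulation under stated assumptions, not CPython code: `StrObj`, its `incref`/`decref` methods, the `partition` helper and the `freed_buffers` counter are all hypothetical names, and a shared `bytearray` plus a start offset stands in for the `ob_sval` pointer.

```python
class StrObj:
    """Toy model of the proposed view-capable string (hypothetical names)."""

    freed_buffers = 0  # how many owned character buffers have been "freed"

    def __init__(self, data=b"", referent=None, start=0, length=0):
        self.refcnt = 1
        self.referent = referent          # plays the role of ob_referent
        if referent is None:
            self.buf = bytearray(data)    # owner: has its own buffer
            self.start, self.length = 0, len(data)
        else:
            referent.incref()             # a view INCREFs its target
            self.buf = referent.buf       # shares the target's bytes...
            self.start, self.length = start, length  # ...like ob_sval

    def value(self):
        return bytes(self.buf[self.start:self.start + self.length])

    def incref(self):
        self.refcnt += 1

    def decref(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            if self.referent is not None:
                self.referent.decref()    # view: DECREF target, keep buffer
            else:
                StrObj.freed_buffers += 1  # owner: free the buffer
            self.buf = None


def partition(s, sep):
    """View-returning partition sketch; offsets are absolute in s.buf."""
    i = s.value().find(sep)
    n = len(sep)
    if i < 0:                             # not found: (s, "", "")
        i, n = s.length, 0
    base = s.start
    return (StrObj(referent=s, start=base, length=i),
            StrObj(referent=s, start=base + i, length=n),
            StrObj(referent=s, start=base + i + n, length=s.length - i - n))


url = StrObj(b"http://www.python.org/")
before, sep, after = partition(url, b":")
print(url.refcnt)                                  # 4
print(before.value(), sep.value(), after.value())  # b'http' b':' b'//www.python.org/'
for piece in (before, sep, after):
    piece.decref()
print(url.refcnt, StrObj.freed_buffers)            # 1 0
url.decref()
print(StrObj.freed_buffers)                        # 1
```

Only the owner ever "frees" a buffer; the three views merely drop their hold on url, which is why a single referent field per object is enough even when several views share one target.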

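The only tricky business was PyString_AsString.  Callers expect the
returned char * to be NUL-terminated, but a view's characters are, in
general, followed by more of its target's data rather than by a NUL byte.
So if the argument object is a view, you have to "un-view" it: copy its own
characters into a fresh NUL-terminated buffer, then DECREF the ob_referent
and clear the field.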
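A toy Python model can show why the un-viewing step is needed. It is a sketch under stated assumptions, not CPython code: `StrObj`, `c_string` and `as_string` are hypothetical names, a shared `bytearray` plus a start offset stands in for the `ob_sval` pointer, and freeing a target whose count drops to zero is left out. Reading a view the way C reads a `char *` runs past the view's end into the rest of its target; copying into a private NUL-terminated buffer fixes that.

```python
class StrObj:
    """Hypothetical toy model: owners keep a trailing NUL, as C strings do."""

    def __init__(self, data=b"", referent=None, start=0, length=0):
        self.refcnt = 1
        self.referent = referent                # plays the role of ob_referent
        if referent is None:
            self.buf = bytearray(data) + b"\0"  # owned, NUL-terminated
            self.start, self.length = 0, len(data)
        else:
            referent.refcnt += 1                # INCREF the target
            self.buf = referent.buf             # share the target's bytes
            self.start, self.length = start, length


def c_string(buf, start):
    """Read the way C code reads a char *: up to the first NUL byte."""
    return bytes(buf[start:buf.index(0, start)])


def as_string(s):
    """Sketch of an un-viewing PyString_AsString (hypothetical)."""
    if s.referent is not None:
        # Copy only this string's bytes and append the NUL callers rely on.
        s.buf = bytearray(s.buf[s.start:s.start + s.length]) + b"\0"
        s.start = 0
        s.referent.refcnt -= 1                  # DECREF the old target
        s.referent = None                       # no longer a view
    return c_string(s.buf, s.start)


url = StrObj(b"http://www.python.org/")
before = StrObj(referent=url, start=0, length=4)  # view of b"http"
print(c_string(before.buf, before.start))  # b'http://www.python.org/'
print(as_string(before))                   # b'http'
print(url.refcnt, before.referent)         # 1 None
```

After un-viewing, the object owns its data outright, so later calls return the same buffer without copying again.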

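I wonder if the use of views would offset the overhead of returning to a
double-malloc allocation (with ob_sval now a pointer, every non-view string
needs one malloc for the object and a second for its character buffer).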

Skip

