Speeding up: s += "string"

Beni Cherniavsky cben at techunix.technion.ac.il
Tue Apr 15 16:17:59 CEST 2003

Oren Tirosh wrote on 2003-04-15:

> Guido is reluctant to add major new code to the core unless it gets some
> serious real-world testing as an extension module. But in the case of
> strings there is a problem: you can't create an object that will be a
> drop-in replacement for a string in Python unless it's actually derived
> from PyStringObject and has the same internal structure. It can be an
> object that behaves much like a string and it can have an __str__ method
> to convert it into a string - but it will not be compatible with a
> string in many situations. For example, it will not be accepted as an
> argument to string methods. It's not enough to make an object that can
> be converted into a string (almost any Python object can). It needs to
> *be* a string without sharing the same internal structure.

I bumped into this once while trying to create truly transparent
proxies that log all accesses.  Many primitives written in C not only
require their arguments to be isinstance() of some built-in class
(string, list, etc.) but also access its internals directly, bypassing
Python-land customizations.  This means subclassing built-in types
doesn't really work.  At best I missed logging some accesses; at
worst I got outright bugs.
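
The bypassing is easy to reproduce.  What follows is only a minimal
sketch, written for current Python 3 rather than the interpreter of the
time; LoggingStr is a hypothetical name used for illustration.  Only
the indexing that goes through Python-level dispatch is logged; the
methods implemented in C read the string's internal buffer and never
call the overrides.

```python
class LoggingStr(str):
    """Hypothetical str subclass that tries to log every element access."""

    def __new__(cls, value):
        self = super().__new__(cls, value)
        self.log = []  # str subclasses may carry instance attributes
        return self

    def __getitem__(self, index):
        self.log.append(("getitem", index))
        return super().__getitem__(index)

    def __iter__(self):
        self.log.append(("iter",))
        return super().__iter__()


s = LoggingStr("hello")

first = s[0]                 # dispatched to the Python-level override
upper = s.upper()            # C code: works on the internal buffer
pos = s.find("l")            # C code: same
joined = "-".join([s, s])    # C code: same

# Only the explicit indexing was recorded.
# s.log == [("getitem", 0)]
```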

> Such objects could be derived from the basestring class without being a
> subclass of either str or unicode. The only method they must implement
> is __str__. All other string methods and operations will be emulated by
> calling __str__ and passing the result to the original string method.
> The object may optionally override other methods to provide a more
> efficient implementation.

Seems a good approach to me.
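
At the Python level the emulation could look roughly like this.  It is
only a sketch under stated assumptions: StrEmulator and Greeting are
hypothetical names, the code targets current Python 3 (which has no
basestring), and it does nothing about the C-level checks, which are
the hard part of the proposal.

```python
class StrEmulator:
    """Hypothetical base class: subclasses implement only __str__."""

    def __str__(self):
        raise NotImplementedError("subclasses must implement __str__")

    def __getattr__(self, name):
        # Reached only when normal lookup fails, so ordinary string
        # methods (upper, split, find, ...) end up here.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(str(self), name)

    # Operators are looked up on the type, not the instance, so they
    # need explicit forwarding.
    def __len__(self):
        return len(str(self))

    def __getitem__(self, index):
        return str(self)[index]

    def __add__(self, other):
        return str(self) + str(other)


class Greeting(StrEmulator):
    """Example subclass that supplies nothing but __str__."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return "hello, " + self.name


g = Greeting("world")
shouted = g.upper()       # emulated through __str__
parts = g.split(", ")     # emulated through __str__
combined = g + "!"        # forwarded operator
```

Without the core changes described in the quoted proposal, C code
still rejects such an object: "-".join([g]) raises TypeError because g
is not an actual str instance.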

> With this kind of infrastructure in place it
> would be possible to experiment with mutable strings, lazy concatenation,
> shared-buffer strings, etc as extension modules. The best solution could
> be considered for eventual integration into the core.
> It would require quite a lot of changes (ParseTuple, almost all the
> places that call PyString_Check, etc) but I think the Python code might
> actually benefit from it and become smaller and cleaner.  Currently
> there are a lot of places with hard-wired checks for string/unicode and
> objects exposing the buffer interface. These could be handled in a more
> generic way.

I'm +1 on anything that will extend the Pythonic way of informal
interfaces and "type checking by behavior" to interactions with
primitives.  I don't know the C code well enough to comment on how to
do it.
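
For the Python side only, one of the experiments mentioned in the
quote, lazy concatenation for s += "string", might be sketched as
below.  LazyConcat is a hypothetical name, the code assumes current
Python 3, and it is an illustration of the idea rather than a proposed
implementation.

```python
class LazyConcat:
    """Hypothetical string-like object that defers concatenation."""

    def __init__(self, initial=""):
        self._pieces = [str(initial)]

    def __iadd__(self, other):
        # Appending to a list avoids copying the accumulated text on
        # every +=.
        self._pieces.append(str(other))
        return self

    def __str__(self):
        # Join the pieces once; later calls reuse the joined result.
        if len(self._pieces) > 1:
            self._pieces = ["".join(self._pieces)]
        return self._pieces[0]

    def __len__(self):
        return sum(len(piece) for piece in self._pieces)

    def __getattr__(self, name):
        # Fall back to the real str method on the materialized value.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(str(self), name)


s = LazyConcat("spam")
for word in ("eggs", "ham"):
    s += " " + word       # no large string is rebuilt here

result = str(s)           # the single join happens here
upper = s.upper()         # emulated through __str__
```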

Beni Cherniavsky <cben at tx.technion.ac.il>
