[Python-Dev] RFC: Add a new builtin strarray type to Python?

Nick Coghlan ncoghlan at gmail.com
Sun Oct 2 05:13:53 CEST 2011


On Sat, Oct 1, 2011 at 1:17 PM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> Most bytearray methods return a new object in most cases. I don't understand
> why, it's not efficient. I don't know if we can do in-place operations for
> strarray methods having the same name than bytearray methods (which are not
> in-place methods).

No, we can't. The whole point of having separate in-place operators is
to distinguish between operations that can modify the original object,
and those that leave the original object alone (even when it's an
instance of a mutable type like list or bytearray). Efficiency takes a
distant second place to correctness when determining API behaviour.
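
To make the distinction concrete, here is a minimal sketch using only
existing builtins (the variable names are purely illustrative). A
same-named method hands back a new object, while the in-place operator
mutates the object that every alias can see:

```python
# A regular method on a mutable type: returns a NEW object and leaves
# the original alone.
original = bytearray(b"spam")
alias = original
shouted = original.upper()

# The in-place operator: mutates the existing object, so the change is
# visible through every reference to it.
original += b" eggs"

# The same split exists for lists: "+" builds a new list, "+=" extends
# the existing one.
items = [1, 2]
items_alias = items
combined = items + [3]
items += [4]
```

Code holding `alias` or `items_alias` only sees a change after an
operation that is explicitly spelled as in-place, which is the
guarantee a strarray would also have to honour.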

> str has some more methods that bytes and bytearray don't have, like format. We
> may do in-place operation for these methods.

No, we can't, since they're not mutating methods, so they shouldn't
affect the state of the current object.

I'm only -0 on the idea (since bytearray and io.BytesIO seem to
coexist happily enough), but any such strarray object would need to
behave itself with respect to which operations affected the internal
state of the object.

With strings defined as immutable objects, concatenating them in a
loop is formally an O(N*N) operation. Those are always going to scale
poorly. The 'resize if only one reference' trick was fragile and
masked a real algorithmic flaw in user code, but it also sped up a lot
of naive software. It was definitely a case of practicality beating
purity.
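
As a rough illustration of where the quadratic cost comes from, the
sketch below counts characters copied under a simplified cost model: it
assumes every concatenation builds a brand new string (i.e. no resize
trick). The function name and the model are illustrative assumptions,
not a measurement of CPython:

```python
def chars_copied_by_naive_concat(n_pieces, piece_len=1):
    """Count characters copied when each += builds a fresh string.

    Simplified cost model: creating the new string copies every
    character of the result, old and new alike.
    """
    total_copied = 0
    current_len = 0
    for _ in range(n_pieces):
        current_len += piece_len
        total_copied += current_len
    return total_copied


small = chars_copied_by_naive_concat(1000)
large = chars_copied_by_naive_concat(2000)
# Doubling the number of pieces roughly quadruples the copying.
print(small, large, large / small)
```

Under this model the total is piece_len * N * (N + 1) / 2, which is
where the N*N behaviour comes from.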

Any change that depends on the user changing their code would be
rather missing the point of the original optimisation - if the user is
sufficiently aware of the problem to know they need to change their
code, then explicitly joining a list of substrings or using a StringIO
object instead of an ordinary string is well within their grasp.
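
For reference, the two explicit alternatives look something like this
(the helper names are made up for the example; str.join and
io.StringIO are the real APIs):

```python
import io


def build_with_join(pieces):
    """Collect the substrings in a list, then join them once."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)


def build_with_stringio(pieces):
    """Write the substrings to an in-memory text buffer."""
    buffer = io.StringIO()
    for piece in pieces:
        buffer.write(piece)
    return buffer.getvalue()


pieces = ["spam", " ", "eggs"]
joined = build_with_join(pieces)
buffered = build_with_stringio(pieces)
```

Both build the final string once at the end, instead of rebuilding it
on every iteration.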

Adding a "disjoint" string representation to the existing PEP 393
suite of representations would solve the same problem in a more
systematic way and, as Martin pointed out, could likely use the same
machinery as is provided for backwards compatibility with code
expecting the legacy string representation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

