[Tutor] a little loop

Oscar Benjamin oscar.j.benjamin at gmail.com
Thu May 30 20:07:41 CEST 2013


Sending again to the list (sorry boB)...

On 29 May 2013 17:51, boB Stepp <robertvstepp at gmail.com> wrote:
>> I don't know exactly how str.join is implemented but it does not use
>> this quadratic algorithm. For example if str.join would first compute
>> the length of the resulting string first then it can allocate memory
>> for exactly one string of that length and copy each substring to the
>> appropriate place (actually I imagine it uses an exponentially
>> resizing buffer but this isn't important).
>>
>
> ...str.join gets around these issues?

As I said I don't know how this is implemented in CPython (I hoped
Eryksun might chime in there :) ).

> In the linked article it was
> discussing increasing memory allocation by powers of two instead of
> trying to determine the exact length of the strings involved,
> mentioning that the maximum wasted memory would be 50% of what was
> actually needed. Is Python more clever in its implementation?

Actually the maximum memory wastage is 100% of what is needed or 50%
of what is actually used. This is if the amount needed is one greater
than a power of two and you end up doubling to the next power of two.
I don't see how CPython could be much cleverer in its implementation.
There aren't that many reasonable strategies here (when implementing
strings as linear arrays like CPython does).

>> * CPython actually has an optimisation that can append strings in
>> precisely this situation. However it is an implementation detail of
>> CPython that may change and it does not work in other interpreters
>> e.g. Jython. Using this kind of code can damage portability since your
>> program may run fine in CPython but fail in other interpreters.
>>
> You are speaking of "appending" and not "concatenation" here?

In this case I was just talking about single characters so you could
think of it as either. However, yes the optimisation is for
concatenation and in particular the '+' and '+=' operators.

> I had not even considered other Python interpreters than CPython. More
> complexity to consider for the future...

It's only a little bit of complexity. Just bear in mind the
distinction between a "language feature" that is true in any
conforming implementation and an "implementation detail" that happens
to be true in some or other interpreter but is not a specified part of
the language, In practise this means not really thinking too hard
about how CPython implements things and just using the recommended
idioms e.g. str.join.

I don't know if it is documented anywhere that str.join is linear
rather than quadratic but I consider that to be a language feature.
Exactly how it achieves linear behaviour (precomputing, resizing,
etc.) is an implementation detail. If your code relies only on
language features then it should not have problems when changing
interpreters.


Oscar


More information about the Tutor mailing list