String performance regression from python 3.2 to 3.3

Terry Reedy tjreedy at
Thu Mar 14 03:35:44 CET 2013

On 3/13/2013 7:43 PM, Chris Angelico wrote:
> On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody at> wrote:
>> This assumes that there are only three choices:
>> - narrow build that is buggy (surrogate pairs for astral characters)
>> - wide build that is 4-fold space inefficient for wide variety of
>> common (ASCII) use-cases
>> - flexible string engine that chooses a small tradeoff of space
>> efficiency over time efficiency.

Wrong. Python almost certainly runs faster with the new string 
representation. This has been explained previously more than once.
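
The space side of the trade-off is easy to check for yourself. What follows is only a rough sketch, assuming CPython 3.3 or later (the flexible representation of PEP 393). The helper name `bytes_per_char` is made up for illustration. It estimates per-character storage by differencing the sizes of two strings, so the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character (hypothetical helper).

    Subtracting the sizes of two strings of different length cancels
    the fixed per-object header, leaving only the character data.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n

ascii_width = bytes_per_char("a")            # code point below U+0100
bmp_width = bytes_per_char("\u20ac")         # EURO SIGN, inside the BMP
astral_width = bytes_per_char("\U0001F600")  # astral (above U+FFFF)

# On a 3.3+ build an astral character is one code point, not a
# surrogate pair, so its length is 1.
astral_length = len("\U0001F600")

print(ascii_width, bmp_width, astral_width, astral_length)
```

On CPython 3.3+ the three widths should come out as 1, 2 and 4 bytes, and the length as 1. The numbers are implementation details of CPython, not guarantees of the language.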

>> There is a fourth choice: narrow build that chooses to be partial over
>> being buggy. ie when an astral character is encountered, an exception
>> is thrown rather than trying to fudge it into a 16-bit
>> representation.

This is what tcl/tk does, and it is a damned nuisance. Completely 
unacceptable for Python's string type.

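
To make the "partial" behaviour concrete, here is a minimal sketch of what such a string type would effectively do on every input. The function `require_bmp` is hypothetical and written only for illustration. It is not taken from tcl/tk or from any CPython build.

```python
def require_bmp(text):
    """Hypothetical check: reject any code point above U+FFFF.

    Mimics a 'partial' narrow string type that raises instead of
    storing astral characters as surrogate pairs.
    """
    for index, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            raise ValueError(
                "astral character U+%06X at index %d" % (ord(ch), index)
            )
    return text

require_bmp("plain ASCII and \u20ac are fine")

try:
    require_bmp("but this fails: \U0001F600")
except ValueError as exc:
    print("rejected:", exc)
```

Any program that reads text from outside (files, sockets, user input) would have to be prepared for that exception wherever a string is created.
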
> It's complexity cost, though, and people would need to know when it
> would be worth giving Python that switch to change its string format.
> Plus, every C extension would need to cope with both formats. I
> personally doubt it'd be worth it, but if you want to knock together a
> patched CPython and get some timing stats, I'm sure this list or
> python-dev will be happy to discuss the matter. :)

I presume the smiley indicates that you know that python developers are 
too busy with real problems to have any interest in bogus solutions to 
bogus problems.

Terry Jan Reedy
