[Python-Dev] Optimize Unicode strings in Python 3.3

Serhiy Storchaka storchaka at gmail.com
Wed May 30 14:08:44 CEST 2012


On 30.05.12 14:26, Victor Stinner wrote:
> I implemented something like that, and it was not efficient and very complex.
>
> See for example the (incomplete) patch for str%args attached to the
> issue #14687:
> http://bugs.python.org/file25413/pyunicode_format-2.patch

I have seen and commented on this patch. That's not what I'm talking about.

> IMO this approach is less efficient than the "Unicode writer" approach because:

I brought up this approach not in opposition to the "Unicode 
writer", but for comparison with a straight "two steps" method. Of 
course, it can be combined with the "Unicode writer" to get the 
benefits of both methods. For example, you can widen the output 
buffer in advance to the width of the format string, or disable 
overallocation when formatting the last argument with a non-empty suffix.
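
A rough sketch of the "widen in advance" idea, assuming that "width" 
here means the PEP 393 storage width (1, 2 or 4 bytes per character). 
The helper names char_width and starting_width are hypothetical and 
are not part of CPython:

```python
def char_width(text):
    """Bytes per character that a PEP 393 string needs to store text."""
    top = max(map(ord, text), default=0)
    if top < 0x100:        # Latin-1 range
        return 1
    if top < 0x10000:      # Basic Multilingual Plane
        return 2
    return 4               # astral code points


def starting_width(fmt):
    """Hypothetical policy: open the output buffer at the format's width.

    The literal parts of fmt always end up in the result, so the result
    can never be narrower than fmt itself.
    """
    return char_width(fmt)


def needs_widening(buffer_width, piece):
    """True if writing piece would force a conversion of the buffer."""
    return char_width(piece) > buffer_width
```

With a format such as "%s \u20ac", a buffer opened at width 1 would 
have to be converted once the euro sign is written, while a buffer 
opened at starting_width(fmt) == 2 would not.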

>   - you have to create many substrings or temporary strings in the
> first step, or (worse) compute each argument twice: the writer
> approach is more efficient here because it avoids computing substrings
> and temporary strings

Not in the first step but in the second step (which is the only 
step if you use caching), if you use the "Unicode writer".

>   - you have to parse the format string twice, or you have to write two
> versions of the code: first create a list of items, then concatenate
> items. The PyAccu method concatenates substrings at the end, it is
> less efficient than the writer method (e.g. it has to create a string
> of N fill characters to pad to WIDTH characters).

The code is divided into a compiler and an interpreter. Only the 
first one parses the format string. See Modules/_struct.c for an 
example of this split.
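
A minimal sketch of that split, assuming a toy grammar that only 
knows %s, %d, %r and %% (anything else is copied through as literal 
text). The names compile_format, run_format and format_cached are 
made up for illustration; this is not the code from the patch, and 
joining a list at the end is used only to keep the sketch short:

```python
import re
from functools import lru_cache

# Toy grammar (an assumption of this sketch): %s, %d, %r and %% only.
_SPEC = re.compile(r"%([sdr%])")

_CONVERTERS = {"s": str, "r": repr, "d": lambda value: str(int(value))}


@lru_cache(maxsize=128)
def compile_format(fmt):
    """Compiler: parse fmt once into ('lit', text) / ('arg', conv) steps."""
    steps = []
    pos = 0
    for match in _SPEC.finditer(fmt):
        if match.start() > pos:
            steps.append(("lit", fmt[pos:match.start()]))
        conv = match.group(1)
        if conv == "%":
            steps.append(("lit", "%"))
        else:
            steps.append(("arg", conv))
        pos = match.end()
    if pos < len(fmt):
        steps.append(("lit", fmt[pos:]))
    return tuple(steps)


def run_format(steps, args):
    """Interpreter: walk the compiled steps; fmt is never parsed here."""
    remaining = iter(args)
    parts = []
    for kind, value in steps:
        if kind == "lit":
            parts.append(value)
        else:
            parts.append(_CONVERTERS[value](next(remaining)))
    return "".join(parts)


def format_cached(fmt, *args):
    """Compile on first use, then reuse the cached steps."""
    return run_format(compile_format(fmt), args)
```

Calling format_cached twice with the same format string parses it 
only once; the second call goes straight to the interpreter.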

>   - the code is more complex than the writer method (which is very
> similar to what is used in Python 2.7 and 3.2)

The code that uses the writer method has turned out to be rather 
complicated too, so the difference in total complexity between these 
approaches has become smaller. ;-)

But it is really not easy work, and success is not assured, so let 
it wait for its time.


