[Python-Dev] Optimize Unicode strings in Python 3.3
Victor Stinner
victor.stinner at gmail.com
Wed May 30 13:26:14 CEST 2012
>> The "two steps" method is not promising: parsing the format string
>> twice is slower than other methods.
>
> The "1.5 steps" method is more promising -- first parse the format string into
> an efficient internal representation, then allocate the output string,
> and then write characters (or enlarge and widen the buffer, but with more
> information in any case). The internal representation can be cached (as in the
> struct module), which for repeated formatting will reduce the cost of
> parsing to zero.
I implemented something like that, and it was both inefficient and very complex.
See for example the (incomplete) patch for str%args attached to the
issue #14687:
http://bugs.python.org/file25413/pyunicode_format-2.patch
IMO this approach is less efficient than the "Unicode writer" approach because:
- you have to create many substrings or temporary strings in the
first step, or (worse) compute each argument twice; the writer
approach avoids computing these substrings and temporary strings
- you have to parse the format string twice, or you have to write two
versions of the code: first create a list of items, then concatenate
the items. The PyAccu method concatenates substrings at the end, so it
is less efficient than the writer method (e.g. it has to create a
string of N fill characters to pad to WIDTH characters).
- the code is more complex than the writer method (which is very
similar to what is used in Python 2.7 and 3.2)
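
To make the padding point concrete, here is a rough pure-Python sketch of the
two strategies. It is only an illustration, not the C code from the patch:
ToyWriter, pad_with_accumulator and pad_with_writer are hypothetical names
invented for this example, and a list of characters stands in for the real
output buffer. The accumulator version builds a temporary fill string for each
padded field and joins everything at the end; the writer version writes the
fill characters straight into a single growable buffer.

```python
class ToyWriter:
    """Toy stand-in for the writer idea: one growable output buffer."""

    def __init__(self, size_hint=0):
        self.buf = [""] * size_hint  # preallocated slots, one per character
        self.pos = 0                 # number of characters written so far

    def _reserve(self, n):
        needed = self.pos + n
        if needed > len(self.buf):
            # Over-allocate a little so repeated writes rarely need to grow.
            extra = needed - len(self.buf) + len(self.buf) // 4
            self.buf.extend([""] * extra)

    def write(self, text):
        self._reserve(len(text))
        self.buf[self.pos:self.pos + len(text)] = text
        self.pos += len(text)

    def write_fill(self, char, count):
        # Padding goes directly into the buffer; no fill string is created.
        self._reserve(count)
        for i in range(self.pos, self.pos + count):
            self.buf[i] = char
        self.pos += count

    def finish(self):
        return "".join(self.buf[:self.pos])


def pad_with_accumulator(fields):
    """Accumulate substrings, then concatenate them at the end."""
    parts = []
    for text, width in fields:
        if len(text) < width:
            parts.append(" " * (width - len(text)))  # temporary fill string
        parts.append(text)
    return "".join(parts)


def pad_with_writer(fields):
    """Write each field, and its padding, straight into one buffer."""
    writer = ToyWriter()
    for text, width in fields:
        if len(text) < width:
            writer.write_fill(" ", width - len(text))
        writer.write(text)
    return writer.finish()


fields = [("ab", 5), ("xyz", 2)]
result_accu = pad_with_accumulator(fields)
result_writer = pad_with_writer(fields)
```

Both functions return the same string. In pure Python the writer version is
of course not faster; the sketch only shows where the temporary objects come
from in the accumulator approach.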
I wrote a much more complex patch for str%args that remembers variables
from the first step to avoid most of the parsing work in the second
step. That patch was very complex and hard to maintain, so I chose not
to publish it and to try another approach instead (the Unicode writer).
Note: I'm only talking about str%args and str.format(args) here; the
Unicode writer is not the most efficient method for *every* function
that creates strings!
Victor