[Python-Dev] Optimize Unicode strings in Python 3.3
Serhiy Storchaka
storchaka at gmail.com
Wed May 30 14:08:44 CEST 2012
On 30.05.12 14:26, Victor Stinner wrote:
> I implemented something like that, and it was not efficient and very complex.
>
> See for example the (incomplete) patch for str%args attached to the
> issue #14687:
> http://bugs.python.org/file25413/pyunicode_format-2.patch
I have seen and commented on this patch. That's not what I'm talking about.
> IMO this approach is less efficient than the "Unicode writer" approach because:
I brought up this approach not in opposition to the "Unicode writer",
but for comparison with a straightforward "two steps" method. Of
course, it can be combined with the "Unicode writer" to get the
benefits of both methods. For example, you can widen the output buffer
in advance to the width of the format string, or disable overallocation
when formatting the last argument with a non-empty suffix.
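
A rough Python model of the pre-widening idea, reading "width" as the
per-character storage size (1, 2 or 4 bytes). This is only a toy:
ToyWriter, kind_of and render are made-up names, not the real C writer,
and the buffer is modelled as a list of pieces.

```python
def kind_of(text):
    """Bytes per character needed to store text: 1, 2 or 4."""
    top = max(map(ord, text), default=0)
    return 1 if top <= 0xFF else 2 if top <= 0xFFFF else 4


class ToyWriter:
    """Toy stand-in for an output buffer that is widened on demand."""

    def __init__(self, kind=1):
        self.parts = []
        self.kind = kind
        self.widenings = 0  # how many times the buffer had to be converted

    def write(self, text):
        k = kind_of(text)
        if k > self.kind:
            # A real buffer would be reallocated and copied here.
            self.kind = k
            self.widenings += 1
        self.parts.append(text)

    def finish(self):
        return "".join(self.parts)


def render(pieces, start_kind=1):
    """Write all pieces; return the result and the number of widenings."""
    writer = ToyWriter(start_kind)
    for piece in pieces:
        writer.write(piece)
    return writer.finish(), writer.widenings


fmt = "x = %s \u2192 %s"
pieces = ["x = ", "1", " \u2192 ", "2"]  # literals and formatted arguments
print(render(pieces))                # widened in the middle of the output
print(render(pieces, kind_of(fmt)))  # widened up front, never afterwards
```

Since the literal parts of the format string always end up in the
output, starting at its width cannot make the result wider than needed.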
> - you have to create many substrings or temporary strings in the
> first step, or (worse) compute each argument twice: the writer
> approach is more efficient here because it avoids computing substrings
> and temporary strings
Not in the first step but in the second one (which is the only step
if you use caching), if you use the "Unicode writer".
> - you have to parse the format string twice, or you have to write two
> versions of the code: first create a list of items, then concatenate
> items. The PyAccu method concatenates substrings at the end, it is
> less efficient than the writer method (e.g. it has to create a string
> of N fill characters to pad to WIDTH characters).
The code is divided into a compiler and an interpreter. Only the
compiler parses the format string. See Modules/_struct.c.
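
A minimal Python sketch of such a split, assuming a much reduced
format syntax (only %s, %d, %r and %%). The names compile_format,
run_format and format_cached are invented for illustration; this is
not the code from the patch or from Modules/_struct.c.

```python
import re
from functools import lru_cache

# Simplified conversion syntax: no flags, width or precision.
_SPEC = re.compile(r"%([sdr%])")

_CONVERT = {"s": str, "r": repr, "d": lambda v: str(int(v))}


@lru_cache(maxsize=128)
def compile_format(fmt):
    """Compiler: parse fmt once into ('lit', text) and ('arg', conv) items."""
    ops = []
    pos = 0
    for m in _SPEC.finditer(fmt):
        if m.start() > pos:
            ops.append(("lit", fmt[pos:m.start()]))
        conv = m.group(1)
        ops.append(("lit", "%") if conv == "%" else ("arg", conv))
        pos = m.end()
    if pos < len(fmt):
        ops.append(("lit", fmt[pos:]))
    return tuple(ops)


def run_format(ops, args):
    """Interpreter: walk the compiled items; fmt is never parsed here."""
    remaining = iter(args)
    parts = []
    for kind, value in ops:
        if kind == "lit":
            parts.append(value)
        else:
            parts.append(_CONVERT[value](next(remaining)))
    return "".join(parts)


def format_cached(fmt, *args):
    """Compile on first use, then reuse the cached items."""
    return run_format(compile_format(fmt), args)


print(format_cached("%s has %d items (100%%)", "list", 3))
```

With the cache, a repeated format string costs only the interpreter
step on later calls.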
> - the code is more complex than the writer method (which is very
> similar to what is used in Python 2.7 and 3.2)
The code that uses the writer method has turned out to be rather
complicated, so the difference in total complexity between these
approaches has become smaller. ;-)
But it is really not easy work, and success is not assured, so let it
wait for its time.