[Python-3000] Making more effective use of slice objects in Py3k
Fredrik Lundh
fredrik at pythonware.com
Fri Sep 1 08:46:23 CEST 2006
Guido van Rossum wrote:
> A way to handle UTF-8 strings and other variable-length encodings
> would be to maintain a small cache of index positions with the string
> object.
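A minimal sketch of the quoted idea, assuming one possible cache layout: the string is stored as UTF-8 bytes, and the byte offset of every `stride`-th character is remembered the first time it is reached. Indexing then walks forward from the nearest checkpoint instead of from the start of the buffer. The class `UTF8String` and the `stride` parameter are hypothetical names for illustration, not a proposed API.

```python
def _is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx
    return byte & 0xC0 == 0x80


class UTF8String:
    """Hypothetical: UTF-8 storage plus a cache of index checkpoints."""

    def __init__(self, data: bytes, stride: int = 32) -> None:
        data.decode("utf-8")  # validate once; raises UnicodeDecodeError
        self._data = data
        self._stride = stride
        # _checkpoints[k] is the byte offset of character k * stride
        self._checkpoints = [0]
        self._length = sum(1 for b in data if not _is_continuation(b))

    def __len__(self) -> int:
        return self._length

    def _advance(self, offset: int, count: int) -> int:
        # Walk forward `count` characters starting at byte `offset`
        data = self._data
        for _ in range(count):
            offset += 1
            while offset < len(data) and _is_continuation(data[offset]):
                offset += 1
        return offset

    def _byte_offset(self, index: int) -> int:
        # Extend the checkpoint cache lazily until it covers `index`
        k = index // self._stride
        while len(self._checkpoints) <= k:
            last = self._checkpoints[-1]
            self._checkpoints.append(self._advance(last, self._stride))
        # Then scan at most stride - 1 characters from the checkpoint
        return self._advance(self._checkpoints[k], index - k * self._stride)

    def __getitem__(self, index: int) -> str:
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("string index out of range")
        start = self._byte_offset(index)
        end = self._advance(start, 1)
        return self._data[start:end].decode("utf-8")
```

A smaller `stride` trades memory for shorter scans; a "small cache" of only the most recently used positions would be another reasonable reading of the suggestion.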
I think just delaying decoding would take us most of the way. the big
advantage of storage polymorphism is that you can avoid decoding and
encoding (and having to pay for the cycles and bytes needed for that) if
you don't have to. the XML case you mentioned is a typical example;
just compare the behaviour of a library that does some extra work to
keep things small under the hood with more straightforward implementations:
http://effbot.org/zone/celementtree.htm#benchmarks
(cElementTree uses the "8-bit ascii mixes well with unicode" approach)
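To make "delaying decoding" concrete, here is a minimal sketch, assuming a UTF-8-only text object: it keeps the encoded bytes, hands them back unchanged when the value is written out again, compares two such objects without decoding, and builds a `str` only on first real use. `LazyUTF8`, `to_utf8` and `is_decoded` are hypothetical names; this is not how cElementTree is implemented.

```python
from typing import Optional


class LazyUTF8:
    """Hypothetical sketch of delayed decoding: keep the UTF-8 bytes and
    only build a str when a caller actually needs one."""

    __slots__ = ("_raw", "_text")

    def __init__(self, raw: bytes) -> None:
        self._raw = raw                   # encoded form, kept as-is
        self._text: Optional[str] = None  # decoded form, filled on demand

    @property
    def is_decoded(self) -> bool:
        return self._text is not None

    def to_utf8(self) -> bytes:
        # Writing the value back out as UTF-8 costs nothing: no decode,
        # no re-encode, just the original bytes.
        return self._raw

    def __str__(self) -> str:
        # Decode once, on first use, and cache the result.
        # Invalid UTF-8 is only detected at this point.
        if self._text is None:
            self._text = self._raw.decode("utf-8")
        return self._text

    def __eq__(self, other: object) -> bool:
        if isinstance(other, LazyUTF8):
            # Two valid UTF-8 encodings are equal exactly when the texts
            # are equal, so no decoding is needed here.
            return self._raw == other._raw
        if isinstance(other, str):
            return str(self) == other
        return NotImplemented

    def __hash__(self) -> int:
        # Must agree with str hashing, because __eq__ accepts str.
        return hash(str(self))
```

In a parse-then-serialize round trip that never inspects the text, no decoding or encoding happens at all, which is the saving the paragraph above describes.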
there are plenty of optimizations you can do when accessing the
beginning and end of a string (startswith, endswith, comparisons,
slicing, etc), but I think we can deal with that when we get there.
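As one example of such an optimization, assuming the text is held as valid UTF-8 bytes: UTF-8 is a prefix code, an encoded character never starts with a continuation byte, and bytewise order of UTF-8 matches code point order. So prefix, suffix and ordering tests can run on the encoded form without decoding. The helper names below are hypothetical, and the `str` arguments are assumed to contain no lone surrogates (which cannot be encoded as UTF-8).

```python
def utf8_startswith(raw: bytes, prefix: str) -> bool:
    # UTF-8 is a prefix code, so matching the encoded prefix byte for
    # byte is equivalent to matching it character for character.
    return raw.startswith(prefix.encode("utf-8"))


def utf8_endswith(raw: bytes, suffix: str) -> bool:
    # An encoded suffix begins with a lead byte, never a continuation
    # byte (10xxxxxx), so a byte-level match lands on a character boundary.
    return raw.endswith(suffix.encode("utf-8"))


def utf8_less_than(a: bytes, b: bytes) -> bool:
    # Bytewise order of valid UTF-8 equals code point order, which is
    # how Python orders str objects.
    return a < b
```

Slicing is harder, since character offsets must first be mapped to byte offsets, which is why it benefits from something like an index cache.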
I think the NFS sprint showed that you get better results by working
with real use cases, rather than spending that time theorizing. it also
showed that the bottlenecks aren't always where you think they are.
</F>