How do I display unicode value stored in a string variable using ord()
Neil Hodgson
nhodgson at iinet.net.au
Tue Aug 21 03:03:33 EDT 2012
Steven D'Aprano:
> Using variable-sized strings like UTF-8 and UTF-16 for in-memory
> representations is a terrible idea because you can't assume that people
> will only ever want to index the first or last character. On average,
> you need to scan half the string, one character at a time. In Big-Oh, we
> can ignore the factor of 1/2 and just say we scan the string, O(N).
In the majority of cases you can avoid excessive scanning by
caching the most recent index->offset result. If the next index request
is nearer to the cached index than to the beginning, then iterate from the
cached offset. This converts many operations from quadratic to linear.
Locality of reference is common and can often be reasonably exploited.
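A minimal sketch of that caching idea, assuming Python 3 and valid UTF-8 input. The class name Utf8Text, its char_at method, and the steps counter are hypothetical names made up for illustration; this is not how any Python implementation stores strings.

```python
class Utf8Text:
    """UTF-8 bytes with a one-entry index->offset cache (hypothetical helper)."""

    def __init__(self, text):
        self.data = text.encode("utf-8")
        self._idx = 0   # most recently requested character index
        self._off = 0   # byte offset of that character
        self.steps = 0  # characters stepped over, kept only for illustration

    def _offset(self, index):
        data = self.data
        if index < 0:
            raise IndexError(index)
        # Start from the cached position when it is nearer than the beginning.
        if abs(index - self._idx) < index:
            idx, off = self._idx, self._off
        else:
            idx, off = 0, 0
        while idx < index:  # step forward one character
            off += 1
            # Continuation bytes have the bit pattern 10xxxxxx.
            while off < len(data) and (data[off] & 0xC0) == 0x80:
                off += 1
            idx += 1
            self.steps += 1
        while idx > index:  # step backward one character
            off -= 1
            while (data[off] & 0xC0) == 0x80:
                off -= 1
            idx -= 1
            self.steps += 1
        if off >= len(data):
            raise IndexError(index)
        self._idx, self._off = idx, off
        return off

    def char_at(self, index):
        start = self._offset(index)
        end = start + 1
        while end < len(self.data) and (self.data[end] & 0xC0) == 0x80:
            end += 1
        return self.data[start:end].decode("utf-8")
```

With this sketch, a loop that calls char_at(0), char_at(1), char_at(2), ... steps over one character per request instead of rescanning from the start each time, so the whole loop is linear rather than quadratic.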
However, exposing the variable-length nature of UTF-8 allows the
application to choose efficient techniques for more cases.
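One illustration of what that could look like, assuming Python 3 and valid UTF-8: the application keeps byte offsets rather than character indices, so a search result can be used directly without any index->offset conversion. The helper name next_offset is hypothetical.

```python
def next_offset(data, off):
    """Return the byte offset of the character after the one starting at off."""
    off += 1
    # Skip continuation bytes (bit pattern 10xxxxxx).
    while off < len(data) and (data[off] & 0xC0) == 0x80:
        off += 1
    return off


data = "naïve café".encode("utf-8")
start = data.find("café".encode("utf-8"))  # a byte offset, not a character index
end = next_offset(data, start)             # end of the first matched character
first_char = data[start:end].decode("utf-8")
```

Searching the encoded bytes is safe here because a valid UTF-8 sequence cannot match starting in the middle of another character.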
Neil