How do I display unicode value stored in a string variable using ord()
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sun Aug 19 06:51:26 EDT 2012
On Sun, 19 Aug 2012 01:11:56 -0700, Paul Rubin wrote:
> Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
>> result = text[end:]
>
> if end not near the end of the original string, then this is O(N) even
> with fixed-width representation, because of the char copying.
Technically, yes. But it's a straight copy of a chunk of memory, which
means it's fast: your OS and hardware tries to make straight memory
copies as fast as possible. Big-Oh analysis frequently glosses over
implementation details like that.
Of course, that assumption gets shaky when you start talking about extra
large blocks, and it falls apart completely when your OS starts paging
memory to disk.
But if it helps to avoid irrelevant technical details, change it to
text[end:end+10] or something.
> if it is near the end, by knowing where the string data area ends, I
> think it should be possible to scan backwards from the end, recognizing
> what bytes can be the beginning of code points and counting off the
> appropriate number. This is O(1) if "near the end" means "within a
> constant".
You know, I think you are misusing Big-Oh analysis here. It really
wouldn't be helpful for me to say "Bubble Sort is O(1) if you only sort
lists with a single item". Well, yes, that is absolutely true, but that's
a special case that doesn't give you any insight into why using Bubble
Sort as your general purpose sort routine is a terrible idea.
Using variable-sized strings like UTF-8 and UTF-16 for in-memory
representations is a terrible idea because you can't assume that people
will only every want to index the first or last character. On average,
you need to scan half the string, one character at a time. In Big-Oh, we
can ignore the factor of 1/2 and just say we scan the string, O(N).
That's why languages tend to use fixed character arrays for strings.
Haskell is an exception, using linked lists which require traversing the
string to jump to an index. The manual even warns:
[quote]
If you think of a Text value as an array of Char values (which it is
not), you run the risk of writing inefficient code.
An idiom that is common in some languages is to find the numeric offset
of a character or substring, then use that number to split or trim the
searched string. With a Text value, this approach would require two O(n)
operations: one to perform the search, and one to operate from wherever
the search ended.
[end quote]
http://hackage.haskell.org/packages/archive/text/0.11.2.2/doc/html/Data-Text.html
--
Steven
More information about the Python-list
mailing list