[Python-ideas] string codes & substring equality

Terry Reedy tjreedy at udel.edu
Fri Nov 29 00:36:36 CET 2013


On 11/28/2013 6:43 AM, spir wrote:
> All right, thank you all for the exchange, the issue of substring
> comparison for equality is solved, with either .startswith(substr, i) or
> .find(substr, i, j). But there remains the problem of getting codes
> (Unicode code points) at arbitrary indexes in a string?

Do you mean ord(code[i])? We already have that.
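A minimal sketch of the existing spellings in Python 3, assuming `s` is an
ordinary str (the name `s` and the sample text are illustrative only): the
code point at index i is ord(s[i]), and the substring tests from the quoted
message are s.startswith(substr, i) and s.find(substr, i, j).

```python
s = "na\u00efve caf\u00e9"   # "naïve café"; any str works

# Code point at an arbitrary index: index to get a 1-char str, then ord().
i = 2
cp = ord(s[i])
print(hex(cp))                      # 0xef, i.e. U+00EF

# Substring equality at offset i, as in the quoted message.
print(s.startswith("ve", 3))        # True: s[3:5] == "ve"
print(s.find("caf", 0, len(s)))     # 6
```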

> Is it weird to consider a .code(i) string method?

No and yes. There are hundreds, thousands of simple compositions that 
different people might like baked into the language to speed a 
particular application. Some numerical users might like Python to have 
the C equivalent of

def muladd(a, b, c): return a * b + c  # or maybe
def muladd(a, b, c): return a + b * c

> What would be its implementation cost?

What would be the implementation, maintenance, learning, and usability 
cost of adding thousands of such little methods?

> I would really have good use for it,

I believe use of ord is rather rare, as builtins go.
In 2.7, it works with both (byte) strings and unicode.
In 3.3, ord is not needed with bytes, as indexing bytes directly returns
ordinals (b'abc'[1] == 98); indeed, ord(b'abc'[1]) raises TypeError.
So if the text you are parsing is limited to ASCII or a small ASCII
superset, such as Latin-1, you might do better working with the encoded
bytes.

If your text may include any Unicode character, and if you have
measurements showing that the extra cost of the intermediate
single-character string is really a bottleneck, then add the composed
function privately. Or perhaps you could use ctypes to access the innards
of a string and see if that is faster.

> certainly numerous other use cases exist.

More than a hand wave is needed to demonstrate that.

-- 
Terry Jan Reedy


