Chris Angelico writes:
Start by building up an actual list in RAM of [unicodedata.name(chr(i)) for i in range(sys.maxunicode+1)] and then do regular string operations. I'm fairly sure most Python programmers can figure out how to search a list of strings according to whatever rules they like - maybe using contains/startswith/endswith, or maybe regexps, or whatever.
OK. Times are quite imprecise, but after importing re, sys, unicodedata
>>> names = [unicodedata.name(chr(i)) for i in range(sys.maxunicode+1)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
ValueError: no such name
Oops, although you didn't actually claim that would work. :-) (BTW, it fails on the very first code point: chr(0) has no name. At least the failure was instantaneous. :-) Then
>>> names = []
>>> for i in range(sys.maxunicode+1):
...     try:
...         names.append(unicodedata.name(chr(i)))
...     except ValueError:
...         pass
...
takes between 1 and 2 seconds, while
>>> names.index("PILE OF POO")
61721
>>> "PILE OF POO" in names
True
is instantaneous. Note: 61721 is *much* smaller than 0x1F4A9, because the loop skips every unnamed code point, so a list index is not a code point. And now
>>> pops = [name for name in names if re.match("^P\\S* O.* P", name)]
>>> pops
['PILE OF POO']
takes just noticeable time (250ms, maybe?). This is on a 4-year-old 2.7GHz i7 MacBook Pro running "Mavericks". Plenty good for my use cases.
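For anyone who wants to repeat the experiment as a script, here is a minimal sketch of the same idea. It is only one way to do it, not a claim about how anyone else would: the helper names build_name_table and search are made up for illustration, and it uses the optional default argument of unicodedata.name() in place of the try/except above. It also keeps each code point next to its name, so a hit can be reported as a code point and not as a position in the filtered list.

```python
import re
import sys
import unicodedata


def build_name_table():
    """Return (code point, name) pairs for every named code point.

    Hypothetical helper: unicodedata.name() returns the given default
    (None here) instead of raising ValueError for unnamed code points.
    """
    table = []
    for i in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(i), None)
        if name is not None:
            table.append((i, name))
    return table


def search(table, pattern):
    """Return the (code point, name) pairs whose name matches at its start.

    Hypothetical helper: the pattern is compiled once and reused; a
    leading "^" is not needed because match() anchors at the start.
    """
    rx = re.compile(pattern)
    return [(cp, name) for cp, name in table if rx.match(name)]


table = build_name_table()
hits = search(table, r"P\S* O.* P")
for cp, name in hits:
    print("U+%04X %s" % (cp, name))
```

If only the exact name is known, unicodedata.lookup("PILE OF POO") goes straight from the name to the character without building any list at all. The table is only worth its second or two of build time when the search is by substring or regexp.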