Re: [Python-ideas] Extend unicodedata with a name search

4 Oct 2014

On Sat, Oct 04, 2014 at 03:50:33PM +1000, Chris Angelico wrote:
...
> On Sat, Oct 4, 2014 at 1:17 PM, Stephen J. Turnbull wrote:
...
>> - startswith, endswith, contains: probably sufficient, but I suppose
>>     one would like at least conjunction and disjunction operations:
>>     unicodematch.contains('GREEK', 'SMALL', 'ALPHA', op='and')
>>     unicodematch.startswith('PIECE OF', 'PILE OF', op='or')
>>     (OK, that's pretty horrible, but it gives an idea.)
>
> There's an easier way, though it would take a bit of setup work. Start
> by building up an actual list in RAM of [unicodedata.name(chr(i)) for
> i in range(sys.maxunicode+1)] and then do regular string operations.
> I'm fairly sure most Python programmers can figure out how to search a
> list of strings according to whatever rules they like - maybe using
> contains/startswith/endswith, or maybe regexps, or whatever.

py> x = [unicodedata.name(chr(i)) for i in range(sys.maxunicode+1)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
ValueError: no such name

There are 1114112 such code points, and most of them are unused. 
Some of the used ones don't have names:

py> unicodedata.name('\0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name
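
For what it's worth, unicodedata.name() accepts an optional second
argument that is returned instead of raising ValueError, so the
unnamed code points can be skipped while building the table. A rough
sketch only (build_name_table is a name made up for illustration, not
anything in the stdlib):

```python
import sys
import unicodedata

def build_name_table():
    """Map each named code point (as an int) to its Unicode name."""
    table = {}
    for i in range(sys.maxunicode + 1):
        # The default (None here) is returned for code points with no
        # name, instead of raising ValueError.
        name = unicodedata.name(chr(i), None)
        if name is not None:
            table[i] = name
    return table
```

The resulting dict is far smaller than 1114112 entries, since unused
and unnamed code points are left out.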

But even once you deal with those complications, you'll end up 
duplicating information which (I presume) Python already has, and still 
end up needing to do a linear search in slow Python code looking for 
what you want. I think there are probably better solutions. Or at least, 
I hope there are better solutions :-)
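
To make the linear search concrete, here is a minimal sketch of the
"contains, with conjunction" case from the quoted proposal. The
function search_names is hypothetical, and this is exactly the slow
pure-Python scan I'm complaining about, not a proposed API:

```python
import sys
import unicodedata

def search_names(*words):
    """Yield (char, name) for each name containing all the given words."""
    words = [w.upper() for w in words]
    for i in range(sys.maxunicode + 1):
        ch = chr(i)
        # Empty-string default: unnamed code points never match.
        name = unicodedata.name(ch, '')
        if name and all(w in name for w in words):
            yield ch, name
```

Something like list(search_names('GREEK', 'SMALL', 'ALPHA')) has to
visit every code point each time it is called, which is why I'd hope
for something better built into unicodedata itself.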

-- 
Steven

Steven D'Aprano