On Sat, Oct 04, 2014 at 03:50:33PM +1000, Chris Angelico wrote:
On Sat, Oct 4, 2014 at 1:17 PM, Stephen J. Turnbull email@example.com wrote:
- startswith, endswith, contains: probably sufficient, but I suppose one would like at least conjunction and disjunction operations:

    unicodematch.contains('GREEK', 'SMALL', 'ALPHA', op='and')
    unicodematch.startswith('PIECE OF', 'PILE OF', op='or')

  (OK, that's pretty horrible, but it gives an idea.)
There's an easier way, though it would take a bit of setup work. Start by building up an actual list in RAM of [unicodedata.name(chr(i)) for i in range(sys.maxunicode+1)] and then do regular string operations. I'm fairly sure most Python programmers can figure out how to search a list of strings according to whatever rules they like - maybe using contains/startswith/endswith, or maybe regexps, or whatever.
py> x = [unicodedata.name(chr(i)) for i in range(sys.maxunicode+1)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
ValueError: no such name
There are 1114112 such code points, and most of them are unused. Some of the used ones don't have names:
py> unicodedata.name('\0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name
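One way around the ValueError, as a minimal sketch: unicodedata.name() takes an optional second argument that is returned instead of raising when a code point has no name. The helper name build_name_table is made up for illustration, and dropping unnamed code points from the table is an assumption about what a search would want.

```python
import sys
import unicodedata

def build_name_table():
    """Map each named code point to its Unicode name (hypothetical helper)."""
    table = {}
    for i in range(sys.maxunicode + 1):
        # The second argument is a default returned for unnamed code
        # points (unassigned, control characters, surrogates, ...).
        name = unicodedata.name(chr(i), None)
        if name is not None:
            table[i] = name
    return table

names = build_name_table()
```

The resulting dict only holds code points that actually have names, so it is much smaller than sys.maxunicode+1 entries, but it is still a second copy of data the interpreter already carries.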
But even once you deal with those complications, you'll end up duplicating information which (I presume) Python already has, and still end up needing to do a linear search in slow Python code looking for what you want. I think there are probably better solutions. Or at least, I hope there are better solutions :-)
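For concreteness, the linear search in question might look roughly like the sketch below. The function search_names and its op parameter are hypothetical, loosely modelled on the unicodematch.contains(..., op='and') idea quoted earlier; nothing with this name exists in the standard library.

```python
import sys
import unicodedata

def search_names(*words, op='and'):
    """Yield (char, name) pairs whose name contains the given words.

    op='and' requires every word to appear, op='or' requires at least one.
    Hypothetical helper, for illustration only.
    """
    combine = {'and': all, 'or': any}[op]
    for i in range(sys.maxunicode + 1):
        # Unnamed code points get an empty name and are skipped.
        name = unicodedata.name(chr(i), '')
        if name and combine(w in name for w in words):
            yield chr(i), name

hits = list(search_names('GREEK', 'SMALL', 'ALPHA'))
```

Every call walks all of range(sys.maxunicode+1) in pure Python, which is exactly the cost being objected to here.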