M.-A. Lemburg writes:
On 03.10.2014 23:10, Philipp A. wrote:
Unfortunately, unicodedata is very limited.
Philipp, do you really mean *very* limited? If so, I wonder what else you think is missing besides "fuzzy" name lookup. The UCD is defined by the standard, and AFAICS access to all properties is provided.
But the name database is only queryable using full names! I want to do unicodedata.search('clock') and get a list of dozens of glyphs.
You should be able to code this as a PyPI package. I don't think it's a use case that warrants making the unicodedata module more complex.
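A minimal sketch of what such a third-party helper might look like, using only the existing unicodedata.name() call. The function name search() is taken from the request above and is hypothetical; it is not part of the unicodedata module. The sketch assumes a brute-force scan of every code point is acceptable, which is slow (roughly 1.1 million lookups per call) but needs no changes to the C database.

```python
import sys
import unicodedata


def search(term):
    """Return (character, name) pairs whose Unicode name contains `term`.

    Hypothetical helper, not part of the standard library. The match is a
    plain case-insensitive substring test against the official name.
    """
    needle = term.upper()
    matches = []
    for codepoint in range(sys.maxunicode + 1):
        char = chr(codepoint)
        # unicodedata.name() raises ValueError for unnamed code points
        # unless a default is supplied, so pass "" and skip those.
        name = unicodedata.name(char, "")
        if name and needle in name:
            matches.append((char, name))
    return matches


if __name__ == "__main__":
    for char, name in search("clock"):
        print(f"U+{ord(char):04X} {name}")
```

A real package would probably build the name list once and cache it rather than rescanning on every call.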
I think it's unfortunate that unicodedata is limited in this particular way, since the database is in C, and as you point out hardly extensible. For example, as a native English speaker who enjoys wordplay I was able to guess which euphemism is the source of the name of U+1F4A9 without looking it up, but I doubt a non-native would be able to. A builtin ability to do fuzzy searches ("unicodenames.startswith('PILE OF')") would be useful.

OTOH, a little thought convinced me that I don't know the TOOWTDI for fuzzy search here:

- regexp: database will be a huge string or similar
- startswith, endswith, contains: probably sufficient, but I suppose one would like at least conjunction and disjunction operations:

      unicodematch.contains('GREEK', 'SMALL', 'ALPHA', op='and')
      unicodematch.startswith('PIECE OF', 'PILE OF', op='or')

  (OK, that's pretty horrible, but it gives an idea.)
- something else?
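For concreteness, the startswith/contains idea with an op argument could be sketched in pure Python roughly as follows. All names here (contains, startswith, the op keyword and its defaults) are hypothetical and only mirror the examples above; nothing like this exists in unicodedata. The sketch assumes a linear scan over unicodedata.name() results is good enough, and it rejects any op other than 'and' or 'or'.

```python
import sys
import unicodedata


def _all_names():
    """Yield (character, name) for every code point that has a name."""
    for codepoint in range(sys.maxunicode + 1):
        char = chr(codepoint)
        name = unicodedata.name(char, "")
        if name:
            yield char, name


def _match(test, terms, op):
    """Filter names with `test`, combining the terms by conjunction or disjunction."""
    if op not in ("and", "or"):
        raise ValueError("op must be 'and' or 'or'")
    combine = all if op == "and" else any
    terms = [term.upper() for term in terms]
    return [
        (char, name)
        for char, name in _all_names()
        if combine(test(name, term) for term in terms)
    ]


def contains(*terms, op="and"):
    """Names containing all ('and') or any ('or') of the terms."""
    return _match(lambda name, term: term in name, terms, op)


def startswith(*terms, op="or"):
    """Names starting with all ('and') or any ('or') of the terms."""
    return _match(str.startswith, terms, op)


if __name__ == "__main__":
    for char, name in startswith("PIECE OF", "PILE OF", op="or"):
        print(f"U+{ord(char):04X} {name}")
```

Whether a keyword like op is nicer than separate any/all variants is exactly the open design question; the sketch only shows that the behaviour itself is easy to get outside the C module.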