Re: [Python-ideas] Extend unicodedata with a name search

4 Oct 2014

      M.-A. Lemburg writes:
...
On 03.10.2014 23:10, Philipp A. wrote:
...
...
Unfortunately, unicodedata is very limited.
Philipp, do you really mean *very* limited?  If so, I wonder what else
you think is missing besides "fuzzy" name lookup.  The UCD is defined
by the standard, and AFAICS access to all properties is provided.
...
...
But the name database is only queryable using full names! I want
to do unicodedata.search('clock') and get a list of dozens of glyphs
...
You should be able to code this as a PyPI package. I don't think
it's a use case that warrants making the unicodedata module more
complex.
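As a rough illustration of M.-A. Lemburg's point that this could live in a PyPI package rather than in the stdlib, here is a minimal pure-Python sketch of a substring search. It relies only on `unicodedata.name(chr, default)`, which returns the default for code points that have no name. The names `search` and `_name_table` are hypothetical and are not part of `unicodedata`. Results depend on the Unicode version bundled with the running Python.

```python
import sys
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=None)
def _name_table():
    """Return (character, name) pairs for every named code point.

    Built once on first use, because scanning all 0x110000 code points
    is relatively slow; later searches reuse the cached tuple.
    """
    table = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), None)  # None for unnamed code points
        if name is not None:
            table.append((chr(cp), name))
    return tuple(table)


def search(word):
    """Return characters whose name contains `word`, ignoring case."""
    needle = word.upper()  # Unicode character names are all upper case
    return [ch for ch, name in _name_table() if needle in name]


if __name__ == "__main__":
    for ch in search("clock"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

This is a linear scan over roughly a hundred thousand names per query, which is fine for interactive use but is the kind of cost a C-level index inside `unicodedata` would avoid.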
I think it's unfortunate that unicodedata is limited in this
particular way, since the database is in C and, as you point out,
hardly extensible.  For example, as a native English speaker who
enjoys wordplay, I was able to guess which euphemism is the source of
the name of U+1F4A9 without looking it up, but I doubt a non-native
speaker would be able to.  A built-in ability to do fuzzy searches
("unicodenames.startswith('PILE OF')") would be useful.
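A prefix query like the `startswith('PILE OF')` example does not have to scan every name: in a sorted list, all names sharing a prefix form one contiguous run, so a binary search with the standard `bisect` module finds where the run starts. A minimal sketch follows. `startswith` and `_sorted_names` are hypothetical names, not an existing `unicodenames` module or `unicodedata` API.

```python
import bisect
import sys
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=None)
def _sorted_names():
    """Sorted list of (name, character) pairs, built once on first use."""
    pairs = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            pairs.append((name, chr(cp)))
    pairs.sort()  # names are unique, so this orders by name alone
    return pairs


def startswith(prefix):
    """Return characters whose name begins with `prefix`, ignoring case."""
    prefix = prefix.upper()
    pairs = _sorted_names()
    # (prefix,) sorts before every (name, ch) whose name starts with prefix,
    # so bisect_left lands on the first entry of the matching run.
    i = bisect.bisect_left(pairs, (prefix,))
    result = []
    while i < len(pairs) and pairs[i][0].startswith(prefix):
        result.append(pairs[i][1])
        i += 1
    return result
```

After the one-time sort, each lookup costs a logarithmic search plus the size of the result, which is one argument for offering `startswith` rather than only a general regexp.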

OTOH, a little thought convinced me that I don't know the TOOWTDI for
fuzzy search here:

  - regexp: the database would have to be exposed as one huge string or similar

  - startswith, endswith, contains: probably sufficient, but I suppose
    one would like at least conjunction and disjunction operations:
    unicodematch.contains('GREEK', 'SMALL', 'ALPHA', op='and')
    unicodematch.startswith('PIECE OF', 'PILE OF', op='or')
    (OK, that's pretty horrible, but it gives an idea.)

  - something else?
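To get a feel for the first two options, both can be sketched in pure Python. Assumptions: the hypothetical helpers below build, once, one line of the form `HHHHHH NAME` per named code point; `regexp` runs a single multiline regular expression over all of those lines joined into one large string, and `contains` mirrors the `op='and'` / `op='or'` spelling from the list above. None of these are real `unicodedata` functions.

```python
import re
import sys
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=None)
def _name_lines():
    """One 'HHHHHH NAME' line per named code point (six-digit hex code)."""
    lines = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            lines.append(f"{cp:06X} {name}")
    return tuple(lines)


@lru_cache(maxsize=None)
def _name_blob():
    """The whole name database as one newline-separated string."""
    return "\n".join(_name_lines())


def regexp(pattern):
    """Characters whose *entire* name matches the regular expression `pattern`."""
    # MULTILINE makes ^ and $ match at each line break, and '.' never
    # matches '\n', so patterns built from literals and '.' stay in one name.
    rx = re.compile(rf"^([0-9A-F]{{6}}) (?:{pattern})$", re.MULTILINE)
    return [chr(int(m.group(1), 16)) for m in rx.finditer(_name_blob())]


def contains(*words, op="and"):
    """Characters whose name contains all (op='and') or any (op='or') of `words`."""
    if op not in ("and", "or"):
        raise ValueError(f"op must be 'and' or 'or', not {op!r}")
    combine = all if op == "and" else any
    needles = [w.upper() for w in words]
    return [
        chr(int(line[:6], 16))
        for line in _name_lines()
        if combine(needle in line[7:] for needle in needles)  # skip 'HHHHHH '
    ]
```

For example, `contains('GREEK', 'SMALL', 'ALPHA')` includes U+03B1 but not U+0391 (GREEK CAPITAL LETTER ALPHA), and `regexp('PILE OF .*')` includes U+1F4A9. The regexp variant really does need the whole database materialized as one big string, on the order of megabytes for recent Unicode versions, while `contains` gets by with plain substring tests.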

Stephen J. Turnbull