you’re right, all of this works.
iterating over all of unicode simply looked too big a task for me, so i
didn’t consider it, but apparently it works well enough.
yet one puzzle piece is missing: blocks.
python has no built-in information about unicode blocks (which are
basically range()s with associated names).
an API involving blocks would need a way to enumerate them, to get the
range for a name, and the name for a char/codepoint.
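to make those three operations concrete, here is a rough sketch of what such an API might look like. everything in it is hypothetical: the function names blocks(), block_range() and block_name() are made up for illustration, and the table is a tiny hand-copied excerpt. a real implementation would presumably be generated from the Blocks.txt file of the Unicode Character Database.

```python
from bisect import bisect_right

# Hypothetical, hand-copied excerpt of the Unicode block table.
# Each entry is (first code point, last code point, block name),
# sorted by first code point. A real table would cover all blocks.
_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x1F300, 0x1F5FF, "Miscellaneous Symbols and Pictographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
_STARTS = [start for start, _, _ in _BLOCKS]


def blocks():
    """Enumerate all known blocks as (name, range) pairs."""
    return [(name, range(start, end + 1)) for start, end, name in _BLOCKS]


def block_range(name):
    """Return the range of code points for a block name."""
    for start, end, candidate in _BLOCKS:
        if candidate == name:
            return range(start, end + 1)
    raise KeyError(name)


def block_name(char):
    """Return the block name for a character or code point, or None."""
    cp = ord(char) if isinstance(char, str) else char
    # Find the last block that starts at or before cp, then check its end.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0 and cp <= _BLOCKS[i][1]:
        return _BLOCKS[i][2]
    return None


print(block_name("a"))
print(block_name("\U0001F4A9"))
print(block_range("Cyrillic"))
```

since blocks never overlap, a sorted table plus bisect is enough for the char-to-name direction. code points outside the excerpt simply give None here.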
2014-10-04 9:13 GMT+02:00 Chris Angelico:
On Sat, Oct 4, 2014 at 4:47 PM, Stephen J. Turnbull wrote:
>>> names = [unicodedata.name(chr(i)) for i in range(sys.maxunicode+1)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
ValueError: no such name
oops, although you didn't actually claim that would work. :-) (BTW, chr(0) has no name. At least it was instantaneous. :-)
Oops, forgot about that. Yet another case where the absence of PEP 463 forces the function to have an additional argument:
names = [unicodedata.name(chr(i), '') for i in range(sys.maxunicode+1)]
Now it works. Sorry for the omission, this is what happens when code is typed straight into the email without testing :)
Then
>>> for i in range(sys.maxunicode+1):
...     try:
...         names.append(unicodedata.name(chr(i)))
...     except ValueError:
...         pass
...
I would recommend appending a shim in the ValueError branch, to allow the indexing to be correct. Which would look something like this:
names = [unicodedata.name(chr(i)) except ValueError: '' for i in range(sys.maxunicode+1)]
Or, since name() does indeed have a 'default' parameter, the code from above. :)
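A small sketch of why the placeholder matters, putting both variants from this thread side by side. The variable names compact and aligned are just for illustration, and unicodedata.lookup() is used only to get the real code point for comparison. Without a placeholder, every unnamed code point shifts the later entries down, so the list position no longer equals the code point.

```python
import sys
import unicodedata

# Variant 1: skip unnamed code points entirely.
# List positions drift away from the real code points.
compact = []
for i in range(sys.maxunicode + 1):
    try:
        compact.append(unicodedata.name(chr(i)))
    except ValueError:
        pass

# Variant 2: use name()'s default argument as a placeholder,
# so that aligned[i] always belongs to chr(i).
aligned = [unicodedata.name(chr(i), '') for i in range(sys.maxunicode + 1)]

# The true code point, looked up directly by name.
poo = ord(unicodedata.lookup("PILE OF POO"))

print(compact.index("PILE OF POO") == poo)  # False: index is too small
print(aligned.index("PILE OF POO") == poo)  # True
```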
takes between 1 and 2 seconds, while
>>> names.index("PILE OF POO")
61721
>>> "PILE OF POO" in names
True
is instantaneous. Note: 61721 is *much* smaller than 0x1F4A9.
>>> names.index("PILE OF POO")
128169
>>> hex(_).upper()
'0X1F4A9'
And still instantaneous. Of course, a prefix search is a bit slower:
>>> [i for i,s in enumerate(names) if s.startswith("PILE")]
[128169]
Takes about 1s on my aging Windows laptop, where the building of the list takes about 4s, so it should be quicker on your system.
The big downside, I guess, is the RAM usage.
>>> sys.getsizeof(names)
4892352
>>> sum(sys.getsizeof(n) for n in names)
30698194
That's roughly 34MB of stuff stored, just to allow these lookups.
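If the memory is a concern, one possible trade-off is to not keep the list at all and scan on demand. This is only a sketch: find_by_prefix is a made-up helper name, and every call pays the full iteration cost again instead of a one-time build.

```python
import sys
import unicodedata


def find_by_prefix(prefix):
    """Yield (code point, name) for each character whose name starts with prefix."""
    for cp in range(sys.maxunicode + 1):
        # '' is the placeholder for code points that have no name.
        name = unicodedata.name(chr(cp), '')
        if name and name.startswith(prefix):
            yield cp, name


for cp, name in find_by_prefix("PILE OF"):
    print(hex(cp), name)
```

Being a generator, it holds only one name at a time, so the cost moves from RAM to repeated scanning.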
ChrisA