<div dir="ltr"><div><div><div><div>you’re right, all of this works.<br><br></div>iterating over all of unicode simply looked like too big a task to me, so i didn’t consider it, but apparently it works well enough.<br><br></div>yet one puzzle piece is missing: blocks.<br><br></div>python has no built-in information about unicode blocks (which are basically range()s with associated names).<br><br></div>an API involving blocks would need a way to enumerate them, to get the range for a name, and to get the name for a char/codepoint.<br></div><div class="gmail_extra"><br><div class="gmail_quote">2014-10-04 9:13 GMT+02:00 Chris Angelico <span dir="ltr"><<a href="mailto:rosuav@gmail.com" target="_blank">rosuav@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sat, Oct 4, 2014 at 4:47 PM, Stephen J. Turnbull <<a href="mailto:stephen@xemacs.org">stephen@xemacs.org</a>> wrote:<br>

>>>> names = [unicodedata.name(chr(i)) for i in range(sys.maxunicode+1)]<br>

> Traceback (most recent call last):<br>

>   File "<stdin>", line 1, in <module><br>

>   File "<stdin>", line 1, in <listcomp><br>

> ValueError: no such name<br>

><br>

> oops, although you didn't actually claim that would work. :-)  (BTW,<br>

> chr(0) has no name.  At least it was instantaneous. :-)<br>

<br>

</span>Oops, forgot about that. Yet another case where the absence of PEP 463<br>

forces the function to have an additional argument:<br>

<br>

names = [unicodedata.name(chr(i), '') for i in range(sys.maxunicode+1)]<br>

<br>

Now it works. Sorry for the omission, this is what happens when code<br>

is typed straight into the email without testing :)<br>

<span class=""><br>

> Then<br>

><br>

>>>> for i in range(sys.maxunicode+1):<br>

> ...  try:<br>

> ...   names.append(unicodedata.name(chr(i)))<br>

> ...  except ValueError:<br>

> ...   pass<br>

> ...<br>

<br>

</span>I would recommend appending a shim in the ValueError branch, to allow<br>

the indexing to be correct. Which would look something like this:<br>

<br>

names = [unicodedata.name(chr(i)) except ValueError: '' for i in<br>

range(sys.maxunicode+1)]<br>

<br>

Or, since name() does indeed have a 'default' parameter, the code from above. :)<br>
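<br>
Since the except-expression above is proposed PEP 463 syntax rather than valid Python, here is a minimal sketch of the same shim as an explicit loop. It assumes Python 3 and only uses unicodedata.name() and sys.maxunicode as they appear in this thread:<br>

```python
import sys
import unicodedata

# Build one entry per code point so that list index == code point.
names = []
for i in range(sys.maxunicode + 1):
    try:
        names.append(unicodedata.name(chr(i)))
    except ValueError:
        # Unnamed code point (e.g. chr(0)): append an empty-string shim
        # so that later entries keep their correct index.
        names.append('')

print(hex(names.index("PILE OF POO")))
```

With the shim in place, the list has exactly sys.maxunicode+1 entries, so the result should match the version that passes '' as the default argument.<br>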

<span class=""><br>

> takes between 1 and 2 seconds, while<br>

><br>

>>>> names.index("PILE OF POO")<br>

> 61721<br>

>>>> "PILE OF POO" in names<br>

> True<br>

><br>

> is instantaneous.  Note: 61721 is *much* smaller than 0x1F4A9.<br>

<br>

>>> names.index("PILE OF POO")<br>

</span>128169<br>

>>> hex(_).upper()<br>

'0X1F4A9'<br>

<br>

And still instantaneous. Of course, a prefix search is a bit slower:<br>

<br>

>>> [i for i,s in enumerate(names) if s.startswith("PILE")]<br>

[128169]<br>

<br>

Takes about 1s on my aging Windows laptop, where the building of the<br>

list takes about 4s, so it should be quicker on your system.<br>

<br>

The big downside, I guess, is the RAM usage.<br>

<br>

>>> sys.getsizeof(names)<br>

4892352<br>

>>> sum(sys.getsizeof(n) for n in names)<br>

30698194<br>

<br>

That's ~34MB of stuff stored (the two figures above add up to 35,590,546 bytes), just to allow these lookups.<br>
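<br>
If the RAM cost matters more than speed, one possible trade-off is to skip the list and scan on demand. The sketch below is only an illustration: find_by_prefix is a hypothetical helper name, not an existing function, while unicodedata.lookup() is the standard-library call for exact-name lookups:<br>

```python
import sys
import unicodedata

def find_by_prefix(prefix):
    """Lazily yield code points whose name starts with `prefix`.

    Hypothetical helper: nothing is cached, so every call rescans all
    code points, trading speed for memory.
    """
    for i in range(sys.maxunicode + 1):
        if unicodedata.name(chr(i), '').startswith(prefix):
            yield i

# Exact-name lookups need no list at all.
poo = unicodedata.lookup("PILE OF POO")
print(hex(ord(poo)))

# Prefix searches pay the full scan each time.
print([hex(cp) for cp in find_by_prefix("PILE OF POO")])
```

Each prefix search then costs roughly as much as building the list once, so this only pays off when searches are rare.<br>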

<br>

ChrisA<br>

<div class="HOEnZb"><div class="h5">_______________________________________________<br>

Python-ideas mailing list<br>

<a href="mailto:Python-ideas@python.org">Python-ideas@python.org</a><br>

<a href="https://mail.python.org/mailman/listinfo/python-ideas" target="_blank">https://mail.python.org/mailman/listinfo/python-ideas</a><br>

Code of Conduct: <a href="http://python.org/psf/codeofconduct/" target="_blank">http://python.org/psf/codeofconduct/</a><br>

</div></div></blockquote></div><br></div>
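<div>For the block API described at the top of this message, a minimal sketch might look like the following. All three function names (block_names, block_range, block_name) are hypothetical, and the table is only a small hand-copied excerpt of the Unicode block list; a real implementation would presumably load every entry from the UCD's Blocks.txt.<br></div>

```python
import bisect

# (first code point, last code point, block name), sorted by start.
# Only a small excerpt for illustration, not the full block list.
_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x1F300, 0x1F5FF, "Miscellaneous Symbols and Pictographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
_STARTS = [start for start, _end, _name in _BLOCKS]
_BY_NAME = {name: range(start, end + 1) for start, end, name in _BLOCKS}

def block_names():
    """Enumerate the known block names in code point order."""
    return [name for _start, _end, name in _BLOCKS]

def block_range(name):
    """Return the range() of code points for a block name (KeyError if unknown)."""
    return _BY_NAME[name]

def block_name(char, default=None):
    """Return the block name for a character, or `default` if none matches."""
    cp = ord(char)
    # Find the last block that starts at or before cp, then check its end.
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0:
        _start, end, name = _BLOCKS[i]
        if cp <= end:
            return name
    return default

print(block_name("\U0001F4A9"))
print(block_range("Cyrillic"))
```

<div>The default argument of block_name mirrors the one unicodedata.name() already has, so code points outside every known block do not need a try/except.<br></div>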