Re: [Python-ideas] Extend unicodedata with a name search

4 Oct 2014

On Sat, Oct 4, 2014 at 4:47 PM, Stephen J. Turnbull wrote:
...
...
...
...
names = [unicodedata.name(chr(i)) for i in range(sys.maxunicode+1)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
ValueError: no such name
oops, although you didn't actually claim that would work. :-)  (BTW,
chr(0) has no name.  At least it was instantaneous. :-)
Oops, forgot about that. Yet another case where the absence of PEP 463
forces the function to have an additional argument:

names = [unicodedata.name(chr(i), '') for i in range(sys.maxunicode+1)]

Now it works. Sorry for the omission, this is what happens when code
is typed straight into the email without testing :)
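
For anyone who wants to paste and run it, here is the corrected one-liner as a small self-contained sketch. The three checks at the end are my own additions for illustration, not part of the original snippet.

```python
import sys
import unicodedata

# One entry per code point. Code points without a name get '' (the
# 'default' argument of unicodedata.name), so names[i] always
# describes chr(i).
names = [unicodedata.name(chr(i), '') for i in range(sys.maxunicode + 1)]

# Illustrative checks (added here, not from the original mail):
print(len(names) == sys.maxunicode + 1)  # one slot per code point
print(repr(names[0]))                    # chr(0) has no name
print(names[ord('A')])
```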
...
Then
...
...
...
for i in range(sys.maxunicode+1):
...     try:
...         names.append(unicodedata.name(chr(i)))
...     except ValueError:
...         pass
...
I would recommend appending a shim in the ValueError branch, so that
each list index still matches its code point. With PEP 463 syntax,
that would look something like this:

names = [unicodedata.name(chr(i)) except ValueError: '' for i in
range(sys.maxunicode+1)]

Or, since name() does indeed have a 'default' parameter, the code from above. :)
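
Since the except-expression is only proposed syntax, the same shim can be approximated today with a tiny wrapper. This is a rough sketch; name_or is a hypothetical helper name of my own, not anything in unicodedata.

```python
import unicodedata

def name_or(ch, fallback=''):
    """Hypothetical helper: return the character's name, or
    'fallback' when unicodedata.name() raises ValueError."""
    try:
        return unicodedata.name(ch)
    except ValueError:
        return fallback

print(repr(name_or(chr(0))))  # unnamed code point, so the fallback
print(name_or('A'))
```

For name() specifically this buys nothing over the 'default' parameter, but the pattern carries over to functions that lack one.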
...
takes between 1 and 2 seconds, while
...
...
...
names.index("PILE OF POO")
61721
"PILE OF POO" in names
True
is instantaneous.  Note: 61721 is *much* smaller than 0x1F4A9.
...
...
...
names.index("PILE OF POO")
128169
hex(_).upper()
'0X1F4A9'
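
To make the difference between the two lists concrete, here is a sketch that builds both and compares the index of the same name. The variable names (aligned, compact) are mine, and the cross-check uses unicodedata.lookup(), which maps a name back to its character.

```python
import sys
import unicodedata

# Aligned list: '' keeps a slot for every unnamed code point.
aligned = [unicodedata.name(chr(i), '') for i in range(sys.maxunicode + 1)]

# Compact list: unnamed code points are skipped, as in the
# try/except/pass loop, so positions no longer match code points.
compact = [n for n in aligned if n]

i_aligned = aligned.index("PILE OF POO")
i_compact = compact.index("PILE OF POO")

print(hex(i_aligned))                                       # the real code point
print(chr(i_aligned) == unicodedata.lookup("PILE OF POO"))  # cross-check
print(i_compact < i_aligned)                                # compact index is shifted down
```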
And still instantaneous. Of course, a prefix search is a bit slower:
...
...
...
[i for i,s in enumerate(names) if s.startswith("PILE")]
[128169]
Takes about 1s on my aging Windows laptop, where the building of the
list takes about 4s, so it should be quicker on your system.
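
The search can be wrapped up as a pair of small functions. A minimal sketch: find_prefix and find_substring are hypothetical names of my own, and the substring variant is an extra I added for illustration. Both are plain linear scans over the aligned list, so they cost about the same as the list comprehension above.

```python
import sys
import unicodedata

names = [unicodedata.name(chr(i), '') for i in range(sys.maxunicode + 1)]

def find_prefix(names, prefix):
    """Hypothetical helper: code points whose name starts with prefix."""
    return [i for i, s in enumerate(names) if s.startswith(prefix)]

def find_substring(names, fragment):
    """Hypothetical helper: code points whose name contains fragment."""
    return [i for i, s in enumerate(names) if fragment in s]

hits = find_prefix(names, "PILE")
for cp in hits:
    print("U+%04X %s" % (cp, names[cp]))
```

The exact set of hits depends on the Unicode database shipped with the interpreter, so newer versions may list more than one entry.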

The big downside, I guess, is the RAM usage.
...
...
...
sys.getsizeof(names)
4892352
sum(sys.getsizeof(n) for n in names)
30698194
That's ~34MB of stuff stored (4892352 + 30698194 bytes), just to
allow these lookups.
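
One caveat on that measurement: summing getsizeof() over every slot counts the shared '' default once per unnamed code point, even though it is a single object. Here is a sketch that counts each distinct string object only once. It assumes CPython, where id() identifies an object and name() hands back the very object passed as 'default'; the actual figures will vary by platform and Python version, so none are quoted here.

```python
import sys
import unicodedata

names = [unicodedata.name(chr(i), '') for i in range(sys.maxunicode + 1)]

# Size of the list object itself (the array of references).
list_bytes = sys.getsizeof(names)

# Naive total: every slot counted, including the repeated ''.
naive_bytes = sum(sys.getsizeof(n) for n in names)

# Deduplicated total: each distinct string object counted once.
unique = {id(n): n for n in names}
distinct_bytes = sum(sys.getsizeof(n) for n in unique.values())

print(list_bytes, naive_bytes, distinct_bytes)
```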

ChrisA

Chris Angelico