Extend unicodedata with a name/pattern/regex search for character entity references?
Veek. M
vek.m1234 at gmail.com
Sun Sep 4 00:02:12 EDT 2016
Thomas 'PointedEars' Lahn wrote:
> Veek. M wrote:
>
>> https://mail.python.org/pipermail//python-ideas/2014-October/029630.htm
>>
>> Wanted to know if the above link idea,
>
> … which is 404-compliant; the Internet Archive does not have it either
> …
>
>> had been implemented
>
> Probably not.
>
>> and if there's a module that accepts a pattern like 'cap' and gives
>> you all the instances of unicode 'CAP' characters.
>
> I do not know any.
>
>> ⋂ \bigcap
>> ⊓ \sqcap
>> ∩ \cap
>> ♑ \capricornus
>> ⪸ \succapprox
>> ⪷ \precapprox
>>
>> (above's from tex)
>>
>> I found two useful modules in this regard: unicode_tex, unicodedata
>> but unicodedata is a builtin which does not do globs or regexes, so
>> it's kind of limiting in nature.
>
> Quick hack:
>
> #--------------------------------------------------------------------
> from unicode_tex import unicode_to_tex_map
>
> for key, value \
>         in filter(lambda item: "cap" in item[1], unicode_to_tex_map.items()):
>     print(key, value)
> #--------------------------------------------------------------------
>
> (Optimizations are welcome.)
>
> It is easy to come up with methods that take a globbing or a regular
> expression (globbing expressions can be turned into regular
> expressions easily) and return, perhaps as a dictionary or list of
> tuples, only the matching entries.
>
> Other than that I think you will have to turn the Unicode Character
> Database (which is available via HTTP as one huge text file; see the
> Python Tutorial on “Internet Access” for how to get it dynamically)
> into whatever form suits you for querying it.
>
>> Would be nice if you could search html/xml character entity
>> references as well.
>
> For what purpose?
>
> Your posting is lacking a real name in the “From” header field.
>
Ouch! Sorry for the bad link, Thomas. The link is titled '[Python-ideas]
Extend unicodedata with a name search' and I suspect this updated link
(http://code.activestate.com/lists/python-ideas/29504/) may work;
if it doesn't, you could google the title.

I don't want to dump/replicate the existing Unicode data in module
'unicodedata'.
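
A minimal sketch of a name search that avoids copying the Unicode data:
it walks every code point and asks 'unicodedata' for the name. The
function name search_names is made up for illustration, and this is only
one possible approach, not something the module offers itself.

```python
import re
import sys
import unicodedata


def search_names(pattern, flags=re.IGNORECASE):
    """Yield (char, name) for each code point whose name matches pattern.

    search_names is a hypothetical helper, not part of unicodedata.
    """
    regex = re.compile(pattern, flags)
    for codepoint in range(sys.maxunicode + 1):
        char = chr(codepoint)
        # name() returns the given default ('') for unnamed code points.
        name = unicodedata.name(char, '')
        if name and regex.search(name):
            yield char, name
```

For instance, list(search_names(r'\bCAP\b')) should include
('⊓', 'SQUARE CAP'). Note that ∩ is named INTERSECTION in Unicode, so a
search on official names will not find everything TeX calls "cap".
A glob could be converted with fnmatch.translate() first, though its
result is meant for match() rather than search().
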

Regarding purpose: I need this for hexchat. I IRC a lot, and I often
want to ask a question involving math symbols. I've written some Python
(included at the bottom) that translates:

    \help filter_word    # into a list of symbols and names

(serves as a memory jog). It also translates stuff like:

    A \cap B \epsilon C    to    A ∩ B ε C

but all this works with only a subset of TeX; it can't do complicated
formulas. I wanted to extend it further. I don't think I shall be able
to subscript integrals easily, but I could make better use of the
available Unicode, which means making it more accessible (hence the
pattern matching feature). HTML/XML entities provide a new way of
remembering stuff.
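
For the entity side, a rough sketch using the standard library's
html.entities.html5 table (Python 3.3+), which maps names such as 'cap;'
to the character. The helper name search_entities is hypothetical, and
the plain substring-style regex search is just one way to do it.

```python
import re
from html.entities import html5


def search_entities(pattern):
    """Return sorted (entity, char) pairs whose entity name matches pattern.

    search_entities is a hypothetical helper, not a stdlib function.
    """
    regex = re.compile(pattern)
    matches = []
    for name, char in html5.items():
        # html5 also lists a few legacy names without the trailing ';'
        # (e.g. 'amp'); skip those so each entity is reported once.
        if name.endswith(';') and regex.search(name[:-1]):
            matches.append(('&' + name, char))
    return sorted(matches)
```

search_entities('cap') should then return pairs such as ('&cap;', '∩'),
('&bigcap;', '⋂') and ('&sqcap;', '⊓'), which lines up nicely with the
TeX names above.
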
---------------------------
Regarding the name (From field), my name *is* Veek.M, though I tend to
shorten it to Vek.M on Google (I think Veek was taken or some such
thing). Just to be clear, my parents call me something closely related
to Veek that is NOT Beek or Peek or Squeak or Sneak, and my official name
is something really weird. Identity theft being what it is, I probably
am lying anyhow about all this, but it sounds funny so :p

#--------------------------------------------------------------------
import re
import unicodedata

import hexchat
import unicode_tex

__module_name__ = 'Unicode'
__module_version__ = '0.1'
__module_description__ = r'Substitute \whatever with Unicode char in cmdline input'

#re_repl = unicodedata.lookup('N-ARY UNION')

ENTER_KEY = '65293'  # key value hexchat reports for Return


def debug(*args):
    hexchat.prnt('#####{}#####'.format(*args))


def print_help(*args):
    hexchat.prnt('{}'.format(*args))


def tex_to_unicode(match):
    # Look up one \word; unknown words become 'err'.
    return unicode_tex.tex_to_unicode_map.get(match.group(0), 'err')


def send_message(word, word_eol, userdata):
    # Only act when Enter is pressed.
    if word[0] != ENTER_KEY:
        return

    msg = hexchat.get_info('inputbox')
    if msg is None:
        return

    # "\help pattern" lists every symbol whose TeX name contains pattern.
    x = re.match(r'\\help\s+(\w+)', msg)
    if x:
        pattern = x.group(1)
        for key, value in unicode_tex.tex_to_unicode_map.items():
            if pattern in key:
                print_help(value + ' ' + key)
        hexchat.command("settext %s" % '')
        return

    # Replace every \word in a single pass, so that \cap does not also
    # clobber the start of a longer name such as \capricornus.
    msg = re.sub(r'\\\w+', tex_to_unicode, msg)
    hexchat.command("settext %s" % msg)


hexchat.hook_print('Key Press', send_message)
#--------------------------------------------------------------------