grapheme cluster library
Rustom Mody
rustompmody at gmail.com
Sat Oct 21 13:18:21 EDT 2017
On Saturday, October 21, 2017 at 9:22:24 PM UTC+5:30, MRAB wrote:
> On 2017-10-21 05:11, Rustom Mody wrote:
> > Is there a recommended library for manipulating grapheme clusters?
> >
> > In particular, in devanagari
> > क + ि = कि
> > in (pseudo)unicode names
> > KA-letter + I-sign = KI-composite-letter
> >
> > I would like to be able to handle KI as a letter rather than two code-points.
> > Can of course write an automaton to group but guessing that it's already
> > available some place…
> >
> You can use the regex module to split a string into graphemes:
>
> regex.findall(r'\X', string)
Thanks, MRAB.
Yes, as I said, I discovered r'\X'.
Ultimately my code was (effectively) one line!
print("".join(map[x] for x in findall(r'\X', l)))
with findall imported from the regex module and map being a dictionary of a few hundred entries, such as
map = {
...
'ॐ': "OM",
...
}
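A minimal, self-contained sketch of the same idea follows; it is not the actual script. The three-entry TABLE, the names graphemes and transliterate, and the pass-through of unmapped clusters are all assumptions made for illustration. It uses the third-party regex module when it is installed, and otherwise falls back to a rough approximation of \X that simply attaches combining marks to the preceding character.

```python
import unicodedata

try:
    import regex  # third-party module (pip install regex); supports \X
except ImportError:
    regex = None


def graphemes(text):
    """Split text into grapheme clusters."""
    if regex is not None:
        return regex.findall(r'\X', text)
    # Fallback (an approximation, not full \X): glue each combining
    # mark (categories Mn, Mc, Me) onto the character before it.
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ('Mn', 'Mc', 'Me'):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# Hypothetical three-entry table, for illustration only.
TABLE = {
    'ॐ': 'OM',
    'कि': 'ki',
    ' ': ' ',
}


def transliterate(text, table=TABLE):
    # Clusters missing from the table are copied through unchanged.
    return "".join(table.get(g, g) for g in graphemes(text))


print(transliterate('ॐ कि'))  # prints: OM ki
```

Using table.get(g, g) instead of map[x] means an unmapped cluster is copied through rather than raising KeyError, which helps while the table is still incomplete.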
$ cat purnam-deva
ॐ पूर्णमदः पूर्णमिदं पूर्णात्पुर्णमुदच्यते
पूर्णस्य पूर्णमादाय पूर्णमेवावशिष्यते ॥
$ ./devanagari2roman.py purnam-deva
OM pUraNamadaH pUraNamidaM pUraNAtpuraNamudachyate
pUraNasya pUraNamAdAya pUraNamavAvashiShyate ..
OM shAntiH shAntiH shAntiH ..
Basically, this is an inversion of the ITRANS input method:
https://en.wikipedia.org/wiki/ITRANS