Question regarding handling of Unicode data in Devnagari
apt.shansen at gmail.com
Sat Sep 12 20:57:03 CEST 2009
> As per the standard posted by the UNICODE for the Devnagari script
> used for Hindi and some other languages of India, we have a standard
> set, like from the range of 0900-097F.
> Where, we have numbers for each character:
> like 0904 for Devnagari letter short a, etc.
> Now, if I write a program,
> and I like to see the Devnagari letter short a as output then how
> should I proceed? Can codecs help me or should I use unicodedata?
If you're writing a program, you can include that character with u"\u0904";
the \u escape (followed by four hex digits) inside a unicode string is how
you write an arbitrary unicode character in Python; for characters above
U+FFFF, use \U with eight hex digits. In Python 2, u"string" is a unicode
string, and "string" is a regular byte string. In Python 3, you don't need
that 'u' in front because "string" is a unicode string. You didn't specify
your version, so use whichever is appropriate.
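To make the difference concrete, here is a minimal sketch (my own illustration, not something from the original question; the variable names are arbitrary, and it assumes Python 2.6+ or Python 3.3+, where the u prefix is accepted) contrasting the escaped character with the digits typed as plain text:

```python
# The \u escape yields one character; typing the digits yields four.
ch = u"\u0904"      # DEVANAGARI LETTER SHORT A as a single code point
digits = u"0904"    # just the four characters '0', '9', '0', '4'

char_len = len(ch)          # 1
digits_len = len(digits)    # 4
code_point = ord(ch)        # 2308, i.e. 0x0904

# Encoding turns the character into bytes, e.g. for a UTF-8 terminal or file.
utf8_bytes = ch.encode("utf-8")   # three bytes: E0 A4 84
```

Whether the glyph actually shows up when you print ch still depends on the terminal's encoding and font.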
So first, use a unicode string, and second, write the actual character
with \u instead of just writing the number into a string. That'll result
in a string with a single real character in it. If you assign it to a
variable, say ch = u"\u0904", and you are on a terminal which is set up to
display unicode (with a proper font and such), you should be able to
"print ch" and see the Devanagari character that way. If you want to look
up characters by name instead of number, you can use unicodedata, and do:
import unicodedata
print(unicodedata.lookup("DEVANAGARI LETTER SHORT A"))
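The lookup also works in the other direction: unicodedata.name() returns the official name for a character. A small sketch of the round trip, where describe() is a helper name I made up for illustration and is not part of the module:

```python
import unicodedata

def describe(ch):
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    # The second argument to name() is a fallback for unnamed code points.
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))

# Name -> character, then character -> code point and name again.
ch = unicodedata.lookup("DEVANAGARI LETTER SHORT A")
summary = describe(ch)
print(summary)
```

This only prints ASCII, so it is a handy way to check which character you have even on a terminal that cannot render Devanagari.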