Question regarding handling of Unicode data in Devnagari

Stephen Hansen apt.shansen at gmail.com
Sat Sep 12 14:57:03 EDT 2009


>
> As per the standard posted by the UNICODE for the Devnagari script
> used for Hindi and some other languages of India, we have a standard
> set, like from the range of 0900-097F.
> Where, we have numbers for each character:
> like 0904 for Devnagari letter short a, etc.
> Now, if I write a program,
>
> where
> ch="0904"
> and I like to see the Devnagari letter short a as output then how
> should I proceed? Can codecs help me or should I use unicodedata?


If you're writing a program, you can include that character with u"\u0904";
the \u escape (followed by four hex digits) inside a unicode string is how
you write a character by its code point in Python. In Python 2, u"string" is
a unicode string, and "string" is a regular byte string. In Python 3, you
don't need that 'u' in front because "string" is already a unicode string.
You didn't specify your version, so use whichever is appropriate.

So first, use a unicode string, and second, write the actual character with
\u instead of just putting the digits of the number into a string. That'll
result in a string with a single real character in it, and if you are on a
terminal which is set up to display unicode (with a proper font and such),
you should be able to "print ch" and see the Devanagari character that way.
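
If the code point really does arrive as a string of hex digits, like the
ch="0904" in your question, one way to get the character is to parse the
digits as a base-16 number and hand that to chr(). This is only a small
sketch and assumes Python 3 (in Python 2 the equivalent call is unichr);
the helper name char_from_hex is made up for illustration:

```python
def char_from_hex(hex_digits):
    """Return the one-character string for a code point given as hex digits."""
    # int(..., 16) parses the digits as hexadecimal; chr() maps the
    # resulting number to the character with that code point.
    return chr(int(hex_digits, 16))

ch = char_from_hex("0904")
print(ch == "\u0904")   # True
print(hex(ord(ch)))     # 0x904
```

Going through int() and chr() means the digits can come from anywhere
(a file, user input), where a \u escape only works in source code.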

The statement:

    print u"\u0904"

should print your character. If you want to look up characters by name
instead of number, you can use unicodedata, and do:

    import unicodedata
    print unicodedata.lookup("DEVANAGARI LETTER SHORT A")
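
unicodedata also works in the other direction, so you can check that a name
and a number refer to the same character. A quick sketch, again assuming
Python 3 syntax:

```python
import unicodedata

# Look the character up by its official Unicode name...
ch = unicodedata.lookup("DEVANAGARI LETTER SHORT A")
print(ch == "\u0904")          # True

# ...and ask for the name of a character you already have.
print(unicodedata.name(ch))    # DEVANAGARI LETTER SHORT A
```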

HTH,

--S