Encoding for Devanagari Script.

Atul. atulskulkarni at gmail.com
Mon Jul 28 11:48:38 EDT 2008


Hi Fredrik and Terry,

Well, I got this in IDLE. I think I have done something wrong.

>>> import codecs
>>> f = open("C:\Documents and Settings\admin\My Documents\corpus\dainaikAikya collected by sushant.txt","r", "utf_8")

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    f = open("C:\Documents and Settings\admin\My Documents\corpus\dainaikAikya collected by sushant.txt","r", "utf_8")
TypeError: an integer is required

After that I tried binary read mode and read the first 32 bytes, and
this is what I got.

>>> f = open("C:\Documents and Settings\\admin\\My Documents\\corpus\\dainaikAikya collected by sushant.txt","rb")
>>> f.read(32)
'\xef\xbb\xbf\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa5\x80 \xe0\xa4\xa6\xe0\xa4\xbf\xe0\xa4\xb2\xe0\xa5\x8d\xe0\xa4\xb2\xe0\xa5\x80,'

Now, based on my knowledge of Unicode, I think this is a UTF-8 file (the
first 3 bytes are \xef\xbb\xbf). Please correct me if I am wrong. How do
I read this?
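
As a quick check of that guess, here is a minimal sketch that decodes the 32 bytes from the session above. It assumes a single space sits between the third and fourth characters of the text, which makes the count come to exactly 32; the names raw, has_bom and text are only illustrative. \xef\xbb\xbf is the UTF-8 byte order mark (available as codecs.BOM_UTF8), and the "utf-8-sig" codec removes it while decoding.

```python
import codecs

# The 32 bytes returned by f.read(32); the single space after \xa5\x80
# is an assumption made for this sketch.
raw = (b'\xef\xbb\xbf\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa5\x80 '
       b'\xe0\xa4\xa6\xe0\xa4\xbf\xe0\xa4\xb2\xe0\xa5\x8d'
       b'\xe0\xa4\xb2\xe0\xa5\x80,')

# True when the data starts with the UTF-8 byte order mark.
has_bom = raw.startswith(codecs.BOM_UTF8)

# "utf-8-sig" decodes UTF-8 and strips a leading BOM if one is present.
text = raw.decode("utf-8-sig")
```

If the decode succeeds without an exception, the bytes are valid UTF-8, and text holds the Devanagari characters without the leading marker.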

Atul.

PS: I wrote the above code using the information from the Library
Reference PDF, section 4.8 "Codecs". Am I doing something wrong? Please
do let me know.
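
The TypeError above is consistent with the built-in open() being called instead of codecs.open(): in Python 2 the third positional argument of the built-in is a buffer size (an integer), whereas codecs.open() takes the encoding in that position. The sketch below shows one possible way to read such a file as text. It uses io.open (Python 2.6 and later, and the same function as the built-in open in Python 3) with the "utf-8-sig" codec so that the BOM is dropped. The helper name read_utf8_text and the temporary demo file are hypothetical stand-ins, not the original corpus file.

```python
import io
import os
import tempfile


def read_utf8_text(path):
    """Return the file's contents as text, without a leading UTF-8 BOM."""
    # The encoding is passed by keyword, so it cannot be mistaken
    # for the buffering argument.
    with io.open(path, "r", encoding="utf-8-sig") as f:
        return f.read()


# Demo: write a BOM plus three Devanagari characters to a temporary file
# that stands in for the real corpus file, then read it back.
sample = b'\xef\xbb\xbf\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa5\x80'
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(sample)

content = read_utf8_text(demo_path)
os.remove(demo_path)
```

For a Windows path, a raw string literal (the r"..." form) or doubled backslashes throughout keeps sequences such as \a from being treated as escapes.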



On Jul 25, 6:21 am, Terry Reedy <tjre... at udel.edu> wrote:
> Atul. wrote:
> > Hello All,
>
> > I wanted to know what encoding I should use to open files with
> > Devanagari characters. I was thinking of UTF-8 but was not sure. Any
> > leads on this? Has anyone used it earlier?
>
> You cannot hurt your machine by giving that a try.
>
> This is a general comment for all beginners.  Before posting, open the
> interactive interpreter (or IDLE) and try something(s).  If the result
> puzzles you, copy and paste into a post.  Or if more appropriate, open
> the Python manuals and search a bit, or try a search engine.



