utf-8 read/write file

gigs gigs at hi.t-com.hr
Wed Oct 8 17:55:45 EDT 2008


Benjamin wrote:
> On Oct 8, 12:49 pm, Bruno <Br... at hi.t-com.hr> wrote:
>> Hi!
>>
>> I have a big .txt file which I want to read, process and write to another .txt file.
>> I have written a script for that, but I'm having problems with Croatian characters
>> (Š,Đ,Ž,Č,Ć).
> 
> Can you show us what you have so far?
> 
>> How can I read from and write to a file in UTF-8 encoding?
> 
> import codecs
> data = codecs.open("my-utf8-file.txt", encoding="utf-8").read()
> 
>> I read the file with fileinput.input.
>>
>> thanks
> 
I have tried codecs, but when I use encoding="utf-8" I get this error on the
word "život":

Traceback (most recent call last):
   File "C:\Users\Administrator\Desktop\getcontent.py", line 43, in <module>
     encoding="utf-8").readlines()
   File "C:\Python25\Lib\codecs.py", line 626, in readlines
     return self.reader.readlines(sizehint)
   File "C:\Python25\Lib\codecs.py", line 535, in readlines
     data = self.read()
   File "C:\Python25\Lib\codecs.py", line 424, in read
     newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9e in position 0: unexpected code byte
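
A note on the error: in the Windows-1250 (cp1250) code page, byte 0x9e is the
letter ž, and 0x9e can never be the first byte of a UTF-8 sequence. One possible
reading, which is an assumption and not confirmed anywhere in this thread, is
that the file was saved as cp1250 rather than UTF-8. A minimal sketch of that
idea (Python 2.6+ or 3 syntax; the byte string is made up for illustration):

```python
# Hypothetical bytes: the word "zivot" with a leading z-caron, as cp1250 stores it.
raw = b"\x9eivot"

# Decoding as UTF-8 fails, because 0x9e is not a valid UTF-8 start byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as cp1250 (an assumed encoding) gives U+017E, LATIN SMALL LETTER Z WITH CARON.
text = raw.decode("cp1250")

print(utf8_ok)
print(text == u"\u017eivot")
```

If the second check holds for bytes taken from the real file, cp1250 is a
plausible candidate for its encoding, though other Central European code pages
would need to be ruled out separately.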


I just need to read from file1.txt, process some words (it's simple text
processing) and write them to file2.txt without losing the Croatian characters (šđžčć).
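
A rough sketch of that read/process/write loop, under one loud assumption: that
file1.txt is actually stored as cp1250 (where byte 0x9e is ž) rather than UTF-8.
The function name convert_file, its parameters and the process callback are all
hypothetical names chosen for illustration; only codecs.open comes from the
thread.

```python
import codecs


def convert_file(src_path, dst_path, process,
                 src_encoding="cp1250", dst_encoding="utf-8"):
    """Decode src_path, apply process() to each line, write dst_path.

    src_encoding="cp1250" is an assumption about the input file, not a
    known fact; change it if the file turns out to use another encoding.
    """
    src = codecs.open(src_path, "r", encoding=src_encoding)
    try:
        dst = codecs.open(dst_path, "w", encoding=dst_encoding)
        try:
            for line in src:
                # line is a unicode string here, so non-ASCII letters
                # are ordinary characters during processing.
                dst.write(process(line))
        finally:
            dst.close()
    finally:
        src.close()
```

It could be called as convert_file("file1.txt", "file2.txt", my_processing),
where my_processing is whatever function takes a unicode line and returns the
processed line. Decoding on input and encoding on output keeps the processing
step free of byte-level details, so the output is UTF-8 regardless of what the
input encoding turns out to be.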


