utf-8 read/write file

Aleksandar Radulovic alex at a13x.net
Wed Oct 8 18:59:46 EDT 2008


Hi,

What is the encoding of the file1 you're reading from? I just ran
tests on my machine (OS X) with both Python 2.5 and 2.6 and was able
to read from a file containing:
"život je lep"

The file is UTF-8 encoded.

>>> data = open("test.txt").read()
>>> data
'\xc5\xbeivot je lep.'
>>> f = open("test2.txt", "wb")
>>> f.write(data)
>>> f.close()

mac-alex:~ alex$ file test.txt
test.txt: UTF-8 Unicode text, with no line terminators
mac-alex:~ alex$ file test2.txt
test2.txt: UTF-8 Unicode text, with no line terminators
mac-alex:~ alex$

Files test.txt and test2.txt are identical.
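
Note that the session above only copies raw bytes, so nothing is ever
decoded. If the processing step needs real characters, a rough sketch with
explicit encodings could look like this. It assumes io.open (Python 2.6+
or 3.x); copy_utf8 and process are placeholder names of mine, not anything
from your script.

```python
# -*- coding: utf-8 -*-
import io


def copy_utf8(src_path, dst_path, process=lambda text: text):
    """Read src_path as UTF-8, apply process() to the text, write it back as UTF-8.

    copy_utf8 and process are hypothetical names used only for illustration.
    Returns the processed text.
    """
    # Decode bytes -> text on the way in.
    with io.open(src_path, encoding="utf-8") as src:
        text = src.read()

    # Work on decoded text, never on raw bytes.
    result = process(text)

    # Encode text -> bytes on the way out.
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(result)
    return result
```

Calling copy_utf8("test.txt", "test2.txt") should give the same two
identical files as above, provided test.txt really is UTF-8.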

Regards,
alex.


2008/10/8 gigs <gigs at hi.t-com.hr>:
> Benjamin wrote:
>>
>> On Oct 8, 12:49 pm, Bruno <Br... at hi.t-com.hr> wrote:
>>>
>>> Hi!
>>>
>>> I have a big .txt file which I want to read, process and write to
>>> another .txt file.
>>> I have written a script for that, but I'm having a problem with
>>> Croatian characters (Š,Đ,Ž,Č,Ć).
>>
>> Can you show us what you have so far?
>>
>>> How can I read/write from/to file in utf-8 encoding?
>>
>> import codecs
>> data = codecs.open("my-utf8-file.txt", encoding="utf-8").read()
>>
>>> I read file with fileinput.input.
>>>
>>> thanks
>>
> I have tried with codecs, but when I use encoding="utf-8" I get this error
> on the word: život
>
> Traceback (most recent call last):
>  File "C:\Users\Administrator\Desktop\getcontent.py", line 43, in <module>
>    encoding="utf-8").readlines()
>  File "C:\Python25\Lib\codecs.py", line 626, in readlines
>    return self.reader.readlines(sizehint)
>  File "C:\Python25\Lib\codecs.py", line 535, in readlines
>    data = self.read()
>  File "C:\Python25\Lib\codecs.py", line 424, in read
>    newchars, decodedbytes = self.decode(data, self.errors)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x9e in position 0:
> unexpected code byte
>
>
> I just need to read from file1.txt, process some words (it's simple text
> processing) and write them to file2.txt without losing the Croatian
> characters (šđžčć).
>
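
About the traceback above: 0x9e cannot start a UTF-8 sequence, but it is
the byte for "ž" in the Windows code page cp1250, so file1.txt was probably
saved in that encoding rather than in UTF-8. A hedged sketch for checking
this follows; guess_encoding and its candidate list are my own assumptions,
and a successful decode only shows that the bytes are valid in that
encoding, not that the guess is right.

```python
def guess_encoding(raw, candidates=("utf-8", "cp1250")):
    """Return the first candidate that decodes raw without error, else None.

    guess_encoding is a hypothetical helper; the candidate list is an assumption.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return name
    return None


# A sample starting with the byte reported in the traceback.
sample = b"\x9eivot"
print(guess_encoding(sample))        # cp1250
print(repr(sample.decode("cp1250")))
```

If cp1250 turns out to be right, reading with encoding="cp1250" and
writing with encoding="utf-8" should keep the Croatian characters intact.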



-- 
a lex 13 x

