Python UTF-8 and codecs
Mike Currie
dev at null.com
Tue Jun 27 16:38:22 EDT 2006
I did make a mistake; it should have been 'wU'.
The starting data is ASCII.
What I'm doing is data processing on files with newline and tab characters
inside quoted fields. The idea is to convert all the newline and tab
characters to 0x85 and 0x88 respectively, then process the files. Finally,
right before importing them into a database, convert them back to newlines
and tabs, thus preserving the field values.
Will Python not handle the control characters correctly?
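
For illustration, the round trip described above might look like the following. This is only a minimal sketch in Python 3 syntax (the thread itself predates it), and it assumes the field value has already been pulled out of its quotes. The names ENCODE, DECODE, protect and restore are hypothetical, and the in-memory buffer merely stands in for the intermediate file.

```python
import io

# Placeholder code points taken from the post: U+0085 for newline, U+0088 for tab.
# The table and function names are made up for this sketch.
ENCODE = {ord("\n"): "\x85", ord("\t"): "\x88"}
DECODE = {0x85: "\n", 0x88: "\t"}


def protect(field):
    """Replace newlines and tabs in one field value with the placeholders."""
    return field.translate(ENCODE)


def restore(field):
    """Turn the placeholders back into newlines and tabs."""
    return field.translate(DECODE)


original = "line one\nline two\tend"
protected = protect(original)

# Write the protected text as UTF-8 bytes and read it back,
# as would happen with the intermediate file.
buf = io.BytesIO()
buf.write(protected.encode("utf-8"))
round_tripped = restore(buf.getvalue().decode("utf-8"))
```

One thing to watch for in the processing step: in Python 3, str.splitlines() treats U+0085 as a line boundary, so splitting the protected text that way would break the fields apart again.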
"Serge Orlov" <serge.orlov at gmail.com> wrote in message
news:mailman.7516.1151440194.27775.python-list at python.org...
> On 6/27/06, Mike Currie <dev at null.com> wrote:
>> I'm trying to write out files that have utf-8 characters 0x85 and 0x08 in
>> them. Every configuration I try I get a UnicodeError: ascii codec can't
>> decode byte 0x85 in position 255: ordinal not in range(128)
>>
>> I've tried using the codecs.open('foo.txt', 'rU', 'utf-8',
>> errors='strict')
>> and that doesn't work and I've also tried wrapping the file in an
>> utf8_writer
>> using codecs.lookup('utf8')
>>
>> Any clues?
>
> Use unicode strings for non-ascii characters. The following program
> "works":
>
> import codecs
>
> c1 = unichr(0x85)
> f = codecs.open('foo.txt', 'wU', 'utf-8')
> f.write(c1)
> f.close()
>
> But unichr(0x85) is a control character, are you sure you want it?
> What is the encoding of your data?
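
As a rough illustration of the point about unicode strings, the sketch below (Python 3 syntax, not the code from the thread) shows why the raw byte 0x85 trips the ascii codec while the character U+0085 encodes to UTF-8 without trouble. The variable names are made up for the example.

```python
# The raw byte 0x85 is outside the ASCII range, so decoding it as ASCII fails.
raw = b"\x85"
try:
    raw.decode("ascii")
    message = ""
except UnicodeDecodeError as exc:
    message = str(exc)

# The character U+0085, held as a text (unicode) string, encodes to two UTF-8 bytes.
encoded = "\x85".encode("utf-8")
```

In other words, the error in the original question is what one would expect if a byte string containing 0x85 were passed where text was required, which is presumably why keeping the data as unicode strings avoids it.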