[Tutor] encodings

Denis Dzyubenko shad@mail.kubtelecom.ru
Mon Jun 16 08:19:02 2003


On Sat, 14 Jun 2003 23:30:20 +0200,
 Magnus Lyck(ML) wrote to me:

ML> Clearer now?

Yes, now it is clear.

ML> Use type() to check what type you have in each situation.

>> >>> s =3D u"abc=C1=C2=D7"
>> >>> s.encode('cp1251')
>>Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>>   File "/usr/lib/python2.1/encodings/cp1251.py", line 18, in encode
>>     return codecs.charmap_encode(input,errors,encoding_map)
>>UnicodeError: charmap encoding error: character maps to <undefined>

ML> That means that your unicode string contains values that CP1251
ML> can't present. Does "print s" produce the output you would expect?

No, 'print s' produces an error:
'UnicodeError: ASCII encoding error: ordinal not in range(128)'
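
A minimal sketch of one likely cause, written in Python 3 syntax rather than the Python 2.1 of the session. It assumes the three non-ASCII bytes in the literal are 0xC1 0xC2 0xD7 (KOI8-R for "абв") and that the old interpreter read the bytes of a u"..." literal as Latin-1, which was the behaviour before source encoding declarations existed. The names raw_koi8 and can_encode are hypothetical, chosen only for illustration.

```python
# The three bytes typed on a KOI8-R terminal (assumed values).
raw_koi8 = b"\xc1\xc2\xd7"

# Read as Latin-1, the bytes become accented Latin letters and a times sign.
as_latin1 = raw_koi8.decode("latin-1")

# Read as KOI8-R, the same bytes become three Cyrillic letters.
as_cyrillic = raw_koi8.decode("koi8-r")


def can_encode(text, encoding):
    """Return True if every character of text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(as_latin1, "cp1251"))    # the Latin-1 reading does not fit CP1251
print(can_encode(as_cyrillic, "cp1251"))  # the KOI8-R reading does
print(as_cyrillic.encode("cp1251"))
```

If this assumption holds, the unicode string never contained Cyrillic code points at all, which would explain why CP1251 has no mapping for them.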

ML> How does it look if you do "print s"? Does it look like cyrillic?
ML> what about "print repr(s)". Are all values in the correct range?

No, the values are not listed at the link you gave me
(http://www.oasis-open.org/docbook/xmlcharent/0.3/iso-cyr1.ent)

>>ML> txt.decode('koi8-r').encode('cp1251')
>>
>> >>> txt.decode('koi8-r')
>>Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>>AttributeError: decode

ML> That means that txt is not an object of type string. If it's

>>> txt = "абв"
>>> type(txt)
<type 'string'>
>>> txt.decode("koi8-r")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: decode

and dir(txt) doesn't contain a 'decode' attribute.
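
The missing attribute is consistent with the Python 2.1 interpreter shown in the earlier traceback: the decode() method on 8-bit strings appears to have been added only in Python 2.2, whereas the constructor form unicode(txt, 'koi8-r') was already available in 2.1. The sketch below shows the same constructor-style conversion in Python 3 syntax, where str(raw, encoding) plays the role of unicode(txt, encoding). The function name koi8r_to_cp1251 and the sample bytes are assumptions for illustration.

```python
def koi8r_to_cp1251(raw):
    """Re-encode KOI8-R bytes as CP1251 bytes via an intermediate text string."""
    # Constructor-style decoding, the analogue of unicode(txt, 'koi8-r').
    text = str(raw, "koi8-r")
    # Encoding the text produces the bytes of the target code page.
    return text.encode("cp1251")


# 0xC1 0xC2 0xD7 are the first three lowercase Cyrillic letters in KOI8-R.
converted = koi8r_to_cp1251(b"\xc1\xc2\xd7")
print(converted)
```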

ML> Look here:

 >>>> u = u'\u042F\u042B\u042C'

ML> Now we have a unicode representation with three
ML> cyrillic letters. You should be able to do
ML> "print u" and see something reasonable. I start

no, I can't see anything reasonable:

>>> u = u'\u042F\u042B\u042C'
>>> u
u'\u042f\u042b\u042c'
>>> print u

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
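
A hedged sketch of why the print fails and how an explicit encoding sidesteps it, again in Python 3 syntax. In the old interpreter, printing a unicode object implicitly encodes it with the default ASCII codec, so any Cyrillic letter raises this error; something like print u.encode('koi8-r') would probably be the explicit form there, assuming a KOI8-R terminal. The helper name try_encode is hypothetical.

```python
u = "\u042f\u042b\u042c"  # the three Cyrillic letters from the session


def try_encode(text, encoding, errors="strict"):
    """Encode text, returning None instead of raising when a character has no mapping."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError:
        return None


# Strict ASCII cannot represent Cyrillic, which mirrors the error above.
print(try_encode(u, "ascii"))

# The 'replace' error handler substitutes '?' for each unmappable character.
print(try_encode(u, "ascii", "replace"))

# Encoding explicitly for a KOI8-R terminal yields one byte per letter.
print(try_encode(u, "koi8-r"))
```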

ML> If you still have problems, look at the error handling issues
ML> I wrote about.

Sorry, I still can't understand the source of my problems :(

-- 
Denis.