[Tutor] Encoding

Wed Mar 3 20:44:51 CET 2010

Please let me post the third update O_o. You can forget the other two; I'll put
them into this email.

---
>>> s = "ciao è ciao"
>>> print s
ciao è ciao
>>> s.encode('utf-8')

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    s.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 5:
ordinal not in range(128)
---
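
A possible reading of the traceback above, offered as a sketch rather than a
definitive explanation: in Python 2, calling .encode() on a byte string makes
Python first decode it to unicode with the default 'ascii' codec, and that
hidden decode step is the one that fails. The snippet below performs that step
explicitly, in Python 3 syntax. It assumes the shell passed "è" as the single
Latin-1 byte 0xe8 (the byte reported in the traceback); the variable names are
only illustrative.

```python
# Assumption: the shell encoded the literal as Latin-1, so "è" is byte 0xe8.
raw = u"ciao \u00e8 ciao".encode("latin-1")

# This is the step Python 2 performs silently before it can encode anything.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)

# Naming the real source encoding makes the decode succeed, and the
# resulting text can then be encoded to UTF-8 without trouble.
text = raw.decode("latin-1")
utf8_bytes = text.encode("utf-8")
```

So the error seems to come from the implicit decode, not from UTF-8 itself.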

I am getting more and more confused.

I was coding in PHP and saving some strings in the DB. I was using
utf8_encode() to encode them before sending them to the utf8_unicode_ci table.
The result was that the strings were "double encoded". To fix that I simply
removed the utf8_encode() call and put the "raw" data in the database
(which converts it to UTF-8). In other words, in PHP, I can encode a string
multiple times:

$c = "giorgio è giorgio";
$c = utf8_encode($c); // this will work in a UTF-8 HTML page
$d = utf8_encode($c); // this won't work, it will print a strange letter
$d = utf8_decode($d); // this will work, it will print a UTF-8 string
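
The "double encoding" effect can be imitated in Python, as a rough sketch. PHP's
utf8_encode() treats its input as ISO-8859-1 and converts it to UTF-8, and
utf8_decode() goes the other way. The two helper functions below are
hypothetical stand-ins written for this example, not real Python functions, and
they are simplified (for instance, they raise an error where PHP would
substitute a "?").

```python
def utf8_encode(data):
    # Stand-in for PHP's utf8_encode: read bytes as ISO-8859-1, write UTF-8.
    return data.decode("latin-1").encode("utf-8")


def utf8_decode(data):
    # Stand-in for PHP's utf8_decode: read bytes as UTF-8, write ISO-8859-1.
    return data.decode("utf-8").encode("latin-1")


# Assumption: the original string starts out as ISO-8859-1 bytes.
latin1_bytes = u"giorgio \u00e8 giorgio".encode("latin-1")

once = utf8_encode(latin1_bytes)   # valid UTF-8
twice = utf8_encode(once)          # "double encoded": each UTF-8 byte re-encoded
restored = utf8_decode(twice)      # back to the valid UTF-8 bytes

# Viewed as UTF-8, the double-encoded bytes show two odd characters for "è".
garbled = twice.decode("utf-8")
```

Here garbled contains "Ã¨" where the "è" should be, which looks like the
strange letter mentioned above.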

Ok, now, the point is: you (and the manual) said that this line:

s = u"giorgio è giorgio"

will create the string as unicode. But you also said that the part between the ""
is first encoded by my editor BEFORE being converted to unicode by Python.
So please pay attention to this example:

My editor is working in UTF-8. I create this:

# This will be a UTF-8 string because of the file's encoding
c = "giorgio è giorgio"
# This will be a unicode string
d = unicode(c)
# How will this string be encoded? If Python works like PHP, this will be
# a UTF-8 string.
e = c.encode()
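
For comparison, a minimal sketch of the explicit route, assuming the file really
is saved as UTF-8: name the codec when going from bytes to unicode, and name it
again when going back to bytes. It is written in Python 3 syntax, where the
two types are kept strictly apart; in Python 2, unicode(c) and c.encode() with
no codec named would fall back to 'ascii' instead. The names c, d and e mirror
the ones above, and f is an extra name added only for this illustration.

```python
# Assumption: the editor saved the file as UTF-8, so these are the raw bytes.
c = u"giorgio \u00e8 giorgio".encode("utf-8")

# Bytes -> unicode text: decode, stating which encoding the bytes are in.
d = c.decode("utf-8")

# Unicode text -> bytes: encode, stating which encoding is wanted.
e = d.encode("utf-8")

# The same text encoded differently gives different bytes.
f = d.encode("latin-1")

print(repr(c), repr(d), repr(e), repr(f))
```

In this sketch e ends up identical to c, while f holds "è" as one byte instead
of two.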

Can you help me?

Thank you VERY much

Giorgio