[Tutor] questions on encoding
Albert-Jan Roskam
fomcl at yahoo.com
Wed Jul 20 20:49:19 CEST 2011
Hi,
I am looking for test data containing accented and multibyte characters. I have found a good resource that I could use to cobble something together (http://www.inter-locale.com/whitepaper/learn/learn-to-test.html), but I was hoping somebody knows of a ready-made resource.
I also have some questions about encoding. In the code below, is there a difference between unicode() and .decode()?
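# Python 2; assumes the source file (or terminal) is UTF-8 encoded
s = "§ÇǼÍÍ"              # byte string (str)
x = unicode(s, "utf-8")   # built-in constructor
y = s.decode("utf-8")     # method on the byte string
x == y  # returns True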
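For comparison, here is a minimal sketch of the same check in Python 3, where the byte string type is bytes and str(b, encoding) plays the role of unicode(s, encoding). The sample strings are hypothetical test values, not taken from any existing test set, and the snippet only checks that the two spellings agree on them.

```python
# Python 3 sketch: compare the constructor form with the method form.
# The sample strings below are hypothetical test values.
samples = ["§ÇǼÍÍ", "naïve café", "日本語"]

results = []
for text in samples:
    raw = text.encode("utf-8")            # bytes, like a Python 2 byte string
    via_constructor = str(raw, "utf-8")   # counterpart of unicode(s, "utf-8")
    via_method = raw.decode("utf-8")      # counterpart of s.decode("utf-8")
    results.append(via_constructor == via_method == text)

print(results)
```

Both spellings decode the same bytes with the same codec, so they should agree for any valid UTF-8 input.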
Also, is it possible, at least in theory, to mix different encodings in a single byte string? I'd say no, unless there are multiple BOMs or something similar. Not that I'd want to try this, but it would improve my understanding of this rather obscure topic.
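One way to probe the mixing question is to try it. The sketch below assumes Python 3 and uses UTF-8 and Latin-1 purely as example encodings. A byte string carries no record of how its bytes were produced, so the concatenation itself goes through; the trouble only shows up when the result is decoded with a single codec.

```python
# Python 3 sketch: the same character encoded two ways, then concatenated.
part_utf8 = "Ç".encode("utf-8")       # two bytes: 0xC3 0x87
part_latin1 = "Ç".encode("latin-1")   # one byte: 0xC7
mixed = part_utf8 + part_latin1       # nothing stops the concatenation

# Decoding the whole thing as UTF-8 fails on the trailing Latin-1 byte.
try:
    mixed.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as Latin-1 never raises, but the UTF-8 part turns into mojibake.
as_latin1 = mixed.decode("latin-1")

print(utf8_ok, repr(as_latin1))
```

So mixing is possible at the byte level, but no single decode call recovers the intended text; each part would have to be decoded separately with its own codec.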
Cheers!!
Albert-Jan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~