[OT] does the charset lie?
Skip Montanaro
skip at pobox.com
Sun May 2 13:25:50 EDT 2004
>> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
...
>> &#8217;
>> is the charset correct or should it have been utf-8?
David> The charset is correct. "&" "#" "8" etc. are all in iso-8859-1.
I realized that about five minutes after posting. The Content-Type header
is just for the purposes of HTTP. OTOH, this means if I need the raw
content of the page (after expanding any entities), I need to do something
like (assuming the raw bytes are already in data):
data = unicode(data, "iso-8859-1").encode("utf-8")
data = map_entities_to_utf_8(data)
data = unicode(data, "utf-8")
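For comparison, here is a minimal sketch of the same idea in Python 3, where the standard library's html.unescape can stand in for the entity-mapping step. The function name decode_page and the sample bytes are made up for illustration, and it assumes the declared charset really matches the bytes:

```python
import html

def decode_page(data: bytes, charset: str = "iso-8859-1") -> str:
    """Hypothetical helper: turn raw page bytes into text with entities expanded."""
    # Decode the raw bytes using the charset the page declares.
    text = data.decode(charset)
    # Expand character references such as &#8217; into the characters they name.
    return html.unescape(text)

# Sample input (invented): an ISO-8859-1 byte for e-acute plus a numeric entity.
raw = b"It&#8217;s caf\xe9 time"
print(decode_page(raw))
```

Decoding to a text string first means the intermediate UTF-8 round trip is no longer needed, since the entity expansion works on characters rather than bytes.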
Skip