Wrong default endianness in utf-16 and utf-32!?
wxjmfauth at gmail.com
Tue Oct 12 15:28:23 CEST 2010
I hope my understanding is correct and I'm not dreaming.
When the endianness is not specified (the unmarked forms, as opposed
to the BE and LE forms), the Unicode Consortium specifies that the
default byte serialization should be big-endian.
See the Unicode FAQ entries:
Q: Which of the UTFs do I need to support?
Q: Why do some of the UTFs have a BE or LE in their label,
such as UTF-16LE?
(plus the related technical papers)
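A minimal Python 3 sketch of the rule described above, assuming it applies to decoding: a leading BOM selects the byte order, and BOM-less (unmarked) UTF-16 is read as big-endian. The function name `decode_utf16_unicode_default` is hypothetical and not part of Python's codecs; it only relies on `codecs.BOM_UTF16_BE`, `codecs.BOM_UTF16_LE` and the explicit `utf-16-be`/`utf-16-le` codecs.

```python
import codecs


def decode_utf16_unicode_default(data: bytes) -> str:
    """Hypothetical helper: decode UTF-16, treating BOM-less data as big-endian."""
    if data.startswith(codecs.BOM_UTF16_BE):
        # Explicit big-endian BOM (FE FF): strip it and decode as BE.
        return data[len(codecs.BOM_UTF16_BE):].decode('utf-16-be')
    if data.startswith(codecs.BOM_UTF16_LE):
        # Explicit little-endian BOM (FF FE): strip it and decode as LE.
        return data[len(codecs.BOM_UTF16_LE):].decode('utf-16-le')
    # Unmarked form and no higher-level protocol: assume big-endian.
    return data.decode('utf-16-be')


print(decode_utf16_unicode_default(b'\x00a\x00b\x00c'))
```

Under this rule the BOM-less bytes `b'\x00a\x00b\x00c'` decode to `abc`; read as little-endian, the same bytes would give three different (CJK-range) characters.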
Python appears to do exactly the opposite.
Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]
>>> repr(u'abc'.encode('utf-16')[2:]) == repr(u'abc'.encode('utf-16-be'))
False
>>> repr(u'abc'.encode('utf-16')[2:]) == repr(u'abc'.encode('utf-16-le'))
True
The same holds for utf-32, and for both utf-16 and utf-32 in Python 3.1.2.
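A small Python 3 sketch to check which byte order the generic `utf-16` and `utf-32` codecs pick on a given machine. It assumes, as the session above suggests, that these codecs write a BOM first and then the payload in the order that BOM announces; the helper `encoder_byte_order` and the `_BOMS` table are hypothetical names, and only the `codecs.BOM_*` constants and `sys.byteorder` come from the standard library.

```python
import codecs
import sys

# BOMs the generic codecs may emit, mapped to the byte order they announce.
_BOMS = {
    'utf-16': {codecs.BOM_UTF16_BE: 'big', codecs.BOM_UTF16_LE: 'little'},
    'utf-32': {codecs.BOM_UTF32_BE: 'big', codecs.BOM_UTF32_LE: 'little'},
}
_WIDTH = {'utf-16': 2, 'utf-32': 4}


def encoder_byte_order(encoding: str) -> str:
    """Hypothetical helper: return 'big' or 'little' for the generic codec."""
    width = _WIDTH[encoding]
    encoded = 'abc'.encode(encoding)
    bom, payload = encoded[:width], encoded[width:]
    order = _BOMS[encoding][bom]
    suffix = '-be' if order == 'big' else '-le'
    # The bytes after the BOM should match the explicitly ordered codec.
    if payload != 'abc'.encode(encoding + suffix):
        raise ValueError('payload does not follow the BOM byte order')
    return order


for enc in ('utf-16', 'utf-32'):
    print(enc, encoder_byte_order(enc), 'native:', sys.byteorder)
```

In CPython the reported order matches `sys.byteorder`, so on a little-endian x86 machine both codecs report `little`, which is what the session above shows.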
I tried to find a precise discussion of this subject,
without success.