UTF-8 question from Dive into Python 3

Tim Harig usernet at ilthio.net
Mon Jan 17 17:34:32 EST 2011


On 2011-01-17, carlo <sysengp2p at gmail.com> wrote:
> Is it true UTF-8 does not have any "big-endian/little-endian" issue
> because of its encoding method? And if it is true, why Mark (and
> everyone does) writes about UTF-8 with and without BOM some chapters
> later? What would be the BOM purpose then?

Yes, it is true.  The BOM simply identifies the encoding as UTF-8:

	http://unicode.org/faq/utf_bom.html#bom5
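
As a rough illustration in Python 3 (the language the book covers), the
UTF-8 BOM is always the same three bytes, EF BB BF.  It marks the data as
UTF-8 and carries no byte-order information.  The sample string is made up
for the example; "utf-8-sig" is the standard codec name for UTF-8 with a BOM.

```python
import codecs

# The UTF-8 BOM is U+FEFF encoded as UTF-8: always the same three bytes.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8

text = "caf\u00e9"  # arbitrary sample text, not from the book

with_bom = text.encode("utf-8-sig")   # "utf-8-sig" prepends the BOM
without_bom = text.encode("utf-8")    # plain UTF-8, no BOM

# The payload is identical; only the leading signature differs.
assert with_bom == codecs.BOM_UTF8 + without_bom

# Decoding with "utf-8-sig" strips the BOM if present and tolerates its absence.
assert with_bom.decode("utf-8-sig") == text
assert without_bom.decode("utf-8-sig") == text
```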

> 2- If that were true, can you point me to some documentation about the
> math that, as Mark says, demonstrates this?

It is true because UTF-8 is essentially an 8-bit encoding: once it exhausts
the addressable space of the current byte, it moves on to the next one.
Since the bytes are read and interpreted sequentially, their order is fixed
by the encoding itself, most significant bits first (in effect, big-endian),
so there is no byte order left to choose.
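
A minimal sketch of that bit layout in Python 3, for illustration only.
The helper utf8_encode is hypothetical (it is not a library function) and
it skips validation such as rejecting surrogates.  It is checked against
Python's own encoder, and UTF-16 is shown for contrast because it is the
kind of encoding that does come in two byte orders.

```python
def utf8_encode(cp):
    """Encode one code point by hand (hypothetical helper, no validation)."""
    if cp < 0x80:        # 7 bits fit in a single byte
        return bytes([cp])
    if cp < 0x800:       # 11 bits: lead byte, then one continuation byte
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 16 bits: lead byte, then two continuation bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits: lead byte, then three continuation bytes
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# The most significant bits always land in the first byte, on any machine.
for ch in "A\u00e9\u20ac\U0001F600":
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")

euro = "\u20ac"  # U+20AC EURO SIGN
# UTF-16 uses 16-bit units, so the two bytes of a unit can be stored either way.
assert euro.encode("utf-16-le") == b"\xac\x20"
assert euro.encode("utf-16-be") == b"\x20\xac"
# UTF-8 has exactly one byte sequence for the same character.
assert euro.encode("utf-8") == b"\xe2\x82\xac"
```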


