UTF-8 question from Dive into Python 3

carlo sysengp2p at gmail.com
Mon Jan 17 17:19:13 EST 2011


Hi,
recently I had to study Unicode and encodings *seriously* for a
Python project, but I was left with a couple of doubts after reading
the Unicode chapter of the book Dive into Python 3 by Mark Pilgrim.

1- Mark says:
"Also (and you’ll have to trust me on this, because I’m not going to
show you the math), due to the exact nature of the bit twiddling,
there are no byte-ordering issues. A document encoded in UTF-8 uses
the exact same stream of bytes on any computer."
Is it true that UTF-8 has no "big-endian/little-endian" issue
because of the way it encodes characters? And if it is true, why does
Mark (like everyone else) write about UTF-8 with and without a BOM
some chapters later? What would be the purpose of the BOM then?
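
To make the question concrete, this is the kind of small Python 3
experiment I have in mind. It is only a sketch, and the sample string
"café" is an arbitrary choice of mine, not something from the book.
The two UTF-16 codecs give different bytes depending on byte order,
UTF-8 gives a single byte sequence, and the "utf-8-sig" codec seems to
do nothing more than prepend three fixed bytes:

```python
import codecs

# Arbitrary sample text; 'é' (U+00E9) is outside ASCII, so it needs
# more than one byte in UTF-8.
text = "caf\u00e9"

utf8 = text.encode("utf-8")
utf16_le = text.encode("utf-16-le")   # little-endian, no BOM
utf16_be = text.encode("utf-16-be")   # big-endian, no BOM
utf8_sig = text.encode("utf-8-sig")   # UTF-8 with a leading BOM

print(utf8)       # b'caf\xc3\xa9'
print(utf16_le)   # b'c\x00a\x00f\x00\xe9\x00'
print(utf16_be)   # b'\x00c\x00a\x00f\x00\xe9'
print(utf8_sig)   # b'\xef\xbb\xbfcaf\xc3\xa9'

# The "BOM" in UTF-8 is the same three bytes every time.
print(utf8_sig == codecs.BOM_UTF8 + utf8)   # True
```

If that reading is right, the UTF-8 BOM cannot be signalling a byte
order, which is exactly what confuses me.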

2- If that is true, can you point me to some documentation about the
math that, as Mark says, demonstrates it?

Thank you,
Carlo


