RE Module Performance
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sat Jul 27 02:28:56 EDT 2013
On Fri, 26 Jul 2013 08:46:58 -0700, wxjmfauth wrote:
> BTW, I'm pleased to read "sequence of bits" and not bytes. Again, utf
> transformers are producing sequence of bits, call Unicode Transformation
> Units, with lengths of 8/16/32 *bits*, from there the names utf8/16/32.
> UCS transformers are (were) producing bytes, from there the names
> ucs-2/4.
Not only does your distinction between bits and bytes make no practical
difference on nearly all hardware in common use today[1], but the Unicode
Consortium also disagrees with you, and defines UTF in terms of bytes:
"A Unicode transformation format (UTF) is an algorithmic mapping from
every Unicode code point (except surrogate code points) to a unique byte
sequence."
http://www.unicode.org/faq/utf_bom.html#gen2
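A quick Python 3 illustration of that definition (a sketch, assuming
Python 3.8 or later for the separator argument to bytes.hex). Every UTF
codec hands you a bytes object, whatever its code-unit size, and a lone
surrogate code point has no encoding at all. The helpers encode_all and
is_encodable are hypothetical names made up for this post, not part of
any library; the big-endian codec variants are used only so that no BOM
gets prepended.

```python
# Big-endian variants chosen so Python does not prepend a byte order mark.
CODECS = ("utf-8", "utf-16-be", "utf-32-be")


def encode_all(ch):
    """Hypothetical helper: map one code point to its byte sequence in each UTF."""
    return {codec: ch.encode(codec) for codec in CODECS}


def is_encodable(ch):
    """Hypothetical helper: surrogate code points (U+D800..U+DFFF) have no UTF encoding."""
    try:
        ch.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    for codec, data in encode_all(ch).items():
        # data is always bytes; its length is a whole number of 8/16/32-bit code units
        print(f"U+{ord(ch):04X}  {codec:<9}  {len(data)} bytes  {data.hex(' ')}")

print(is_encodable("\ud800"))  # lone surrogate -> False
```

The "8/16/32" in the names is the code-unit size: U+1F600 comes out as
four bytes in all three encodings, but in UTF-16 those four bytes are a
surrogate pair of two 16-bit units.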
[1] There may still be some old supercomputers in use where a byte is
more than 8 bits, but they're unlikely to support Unicode.
--
Steven
More information about the Python-list mailing list