Python usage numbers
Terry Reedy
tjreedy at udel.edu
Sun Feb 12 22:09:50 EST 2012
On 2/12/2012 5:14 PM, Chris Angelico wrote:
> On Mon, Feb 13, 2012 at 9:07 AM, Terry Reedy<tjreedy at udel.edu> wrote:
>> The situation before ascii is like where we ended up *before* unicode.
>> Unicode aims to replace all those byte encoding and character sets with
>> *one* byte encoding for *one* character set, which will be a great
>> simplification. It is the idea of ascii applied on a global rather than
>> local basis.
>
> Unicode doesn't deal with byte encodings; UTF-8 is an encoding,
The Unicode Standard specifies 3 UTF storage formats* and 8 UTF
byte-oriented transmission formats. UTF-8 is the most common of all
encodings for web pages. (And ascii pages are also valid utf-8.) It is the
only one of the 8 that most of us need to bother with much. For the list,
see
http://www.unicode.org/glossary/#U
and for details look in various places in
http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf
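
To make the point about ascii and utf-8 concrete, here is a small sketch
using only the codec names that ship with Python 3. The sample text and
the variable names are mine, chosen for illustration. It checks that
pure-ascii text encodes to the same bytes under both codecs, and counts
how many bytes the same 7-character string takes in several of the UTF
transmission formats.

```python
# ASCII text encodes to identical bytes under the ascii and utf-8 codecs.
ascii_bytes = "Python".encode("ascii")
utf8_bytes = "Python".encode("utf-8")
same = ascii_bytes == utf8_bytes

# Illustrative sample (my choice): 7 characters, two of them non-ascii.
text = "na\u00efve \u20ac"

# Byte counts for the same text in several UTF byte-oriented formats.
# The plain "utf-16" and "utf-32" codecs prepend a byte order mark;
# the -le / -be variants fix the byte order and add no mark.
codecs_to_try = ("utf-8", "utf-16", "utf-16-le", "utf-16-be",
                 "utf-32", "utf-32-le", "utf-32-be")
sizes = {name: len(text.encode(name)) for name in codecs_to_try}

if __name__ == "__main__":
    print("ascii bytes == utf-8 bytes:", same)
    for name in codecs_to_try:
        print(name, sizes[name])
```

The byte order mark accounts for the difference between "utf-16" and
"utf-16-le" (2 bytes) and between "utf-32" and "utf-32-le" (4 bytes).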
> but so are UTF-16, UTF-32.
> and as many more as you could hope for.
All the non-UTF 'as many more as you could hope for' encodings are not
part of Unicode.
* The new internal unicode scheme for 3.3 is pretty much a mixture of
the 3 storage formats (I am, of course, skipping some details) by using
the widest one needed for each string. The advantage is avoiding
problems with each of the three. The disadvantage is greater internal
complexity, but that should be hidden from users. They will not need to
care about the internals. They will be able to forget about 'narrow'
versus 'wide' builds and the possible requirement to code differently
for each. There will only be one scheme that works the same on all
platforms. Most apps should require less space and about the same time.
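
For anyone who wants to see the effect, the following is a rough sketch,
assuming CPython 3.3 or later; sys.getsizeof reports implementation
details, so other implementations may behave differently. The helper
bytes_per_char is a name I made up for this example. It estimates the
storage cost per character by comparing two strings of different
lengths, so that the fixed object header cancels out.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character for a string made only of `ch`.

    Hypothetical helper: the difference between the sizes of a 2n-char
    and an n-char string removes the fixed per-object overhead.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n


narrow = bytes_per_char("a")           # ascii character
medium = bytes_per_char("\u20ac")      # BMP character above U+00FF
wide = bytes_per_char("\U0001F40D")    # character outside the BMP

# With one scheme on all platforms, a non-BMP character is one item.
astral_len = len("\U0001F40D")

if __name__ == "__main__":
    print(narrow, medium, wide, astral_len)
```

On such a build I would expect the three estimates to come out near 1,
2, and 4 bytes, and the length of the non-BMP string to be 1 rather than
the 2 that a 'narrow' build used to report.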
--
Terry Jan Reedy