Unicode utf-8 doesn't do back-and-forth?
John Machin
sjmachin at lexicon.net
Wed Jul 10 00:24:35 EDT 2002
martin at v.loewis.de (Martin v. Loewis) wrote in message news:<m3hej9js39.fsf at mira.informatik.hu-berlin.de>...
> sjmachin at lexicon.net (John Machin) writes:
>
> > 4 more bits? It needs 21 bits to encode the 2**20 possible
> > surrogate-described characters plus the basic 64K characters.
> > assert 21 - 16 == 5
>
> Not really. This makes a total of 2**20+2**16 = 1114112
> characters. Now, math.log(1114112)/math.log(2) is 20.087462841250343,
> so it is rather 4.09 additional bits.
>
Martin,
(1) Shouldn't you deduct the 2048 surrogates from the count?
(2) Why did you round up to two decimal places and not zero decimal
places? Can you buy 4.09 cans of beer?
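
For what it's worth, here is a rough sketch of the arithmetic behind both questions. The names (BMP, SUPPLEMENTARY, SURROGATES, bits_needed) are mine, purely for illustration, and it assumes a Python recent enough to have int.bit_length(). It deducts the 2048 surrogates and then rounds up to whole bits:

```python
import math

# Illustrative constants; the names are assumptions, not from any library.
BMP = 2**16            # the basic 64K code points
SUPPLEMENTARY = 2**20  # code points reachable through surrogate pairs
SURROGATES = 2048      # U+D800..U+DFFF, which are not characters themselves

total = BMP + SUPPLEMENTARY    # 1114112, the figure quoted above
scalars = total - SURROGATES   # 1112064 once the surrogates are deducted


def bits_needed(n):
    """Whole bits needed to give each of n items a distinct fixed-width code."""
    return (n - 1).bit_length()


# Fractional "information content", as in the quoted calculation.
print(math.log(total) / math.log(2))
print(math.log(scalars) / math.log(2))

# Whole bits, since you cannot store a fraction of one.
print(bits_needed(total), bits_needed(scalars))
extra_bits = bits_needed(scalars) - 16
print(extra_bits)
```

Deducting the surrogates nudges the logarithm down only slightly (it stays just above 20), so either way the whole-bit count is 21 and the extra over 16 is 5.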
Cheers,
John