Unicode utf-8 doesn't do back-and-forth?

John Machin sjmachin at lexicon.net
Wed Jul 10 00:24:35 EDT 2002


martin at v.loewis.de (Martin v. Loewis) wrote in message news:<m3hej9js39.fsf at mira.informatik.hu-berlin.de>...
> sjmachin at lexicon.net (John Machin) writes:
> 
> > 4 more bits? It needs 21 bits to encode the 2**20 possible
> > surrogate-described characters plus the basic 64K characters.
> > assert 21 - 16 == 5
> 
> Not really. This makes a total of 2**20+2**16 = 1114112
> characters. Now, math.log(1114112)/math.log(2) is 20.087462841250343,
> so it is rather 4.09 additional bits.
> 

Martin,

(1) Shouldn't you deduct the 2048 surrogates from the count?
(2) Why did you round up to two decimal places and not zero decimal
places? Can you buy 4.09 cans of beer?
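
Both points can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the 2048 surrogate code points (U+D800 through U+DFFF) are not counted as characters and that bits come only in whole units. The variable names are illustrative.

```python
import math

BMP = 2**16             # the basic 64K code points
SUPPLEMENTARY = 2**20   # code points reachable through surrogate pairs
SURROGATES = 2048       # U+D800..U+DFFF, reserved, assumed not to count

total = BMP + SUPPLEMENTARY      # the figure quoted above
usable = total - SURROGATES      # point (1): deduct the surrogates

# Fractional bit counts, computed the same way as in the quoted text.
frac_total = math.log(total) / math.log(2)
frac_usable = math.log(usable) / math.log(2)

# Point (2): round up to zero decimal places. Integer arithmetic
# avoids any floating-point doubt about the ceiling.
whole_bits = (usable - 1).bit_length()
extra_bits = whole_bits - 16

print(total, usable)
print(frac_total, frac_usable)
print(whole_bits, extra_bits)
```

Deducting the surrogates lowers the fractional figure only slightly, and either count still needs 21 whole bits, that is, 5 more than 16.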

Cheers,
John


