A few questions about encoding
Νικόλαος Κούρας
support at superhost.gr
Wed Jun 12 05:09:05 EDT 2013
>> (*) in fact UTF-8 also indicates the end of each character
> Up to a point. The initial byte encodes the length and the top few
> bits, but the subsequent octets aren’t distinguishable as final in
> isolation. 0x80-0xBF can all be either medial or final.
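To make the point about continuation bytes concrete, here is a small Python sketch that labels each byte of an encoded string by its top bits. The helper name `classify_utf8_bytes` is hypothetical, and the sketch assumes well-formed input (it does not reject invalid bytes such as 0xC0, 0xC1 or 0xF5-0xFF).

```python
def classify_utf8_bytes(data):
    """Label each byte of a UTF-8 byte string as ascii, lead, or continuation.

    Simplification: assumes well-formed UTF-8 and does not flag invalid bytes.
    """
    labels = []
    for b in data:
        if b < 0x80:
            labels.append((b, "ascii"))         # 0xxxxxxx: a complete 1-byte character
        elif b <= 0xBF:
            labels.append((b, "continuation"))  # 10xxxxxx: medial or final, can't tell alone
        else:
            labels.append((b, "lead"))          # 11xxxxxx: starts a multi-byte sequence
    return labels


for byte, kind in classify_utf8_bytes("é€".encode("utf-8")):
    print(f"0x{byte:02X}  {byte:08b}  {kind}")
```

In the encoding of "€" (E2 82 AC), 0x82 is medial and 0xAC is final, yet both are plain 10xxxxxx continuation bytes; only the lead byte 0xE2 (1110xxxx) says the sequence is three bytes long.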
So the leading high bits are a directive that UTF-8 uses to know how many
bytes each character is represented with.
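The idea that the lead byte's high bits announce the sequence length can be sketched in Python as follows. `utf8_sequence_length` and `split_utf8` are hypothetical helper names for illustration only; real code would simply call `bytes.decode("utf-8")`, which also validates the input. This sketch skips some validity checks (it accepts the overlong lead bytes 0xC0/0xC1 and the out-of-range 0xF5-0xF7).

```python
def utf8_sequence_length(lead):
    """Return how many bytes a UTF-8 character occupies, judged only from its lead byte."""
    if lead < 0x80:             # 0xxxxxxx
        return 1
    if lead >> 5 == 0b110:      # 110xxxxx
        return 2
    if lead >> 4 == 0b1110:     # 1110xxxx
        return 3
    if lead >> 3 == 0b11110:    # 11110xxx
        return 4
    # 10xxxxxx (a continuation byte) or 11111xxx cannot start a character
    raise ValueError(f"0x{lead:02X} is not a valid UTF-8 lead byte")


def split_utf8(data):
    """Cut a UTF-8 byte string into per-character chunks using only the lead bytes."""
    chunks, i = [], 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


print(split_utf8("aé€😀".encode("utf-8")))
```

Note that the walk never needs to inspect a continuation byte to find where a character ends: the lead byte alone fixes the length.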
Code points (characters) 0-127 use 1 bit to signal that they need 1 byte of
storage, and the remaining 7 bits actually store the character?
Whereas code points 128-256 use 2 bits to signal that they need 2 bytes of
storage, and the remaining 14 bits actually store the character?
Isn't 14 bits way too many to store a character?
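One way to check the bit counts in the question is to compute them from the standard UTF-8 layout (RFC 3629): a 1-byte character is 0xxxxxxx, and an n-byte lead byte starts with n ones and a zero, while every continuation byte spends 2 bits on its 10 marker. The Python sketch below assumes that layout; `payload_bits` is a hypothetical helper, not a library function.

```python
def payload_bits(n_bytes):
    """Bits left for the code point itself in an n-byte UTF-8 sequence."""
    if n_bytes == 1:
        return 7                           # 0xxxxxxx
    lead_bits = 8 - (n_bytes + 1)          # lead byte: n ones, a zero, then payload
    return lead_bits + 6 * (n_bytes - 1)   # each 10xxxxxx continuation adds 6 bits


for n in range(1, 5):
    bits = payload_bits(n)
    # Note: RFC 3629 caps code points at U+10FFFF, below the 4-byte limit printed here.
    print(f"{n} byte(s): {bits} payload bits, addressable up to U+{(1 << bits) - 1:04X}")

for ch in "Aé":
    print(ch, f"U+{ord(ch):04X}", " ".join(f"{b:08b}" for b in ch.encode("utf-8")))
```

Under this layout a 2-byte sequence (110xxxxx 10xxxxxx) carries 11 payload bits, not 14, and covers U+0080 through U+07FF; "é" (U+00E9) becomes 11000011 10101001.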
More information about the Python-list mailing list