Trying to set a cookie within a python script
python at mrabarnett.plus.com
Tue Aug 3 21:04:07 CEST 2010
Dave Angel wrote:
> Νίκος wrote:
>>> On 3 Aug, 18:41, Dave Angel <da... at ieee.org> wrote:
>>>> Different encodings equal different ways of storing the data to the
>>>> media, correct?
>>> Exactly. The file is a stream of bytes, and Unicode has more than 256
>>> possible characters. Further, even the subset of characters that *do*
>>> take one byte are different for different encodings. So you need to tell
>>> the editor what encoding you want to use.
>> For example, is an 'a' char in iso-8859-1 stored differently from an
>> 'a' char in iso-8859-7 or an 'a' char in utf-8?
> Nope, the ASCII subset is identical. It's the bytes between 0x80 and
> 0xff that differ, and of course not all of those. Further, some of the
> characters that are one byte in the ISO-8859 encodings are two bytes in
> utf-8.
> You *could* just decide that you're going to hardwire the assumption
> that you'll be dealing with a single character set that does fit in 8
> bits, and most of this complexity goes away. But if you do that, do
> *NOT* use utf-8.
> But if you do want to be able to handle more than 256 characters, or
> more than one encoding, read on.
> Many people confuse encoding and decoding. A unicode character is an
> abstraction which represents a raw character. For convenience, the first
> 128 code points map directly onto the 7 bit encoding called ASCII. But
> before Unicode there were several other extensions to 256, which were
> incompatible with each other. For example, a byte which might be a
> European character in one such encoding might be a kata-kana character
> in another one. Each encoding was 8 bits, but it was difficult for a
> single program to handle more than one such encoding.
One encoding might be ASCII + accented Latin, another ASCII + Greek,
another ASCII + Cyrillic, etc. If you wanted ASCII + accented Latin +
Greek then you'd need more than 1 byte per character.
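The incompatibility between those 8-bit character sets shows up when one and the same byte is decoded under each of them (an illustrative sketch):

```python
# The same byte value means a different character in each 8-bit encoding.
raw = b"\xe1"
print(raw.decode("iso-8859-1"))   # 'á' - Latin small a with acute
print(raw.decode("iso-8859-7"))   # 'α' - Greek small alpha
print(raw.decode("koi8-r"))       # 'А' - Cyrillic capital A
```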
If you're working with multiple alphabets it gets very messy, which is
where Unicode comes in. It contains all those characters, and UTF-8 can
encode all of them in a straightforward manner.
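A quick sketch of that: a single UTF-8 string can mix all three alphabets, which no single ISO-8859 encoding can hold.

```python
# Latin, Greek and Cyrillic together - representable in UTF-8,
# but in none of the single-byte ISO-8859 encodings.
text = "café, αβγ, привет"
data = text.encode("utf-8")
print(len(text), "characters,", len(data), "bytes")
# Round-trips losslessly:
assert data.decode("utf-8") == text
```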
> So along comes unicode, which is typically implemented in 16 or 32 bit
> cells. And it has an 8 bit encoding called utf-8 which uses one byte for
> the first 128 characters, two bytes for some more, and three or four
> bytes beyond that.
In UTF-8 the first 128 codepoints (U+0000..U+007F, the ASCII range) are
encoded as 1 byte each; codepoints up to U+07FF take 2 bytes, the rest of
the Basic Multilingual Plane takes 3 bytes, and codepoints above U+FFFF
take 4 bytes.
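That length pattern can be verified directly (a sketch; the sample characters are arbitrary):

```python
# UTF-8 byte length grows with the codepoint value.
samples = [
    ("A",          1),   # U+0041   ASCII range      -> 1 byte
    ("é",          2),   # U+00E9   up to U+07FF     -> 2 bytes
    ("€",          3),   # U+20AC   rest of the BMP  -> 3 bytes
    ("\U0001F600", 4),   # U+1F600  beyond the BMP   -> 4 bytes
]
for ch, nbytes in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} bytes")
    assert len(encoded) == nbytes
```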