Trying to set a cookie within a python script

Benjamin Kaplan benjamin.kaplan at case.edu
Tue Aug 3 22:36:20 EDT 2010


2010/8/3 Νίκος <nikos.the.gr33k at gmail.com>:
>>On 3 Aug, 21:00, Dave Angel <da... at ieee.org> wrote:
>
>> A string is an object containing characters. A string literal is one of
>> the ways you create such an object. When you create it that way, you
>> need to make sure the compiler knows the correct encoding, by using the
>> encoding: line at beginning of file.
>
>
> mymessage = "καλημέρα"   <==== string
> mymessage = u"καλημέρα"  <==== string literal?

Not quite. A literal is the actual string in the file, those letters
between the quotes:
"καλημέρα" <=== String literal (a literal value of the string/str type)
u"καλημέρα"  <=== Unicode literal (a literal value of the Unicode
type. The bytes on the page will be converted to unicode using the
file's encoding)
mymessage <==== String (not a literal; it's a name that refers to the value)
>
> So, a string literal is one of the encodings i use to create a string
> object?
>
> Can the encoding of a python script file be iso-8859-7, which means
> the file contents are saved to the hdd as greek-iso, but the part with
> this variable value mymessage = u"καλημέρα" is saved as utf-8, or the
> opposite?
>

The compiler does not see u"καλημέρα" on the page. All it sees is the
bytes ['0x75', '0x22', '0xea', '0xe1', '0xeb', '0xe7', '0xec', '0xdd',
'0xf1', '0xe1', '0x22']

Now the compiler knows that the sequence 0x75 0x22 (Stuff) 0x22 means
to create a Unicode literal. So it takes those bytes ('0xea', '0xe1',
'0xeb', '0xe7', '0xec', '0xdd', '0xf1', '0xe1') and decodes them using
the page's encoding, in your case ISO-8859-7. At this point, they don't
have an encoding. They aren't bytes as far as you are concerned, they
are code points. Internally, they're stored as either UTF-16 or UTF-32
depending on how Python was compiled, but that doesn't matter. You can
treat them as if they are characters.
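
A minimal sketch of that decoding step. It assumes Python 3, where the
byte string of this thread is the `bytes` type and the unicode type is
`str`; the names `raw`, `text` and `code_points` are only illustrative.

```python
# The bytes an ISO-8859-7 editor writes between the quotes of u"καλημέρα".
raw = bytes([0xea, 0xe1, 0xeb, 0xe7, 0xec, 0xdd, 0xf1, 0xe1])

# Decoding with the file's encoding turns the bytes into code points.
text = raw.decode("iso-8859-7")

# One code point per character; they no longer carry any encoding.
code_points = [hex(ord(ch)) for ch in text]
print(code_points)
```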

> have the file saved as utf-8 but one variable value as greek
> encoding?
>

Sure you can. A unicode literal will always be decoded using the
encoding of the file. But a string is just a sequence of bytes (forget
about the characters that show up on the page for now). If you do
"\xce\xba\xce\xb1\xce\xbb\xce\xb7\xce\xbc\xce\xad\xcf\x81\xce\xb1".decode('UTF-8')
then Python will take that sequence of bytes and interpret them as
UTF-8. That will give you the same Unicode string you started out
with: u"καλημέρα"
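
Going from bytes to text is a decode step, and going back to bytes in
another encoding is an encode step. A small sketch, assuming Python 3
(`bytes` for the byte string, `str` for the unicode string); the
variable names are made up for the example.

```python
# The UTF-8 bytes for the word, written with escapes so the
# encoding of this source file does not matter.
utf8_bytes = b"\xce\xba\xce\xb1\xce\xbb\xce\xb7\xce\xbc\xce\xad\xcf\x81\xce\xb1"

# bytes -> text: interpret the bytes as UTF-8.
text = utf8_bytes.decode("utf-8")

# text -> bytes: the same characters as a Greek-encoded byte string,
# even though nothing here was saved as ISO-8859-7.
greek_bytes = text.encode("iso-8859-7")
print(greek_bytes)
```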

> Encodings still give me headaches. I try to understand them as
> different ways to store data in a media.
>
> Tell me something. What encoding should i pick for my scripts knowing
> that only contain english + greek chars??
> iso-8859-7 or utf-8 and why?
>
> Can i save the string lets say "Νίκος" in different encodings and still
> print out correctly in browser?
>
> ascii = the standard english character set only, right?
>

Yes.

>> The web server wraps a few characters before and after your html stream,
>> but it shouldn't touch the stream itself.
>
> So the python compiler using the cgi module is the one that is
> producing the html output that is immediately sent to the web
> server, right?
>
>
>> > For example if i say mymessage = "καλημέρα" and then i say mymessage = u"καλημέρα" then the 1st one is a greek encoding variable while the
>> > 2nd is a utf-8 one?

No. They both are in whatever encoding your file is using. But the
first one will be interpreted as a sequence of bytes. The second one
will be interpreted as a sequence of characters. For a single-byte
encoding like ISO-8859-7, the lengths come out the same. But if you
were to save the file in UTF-8, the first one would have a length of 16
(because the Greek characters are all 2 bytes) and the 2nd one would
have a length of 8.
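
The length difference can be checked directly. This sketch assumes
Python 3, so the byte strings are produced with an explicit encode
instead of coming from the file's encoding; the names are illustrative.

```python
# καλημέρα as text, written with escapes to stay encoding-independent.
text = "\u03ba\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1"

as_greek = text.encode("iso-8859-7")  # one byte per character
as_utf8 = text.encode("utf-8")        # two bytes per Greek character

# Characters, ISO-8859-7 bytes, UTF-8 bytes.
print(len(text), len(as_greek), len(as_utf8))  # prints: 8 8 16
```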

>>
>> No, the first is an 8 bit copy of whatever bytes your editor happened to
>> save.
>
> But since mymessage = "καλημέρα" is a string containing greek
> characters why the editor doesn't save it as such?
>

Because you don't save characters, you save bytes.

\xce\xba\xce\xb1\xce\xbb\xce\xb7\xce\xbc\xce\xad\xcf\x81\xce\xb1 is
your String in UTF-8
\xea\xe1\xeb\xe7\xec\xdd\xf1\xe1 is that exact same string in ISO-8859-7

They are two different ways of representing the same characters.
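
A short sketch of why the two byte sequences are not interchangeable:
each one only yields the right characters when decoded with the
encoding that produced it. Python 3 is assumed, and the variable names
are hypothetical.

```python
utf8_bytes = b"\xce\xba\xce\xb1\xce\xbb\xce\xb7\xce\xbc\xce\xad\xcf\x81\xce\xb1"
greek_bytes = b"\xea\xe1\xeb\xe7\xec\xdd\xf1\xe1"

# Decoded with the matching codec, both give the same characters.
same = utf8_bytes.decode("utf-8") == greek_bytes.decode("iso-8859-7")

# Decoded with the wrong codec, the Greek bytes are not valid UTF-8.
try:
    greek_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(same, wrong_codec_failed)
```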


> It reminds me of variables and values where if you say
>
> a = 5 , a var becomes instantly an integer variable
> while
> a = 'hello' , become instantly a string variable
>
>
>> mymessage = u"καλημέρα"
>>
>> creates an object that is *not* encoded.
>
> Because it isn't saved by the editor yet? In what state is this object
> in before it gets encoded?
> And it gets encoded the minute i tell the editor to save the file?
>
>> Encoding is taking the unicode
>> stream and representing it as a stream of bytes, which may or may not have
>> more bytes than the original has characters.
>
>
> So this line mymessage = u"καλημέρα" what it does is tell the browser
> that when it's time to save the whole file to save this string as
> utf-8?
>
> If yes, then if i were to save the above string as greek encoding how
> was i supposed to write it?
>
> Also if you use the 'coding line' in the beginning of the file is there
> a need for using the u literal?
>
>> I personally haven't done any cookie code. If I were debugging this, I'd
>> factor out the multiple parts of that if statement, and find out which
>> one isn't true. From here I can't guess.
>
> I did what you say and found out that both of the if condition parts
> were always false, that's why the if code block never got executed.
>
> And it is always false because the cookie never gets set.
>
> So can you please tell me why this line
>
> cookie['visitor'] = ( 'nikos', time() + 60*60*24*365 )          # this cookie will expire in a year
>
> never created a cookie?


