anothernetfellow at gmail.com
Fri Mar 5 18:15:06 CET 2010
2010/3/5 Dave Angel <davea at ieee.org>
> In other words, you don't understand my paragraph above.
Maybe. But please don't be angry. I'm here to learn, and as I've run into a
very difficult concept I want to fully understand it.
> Once the string is stored in t as an 8 bit string, it's irrelevant what the
> source file encoding was.
OK, you've said this twice, but, please, can you tell me why? I think
that's the key point for understanding how string encoding works. The
source file encoding affects every line of the file, string literals included.
If my encoding is UTF-8, Python will read the string "ciao è ciao" as
'ciao \xc3\xa8 ciao', but if it's Latin-1 it will read 'ciao \xe8 ciao'. So,
how can it be irrelevant?
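To make the two byte sequences concrete, here is a minimal sketch. It starts from the text itself (with è written as the escape \xe8, so the encoding of this snippet's own file doesn't matter) and encodes it both ways; the variable names are only for illustration.

```python
# The text as a unicode string; \xe8 is the code point of "è".
text = u"ciao \xe8 ciao"

# The same text produces different bytes depending on the encoding used.
utf8_bytes = text.encode("utf-8")      # "è" becomes two bytes: \xc3\xa8
latin1_bytes = text.encode("latin-1")  # "è" becomes one byte:  \xe8

print(repr(utf8_bytes))
print(repr(latin1_bytes))
```

So the source encoding does decide which bytes end up in the 8-bit string, but once they are stored, the string itself carries no record of which encoding produced them.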
I think the problem is that I can't see any difference between these two
ways of building the string:

a = u"ciao è ciao"

and

a = "ciao è ciao"
a = unicode(a)
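A rough sketch of where the two forms seem to differ, assuming that unicode(a) with no encoding argument behaves like decoding with the default codec (ASCII on a standard Python 2 install). The bytes below are what a UTF-8 source file would have stored in the 8-bit string; the explicit .decode() calls stand in for the implicit conversion.

```python
# Bytes stored in the plain string when the source file was UTF-8.
raw = b"ciao \xc3\xa8 ciao"

# Decoding without knowing the real encoding (here: plain ASCII) fails,
# because \xc3 and \xa8 are not valid ASCII bytes.
try:
    raw.decode("ascii")
    ascii_worked = True
except UnicodeDecodeError:
    ascii_worked = False

# Naming the encoding explicitly recovers the original text.
decoded = raw.decode("utf-8")

print(ascii_worked)
print(decoded == u"ciao \xe8 ciao")
```

The u"..." literal gets decoded using the source file's declared encoding, while the second form decodes later with whatever the default is, which is why the two are not equivalent.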
> If you then (whether it's in the next line, or ten thousand calls later)
> try to convert to unicode without specifying a decoder, it uses the default
> encoder, which is an application-wide thing, and not a source file thing. To
> see what it is on your system, use sys.getdefaultencoding().
That's fine. Spir said that it uses ASCII; you now say that it uses the
default encoder. I suppose ASCII is simply the default encoder on Spir's
system.
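For reference, checking the value mentioned above takes two lines. The variable name is just for illustration; the result is typically 'ascii' on Python 2 and 'utf-8' on Python 3.

```python
import sys

# Interpreter-wide default used for implicit conversions between
# byte strings and unicode strings; it is not tied to any source file.
default_encoding = sys.getdefaultencoding()
print(default_encoding)
```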
> The point is that there isn't just one global value, and it's a good thing.
> You should figure everywhere characters come into your program (eg. source
> files, raw_input, file i/o...) and everywhere characters go out of your
> program, and deal with each of them individually.
OK. But it always happens this way. I hardly ever have to work with strings
defined in the source file.
> Don't store anything internally as strings, and you won't create the
> ambiguity you have with your 't' variable above.
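As I understand the advice, it could be sketched like this: decode bytes as soon as they enter the program, work only with unicode inside, and encode again on the way out. The helper names read_text and write_text are hypothetical, and the encodings are assumptions chosen for the example.

```python
def read_text(raw_bytes, encoding):
    """Decode incoming bytes with the encoding known for that input."""
    return raw_bytes.decode(encoding)


def write_text(text, encoding):
    """Encode outgoing text with the encoding expected by that output."""
    return text.encode(encoding)


# Input boundary: bytes that are known to be Latin-1 (for example, from a file).
incoming = b"ciao \xe8 ciao"
text = read_text(incoming, "latin-1")

# Internal work happens on unicode text only, so there is no ambiguity.
shouted = text.upper()

# Output boundary: this destination is assumed to expect UTF-8.
outgoing = write_text(shouted, "utf-8")
print(repr(outgoing))
```

Each boundary names its own encoding, so nothing depends on a single global setting.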