<div class="gmail_quote">2010/3/5 Dave Angel <span dir="ltr">&lt;<a href="mailto:davea@ieee.org">davea@ieee.org</a>&gt; </span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
In other words, you don't understand my paragraph above. </blockquote><div><br></div><div>Maybe. But please don't be angry. I'm here to learn, and as I've run into a very difficult concept, I want to fully understand it.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Once the string is stored in t as an 8 bit string, it's irrelevant what the source file encoding was.</blockquote>
<div><br></div><div>OK, you've said this twice, but please, can you tell me why? I think this is the key to understanding how string encoding works. The source file encoding affects every line of the file, string literals included. If my source encoding is UTF-8, Python will read the string "ciao è ciao" as 'ciao \xc3\xa8 ciao', but if it's Latin-1 it will read 'ciao \xe8 ciao'. So how can it be irrelevant?</div>
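<div>A small sketch may help with this question. It is written for Python 3, where <code>bytes</code> plays the part of the Python 2 8-bit string discussed in this thread; that substitution, and the variable names, are assumptions for illustration and do not come from the original messages. It shows that the source encoding only decides which bytes end up in the string, and that the bytes themselves keep no record of how they were produced, so a later decode depends only on the codec that is named (or defaulted) at that moment.</div>

```python
# Python 3 sketch: bytes stands in for the Python 2 8-bit str.
text = "ciao \u00e8 ciao"  # "ciao è ciao"

# The same literal yields different bytes depending on the encoding used.
utf8_bytes = text.encode("utf-8")      # what a UTF-8 source would store
latin1_bytes = text.encode("latin-1")  # what a Latin-1 source would store

print(utf8_bytes)    # b'ciao \xc3\xa8 ciao'
print(latin1_bytes)  # b'ciao \xe8 ciao'

# From here on the bytes carry no memory of where they came from.
# Decoding with ASCII (the usual Python 2 default) fails for both.
try:
    utf8_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding works only when the codec named matches the bytes.
roundtrip_utf8 = utf8_bytes.decode("utf-8")
roundtrip_latin1 = latin1_bytes.decode("latin-1")

# Naming the wrong codec does not fail here; it silently gives wrong text.
mismatch = utf8_bytes.decode("latin-1")
```

<div>In Python 2 terms, <code>unicode(a)</code> with no codec behaves like the ASCII attempt above, whatever the source file declared.</div>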
<div><br></div><div>I think the problem is that I can't see any difference between the two snippets quoted above:</div><div><br></div><div>a = u"ciao è ciao"</div><div><br></div><div>and</div><div><br></div><div>a = "ciao è ciao"</div>
<div>a = unicode(a)</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">If you then (whether it's in the next line, or ten thousand calls later) try to convert to unicode without specifying a decoder, it uses the default encoding, which is an application-wide thing, and not a source file thing. To see what it is on your system, use sys.getdefaultencoding().<br>
</blockquote><div><br></div><div>That makes sense. Spir said that it uses ASCII, and you now say that it uses the default encoding, so I suppose ASCII is the default encoding on Spir's system.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
The point is that there isn't just one global value, and it's a good thing. You should figure out everywhere characters come into your program (e.g. source files, raw_input, file i/o...) and everywhere characters go out of your program, and deal with each of them individually.</blockquote>
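<div>The advice above can be sketched roughly as follows, again in Python 3 syntax as an assumption. The helper names <code>read_text</code> and <code>write_text</code> are hypothetical, and the in-memory streams only stand in for real files: bytes are decoded with an explicit codec where they enter the program, the program works on text in between, and the text is encoded with an explicit codec where it leaves.</div>

```python
import io


def read_text(stream, encoding):
    """Hypothetical helper: decode bytes as they enter the program."""
    return stream.read().decode(encoding)


def write_text(stream, text, encoding):
    """Hypothetical helper: encode text as it leaves the program."""
    stream.write(text.encode(encoding))


# Pretend this is a file that is known to be Latin-1 encoded.
incoming = io.BytesIO(b"ciao \xe8 ciao")

# Input boundary: choose the codec for this particular source.
text = read_text(incoming, "latin-1")

# Inside the program only text is handled, so no codec is involved.
shouted = text.upper()

# Output boundary: choose the codec for this particular destination.
outgoing = io.BytesIO()
write_text(outgoing, shouted, "utf-8")
result = outgoing.getvalue()
```

<div>Each boundary picks its own codec, which is why a single global value would not be enough.</div>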
<div><br></div><div>OK. But it always happens this way. I hardly ever have to work with strings defined in the source file.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Don't store anything internally as strings, and you won't create the ambiguity you have with your 't' variable above.<br>
<br>
DaveA<br></blockquote><div><br></div><div>Thank you, Dave</div><div><br></div><div>Giorgio </div></div><br><br clear="all"><br>-- <br>--<br>AnotherNetFellow<br>Email: <a href="mailto:anothernetfellow@gmail.com">anothernetfellow@gmail.com</a><br>