unicode by default
harrismh777
harrismh777 at charter.net
Wed May 11 23:44:23 EDT 2011
Steven D'Aprano wrote:
>> You need to understand the difference between characters and bytes.
>
> http://www.joelonsoftware.com/articles/Unicode.html
>
> is also a good resource.
Thanks for being patient, guys. Here's what I've done:
>>> astr="pound sign"
>>> asym=" \u00A3"
>>> afile=open("myfile", mode='w')
>>> afile.write(astr + asym)
12
>>> afile.close()
When I edit "myfile" with vi I see the 'characters':
pound sign £
... same with emacs, same with gedit ...
When I hexdump myfile I see this:
0000000 6f70 6e75 2064 6973 6e67 c220 00a3
This is *not* what I expected... well it is (little-endian) right up to
the 'c2' and that is what is confusing me....
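
For anyone following along, here is a minimal sketch that reads the same text back in binary mode and prints the bytes in file order. It passes encoding="utf-8" explicitly, which is an assumption about what the default was on this machine, and it writes to a scratch path rather than "myfile". Plain hexdump groups bytes into 16-bit words in host byte order, so on a little-endian machine each pair appears swapped; hexdump -C lists the bytes one at a time instead.

```python
import os
import tempfile

# Hypothetical scratch path; the session above used "myfile" in the current directory.
path = os.path.join(tempfile.mkdtemp(), "myfile")

# encoding="utf-8" is spelled out as an assumption so the result is the same everywhere.
with open(path, mode="w", encoding="utf-8") as afile:
    count = afile.write("pound sign" + " \u00A3")

with open(path, mode="rb") as bfile:
    raw = bfile.read()

print(count)      # number of characters written
print(len(raw))   # number of bytes on disk
print(raw.hex())  # the bytes in file order, one pair of hex digits per byte
```

The character count and the byte count differ because the pound sign takes two bytes in UTF-8.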
I did not open the file with an encoding of UTF-8... so I'm assuming
UTF-16 by default (Python 3). So I was expecting '00A3' to show up
little-endian as 'A300', but what I got instead was UTF-8 little-endian
'c2a3' ....
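
A quick way to compare the candidates is to encode the one character by hand. This is only a sketch using str.encode with standard codec names; the variable names are made up for illustration.

```python
pound = "\u00A3"

utf8 = pound.encode("utf-8")          # two bytes, no byte-order variants
utf16_le = pound.encode("utf-16-le")  # 16-bit unit, low byte first
utf16_be = pound.encode("utf-16-be")  # 16-bit unit, high byte first

print(utf8.hex())      # c2a3
print(utf16_le.hex())  # a300
print(utf16_be.hex())  # 00a3
```

UTF-8 is defined as a byte sequence, so 'c2' followed by 'a3' is the same on every machine; only the UTF-16 and UTF-32 forms come in little-endian and big-endian flavours.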
See my problem?... when I open the file with emacs I see the character
pound sign... same with gedit... they're all using UTF-8 by default. It
looks like Python 3 is writing its output as UTF-8 by default too...
and I thought that by default Python 3 was using either UTF-16 or UTF-32.
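
One way to check what open() picks when no encoding is given is to ask the file object. The sketch below assumes the default comes from the locale settings (commonly UTF-8 on a Linux desktop, but that depends on the machine), and it uses a scratch path that is purely illustrative.

```python
import locale
import os
import tempfile

# The encoding the locale prefers; text-mode open() normally falls back to this.
print(locale.getpreferredencoding(False))

# Hypothetical scratch file, not the "myfile" from the session above.
path = os.path.join(tempfile.mkdtemp(), "myfile")

with open(path, mode="w") as afile:
    used_by_default = afile.encoding  # whatever this platform chose
print(used_by_default)

# Asking for UTF-16 little-endian explicitly gives the 'A300' form.
with open(path, mode="w", encoding="utf-16-le") as afile:
    afile.write("\u00A3")

with open(path, mode="rb") as bfile:
    raw = bfile.read()
print(raw.hex())
```

Passing encoding= explicitly avoids depending on whatever the platform happens to default to.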
So, I'm confused here... also, I used the character sequence \u00A3
which I thought was UTF-16... but Python3 changed my intent to 'c2a3'
which is the normal UTF-8...
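
As far as I can tell, the \u00A3 escape names a code point rather than any particular encoding. A small sketch with nothing but built-ins:

```python
pound = "\u00A3"

print(len(pound))       # one character in the str
print(hex(ord(pound)))  # its code point number
print(pound == "\xa3" == chr(0xA3))  # other spellings of the same character
```

The string holds the abstract character U+00A3; bytes only come into it when the string is encoded, for example on the way out to a file.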
Thanks again for your patience... I really do hate to be dense about
this... but this is another area where I'm just beginning to dabble and
I'd like to know soon what I'm doing...
Thanks for the link Steve... I'm headed there now...
kind regards,
m harris