unicode by default
harrismh777
harrismh777 at charter.net
Wed May 11 21:22:50 EDT 2011
John Machin wrote:
> (1) You cannot work without using bytes sequences. Files are byte
> sequences. Web communication is in bytes. You need to (know / assume / be
> able to extract / guess) the input encoding. You need to encode your
> output using an encoding that is expected by the consumer (or use an
> output method that will do it for you).
>
> (2) You don't need to use bytes to specify a Unicode code point. Just use
> an escape sequence e.g. "\u0404" is a Cyrillic character.
>
Thanks John. In reverse order, I understand point (2). I'm less clear
on point (1).
If I generate a string of characters that I presume to be ASCII/UTF-8
(no \u0404-type characters) and write it to a file (or stdout), how does
the default encoding affect that file? I'm not seeing anything
unusual going on. Does it matter whether I open the file with vi? With
gedit? With emacs?
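
To make the question concrete, here is a minimal sketch, assuming Python 3.
The helper name encoded_forms and the sample strings are made up for
illustration. As far as I can tell, ASCII-only text comes out as the same
bytes under ASCII, UTF-8 and Latin-1, which would explain why nothing looks
unusual in any editor; only a character like "\u0404" makes the choice of
encoding visible.

```python
import locale

def encoded_forms(text):
    """Return the bytes produced for `text` by a few common encodings.

    Hypothetical helper for illustration only.
    """
    return {name: text.encode(name) for name in ("utf-8", "latin-1", "utf-16-le")}

# ASCII-only text: ASCII, UTF-8 and Latin-1 all yield identical bytes.
ascii_only = "plain old text\n"
forms = encoded_forms(ascii_only)
same_for_ascii = forms["utf-8"] == forms["latin-1"] == ascii_only.encode("ascii")

# A non-ASCII code point: the bytes now depend on the encoding chosen.
cyrillic_utf8 = "\u0404".encode("utf-8")       # b'\xd0\x84'
cyrillic_utf16 = "\u0404".encode("utf-16-le")  # b'\x04\x04'

# The locale's preferred encoding, which open() falls back on in text mode
# when no encoding= argument is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)

print(same_for_ascii, cyrillic_utf8, cyrillic_utf16, default_encoding)
```

If that is right, the default encoding only starts to matter once the text
contains something outside ASCII.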
....
Another question: in mail I'm receiving many small blocks that look
like sprites with four small hex codes, scattered about the message,
mostly where punctuation would be, I think. I'm guessing these are Unicode
code points; if so, what is the best way to 'guess' the encoding? Is
it coded in the stream somewhere, as part of the protocol?
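
A sketch of where I would expect to find it, assuming the message is MIME
formatted and declares a charset parameter in its Content-Type header. The
sample message below is invented for illustration; the calls are from the
standard library email package in Python 3. Not every mail declares its
charset, or declares it correctly, so this is a starting point rather than
a guarantee.

```python
from email import message_from_bytes

# Invented sample message: a curly apostrophe (U+2019) sent as UTF-8,
# wrapped in quoted-printable so the raw message stays pure ASCII.
raw = (
    b"From: someone@example.invalid\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"It=E2=80=99s here\r\n"
)

msg = message_from_bytes(raw)

# The declared charset, or None if the header does not name one.
charset = msg.get_content_charset()

# decode=True undoes the transfer encoding and returns raw bytes;
# those bytes still have to be decoded with the declared charset.
payload = msg.get_payload(decode=True)
body = payload.decode(charset or "ascii", errors="replace")

# Decoding the same three bytes with the wrong charset garbles the punctuation.
wrong_guess = b"\xe2\x80\x99".decode("cp1252")  # 'â€™'

print(charset, repr(body), repr(wrong_guess))
```

So when the header is present there is nothing to guess; guessing would only
come in when the charset is missing or wrong.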
Thanks.