unicode by default

Wed May 11 21:22:50 EDT 2011

John Machin wrote:
> (1) You cannot work without using bytes sequences. Files are byte
> sequences. Web communication is in bytes. You need to (know / assume / be
> able to extract / guess) the input encoding. You need to encode your
> output using an encoding that is expected by the consumer (or use an
> output method that will do it for you).
>
> (2) You don't need to use bytes to specify a Unicode code point. Just use
> an escape sequence e.g. "\u0404" is a Cyrillic character.
>
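
To make the two quoted points concrete for myself, here is a minimal Python 3 sketch. The choice of UTF-8 is my assumption for illustration, not something John specified.

```python
# Point (2): an escape sequence names a code point directly; no bytes involved.
s = "\u0404"                 # CYRILLIC CAPITAL LETTER UKRAINIAN IE
code_point = ord(s)          # 0x0404

# Point (1): bytes only appear at the boundary, when encoding for output ...
data = s.encode("utf-8")     # two bytes in UTF-8

# ... and when decoding input whose encoding is known (or assumed).
round_trip = data.decode("utf-8")
```

The string itself holds one character; how many bytes it becomes depends entirely on the encoding chosen at output time.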

Thanks John.  In reverse order, I understand point (2). I'm less clear 
on point (1).

If I generate a string of characters that I presume to be ASCII/UTF-8 
(no \u0404-type characters) and write them to a file (or stdout), how does 
the default encoding affect that file? I'm not seeing anything unusual 
going on. Does it matter whether I open the file with vi, gedit, or 
emacs?
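
If I understand it correctly, the reason nothing looks unusual is that pure ASCII text encodes to the same bytes under ASCII, UTF-8, and Latin-1, so an editor sees the same file either way. A small sketch of that check; the sample text and variable names are made up for illustration:

```python
text = "plain ascii text\n"   # hypothetical sample, ASCII characters only

# For pure ASCII input these encodings all produce identical bytes.
same_bytes = (
    text.encode("ascii") == text.encode("utf-8") == text.encode("latin-1")
)

# The encoding only starts to matter once a non-ASCII character appears.
non_ascii = "\u0404"
utf8_len = len(non_ascii.encode("utf-8"))

try:
    non_ascii.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # ASCII has no representation for this character.
    ascii_ok = False
```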

....

Another question: in mail I'm receiving many small blocks that look 
like sprites with four small hex codes inside, scattered about the 
message, mostly where punctuation would be, I think. I'm guessing these 
are Unicode code points; if so, what is the best way to 'guess' the 
encoding? Is it declared somewhere in the stream, as part of the protocol?
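
One place to look, as far as I can tell, is the charset parameter of the message's Content-Type header. Below is a minimal sketch using Python's standard email module. The message is made up for illustration, and the ASCII fallback is my own assumption about what to do when no charset is declared.

```python
from email import message_from_string

# Hypothetical message: curly quotes sent as quoted-printable UTF-8.
raw = (
    'Content-Type: text/plain; charset="utf-8"\r\n'
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "=E2=80=9Cquoted=E2=80=9D\r\n"
)

msg = message_from_string(raw)

# The declared charset, lowercased, or None if the header has none.
charset = msg.get_content_charset()

# decode=True undoes the transfer encoding and returns raw bytes.
payload = msg.get_payload(decode=True)

# Decode the bytes with the declared charset (assumed fallback: ASCII).
text = payload.decode(charset or "ascii", errors="replace")
```

If the header is missing or wrong, the bytes still have to be guessed at, which I take to be John's point (1).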

thanks