[Tutor] character format

Thu May 12 17:06:22 CEST 2005

On May 12, 2005, at 03:00, jfouhy at paradise.net.nz wrote:

> As was pointed out, I'm not American.  I guess the problem stems  
> from an
> American cultural assumption, though, in that Americans (I think)  
> developed the
> ASCII character set without any thought for other languages.

     At that time, it was a reasonable decision. Written French can
still be understood without accented characters. It's just a bit
harder, since the only convention we have for this is to replace
accented characters with their non-accented versions (e.g. é, è and ë
become e), but this rarely if ever causes any trouble. The Germans are
better off in that regard (ä -> ae, ß -> ss...).
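
     For illustration, here is a minimal sketch of both conventions,
assuming a Python 3 interpreter. The names strip_accents, german_ascii
and GERMAN are made up for this example, and the German table is only
a partial one:

```python
import unicodedata

# Hypothetical, partial table of German-style replacements.
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue",
          "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}

def strip_accents(text):
    """French-style fallback: drop the accents, keep the base letters."""
    # NFD splits 'é' into 'e' followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def german_ascii(text):
    """German-style fallback: apply the table first, then strip the rest."""
    for src, dst in GERMAN.items():
        text = text.replace(src, dst)
    return strip_accents(text)

print(strip_accents("é è ë"))
print(german_ascii("Straße"))
```

Note that strip_accents leaves ß untouched, since it has no
decomposition into a base letter plus an accent; that is why the
German variant needs an explicit table.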

     The true problem comes from the lateness in standardizing
extended ASCII (characters 128 to 255) -- which, I guess, does in
some way stem from the American cultural assumption (as in "we
already have what we need to write English, so we'll worry about
that later").
     Opening a text file that contains extended chars in an editor is  
usually followed by up to 5 minutes of "guess the encoding", as the  
very nature of text files makes it virtually impossible for an editor  
to do it automatically and reliably.
     Now, I only write Unicode text files (most real text editors
support this), but notepad.exe, as far as I know, only writes
Windows-encoded files, which themselves are different from DOS-encoded
files (I still have some of those lying around on some of my hard
drives, written with edit.exe or e.com)... It gets very messy, very
quickly.
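
     To make the mess concrete, here is a small sketch, again assuming
Python 3, that decodes the same single byte (value 130) under several
encodings. The helper name decodes_as is made up for this example, and
I am taking cp437 and cp1252 as stand-ins for "DOS-encoded" and
"Windows-encoded"; other code pages exist.

```python
def decodes_as(data, encodings):
    """Decode the same bytes under each encoding; None marks a failure."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results

raw = bytes([130])  # the byte behind chr(130)
for enc, text in decodes_as(raw, ["cp437", "cp1252", "latin-1", "utf-8"]).items():
    # repr() keeps control characters visible instead of sending them
    # raw to the terminal.
    print(enc, repr(text))

# The reverse direction: one character, different bytes per encoding.
for enc in ["cp437", "latin-1", "utf-8"]:
    print(enc, "é".encode(enc))
```

The byte is é only under cp437. Under cp1252 it is a low quotation
mark, under Latin-1 it is a control character, and on its own it is
not valid UTF-8 at all. Nothing in the byte itself says which reading
is the right one, which is why the editor has to guess.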

> Will a standard xterm display chr(130) as é in linux for you, Max?   
> Or under Mac
> OS X?

     I just made a few tests. So far, it seems that it depends on the  
character encoding in use:
- If it's Western Latin 1, yes, but cat'ing Unicode text files  
doesn't work properly (which is to be expected).
- If it's Unicode, it then depends on the application in use
(although most of them just fail). bash 2.05 and zsh 4.2.3 get into
big trouble when I type accented characters. ksh seems a bit more
tolerant, but backspace behavior becomes erratic. vim spouts random
garbage, and emacs beeps at me angrily. Unicode text files cat
nicely, though.

> Anyway, in Python3000 all strings will be unicode, so it won't  
> matter then :-)

     As should be evident from the above post, I'm *really* looking  
forward to that. ;)

-- Max
maxnoel_fr at yahoo dot fr -- ICQ #85274019
"Look at you hacker... A pathetic creature of meat and bone, panting  
and sweating as you run through my corridors... How can you challenge  
a perfect, immortal machine?"