[Tutor] character format
Max Noel
maxnoel_fr at yahoo.fr
Thu May 12 17:06:22 CEST 2005
On May 12, 2005, at 03:00, jfouhy at paradise.net.nz wrote:
> As was pointed out, I'm not American. I guess the problem stems
> from an American cultural assumption, though, in that Americans (I
> think) developed the ASCII character set without any thought for
> other languages.
At that time, it was a reasonable decision. Written French can
still be understood without accented characters. It's just a bit
harder, since our only convention for this is to replace accented
characters with their unaccented versions (e.g. é, è and ë all
become e), but this rarely, if ever, causes any trouble. German has
a better convention in that regard (ä -> ae, ß -> ss...).
The real problem comes from how late extended ASCII (characters 128
to 255) was standardized, which, I guess, does in some way stem from
that American cultural assumption (as in "we already have what we
need to write English, so we'll worry about the rest later").
Opening a text file that contains extended characters in an editor
is usually followed by up to 5 minutes of "guess the encoding": a
plain text file does not record which encoding it was written in, so
an editor cannot detect it automatically and reliably.
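To see why automatic detection is unreliable, here is a minimal Python 3 sketch. The helper `naive_guess` is hypothetical and not part of any library: it returns the first codec in a candidate list that decodes the bytes without raising an error. Single-byte codecs such as cp1252 and latin-1 accept almost any byte sequence, so "decodes without error" is not the same as "decoded correctly".

```python
def naive_guess(data: bytes, candidates=("utf-8", "cp1252", "latin-1")) -> str:
    """Return the first codec in `candidates` that decodes `data` cleanly.

    Hypothetical helper for illustration only: a clean decode does not
    prove the guess is right, because single-byte codecs accept almost
    any byte.
    """
    for codec in candidates:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue  # this codec rejects the bytes; try the next one
        return codec
    raise ValueError("no candidate codec could decode the data")


# "café" as saved by a DOS editor (code page 437, where é is byte 0x82).
dos_bytes = "café".encode("cp437")

guess = naive_guess(dos_bytes)
print(guess, dos_bytes.decode(guess))
```

Here UTF-8 rejects the stray 0x82 byte, so the helper settles on cp1252, which turns é into a low quotation mark: the decode "succeeds", but the text is wrong.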
Now, I only write Unicode text files (most real text editors
support this), but notepad.exe, as far as I know, only writes
Windows-encoded files, which themselves differ from DOS-encoded files
(I still have some of those lying around on my hard drives, written
with edit.exe or e.com)... It gets very messy, very quickly.
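The Windows/DOS split can be made concrete with Python 3's built-in codecs. As an assumption about typical Western European setups, this sketch takes cp1252 as the "Windows" encoding and cp437/cp850 as the "DOS" ones. The same é is stored as a different byte in each, so a file written by one editor reads as garbage in the other.

```python
text = "é"

# The byte(s) each encoding uses for é (code pages assumed typical of
# Western European Windows and DOS systems).
for codec in ("cp1252", "cp437", "cp850", "utf-8"):
    print(f"{codec:7} {text.encode(codec)!r}")

# A Windows-encoded file read back as if it were a DOS file:
windows_bytes = "café".encode("cp1252")
print(windows_bytes.decode("cp437"))
```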
> Will a standard xterm display chr(130) as é in linux for you, Max?
> Or under Mac OS X?
I just made a few tests. So far, it seems that it depends on the
character encoding in use:
- If it's Western Latin 1, accented characters display fine (though
in Latin 1, é is chr(233), not chr(130)), but cat'ing Unicode text
files doesn't work properly (which is to be expected).
- If it's Unicode (UTF-8), it then depends on the application in use
(although most of them just fail). bash 2.05 and zsh 4.2.3 get into
big trouble when I type accented characters. ksh seems a bit more
tolerant, but backspace behavior becomes erratic. vim sprouts random
garbage, and emacs beeps at me angrily. Unicode text files cat
nicely, though.
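The chr(130) question comes down to which byte-to-character table the terminal applies. In this minimal Python 3 sketch, the single byte 130 (0x82) is é only in the DOS code pages; in cp1252 it is a low quotation mark, in Latin-1 it is an invisible C1 control character, and on its own it is not valid UTF-8 at all. The second half shows that é takes two bytes in UTF-8. That offers one plausible explanation for the erratic backspace behavior above: a line editor that assumes one byte per character would delete only half of é. The tests above did not confirm this.

```python
raw = bytes([130])  # the single byte 0x82

# What that one byte means under different single-byte encodings.
for codec in ("cp437", "cp1252", "latin-1"):
    char = raw.decode(codec)
    print(f"{codec:7} {char!r} U+{ord(char):04X}")

# As UTF-8, 0x82 is a continuation byte and cannot start a character.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8  ", exc.reason)

# In UTF-8, é needs two bytes; dropping the last one leaves a dangling
# lead byte that no longer decodes (shown here as a replacement char).
encoded = "é".encode("utf-8")
print(len("é"), len(encoded))
print(encoded[:-1].decode("utf-8", errors="replace"))
```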
> Anyway, in Python3000 all strings will be unicode, so it won't
> matter then :-)
As should be evident from the above post, I'm *really* looking
forward to that. ;)
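With hindsight: Python 3, the release that grew out of the Python 3000 plans, did make every `str` a sequence of Unicode code points and kept raw bytes in a separate `bytes` type. The encoding question does not disappear, though. It moves to the points where text enters or leaves the program. This is a minimal sketch, and the file name `note.txt` is a hypothetical temporary file.

```python
import os
import tempfile

text = "café"  # a str: Unicode code points, no encoding attached
print(type(text).__name__, len(text))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")  # hypothetical scratch file

    # Encoding happens only when text is written out as bytes...
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # ...and decoding with the wrong codec on the way back in still garbles it.
    with open(path, encoding="latin-1") as f:
        wrong = f.read()
    with open(path, encoding="utf-8") as f:
        right = f.read()

print(wrong, right)
```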
-- Max
maxnoel_fr at yahoo dot fr -- ICQ #85274019
"Look at you hacker... A pathetic creature of meat and bone, panting
and sweating as you run through my corridors... How can you challenge
a perfect, immortal machine?"