[Python-3000] locale-aware strings ?

Oleg Broytmann phd at mail2.phd.pp.ru
Wed Sep 6 12:48:39 CEST 2006


On Wed, Sep 06, 2006 at 03:08:21AM -0700, Paul Prescod wrote:
> Windows users do not "tell each program separately about the
> encoding." The encoding varies by file type. It makes no more sense to
> have a global variable that says "all of my files are Shift-JIS" than
> it does to say "all of my files are PowerPoint files." Because someday
> somebody is going to email you a Big-5 file (or a zipfile) and that
> setting will be wrong. Once you know that a file is of type Zip then
> you know that the "encoding" is zipped binary. Once you know that it
> is an Office 2007 file, then you know that the encoding is Zipped XML
> and that the XML will have its own encoding declaration. Once you know
> that it is HTML, then you look for meta tags.
> 
> This is how real-world programs work. They shouldn't guess based on
> system global variables.

   Unfortunately, the real world is a bit worse than that. There are many
protocols and file formats that carry textual information and still don't
provide any hint about the encoding.
   First, there are text files. Really, there are still text files. A user
can dump a README file onto his/her personal FTP server, and the file
is usually in the local encoding.
   MP3 tags. Real nightmare. Nobody follows the standard - tag editors
write tags in the local encoding, and mp3 players interpret them in the
local encoding.
   FTP and other dumb protocols that transfer file names in the encoding
local to the server without announcing that encoding in the metadata.
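
   To illustrate, here is a minimal sketch of what a program ends up doing
with such untagged bytes (a README, an MP3 tag, an FTP file name): try a
strict encoding first, then fall back to the local one. The names
decode_untagged and fallback are hypothetical, the UTF-8-first order is an
assumption rather than any standard, and the code uses Python 3 syntax. It is
a guess by design, which is exactly the problem.

```python
import locale


def decode_untagged(data, fallback=None):
    """Decode bytes that carry no encoding declaration (hypothetical helper)."""
    if fallback is None:
        # Assumption: the data was most likely written in the local encoding.
        fallback = locale.getpreferredencoding(False)
    try:
        # UTF-8 is strict enough that most legacy 8-bit text fails to decode.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Still only a guess; undecodable bytes become U+FFFD.
        return data.decode(fallback, errors="replace")
```

   For example, decode_untagged(name_bytes, fallback="koi8-r") reads a file
name from a server assumed to run in KOI8-R. If that assumption is wrong the
result is silently garbled, because nothing in the data says otherwise.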

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

