On Sat, Aug 23, 2014 at 01:19:14AM +1000, Steven D'Aprano <steve at pearwood.info> wrote:
> On Fri, Aug 22, 2014 at 04:42:29AM +0200, Oleg Broytman wrote:
> > On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal <chris.barker at noaa.gov> wrote:
> > > This brings up the other key problem. If file names are (almost)
> > > arbitrary bytes, how do you write one to/read one from a text file
> > > with a particular encoding? ( or for that matter display it on a
> > > terminal)
> > 
> >    There is no such thing as an encoding of text files.
> I don't understand this comment. It seems to me that *text* files have 
> to have an encoding, otherwise you can't interpret the contents as text. 

   What encoding does a text file (an HTML file, to be precise) have
when its text is in utf-8, its ads are in cp1251 (the ad blocks were
included from different files) and its comments are in koi8-r?
   Well, I must admit the HTML was rather an exception, but having a
text file with some strange characters (binary strings, or paragraphs
in different encodings) is not that exceptional.
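
   To illustrate, here is a minimal sketch in Python 3. The sample
strings and the helper name decodes_cleanly are made up for the
example; they are not taken from the HTML file mentioned above.

```python
# Hypothetical sample: three lines of Russian text, each in its own encoding.
lines = [
    ("utf-8", "текст"),          # main text
    ("cp1251", "реклама"),       # ad block
    ("koi8-r", "комментарий"),   # comment
]
data = b"\n".join(text.encode(enc) for enc, text in lines)


def decodes_cleanly(raw, encoding):
    """Return True if raw decodes in the given encoding without errors."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Decoding line by line, each line with its own encoding, recovers the text.
recovered = [
    raw.decode(enc) for (enc, _), raw in zip(lines, data.split(b"\n"))
]
```

   With this sample, decodes_cleanly(data, "utf-8") is False, because
the cp1251 and koi8-r lines are not valid utf-8. Decoding the whole
file as koi8-r raises no error, since koi8-r maps every byte to some
character, but the utf-8 and cp1251 lines come out as garbage. Only the
per-line decoding gives back the original text.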

> Files, of course, only contain bytes, but to be treated as text you 
> need some way of transforming byte N to char C (or multiple bytes to C), 
> which is an encoding.

   But you don't need to treat the entire file as being in one
encoding. Characters in the wrong encoding are clearly visible, so you
can interpret those parts differently. I am well trained to distinguish
koi8, cp1251 and utf-8 texts; I cannot translate them mentally but I
can recognize them.
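
   A rough sketch of how such recognition could be automated, chunk by
chunk, in Python 3. The function name guess_decode and the scoring rule
are assumptions made for this example, not an established algorithm. It
relies on koi8-r and cp1251 placing lowercase and uppercase Cyrillic
letters in swapped byte ranges, so the wrong guess looks mostly
uppercase.

```python
def guess_decode(raw):
    """Decode one chunk, guessing among utf-8, koi8-r and cp1251.

    Heuristic only: strict utf-8 is tried first; otherwise the 8-bit
    encoding that yields more lowercase than uppercase Cyrillic wins.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    best = None
    for enc in ("koi8-r", "cp1251"):
        text = raw.decode(enc, errors="replace")
        lower = sum(1 for ch in text if "а" <= ch <= "я")
        upper = sum(1 for ch in text if "А" <= ch <= "Я")
        score = lower - upper
        if best is None or score > best[0]:
            best = (score, text, enc)
    return best[1], best[2]
```

   For example, guess_decode("реклама".encode("cp1251")) picks cp1251,
because the same bytes read as koi8-r are all uppercase letters. The
heuristic can be fooled by short chunks, all-caps text or text that is
not Russian, so it is no substitute for a trained eye.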

> Perhaps you just mean that encodings are not recorded in the text file 
> itself?

   Yes, that too.

> To answer Chris' question, you typically cannot include arbitrary 
> bytes in text files, and displaying them to the user is likewise 
> problematic

   As a person who views utf-8 files in koi8 fonts (and vice versa)
every day, I'd argue with that. (-:

