[Python-Dev] Bytes path support
phd at phdru.name
Fri Aug 22 17:51:04 CEST 2014
On Sat, Aug 23, 2014 at 01:19:14AM +1000, Steven D'Aprano <steve at pearwood.info> wrote:
> On Fri, Aug 22, 2014 at 04:42:29AM +0200, Oleg Broytman wrote:
> > On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal <chris.barker at noaa.gov> wrote:
> > > This brings up the other key problem. If file names are (almost)
> > > arbitrary bytes, how do you write one to/read one from a text file
> > > with a particular encoding? ( or for that matter display it on a
> > > terminal)
> > There is no such thing as an encoding of text files.
> I don't understand this comment. It seems to me that *text* files have
> to have an encoding, otherwise you can't interpret the contents as text.
What encoding does a text file have (an HTML file, to be precise) with
text in utf-8, ads in cp1251 (the ad blocks were included from different
files) and comments in koi8-r?
Well, I must admit the HTML was rather an exception, but having a
text file with some strange characters (binary strings, or paragraphs
in different encodings) is not that exceptional.
> Files, of course, only contain bytes, but to be treated as text you
> need some way of transforming byte N to char C (or multiple bytes to C),
> which is an encoding.
But you don't need to treat the entire file as being in one encoding.
Strange characters are clearly visible, so you can interpret them
differently. I am well trained to distinguish koi8, cp1251 and utf-8
texts; I cannot translate them mentally, but I can recognize them.
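A minimal sketch of that point in Python, assuming a made-up three-line file where each line was written in a different encoding. The sample strings and the per-line encoding labels are assumptions for illustration; the labels stand in for what a reader recognizes by eye, they are not detected automatically.

```python
# Three lines of one "text file", each written in a different encoding.
# The sample strings and encoding labels are illustrative assumptions.
parts = [
    ("текст", "utf-8"),
    ("реклама", "cp1251"),
    ("комментарий", "koi8-r"),
]
blob = b"\n".join(text.encode(enc) for text, enc in parts)

# Treating the entire file as one encoding fails (or yields garbage).
try:
    blob.decode("utf-8")
    whole_file_is_utf8 = True
except UnicodeDecodeError:
    whole_file_is_utf8 = False

# Decoding piece by piece, each with its own encoding, recovers the text.
decoded = [
    line.decode(enc)
    for line, (_, enc) in zip(blob.split(b"\n"), parts)
]
```

Splitting on the newline byte is safe here only because none of these encodings uses 0x0A inside a Cyrillic letter.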
> Perhaps you just mean that encodings are not recorded in the text file
Yes, that too.
> To answer Chris' question, you typically cannot include arbitrary
> bytes in text files, and displaying them to the user is likewise
As a person who views utf-8 files in koi8 fonts (and vice versa) every
day, I'd argue with that. (-:
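A rough Python illustration of both halves of this, under stated assumptions: the sample string and the file name are made up. The first part shows utf-8 bytes viewed as koi8-r (garbled but lossless); the second uses the standard `surrogateescape` error handler, which is one way, not the only one, to carry arbitrary bytes through text.

```python
original = "текст"            # illustrative sample string
raw = original.encode("utf-8")

# Viewing utf-8 bytes "in a koi8 font": every byte maps to some
# character, so the result is garbled but nothing is lost.
seen_as_koi8 = raw.decode("koi8-r")

# Because nothing was lost, the misreading can be undone.
recovered = seen_as_koi8.encode("koi8-r").decode("utf-8")

# Bytes that are not valid utf-8 (hypothetical file name) can still
# round-trip through str with the surrogateescape error handler.
odd_name = b"file-\xff\xfe.txt"
as_text = odd_name.decode("utf-8", errors="surrogateescape")
back = as_text.encode("utf-8", errors="surrogateescape")
```

Note that `as_text` contains lone surrogates, so writing it to a file or terminal with a strict encoder raises an error unless the same handler is used on output.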
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.