[Python-Dev] File encodings

Gustavo Niemeyer niemeyer at conectiva.com
Tue Nov 30 13:29:08 CET 2004


> Gustavo Niemeyer wrote:
> >Given the fact that files have an 'encoding' parameter, and that
> >any unicode strings with characters not in the 0-127 range will
> >raise an exception if being written to files, isn't it reasonable
> >to respect the 'encoding' attribute whenever writing data to a
> >file?
> 
> In general, files don't have an encoding parameter - sys.stdout
> is an exception.

That's the only case I'd like to solve.

If there are platforms where Python doesn't know how to set it, we
could make the encoding attribute writable, which would allow people
to easily set it to whatever encoding is correct on their systems.
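
For comparison, here is a minimal sketch of how a caller can already
pick the output encoding explicitly, by wrapping a byte stream with
codecs.getwriter. This is only an illustration, not the proposed
change: the in-memory stream and the "latin-1" encoding are
assumptions chosen for the example.

```python
import codecs
import io

# Stand-in for a byte-oriented output stream such as a terminal
# (assumption: a real program would wrap its actual output stream).
raw = io.BytesIO()

# The caller decides the encoding; "latin-1" is only an example value.
out = codecs.getwriter("latin-1")(raw)

# Unicode text written to the wrapper is encoded before it reaches raw.
out.write(u"caf\xe9\n")
```

A writable encoding attribute would give the same control without
requiring every caller to wrap the stream.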

> The reason why this works for print and not for write is that
> I considered "print unicodeobject" important, and wanted to
> implement that. file.write is an entirely different code path,
> so it doesn't currently consider Unicode objects; instead, it
> only supports strings (or, more generally, buffers).

I understand your reasoning behind it, and would like to extend
your idea to the write method, so that anyone using the common
sys.stdout.write idiom to implement print-like functionality (as
optparse and many others do) gets the same behavior. For normal
files, the absence of the encoding attribute would preserve the
current behavior.
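
As a rough sketch of the behavior being proposed (not an actual
patch): the helper below honors a stream's encoding attribute when
one is set and otherwise falls back to ASCII, which matches the
current failure for non-ASCII text. The names ByteStream and
write_text are hypothetical, and the code is written for a modern
Python, where bytes plays the role of the 2.x str.

```python
import io


class ByteStream(io.BytesIO):
    """Hypothetical stand-in for a file object with an optional encoding."""

    def __init__(self, encoding=None):
        super().__init__()
        # None models a normal file, which has no known encoding.
        self.encoding = encoding


def write_text(stream, data):
    """Write bytes unchanged; encode text using stream.encoding if set."""
    if isinstance(data, bytes):
        stream.write(data)
        return
    # Without an encoding, fall back to ASCII so non-ASCII text still fails.
    encoding = getattr(stream, "encoding", None) or "ascii"
    stream.write(data.encode(encoding))


terminal = ByteStream(encoding="utf-8")
write_text(terminal, u"caf\xe9")  # encoded with the stream's encoding

plain = ByteStream()
write_text(plain, u"abc")  # pure ASCII text works as before
```

Writing u"caf\xe9" to the stream without an encoding would still
raise UnicodeEncodeError, so ordinary files keep today's behavior.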

> > This difference may become a really annoying problem when trying to
> > internationalize programs, since it's usual to see third-party code
> > dealing with sys.stdout, instead of using 'print'.
> 
> Apparently, it isn't important enough that somebody had analysed this,
> and offered a patch. In any case, it would be quite unreliable to

That's what I'm doing here! :-)

> pass unicode strings to .write even *if* .write supported .encoding,
> since most files don't have .encoding. Even sys.stdout does not always
> have .encoding - only when it is a terminal, and only if we managed to
> find out what the encoding of the terminal is.

I think that's acceptable. The encoding attribute is meant for output
streams, and Python does its best to find a reasonable value for
displaying output strings.

Thanks for your answer and clarifications,

-- 
Gustavo Niemeyer
http://niemeyer.net

