[Python-Dev] File encodings

Gustavo Niemeyer niemeyer at conectiva.com
Tue Nov 30 13:20:13 CET 2004


Hello Bob,

[...]
> >Given the fact that files have an 'encoding' parameter, and that
> >any unicode strings with characters not in the 0-127 range will
> >raise an exception if being written to files, isn't it reasonable
> >to respect the 'encoding' attribute whenever writing data to a
> >file?
> 
> No, because you don't know it's a file.  You're calling a function with 
> a unicode object.  The function doesn't know that the object was some 
> unicode object that came from a source file of some particular 
> encoding.

I don't understand what you're saying here. The file knows that it
is a file. The write function knows that its parameter is unicode.

> >The workaround for that problem is to either use the evil-considered
> >sys.setdefaultencoding(), or to wrap sys.stdout. IMO, both options
> >seem unreasonable for such a common idiom.
> 
> There's no guaranteed correlation whatsoever between the claimed 
> encoding of your source document and the encoding of the user's 
> terminal, why do you want there to be?  What if you have some source 

I don't. I want the write() function of file objects to respect
the encoding attribute of these objects. This is already being
done when print is used. I'm proposing to extend that behavior to
the write function. That's all.
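
To make the proposal concrete, here is a minimal sketch of the behavior
I have in mind, not an actual patch. The class name EncodingAwareWriter
is hypothetical and exists only for illustration; an in-memory byte
buffer stands in for the real file, and the sketch uses present-day
Python 3 spelling, where the text type is str.

```python
import io


class EncodingAwareWriter:
    """Hypothetical wrapper: write() honors the 'encoding' attribute."""

    def __init__(self, raw, encoding):
        self.raw = raw            # underlying byte stream
        self.encoding = encoding  # plays the role of the file's 'encoding'

    def write(self, data):
        # Text is encoded with the stream's own encoding;
        # byte strings are passed through untouched.
        if isinstance(data, str):
            data = data.encode(self.encoding)
        self.raw.write(data)


buf = io.BytesIO()                      # stand-in for a real file
out = EncodingAwareWriter(buf, "iso-8859-1")
out.write("caf\xe9\n")                  # non-ASCII text, no exception
result = buf.getvalue()
```

The point is only that write() has everything it needs: the object
knows its encoding, and write() can tell text from bytes.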

> files with 'foo' encoding and others with 'bar' encoding?  What about 
> ascii encoded source documents that use escape sequences to represent 
> non-ascii characters?  What you want doesn't make any sense so long as 
> python strings and file objects deal in bytes not characters :)

Please take a deep breath and read my message again. :-)

> Wrapping sys.stdout is the ONLY reasonable solution.
[...]

No, it's not. But I'm glad to know other people are also using
workarounds for that problem.
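
For reference, the wrapping workaround usually looks something like the
sketch below, built on codecs.getwriter(). The "utf-8" encoding and the
in-memory buffer are assumptions made for the example; in real code the
wrapped object would be the byte stream behind sys.stdout, and the
encoding would be whatever the terminal expects.

```python
import codecs
import io

# Stand-in for the byte stream underlying sys.stdout (an assumption,
# so that the result can be inspected).
raw = io.BytesIO()

# codecs.getwriter() returns a StreamWriter class for the given codec;
# instantiating it wraps the stream so that write() encodes text.
wrapped = codecs.getwriter("utf-8")(raw)
wrapped.write("caf\xe9\n")

encoded = raw.getvalue()
```

It works, but every program that writes non-ASCII text has to repeat it.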

-- 
Gustavo Niemeyer
http://niemeyer.net

