[Python-Dev] Do I misunderstand how codecs.EncodedFile is supposed to work?

07 Aug 2002 08:46:59 +0200

Skip Montanaro <skip@pobox.com> writes:

> I thought the whole purpose of the EncodedFile class was to provide
> transparent encoding.  

    """ Return a wrapped version of file which provides transparent
        encoding translation.

        Strings written to the wrapped file are interpreted according
        to the given data_encoding and then written to the original
        file as string using file_encoding. The intermediate encoding
        will usually be Unicode but depends on the specified codecs.

        Strings are read from the file using file_encoding and then
        passed back to the caller as string using data_encoding.

        If file_encoding is not given, it defaults to data_encoding.
    """

So, no. It provides transparent recoding between two encodings: a file
encoding and a data encoding.
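A minimal sketch of that recoding, assuming modern Python 3 (where text is `str` and files carry bytes) rather than the Python 2 of this thread; `io.BytesIO` stands in for a real file, and Latin-1/UTF-8 are arbitrary example encodings. Bytes go in and come out in the data encoding, while the underlying file holds the file encoding:

```python
import codecs
import io

raw = io.BytesIO()  # stands in for the underlying (byte-oriented) file

# data_encoding: what the caller hands to write() and gets back from read().
# file_encoding: what is actually stored in the underlying file.
f = codecs.EncodedFile(raw, data_encoding='latin-1', file_encoding='utf-8')
f.write('caf\u00e9'.encode('latin-1'))  # Latin-1 bytes in...
stored = raw.getvalue()                  # ...UTF-8 bytes in the file

# Reading goes the other way: UTF-8 from the file, Latin-1 to the caller.
raw.seek(0)
g = codecs.EncodedFile(raw, data_encoding='latin-1', file_encoding='utf-8')
read_back = g.read()

print(stored)     # b'caf\xc3\xa9'
print(read_back)  # b'caf\xe9'
```

Note that no Unicode object ever crosses the wrapper's interface; Unicode is only the intermediate form.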

I never found this class useful.

What you want is a StreamWriter:

import codecs
f = codecs.getwriter('utf-8')(open('unicode-test', 'wb'))

Of course, *this* specific case can be written much more easily as

f = codecs.open('unicode-test', 'w', encoding='utf-8')

The getwriter case is useful if you already have a file-like object
from somewhere else.
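A short sketch of that case, assuming Python 3; the `sink` object is a hypothetical stand-in (an `io.BytesIO`) for any byte-oriented file-like object handed to you by other code:

```python
import codecs
import io

# Hypothetical: a byte-oriented file-like object obtained "from somewhere",
# e.g. a socket's makefile('wb'); BytesIO stands in for it here.
sink = io.BytesIO()

writer = codecs.getwriter('utf-8')(sink)  # wrap it; no reopening needed
writer.write('na\u00efve')                # Unicode text in...
print(sink.getvalue())                    # ...UTF-8 bytes out: b'na\xc3\xafve'
```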

> Shouldn't it support transparent encoding of Unicode
> objects?  That is, I told the system I want writes to be in utf-8 when I
> instantiated the class.  

You also told it that the input data are in utf-8: the one encoding
you passed is the data_encoding, and the omitted file_encoding
defaults to it.
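To illustrate, here is a sketch assuming Python 3 (in the Python 2 of this thread the failure mode differs: a `unicode` argument is implicitly coerced, typically via ASCII). With a single encoding, the wrapper is a UTF-8 to UTF-8 recoder that expects bytes and rejects text:

```python
import codecs
import io

raw = io.BytesIO()
# Only data_encoding is given; file_encoding defaults to it, so this is a
# UTF-8 -> UTF-8 recoder that expects *bytes*, not text.
f = codecs.EncodedFile(raw, 'utf-8')
f.write('na\u00efve'.encode('utf-8'))  # bytes pass through unchanged

try:
    f.write('na\u00efve')  # a str is rejected by the UTF-8 decode step
    rejected_text = False
except TypeError:
    rejected_text = True

print(raw.getvalue(), rejected_text)
```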

> I don't think I should have to call .encode() directly.  I realize I
> can wrap the function in a class that adds the transparency I
> desire, but it seems the whole point should be to make it easy to
> write Unicode objects to files.

Not this class, no. 

Now, you may ask what the purpose of this class is, then. I really
don't know; it goes against everything I'm advocating, as it assumes
that you have byte strings in one encoding in memory that you want to
save in a different encoding. That should never happen: all your text
data should be Unicode strings.
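The alternative being advocated can be sketched as "decode at input, encode at output", assuming Python 3; the Latin-1 source and the `io.BytesIO` objects are hypothetical stand-ins for real files:

```python
import codecs
import io

# Hypothetical input: Latin-1 bytes arriving from some external source.
source = io.BytesIO('Gr\u00fc\u00dfe\n'.encode('latin-1'))
dest = io.BytesIO()

# Decode once at the input boundary: from here on the program holds str.
text = codecs.getreader('latin-1')(source).read()

# ... any processing happens on Unicode text ...

# Encode once at the output boundary.
codecs.getwriter('utf-8')(dest).write(text)
print(dest.getvalue())
```

Byte strings in a particular encoding exist only at the edges, so there is nothing left for a byte-to-byte recoder to do.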

Regards,
Martin