codecs latin1 unicode standard output file

Serge Orlov sombDELETE at pobox.ru
Mon Dec 15 22:13:30 CET 2003


"Marko Faldix" <marko.faldix.tudisweck at mplusr.de> wrote in message news:brkddv$4evj2$1 at ID-108329.news.uni-berlin.de...
> > > From my point of view, python shouldn't act in different ways whether
> > > the result is piped to a file or not.
> >
> > when you print to a console with a known encoding, Python 2.3 auto-
> > magically converts Unicode strings to 8-bit strings using the console
> > encoding.
> >
> > files don't have an encoding, which is why the second case fails.
> >
> > also note that in 2.2 and earlier, your example always failed.
> >
> > </F>
>
> So I just have to use only this:
>
> print "My umlauts are ä, ö, ü"
>
> without any encoding assignment, both for standard output on the console AND
> when redirecting to a file. In the latter case it looks fine in e.g. notepad,
> just strange on the console, so it's the console settings that need adjusting
> and not the python code. Right?

No, the right code is
=============================
# -*- coding: iso-8859-1 -*-
import locale, codecs, sys

if not sys.stdout.isatty():
    # stdout is redirected (file or pipe): wrap it in a StreamWriter that
    # encodes unicode output using the locale's preferred encoding
    sys.stdout = codecs.lookup(locale.getpreferredencoding())[3](sys.stdout)

print u"My umlauts are ä, ö, ü"
=============================
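By the way, codecs.getwriter() is an equivalent and arguably more readable
spelling of the lookup()[3] trick above -- just a sketch of the alternative,
not something you have to use:
=============================
# -*- coding: iso-8859-1 -*-
import locale, codecs, sys

if not sys.stdout.isatty():
    # same effect as codecs.lookup(...)[3]: fetch the StreamWriter class
    # for the locale encoding and wrap the redirected stdout with it
    Writer = codecs.getwriter(locale.getpreferredencoding())
    sys.stdout = Writer(sys.stdout)

print u"My umlauts are ä, ö, ü"
=============================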
The difference between console and file output is that while
there's only one way to output ä on a cp850 console, there
are many ways to output the same character to a file (latin-1,
utf-8, utf-7, utf-16le, utf-16be, cp850 and maybe more).
So Python refuses to guess.
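To see why guessing isn't possible, here's a small illustration (Python 2.x
syntax, same as the code above) of how many different byte sequences the
single character ä can become:
=============================
# the same unicode character, encoded with several codecs
ch = u"\xe4"   # LATIN SMALL LETTER A WITH DIAERESIS (ä)
for enc in ("latin-1", "utf-8", "utf-7", "utf-16-le", "utf-16-be", "cp850"):
    print enc, repr(ch.encode(enc))
=============================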
Another rule to follow is to store non-ascii characters in
unicode strings. Otherwise you will either have to track
the encodings yourself or assume that all 8-bit strings
in your program have the same encoding. That's not
a good idea. I'm not sure if you get proper .upper()
and .lower() methods on 8-bit strings. (don't have python
here to check)
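For what it's worth, here's a quick check you can run yourself (Python 2.x;
the byte-string result is hedged because it depends on the C locale):
=============================
# unicode strings use the Unicode database for case mapping
print repr(u"\xe4".upper())    # u'\xc4' (Ä), independent of locale

# 8-bit strings use the C library's locale; under the default "C"
# locale the byte usually comes back unchanged
print repr("\xe4".upper())     # typically '\xe4'
=============================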

-- Serge.
