[Tutor] more encoding strangeness

Tue Dec 23 12:40:45 CET 2008

On Tue, Dec 23, 2008 at 2:10 AM, Eric Abrahamsen
<eric at ericabrahamsen.net> wrote:
> Hi there,
>
> I'm configuring a python command to be used by emacs to filter a buffer
> through python markdown, and noticed something strange. If I run this
> command in the terminal:
>
> python -c "import sys,markdown; print
> markdown.markdown(sys.stdin.read().decode('utf-8'))" < markdown_source.md
>
> The file (which is encoded as utf-8 and contains Chinese characters) is
> converted and output correctly to the terminal. But if I do this to write
> the output to a file:
>
> python -c "import sys,markdown; print
> markdown.markdown(sys.stdin.read().decode('utf-8'))" < markdown_source.md >
> output.html
>
> I get a UnicodeEncodeError, 'ascii' codec can't encode character u'\u2014'.
> I'm not sure where exactly this is going wrong, as print and
> sys.stdout.write() and whatnot don't provide encoding parameters. What's the
> difference between this command writing to the terminal, and writing to the
> file?

sys.stdout does have an encoding:
In [1]: import sys

In [2]: sys.stdout.encoding
Out[2]: 'UTF-8'

I think print converts to the encoding of stdout, e.g.
In [3]: print u'\u2014'
—

Probably when you redirect the output to a file, sys.stdout.encoding is
None rather than 'UTF-8', so print falls back to the default ascii codec
and the conversion fails.
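
You can see the same failure without markdown or any redirection. This
is only a minimal sketch: it takes the em dash from your traceback and
encodes it by hand, first with the ascii codec and then with utf-8. The
variable names are made up for illustration.

```python
# The character from the traceback: U+2014, EM DASH.
text = u'\u2014'

# Encoding with the ascii codec fails, which is what happens implicitly
# when the output stream has no usable encoding.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_error = exc
else:
    ascii_error = None

# Encoding with utf-8 succeeds and yields the three bytes for the dash.
utf8_bytes = text.encode('utf-8')
```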

The simple solution is to convert explicitly:
print markdown.markdown(sys.stdin.read().decode('utf-8')).encode('utf-8')
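
If you would rather not call .encode() at every print, another option
is to wrap the byte stream in a UTF-8 writer from the codecs module. The
sketch below uses an io.BytesIO object as a stand-in for the output
stream so that it runs anywhere; wrapping sys.stdout itself the same way
in your one-liner is my assumption about how you would apply it, not
something I have tested with your setup.

```python
import codecs
import io

# Stand-in for a byte-oriented output stream such as a redirected stdout.
raw_stream = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; instantiating it wraps
# the stream so that unicode text is encoded to UTF-8 on every write.
utf8_writer = codecs.getwriter('utf-8')(raw_stream)

utf8_writer.write(u'dash: \u2014\n')

# The underlying stream now holds the UTF-8 encoded bytes.
written = raw_stream.getvalue()
```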

Kent