[Tutor] subprocess.getstatusoutput : UnicodeDecodeError
Steven D'Aprano
steve at pearwood.info
Thu Sep 21 20:39:57 EDT 2017
On Thu, Sep 21, 2017 at 03:46:29PM -0700, Evuraan wrote:
> How can I work around this issue where subprocess.getstatusoutput gives
> up, on Python 3.5.2:
getstatusoutput is a "legacy" function. It still exists for code that
has already been using it, but it is not recommended for new code.
https://docs.python.org/3.5/library/subprocess.html#using-the-subprocess-module
Since you're using Python 3.5, let's try the `run` function, which is
new in 3.5, and see if it does better:
import subprocess
result = subprocess.run(["tail", "-3", "/tmp/pmaster.db"],
                        stdout=subprocess.PIPE)
print("return code is", result.returncode)
print("output is", result.stdout)
It should do better than getstatusoutput, since it returns plain bytes
without assuming they are ASCII. You can then decode them yourself:
# try this and see if it is sensible
print("output is", result.stdout.decode('latin1'))
# otherwise this
print("output is", result.stdout.decode('utf-8', errors='replace'))
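If you end up doing this in more than one place, the two steps above
could be wrapped up in a helper. This is only a rough sketch: the name
run_and_decode and the list of encodings to try are my own inventions,
not anything from the subprocess module, and the sample command just
asks Python itself to emit a stray 0xe0 byte so the example does not
depend on your file.

```python
import subprocess
import sys


def run_and_decode(cmd, encodings=("utf-8", "latin1")):
    """Run cmd and return (returncode, text).

    Hypothetical helper: tries each encoding in turn and falls back to
    UTF-8 with replacement characters if none of them works.
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE)
    for enc in encodings:
        try:
            return result.returncode, result.stdout.decode(enc)
        except UnicodeDecodeError:
            continue
    return result.returncode, result.stdout.decode("utf-8", errors="replace")


# Stand-in command that writes the bytes b'caf\xe0' to stdout.
cmd = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'caf\\xe0')"]
status, text = run_and_decode(cmd)
print("return code is", status)
print("output is", text)
```

For your case you would call it as
run_and_decode(["tail", "-3", "/tmp/pmaster.db"]). Bear in mind that
latin1 accepts every possible byte, so it never raises; whether the
result is *sensible* is something only you can judge.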
> >>> subprocess.getstatusoutput("tail -3 /tmp/pmaster.db",)
> Traceback (most recent call last):
[...]
> File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
> return codecs.ascii_decode(input, self.errors)[0]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 189:
> ordinal not in range(128)
Let's look at the error message. getstatusoutput decodes the command's
output to text, and on your system it is evidently using the ASCII codec
(most likely picked up from your locale settings), so it chokes on the
first non-ASCII byte, namely 0xe0. Obviously 0xe0 (or in decimal, 224)
is not an ASCII value, since ASCII goes from 0 to 127 only.
If there's one non-ASCII byte in the file, there are probably more.
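You can reproduce the failure without the file, and count how many
offending bytes there are. A small sketch, assuming a made-up sample in
place of the real output of `tail`; the locale check is there because,
as far as I know, that is the encoding the text-mode subprocess
functions fall back on when you don't name one.

```python
import locale
from collections import Counter

# Made-up sample standing in for the real output of `tail`.
data = b"caf\xe0 cr\xe8me\n"

# If this reports an ASCII name (e.g. "ANSI_X3.4-1968"), that would
# explain why the 'ascii' codec turned up in the traceback.
print("preferred encoding:", locale.getpreferredencoding(False))

try:
    data.decode("ascii")
except UnicodeDecodeError as err:
    # err.start is the index of the first byte that could not be decoded.
    position = err.start
    bad_byte = data[position]
    print("first bad byte:", hex(bad_byte), "at position", position)

# Tally every byte outside the ASCII range 0..127.
non_ascii = Counter(b for b in data if b > 127)
print("non-ASCII bytes:", {hex(b): n for b, n in non_ascii.items()})
```

To run the same count on your file, read it in binary mode with
open("/tmp/pmaster.db", "rb").read() and use that in place of the sample.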
So what is that mystery 0xe0 byte? It is hard to be sure, because it
depends on the source. If pmaster.db is a binary file, it could mean
anything or nothing. If it is a text file, it depends on the encoding
that the file uses. If it comes from a Mac, it might be:
py> b'\xe0'.decode('macroman')
'‡'
If it comes from Windows in Western Europe, it might be:
py> b'\xe0'.decode('latin1')
'à'
If it comes from Windows in Greece, it might be:
py> b'\xe0'.decode('iso 8859-7')
'ΰ'
and so forth. There's no absolutely reliable way to tell. This is the
sort of nightmare that Unicode was invented to fix, but unfortunately
there still exist millions of files, data formats and applications which
insist on using rubbish "extended ASCII" encodings instead.
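If you want to compare the candidates side by side, something like this
sketch will do. The list of codecs is just my guess at plausible ones
(I've added cp1252, the usual Western European Windows code page); it is
not a detection algorithm, and a human still has to decide which result
looks right.

```python
# Candidate legacy encodings to try; extend the list as needed.
candidates = ["mac_roman", "latin1", "cp1252", "iso8859_7"]

mystery = b"\xe0"
guesses = {name: mystery.decode(name) for name in candidates}

for name, char in guesses.items():
    # Show the character alongside its code point for easier comparison.
    print("{:10} {!r}  U+{:04X}".format(name, char, ord(char)))
```

Decoding a longer stretch of the file with each candidate and looking
for real words is usually more telling than a single byte.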
> That file's content is kryptonite for python apparently. Other shell
> operations work.
>
> >>> subprocess.getstatusoutput("file /tmp/pmaster.db",)
> (0, '/tmp/pmaster.db: Non-ISO extended-ASCII text, with very long lines,
> with LF, NEL line terminators')
The `file` command agrees with me: it is not ASCII.
--
Steve