[Tutor] subprocess.getstatusoutput : UnicodeDecodeError

Thu Sep 21 21:57:22 EDT 2017

>
> getstatusoutput is a "legacy" function. It still exists for code that
> has already been using it, but it is not recommended for new code.
>
> https://docs.python.org/3.5/library/subprocess.html#using-the-subprocess-module
>
> Since you're using Python 3.5, let's try using the brand new `run`
> function and see if it does better:
>
> import subprocess
> result = subprocess.run(["tail", "-3", "/tmp/pmaster.db"],
>                         stdout=subprocess.PIPE)
> print("return code is", result.returncode)
> print("output is", result.stdout)
>
>
> It should do better than getstatusoutput, since it returns plain bytes
> without assuming they are ASCII. You can then decode them yourself:
>
> # try this and see if it is sensible
> print("output is", result.stdout.decode('latin1'))
>
> # otherwise this
> print("output is", result.stdout.decode('utf-8', errors='replace'))
>
>
>
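Here is that advice as a minimal sketch. It assumes the bytes are UTF-8 when they validate and Latin-1 otherwise. `decode_output` is a hypothetical helper, not part of `subprocess`. A child Python process that writes one 0xe0 byte stands in for `tail`, so the example does not need pmaster.db:

```python
import subprocess
import sys

def decode_output(data):
    """Hypothetical helper: strict UTF-8 first, Latin-1 as a fallback."""
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        return data.decode('latin1')

# Stand-in for `tail`: a child Python process writing raw bytes,
# including the same non-ASCII byte 0xe0 seen in pmaster.db.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'caf\\xe0\\n')"]
result = subprocess.run(child, stdout=subprocess.PIPE)

print("return code is", result.returncode)
print("raw bytes are", result.stdout)
print("decoded text is", decode_output(result.stdout))
```

Latin-1 works as a last resort because it maps every byte from 0 to 255 to some character, so it never raises. The price is that it can silently produce the wrong characters.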
>> >>> subprocess.getstatusoutput("tail -3 /tmp/pmaster.db",)
>> Traceback (most recent call last):
> [...]
>>   File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
>>     return codecs.ascii_decode(input, self.errors)[0]
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 189: ordinal not in range(128)
>
> Let's look at the error message. getstatusoutput decodes the command's
> output as text, and here it is using the ASCII codec, so it chokes on
> the first non-ASCII byte, namely 0xe0. Obviously 0xe0 (or in decimal,
> 224) is not an ASCII value, since ASCII goes from 0 to 127 only.
>
> If there's one non-ASCII byte in the file, there are probably more.
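The traceback further down in this thread shows getstatusoutput calling check_output with universal_newlines=True. Text-mode pipes like that decode with the locale's preferred encoding. The minimal sketch below shows that mechanism in isolation, using a made-up byte string in place of the real file:

```python
import locale

# Text-mode pipes (universal_newlines=True) decode with this encoding.
# Under a C/POSIX locale on Python 3.5 it is often ASCII
# (reported as 'ANSI_X3.4-1968').
print("locale encoding:", locale.getpreferredencoding(False))

data = b'caf\xe0'  # made-up sample containing the same 0xe0 byte

failures = {}
for codec in ('ascii', 'utf-8'):
    try:
        data.decode(codec)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the offending byte; exc.reason says why
        failures[codec] = (exc.start, exc.reason)
        print(codec, "fails at position", exc.start, "->", exc.reason)
```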
>
> So what is that mystery 0xe0 byte? It is hard to be sure, because it
> depends on the source. If pmaster.db is a binary file, it could mean
> anything or nothing. If it is a text file, it depends on the encoding
> that the file uses. If it comes from a Mac, it might be:
>
> py> b'\xe0'.decode('macroman')
> '‡'
>
> If it comes from Windows in Western Europe, it might be:
>
> py> b'\xe0'.decode('latin1')
> 'à'
>
> If it comes from Windows in Greece, it might be:
>
> py> b'\xe0'.decode('iso 8859-7')
> 'ΰ'
>
> and so forth. There's no absolutely reliable way to tell. This is the
> sort of nightmare that Unicode was invented to fix, but unfortunately
> there still exist millions of files, data formats and applications which
> insist on using rubbish "extended ASCII" encodings instead.
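There is no reliable way to tell the encoding, so one practical approach is to decode the same bytes with several plausible codecs and judge the results by eye. This is a minimal sketch, and the candidate list is an assumption. cp1252 is included because it, rather than Latin-1 proper, is the usual Western European Windows code page:

```python
# Candidate codecs to try; adjust to wherever the file might come from.
CANDIDATES = ('utf-8', 'mac_roman', 'latin1', 'cp1252', 'iso8859_7')

def candidate_decodings(data, codecs=CANDIDATES):
    """Return (codec, text) pairs; text is None if the codec rejects data."""
    results = []
    for codec in codecs:
        try:
            results.append((codec, data.decode(codec)))
        except UnicodeDecodeError:
            results.append((codec, None))
    return results

for codec, text in candidate_decodings(b'\xe0'):
    print("{:10} {!r}".format(codec, text))
```

On real data, pass a sample of the raw bytes, such as `result.stdout` from `subprocess.run` or `open('/tmp/pmaster.db', 'rb').read(4096)`. A sample with several accented characters is much easier to judge than a single byte.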
>
>
>
>> That file's content is kryptonite for python apparently. Other shell
>> operations work.
>>
>> >>> subprocess.getstatusoutput("file /tmp/pmaster.db",)
>> (0, '/tmp/pmaster.db: Non-ISO extended-ASCII text, with very long lines, with LF, NEL line terminators')
>
> The `file` command agrees with me: it is not ASCII.

Thank you, Steve! subprocess.run handles it better.

>>> subprocess.getstatusoutput("tail -400 /tmp/pmaster.txt",)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/subprocess.py", line 805, in getstatusoutput
    data = check_output(cmd, shell=True, universal_newlines=True, stderr=STDOUT)
  File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.5/subprocess.py", line 695, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.5/subprocess.py", line 1059, in communicate
    stdout = self.stdout.read()
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 60942: invalid continuation byte

as opposed to:

>>> result = subprocess.run(["tail", "-400", "/tmp/pmaster.txt"], stdout=subprocess.PIPE)
>>> result.returncode
0
>>> subprocess.getstatusoutput("file  /tmp/pmaster.txt",)
(0, '/tmp/pmaster.txt: Non-ISO extended-ASCII text, with very long lines, with LF, NEL line terminators')
>>>

That was awesome! :)
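For code that wants getstatusoutput's (status, output) return shape without its locale-dependent decoding, here is a minimal sketch built on subprocess.run. `getstatusoutput_safe` is a hypothetical name. The UTF-8 default with errors='replace' is also an assumption; pass encoding='latin1' if that suits the file better:

```python
import subprocess

def getstatusoutput_safe(cmd, encoding='utf-8', errors='replace'):
    """Hypothetical stand-in for subprocess.getstatusoutput that captures
    raw bytes and decodes them explicitly instead of using the locale."""
    # Like getstatusoutput: run through the shell, merge stderr into stdout.
    result = subprocess.run(cmd, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    text = result.stdout.decode(encoding, errors=errors)
    # getstatusoutput also strips a single trailing newline.
    if text.endswith('\n'):
        text = text[:-1]
    return result.returncode, text
```

Calling `getstatusoutput_safe("tail -400 /tmp/pmaster.txt")` returns the exit status and the text. Any bytes that are not valid UTF-8 show up as U+FFFD instead of raising UnicodeDecodeError.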