urllib.request giving unexpected results
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Wed Nov 16 03:09:59 EST 2016
I'm trying to download a file using urllib.request and pipe it straight to an
external process. The following test script demonstrates the problem on Linux
systems:
--- cut ---
#!/usr/bin/python3.5

import urllib.request
import subprocess

TEST_URL = 'https://www.irs.gov/pub/irs-prior/f1040--1864.pdf'

with urllib.request.urlopen(TEST_URL) as f:
    data = subprocess.check_output(['file', '-'], stdin=f)
    print(data)

with urllib.request.urlopen(TEST_URL) as f:
    with open('/tmp/x.pdf', 'wb') as g:
        n = g.write(f.read())

with open('/tmp/x.pdf') as g:
    data = subprocess.check_output(['file', '-'], stdin=g)
    print(data)
--- cut ---
Output is:
b'/dev/stdin: data\n'
b'/dev/stdin: PDF document, version 1.6\n'
Expected output is:
b'/dev/stdin: PDF document, version 1.6\n'
b'/dev/stdin: PDF document, version 1.6\n'
If I just read from urllib.request, I get what appears to the naked eye to be
the expected data:
py> with urllib.request.urlopen(TEST_URL) as f:
...     file = f.read()
...
py> print(file[:100])
b'%PDF-1.6\r%\xe2\xe3\xcf\xd3\r\n55 0 obj\r<</Linearized 1/L 66721/O 57/E 28286/N 4/T 65574/H [ 856 317]>>\rendobj\r '
Certainly looks like a PDF file. So what's going on?
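
One possible explanation, offered as an assumption rather than a confirmed
diagnosis: when a file object is passed as stdin=, subprocess hands the child
only the underlying file descriptor. For an HTTPS response that descriptor is
the raw socket, so the child would see whatever bytes remain on the wire (still
TLS-encrypted) instead of the decoded body that f.read() returns. A minimal
sketch of a workaround under that assumption is to read the body in Python and
pass the bytes with input=. The helper names run_with_input and identify_url
are hypothetical, made up for this illustration.

```python
import subprocess
import urllib.request


def run_with_input(command, body):
    """Run command, writing the bytes in body to its stdin; return its stdout."""
    # input= makes subprocess create a pipe and write the bytes to it, so the
    # child never touches the file descriptor the data originally came from.
    return subprocess.check_output(command, input=body)


def identify_url(url):
    """Download url completely in Python, then let file(1) inspect the bytes."""
    with urllib.request.urlopen(url) as response:
        body = response.read()  # decoded body, as seen by Python
    return run_with_input(['file', '-'], body)
```

Calling identify_url(TEST_URL) would hold the whole download in memory, which
is fine for a small PDF but not for very large files.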
--
Steven
299792.458 km/s — not just a good idea, it’s the law!