urllib.request giving unexpected results
Chris Angelico
rosuav at gmail.com
Wed Nov 16 03:24:21 EST 2016
On Wed, Nov 16, 2016 at 7:09 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> I'm trying to download a file using urllib.request and pipe it straight to an
> external process. On Linux systems, the following is a test file that
> demonstrates the problem:
>
>
> --- cut ---
>
> #!/usr/bin/python3.5
>
> import urllib.request
> import subprocess
>
> TEST_URL = 'https://www.irs.gov/pub/irs-prior/f1040--1864.pdf'
>
> with urllib.request.urlopen(TEST_URL) as f:
>     data = subprocess.check_output(['file', '-'], stdin=f)
>     print(data)
Interesting.
rosuav@sikorsky:~$ python3
Python 3.7.0a0 (default:72e64fc8746b+, Oct 28 2016, 12:35:28)
[GCC 6.2.0 20161010] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> import subprocess
>>> TEST_URL = 'https://www.irs.gov/pub/irs-prior/f1040--1864.pdf'
>>> with urllib.request.urlopen(TEST_URL) as f:
...     data = subprocess.check_output(['tee', 'tmp/asdfasdf'], stdin=f)
...
rosuav@sikorsky:~/tmp$ hd asdfasdf |head
00000000 17 03 03 40 18 e9 b0 79 7c 03 c8 5d 21 40 2f 11 |...@...y|..]!@/.|
00000010 4a a3 f1 4d e0 19 04 fc 42 84 d9 cf 59 0b f8 56 |J..M....B...Y..V|
00000020 7d 35 08 88 17 50 24 8c 26 fe d8 13 2b fd 14 55 |}5...P$.&...+..U|
00000030 16 81 c3 1e 13 ae 00 1d d4 8e 9f 0f a4 19 bb 44 |...............D|
00000040 46 d5 bf 25 28 d0 b0 23 44 6f 1c ef 84 d9 82 9b |F..%(..#Do......|
00000050 17 15 3a 11 e1 ec de 59 65 d7 ea 41 dc 53 07 70 |..:....Ye..A.S.p|
00000060 99 d5 11 75 b7 90 7e cd 46 b5 67 ee 9a 62 18 63 |...u..~.F.g..b.c|
00000070 36 7f 7b df a1 fb 6d b8 66 8b 2f 82 e6 05 7e aa |6.{...m.f./...~.|
00000080 d7 9f 9e 05 cf 06 68 6b c8 4c df 5e 24 9d 92 f6 |......hk.L.^$...|
00000090 3d 53 76 11 c1 70 05 14 94 e5 5b ec b0 cf 64 70 |=Sv..p....[...dp|
So that's what file(1) is seeing. My guess is that a urlopen object
isn't "file-like" enough for subprocess. Maybe it's showing a more
"raw" version?
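The leading bytes 17 03 03 look like a TLS record header (application data, TLS 1.2), which would fit the "raw" theory: subprocess asks the stdin object for its fileno(), and for an HTTPS response that is presumably the underlying socket, so the child reads the encrypted stream rather than the decoded body. If that is what's happening (an assumption, not something verified here), reading the body in Python and passing it with input= should sidestep it. A minimal sketch; run_with_body and check_url are names made up for illustration:

```python
import subprocess
import sys
import urllib.request


def run_with_body(argv, response):
    """Read the decoded body from a file-like object and feed it to argv's stdin.

    Hypothetical helper: `response` only needs a .read() method returning bytes.
    """
    body = response.read()  # decrypted/decoded bytes, read in Python
    # input= makes subprocess create a pipe and write the bytes to it,
    # instead of handing the child the response's file descriptor.
    return subprocess.check_output(argv, input=body)


def check_url(url, argv=("file", "-")):
    """Hypothetical wrapper: download `url` and run `argv` on the body."""
    with urllib.request.urlopen(url) as response:
        return run_with_body(list(argv), response)
```

Something like check_url(TEST_URL) would then run file(1) on the PDF bytes themselves. This does hold the whole download in memory, so for large files a Popen with a manual copy loop into proc.stdin may be the better fit.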
ChrisA