urllib slow on FreeBSD 4.7?
Bengt Richter
bokr at oz.net
Thu Nov 21 16:46:02 EST 2002
On Thu, 21 Nov 2002 20:43:17 +1000 (est), Andrew MacIntyre <andymac at bullseye.apana.org.au> wrote:
>On Wed, 20 Nov 2002, dsavitsk wrote:
>
>> as I say, the site i am testing is on a computer not 8 inches from the
>> FreeBSD one (attached to the same kvm no less). The files must traverse
>> nearly 6 feet of cable, 3 other computers were able to download much more
>> quickly, and the freebsd one has failed all day (even switching network
>> cable and spots in the switch).
>>
>> further, for a 2 meg file on
>>
>> Python 2.2.1 (#1, Oct 5 2002, 11:19:44)
>> [GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import os
>> >>> os.system('/usr/local/bin/wget "http://192.168.0.4/index.asp?file=name" -O file.ext')
>>
>> takes less than 2 seconds
>>
>> while
>> >>> f = open('file2.ext', 'w')
>> >>> u = urllib.urlopen("http://192.168.0.4/index.asp?file=name")
>> >>> f.write(u.read())
>> >>> f.close()
>>
Taking Andrew's clue (below) re memory allocation,
how about (untested)

f = open('file2.ext', 'wb')
u = urllib.urlopen("http://192.168.0.4/index.asp?file=name")
f.writelines(u.xreadlines())
f.close()

which should do some read-ahead chunking, but loops over line reads and writes
without building a monster string in memory. (Or are writelines/xreadlines even
available on urllib's file object in 2.2.1? If not, you could do the corresponding
loop manually.) Or you could try to second-guess OS buffering by moving data in
binary 512*n-sized chunks, etc. BTW, I wonder how a memory-mapped output file
would behave -- whether the OS is smart enough to get some of the writing done
speculatively as you go, or whether it just marks pages for writing out when the
file is closed.
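For completeness, here is a sketch of "doing the corresponding loop manually": a
small helper (the name copy_in_chunks and the block size are my own choices, not
anything urllib provides) that copies between two file-like objects in fixed-size
binary chunks, so memory use stays flat regardless of the file size. On 2.2.1
you'd pass urllib.urlopen(...) as src and open('file2.ext', 'wb') as dst.

```python
def copy_in_chunks(src, dst, blocksize=8192):
    # Read fixed-size binary chunks from src and write each straight to
    # dst, so only one chunk is ever held in memory -- no monster string.
    while 1:
        chunk = src.read(blocksize)
        if not chunk:          # empty read means EOF
            break
        dst.write(chunk)
```

Unlike the line-based xreadlines idea, this doesn't depend on the data containing
newlines, which matters if the downloaded file is binary.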
>> takes about 2 minutes
>>
>> so, it seems that using wget is the proper way to proceed, but i would
>> rather have a python solution.
>
>I can't test this at the moment, but you should be aware that your python
>approach is doing something quite different from wget in processing this
>download - it is reading _the_whole_file_ into memory, and then writing it
>out in one fell swoop, rather than reading & writing a block at a time
>(which is how wget & fetch would be doing this). The writing part will be
>fast, but the building of the in-memory image of the file _may_ be
>happening in such a way that the memory image is constantly being
>increased in size to cope with incoming data.
>
>Various platform realloc() library routines have radically different
>performance behaviours in the face of such memory allocation strategies,
>usually unfavourable. While the FreeBSD memory allocation routines are
>decent, Python has exposed unfavourable performance behaviour on FreeBSD in
>other situations. FreeBSD is not alone in this - Python has managed to
>provoke memory allocation issues on most platforms, which have
>progressively been worked around as they've been identified.
>
>Some memory allocation changes were made for 2.2.2 which did improve
>performance on FreeBSD in certain scenarios, but I doubt your usage would
>be affected. Still, I would like to know if 2.2.2 helps if you are able
>to test it (and I would recommend the upgrade from 2.2.1 in any case).
>
>--
>Andrew I MacIntyre "These thoughts are mine alone..."
>E-mail: andymac at bullseye.apana.org.au | Snail: PO Box 370
> andymac at pcug.org.au | Belconnen ACT 2616
>Web: http://www.andymac.org/ | Australia
>
>
Regards,
Bengt Richter
More information about the Python-list mailing list