urllib slow on FreeBSD 4.7?

Bengt Richter bokr at oz.net
Thu Nov 21 16:46:02 EST 2002


On Thu, 21 Nov 2002 20:43:17 +1000 (est), Andrew MacIntyre <andymac at bullseye.apana.org.au> wrote:

>On Wed, 20 Nov 2002, dsavitsk wrote:
>
>> As I say, the site I am testing is on a computer not 8 inches from the
>> FreeBSD one (attached to the same KVM, no less).  The files must traverse
>> barely 6 feet of cable; 3 other computers were able to download much more
>> quickly, and the FreeBSD one has failed all day (even after switching the
>> network cable and ports on the switch).
>>
>> Further, for a 2 MB file, on
>>
>> Python 2.2.1 (#1, Oct  5 2002, 11:19:44)
>> [GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import os
>> >>> os.system('/usr/local/bin/wget "http://192.168.0.4/index.asp?file=name" -O file.ext')
>>
>> takes less than 2 seconds
>>
>> while

>> >>> import urllib
>> >>> f = open('file2.ext', 'w')
>> >>> u = urllib.urlopen("http://192.168.0.4/index.asp?file=name")
>> >>> f.write(u.read())
>> >>> f.close()
>>

Taking Andrew's clue (below) re memory allocation,
how about this (untested):

    import urllib

    f = open('file2.ext', 'w')
    u = urllib.urlopen("http://192.168.0.4/index.asp?file=name")
    f.writelines(u.xreadlines())
    f.close()

which should do some read-ahead chunking but loop over line reads and writes
without ever building a monster string in memory.  (Are writelines/xreadlines
available in 2.2.1?  If not, you could do the corresponding loop manually.)
Or you could try to second-guess OS buffering by moving data in binary
512*n-sized chunks, etc.  BTW, I wonder how a memory-mapped output file would
do -- whether the OS is smart enough to get some of the writing done
speculatively as you go, or just marks the pages for write-out when the file
is closed.
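
If xreadlines turns out not to be available on the urlopen object in 2.2.1,
a sketch of the equivalent manual loop might look like this (also untested;
the 8192 block size and the copyurl name are just my guesses at a plausible
512*n choice):

    import urllib

    def copyurl(url, path, blocksize=8192):
        # Read and write one block at a time, the way wget/fetch do,
        # so at most one block is ever held in memory.
        u = urllib.urlopen(url)
        f = open(path, 'wb')
        while 1:
            block = u.read(blocksize)
            if not block:
                break
            f.write(block)
        f.close()
        u.close()

    copyurl("http://192.168.0.4/index.asp?file=name", 'file2.ext')

(urllib.urlretrieve(url, filename) does essentially this block-at-a-time
copy for you, so it may be the shortest fix of all.)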

>> takes about 2 minutes
>>
>> so it seems that using wget is the proper way to proceed, but I would
>> rather have a Python solution.
>
>I can't test this at the moment, but you should be aware that your Python
>approach is doing something quite different from wget in processing this
>download - it is reading _the_whole_file_ into memory, and then writing it
>out in one fell swoop, rather than reading & writing a block at a time
>(which is how wget & fetch would be doing this).  The writing part will be
>fast, but the building of the in-memory image of the file _may_ be
>happening in such a way that the memory image is constantly being
>increased in size to cope with incoming data.
>
>Various platform realloc() library routines have radically different
>performance behaviours in the face of such memory allocation strategies,
>usually unfavourable.  While the FreeBSD memory allocation routines are
>decent, Python has exposed unfavourable performance behaviour on FreeBSD in
>other situations.  FreeBSD is not alone in this - Python has managed to
>provoke memory allocation issues on most platforms, which have
>progressively been worked around as they've been identified.
>
>Some memory allocation changes were made for 2.2.2 which did improve
>performance on FreeBSD in certain scenarios, but I doubt your usage would
>be affected.  Still, I would like to know if 2.2.2 helps if you are able
>to test it (and I would recommend the upgrade from 2.2.1 in any case).
>
>--
>Andrew I MacIntyre                     "These thoughts are mine alone..."
>E-mail: andymac at bullseye.apana.org.au  | Snail: PO Box 370
>        andymac at pcug.org.au            |        Belconnen  ACT  2616
>Web:    http://www.andymac.org/        |        Australia
>
>
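
Re Andrew's realloc point: a rough (untested) sketch that mimics the two
allocation patterns -- growing one big string by repeated concatenation
vs. collecting blocks and joining once -- might make the cost visible on
a ~2 MB build-up (the function names are just for illustration):

    import time

    def grow_by_concat(nblocks, blocksize=8192):
        s = ''
        for i in range(nblocks):
            s = s + 'x' * blocksize    # forces a realloc/copy each pass
        return s

    def grow_by_join(nblocks, blocksize=8192):
        blocks = []
        for i in range(nblocks):
            blocks.append('x' * blocksize)
        return ''.join(blocks)         # one final allocation

    for fn in (grow_by_concat, grow_by_join):
        t0 = time.time()
        fn(256)                        # 256 * 8192 bytes = 2 MB, like the file here
        print fn.__name__, time.time() - t0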

Regards,
Bengt Richter


