NNTP binary attachment downloader with asyncore and generators

Josiah Carlson jcarlson at uci.edu
Fri Jun 4 09:56:29 CEST 2004


Good start.

>         self.buffer = ''
>         self.inbuffer = ''
>         self.dataline = ''

Depending on how much data you are looking to move, plain strings and 
string concatenation are not always the fastest way to deal with things.

For incoming buffers, append the chunks to a list and join them once:

self.inbuffer = []
# as we receive data:
#     self.inbuffer.append(data)
# when we have as much as we need...
data = ''.join(self.inbuffer)
self.inbuffer = []
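Wrapped up as a class, the pattern looks like this (a sketch in modern syntax; the class and method names are illustrative, not from the original code):

```python
class InBuffer:
    """Accumulate received chunks in a list; join only when needed."""

    def __init__(self):
        self.chunks = []

    def feed(self, data):
        # O(1) append, instead of copying the whole buffer on every recv
        self.chunks.append(data)

    def take(self):
        # a single join costs O(total), versus O(total**2) for repeated +=
        data = ''.join(self.chunks)
        self.chunks = []
        return data
```

The win is that each incoming chunk is touched twice (once on append, once on join) instead of being recopied every time more data arrives.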

>     def handle_close(self):
>         pass

Um...you are going to want to actually handle that close...with a 
self.close() at the very least, so the socket actually gets torn down.
>     def handle_write(self):
>         print 'sending: ' + self.buffer.strip()
>         sent = self.send(self.buffer)
>         self.buffer = self.buffer[sent:]
>         if self.buffer:
>             print 'didnt send whole line' #does this ever happen?
>             print 'didnt send whole line' #just getting my attention 
>             print 'didnt send whole line' #in case it does

Try sending a few megabytes to it and you'll find the upper end of the 
amount you can send in any one call.

Generally, it all depends on both the configuration of your TCP/IP 
implementation (sending window size), as well as the actual throughput 
and latencies of the connection to the other machine.

What asynchat does (and many other libraries do) is pre-partition the 
data into small chunks.  Asynchat sticks with 512 bytes (a little small 
IMO), but even 1024 bytes would be conservative (ethernet frames are 
~1500 bytes, so there is still some headroom).  Tune it based on what 
your connection is doing over time; this is Python.
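The pre-partitioning step can be sketched in a few lines (the 1024-byte chunk size is the conservative figure from above; tune it for your connection):

```python
def partition(data, chunk_size=1024):
    """Slice an outgoing payload into fixed-size blocks for the send queue."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# a 2500-byte payload becomes blocks of 1024 + 1024 + 452 bytes,
# each small enough to go out in a single send() call
blocks = partition('x' * 2500)
```

Each block is then small enough that a single send() will almost always accept it whole, so the "didnt send whole line" branch becomes rare instead of routine.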

Also, use a real FIFO class for buffering.  You can modify Raymond's 
fastest fifo implementation to re-insert those blocks that are 
accidentally not sent completely.
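Something along these lines would do it (a sketch built on collections.deque rather than Raymond's actual recipe; the class and method names are mine):

```python
from collections import deque

class SendQueue:
    """FIFO of outgoing blocks; unsent remainders go back on the front."""

    def __init__(self):
        self.q = deque()

    def push(self, block):
        self.q.append(block)

    def pop(self):
        return self.q.popleft()

    def push_back(self, remainder):
        # re-insert the tail of a partially sent block at the head,
        # so byte order on the wire is preserved
        self.q.appendleft(remainder)

    def __len__(self):
        return len(self.q)
```

In handle_write() you would pop() a block, send() it, and push_back() whatever send() did not accept.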

>     def handle_read(self):

Check out the handle_read() method of asynchat.async_chat.  It does some 
really good things to make line-based protocols work well, and there are 
even some optimizations that can be done for your specific protocol.

>         self.inbuffer += self.recv(8192)

Don't do the above.  Each += copies the entire buffer, so the cost goes 
quadratic as the buffer grows; append to a list instead, as shown earlier.
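A sketch of the collect-then-split approach async_chat takes, specialized to a CRLF terminator (the function and its signature are illustrative, not async_chat's API):

```python
TERMINATOR = '\r\n'

def split_lines(chunks, new_data):
    """Append new_data to the chunk list, then split off complete lines.

    Returns (lines, leftover_chunks): lines is every complete,
    terminator-stripped line; leftover_chunks holds the trailing partial
    line, ready to be passed back in on the next read.
    """
    chunks.append(new_data)
    data = ''.join(chunks)
    parts = data.split(TERMINATOR)
    leftover = parts.pop()  # last piece has no terminator yet
    return parts, [leftover] if leftover else []
```

Partial lines simply ride along in the chunk list until the terminator shows up in a later read, so the protocol code only ever sees whole lines.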

The generator method looks reasonable, but I admit, I didn't read too 
deeply (I really should be going to bed).

Keep working with sockets, they can be fun (if not frustrating at 
times).  If at any point you contemplate going towards a threaded 
server, just remember:
1. Asyncore (modified properly) can handle hundreds of connections (I'm 
currently limited by the number of file handles Python is compiled with) 
and saturate 100mbit with ease (I have written clients that saturate 
1gbit for certain tasks).
2. Standard Python threads get laggy, and actually reduce bandwidth as 
you approach 10-20 threads (word is that Stackless' tasklets are damn fast).

  - Josiah
