collect data using threads

Peter Hansen peter at engcorp.com
Tue Jun 14 14:15:23 CEST 2005


Qiangning Hong wrote:
> A class Collector spawns several threads to read from a serial port.
> Collector.get_data() will get all the data they have read since the last
> call.  Who can tell me whether my implementation is correct?
[snip sample with a list]
> I am not very sure about the get_data() method.  Will it cause data loss
> if a thread is appending data to self.data at the same time?

That will not work, and you will get data loss, as Jeremy points out.

Individual list operations such as append() are normally thread-safe (at 
least in CPython), but your key problem (in this code) is that you are 
rebinding self.data to a new list!  If another thread calls 
on_received() just after the line "x = self.data" executes, then the new 
data will never be seen.

One option that would work safely** is to change get_data() to look like 
this:

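def get_data(self):
    count = len(self.data)       # how many items exist right now
    result = self.data[:count]   # copy exactly those items
    del self.data[:count]        # remove the copied items, keep newer ones
    return result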

This does what yours was trying to do, but safely.  Note that it doesn't 
reassign self.data, but rather uses a single operation (del) to remove 
all the returned elements at once.  It's possible that after the 
first or second line a call to on_received() will add data, but it 
simply won't be seen until the next call to get_data(), rather than 
being lost.
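
As a rough check of that claim, here is a minimal sketch that hammers a 
slice-and-delete get_data() from several producer threads.  The Collector 
body, run_demo() and the worker counts are all hypothetical stand-ins (the 
original sample was snipped), and the result relies on CPython's behaviour 
for list operations, as discussed in the footnote.

```python
import threading


class Collector:
    """Hypothetical stand-in for the Collector discussed in this thread."""

    def __init__(self):
        self.data = []

    def on_received(self, item):
        # Called from worker threads; relies on list.append() being atomic
        # in CPython.
        self.data.append(item)

    def get_data(self):
        count = len(self.data)       # how many items exist right now
        result = self.data[:count]   # copy exactly those items
        del self.data[:count]        # remove the copied items in one step
        return result


def run_demo(workers=4, per_worker=1000):
    """Produce items from several threads while draining them in this one."""
    collector = Collector()

    def produce(worker_id):
        for i in range(per_worker):
            collector.on_received((worker_id, i))

    threads = [threading.Thread(target=produce, args=(w,))
               for w in range(workers)]
    for t in threads:
        t.start()

    collected = []
    while any(t.is_alive() for t in threads):
        collected.extend(collector.get_data())
    for t in threads:
        t.join()
    collected.extend(collector.get_data())  # pick up anything left over
    return collected
```

Every item produced should come back exactly once, because items appended 
after len() is taken stay in the list until the next call.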

** I'm showing you this to help you understand why your own approach was 
wrong, not to give you code that you should use.  The key problem with 
even my approach is that it *assumes things about the implementation*. 
Specifically, there are no guarantees in Python the Language (as opposed 
to CPython, the implementation) about the thread-safety of working with 
lists like this.  In fact, in Jython (and possibly other Python 
implementations) this would definitely have problems.  Unless you are 
certain your code will run only under CPython, and you're willing to put 
comments in the code about potential thread safety issues, you should 
probably just follow Jeremy's advice and use Queue.  As a side benefit, 
Queues are much easier to work with!
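
For comparison, a Queue-based collector might look roughly like the sketch 
below.  The class name QueueCollector and its method bodies are assumptions 
made for illustration, not code from this thread; the module is spelled 
queue in Python 3 and Queue in Python 2.

```python
import queue      # spelled "Queue" in Python 2
import threading


class QueueCollector:
    """Hypothetical Queue-based variant of the Collector idea."""

    def __init__(self):
        self._queue = queue.Queue()

    def on_received(self, item):
        # Safe to call from any thread; Queue does its own locking.
        self._queue.put(item)

    def get_data(self):
        # Drain whatever is available right now without blocking.
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items
```

Because the locking lives inside Queue, this version does not depend on how 
a particular Python implementation handles concurrent list access.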

-Peter


