Parsing a serial stream too slowly

Thomas Rachel nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915 at spamschutz.glglgl.de
Mon Jan 23 18:13:09 EST 2012


On 23.01.2012 22:48, M.Pekala wrote:
> Hello, I am having some trouble with a serial stream on a project I am
> working on. I have an external board that is attached to a set of
> sensors. The board polls the sensors, filters them, formats the
> values, and sends the formatted values over a serial bus. The serial
> stream comes out like $A1234$$B-10$$C987$,  where "$A.*$" is a sensor
> value, "$B.*$" is a sensor value, "$C.*$" is a sensor value, etc.
>
> When one sensor is running my python script grabs the data just fine,
> removes the formatting, and throws it into a text control box. However
> when 3 or more sensors are running, I get output like the following:
>
> Sensor 1: 373
> Sensor 2: 112$$M-160$G373
> Sensor 3: 763$$A892$
>
> I am fairly certain this means that my code is running too slow to
> catch all the '$' markers.

Slow code would not produce this output; it would just result in the
receive buffer constantly growing.

The regex issue that Jon mentioned is probably the cause.

But I have some remarks on your code.

First, you have code repetition. You could use functions to avoid this.

Second, you have discrepancies between your 3 blocks: with A, you work 
with sensorabuffer, the others have sensor[bc]enable.

Third, if you have a buffer content of '$A1234$$B-10$$C987$', your "A 
code" will match the whole buffer and thus do

     # s = sensorresult.group(0) ->
     s = '$A1234$$B-10$$C987$'
     # s = s[2:-1]
     s = '1234$$B-10$$C987'
     # maybe put that into self.SensorAValue
     self.sensorabuffer = ''
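To illustrate the third point, here is a minimal sketch. I don't know your exact pattern, so r'\$A.*\$' is an assumption based on the "$A.*$" in your description; the non-greedy variant is shown only for comparison.

```python
import re

buffer = '$A1234$$B-10$$C987$'

# Assumed greedy pattern: '.*' runs on to the LAST '$' in the buffer.
greedy = re.search(r'\$A.*\$', buffer)
print(greedy.group(0))          # the whole buffer
print(greedy.group(0)[2:-1])    # value with the other packets attached

# Non-greedy variant: '.*?' stops at the FIRST '$' after the value.
lazy = re.search(r'\$A(.*?)\$', buffer)
print(lazy.group(1))            # just the A value
```

The greedy match is what would glue the following packets onto a sensor value, as in your "Sensor 2" output.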


I suggest the following way to go:

* Process your data only once.
* Do something like

[...]
theonebuffer = '$A1234$$B-10$$C987$' # for now

while True:
     sensorresult = re.search(r'\$(.)(.*?)\$(.*)', theonebuffer)
     if sensorresult:
         sensor, value, rest = sensorresult.groups()
         # replace the self.SensorAValue concept with a dict
         self.sensorvalues[sensor] = value
         theonebuffer = rest
     else: break # out of the while

If you execute this code, you'll end up with a self.sensorvalues of

     {'A': '1234', 'C': '987', 'B': '-10'}

and a theonebuffer of ''.


Let's make another test with an incomplete sensor value.

theonebuffer = '$A1234$$B-10$$C987$$D65'

[code above]

-> the dict is the same, but theonebuffer='$D65'.

* Why did I do this? Well, you are constantly receiving data. I keep the
incomplete packet in the hope that the $ terminating the D value is yet
to come.

* Why does this happen? The regex does not match the incomplete packet,
so the while loop breaks and the buffer still contains the unparsed
remainder, '$D65'.
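The same loop as a standalone function, so you can try both tests outside your class. This is only a sketch: the name drain() and the plain dict argument are my own choices for illustration, not part of your code.

```python
import re

PACKET = r'\$(.)(.*?)\$(.*)'   # same pattern as in the loop above

def drain(buffer, values):
    """Store every complete packet from buffer in values.

    Returns the unparsed remainder (possibly an incomplete packet).
    """
    while True:
        match = re.search(PACKET, buffer)
        if match is None:
            return buffer
        sensor, value, buffer = match.groups()
        values[sensor] = value

values = {}
rest = drain('$A1234$$B-10$$C987$$D65', values)
print(values, repr(rest))

# Later, the missing end of the D packet arrives:
rest = drain(rest + '4$', values)
print(values, repr(rest))
```

After the first call the D packet stays in rest; once its terminating $ has arrived, the second call parses it and the remainder is empty.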


But you might be right - speed might become a concern if you process
your data more slowly than it arrives. Then your buffer fills up and
eventually your program runs out of memory. As the buffer fills up, the
string copying becomes slower and slower, making things worse. Whether
this becomes relevant, you'll have to test.

BTW, as you use this one regex quite often, compiling it once could
speed things up. Your code then becomes

sensorre = re.compile(r'\$(.)(.*?)\$(.*)')
theonebuffer = '$A1234$$B-10$$C987$' # for now

while True:
     sensorresult = sensorre.search(theonebuffer)
     if sensorresult:
         sensor, value, rest = sensorresult.groups()
         # replace the self.SensorAValue concept with a dict
         self.sensorvalues[sensor] = value
         theonebuffer = rest
     else: break # out of the while


And finally, you can make use of re.finditer() or, with the compiled
pattern, sensorre.finditer(). So you can do

sensorre = re.compile(r'\$(.)(.*?)\$') # note the change
theonebuffer = '$A1234$$B-10$$C987$' # for now

sensorresult = None # init it for later
for sensorresult in sensorre.finditer(theonebuffer):
     sensor, value = sensorresult.groups()
     # replace the self.SensorAValue concept with a dict
     self.sensorvalues[sensor] = value
# and now, keep the rest
if sensorresult is not None:
     # the for loop has done something - cut out the old stuff
     # and keep a possible incomplete packet at the end
     theonebuffer = theonebuffer[sensorresult.end():]

This removes the string copying mentioned above as a source of increasing
slowness.
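Put together, the finditer() variant could look like the sketch below when data arrives in pieces. The class name SensorParser and its feed() method are hypothetical, and the chunks only simulate what your serial read might return.

```python
import re

class SensorParser:
    """Hypothetical helper: collects chunks and extracts complete packets."""

    packet_re = re.compile(r'\$(.)(.*?)\$')

    def __init__(self):
        self.buffer = ''
        self.sensorvalues = {}

    def feed(self, chunk):
        """Append chunk, parse all complete packets, keep the remainder."""
        self.buffer += chunk
        end = 0  # end of the last complete packet found in this pass
        for match in self.packet_re.finditer(self.buffer):
            sensor, value = match.groups()
            self.sensorvalues[sensor] = value
            end = match.end()
        # one slice per call, however many packets were parsed
        self.buffer = self.buffer[end:]

parser = SensorParser()
# simulated serial reads that split packets at arbitrary positions
for chunk in ('$A12', '34$$B-1', '0$$C987$$D6'):
    parser.feed(chunk)

print(parser.sensorvalues)
print(repr(parser.buffer))
```

Whatever your read call returns can be passed straight to feed(); packets split across two reads are completed on the next call.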

HTH,

Thomas


