Parsing a serial stream too slowly
steve+comp.lang.python at pearwood.info
Tue Jan 24 00:08:05 EST 2012
On Tue, 24 Jan 2012 10:49:41 +1100, Cameron Simpson wrote:
> | def OnSerialRead(self, event):
> |     text = event.data
> |     self.sensorabuffer = self.sensorabuffer + text
> |     self.sensorbbuffer = self.sensorbbuffer + text
> |     self.sensorcbuffer = self.sensorcbuffer + text
> Slow and memory wasteful. Supposing a sensor never reports? You will
> accumulate an ever growing buffer string. And extending a string gets
> expensive as it grows.
I admit I haven't read this entire thread, but one thing jumps out at me.
It looks like the code is accumulating strings by repeated +
concatenation. This is risky.
In general, you should accumulate strings into a list buffer, then join
them into a single string in one call:
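    buffer = []
    for chunk in chunks:  # chunks: any iterable of strings
        buffer.append(chunk)
    result = ''.join(buffer)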
Use of repeated string addition risks slow quadratic behaviour. The OP is
reporting slow behaviour... alarm bells ring.
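
Applied to a serial read handler, the list-buffer idea might look roughly like the sketch below. The class and method names (SensorReader, on_serial_read, take_text) are hypothetical stand-ins, not the OP's actual code, and the sketch assumes the handler receives one text fragment per call.

```python
class SensorReader:
    """Hypothetical stand-in for the OP's handler class."""

    def __init__(self):
        # List buffer: appending a fragment does not copy earlier ones.
        self._chunks = []

    def on_serial_read(self, text):
        # Store the new fragment instead of rebuilding a growing string.
        self._chunks.append(text)

    def take_text(self):
        # Join once, then reset so the buffer cannot grow without bound.
        text = ''.join(self._chunks)
        self._chunks = []
        return text
```

The parser would call take_text() when it is ready to process input, so the cost of building the string is paid once per batch rather than once per fragment.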
For anyone who doesn't understand what I mean about slow quadratic
behaviour, read this:
Recent versions of CPython include an optimization which *sometimes* can
avoid this poor performance, but it can be defeated easily, and does not
apply to Jython and IronPython, so it is best to not rely on it.
I don't know whether this is the cause of the OP's slow behaviour, but it
is worth investigating. Especially since it is likely to not just be
slow, but SLLLLLOOOOOOWWWWWWWWW -- a bad quadratic algorithm can be tens
of thousands or millions of times slower than it need be.
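
To put rough numbers on that, here is a simplified cost model, not a benchmark. It assumes the worst case where every + copies the whole string built so far, and it ignores list overhead and the CPython optimization mentioned above. The function names and the figures for n and k are made up for illustration.

```python
def chars_copied_by_concat(n_chunks, chunk_len):
    # Worst case: each concatenation copies everything accumulated
    # so far plus the new chunk into a fresh string.
    total = 0
    length = 0
    for _ in range(n_chunks):
        length += chunk_len
        total += length
    return total


def chars_copied_by_join(n_chunks, chunk_len):
    # ''.join() copies each chunk once into the final string.
    return n_chunks * chunk_len


n, k = 100000, 10  # 100,000 fragments of 10 characters each
ratio = chars_copied_by_concat(n, k) / chars_copied_by_join(n, k)
print(ratio)  # (n + 1) / 2 = 50000.5
```

Under this model the ratio is (n + 1) / 2 regardless of chunk size, so it keeps growing with the number of fragments.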
> The slow: You're compiling the regular expression _every_ time you come
> here (unless the re module caches things, which I seem to recall it may).
> But that efficiency is only luck.
More deliberate design than good luck :)
Nevertheless, good design would have you compile the regex once, and not
rely on the re module's cache.
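
A minimal sketch of compiling once at module level follows. The message format ("A:123;") and the names SENSOR_RE and parse_readings are invented for illustration; the OP's real protocol may look quite different.

```python
import re

# Compiled once at import time and reused on every call.
# Hypothetical format: sensor letter, colon, digits, semicolon.
SENSOR_RE = re.compile(r'([ABC]):(\d+);')


def parse_readings(text):
    # findall() with two groups returns a list of (name, value) tuples.
    return [(name, int(value)) for name, value in SENSOR_RE.findall(text)]
```

For example, parse_readings('A:12;B:7;') gives [('A', 12), ('B', 7)].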
> Regex _is_ slow. It is good for flexible lexing, but generally Not Fast.
I hope I will never be mistaken for a re fanboy, but credit where credit
is due: if you need the full power of a regex, you almost certainly can't
write anything in Python that will beat the re module.
However, where regexes become a trap is that often people use them for
things which are best coded as simple Python tests that are much faster,
such as using a regex where a simple str.startswith() would do.
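
As a small illustration, the two functions below answer the same question, assuming a made-up 'SENSOR' prefix. The names are hypothetical; the point is only that the plain string method says what it means without any regex machinery.

```python
import re

HEADER_RE = re.compile(r'^SENSOR')  # hypothetical prefix, for illustration


def is_header_regex(line):
    # Works, but drags in the regex engine for a fixed-prefix test.
    return HEADER_RE.match(line) is not None


def is_header_plain(line):
    # Same test as a plain string method.
    return line.startswith('SENSOR')
```

str.startswith() also accepts a tuple of prefixes, which covers many of the cases where people reach for alternation in a regex.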