Logging data from Arduino using PySerial

Chris Angelico rosuav at gmail.com
Tue Feb 4 05:47:19 CET 2014

On Tue, Feb 4, 2014 at 3:07 PM, Thomas <t.tchorzewski at gmail.com> wrote:
> I've written a script to log data from my Arduino to a csv file. The script works well enough but it's very, very slow. I'm quite new to Python and I just wanted to put this out there to see if any Python experts could help optimise my code. Here it is:

The most important question is: Have you profiled your code? That is,
do you know which part(s) are slow?

And the first part of that question is: Define "slow". How long does
your program actually take? How much data are you accumulating in that
time? By my reading, you're getting 60 lines from the serial port; how
long is each line?

For a basic back-of-the-envelope timing estimate, work out how long
each line is (roughly), and multiply by 60 (number of lines), then
divide by 960 (your baud rate, divided by 10, which gives a rough
bytes-per-second rate). That'll give you a ball-park figure for how
many seconds this loop will take:

>     # Collecting the data from the serial port
>     while True:
>         data_log.append(connection.readline())
>         if len(data_log) > max_length - 1:
>             break

If your lines are 80 characters long, give or take, then that works
out to 80*60/960 = five seconds just to fetch the data from the serial
port. So if the script is taking anywhere from 3 to 8 seconds to run,
then you can assume that it's probably spending most of its time right
here. Yes, that might feel like it's really slow, but it's all down to
your baud rate. You can easily confirm this by surrounding that loop

import time
start_time = time.time()
while ...
   ... append ...
print("Time to fetch from serial port:",time.time()-start_time)

(If this is Python 2, this will produce slightly ugly output as it'll
display it as a tuple. It'll work though.)

Put that in, and then run the program with some kind of external
timing. On Linux, that would be:

$ time python scriptname.py

If the total script execution is barely more than the time spent in
that loop, don't bother optimizing any of the rest of the code - it's
a waste of time improving something that's not materially affecting
your total run time.

But while I'm here looking at your code, I'll take the liberty of
making a few stylistic suggestions :) Feel free to ignore these, but
this is the more Pythonic way of writing the code. Starting with the
serial port fetch loop:

>     # Collecting the data from the serial port
>     while True:
>         data_log.append(connection.readline())
>         if len(data_log) > max_length - 1:
>             break

An infinite loop with a conditional break exactly at one end. This
would be clearer written as a straight-forward conditional loop, with
the condition inverted:

while len(data_log) < max_length:

This isn't strictly identical to your previous version, but since
len(data_log) will always be an integer, it is - I believe -
functionally equivalent. But make sure I haven't introduced any bugs.

(There is another difference, in that my version checks the condition
on entry where yours would check it only after doing the first append
- your version would guarantee a minimum of one line in the log, mine
won't. I'm guessing that this won't be significant in normal usage; if
it is, go back to your version of the code.)

>     def map(x, in_min, in_max, out_min, out_max):
>         return (((x - in_min) * (out_max - out_min))/(in_max - in_min)) + out_min

This shadows the built-in map() function. It works, but it's usually
safer to not do this, in case you subsequently want the default map().

>             line_data = re.findall('\d*\.\d*',line) # Find all digits
>             line_data = filter(None,line_data)    # Filter out empty strings
>             line_data = [float(x) for x in line_data] # Convert Strings to float

You can combine these.

line_data = [float(x) for x in re.findall('\d*\.\d*',line) if x]

Optionally break out the re into a separate line, but definitely I'd
go with the conditional comprehension above the separate filter():

line_data = re.findall('\d*\.\d*',line)
line_data = [float(x) for x in line_data if x]

>             for i in range(1,len(line_data)):
>                 line_data[i]=map(line_data[i],0,1023,0,5)

Is it deliberate that you start from 1 here? Are you consciously
skipping the first element, leaving it unchanged? If not, I would go
for another comprehension:

line_data = [map(x,0,1023,0,5) for x in line_data]

but if so, this warrants a code comment explaining why the first one is special.

>     plt.clf()
>     plt.close()
>     plt.plotfile('data.csv',(0,1,2),names=['time (s)','voltage2 (V)','voltage1 (V)'],newfig=True)
>     plt.show()

This, btw, is the other place that I'd look for a potentially large
slab of time. Bracket this with time.time() calls as above, and see
whether it's slow enough to mean nothing else is a concern. But I'd
first look at the serial port read loop; depending on how long your
lines are, that could easily be quite a few seconds of time just on
its own.


More information about the Python-list mailing list