Working with binary data, S-records (long)

Fri Mar 21 04:03:26 EST 2003

"Hans-Joachim Widmaier" <hjwidmaier at web.de> writes:

> Time to get to the topic. Reading S-records in Python is not all that much
> fun (ok, it isn't in C either). I've thought about doing it in C and
> returning a string, but that would lose the address information. And
> creating more complex Python data types in C is something I've never done.
> And I don't want to compile under Windows. Thus I wrote a pure Python
> reader, which looks like (this is the whole class so far):
[...]
>     def readrecord(self, line):
>         """Read one line as an S-record and return address, data and checksum."""
>         type = line[:2]
>         data = [int(line[i:i + 2], 16) for i in range(2, len(line), 2)]
>         cs   = (reduce(operator.add, data) + 1) & 0xff  # must come out as 0
>         if type in ('S1', 'S9'):
>             adr = (data[1] << 8) + data[2]
>             fd  = 3
>         elif type in ('S2', 'S8'):
>             adr = (data[1] << 16) + (data[2] << 8) + data[3]
>             fd  = 4
>         elif type in ('S3', 'S7'):
>             adr = (long(data[1]) << 24) + (data[2] << 16) + (data[3] << 8) + data[4]
>             fd  = 5
>         elif type == 'S0':      # comment (header record)
>             return 'C', 0, data[3:-1], cs
>         else:
>             raise ValueError, "Not a valid S-record"
>         if type > 'S6':         # start address
>             type = 'S'
>         else:                   # data
>             type = 'D'
>         return type, adr, data[fd:-1], cs
[...]
> On my development machine (1.7 GHz) it's reasonably fast with a file worth
> 100 KB. But I'm afraid it'll suck on our production machines, which run at
> 166 MHz (give or take some). I thought about using array, but it's lacking
> a method to create a big array without creating a list or string first.
> 
> Anyway, does anyone see a way to speed this up? I'm not going to inline
> readrecord(), as I don't care about 10 %. I'm asking if you see a real
> flaw in my algorithm.

You should try the struct module; it should simplify and maybe also
speed up your code quite a bit.  Another possibility is to write
an extension not in plain C but with Pyrex.
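
A rough sketch of the struct idea, written for current Python 3 rather
than the Python of the quoted code. struct works on bytes, not on hex
text, so the sketch assumes binascii.unhexlify() is used first to turn
the hex digits into bytes. The names read_record and ADDR_BYTES are made
up for illustration, and unlike the quoted code this version reports the
checksum as a boolean and returns the S0 address field as parsed.

```python
import binascii
import struct

# Width of the address field in bytes for each supported record type.
ADDR_BYTES = {'S0': 2, 'S1': 2, 'S2': 3, 'S3': 4, 'S7': 4, 'S8': 3, 'S9': 2}

def read_record(line):
    """Return (kind, address, payload, checksum_ok) for one S-record line."""
    line = line.strip()
    rtype = line[:2]
    if rtype not in ADDR_BYTES:
        raise ValueError("not a valid S-record: %r" % rtype)
    # Decode all hex digits at once: count, address, data, checksum.
    raw = binascii.unhexlify(line[2:])
    n = ADDR_BYTES[rtype]
    # Left-pad the address to 4 bytes so one big-endian format fits all widths.
    (addr,) = struct.unpack('>I', b'\0' * (4 - n) + raw[1:1 + n])
    payload = raw[1 + n:-1]
    # The low byte of the sum over all bytes, checksum included, must be 0xff.
    checksum_ok = (sum(raw) & 0xff) == 0xff
    if rtype == 'S0':
        kind = 'C'              # comment (header record)
    elif rtype > 'S6':
        kind = 'S'              # start address
    else:
        kind = 'D'              # data
    return kind, addr, payload, checksum_ok
```

For example, read_record("S1051234ABCD3C") gives
('D', 0x1234, b'\xab\xcd', True). The payload stays a bytes object, so
no per-byte list of ints is built for each line.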

OTOH, if you already have C code which implements the functionality of
readrecord, it's not a big deal to reuse it. You don't have to
construct complicated Python data structures in C; simply build a tuple
with Py_BuildValue().

> 
> <pipe-dreaming mode on>
> Whenever I play with binary data in Python, a dream of a mutable string
> data type crops up. Doing byte fiddling with strings is quite ok as long
> as the data is comparatively small. But when the thing gets largish, the
> slicing, copying and reassembling are getting increasingly inelegant, not
> to say "un-pythonic." Even if that hypothetical mutable string type
> wouldn't be returned by read() and wouldn't be accepted by write(),
> conversion from and to normal immutable strings should be cheap.
> <pipe-dreaming mode off>

There are mutable string types in Python (sort of); they are just named
differently.  The array module comes to mind, but if your data comes
from files, mmap should also be considered.
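
A small sketch of both options, again in current Python 3. The buffer
size, the byte values and the helper name patch_file are invented for
the example. Multiplying a one-element array is one way to get a large
zero-filled array without building a list or string of that size first.

```python
import array
import mmap
import os
import tempfile

# array('B') is a mutable sequence of unsigned bytes.
buf = array.array('B', [0]) * 8             # 8 zero bytes, no big list needed
buf[2:4] = array.array('B', [0xAB, 0xCD])   # overwrite a slice in place
buf[7] = 0xFF                               # overwrite a single byte
image = buf.tobytes()                       # convert back to immutable bytes

def patch_file(path, offset, data):
    """Overwrite len(data) bytes of an existing, non-empty file in place."""
    with open(path, 'r+b') as f:
        # Length 0 maps the whole file; the default access is read/write.
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(data)] = data

# Demonstration on a throwaway 16-byte file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x00' * 16)
patch_file(tmp.name, 4, b'\x12\x34')
with open(tmp.name, 'rb') as f:
    patched = f.read()
os.remove(tmp.name)
```

With mmap, a slice assignment must have the same length as the slice it
replaces, so it suits patching bytes in place rather than growing or
shrinking the data.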

HTH,

Thomas