Efficient multi-slicing technique?

Sun Jan 25 21:40:05 EST 2009

>> I'm not sure if it's more efficient, but there's the struct
>> module: http://docs.python.org/library/struct.html
> 
> Thanks for your suggestion. I've been experimenting with this
> technique, but my initial tests don't show any performance
> improvements over using slice() objects to slice a string.
> However, I missed the nuance of using 'x' to mark filler bytes
> - I'm going to see if this makes a difference (it may, as I am
> skipping over several columns of input that I'm currently
> returning as ignored values)

I don't expect it will make a great deal of difference -- there's
not much room to improve the process.  Are you actually
experiencing efficiency problems?  I regularly use slice
unpacking (without reaching for the struct module) with no
noteworthy performance impact beyond the cost of scanning the
file and doing the processing on those lines (and these are text
files several hundred megs in size).  When I omit my processing
code and just skim through the file, the difference between
slice-unpacking and not slice-unpacking is in the sub-second range.
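
For reference, here is a rough sketch of the two approaches being compared. The record layout, the sample line, and the function names are invented for illustration, and on Python 3 the struct module wants bytes rather than str. Both functions return the same three fields; the 'x' codes in the format string skip the filler columns that the slice version simply never touches.

```python
import struct

# Hypothetical fixed-width record (24 bytes):
#   cols 0-4   id       (5 chars)
#   cols 5-7   filler   (3 chars, ignored)
#   cols 8-17  name     (10 chars)
#   cols 18-19 filler   (2 chars, ignored)
#   cols 20-23 year     (4 chars)
line = b"00042---Jane Doe  ..1999"

# Slice unpacking: build the slice objects once, reuse them per line.
FIELDS = (slice(0, 5), slice(8, 18), slice(20, 24))

def unpack_slices(record):
    return tuple(record[s] for s in FIELDS)

# struct unpacking: 'Ns' reads N bytes, 'Nx' skips N pad bytes.
RECORD_FORMAT = "5s3x10s2x4s"

def unpack_struct(record):
    return struct.unpack(RECORD_FORMAT, record)

print(unpack_slices(line))
print(unpack_struct(line))
```

Either one can be dropped into the per-line loop and timed against a plain skim of the file to see whether the unpacking step matters at all.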

> <reading your link to doc ...> wait ... it looks like I can
> 'compile' struct strings by using a Struct class vs. using
> the module's basic unpack() function. This sounds like
> the difference between using compiled regular expressions vs.
> re-compiling a regular expression on every use. I'll see if
> this makes a difference and report back to the list.

I don't expect it will...in the code for the struct.py I've got
here in my 2.5 distribution, it maintains an internal cache of
compiled strings, so unless you have more than _MAXCACHE=100
formatting strings, it's not something you'll really have to
worry about.  (In my main data-processing/ETL app, I can't
envision having more than about 20 formatting strings, if I went
that route)
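
If you want to check this yourself, a minimal sketch along these lines should do. The format string, sample record, and function names are made up for illustration, and the record is bytes so that it also runs on Python 3. The first function goes through the module-level unpack(), which relies on the internal cache; the second uses a Struct object built once up front.

```python
import struct

RECORD_FORMAT = "5s3x10s2x4s"        # hypothetical 24-byte record layout
line = b"00042---Jane Doe  ..1999"   # hypothetical sample record

# Parse the format string once and keep the resulting object around.
record_struct = struct.Struct(RECORD_FORMAT)

def parse_with_module(record):
    # Module-level function: looks the format string up on every call.
    return struct.unpack(RECORD_FORMAT, record)

def parse_with_struct(record):
    # Precompiled object: no per-call format lookup.
    return record_struct.unpack(record)

print(parse_with_module(line) == parse_with_struct(line))
```

Wrapping each function in timeit over a few hundred thousand records would show how much the lookup costs relative to the rest of your per-line work.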

-tkc