Efficient multi-slicing technique?

Tim Chase python.list at tim.thechases.com
Sun Jan 25 19:47:23 EST 2009


> Is there an efficient way to multi-slice a fixed-width string
> into individual fields that's logically equivalent to the way
> one would slice a delimited string using .split()? Background:
> I'm parsing some very large, fixed line-width text files that
> have weekly columns of data (52 data columns plus related
> data). My current strategy is to loop through a list of 
> slice()'s to build a list of the specific field values for
> each line. This is fine for small files, but seems
> inefficient. I'm hoping that there's a built-in (C based)

I'm not sure if it's more efficient, but there's the struct 
module[1]:

   from struct import unpack
   for line in file('sample.txt'):
     (num, a, b, c, nl) = unpack("2s9s7s4sc", line)
     print "num:", repr(num)
     print "a:", repr(a)
     print "b:", repr(b)
     print "c:", repr(c)

Adjust the format string for your data (the last "c" is the 
newline character -- you might be able to use "x" here to just 
ignore the byte so it doesn't get returned). The sample data I 
threw at it was 2/9/7/4-character data.  The general pattern would be:

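   lengths = [3, 18, 24, 5, 1, 8]
   # one "<n>s" entry per field, plus "c" for the trailing newline
   FORMAT_STR = (
     ''.join("%ss" % length for length in lengths) +
     'c')
   for line in file(INFILE):
     # unpack returns one string per field; drop the final newline byte
     fields = unpack(FORMAT_STR, line)[:-1]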
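
For what it's worth, here is a rough sketch of the same idea written for Python 3, where struct works on bytes rather than str. It is only an illustration, not a benchmark: the field widths, the sample record, and the helper name split_fixed are made up for the example, and it assumes plain ASCII data. It uses "x" instead of "c" so the newline is skipped rather than returned, and it precompiles the format with struct.Struct so the format string is parsed once instead of on every line.

```python
import struct

# Hypothetical field widths, matching the 2/9/7/4 sample layout.
lengths = [2, 9, 7, 4]

# One "<n>s" entry per field, plus a trailing "x" pad byte that
# consumes the newline without returning it.
record = struct.Struct("".join("%ds" % n for n in lengths) + "x")

def split_fixed(line):
    """Split one fixed-width line (bytes, newline included) into fields.

    Assumes ASCII data; split_fixed is an illustrative helper name.
    """
    return [field.decode("ascii") for field in record.unpack(line)]

# Made-up sample record: 2 + 9 + 7 + 4 characters plus the newline.
sample = b"42abcdefghi1234567wxyz\n"
fields = split_fixed(sample)
print(fields)
```

When reading from a file, open it in binary mode ("rb") so each line arrives as bytes. Note that unpack insists on the exact record size, so a final line with no trailing newline (or any short line) raises struct.error and needs handling separately.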


-tkc

[1]
http://docs.python.org/library/struct.html









