[Tutor] fixed or variable length fields?

antonmuhin at rambler.ru
Sat Mar 1 05:15:04 2003


Hello Paul,

Saturday, March 1, 2003, 12:27:42 PM, you wrote:

PT> Would I gain a speed increase by using fixed fields rather than variable
PT> length ones?

PT> I am writing a script that converts Microsoft RTF to XML. The first
PT> stage breaks the file into tokens and puts one token on a line:


PT> ob<nu<nu<nu<0001<{
PT> cw<nu<nu<nu<rtf>true<rtf
PT> cw<nu<nu<nu<macintosh>true<macintosh
PT> cw<nu<nu<nu<font-table>true<font-table

PT> (Fields are delimited with "<" and ">" because all "<" and ">" have
PT> been converted to "&lt;" and "&gt;".)

PT> I will make several passes through this file to convert the data.

PT> Each time I read a line, I will use the string method, and sometimes the
PT> split method:

PT> if line[12:22] == 'font-table':
PT>         info = line[12:]
PT>         fields = info.split(">")
PT>         if fields[1].startswith('true'):
PT>                 pass  # do something

PT> If I use fixed length fields, then I won't have to do any splitting. I
PT> also know that in perl, there is a way to use 'pack' and 'unpack' to
PT> quickly access fixed fields. I have never used this, and don't know if
PT> the pack in Python is similar.

PT> If fixed fields did give me a speed increase, I would certainly suffer
PT> in readability. For example, the above 4 lines of tokens might look
PT> like:

PT> opbr:null:null:null:0001
PT> ctrw:null:null:true:rtfx
PT> ctrw:null:null:true:mact
PT> ctrw:null:null:true:fntb

PT> Instead of 'macintosh', I have 'mact'; instead of 'font-table', I have
PT> 'fntb'. 

PT> Thanks

PT> Paul

There might be another source of your Python script's poor
performance, although I'm not sure and gurus might correct me.

Slicing operations on strings in Python seem to be rather expensive
because strings are immutable: line[12:23], if I understand it right,
has to create a new temporary string on the heap, compare it to the
constant, and then discard it. You might use the array module if you
do a lot of slicing.
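
One way to sidestep the temporary string, as a rough sketch only:
str.startswith accepts a start offset, so the comparison can be made
against the line in place. The helper name is_font_table and the
offset of 12 are assumptions based on the token layout quoted above,
and whether this is measurably faster should be checked with timeit
on real data.

```python
# Assumed from the quoted token layout: the keyword starts at offset 12.
KEYWORD_OFFSET = 12

def is_font_table(line):
    """Return True if the keyword field of a token line is 'font-table'.

    str.startswith with a start offset compares against the line in
    place, so no slice object is built just for the test.
    """
    return line.startswith('font-table>', KEYWORD_OFFSET)

line = 'cw<nu<nu<nu<font-table>true<font-table'
print(is_font_table(line))
```

Including the ">" delimiter in the prefix keeps a longer keyword that
merely begins with "font-table" from matching by accident.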

The struct module might be of interest to you too.
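
A minimal sketch of how struct could read the fixed-width layout from
your example, roughly what unpack does in perl. The format string
assumes five 4-byte fields separated by single ":" characters, and
parse_token is a hypothetical helper name, not anything from your
script.

```python
import struct

# Assumed layout, taken from the quoted example line
# 'ctrw:null:null:true:fntb': five 4-byte fields with a one-byte ':'
# between them. '4s' reads 4 bytes; 'x' skips one byte (the ':').
TOKEN = struct.Struct('4sx4sx4sx4sx4s')

def parse_token(line):
    """Split one fixed-width token line into its five fields."""
    raw = line.rstrip('\n').encode('ascii')
    return [field.decode('ascii') for field in TOKEN.unpack(raw)]

print(parse_token('ctrw:null:null:true:fntb\n'))
```

Note that unpack raises struct.error when a line is not exactly 24
bytes long, so this only works if every field really is fixed width.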

Another source of improvement might be to use generators instead of
lists.
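
For illustration, a generator can hand back matching lines one at a
time instead of building the whole list in memory for each pass. This
is only a sketch: font_table_lines is a hypothetical name, and the
offset of 12 is again assumed from the token layout you quoted.

```python
def font_table_lines(lines):
    """Yield token lines whose keyword field is 'font-table', one at a time."""
    for line in lines:
        # Offset 12 is assumed from the quoted token layout.
        if line.startswith('font-table>', 12):
            yield line.rstrip('\n')

sample = [
    'ob<nu<nu<nu<0001<{\n',
    'cw<nu<nu<nu<rtf>true<rtf\n',
    'cw<nu<nu<nu<font-table>true<font-table\n',
]

# An open file object can be passed in place of the list; it is then
# read lazily, one line per iteration.
for token in font_table_lines(sample):
    print(token)
```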

-- 
Best regards,
 anton                            mailto:antonmuhin@rambler.ru