Newbie help with manipulating text files

Joe Murray jmurray at agyinc.com
Fri May 25 16:25:08 EDT 2001


jk wrote:
> 
> I hope someone can help me with this problem, I know I am just thinking
> about it the wrong way.
> 
> I have a file that I want to divide into three parts: a header, a section
> I want to sort, and a footer. The header and footer are not in a
> consistent format, but the middle part I want to sort has a bunch of
> lines that all start with the same word.
> 
> So the file looks something like this:
> 
> header header header this is a header
> this is a headerthis is a headerthis is a header
> this is a headerthis is a headerthis is a header
> yep this is a header
> 
> middle part z
> middle part a
> middle part c
> middle part d
> middle part e
> 
> footer this is the footerfooter this is the footerfooter this is the
> footerfooter this is the footerfooter this is the footerfooter this is the
> footerfooter this is the footerfooter this is the footerfooter this is the
> footerfooter this is the footerfooter this is the footerfooter this is the
> footerfooter this is the footerfooter this is the footer
> 
> I've been able to use re to grab all of the middle part and put it into a
> sortable list, but I want to be able to print out the whole file with the
> middle part sorted but the header and footer unchanged. I think that what
> I want is to read the whole thing into a list and then split the list
> into 3 sublists, manipulate the list I want and then write out a new file
> with the modified lists joined together. The part I don't know how to do
> is to get the header and footer into their own lists.
> 
> thanks in advance,
> 
> jk

Hmmm,

This might work.

>>> import re
>>> f = open('test.txt')
>>> header = []
>>> data = []
>>> footer = []
>>> data_regex = re.compile('middle')
>>> 
>>> while 1:
...     line = f.readline()
...     # stop at the first data line, or at end of file
...     if not line or data_regex.match(line): break
...     header.append(line)
... 
>>> while 1:
...     # test before appending, so the first non-data line stays out of data
...     if not line or not data_regex.match(line): break
...     data.append(line)
...     line = f.readline()
... 
>>> while 1:
...     # everything left, starting with the line that ended the data, is footer
...     if not line: break
...     footer.append(line)
...     line = f.readline()
... 
>>> f.close()
>>> header
['header header header this is a header\012', 'this is a headerthis is a
headerthis is a header\012', 'this is a headerthis is a headerthis is a
header\012', 'yep this is a header\012', '\012', '\012']
>>> data
['middle part z\012', 'middle part a\012', 'middle part c\012', 'middle
part d\012', 'middle part e\012']
>>> footer
['\012', 'footer this is the footerfooter this is the footerfooter this
is the\012', 'footerfooter this is the footerfooter this is the
footerfooter this is the\012', 'footerfooter this is the footerfooter
this is the footerfooter this is the\012', 'footerfooter this is the
footerfooter this is the footerfooter this is the\012', 'footerfooter
this is the footerfooter this is the footerfooter\012']
>>> 

If that seems appropriate, there you go.  You could also do some other
nifty things, in no particular order, like:
(1) use f.readlines() and iterate over that list
(2) use a string find versus a regular expression for 'speed'
(3) use a flag to determine which section of the input you're in,
header, data, footer and roll the three while loops into one
(4) avoid the 'while 1:' idiom by doing a priming readline() before each loop
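
For what it's worth, suggestions (1) through (3) might look roughly like
the sketch below.  It is only one way to do it, and it assumes the data
lines form a single contiguous run that starts with the word 'middle'.
The names split_sections and sort_middle are made up for illustration,
and the 'with' statement needs a much newer Python than the session
above.

```python
def split_sections(lines, prefix='middle'):
    """Split lines into (header, data, footer) lists.

    Assumes the data lines are one contiguous run starting with prefix.
    """
    header, data, footer = [], [], []
    section = 'header'  # flag: which part of the input we are in
    for line in lines:
        if section == 'header' and line.startswith(prefix):
            section = 'data'
        elif section == 'data' and not line.startswith(prefix):
            section = 'footer'
        if section == 'header':
            header.append(line)
        elif section == 'data':
            data.append(line)
        else:
            footer.append(line)
    return header, data, footer


def sort_middle(in_path, out_path, prefix='middle'):
    """Write a copy of in_path with only the middle section sorted."""
    with open(in_path) as f:
        header, data, footer = split_sections(f.readlines(), prefix)
    data.sort()
    with open(out_path, 'w') as f:
        f.writelines(header + data + footer)
```

Calling sort_middle('test.txt', 'sorted.txt') would then write the new
file (the output name is just an example).  Because the flag only ever
moves forward, a footer line that happens to start with 'middle' stays
in the footer.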


joe


PS ... A heartwarming story:

Yesterday, I was explaining to my coworker how we were going to perform
statistical analysis of some image data.  I started plugging away on a
Python script while he looked over my shoulder.  After typing the entire
program to do the analysis, I looked up and said, "So that's the
implementation."  He replied, "So, now that we have pseudocode we
actually have to code it, right?"  TRUE STORY!  Ahhh, Python.

-- 
Joseph Murray
Bioinformatics Specialist, AGY Therapeutics
290 Utah Avenue, South San Francisco, CA 94080
(650) 228-1146