Splitting Text files

William Park opengeometry at NOSPAM.yahoo.ca
Tue Jul 2 16:51:07 EDT 2002


Ken Seergobin <kseergobin at sympatico.ca> wrote:
> "William Park" <opengeometry at NOSPAM.yahoo.ca> wrote in message
> news:aft0r5$go61b$3 at ID-99293.news.dfncis.de...
> 
>> Perhaps, you should remove 'X-No-Archive'.  Most people won't give
>> answers, let alone reply, to such posts.
> 
> Personally, I'm not thrilled by every scrap of information being
> recorded.  However, if the no-archive option makes getting information
> easier, the content of the original post will be repeated in this message.
> (That said, I do understand why posts like these should be archived.)
> 
> Original Post:
> 
> I've looked around, but have been unable to locate a good example of how
> to split a text file.  Specifically, I have datafiles with an
> identification line marked with the name of a BMP  file followed by many
> lines of data.  This repeats a number of times for each datafile.  Within
> the data lines I'm only interested in extracting those with a
> specific keyword.  Ultimately, I'd like to have a datafile for each BMP
> listed in the original file.
> 
> Suggestions would be appreciated.  I really couldn't make sense of the
> regular expression notes I found.
> 
> Thanks,
> Ken

I'm guessing that your data file looks something like
    file1.bmp
    ...<data lines>...
    ...
    file2.bmp
    ...<data lines>...
    ...
    file3.bmp
    ...

1. In shell, you'd do something like
    csplit -z file '/\w*\.bmp/' '{*}'    # --> xx00, xx01, ... (-z drops the empty first piece)
    mv xx00 file1.bmp
    mv xx01 file2.bmp
    ...
	
2. However, since you're only interested in those data lines with certain
keywords, simply do
    egrep -e '^file[0-9]\.bmp$' -e 'your_search_pattern' file
or
    for x in xx[0-9][0-9]; do
        egrep 'your_search_pattern' "$x"
    done

Translating these to Python is left as an exercise for readers. ;-)
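
For anyone who would rather not do the exercise, here is one possible
sketch of the same idea in Python. It rests on the same guess about the
data layout as above: a header line holds nothing but a name ending in
".bmp", and every following line belongs to that header until the next
one. The names HEADER, split_by_bmp and write_sections are made up for
this example, and a plain substring test stands in for the egrep pattern.

```python
import re

# Assumption: a header line is just a BMP file name, e.g. "file1.bmp".
HEADER = re.compile(r'^\s*(\S+\.bmp)\s*$', re.IGNORECASE)


def split_by_bmp(lines, keyword):
    """Collect the lines containing `keyword` under the latest BMP header.

    `lines` is any iterable of strings (an open file works).
    Returns a dict mapping each BMP name to its list of kept lines.
    """
    sections = {}
    current = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            # Start (or resume) the section for this BMP name.
            current = match.group(1)
            sections.setdefault(current, [])
        elif current is not None and keyword in line:
            # Keep only data lines that mention the keyword.
            sections[current].append(line)
    return sections


def write_sections(sections, suffix=".txt"):
    """Write one output file per BMP name, e.g. "file1.bmp.txt"."""
    for name, kept in sections.items():
        with open(name + suffix, "w") as out:
            out.writelines(kept)
```

To use it, open the data file and call
write_sections(split_by_bmp(f, 'your_keyword')) on the open file object f.
The ".txt" suffix is only there so that the output does not overwrite
real BMP files of the same name. Swap the substring test for re.search()
if the keyword is really a pattern.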
	
-- 
William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
8-CPU Cluster, Hosting, NAS, Linux, LaTeX, python, vim, mutt, tin
