Splitting Text files

Ken Seergobin kseergobin at sympatico.ca
Tue Jul 2 20:16:36 EDT 2002


Thanks.  csplit makes the task much easier!  The
extraction of the critical data lines plus relevant
fields was never an issue.  With your suggestions, though,
I can now write batch or shell scripts to do most of the
work.  However, Python offers far more flexibility
when analysing and visualizing the data, so I was looking for an easy way
to stop juggling multiple Unix-like tools.

Part of what I was looking for was the ability to use the
root of the bitmap name as the output name.  I can extract
the bitmap name, but don't know how to write the data to
a file with a name derived from it (e.g., bitmap1.txt for bitmap1.bmp).
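
For the naming part, here is a minimal Python sketch. It assumes the identification line holds only the bitmap file name (possibly with a directory part), and that each output file should be named after that name's root plus `.txt`. The helper names `output_name` and `write_section` are hypothetical, made up for this example:

```python
import os


def output_name(bmp_line, suffix=".txt"):
    """Turn an identification line such as 'bitmap1.bmp' into 'bitmap1.txt'.

    Assumes the line contains just the bitmap file name (hypothetical format).
    """
    # Drop surrounding whitespace/newline and any directory part,
    # then replace the extension with the new suffix.
    base = os.path.basename(bmp_line.strip())
    root, _ext = os.path.splitext(base)
    return root + suffix


def write_section(bmp_line, data_lines):
    """Write data_lines to a file named after the bitmap; return that name."""
    name = output_name(bmp_line)
    # Mode "w" creates the file in the current directory, replacing any old copy.
    with open(name, "w") as out:
        out.writelines(data_lines)
    return name
```

With this, output_name("bitmap1.bmp\n") gives 'bitmap1.txt', and write_section() creates that file in the current directory, overwriting any existing file of the same name.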

Anyway, csplit saves a great deal of time.

Ken

"William Park" <opengeometry at NOSPAM.yahoo.ca> wrote in message
news:aft3np$h9lhc$1 at ID-99293.news.dfncis.de...
> Ken Seergobin <kseergobin at sympatico.ca> wrote:
> > "William Park" <opengeometry at NOSPAM.yahoo.ca> wrote in message
> > news:aft0r5$go61b$3 at ID-99293.news.dfncis.de...
> >
> >> Perhaps, you should remove 'X-No-Archive'.  Most people won't give
> >> answers, let alone reply, to such posts.
> >
> > Personally, I'm not thrilled by every scrap of information being
> > recorded.  However, if dropping the no-archive option makes getting
> > answers easier, I'll repeat the content of the original post in this
> > message.
> > (That said, I do understand why posts like these should be archived.)
> >
> > Original Post:
> >
> > I've looked around, but have been unable to locate a good example of how
> > to split a text file.  Specifically, I have datafiles with an
> > identification line marked with the name of a BMP file, followed by many
> > lines of data.  This pattern repeats a number of times in each datafile.
> > Within the data lines I'm only interested in extracting those with a
> > specific keyword.  Ultimately, I'd like to have one datafile for each BMP
> > listed in the original file.
> >
> > Suggestions would be appreciated.  I really couldn't make sense of the
> > regular expression notes I found.
> >
> > Thanks,
> > Ken
>
> I'm guessing that your data file looks something like
>     file1.bmp
>     ...<data lines>...
>     ...
>     file2.bmp
>     ...<data lines>...
>     ...
>     file3.bmp
>     ...
>
> 1. In the shell, you'd do something like
>     csplit -z file '/\w*\.bmp/' '{*}'   --> xx00, xx01, ...
>     mv xx00 file1.txt
>     mv xx01 file2.txt
>     ...
>
> 2. However, since you're only interested in the data lines with certain
> keywords, simply do
>     egrep -e '^file[0-9]\.bmp$' -e 'your_search_pattern' file
> or
>     for x in xx[0-9][0-9]; do
>         egrep 'your_search_pattern' "$x"
>     done
>
> Translating these to Python is left as an exercise for the reader. ;-)
>
> --
> William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
> 8-CPU Cluster, Hosting, NAS, Linux, LaTeX, python, vim, mutt, tin
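
As for the Python translation left as an exercise above, here is one possible sketch of step 1 (the csplit part). It assumes, as in the guessed layout, that each identification line is a lone file name ending in `.bmp` and that no data line looks like that. The regular expression and the function names `split_by_bitmap` and `split_file` are hypothetical. Lines before the first bitmap name are dropped, much like the leading piece (xx00) that csplit writes for text before the first match:

```python
import os
import re

# Assumed format: an identification line is a lone file name ending in .bmp.
BMP_LINE = re.compile(r"^\s*(\S+)\.bmp\s*$", re.IGNORECASE)


def split_by_bitmap(lines):
    """Group lines into {bitmap root: [data lines]}, like csplit on '.bmp' lines.

    Lines before the first identification line are discarded.
    """
    sections = {}
    current = None
    for line in lines:
        match = BMP_LINE.match(line)
        if match:
            # Start (or resume) a section keyed by the bitmap's root name.
            current = os.path.basename(match.group(1))
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def split_file(path):
    """Write each section of `path` to <bitmap root>.txt; return the names."""
    with open(path) as src:
        sections = split_by_bitmap(src)
    written = []
    for root, data in sections.items():
        name = root + ".txt"
        with open(name, "w") as out:
            out.writelines(data)
        written.append(name)
    return written
```

Calling split_file("file") writes file1.txt, file2.txt, ... to the current directory. One design choice worth noting: if the same bitmap name appears twice, its data lines are merged into one output file rather than overwriting the earlier section.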

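Step 2 (the egrep filter) translates just as directly. Below is a minimal sketch, assuming the keyword is given as a Python regular expression. Python's re syntax is close to egrep's extended regular expressions, but not identical. Like the first egrep command, the sketch keeps the bitmap identification lines so each match can be traced back to its bitmap. The name `filter_lines` is a placeholder, and the `.bmp` line format is the same assumption as above:

```python
import re

# Assumed: identification lines are lone file names ending in .bmp.
BMP_LINE = re.compile(r"^\s*\S+\.bmp\s*$", re.IGNORECASE)


def filter_lines(lines, pattern):
    """Yield bitmap identification lines and lines matching `pattern`.

    Roughly what `egrep -e '<bmp line>' -e pattern file` prints.
    """
    keyword = re.compile(pattern)
    for line in lines:
        # Keep headers so every match stays attached to its bitmap.
        if BMP_LINE.match(line) or keyword.search(line):
            yield line
```

For example, sys.stdout.writelines(filter_lines(open("file"), "your_search_pattern")) prints roughly what the first egrep command prints. Applying the same keyword test to each section's data lines before writing them out gives one filtered file per bitmap.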




More information about the Python-list mailing list