[TriZPUG] More Fun With Text Processing

Carol Ludwig csl at med.unc.edu
Fri Apr 3 19:10:58 CEST 2009


Use sed to convert each group of two spaces to some delimiter character ( s/  /:/g ), then collapse the multiple : into one.  (Or use a comma or some other delimiter.)

echo "1.1               HIGHAVAIL    5001.023GB    4501.008GB     1192337  2.1" | sed 's/  /:/g' | sed 's/::/:/g' | sed 's/::/:/g'  | sed 's/::/:/g'

1.1: HIGHAVAIL:5001.023GB:4501.008GB: 1192337:2.1
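
For anyone who would rather stay in Python, the same idea might look roughly like the sketch below. It assumes, as the sed version does, that fields are always separated by at least two spaces and that no field contains two spaces in a row. The variable names are only illustrative.

```python
import re

line = "1.1               HIGHAVAIL    5001.023GB    4501.008GB     1192337  2.1"

# Split wherever two or more consecutive spaces appear. A single space
# inside a field (e.g. a heading like "VOLUME GROUP") is left alone.
fields = re.split(r" {2,}", line.strip())

print(":".join(fields))
# 1.1:HIGHAVAIL:5001.023GB:4501.008GB:1192337:2.1
```

Splitting on the whole run of spaces at once also avoids the stray single space that the pair-by-pair substitution leaves behind when a run has an odd length. Like the sed pipeline, it cannot tell which column is empty if a value is missing from the middle of a row.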

Pipe the command's output straight into sed, or, if you already have the output in a file, cat the file:
cat filename | sed 's/  /:/g' | sed 's/::/:/g' | sed 's/::/:/g'  | sed 's/::/:/g'
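
If some rows are missing values (as the MIRROR column is in the sample quoted below), a position-based split may be safer than a delimiter-based one. The following is only a sketch of one possible variation on the fixed-width idea from the original question: it treats any character position that is blank in every line, header included, as a column gap. The helper names `column_spans` and `parse_table` are made up for illustration, the header spacing in `SAMPLE` is a guess at the original alignment, and the approach assumes the program pads its columns with spaces (not tabs) and that the columns line up.

```python
SAMPLE = """\
TARGET         VOLUME GROUP        LENGTH     AVAILABLE         NPE  MIRROR
1.1               HIGHAVAIL    5001.023GB    4501.008GB     1192337  2.1
1.3                  BACKUP    5001.023GB    4250.759GB     1192337
"""


def column_spans(lines):
    """Return (start, end) pairs for runs of positions used by any line."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    spans, start = [], None
    for i in range(width):
        occupied = any(line[i] != " " for line in padded)
        if occupied and start is None:
            start = i                      # a new column begins here
        elif not occupied and start is not None:
            spans.append((start, i))       # column ended at the blank
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def parse_table(text):
    """Turn the tabular text into a list of dicts keyed by heading."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    spans = column_spans(lines)
    header = [lines[0][a:b].strip() for a, b in spans]
    return [
        {name: line[a:b].strip() for name, (a, b) in zip(header, spans)}
        for line in lines[1:]
    ]


rows = parse_table(SAMPLE)
print(rows[1]["VOLUME GROUP"], repr(rows[1]["MIRROR"]))
```

A missing value simply comes back as an empty string in the right column. The weak spot is a field whose values all contain a space at the same position, which would be cut in two; the "VOLUME GROUP" heading survives here only because the data beneath it covers that gap.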

----- Original Message -----
From: Chris Rossi <chris at christophermrossi.com>
Date: Friday, April 3, 2009 12:03
Subject: Re: [TriZPUG] More Fun With Text Processing
To: "Triangle (North Carolina) Zope and Python Users Group" <trizpug at python.org>

> Or maybe it's already outputting tab characters?
> 
> Chris
> 
> 
> On Fri, Apr 3, 2009 at 11:51 AM, Stephan Altmueller <
> stephan_altmueller at unc.edu> wrote:
> 
> > Josh,
> >
> > I think the first thing you should do is nail down the exact file format.
> > If you have missing values and spaces in your format you have no
> > unambiguous way to decide what column an entry belongs to.
> >
> > Can you make the command line program insert some sort of delimiter like
> > commas?
> >
> >    -- Stephan
> >
> > Josh Johnson wrote:
> > > Ok all,
> > > Since we've got a brain trust of pythonistas that know how to deal
> > > with strings, here's a problem I'm facing right now that I'd like some
> > > input on:
> > >
> > > I've got a tabular list, it's the output from a command-line program,
> > > and I need to parse it into some sort of structure.
> > >
> > > Here's an example of the data (the headings and column width will vary):
> > > TARGET         VOLUME GROUP        LENGTH     AVAILABLE         NPE  MIRROR
> > > 1.1               HIGHAVAIL    5001.023GB    4501.008GB     1192337  2.1
> > > 1.3                  BACKUP    5001.023GB    4250.759GB     1192337
> > > 1.4                  BACKUP    3000.613GB    3000.353GB      715402
> > > 2.2               HIGHAVAIL    5001.023GB    5001.015GB     1192337  1.2
> > > 2.3                  BACKUP    5001.023GB    5000.763GB     1192337
> > > 2.4                  BACKUP    3000.613GB    3000.353GB      715402
> > >
> > > I'd like a structure I can work with, like say, a list of hashes.
> > >
> > > My initial approach involves treating the header row as the guide for
> > > the field lengths, and then extracting substrings for each field in
> > > each row.
> > >
> > > I also thought about just doing a split on spaces, but some of the
> > > fields could have spaces in their data.
> > >
> > > What do you guys think?
> > >
> > > JJ
> > > _______________________________________________
> > > TriZPUG mailing list
> > > TriZPUG at python.org
> > > http://mail.python.org/mailman/listinfo/trizpug
> > > http://trizpug.org is the Triangle Zope and Python Users Group
> >
> >
> > --
> > -------------------------------------------------
> > Stephan Altmueller
> > Applications Analyst, Enterprise Applications
> > Office of Arts and Sciences Information Services
> > University of North Carolina at Chapel Hill
> > CB 3056, 06 Howell Hall
> > Chapel Hill, NC 27599-3056
> > 919.448.5936 (direct line)
> > stephan_altmueller at unc.edu
> > AIM: oasisaltmuell
> > http://oasis.unc.edu
> >
> 