[TriZPUG] More Fun With Text Processing

Stephan Altmueller stephan_altmueller at unc.edu
Fri Apr 3 17:51:36 CEST 2009


I think the first thing you should do is nail down the exact file format.
If the format allows both missing values and embedded spaces, there is no
unambiguous way to decide which column an entry belongs to.

Can you make the command-line program insert some sort of delimiter, such as
commas?

    -- Stephan
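
If the program can be made to emit a delimiter, the parsing becomes almost
trivial. Here is a minimal sketch, assuming a comma-delimited variant of the
listing. The sample text and the parse_delimited name are made up for
illustration, and the real tool may not offer such an output mode. An empty
field stands in for a missing value.

```python
import csv
import io

# Hypothetical comma-delimited version of the listing (not real tool output).
# The empty AVAILABLE field in the second row stands in for a missing value.
SAMPLE = """\
TARGET,VOLUME GROUP,LENGTH,AVAILABLE,NPE
1.1,HIGHAVAIL,5001.023GB,4501.008GB,1192337
1.3,BACKUP,5001.023GB,,1192337
"""


def parse_delimited(text):
    """Return one dict per data row, keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


rows = parse_delimited(SAMPLE)
for row in rows:
    print(row["TARGET"], repr(row["AVAILABLE"]))
```

With a delimiter, a missing value shows up as an empty string in the right
column, and a header such as VOLUME GROUP keeps its embedded space.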

Josh Johnson wrote:
> Ok all,
> Since we've got a brain trust of pythonistas that know how to deal
> with strings, here's a problem I'm facing right now that I'd like some
> input on:
> I've got a tabular list, it's the output from a command-line program,
> and I need to parse it into some sort of structure.
> Here's an example of the data (the headings and column width will vary):
> TARGET         VOLUME GROUP        LENGTH     AVAILABLE         NPE 
> 1.1               HIGHAVAIL    5001.023GB    4501.008GB     1192337  2.1
> 1.3                  BACKUP    5001.023GB    4250.759GB     1192337
> 1.4                  BACKUP    3000.613GB    3000.353GB      715402
> 2.2               HIGHAVAIL    5001.023GB    5001.015GB     1192337  1.2
> 2.3                  BACKUP    5001.023GB    5000.763GB     1192337
> 2.4                  BACKUP    3000.613GB    3000.353GB      715402
> I'd like a structure I can work with, like say, a list of hashes.
> My initial approach involves treating the header row as the guide for
> the field lengths, and then extracting substrings for each field in
> each row.
> I also thought about just doing a split on spaces, but some of the
> fields could have spaces in their data.
> What do you guys think?
> JJ
> _______________________________________________
> TriZPUG mailing list
> TriZPUG at python.org
> http://mail.python.org/mailman/listinfo/trizpug
> http://trizpug.org is the Triangle Zope and Python Users Group
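
For comparison, the header-guided approach described in the quoted message
might look roughly like the sketch below. It rests on assumptions read off
the sample only: header names are separated by at least two spaces, and each
value ends at or before the right edge of its header. The function name
parse_fixed_width is made up. Anything to the right of the last header (the
trailing 2.1 in the first row) is ignored, since the sample does not say
which column it belongs to.

```python
import re

# The header and the first two rows from the example, copied as-is.
SAMPLE = [
    "TARGET         VOLUME GROUP        LENGTH     AVAILABLE         NPE ",
    "1.1               HIGHAVAIL    5001.023GB    4501.008GB     1192337  2.1",
    "1.3                  BACKUP    5001.023GB    4250.759GB     1192337",
]


def parse_fixed_width(lines):
    """Slice each row at the right edge of every header name."""
    header, *body = lines
    # Words joined by a single space count as one name ("VOLUME GROUP").
    columns = [(m.group(), m.end())
               for m in re.finditer(r"\S+(?: \S+)*", header)]
    rows = []
    for line in body:
        row, start = {}, 0
        for name, end in columns:
            # Each field runs from the previous header's edge to this one's.
            row[name] = line[start:end].strip()
            start = end
        rows.append(row)
    return rows


for row in parse_fixed_width(SAMPLE):
    print(row)
```

This only works while the alignment assumption holds. A value wider than its
column, or a column with no header, would be split or dropped silently,
which is why pinning down the format first matters.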

Stephan Altmueller
Applications Analyst, Enterprise Applications
Office of Arts and Sciences Information Services
University of North Carolina at Chapel Hill
CB 3056, 06 Howell Hall
Chapel Hill, NC 27599-3056 
919.448.5936 (direct line)
stephan_altmueller at unc.edu 
AIM: oasisaltmuell
