[TriZPUG] More Fun With Text Processing

David Handy david at handysoftware.com
Fri Apr 3 18:44:36 CEST 2009

On Fri, Apr 03, 2009 at 11:31:57AM -0400, Josh Johnson wrote:
> Ok all,
> Since we've got a brain trust of pythonistas that know how to deal with  
> strings, here's a problem I'm facing right now that I'd like some input 
> on:
> I've got a tabular list, it's the output from a command-line program,  
> and I need to parse it into some sort of structure.
> Here's an example of the data (the headings and column width will vary):
> 1.1               HIGHAVAIL    5001.023GB    4501.008GB     1192337  2.1
> 1.3                  BACKUP    5001.023GB    4250.759GB     1192337
> 1.4                  BACKUP    3000.613GB    3000.353GB      715402
> 2.2               HIGHAVAIL    5001.023GB    5001.015GB     1192337  1.2
> 2.3                  BACKUP    5001.023GB    5000.763GB     1192337
> 2.4                  BACKUP    3000.613GB    3000.353GB      715402
> I'd like a structure I can work with, like say, a list of hashes.
> My initial approach involves treating the header row as the guide for  
> the field lengths, and then extracting substrings for each field in each  
> row.

It's a bummer you have to screen-scrape like that. Any chance you can get at
the source for the command-line utility whose output you are parsing, and
access the underlying data yourself directly from Python?

You said that the headings and column widths could vary. Are there just a
handful of different variations? If so, I would study those, and then
hard-code the field positions and widths in your script. I'd make it
table-driven, like this:

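# UNTESTED, use at own risk

# (field name, start index, end index) for one known output layout
option1_fields = [
    ('TARGET', 0, 14),
    ('VOLUME GROUP', 15, 30),
    # etc
]

data = []
for line in infile:  # infile: open file or pipe holding the command output
    d = {}
    for fieldname, start, end in option1_fields:
        d[fieldname] = line[start:end].strip()
    data.append(d)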

That gives you your list of dictionaries.
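
If the layouts turn out to vary too much to hard-code, the header-row idea
you described could build the same kind of table at run time. Here is a rough
sketch, not a tested solution. It assumes that headings are separated by at
least two spaces, that a heading contains at most single spaces (like
"VOLUME GROUP"), and that every column is left-aligned under its heading.
Some columns in your sample look right-aligned, so you may need to adjust the
start/end logic for your real output. The names fields_from_header and
parse_table are made up for this example.

```python
import re


def fields_from_header(header):
    """Derive (name, start, end) tuples from a header line.

    Assumes headings are separated by two or more spaces and that each
    column's data starts at the same position as its heading.
    """
    # A heading is a run of words joined by single spaces.
    matches = list(re.finditer(r'\S+(?: \S+)*', header))
    starts = [m.start() for m in matches]
    # Each field runs up to the start of the next heading; the last
    # field runs to the end of the line (end=None in a slice).
    ends = starts[1:] + [None]
    return [(m.group(), s, e) for m, s, e in zip(matches, starts, ends)]


def parse_table(lines):
    """Return a list of dicts, one per non-blank data line.

    The first line is taken to be the header row.
    """
    lines = iter(lines)
    fields = fields_from_header(next(lines))
    data = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        data.append({name: line[start:end].strip()
                     for name, start, end in fields})
    return data
```

You would call parse_table on the open file or on output.splitlines(). A
field that is missing at the end of a row simply comes back as an empty
string.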

David H

> I also thought about just doing a split on spaces, but some of the  
> fields could have spaces in their data.
> What do you guys think?
> JJ
> _______________________________________________
> TriZPUG mailing list
> TriZPUG at python.org
> http://mail.python.org/mailman/listinfo/trizpug
> http://trizpug.org is the Triangle Zope and Python Users Group

David Handy
Computer Programming is Fun!
Beginning Computer Programming with Python
