[TriZPUG] More Fun With Text Processing
David Handy
david at handysoftware.com
Fri Apr 3 18:44:36 CEST 2009
On Fri, Apr 03, 2009 at 11:31:57AM -0400, Josh Johnson wrote:
> Ok all,
> Since we've got a brain trust of pythonistas that know how to deal with
> strings, here's a problem I'm facing right now that I'd like some input
> on:
>
> I've got a tabular list, it's the output from a command-line program,
> and I need to parse it into some sort of structure.
>
> Here's an example of the data (the headings and column width will vary):
> TARGET  VOLUME GROUP  LENGTH      AVAILABLE   NPE      MIRROR
> 1.1     HIGHAVAIL     5001.023GB  4501.008GB  1192337  2.1
> 1.3     BACKUP        5001.023GB  4250.759GB  1192337
> 1.4     BACKUP        3000.613GB  3000.353GB  715402
> 2.2     HIGHAVAIL     5001.023GB  5001.015GB  1192337  1.2
> 2.3     BACKUP        5001.023GB  5000.763GB  1192337
> 2.4     BACKUP        3000.613GB  3000.353GB  715402
>
> I'd like a structure I can work with, like say, a list of hashes.
>
> My initial approach involves treating the header row as the guide for
> the field lengths, and then extracting substrings for each field in each
> row.
It's a bummer you have to screen-scrape like that. Any chance you can get at
the source for the command-line utility whose output you are parsing, and
access the underlying data yourself directly from Python?

You said that the headings and column widths could vary. Are there just a
handful of different variations? If so, I would study those, and then
hard-code the field positions and widths in your script. I'd make it
table-driven, like this:
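# UNTESTED, use at own risk
import sys

# (field name, start column, end column) for one known output layout
option1_fields = [
    ('TARGET', 0, 14),
    ('VOLUME GROUP', 15, 30),
    # etc
]

data = []
infile = sys.stdin  # assumption: the command's output is piped in
for line in infile:
    d = {}
    for fieldname, start, end in option1_fields:
        d[fieldname] = line[start:end].strip()
    data.append(d)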
That gives you your list of dictionaries.
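
If the layouts turn out to vary too much to hard-code, the header-row idea
from the original question can be sketched roughly as below. This is only a
sketch under assumptions, not something checked against the real tool: it
assumes column names are separated by at least two spaces (so the single
space in "VOLUME GROUP" stays part of the name), that every value is
left-aligned under its heading, and that no value spills into the next
column. The function name parse_fixed_width and the sample text are made up
for illustration.

```python
import re

def parse_fixed_width(lines):
    """Parse fixed-width tabular text into a list of dicts, using the
    header row to find where each column starts."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    header, rows = lines[0], lines[1:]
    # ASSUMPTION: column names are separated by 2+ spaces, so a single
    # space (as in "VOLUME GROUP") is treated as part of the name.
    columns = [(m.group(), m.start())
               for m in re.finditer(r"\S+(?: \S+)*", header)]
    data = []
    for row in rows:
        d = {}
        for i, (name, start) in enumerate(columns):
            # A column runs up to the start of the next one; the last
            # column runs to the end of the line.
            end = columns[i + 1][1] if i + 1 < len(columns) else None
            d[name] = row[start:end].strip()
        data.append(d)
    return data

# Hypothetical sample, aligned by hand for the example.
sample = """\
TARGET  VOLUME GROUP  LENGTH      AVAILABLE   NPE      MIRROR
1.1     HIGHAVAIL     5001.023GB  4501.008GB  1192337  2.1
1.3     BACKUP        5001.023GB  4250.759GB  1192337
"""
table = parse_fixed_width(sample.splitlines())
```

Rows that are missing a trailing field, like the MIRROR column above, simply
get an empty string for it, because slicing past the end of a line returns
an empty string. Data containing single spaces is fine as long as it stays
inside its column.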
David H
>
> I also thought about just doing a split on spaces, but some of the
> fields could have spaces in their data.
>
> What do you guys think?
>
> JJ
> _______________________________________________
> TriZPUG mailing list
> TriZPUG at python.org
> http://mail.python.org/mailman/listinfo/trizpug
> http://trizpug.org is the Triangle Zope and Python Users Group
--
David Handy
Computer Programming is Fun!
Beginning Computer Programming with Python
http://www.handysoftware.com/cpif/