parsing question
Tim Chase
python.list at tim.thechases.com
Mon May 31 11:07:02 EDT 2010
On 05/31/2010 08:42 AM, Mag Gam wrote:
> I have a file with a bunch of nfsstat -c output (on AIX) which has all the
> hostnames, for example
...
> Is there an easy way to parse this file according to each host?
>
> So,
> r1svr.Connectionless.calls=6553
> r1svr.Connectionless.badcalls=0
>
> and so on...
>
>
> I am currently using awk, with which I am able to get what I need, but
> I am curious how people handle block data in Python.
Since you already profess to having an awk solution, I felt it
was okay to at least take a stab at my implementation (rather
than doing your job for you :). Without a complete spec for the
output, it's a bit of guesswork, but I got something fairly close
to what you want. It uses nested dictionaries, which means the
keys and values have to be referenced like
servers["r1svr"]["connectionless"]["calls"]
and the values are strings, not ints/floats/percentages/etc. (I'm
not sure what you want in the case of the data that has both a
value and a percentage).
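If numbers are wanted rather than strings, a small conversion pass could be run over the leaf values afterwards. The sketch below is only one guess at how that might look: convert_value is a hypothetical helper (it is not part of the script in this message), and it assumes that a field carrying both a value and a percentage looks like "590 9%", which is a guess because the sample data is not shown here. It is written so that it should also run on Python 3.

```python
def convert_value(text):
    """Turn a raw field string into an int, or an (int, percent) tuple.

    Assumed formats (guesses; adjust to the real data):
      "6553"   -> 6553
      "590 9%" -> (590, 9)
    Anything else is returned unchanged as a string.
    """
    parts = text.split()
    try:
        if len(parts) == 1:
            return int(parts[0])
        if len(parts) == 2 and parts[1].endswith('%'):
            return int(parts[0]), int(parts[1][:-1])
    except ValueError:
        pass  # not numeric after all; fall through and keep the string
    return text
```

Anything that does not match one of the two assumed shapes is handed back untouched, so running it over every value should not lose data.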
That said, this should get you fairly close to what you describe:
###########################################
import re
header_finding_re = re.compile(r'\b\w{2,}')
version_re = re.compile(r'^Version (\d+):\s*\(.*\)$', re.I)
CLIENT_HEADER = 'Client '
CONNECTION_HEADER = 'Connection'
servers = {}
server = client = orig_client = subtype = None
source = file('data.txt')
for line in source:
    line = line.rstrip('\r\n')
    if not line.strip(): continue
    if line.startswith('='*5) and line.endswith('='*5):
        server = line.strip('=')
        client = orig_client = subtype = None
    elif line.startswith(CLIENT_HEADER):
        orig_client = client = line[len(CLIENT_HEADER):-1]
        subtype = 'all'
    elif line.startswith(CONNECTION_HEADER):
        subtype = line.replace(' ', '').lower()
    else: # it's a version or header row
        m = version_re.match(line)
        if m:
            subtype = "v" + m.group(1)
        else:
            if None in (server, client, subtype):
                print "Missing data", repr((server, client, subtype))
                continue
            dest = servers.setdefault(server, {}
                ).setdefault(client, {}
                ).setdefault(subtype, {})
            data = source.next()
            row = header_finding_re.finditer(line)
            prev = row.next()
            for header in row:
                key = prev.group(0)
                value = data[prev.start():header.start()].strip()
                prev = header
                dest[key] = value
            key = prev.group(0)
            value = data[prev.start():].strip()
            dest[key] = value
for server, clients in servers.items():
    for client, subtypes in clients.items():
        for subtype, kv in subtypes.items():
            for key, value in kv.items():
                print ".".join([server, client, subtype, key]),
                print '=', value
###########################################
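The part of the script that does the real work is the header/data pairing: the start offset of each header word is used to slice the data line beneath it. Below is a stripped-down sketch of just that step as a standalone function. split_columns is a hypothetical name, and the sample lines are made up for illustration (only calls=6553 and badcalls=0 come from the original question; the retrans column and the spacing are assumptions). It avoids the Python 2-only bits, so it should also run on Python 3.

```python
import re

# Same pattern as in the script: a word of two or more characters.
header_finding_re = re.compile(r'\b\w{2,}')

def split_columns(header_line, data_line):
    """Pair each header word with the text beneath it in data_line.

    A column runs from the start of its header to the start of the
    next header; the last column runs to the end of the line.
    """
    result = {}
    starts = [(m.group(0), m.start())
              for m in header_finding_re.finditer(header_line)]
    for i, (key, start) in enumerate(starts):
        if i + 1 < len(starts):
            end = starts[i + 1][1]
        else:
            end = None  # last column: slice to the end of the line
        result[key] = data_line[start:end].strip()
    return result

# Made-up sample lines, padded so each column is 11 characters wide.
header = "calls".ljust(11) + "badcalls".ljust(11) + "retrans"
data = "6553".ljust(11) + "0".ljust(11) + "0 0%"
fields = split_columns(header, data)
```

With these sample lines, fields maps calls to '6553', badcalls to '0' and retrans to '0 0%'. Slicing by header position, rather than splitting on whitespace, is what keeps a two-part value such as '0 0%' together; the trade-off is that it only works when each data value starts at or after the column where its header starts.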
Have fun,
-tkc