text file parsing (awk -> python)
bearophileHUGS at lycos.com
Wed Nov 22 13:02:20 EST 2006
Peter Otten, your solution is very nice: it uses groupby to split on
empty lines, so it doesn't need to read the whole file into memory.
But Daniel Nogradi says:
> But the names of the fields (node, x, y) keeps changing from file to
> file, even their number is not fixed, sometimes it is (node, x, y, z).
Your version with the converters dict fails to convert the node
number, the z field, etc. (in general such a converters dict is an
elegant solution, because it lets you define string, float, etc.
fields):
> converters = dict(
>     x=int,
>     y=int
> )
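With a fallback converter the unlisted fields can be handled too. This
is just a sketch of mine (not your code), and using int as the default
is an assumption:

converters = dict(
    x=int,
    y=int,
)

def convert(name, value, default=int):
    # Use the converter registered for this field name, falling back
    # to `default` for names that aren't listed (node, z, ...).
    return converters.get(name, default)(value)

print convert("x", "-1")   # -1
print convert("z", "5")    # 5, via the int fallback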
I have created a version with a regular expression, but it's probably
too rigid; it doesn't handle files with the z field, etc.:
data = """node 10
y 1
x -1
node 11
x -2
y 1
z 5
node 12
x -3
y 1
z 6"""
import re

# The pattern expects exactly three (name, value) pairs per record.
unpack = re.compile(r"(\D+) \s+ ([-+]? \d+) \s+" * 3, re.VERBOSE)

result = []
for obj in unpack.finditer(data):
    block = obj.groups()
    d = dict((block[i], int(block[i+1])) for i in xrange(0, 6, 2))
    result.append(d)
print result
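A less rigid possibility (again just a sketch of mine, only tried on
the data above) is to match one (name, value) pair at a time and start
a new record whenever the name is "node", so the number and names of
the fields may vary; it assumes every record begins with a node line
and that all values are integers:

pair = re.compile(r"(\S+) \s+ ([-+]? \d+)", re.VERBOSE)

result2 = []
for name, value in pair.findall(data):
    if name == "node":
        # A new record begins at each "node" line.
        result2.append({})
    result2[-1][name] = int(value)
print result2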
So here is a slightly modified and simplified version of your nice
solution (I have removed the pprint, but it's essentially the same):
def open(filename):
    # Shadow the builtin open() so the example runs on the embedded data.
    from cStringIO import StringIO
    return StringIO(data)
from itertools import groupby

records = []
for empty, record in groupby(open("records.txt"), key=str.isspace):
    if not empty:
        pairs = ([k, int(v)] for k, v in map(str.split, record))
        records.append(dict(pairs))
print records
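If the fields need different types, the converters dict from above can
be plugged back into this version too (again only a sketch; int as the
fallback converter is my assumption):

records2 = []
for empty, record in groupby(open("records.txt"), key=str.isspace):
    if not empty:
        d = dict((k, converters.get(k, int)(v))
                 for k, v in map(str.split, record))
        records2.append(d)
print records2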
Bye,
bearophile