Extracting values from text file
bearophileHUGS at lycos.com
Fri Jun 16 05:20:13 EDT 2006
First try. There are probably better ways to do it, and it's far from
resilient: it breaks in a lot of different ways (for example, more than one
number in one line, a number with text on both sides of it, etc.).
I have divided the data munging into many lines so I can see what's
happening, and so you can fix/modify the code quickly.
Bye,
bearophile
data1 = """
Some text that can span some lines.
More text
Apples 34
56 Ducks
Some more text.
0.5 g butter
"""
import re
# Separate lines in a list
data2 = data1.split("\n")
print data2, "\n"
# clear lines from trailing and leading spaces, newlines, etc.
data3 = map(str.strip, data2)
print data3, "\n"
# remove blank lines after the stripping
data4 = filter(None, data3)
print data4, "\n"
# create a list of (line, number) pairs for only the lines containing a number
patt1 = re.compile(r"\d+\.?\d*") # No sign, no scientific notation
data5 = [(line, n) for line in data4 for n in patt1.findall(line)]
print data5, "\n"
# remove the number from the lines, and strip such lines
data6 = [(line.replace(num, "").strip(), num) for line, num in data5]
print data6, "\n"
def nconv(num):
    "To convert a number to an int, and if not possible to a float"
    try:
        result = int(num)
    except ValueError:
        result = float(num)
    return result
# convert the number strings into ints or floats
data7 = [(line, nconv(num)) for line, num in data6]
print data7, "\n"
# build the final dict of (line: number)
result = dict(data7)
print result, "\n"
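
The fragile spots mentioned at the top (several numbers in one line, text on both sides of a number) could be handled along these lines. This is only a rough sketch, not a replacement for the step-by-step code above: the names NUM and extract, the extra sample line, and the choice to store a list of numbers per line are all my assumptions, and the pattern is a widened variant that also accepts a sign and an exponent. It is written to run under Python 3.

```python
import re

# Assumed pattern: optional sign, digits, optional fraction, optional exponent.
# (The pattern in the post above deliberately skipped scientific notation.)
NUM = re.compile(r"[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?")

def nconv(num):
    """Convert a numeric string to an int, or to a float if that fails."""
    try:
        return int(num)
    except ValueError:
        return float(num)

def extract(text):
    """Map the non-numeric part of each line to the list of numbers on it."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        nums = NUM.findall(line)
        if not nums:
            continue  # skip blank lines and lines without any number
        # Cut out the matched spans themselves, then collapse leftover spaces
        label = " ".join(NUM.sub(" ", line).split())
        result.setdefault(label, []).extend(nconv(n) for n in nums)
    return result

# Hypothetical sample; the last line has two numbers with text around them.
sample = """
Some text.
Apples 34
56 Ducks
0.5 g butter
mix 2 eggs with 1.5e2 g flour
"""
print(extract(sample))
```

Keeping a list per line means a line with two numbers gives one entry holding both values, instead of the second number overwriting the first when the dict is built. Two lines with the same leftover text get merged into one entry, which may or may not be what you want.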