regular expression to extract text
Roel Mathys
rm at rm.net
Thu Nov 20 11:31:52 EST 2003
Although I hold no grudge against regexes, I overused them myself in
the past (and my regex skills are a bit rusty now). Nowadays I prefer
to use them less and less.
bye,
rm
ps: I'm not sure what the exact goal was, but I gave it a shot
anyway.
------------------------------------------------------------------------
text = """
Using unit cell orientation matrix from collect.rmat
NOTICE: Performing automatic cell standardization
The following database entries have similar unit cells:
Refcode Sumformula
<Conventional cell parameters>
------------------------------------------
QEXZUO C26 H31 N1 O3
6.164 15.892 22.551 90.00 90.00 90.00
------------------------------------------
ARQTYD C19 H23 N1 O5
6.001 15.227 22.558 90.00 90.00 90.00
------------------------------------------
NHDIIS C45 H40 Cl2
6.532 15.147 22.453 90.00 90.00 90.00 """
result = {}
refcode = None
started = False
for line in text.split('\n'):
    # A dashed separator line announces the next two-line entry.
    if not started and line.strip() == '------------------------------------------':
        started = True
        continue
    if started:
        if refcode is None:
            # First entry line: the refcode, then the sum-formula tokens.
            fields = line.split()
            refcode = fields[0]
            sumformula = fields[1:]
        else:
            # Second entry line: the conventional cell parameters.
            # list() is needed on Python 3, where map() returns an iterator.
            cellparams = list(map(float, line.split()))
            # assuming refcode is unique
            result[refcode] = {'sumformula': sumformula,
                               'cellparams': cellparams}
            refcode = None
            started = False

from pprint import pprint
pprint(result)
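Since the question was about regular expressions, here is a rough
regex-based equivalent for comparison. It is only a sketch: it assumes
every entry is exactly a dashed separator line, then a line holding the
refcode followed by the formula tokens, then a line of six numbers. The
names ENTRY and parse_entries are my own, not from the original question.

```python
import re
from pprint import pprint

# Sample report text, copied from the script above.
text = """
Using unit cell orientation matrix from collect.rmat
NOTICE: Performing automatic cell standardization
The following database entries have similar unit cells:
Refcode Sumformula
<Conventional cell parameters>
------------------------------------------
QEXZUO C26 H31 N1 O3
6.164 15.892 22.551 90.00 90.00 90.00
------------------------------------------
ARQTYD C19 H23 N1 O5
6.001 15.227 22.558 90.00 90.00 90.00
------------------------------------------
NHDIIS C45 H40 Cl2
6.532 15.147 22.453 90.00 90.00 90.00 """

# Assumed layout of one entry: separator line, "REFCODE formula..." line,
# then six whitespace-separated numbers (the conventional cell parameters).
ENTRY = re.compile(
    r"^-+[ \t]*\n"                            # dashed separator line
    r"(\w+)[ \t]+(.*?)[ \t]*\n"               # refcode, rest of line = formula
    r"[ \t]*((?:[-\d.]+[ \t]+){5}[-\d.]+)",   # six numbers on the next line
    re.MULTILINE,
)


def parse_entries(text):
    """Return {refcode: {'sumformula': [...], 'cellparams': [...]}}."""
    result = {}
    for refcode, formula, params in ENTRY.findall(text):
        # assuming refcode is unique, as in the loop version
        result[refcode] = {
            'sumformula': formula.split(),
            'cellparams': [float(x) for x in params.split()],
        }
    return result


pprint(parse_entries(text))
```

The pattern packs all the layout assumptions into one place, which is
compact but harder to adjust if the report format drifts; that is one
reason to prefer the explicit line-by-line loop above.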
More information about the Python-list
mailing list