regular expression to extract text

Roel Mathys rm at rm.net
Thu Nov 20 11:31:52 EST 2003


Although I hold no grudge against regexes, I've overused them myself in 
the past (so my regex skills are a bit rusty). Nowadays I prefer to use 
them less and less.

bye,
rm

ps: I'm not sure what the actual purpose was, but I gave it a shot 
anyway.

------------------------------------------------------------------------

text = """
Using unit cell orientation matrix from collect.rmat
NOTICE: Performing automatic cell standardization
The following database entries have similar unit cells:
Refcode     Sumformula
       <Conventional cell parameters>
------------------------------------------
QEXZUO     C26 H31 N1 O3
          6.164   15.892   22.551    90.00    90.00    90.00
------------------------------------------
ARQTYD     C19 H23 N1 O5
          6.001   15.227   22.558    90.00    90.00    90.00
------------------------------------------
NHDIIS     C45 H40 Cl2
          6.532   15.147   22.453    90.00    90.00    90.00 """

from pprint import pprint

SEPARATOR = '------------------------------------------'

result = {}
refcode = None
started = False
for line in text.splitlines():
    # skip everything until the next dashed separator line
    if not started and line == SEPARATOR:
        started = True
        continue
    if started:
        if refcode is None:
            # first line of an entry: refcode followed by the sum formula
            fields = line.split()
            refcode = fields[0]
            sumformula = fields[1:]
        else:
            # second line of an entry: the six cell parameters;
            # list() is needed on Python 3, where map() is lazy
            cellparams = list(map(float, line.split()))
            # assuming refcode is unique
            result[refcode] = {'sumformula': sumformula,
                               'cellparams': cellparams}
            refcode = None
            started = False

pprint(result)
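Since the thread asked about regular expressions, here is a rough regex counterpart to the loop above, for comparison only. It is a sketch that assumes every entry is exactly two lines: a refcode plus whitespace-separated formula tokens, then a line of six decimal numbers. The pattern name ENTRY is my own, and the pattern has only been reasoned through against this sample output, not against real program output.

```python
import re
from pprint import pprint

text = """
Using unit cell orientation matrix from collect.rmat
NOTICE: Performing automatic cell standardization
The following database entries have similar unit cells:
Refcode     Sumformula
       <Conventional cell parameters>
------------------------------------------
QEXZUO     C26 H31 N1 O3
          6.164   15.892   22.551    90.00    90.00    90.00
------------------------------------------
ARQTYD     C19 H23 N1 O5
          6.001   15.227   22.558    90.00    90.00    90.00
------------------------------------------
NHDIIS     C45 H40 Cl2
          6.532   15.147   22.453    90.00    90.00    90.00 """

# Hypothetical pattern for one entry: a line starting with a refcode and
# the sum formula, followed by an indented line of exactly six decimals.
ENTRY = re.compile(
    r'^(?P<refcode>\S+)[ \t]+(?P<formula>.+?)[ \t]*\n'
    r'[ \t]*(?P<cell>[-+]?\d+\.\d+(?:[ \t]+[-+]?\d+\.\d+){5})[ \t]*$',
    re.MULTILINE,  # let ^ and $ match at every line boundary
)

result = {}
for m in ENTRY.finditer(text):
    # assuming refcode is unique, as in the loop version
    result[m.group('refcode')] = {
        'sumformula': m.group('formula').split(),
        'cellparams': [float(x) for x in m.group('cell').split()],
    }

pprint(result)
```

The header lines and the dashed separators never match, because the pattern needs whitespace after the first token and a line of six numbers right after it. That makes the regex version shorter, but also more fragile if the layout changes, which is part of why I lean towards the plain loop.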