regular expression to extract text

Peter Hansen peter at engcorp.com
Thu Nov 20 16:37:23 CET 2003


Mark Light wrote:
> 
> Hi I have a file read in as a string that looks like below. What I want to
> do is pull out the bits of information to eventually put in an html table.
> For the 1st example the 3 bits are:
> 1. QEXZUO
> 2. C26 H31 N1 O3
> 3. 6.164   15.892   22.551    90.00    90.00    90.00
> 
> Any ideas of the best way to do this? I was trying regular expressions but
> not getting very far.
> 
> Thanks,
> 
> Mark.
> 
> """
> Using unit cell orientation matrix from collect.rmat
> NOTICE: Performing automatic cell standardization
> The following database entries have similar unit cells:
> Refcode     Sumformula
>       <Conventional cell parameters>
> ------------------------------------------
> QEXZUO     C26 H31 N1 O3
>          6.164   15.892   22.551    90.00    90.00    90.00
> ------------------------------------------
> ARQTYD     C19 H23 N1 O5
>          6.001   15.227   22.558    90.00    90.00    90.00
> ------------------------------------------
> NHDIIS     C45 H40 Cl2
>          6.532   15.147   22.453    90.00    90.00    90.00 """

I don't think you've given enough information here.  Are those
"bits" supposed to be kept intact, complete with internal spacing,
or are you doing more manipulation of them?  What is the definition
of the "bits"?  Specifically, is bit 1 "the first non-space token
after a line of hyphens"?  Is bit 2 "everything on the line after
bit 1, with leading and trailing spaces stripped"?  Is bit 3
"everything on the following line, with leading/trailing spaces
stripped"?

Those definitions roughly fit what you describe, and if that's
all you need, the solution should be pretty trivial: plain string
methods such as split() and strip() would do the job, and regular
expressions would be overkill in this case.
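
As a rough illustration only, here is a minimal sketch that assumes the
three definitions guessed at above are the right ones. The function name
extract_entries and the variable sample are made up for this example, and
the sample text is copied from the quoted message. A "line of hyphens" is
taken to mean a non-empty line containing nothing but "-" characters.

```python
def extract_entries(text):
    """Return a list of (refcode, formula, cell) tuples.

    Assumed definitions (guesses, not confirmed by the original poster):
      refcode: first non-space token on the line after a line of hyphens
      formula: the rest of that line, stripped
      cell:    the whole following line, stripped, internal spacing kept
    """
    lines = text.splitlines()
    entries = []
    # Stop two lines early so lines[i + 1] and lines[i + 2] always exist.
    for i, line in enumerate(lines[:-2]):
        stripped = line.strip()
        # Skip anything that is not a non-empty, hyphens-only line.
        if not stripped or stripped.strip("-"):
            continue
        parts = lines[i + 1].split(None, 1)
        if not parts:
            continue  # blank line after the separator: nothing to extract
        refcode = parts[0]
        formula = parts[1].strip() if len(parts) > 1 else ""
        cell = lines[i + 2].strip()
        entries.append((refcode, formula, cell))
    return entries


sample = """
Using unit cell orientation matrix from collect.rmat
NOTICE: Performing automatic cell standardization
The following database entries have similar unit cells:
Refcode     Sumformula
      <Conventional cell parameters>
------------------------------------------
QEXZUO     C26 H31 N1 O3
         6.164   15.892   22.551    90.00    90.00    90.00
------------------------------------------
ARQTYD     C19 H23 N1 O5
         6.001   15.227   22.558    90.00    90.00    90.00
------------------------------------------
NHDIIS     C45 H40 Cl2
         6.532   15.147   22.453    90.00    90.00    90.00
"""

entries = extract_entries(sample)
for refcode, formula, cell in entries:
    print(refcode, "|", formula, "|", cell)
```

If the six cell parameters are wanted as separate table columns rather
than as one string, calling cell.split() on the third item would give
them as a list.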



