[Tutor] Regular Expression help

Fri Jun 29 09:37:32 CEST 2007

Thanks, everyone, for the replies; all of them worked well. I adopted the
string-splitting approach rather than the regex one, as it seemed to miss
fewer of the edge cases. Thank you all once again for your help.
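
For readers of the archive, here is a minimal sketch of what a
string-splitting approach could look like. It is not necessarily the code
posted earlier in the thread; the function name split_fields is
hypothetical. It assumes the tag is the first whitespace-separated token
and that VR and VM are always the last two tokens, so anything in between
is the description. Because it does not require VM to be all digits, it
would also accept a value such as "1-n", which may be one reason this
style misses fewer edge cases.

```python
def split_fields(line):
    """Split one dictionary line into (tag, description, VR, VM)."""
    # The tag is the first whitespace-delimited token on the line.
    tag, rest = line.strip().split(None, 1)
    # VR and VM are the last two tokens; whatever is left is the description.
    desc, vr, vm = rest.rsplit(None, 2)
    return tag, desc, vr, vm


sample = "(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1"
print(";".join(split_fields(sample)))
```

A line with fewer than four tokens raises ValueError here, so a real
script would probably want to catch that and report the offending line.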

-----Original Message-----
From: Kent Johnson [mailto:kent37 at tds.net] 
Sent: 27 June 2007 14:55
To: tutor at python.org; Gardner, Dean
Subject: Re: [Tutor] Regular Expression help

Gardner, Dean wrote:
> Hi
> 
> I have a text file that I would like to split up so that I can use it 
> in Excel to filter a certain field. However as it is a flat text file 
> I need to do some processing on it so that Excel can correctly import
> it.
> 
> File Example:
> tag             desc                    VR      VM
> (0012,0042) Clinical Trial Subject Reading ID LO 1
> (0012,0050) Clinical Trial Time Point ID LO 1
> (0012,0051) Clinical Trial Time Point Description ST 1
> (0012,0060) Clinical Trial Coordinating Center Name LO 1
> (0018,0010) Contrast/Bolus Agent LO 1
> (0018,0012) Contrast/Bolus Agent Sequence SQ 1
> (0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
> (0018,0015) Body Part Examined CS 1
> 
> What I essentially want is to use python to process this file to give 
> me
> 
> 
> (0012,0042); Clinical Trial Subject Reading ID; LO; 1
> (0012,0050); Clinical Trial Time Point ID; LO; 1
> (0012,0051); Clinical Trial Time Point Description; ST; 1
> (0012,0060); Clinical Trial Coordinating Center Name; LO; 1
> (0018,0010); Contrast/Bolus Agent; LO; 1
> (0018,0012); Contrast/Bolus Agent Sequence; SQ; 1
> (0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1
> (0018,0015); Body Part Examined; CS; 1
> 
> so that I can import to excel using a delimiter.
> 
> This file is extremely long and all I essentially want to do is to 
> break it into its 'fields'.
> 
> Now I suspect that regular expressions are the way to go but I have 
> only basic experience of using these and I have no idea what I should
> be doing.

This seems to work:

data = '''\
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1'''.splitlines()

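import re
# Groups: (tag) (description, matched lazily) (VR) (VM)
fieldsRe = re.compile(r'^(\(\d+,\d+\)) (.*?) (\w+) (\d+)$')

for line in data:
    match = fieldsRe.match(line)
    if match:
        # Lines that do not fit the pattern are skipped silently.
        # print() with parentheses runs on both Python 2 and Python 3.
        print(';'.join(match.group(1, 2, 3, 4)))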

I don't think you want the space after the ; that you put in your
example; Excel wants a single-character delimiter.

Kent
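
A note on the delimiter point above: joining with ';' is fine as long as
no description contains a ';' itself. If that cannot be ruled out, one
option is to let the standard csv module write the rows, since it quotes
any field that contains the delimiter. The following is only a sketch
under that assumption, written for Python 3; the names fields_re and
to_delimited are hypothetical and do not come from the thread.

```python
import csv
import io
import re

# Same pattern as in the reply above: tag, lazy description, VR, VM.
fields_re = re.compile(r'^(\(\d+,\d+\)) (.*?) (\w+) (\d+)$')


def to_delimited(lines, delimiter=';'):
    """Return the matching lines as delimiter-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator='\n')
    for line in lines:
        match = fields_re.match(line.rstrip())
        if match:
            # csv.writer quotes a field only if it contains the delimiter,
            # the quote character, or a line break.
            writer.writerow(match.groups())
    return out.getvalue()


print(to_delimited(["(0018,0015) Body Part Examined CS 1"]))
```

To write a real file instead of a string, the same writer can be given a
file opened with open(path, 'w', newline='').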
