[Tutor] Regular Expression help

Wed Jun 27 14:26:17 CEST 2007

I think I have a solution.

File
############################
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1

Script
#############################
#!/usr/bin/python

import re

# matchstr regex flow
# (\(\d+,\d+\))   # tag, e.g. (0018,0014)
# \s              # [space]
# (..*)           # description (one or more characters), e.g. Contrast/Bolus Administration Route Sequence
# \s              # [space]
# ([a-z]{2})      # VR, e.g. SQ - exactly two letters
# \s              # [space]
# (\d)            # VM, e.g. 1 - single digit
# re.I            # flag: case insensitive, so [a-z] also matches SQ

matchstr = re.compile(r"(\(\d+,\d+\))\s(..*)\s([a-z]{2})\s(\d)",re.I)
myfile = open('/tmp/file','r')

for line in myfile:
        regex_match = matchstr.match(line)
        if regex_match:
                # groups() returns the four captured fields in order
                print(";".join(regex_match.groups()))

myfile.close()

Output
#####################
(0012,0042);Clinical Trial Subject Reading ID;LO;1
(0012,0050);Clinical Trial Time Point ID;LO;1
(0012,0051);Clinical Trial Time Point Description;ST;1
(0012,0060);Clinical Trial Coordinating Center Name;LO;1
(0018,0010);Contrast/Bolus Agent;LO;1
(0018,0012);Contrast/Bolus Agent Sequence;SQ;1
(0018,0014);Contrast/Bolus Administration Route Sequence;SQ;1
(0018,0015);Body Part Examined;CS;1
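
If the file also has VM values longer than one digit, or runs of several
spaces between fields, the pattern above would need loosening. What follows
is only a sketch of one way to do that, not the script I actually ran: it is
written for Python 3, the names LINE_RE, split_line and to_delimited are made
up for illustration, and it assumes VR is always two letters and VM contains
no spaces. It uses the csv module to write the delimiter, so a description
that happened to contain a ";" would be quoted instead of breaking the columns.

```python
import csv
import io
import re

# Same four groups as the script above, but anchored at the end of the line,
# with a lazy description so VR and VM are taken from the last two fields.
# Assumption: VR is exactly two letters and VM is a run of non-space characters.
LINE_RE = re.compile(r"(\(\d+,\d+\))\s+(.+?)\s+([A-Za-z]{2})\s+(\S+)\s*$")


def split_line(line):
    """Return (tag, description, VR, VM), or None if the line does not match."""
    m = LINE_RE.match(line)
    return m.groups() if m else None


def to_delimited(lines, delimiter=";"):
    """Convert matching lines to delimiter-separated text; skip the rest."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for line in lines:
        fields = split_line(line)
        if fields:
            writer.writerow(fields)
    return out.getvalue()


sample = [
    "tag             desc                    VR      VM\n",
    "(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1\n",
    "(0018,0015) Body Part Examined CS 1\n",
]
print(to_delimited(sample), end="")
```

The header line is skipped because it does not start with a tag in
parentheses. To process the real file, pass the open file object to
to_delimited in place of the sample list.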

On 6/27/07, Gardner, Dean <Dean.Gardner at barco.com> wrote:
>
>  Hi
>
> I have a text file that I would like to split up so that I can use it in
> Excel to filter a certain field. However as it is a flat text file I need to
> do some processing on it so that Excel can correctly import it.
>
> File Example:
> tag             desc                    VR      VM
> (0012,0042) Clinical Trial Subject Reading ID LO 1
> (0012,0050) Clinical Trial Time Point ID LO 1
> (0012,0051) Clinical Trial Time Point Description ST 1
> (0012,0060) Clinical Trial Coordinating Center Name LO 1
> (0018,0010) Contrast/Bolus Agent LO 1
> (0018,0012) Contrast/Bolus Agent Sequence SQ 1
> (0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
> (0018,0015) Body Part Examined CS 1
>
> What I essentially want is to use python to process this file to give me
>
> (0012,0042); Clinical Trial Subject Reading ID; LO; 1
> (0012,0050); Clinical Trial Time Point ID; LO; 1
> (0012,0051); Clinical Trial Time Point Description; ST; 1
> (0012,0060); Clinical Trial Coordinating Center Name; LO; 1
> (0018,0010); Contrast/Bolus Agent; LO; 1
> (0018,0012); Contrast/Bolus Agent Sequence; SQ; 1
> (0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1
> (0018,0015); Body Part Examined; CS; 1
>
> so that I can import it into Excel using a delimiter.
>
> This file is extremely long and all I essentially want to do is to break
> it into its 'fields'.
>
> Now I suspect that regular expressions are the way to go but I have only
> basic experience of using these and I have no idea what I should be doing.
>
> Can anyone help?
>
> Thanks