[Tutor] Regular Expression help
Tom Tucker
tktucker at gmail.com
Wed Jun 27 14:26:17 CEST 2007
I think I have a solution.
File
############################
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1
Script
#############################
#!/usr/bin/python
import re
#matchstr regex flow
# (\(\d+,\d+\)) # (0018,0014)
# \s # [space]
# (..*) # Contrast/Bolus Administration Route Sequence
# \s # space
# ([a-z]{2}) # SQ - two letters and no more
# \s # [space]
# (\d) # 1 - single digit
# re.I) # case insensitive
matchstr = re.compile(r"(\(\d+,\d+\))\s(..*)\s([a-z]{2})\s(\d)",re.I)
myfile = open('/tmp/file','r')
for line in myfile:
    regex_match = matchstr.match(line)
    if regex_match:
        # join the four captured groups with semicolons
        print(";".join(regex_match.groups()))
myfile.close()
Output
#####################
(0012,0042);Clinical Trial Subject Reading ID;LO;1
(0012,0050);Clinical Trial Time Point ID;LO;1
(0012,0051);Clinical Trial Time Point Description;ST;1
(0012,0060);Clinical Trial Coordinating Center Name;LO;1
(0018,0010);Contrast/Bolus Agent;LO;1
(0018,0012);Contrast/Bolus Agent Sequence;SQ;1
(0018,0014);Contrast/Bolus Administration Route Sequence;SQ;1
(0018,0015);Body Part Examined;CS;1
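
The script above reads from /tmp/file. For anyone who wants to try the idea without creating that file, here is a minimal, self-contained sketch. It assumes Python 3, reuses the same pattern as the script, and hands the joining to the csv module with ";" as the delimiter. The names MATCHSTR, SAMPLE, split_fields and to_semicolon_text are made up for illustration only.

```python
import csv
import io
import re

# Same pattern as in the script above: tag, description, VR, VM.
MATCHSTR = re.compile(r"(\(\d+,\d+\))\s(..*)\s([a-z]{2})\s(\d)", re.I)

# A few lines copied from the example file, plus the header line,
# which should not match and is therefore skipped.
SAMPLE = """\
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
tag desc VR VM
"""

def split_fields(line):
    """Return (tag, desc, VR, VM) for a matching line, else None."""
    m = MATCHSTR.match(line)
    return m.groups() if m else None

def to_semicolon_text(lines):
    """Write every matching line as one semicolon-delimited row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for line in lines:
        fields = split_fields(line)
        if fields:  # non-matching lines (e.g. the header) are dropped
            writer.writerow(fields)
    return out.getvalue()

if __name__ == "__main__":
    print(to_semicolon_text(SAMPLE.splitlines()), end="")
```

Using csv.writer instead of string concatenation is only a design choice here: it would quote a description that happened to contain a semicolon, which plain "+" joining would not.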
On 6/27/07, Gardner, Dean <Dean.Gardner at barco.com> wrote:
>
> Hi
>
> I have a text file that I would like to split up so that I can use it in
> Excel to filter a certain field. However, as it is a flat text file, I need to
> do some processing on it so that Excel can correctly import it.
>
> File Example:
> tag desc VR VM
> (0012,0042) Clinical Trial Subject Reading ID LO 1
> (0012,0050) Clinical Trial Time Point ID LO 1
> (0012,0051) Clinical Trial Time Point Description ST 1
> (0012,0060) Clinical Trial Coordinating Center Name LO 1
> (0018,0010) Contrast/Bolus Agent LO 1
> (0018,0012) Contrast/Bolus Agent Sequence SQ 1
> (0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
> (0018,0015) Body Part Examined CS 1
>
> What I essentially want is to use python to process this file to give me
>
> (0012,0042); Clinical Trial Subject Reading ID; LO; 1
> (0012,0050); Clinical Trial Time Point ID; LO; 1
> (0012,0051); Clinical Trial Time Point Description; ST; 1
> (0012,0060); Clinical Trial Coordinating Center Name; LO; 1
> (0018,0010); Contrast/Bolus Agent; LO; 1
> (0018,0012); Contrast/Bolus Agent Sequence; SQ; 1
> (0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1
> (0018,0015); Body Part Examined; CS; 1
>
> so that I can import it into Excel using a delimiter.
>
> This file is extremely long and all I essentially want to do is to break
> it into its 'fields'.
>
> Now I suspect that regular expressions are the way to go but I have only
> basic experience of using these and I have no idea what I should be doing.
>
> Can anyone help?
>
> Thanks
>