[Tutor] Regular Expression help

Wed Jun 27 17:08:22 CEST 2007

On Jun 27, 2007, at 10:24 AM, Mike Hansen wrote:

>
>
>> -----Original Message-----
>> From: tutor-bounces at python.org
>> [mailto:tutor-bounces at python.org] On Behalf Of Gardner, Dean
>> Sent: Wednesday, June 27, 2007 3:59 AM
>> To: tutor at python.org
>> Subject: [Tutor] Regular Expression help
>>
>> Hi
>>
>> I have a text file that I would like to split up so that I
>> can use it in Excel to filter a certain field. However as it
>> is a flat text file I need to do some processing on it so
>> that Excel can correctly import it.
>>
>> File Example:
>> tag             desc                    VR      VM
>> (0012,0042) Clinical Trial Subject Reading ID LO 1
>> (0012,0050) Clinical Trial Time Point ID LO 1
>> (0012,0051) Clinical Trial Time Point Description ST 1
>> (0012,0060) Clinical Trial Coordinating Center Name LO 1
>> (0018,0010) Contrast/Bolus Agent LO 1
>> (0018,0012) Contrast/Bolus Agent Sequence SQ 1
>> (0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
>> (0018,0015) Body Part Examined CS 1
>>
>> What I essentially want is to use python to process this file
>> to give me
>>
>>
>> (0012,0042); Clinical Trial Subject Reading ID; LO; 1
>> (0012,0050); Clinical Trial Time Point ID; LO; 1
>> (0012,0051); Clinical Trial Time Point Description; ST; 1
>> (0012,0060); Clinical Trial Coordinating Center Name; LO; 1
>> (0018,0010); Contrast/Bolus Agent; LO; 1
>> (0018,0012); Contrast/Bolus Agent Sequence; SQ; 1
>> (0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1
>> (0018,0015); Body Part Examined; CS; 1
>>
>> so that I can import to excel using a delimiter.
>>
>> This file is extremely long and all I essentially want to do
>> is to break it into its 'fields'.
>>
>> Now I suspect that regular expressions are the way to go but
>> I have only basic experience of using these and I have no
>> idea what I should be doing.
>>
>> Can anyone help?
>>
>> Thanks
>>
>
> Hmmmm... You might be able to do this without the need for regular
> expressions. You can split the row on spaces, which will give you a
> list. Then you can reconstruct the row, inserting your delimiter as
> needed and joining the rest with spaces again.
>
> In [63]: row = "(0012,0042) Clinical Trial Subject Reading ID LO 1"
>
> In [64]: row_items = row.split(' ')
>
> In [65]: row_items
> Out[65]: ['(0012,0042)', 'Clinical', 'Trial', 'Subject', 'Reading', 'ID', 'LO', '1']
>
> In [66]: tag = row_items.pop(0)
>
> In [67]: tag
> Out[67]: '(0012,0042)'
>
> In [68]: vm = row_items.pop()
>
> In [69]: vm
> Out[69]: '1'
>
> In [70]: vr = row_items.pop()
>
> In [71]: vr
> Out[71]: 'LO'
>
> In [72]: desc = ' '.join(row_items)
>
> In [73]: new_row = "%s; %s; %s; %s" % (tag, desc, vr, vm)
>
> In [74]: new_row
> Out[74]: '(0012,0042); Clinical Trial Subject Reading ID; LO; 1'
>
> Someone might think of a better way with them thar fancy lambdas and
> list comprehensions thingys, but I think this will work.
>
>
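For a whole file, Mike's steps wrap up neatly into a function. This is only a minimal sketch: it assumes every data row starts with a parenthesised tag and ends with single-token VR and VM fields, and that the header line does not start with "(". The names reformat_row and convert_file are made up for illustration.

```python
def reformat_row(row, delimiter="; "):
    """Turn 'TAG description words VR VM' into 'TAG; description; VR; VM'."""
    # split() with no argument collapses runs of whitespace and ignores
    # leading/trailing spaces, unlike split(' ').
    items = row.split()
    if len(items) < 4:
        raise ValueError("expected tag, description, VR and VM: %r" % row)
    tag = items.pop(0)      # first token
    vm = items.pop()        # last token
    vr = items.pop()        # second-to-last token
    desc = " ".join(items)  # whatever is left is the description
    return delimiter.join([tag, desc, vr, vm])


def convert_file(src_path, dst_path):
    """Rewrite every data row of src_path into dst_path, skipping the header."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            # Assumption: data rows start with "(" and the header does not.
            if line.lstrip().startswith("("):
                dst.write(reformat_row(line) + "\n")
```

Calling convert_file('in.txt', 'out.txt') (both paths are placeholders) would then give a file that Excel can import with ';' as the delimiter.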
I sent this to Dean this morning:

Dean,

I would do something like this (if your pattern is always the same):

import csv

foo = ['(0012,0042) Clinical Trial Subject Reading ID LO 1 ',
       '(0012,0050) Clinical Trial Time Point ID LO 1 ',
       '(0012,0051) Clinical Trial Time Point Description ST 1 ',
       '(0012,0060) Clinical Trial Coordinating Center Name LO 1 ',
       '(0018,0010) Contrast/Bolus Agent LO 1 ',
       '(0018,0012) Contrast/Bolus Agent Sequence SQ 1 ',
       '(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1 ',
       '(0018,0015) Body Part Examined CS 1']

# newline='' lets the csv module control line endings (Python 3);
# the with block closes the file so every row is flushed to disk.
with open('/Users/reed/tmp/foo.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=';')
    for lin in foo:
        # split() with no argument also discards the trailing space
        fields = lin.split()
        # tag; description (everything in between); VR; VM
        row = (fields[0], ' '.join(fields[1:-2]), fields[-2], fields[-1])
        writer.writerow(row)

$ more foo.csv
(0012,0042);Clinical Trial Subject Reading ID;LO;1
(0012,0050);Clinical Trial Time Point ID;LO;1
(0012,0051);Clinical Trial Time Point Description;ST;1
(0012,0060);Clinical Trial Coordinating Center Name;LO;1
(0018,0010);Contrast/Bolus Agent;LO;1
(0018,0012);Contrast/Bolus Agent Sequence;SQ;1
(0018,0014);Contrast/Bolus Administration Route Sequence;SQ;1
(0018,0015);Body Part Examined;CS;1
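
Since the original question was about regular expressions, here is roughly how the same split might look with the re module. Treat it as a sketch under assumptions: the tag is two groups of four hex digits, and VR and VM are each a single token with no spaces. ROW_RE and split_row are names I made up.

```python
import re

# Verbose pattern: whitespace and "#" comments inside it are ignored.
ROW_RE = re.compile(
    r"""^\s*
        (\([0-9A-Fa-f]{4},[0-9A-Fa-f]{4}\))  # tag, e.g. (0012,0042)
        \s+(.+?)                             # description (lazy match)
        \s+(\S+)                             # VR, e.g. LO
        \s+(\S+)                             # VM, e.g. 1
        \s*$""",
    re.VERBOSE,
)


def split_row(row):
    """Return (tag, desc, vr, vm), or None if the row does not match."""
    m = ROW_RE.match(row)
    return m.groups() if m else None
```

Rows that do not fit the pattern, such as the header line, come back as None, so they can be skipped or logged instead of raising an IndexError the way the slicing version would on a short line.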

HTH,
~reed