[Tutor] Remove some known text around text I need to keep
Robert Alexander
bob at ralexander.it
Wed Jan 29 11:37:23 EST 2020
Dear friends,
I have 200 text files in which a doctor has “tagged” parts of the text as symptom, sign or therapy (in Italian SINTOMO, SEGNO, TERAPIA) as follows:
[SEGNO: edemi declivi >> a sinistra]
[SINTOMO: non lamenta dispnea]
[SINTOMO: paziente sintomatica per dispnea moderata]
[SEGNO: Non edemi]
[SEGNO: calo di 2 kg (55,8 kg)]
[TERAPIA: ha ridotto la terapia diuretica]
[TERAPIA: Lieve riduzione della terapia diuretica]
[TERAPIA: Lieve riduzione della terapia diuretica]
and so forth. A tagged span can extend over more than a single line of text, and I need to remove just the tags and write the original text back without them. To give you an example using the same lines as above, what I need to write to the new “clean” files looks like the following:
edemi declivi >> a sinistra]
non lamenta dispnea]
paziente sintomatica per dispnea moderata]
Non edemi]
calo di 2 kg (55,8 kg)]
ha ridotto la terapia diuretica]
Lieve riduzione della terapia diuretica]
Lieve riduzione della terapia diuretica]
If there is a newline inside the tagged text, I should preserve it and write it out as well.
I am not 100% sure about the handling of the space between the tag (e.g. [SEGNO:) and the first word after it.
What would you recommend? Regular expressions? Is there a tutorial or document I can study?
Thank you very much.
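
A minimal sketch of the regular-expression approach asked about above, using only Python's standard `re` and `pathlib` modules. It rests on several assumptions that are not confirmed by the post: the only tag names are SINTOMO, SEGNO and TERAPIA; tagged text never contains a `]` of its own; the files are UTF-8 with a `.txt` extension; and the closing `]` counts as part of the tag. The sample output in the post keeps the closing bracket, so the hypothetical `keep_closing_bracket` flag reproduces that behaviour if it is what is wanted. The names `strip_tags` and `clean_directory` are made up for illustration.

```python
import re
from pathlib import Path

# Assumption: these three tag names are the only ones in the files.
# \s* swallows any whitespace between the colon and the first word,
# and [^\]]* matches everything up to the closing bracket, newlines included.
TAG_RE = re.compile(r"\[(?:SINTOMO|SEGNO|TERAPIA):\s*([^\]]*)\]")


def strip_tags(text, keep_closing_bracket=False):
    """Return text with the [TAG: ...] wrappers removed, keeping the content."""
    # \1 is the captured content; optionally put the closing bracket back.
    replacement = r"\1]" if keep_closing_bracket else r"\1"
    return TAG_RE.sub(replacement, text)


def clean_directory(src_dir, dst_dir, encoding="utf-8"):
    """Write a cleaned copy of every .txt file in src_dir into dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.txt")):
        # Read the whole file at once so tags spanning several lines still match.
        cleaned = strip_tags(path.read_text(encoding=encoding))
        (dst / path.name).write_text(cleaned, encoding=encoding)


if __name__ == "__main__":
    sample = "[SEGNO: edemi declivi >> a sinistra]\n[SINTOMO: non lamenta\ndispnea]"
    print(strip_tags(sample))
```

Reading each file as a single string, rather than line by line, is what lets a tag that spans several lines be matched while its internal newlines are written back unchanged. Text in brackets that does not start with one of the three tag names is left as it is.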