finditer

gintare g.statkute at gmail.com
Mon Jul 7 09:19:05 CEST 2014


If somebody has time, maybe you could advise me how to accomplish this task in a faster way.

I have a text = """ word{vb}
wordtransl {vb}

sent1.

sent1trans.

sent2

sent2trans... """

I need to match wordtransl once, and then match the repeating pattern consisting of sent and senttrans as many times as it occurs.




The way I achieved this goal is surely not the most efficient one:
import re

sw = 'word'  # I know the word in advance
linef = text  # the text shown above
stry = r'\s*' + re.escape(sw) + r'\s*.*\{vb\}\n+'
stry = stry + r'(?P<Wtrans>.*)\{vb\}\n+'
stryc = re.compile(stry, re.UNICODE)
LtryM = stryc.search(linef)  # here I find wordtransl

# Splitting on a pattern with one capturing group gives
# [text before, Wtrans, text after]; part[2] holds the repeating sent/senttrans lines.
part = stryc.split(linef)

stry2 = r'(?:'
stry2 = stry2 + r'\s*' + re.escape(sw) + r'\s*.*\{vb\}\n+'
stry2 = stry2 + r'(?P<Wtrans>.*)\{vb\}\n+'
stry2 = stry2 + r')*'
stry2 = stry2 + r'('
stry2 = stry2 + r'(?P<SVsent>.*)\n+'
stry2 = stry2 + r'(?P<SVtrans>.*)\n+'
stry2 = stry2 + r')'
stryc2 = re.compile(stry2, re.UNICODE)
LtryM = stryc2.finditer(part[2])  # here I find the text pieces consisting of sent and senttrans
for item in LtryM:
    stry3 = ''
    stry3 = stry3 + r'(?P<SVsent>.*)\n+'
    stry3 = stry3 + r'(?P<SVtrans>.*)\n+'
    stryc3 = re.compile(stry3, re.UNICODE)
    LtryM3 = stryc3.search(item.group())  # here I find sent and senttrans
    print(LtryM3.groupdict())
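
One possible way to cut this down, as a sketch only: search for the header once, then run a single finditer over the same string starting at the end of the header match, so neither re.split nor a third pattern inside the loop is needed. The function name parse_entry is made up for illustration. The sketch assumes that every sentence and every translation sits on one line, that blank lines only separate them, and that the last line may lack a trailing newline.

```python
import re


def parse_entry(text, word):
    """Return (word translation, list of (sentence, translation) pairs)."""
    # Header: the known word, its {vb} tag, then the translation line with its {vb} tag.
    header = re.compile(
        r'\s*' + re.escape(word) + r'\s*.*?\{vb\}\n+(?P<Wtrans>.*?)\s*\{vb\}\n+'
    )
    # One non-blank sentence line followed by one non-blank translation line.
    # The final alternative lets the last line end without a newline.
    pair = re.compile(r'(?P<SVsent>\S.*)\n+(?P<SVtrans>\S.*)(?:\n+|\Z)')

    m = header.search(text)
    if m is None:
        return None, []
    # Pattern.finditer accepts a start position, so the text is scanned
    # from the end of the header without being split or copied.
    pairs = [
        (p.group('SVsent').strip(), p.group('SVtrans').strip())
        for p in pair.finditer(text, m.end())
    ]
    return m.group('Wtrans'), pairs


if __name__ == '__main__':
    sample = (" word{vb}\nwordtransl {vb}\n\nsent1.\n\nsent1trans.\n\n"
              "sent2\n\nsent2trans... ")
    wtrans, pairs = parse_entry(sample, 'word')
    print(wtrans)
    for sent, trans in pairs:
        print(sent, '|', trans)
```

Both patterns are compiled once per call here; if many entries are parsed, the pair pattern could be compiled at module level and reused.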



More information about the Python-list mailing list