Program inefficiency?
thebjorn
BjornSteinarFjeldPettersen at gmail.com
Sat Sep 29 15:05:26 EDT 2007
On Sep 29, 8:32 pm, hall.j... at gmail.com wrote:
> I think he's saying it should look like this:
>
> # File: masseditor.py
>
> import re
> import os
> import time
>
> p1= re.compile('(href=|HREF=)+(.*)(#)+(.*)(\w\'\?-<:)+(.*)(">)+')
> p2= re.compile('(name=")+(.*)(\w\'\?-<:)+(.*)(">)+')
> p100= re.compile('(a name=)+(.*)(-)+(.*)(></a>)+')
> q1= r"\1\2\3\4_\6\7"
> q2= r"\1\2_\4\5"
>
> def massreplace():
>     editfile = open("C:\Program Files\Credit Risk Management\Masseditor\editfile.txt")
>     filestring = editfile.read()
>     filelist = filestring.splitlines()
>
>     for i in range(len(filelist)):
>         source = open(filelist[i])
>         starttext = source.read()
>
>         for i in range (13):
>             interimtext = p1.sub(q1, starttext)
>             interimtext= p2.sub(q2, interimtext)
>             interimtext= p100.sub(q2, interimtext)
>         source.close()
>         source = open(filelist[i],"w")
>         source.write(finaltext)
>         source.close()
>
> massreplace()
>
> I'll try that and see how it works...
Ok, if you want a single RE... How about:
test = '''
<a href="Web_Sites.htm#A Web Sites">
<a name="A Web Sites"></a>
<a
href="Web_Sites.htm#A Web Sites">
<a
name="A Web Sites"></a>
<a HREF="Web_Sites.htm#A Web                             Sites">
<a name=Quoteless></a>
<a name = "oo ps"></a>
'''
import re

r = re.compile(r'''
      (?:href=['"][^#]+[#]([^"']+)["'])
    | (?:name=['"]?([^'">]+))
''', re.IGNORECASE | re.MULTILINE | re.DOTALL | re.VERBOSE)

def zap_space(m):
    # Replace every space in the matched attribute text with an underscore.
    return m.group(0).replace(' ', '_')

print(r.sub(zap_space, test))
It prints out:
<a href="Web_Sites.htm#A_Web_Sites">
<a name="A_Web_Sites"></a>
<a
href="Web_Sites.htm#A_Web_Sites">
<a
name="A_Web_Sites"></a>
<a HREF="Web_Sites.htm#A_Web_____________________________Sites">
<a name=Quoteless></a>
<a name = "oo ps"></a>
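
The last test line (name = "oo ps") comes through unchanged because the pattern does not allow whitespace around the "=". Here is a minimal sketch of one way to cover that case. It assumes whitespace around "=" should be tolerated and that only the quoted value should be rewritten, not the whole match. The name ANCHOR_RE and the extra "plain page.htm" test line are made up for illustration. The code uses print() so that it runs on Python 3, and it is not equivalent to the pattern above in every corner case.

```python
import re

# Hypothetical variant of the pattern above: the \s* around '=' is an
# added assumption so that  name = "oo ps"  is matched as well.
ANCHOR_RE = re.compile(r'''
    \b(href|name)      # group 1: attribute name
    (\s*=\s*["'])      # group 2: '=', optional whitespace, opening quote
    ([^"']+)           # group 3: attribute value up to the closing quote
''', re.IGNORECASE | re.VERBOSE)

def zap_space(m):
    attr, eq, value = m.groups()
    # Leave href values without a '#fragment' untouched.
    if attr.lower() == 'href' and '#' not in value:
        return m.group(0)
    # Only the value is rewritten; the spaces around '=' are kept.
    return attr + eq + value.replace(' ', '_')

test = '''
<a href="Web_Sites.htm#A Web Sites">
<a name="A Web Sites"></a>
<a name=Quoteless></a>
<a name = "oo ps"></a>
<a href="plain page.htm">
'''

result = ANCHOR_RE.sub(zap_space, test)
print(result)
```

Under those assumptions the "oo ps" anchor becomes name = "oo_ps", while name=Quoteless and the href without a "#" are left alone.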
-- bjorn