[Tutor] MemoryError
Kent Johnson
kent37 at tds.net
Fri Dec 10 01:38:12 CET 2004
Liam,
Here's a nifty re trick for you. The sub() method can take a function as the replacement parameter.
Instead of replacing with a fixed string, the function is called with the match object. Whatever
string the function returns is substituted for the match. So you can simplify your code a bit,
something like this:
def replaceTag(item):  # item is a match object
    # This is exactly your code
    # Will try and stick to string method for this, but I'll see.
    text = gettextFunc(item.group())
    if not text:
        # Will give a text value for the href, so some lucky human can change it
        text = "Default"
    # The simpler the better, and so far re has been the simplest
    url = geturlFunc(item.group())
    if not url:
        # This will delete the applet, as there are applets acting as placeholders
        href = ""
    else:
        href = '<a href="%s">%s</a>' % (url, text)
    # Now return href
    return href
Now your loop and replacements get replaced by the single line

    codeSt = reObj.sub(replaceTag, codeSt)
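
For something you can run as-is, here is a minimal sketch of the same idea. Everything in it is an assumption made for illustration: the pattern APPLET_RE, the helpers get_text and get_url (standing in for your reObj, gettextFunc and geturlFunc), and the applet markup in the sample string. Your real page will probably need different patterns.

```python
import re

# Hypothetical pattern: one whole <applet ...>...</applet> element.
APPLET_RE = re.compile(r'<applet\b.*?</applet>', re.DOTALL | re.IGNORECASE)

# Hypothetical helpers; they assume the link text and URL live in <param> tags.
TEXT_RE = re.compile(r'name="text"\s+value="([^"]*)"')
URL_RE = re.compile(r'name="url"\s+value="([^"]*)"')

def get_text(tag):
    m = TEXT_RE.search(tag)
    return m.group(1) if m else ""

def get_url(tag):
    m = URL_RE.search(tag)
    return m.group(1) if m else ""

def replace_tag(match):
    # sub() calls this once per match and splices in whatever it returns.
    tag = match.group()
    text = get_text(tag) or "Default"
    url = get_url(tag)
    if not url:
        return ""  # placeholder applet: delete it
    return '<a href="%s">%s</a>' % (url, text)

sample = (
    'before '
    '<applet code="x.class"><param name="text" value="Home">'
    '<param name="url" value="index.html"></applet>'
    ' middle '
    '<applet code="y.class"></applet>'
    ' after'
)

result = APPLET_RE.sub(replace_tag, sample)
print(result)
```

Because sub() does the splicing itself, there is no need to collect spans or work backwards through the string.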
:-)
Kent
Liam Clarke wrote:
> Hi all,
>
> Yeah, I should've written this in functions from the get go, but I
> thought it would be a simple script. :/
>
> I'll come back to that script when I've had some sleep, my son was
> recently born and it's amazing how dramatically lack of sleep affects
> my acuity. But, I want to figure out what's going wrong.
>
> That said, the re path is bearing fruit. I love the method finditer(),
> as I can reduce my overly complicated string methods from my original
> code to
>
> x = open("toolkit.txt", 'r')
> s = x.read()
> x.close()
> appList = []
>
> regExIter = reObj.finditer(s)  # Here's a re obj I compiled earlier.
>
> for item in regExIter:
>     # Will try and stick to string method for this, but I'll see.
>     text = gettextFunc(item.group())
>     if not text:
>         # Will give a text value for the href, so some lucky human can change it
>         text = "Default"
>     # The simpler the better, and so far re has been the simplest
>     url = geturlFunc(item.group())
>     if not url:
>         # This will delete the applet, as there are applets acting as placeholders
>         href = ""
>     else:
>         href = '<a href="%s">%s</a>' % (url, text)
>
>     appList.append((item.span(), href))
>
> appList.reverse()
>
> for ((start, end), href) in appList:
>     codeSt = codeSt[:start] + href + codeSt[end:]
>
>
> Of course, that's just a rough draft, but it seems a whole lot
> simpler to me. S'pose code needs a modicum of planning.
>
> Oh, and I d/led BeautifulSoup, but I couldn't work it right, so I
> tried re, and it suits my needs.
>
> Thanks for all the help.
>
> Regards,
>
> Liam Clarke
> On Thu, 09 Dec 2004 11:53:46 -0800, Jeff Shannon <jeff at ccvcorp.com> wrote:
>
>>Liam Clarke wrote:
>>
>>
>>>So, I'm going to throw caution to the wind, and try an re approach. It
>>>can't be any more unwieldy and ugly than what I've got going at the
>>>moment.
>>
>>If you're going to try a new approach, I'd strongly suggest using a
>>proper html/xml parser instead of re's. You'll almost certainly have
>>an easier time using a tool that's designed for your specific problem
>>domain than you will trying to force a more general tool to work.
>>Since you're specifically trying to find (and replace) certain html
>>tags and attributes, and that's exactly what html parsers *do*, well,
>>the conclusion seems obvious (to me at least). ;)
>>
>>There are lots of html parsing tools available in Python (though I've
>>never needed one myself). I've heard lots of good things about
>>BeautifulSoup...
>>
>>
>>
>>Jeff Shannon
>>Technician/Programmer
>>Credit International
>>