[Tutor] MemoryError

Kent Johnson kent37 at tds.net
Fri Dec 10 01:38:12 CET 2004


Liam,

Here's a nifty re trick for you. The sub() method can take a function as the replacement parameter.
Instead of replacing with a fixed string, sub() calls the function with each match object, and whatever
string the function returns is substituted for that match. So you can simplify your code a bit,
something like this:

def replaceTag(item):  # item is a match object
    # Same logic as the body of your loop
    # Will try and stick to string methods for this, but I'll see.
    text = gettextFunc(item.group())
    if not text:
        # Give the href a text value, so some lucky human can change it
        text = "Default"
    # The simpler the better, and so far re has been the simplest
    url = geturlFunc(item.group())
    if not url:
        # This will delete the applet, as there are applets acting as placeholders
        href = ""
    else:
        href = '<a href="%s">%s</a>' % (url, text)

    # Now return href
    return href

Now your loop and replacements are replaced by a single line:
codeSt = reObj.sub(replaceTag, codeSt)
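
For something you can run as-is, here is a minimal sketch of the same idea. The thread never shows reObj, gettextFunc or geturlFunc, so the pattern, the <param> layout and the helpers get_url() and get_text() below are all assumptions made up for illustration; substitute your own.

```python
import re

# Hypothetical stand-in for reObj: matches a whole <applet>...</applet> block.
APPLET_RE = re.compile(r"<applet\b.*?</applet>", re.IGNORECASE | re.DOTALL)

# Hypothetical stand-ins for geturlFunc / gettextFunc. They assume the link
# target and label live in <param name="url"> and <param name="text"> tags.
URL_RE = re.compile(r'<param\s+name="url"\s+value="([^"]*)"', re.IGNORECASE)
TEXT_RE = re.compile(r'<param\s+name="text"\s+value="([^"]*)"', re.IGNORECASE)


def get_url(tag):
    m = URL_RE.search(tag)
    return m.group(1) if m else ""


def get_text(tag):
    m = TEXT_RE.search(tag)
    return m.group(1) if m else ""


def replace_tag(match):
    """Called by sub() once per match; the return value replaces the match."""
    tag = match.group()
    text = get_text(tag) or "Default"
    url = get_url(tag)
    if not url:
        return ""  # no URL: treat the applet as a placeholder and drop it
    return '<a href="%s">%s</a>' % (url, text)


sample = (
    '<p>'
    '<applet code="x.class"><param name="url" value="a.html">'
    '<param name="text" value="Home"></applet>'
    '<applet code="y.class"></applet>'
    '<applet code="z.class"><param name="url" value="b.html"></applet>'
    '</p>'
)

result = APPLET_RE.sub(replace_tag, sample)
print(result)
```

Because sub() builds the new string in a single pass, there is no need to collect spans, reverse the list, or patch the text piece by piece.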

:-)

Kent


Liam Clarke wrote:
> Hi all, 
> 
> Yeah, I should've written this in functions from the get go, but I
> thought it would be a simple script. :/
> 
> I'll come back to that script when I've had some sleep, my son was
> recently born and it's amazing how dramatically lack of sleep affects
> my acuity. But, I want to figure out what's going wrong.
> 
> That said, the re path is bearing fruit. I love the method finditer(),
>  as I can reduce my overly complicated string methods from my original
> code to
> 
> x = open("toolkit.txt", 'r')
> s = x.read()
> x.close()
> appList = []
> 
> regExIter = reObj.finditer(s) # Here's a re obj I compiled earlier.
> 
> for item in regExIter:
>     # Will try and stick to string methods for this, but I'll see.
>     text = gettextFunc(item.group())
>     if not text:
>         # Give the href a text value, so some lucky human can change it
>         text = "Default"
>     # The simpler the better, and so far re has been the simplest
>     url = geturlFunc(item.group())
>     if not url:
>         # This will delete the applet, as there are applets acting as placeholders
>         href = ""
>     else:
>         href = '<a href="%s">%s</a>' % (url, text)
> 
>     appList.append((item.span(), href)) # append() takes one argument, so pass a tuple
> 
> appList.reverse()
> 
> for ((start, end), href) in appList:
>     # Splice by position; replace() would also change identical text elsewhere
>     codeSt = codeSt[:start] + href + codeSt[end:]
> 
> 
> Of course, that's just a rough draft, but it seems a whole lot
> simpler to me. S'pose code needs a modicum of planning.
> 
> Oh, and I d/led BeautifulSoup, but I couldn't work it right, so I
> tried re, and it suits my needs.
> 
> Thanks for all the help.
> 
> Regards,
> 
> Liam Clarke
> On Thu, 09 Dec 2004 11:53:46 -0800, Jeff Shannon <jeff at ccvcorp.com> wrote:
> 
>>Liam Clarke wrote:
>>
>>
>>>So, I'm going to throw caution to the wind, and try an re approach. It
>>>can't be any more unwieldy and ugly than what I've got going at the
>>>moment.
>>
>>If you're going to try a new approach, I'd strongly suggest using a
>>proper html/xml parser instead of re's.  You'll almost certainly have
>>an easier time using a tool that's designed for your specific problem
>>domain than you will trying to force a more general tool to work.
>>Since you're specifically trying to find (and replace) certain html
>>tags and attributes, and that's exactly what html parsers *do*, well,
>>the conclusion seems obvious (to me at least). ;)
>>
>>There are lots of html parsing tools available in Python (though I've
>>never needed one myself). I've heard lots of good things about
>>BeautifulSoup...
>>
>>
>>
>>Jeff Shannon
>>Technician/Programmer
>>Credit International
>>
>>_______________________________________________
>>Tutor maillist  -  Tutor at python.org
>>http://mail.python.org/mailman/listinfo/tutor
>>
> 
> 
> 

