[Tutor] simple regex question

Peter Otten __peter__ at web.de
Sun May 1 13:46:46 EDT 2016


bruce wrote:

> Hi. I have a chunk of text code, which has multiple lines.
> 
> I'd like to do a regex, find a pattern, and in the line that matches the
> pattern, mod the line. Sounds simple.
> 
> I've created a test regex. However, after spending time/google.. can't
> quite figure out how to then get the "complete" line containing the
> returned regex/pattern.
> 
> Pretty sure this is simple, and i'm just missing something.
> 
> my test "text" and regex are:
> 
> 
>   s='''
> <td valign="top" colspan="1"><b><a href="#"
> id='CourseId10795788|ACCT2081|002_005_006' style="font-weight:bold;"
> onclick='ShowSeats(this);return false;' alt="Click for Class Availability"
> title="Click for Class Availability">ACCT2081</a></b></td>'''
> 
> 
>   pattern = re.compile(r'Course\S+|\S+\|')
>   aa= pattern.search(s).group()
>   print "sss"
>   print aa
> 
> so, once I get the group, I'd like to use the returned match to then get
> the complete line..
> 
> pointers/thoughts!! (no laughing!!)

Are you sure you are processing text rather than structured data? HTML 
doesn't have the notion of a "line". To extract information from HTML, tools 
like Beautiful Soup are better suited than regular expressions:

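import re

import bs4

s = ...  # the HTML text from the question
soup = bs4.BeautifulSoup(s, "html.parser")  # name the parser explicitly
# Find every <a> tag whose id attribute looks like "Course...|...|"
for a in soup.find_all("a", id=re.compile(r"Course\S+\|\S+\|")):
    print(a["id"])                     # the complete id attribute
    print(a.text)                      # the link text
    print(a.parent.parent["colspan"])  # attribute of the enclosing <td>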
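
If the input really is plain text and you do want the complete line that
contains the match, one possible approach is to look for the newline on
either side of the match. This is only a minimal sketch: line_of_match is a
made-up helper name, and it assumes lines are separated by "\n" and that the
match does not itself span several lines.

```python
import re

# Sample text taken from the question above.
s = '''
<td valign="top" colspan="1"><b><a href="#"
id='CourseId10795788|ACCT2081|002_005_006' style="font-weight:bold;"
onclick='ShowSeats(this);return false;' alt="Click for Class Availability"
title="Click for Class Availability">ACCT2081</a></b></td>'''

pattern = re.compile(r"Course\S+\|\S+\|")


def line_of_match(text, match):
    """Return the complete line of text in which the match was found.

    Hypothetical helper for illustration; assumes "\n" line endings.
    """
    # rfind() returns -1 when there is no earlier newline, so start becomes 0.
    start = text.rfind("\n", 0, match.start()) + 1
    end = text.find("\n", match.end())
    if end == -1:
        # The match is on the last line.
        end = len(text)
    return text[start:end]


match = pattern.search(s)
if match is not None:
    full_line = line_of_match(s, match)
    print(match.group())
    print(full_line)
```

Once you have the start and end offsets you could also build a modified
string by slicing around them, but for HTML the Beautiful Soup route is
likely to be more robust.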
