[Tutor] simple regex question
Peter Otten
__peter__ at web.de
Sun May 1 13:46:46 EDT 2016
bruce wrote:
> Hi. I have a chunk of text code, which has multiple lines.
>
> I'd like to do a regex, find a pattern, and in the line that matches the
> pattern, mod the line. Sounds simple.
>
> I've created a test regex. However, after spending time/google.. can't
> quite figure out how to then get the "complete" line containing the
> returned regex/pattern.
>
> Pretty sure this is simple, and i'm just missing something.
>
> my test "text" and regex are:
>
>
> s='''
> <td valign="top" colspan="1"><b><a href="#"
> id='CourseId10795788|ACCT2081|002_005_006' style="font-weight:bold;"
> onclick='ShowSeats(this);return false;' alt="Click for Class Availability"
> title="Click for Class Availability">ACCT2081</a></b></td>'''
>
>
> pattern = re.compile(r'Course\S+|\S+\|')
> aa= pattern.search(s).group()
> print "sss"
> print aa
>
> so, once I get the group, I'd like to use the returned match to then get
> the complete line..
>
> pointers/thoughts!! (no laughing!!)
Are you sure you are processing text rather than structured data? HTML
doesn't have the notion of a "line". To extract information from HTML, tools
like Beautiful Soup are better suited than regular expressions:
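
import bs4
import re

s = ...

# Name the parser explicitly so the resulting tree does not depend on
# which optional parsers happen to be installed.
soup = bs4.BeautifulSoup(s, "html.parser")
for a in soup.find_all("a", id=re.compile(r"Course\S+\|\S+\|")):
    print(a["id"])
    print(a.text)
    print(a.parent.parent["colspan"])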
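
If the input really were line-oriented text, the literal question (get the
complete line that contains the match) could be sketched with the standard
library alone. This is only a minimal sketch: matching_lines is a
hypothetical helper name, not something from the original post, and the
sample string is copied from the question so the example runs on its own.

```python
import re

# Sample text from the question, repeated here so the sketch is self-contained.
s = '''
<td valign="top" colspan="1"><b><a href="#"
id='CourseId10795788|ACCT2081|002_005_006' style="font-weight:bold;"
onclick='ShowSeats(this);return false;' alt="Click for Class Availability"
title="Click for Class Availability">ACCT2081</a></b></td>'''

pattern = re.compile(r"Course\S+\|\S+\|")


def matching_lines(text, regex):
    """Hypothetical helper: return (line_number, line) for each matching line."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if regex.search(line)
    ]


for number, line in matching_lines(s, pattern):
    print("%d: %s" % (number, line))
```

Splitting first and searching each line avoids having to work backwards from
a match offset to the surrounding newline characters. It still treats the
markup as plain text, so it breaks as soon as an attribute is wrapped onto a
different line.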