regular expression, help
vincent at vincentdavis.net
Tue Jan 27 18:45:29 CET 2009
Is BeautifulSoup really better? Since I don't know either one, I would prefer to
learn only one for now.
On Tue, Jan 27, 2009 at 10:39 AM, MRAB <google at mrabarnett.plus.com> wrote:
> Vincent Davis wrote:
>> I think there are two parts to this question, and I am sure there is a lot I am
>> missing. I am hoping an example will help me.
>> I have an HTML doc that I am trying to use regular expressions to get a
>> value out of.
>> Here is an example of the line:
>> <td colspan='2'>Parcel ID: 39-034-15-009 </td>
>> I want to get the number "39-034-15-009" after "Parcel ID:". The number
>> will be different each time but always in the same format.
>> I think I can match "Parcel ID:" but am not sure how to get the number after it.
>> "Parcel ID:" only occurs once in the document.
>> Is this how I need to start?
>> pid = re.compile('Parcel ID: ')
>> Basically I am completely lost and am not finding examples that I find helpful.
>> I am getting the HTML using myurl=urllib.urlopen(). Can I use a RE like this?
>> I think the two key things I need to know are:
>> 1. How do I get the text after a match?
>> 2. When I use myurl=urllib.urlopen(http://.......), can I use myurl
>> as the string in a RE, e.g. thenum=pid.match(myurl)?
> Something like:
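> import re
> import urllib
>
> pid = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')
> myurl = urllib.urlopen(url)  # url: address of the page (Python 2 API)
> text = myurl.read()  # read the page into a string before matching
> thenum = pid.search(text).group(1)  # group(1) is the number after "Parcel ID: "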
> Although BeautifulSoup is the preferred solution.
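
For reference, here is a minimal sketch of the two points asked about above (getting the text after a match, and turning the urlopen result into a string first). It reuses the pattern from the quoted reply. The helper names extract_parcel_id and read_text are made up for illustration, the UTF-8 encoding is an assumption about the page, and an in-memory buffer holding the sample line stands in for the real response so the snippet runs without a network connection.

```python
import io
import re

# Same pattern as in the quoted reply: digits, then any number of "-digits" groups.
PARCEL_ID = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')


def extract_parcel_id(text):
    """Return the number following 'Parcel ID: ', or None if it is absent."""
    match = PARCEL_ID.search(text)
    return match.group(1) if match else None


def read_text(response, encoding='utf-8'):
    """Read a file-like object (such as a urlopen result) into a str.

    The encoding is an assumption; the real page may declare another one.
    """
    data = response.read()
    # In Python 3 the response body is bytes, so decode before matching.
    return data.decode(encoding) if isinstance(data, bytes) else data


# Stand-in for the real page: the sample line from the question, as bytes.
sample = io.BytesIO(b"<td colspan='2'>Parcel ID: 39-034-15-009 </td>")
parcel_id = extract_parcel_id(read_text(sample))
print(parcel_id)  # prints 39-034-15-009
```

Checking for None before calling group(1) avoids an AttributeError when the page does not contain "Parcel ID:" at all. Note that search() is used rather than match(), since match() only succeeds at the very start of the string.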