regular expression, help

Vincent Davis vincent at vincentdavis.net
Tue Jan 27 18:45:29 CET 2009


Is BeautifulSoup really better? Since I don't know either one, I would prefer
to learn only one for now.
Thanks,
Vincent Davis



On Tue, Jan 27, 2009 at 10:39 AM, MRAB <google at mrabarnett.plus.com> wrote:

> Vincent Davis wrote:
>
>> I think there are two parts to this question, and I am sure there is a lot
>> I am missing. I am hoping an example will help me.
>> I have an HTML doc that I am trying to use regular expressions to get a
>> value out of.
>> Here is an example of the line:
>> <td colspan='2'>Parcel ID: 39-034-15-009 </td>
>> I want to get the number "39-034-15-009" after "Parcel ID:". The number
>> will be different each time but always in the same format.
>> I think I can match "Parcel ID:" but I am not sure how to get the number
>> after it. "Parcel ID:" only occurs once in the document.
>>
>> Is this how I need to start?
>> pid = re.compile('Parcel ID: ')
>>
>> Basically I am completely lost and am not finding helpful examples.
>>
>> I am getting the HTML using myurl=urllib.urlopen(). Can I use an RE like
>> this: thenum=pid.match(myurl)
>>
>> I think the two key things I need to know are:
>> 1. How do I get the text after a match?
>> 2. When I use myurl=urllib.urlopen(http://.......), can I use myurl
>> as the string in an RE, e.g. thenum=pid.match(myurl)?
>>
> Something like:
>
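> import re
> import urllib
>
> # Capture the hyphen-separated digit groups after "Parcel ID: ".
> pid = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')
> myurl = urllib.urlopen(url)  # Python 2 API, as used in this thread
> text = myurl.read()
> myurl.close()
> thenum = pid.search(text).group(1)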
>
> Although BeautifulSoup is the preferred solution.
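
For readers on Python 3, the reply above could be sketched roughly as follows.
This is only an illustration under some assumptions: urllib.request.urlopen is
the Python 3 location of the urllib.urlopen call used in the thread, the names
extract_parcel_id and fetch_parcel_id are hypothetical helpers that do not
appear in the original messages, and UTF-8 is assumed as the page encoding.
It also touches both questions: the text after the match comes from group(1),
and the object returned by urlopen has to be read into a string before a
pattern can be applied to it.

```python
import re
from urllib.request import urlopen  # Python 3 home of the thread's urllib.urlopen

# Same pattern as in the reply above: digit groups joined by hyphens.
PARCEL_ID = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')


def extract_parcel_id(html):
    """Return the number following 'Parcel ID: ', or None if it is absent."""
    # search() scans the whole string; match() would only test its start.
    match = PARCEL_ID.search(html)
    return match.group(1) if match else None


def fetch_parcel_id(url, encoding='utf-8'):
    """Download a page and extract the parcel number (hypothetical helper)."""
    with urlopen(url) as response:
        # read() returns bytes, so decode before applying the str pattern.
        html = response.read().decode(encoding, errors='replace')
    return extract_parcel_id(html)


# The sample line from the original question, used here instead of a live URL.
sample = "<td colspan='2'>Parcel ID: 39-034-15-009 </td>"
print(extract_parcel_id(sample))
```

Run against the sample line from the question, this prints 39-034-15-009.
Returning None when nothing matches avoids the AttributeError that
pid.search(text).group(1) would raise on a page without a parcel ID.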

