<div class="gmail_quote">On Wed, Jan 6, 2010 at 1:59 PM, Tim Chase <span dir="ltr">&lt;<a href="mailto:python.list@tim.thechases.com">python.list@tim.thechases.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div><div></div><div class="h5">Victor Subervi wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

On Wed, Jan 6, 2010 at 1:27 PM, Tim Chase &lt;<a href="mailto:python.list@tim.thechases.com" target="_blank">python.list@tim.thechases.com</a>&gt; wrote:<br>

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

But if you're using it on HTML form text, regexps are usually the wrong<br>

tool, and you should be using an HTML parser (such as BeautifulSoup) that<br>

knows how to handle odd text and escapings better and more robustly than<br>

regexps will.<br>

</blockquote>

<br>

I have an automatically generated HTML form from which I need to extract<br>

data to the script which this form calls (to which the information is sent).<br>

I believe BeautifulSoup is geared to scraping pages that exist permanently<br>

on the web. By the time BeautifulSoup was called, this page would be gone.<br>

</blockquote>

<br></div></div>

BeautifulSoup takes string data fed to it, and builds a structure that can be neatly navigated.  That string data can come from a web page, from a disk, or even a serial port, a random-character-generator, or just from HTML that's built up in memory and never sees a network or a disk.  It's worth reading its documentation[1] and trying its examples to get familiar with it.<br>
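<br>
A minimal sketch of that point, assuming Python 3: the HTML below exists only as a string in memory and is parsed without touching a network or a disk. To keep it free of third-party installs, it uses the standard library's html.parser as a stand-in rather than BeautifulSoup itself; BeautifulSoup is fed a string in the same spirit. The FormFieldCollector class and the sample form are made up for illustration.<br>

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Hypothetical helper: collect name/value pairs from <input> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; entity
        # references such as &amp; in values are already unescaped.
        if tag == "input":
            attr_map = dict(attrs)
            if "name" in attr_map:
                self.fields[attr_map["name"]] = attr_map.get("value")


# HTML built up in memory; it never sees a network or a disk.
html_text = (
    '<form action="/submit" method="post">'
    '<input type="text" name="title" value="Tom &amp; Jerry">'
    '<input type="hidden" name="id" value="42" />'
    '</form>'
)

collector = FormFieldCollector()
collector.feed(html_text)
collector.close()

fields = collector.fields
print(fields)
```

The parser handles the escaping (&amp;amp; becomes &amp;) and the self-closing tag, which is the kind of detail that hand-written regexps tend to get wrong.<br>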

</blockquote><div><br>k. Thanks.<br>beno<br>

</div></div>