SimplePrograms challenge
Steven Bethard
steven.bethard at gmail.com
Wed Jun 13 16:07:35 EDT 2007
Rob Wolfe wrote:
> Steven Bethard <steven.bethard at gmail.com> writes:
>
>>> I vote for example with ElementTree (without xpath)
>>> with a mention of using ElementSoup for invalid HTML.
>> Sounds good to me. Maybe something like::
>>
>> import xml.etree.ElementTree as etree
>> dinner_recipe = '''
>> <ingredients>
>> <ing><amt><qty>24</qty><unit>slices</unit></amt><item>baguette</item></ing>
>> <ing><amt><qty>2+</qty><unit>tbsp</unit></amt><item>olive_oil</item></ing>
> ^^^^^^^^^
> Is that a typo here?
Just trying to make Thunderbird line-wrap correctly. ;-) It's better
with a space instead of an underscore.
>> <ing><amt><qty>1</qty><unit>cup</unit></amt><item>tomatoes</item></ing>
>> <ing><amt><qty>1-2</qty><unit>tbsp</unit></amt><item>garlic</item></ing>
>> <ing><amt><qty>1/2</qty><unit>cup</unit></amt><item>Parmesan</item></ing>
>> <ing><amt><qty>1</qty><unit>jar</unit></amt><item>pesto</item></ing>
>> </ingredients>'''
>> pantry = set(['olive oil', 'pesto'])
>> tree = etree.fromstring(dinner_recipe)
>> for item_elem in tree.getiterator('item'):
>>     if item_elem.text not in pantry:
>>         print item_elem.text
>
> That's a nice example. :)
>
>> Though I wouldn't know where to put the ElementSoup link in this one...
>
> I had regular HTML in mind, something like:
>
> <code>
> # HTML page
> dinner_recipe = '''
> <html><head><title>Recipe</title></head><body>
> <table>
> <tr><th>amt</th><th>unit</th><th>item</th></tr>
> <tr><td>24</td><td>slices</td><td>baguette</td></tr>
> <tr><td>2+</td><td>tbsp</td><td>olive oil</td></tr>
> <tr><td>1</td><td>cup</td><td>tomatoes</td></tr>
> <tr><td>1-2</td><td>tbsp</td><td>garlic</td></tr>
> <tr><td>1/2</td><td>cup</td><td>Parmesan</td></tr>
> <tr><td>1</td><td>jar</td><td>pesto</td></tr>
> </table>
> </body></html>'''
>
> # program
> import xml.etree.ElementTree as etree
> tree = etree.fromstring(dinner_recipe)
>
> #import ElementSoup as etree # for invalid HTML
> #from cStringIO import StringIO # use this
> #tree = etree.parse(StringIO(dinner_recipe)) # wrapper for BeautifulSoup
>
> pantry = set(['olive oil', 'pesto'])
>
> for ingredient in tree.getiterator('tr'):
>     amt, unit, item = ingredient.getchildren()
>     if item.tag == "td" and item.text not in pantry:
>         print "%s: %s %s" % (item.text, amt.text, unit.text)
> </code>
>
> But if that's too complicated I will not insist on this. :)
> Your example is good enough.
Sure, that looks fine to me. =)
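
For anyone trying the table version on a newer Python, here is a minimal sketch, assuming Python 3.9 or later, where getiterator() and getchildren() are no longer available. It uses Element.iter() and plain iteration over the row instead. The helper name missing_ingredients is only illustrative and not part of the proposal, and "olive oil" is written with a space so that it matches the pantry entry.

```python
import xml.etree.ElementTree as etree

# Well-formed HTML, so the plain XML parser can handle it.
dinner_recipe = '''
<html><head><title>Recipe</title></head><body>
<table>
<tr><th>amt</th><th>unit</th><th>item</th></tr>
<tr><td>24</td><td>slices</td><td>baguette</td></tr>
<tr><td>2+</td><td>tbsp</td><td>olive oil</td></tr>
<tr><td>1</td><td>cup</td><td>tomatoes</td></tr>
<tr><td>1-2</td><td>tbsp</td><td>garlic</td></tr>
<tr><td>1/2</td><td>cup</td><td>Parmesan</td></tr>
<tr><td>1</td><td>jar</td><td>pesto</td></tr>
</table>
</body></html>'''

pantry = {'olive oil', 'pesto'}


def missing_ingredients(html, pantry):
    """Return (item, amt, unit) for each table row whose item is not in pantry.

    Hypothetical helper, named here only for illustration.
    """
    tree = etree.fromstring(html)
    missing = []
    for row in tree.iter('tr'):
        # Each row has exactly three cells: amount, unit, item.
        amt, unit, item = list(row)
        # The header row uses <th> cells, so the tag check skips it.
        if item.tag == 'td' and item.text not in pantry:
            missing.append((item.text, amt.text, unit.text))
    return missing


for item, amt, unit in missing_ingredients(dinner_recipe, pantry):
    print("%s: %s %s" % (item, amt, unit))
```

Collecting the rows in a list before printing keeps the parsing separate from the output, which makes the result easy to check by hand.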
Steve