How to find <tag> to </tag> HTML strings and 'save' them?
mark at agtechnical.co.uk
Mon Mar 26 02:23:54 CEST 2007
Yep, I agree! Once I've got this done I'll be back to trawling the
Life never gives you the convenience of learning something fully
before having to apply what you have learnt ;]
Thanks for the feedback and links, I'll be sure to check those out.
On Mar 26, 12:05 am, "Gabriel Genellina" <gagsl-... at yahoo.com.ar> wrote:
> On Sun, 25 Mar 2007 19:44:17 -0300, <m... at agtechnical.co.uk> wrote:
> > from BeautifulSoup import BeautifulSoup
> > import re
> > page = open("soup_test/tomatoandcream.html", 'r')
> > soup = BeautifulSoup(page)
> > myTagSearch = str(soup.findAll('h2'))
> > myFile = open('Soup_Results.html', 'w')
> > myFile.write(myTagSearch)
> > myFile.close()
> > del myTagSearch
> > ...............................
> > Firstly, I'm getting the following character: "[" at the start, "]" at
> > the end of the code. Along with "," in between each tag line listing.
> > This seems like normal behaviour but I can't find the way to strip
> > them out.
> findAll() returns a list. You convert the list to its string
> representation, using str(...), and that's what lists look like: with
> [] around, and commas separating elements. If you don't like that, don't
> use str(some_list).
> Do you want one item per line? Use "\n".join(str(tag) for tag in myTagSearch)
> (remember to remove the str() around findAll; join() needs string items)
> Do you want comma-separated items? Use ",".join(str(tag) for tag in myTagSearch)
> Read about lists here http://docs.python.org/lib/typesseq.html and strings
> For the remaining questions, I strongly suggest reading the Python
> Tutorial (or any other book like Dive into Python). You should grasp some
> basic knowledge of the language at least, before trying to use other tools
> like BeautifulSoup; it's too much for a single step.
> Gabriel Genellina
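
To illustrate the difference described above between calling str() on a list and joining its items, here is a minimal sketch. It does not use BeautifulSoup itself: FakeTag is a hypothetical stand-in for the Tag objects that findAll() returns, and the sample markup is made up. Since str.join() only accepts strings, each element is passed through str() first.

```python
# Hypothetical stand-in for a BeautifulSoup Tag. It only mimics the fact
# that a tag turns into its markup when passed to str() or repr().
class FakeTag:
    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup

    # A list's string form uses repr() of each element.
    __repr__ = __str__


# Made-up sample data playing the role of soup.findAll('h2').
tags = [FakeTag("<h2>Tomato</h2>"), FakeTag("<h2>Cream</h2>")]

# str() on the whole list gives the list's own representation,
# brackets and commas included.
as_list_text = str(tags)

# Converting each element and joining gives one item per line.
one_per_line = "\n".join(str(tag) for tag in tags)

# The same idea with a comma as the separator.
comma_separated = ",".join(str(tag) for tag in tags)

print(as_list_text)
print(one_per_line)
print(comma_separated)
```

In the original script, a string built like one_per_line could be handed to myFile.write() in place of str(soup.findAll('h2')).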
More information about the Python-list