[Tutor] Titles from a web page
Michiel Overtoom
motoom at xs4all.nl
Thu May 5 07:27:11 CEST 2011
On May 5, 2011, at 07:16, James Mills wrote:
> On Thu, May 5, 2011 at 1:52 PM, Modulok <modulok at gmail.com> wrote:
>> You might look into the third party module, 'BeautifulSoup'. It's designed to
>> help you interrogate markup (even poor markup), extracting nuggets of data based
>> on various criteria.
>
> lxml is also worth looking into; it provides similar functionality.
For especially broken markup you might even consider version 3.0.7a of BeautifulSoup. The parser in later versions became slightly less forgiving.
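
If all you need is the title and the page is not too mangled, a rough sketch using only the standard library's html.parser (so neither BeautifulSoup nor lxml) might look like the following. The names TitleParser and extract_title and the sample markup are made up for illustration, and this uses Python 3 syntax. Treat it as a starting point, not as a replacement for the libraries mentioned above.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # True while we are between <title> and </title>
        self.done = False       # True once the first title has been closed
        self.parts = []         # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <TITLE> is matched as well.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the text of the first <title>, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Deliberately sloppy markup: <head>, <p>, <b> and <html> are never closed.
sample = "<html><head><title>Example page</title><body><p>Unclosed paragraph<b>bold"
print(extract_title(sample))
```

The unclosed tags after the title do not matter here, because the parser only tracks whether it is currently inside a title element. For pages where the title element itself is damaged, a more forgiving parser such as the ones suggested in this thread is the better choice.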
Greetings,
--
"Control over the use of one's ideas really constitutes control over other people's lives; and it is usually used to make their lives more difficult." - Richard Stallman