Question concerning this list

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Sun Dec 31 05:30:05 EST 2006


In <mailman.2166.1167535289.32031.python-list at python.org>, Thomas Ploch
wrote:

> Alright, my prof said '... to process documents written in structural
> markup languages using regular expressions is a no-no.' (Because of
> nested Elements? Can't remember) So I think he wants us to use regexes
> to learn them. He is pointing to HTMLParser though.

The problem is that much of the HTML in the wild is written in a structured
markup language, but in many cases it is broken.  If you just search for some
words or patterns that appear somewhere in the documents, regular
expressions are good enough.  If you want to actually *parse* HTML "from
the wild", you had better use the BeautifulSoup_ parser, which is built to
cope with broken markup.

.. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
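Here is a minimal sketch of the difference. It assumes Python 3, where the standard-library module the professor pointed to is called `html.parser` (it was `HTMLParser` in Python 2). BeautifulSoup is left out only because it is a third-party package. The sample markup and the `LinkCollector` class are hypothetical. A regex is enough to find a word. The parser's callbacks still pull out the link targets even though the `<p>` and `<li>` tags are never closed and one attribute value is unquoted:

```python
import re
from html.parser import HTMLParser

# Hypothetical sloppy markup, typical of HTML "from the wild":
# unclosed <p> and <li> tags and an unquoted attribute value.
html = """<html><body>
<p>First paragraph with a <a href=one.html>link</a>
<ul><li>item <a href="two.html">two</a><li>item three
</body></html>"""

# Just searching for a word: a regular expression is good enough.
words = re.findall(r"\bitem\b", html)


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed(html)
collector.close()

print(words)            # ['item', 'item']
print(collector.links)  # ['one.html', 'two.html']
```

`HTMLParser` only reports tags as it meets them. It does not build a tree or repair the nesting, and that repair work is where BeautifulSoup earns its keep.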

> You are probably right. For me it boils down to these problems:
> - Implementing a stack for large queues of documents which is faster
> than list.pop(index) (Is there a lib for this?)

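If you need a queue, then use one: take a look at `collections.deque` or
the `Queue` module in the standard library.  `deque.popleft()` removes the
first element in constant time, whereas `list.pop(0)` has to shift every
remaining element.  `Queue` adds the locking needed when several threads
share one queue.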
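Here is a sketch of both options. It assumes Python 3, where the `Queue` module is spelled `queue`. The document names and the `worker` function are hypothetical. `deque` serves as a plain FIFO within one thread, and `queue.Queue` hands the same documents to a worker thread. If a LIFO stack is really what is wanted, a plain list already handles that cheaply with `append()` and `pop()` without an index, because both operate at the end of the list.

```python
import queue       # the "Queue" module in Python 2
import threading
from collections import deque

# Single-threaded FIFO: deque appends and pops at either end in O(1),
# whereas list.pop(0) shifts every remaining element (O(n) per call).
pending = deque(["doc0.html", "doc1.html", "doc2.html"])  # hypothetical names
pending.append("doc3.html")   # enqueue on the right
first = pending.popleft()     # dequeue from the left -> "doc0.html"

# Multi-threaded FIFO: queue.Queue does its own locking, so producer
# and consumer threads can share it safely.
work = queue.Queue()
processed = []


def worker():
    while True:
        doc = work.get()
        if doc is None:        # sentinel: no more work
            work.task_done()
            break
        processed.append(doc)  # stand-in for real document processing
        work.task_done()


t = threading.Thread(target=worker)
t.start()
for doc in pending:
    work.put(doc)
work.put(None)
work.join()  # blocks until every item has been marked task_done()
t.join()

print(first)      # doc0.html
print(processed)  # ['doc1.html', 'doc2.html', 'doc3.html']
```

The `None` sentinel tells the worker to stop. With several worker threads you would put one sentinel per thread.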

Ciao,
	Marc 'BlackJack' Rintsch


