html5lib not thread safe. Is the Python SAX library thread-safe?

John Nagle nagle at
Sun Mar 11 21:30:58 CET 2012

    "html5lib" is apparently not thread safe.
(see "")
Looking at the code, I've found only about three problems, and
they're all the usual "cached in a global without locking" bug.
A few locks would fix that.
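
A minimal sketch of that kind of fix, assuming a hypothetical module-level
cache. The names `_cache`, `_cache_lock`, and `get_cached` are illustrative
only and are not taken from html5lib:

```python
import threading

_cache = {}                     # hypothetical module-level cache shared by all threads
_cache_lock = threading.Lock()  # guards every read and write of _cache


def get_cached(key, build):
    """Return the cached value for key, calling build(key) at most once."""
    with _cache_lock:
        # The check and the insert happen under the same lock, so two
        # threads cannot both see the key as missing and build it twice.
        if key not in _cache:
            _cache[key] = build(key)
        return _cache[key]
```

Holding the lock while `build` runs serializes cache misses, which is
usually acceptable when the cached values are cheap to compute.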

    But html5lib calls the XML SAX parser. Is that thread-safe?
Or is there more trouble down at the bottom?
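
Whatever the answer, one conservative pattern is to avoid sharing parser or
handler objects between threads. The sketch below uses only the standard
`xml.sax` module. `ElementCounter` and `count_elements` are hypothetical
names, and this does not show whether the underlying parser is thread-safe;
it only keeps per-parse state local to each call:

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts start tags; each instance holds state for one parse only."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(xml_bytes):
    # A new handler per call, and parseString() creates its own parser,
    # so no parser or handler object is shared between threads.
    handler = ElementCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.count
```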

(I run a multi-threaded web crawler, and currently use BeautifulSoup,
which is thread safe, although dated.  I'm looking at converting to

				John Nagle

More information about the Python-list mailing list