html5lib not thread safe. Is the Python SAX library thread-safe?
Cameron Simpson
cs at zip.com.au
Sun Mar 11 17:45:01 EDT 2012
On 11Mar2012 13:30, John Nagle <nagle at animats.com> wrote:
| "html5lib" is apparently not thread safe.
| (see "http://code.google.com/p/html5lib/issues/detail?id=189")
| Looking at the code, I've only found about three problems.
| They're all the usual "cached in a global without locking" bug.
| A few locks would fix that.
|
| But html5lib calls the XML SAX parser. Is that thread-safe?
| Or is there more trouble down at the bottom?
|
| (I run a multi-threaded web crawler, and currently use BeautifulSoup,
| which is thread safe, although dated. I'm looking at converting to
| html5lib.)
IIRC, BeautifulSoup4 may do that for you:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#you-need-a-parser
"Beautiful Soup 4 uses html.parser by default, but you can plug in
lxml or html5lib and use that instead."
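On the "cached in a global without locking" bugs: the usual fix looks
roughly like the sketch below. The names _cache, _cache_lock and
get_cached are invented for illustration; this is not html5lib's code.

```python
import threading

# Hypothetical module-level cache shared by all threads.
_cache = {}
_cache_lock = threading.Lock()

def get_cached(key, build):
    ''' Return the cached value for key, calling build(key) on a miss.
    '''
    # Hold the lock across the check and the fill so that two threads
    # cannot both build and store a value for the same key.
    with _cache_lock:
        try:
            return _cache[key]
        except KeyError:
            value = _cache[key] = build(key)
            return value
```

This holds the lock while build() runs, which is simple but serialises
all cache misses.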
Just for interest, re locking, I wrote a little decorator the other day,
thus:
  @locked_property
  def foo(self):
      foo_value = ...  # compute foo here
      return foo_value
and am rolling its use out amongst my classes. Code:
  def locked_property(func, lock_name='_lock', prop_name=None, unset_object=None):
      ''' A property whose access is controlled by a lock if unset.
      '''
      if prop_name is None:
          # func.__name__ works in Python 2 and 3; func.func_name is Python 2 only.
          prop_name = '_' + func.__name__
      def getprop(self):
          ''' Attempt lockless fetch of property first.
              Use lock if property is unset.
          '''
          p = getattr(self, prop_name)
          if p is unset_object:
              with getattr(self, lock_name):
                  # Recheck under the lock: another thread may have set it.
                  p = getattr(self, prop_name)
                  if p is unset_object:
                      p = func(self)
                      setattr(self, prop_name, p)
          return p
      return property(getprop)
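A usage sketch, in Python 3 spelling. The class has to create both the
lock and the unset placeholder in __init__, because getprop looks both
up by name. Widget, its calls counter and demo() are made up for
illustration.

```python
import threading

def locked_property(func, lock_name='_lock', prop_name=None, unset_object=None):
    # Same decorator as above, repeated so this sketch runs on its own.
    if prop_name is None:
        prop_name = '_' + func.__name__
    def getprop(self):
        p = getattr(self, prop_name)
        if p is unset_object:
            with getattr(self, lock_name):
                p = getattr(self, prop_name)
                if p is unset_object:
                    p = func(self)
                    setattr(self, prop_name, p)
        return p
    return property(getprop)

class Widget(object):
    def __init__(self):
        self._lock = threading.Lock()   # found via lock_name
        self._foo = None                # unset placeholder, found via prop_name
        self.calls = 0                  # counts how often foo is computed

    @locked_property
    def foo(self):
        self.calls += 1
        return 42

def demo(nthreads=8):
    ''' Fetch w.foo from several threads; return (compute count, results).
    '''
    w = Widget()
    results = []
    def worker():
        results.append(w.foo)
    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w.calls, results
```

However the threads interleave, the recheck under the lock means foo is
computed once per Widget.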
It tries to be lockless in the common case. I suspect it is only safe in
CPython, where there is a GIL. If raw Python assignments and fetches can
overlap (e.g. Jython, I think?) I probably need a shared "read" lock around
the first "p = getattr(self, prop_name)". Any remarks?
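One conservative alternative, sketched on the assumption that an
uncontended lock acquire is cheap enough: drop the lockless fast path
and take the lock on every fetch. The name always_locked_property and
the Thing class are hypothetical. The standard threading module has no
shared/exclusive lock, so this uses a plain mutex, not a true "read"
lock.

```python
import threading

def always_locked_property(func, lock_name='_lock', prop_name=None, unset_object=None):
    ''' A property whose every fetch happens while holding the lock.
    '''
    if prop_name is None:
        prop_name = '_' + func.__name__
    def getprop(self):
        # No unlocked read at all, so nothing here relies on the GIL.
        with getattr(self, lock_name):
            p = getattr(self, prop_name)
            if p is unset_object:
                p = func(self)
                setattr(self, prop_name, p)
        return p
    return property(getprop)

class Thing(object):
    def __init__(self):
        self._lock = threading.Lock()
        self._bar = None
        self.calls = 0      # counts how often bar is computed

    @always_locked_property
    def bar(self):
        self.calls += 1
        return 'computed'
```

If computing one property reads another locked property on the same
object, a plain Lock would deadlock; a threading.RLock avoids that.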
Cheers,
--
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/
Ed Campbell's <ed at Tekelex.Com> pointers for long trips:
1. lay out the bare minimum of stuff that you need to take with you, then
put at least half of it back.