[lxml-dev] Looking for performance tips for soupparser

Hi, first of all, I have to say that I really like soupparser. Thanks a lot for it. I use it a lot for data mining on a somewhat large document collection that I often revisit to try new ideas. Soupparser is fast, but I put a lot of strain on it, so I was looking for ways to speed things up. My first idea was to use beaker to cache the root Element object of every document to disk. Unfortunately, Element instances are not pickleable, so I have to look for something else. Would any of you have some tips to share on speeding things up with soupparser? How hard would it be to make elements conform to the pickling protocol?

--
Yannick Gingras
http://ygingras.net
http://confoo.ca -- track coordinator
http://montrealpython.org -- lead organizer
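A minimal sketch of the setup described above, assuming lxml.html.soupparser with BeautifulSoup installed; the input file name and the helper function are made up for illustration:

    import pickle
    from lxml.html import soupparser

    def parse_with_soup(path):
        # soupparser.fromstring() drives BeautifulSoup underneath to cope
        # with badly broken markup and hands back an lxml Element tree.
        with open(path, 'rb') as f:
            return soupparser.fromstring(f.read())

    root = parse_with_soup('some_page.html')  # hypothetical input file

    # Caching the parsed tree directly does not work: lxml Elements are
    # thin proxies over C-level libxml2 structures and do not implement
    # the pickle protocol, so this typically fails with a TypeError.
    try:
        pickle.dumps(root)
    except (TypeError, pickle.PicklingError) as exc:
        print("cannot pickle the root element:", exc)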

Yannick Gingras, 31.12.2009 17:11:
> Soupparser is fast but I put a lot of strain on it so I was looking
> for ways to speed things up.

Erm, no, not really. It uses BeautifulSoup as a parser backend, which really isn't that fast: http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/
I'd use the normal HTML parser instead, and only fall back to using the soupparser when things go really wrong (whatever that means in your case).

Another thing you can do (assuming that caching is helpful in your case) is to parse the documents using soupparser and serialise them into the cache. Then parse them from the cache using the normal HTML parser (preferably with "recover=False") when you need them. A serialise-parse cycle is several times faster than a new parser run of BeautifulSoup, so if you need the documents multiple times, this will speed things up.

Stefan
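A rough sketch of both suggestions, assuming lxml.html plus a BeautifulSoup install for the fallback; the cache layout and helper names are illustrative, and catching lxml's parse errors is just one possible reading of "things go really wrong":

    import os
    from lxml import etree
    import lxml.html
    from lxml.html import soupparser

    CACHE_DIR = 'parsed_cache'  # hypothetical cache location

    def parse_html(text):
        # Try the fast libxml2-based HTML parser first; only fall back to
        # the (much slower) BeautifulSoup backend when the parse fails
        # outright.  Other "went really wrong" heuristics are possible.
        try:
            return lxml.html.fromstring(text)
        except etree.LxmlError:
            return soupparser.fromstring(text)

    def load_document(doc_id, raw_text):
        path = os.path.join(CACHE_DIR, doc_id + '.html')
        if os.path.exists(path):
            # Cache hit: the serialised tree is already well-formed, so a
            # strict reparse (recover=False) is safe and much faster than
            # running BeautifulSoup again.
            parser = lxml.html.HTMLParser(recover=False)
            with open(path, 'rb') as f:
                return lxml.html.fromstring(f.read(), parser=parser)
        # Cache miss: parse once (soupparser only if needed) and store the
        # serialised result for the next run.
        root = parse_html(raw_text)
        if not os.path.isdir(CACHE_DIR):
            os.makedirs(CACHE_DIR)
        with open(path, 'wb') as f:
            f.write(lxml.html.tostring(root))
        return root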

On December 31, 2009, Stefan Behnel wrote:
I implemented both ideas and it resulted in at least a 10-fold speedup. Thanks a lot!

--
Yannick Gingras
http://ygingras.net
http://confoo.ca -- track coordinator
http://montrealpython.org -- lead organizer
