
On Aug 22, 2004, at 3:28 PM, angryhicKclown@netscape.net wrote:
I was looking over the page on twistedmatrix.com on contributing, and it referred me to here. Over at the Mono project, they have a todo-list sort of thing that idle hackers such as myself can work on. I was wondering what the best way (besides monetary...I am a poor student) to contribute to the Twisted project is?
Welllll, since you ask.. :) Here's a relatively self-contained project that could use working on:

twisted.web.microdom and twisted.web.sux are supposed to implement an XML/XHTML and HTML parser. It is pretty useless as an XML parser, given its relative slowness and the existence of expat/Python XML libraries, which already do a very good job of being an XML parser. Microdom is *almost* a useful HTML parser, but it's missing support for a lot of HTML peculiarities that really need to be handled ("<tr><td>foo<tr><td>bar" for one, strange whitespace collapsing rules for another, and I'm sure there are more).

Perl has a very good HTML parser in HTML::TreeParser whose algorithms could be duplicated.

This project isn't even very Twisted-specific (sux/microdom only have very minor dependencies on the rest of Twisted), so it could conceivably be made into a general-purpose Python module in its own right. There are a variety of other Python HTML parsers, but from what I can tell, they're even worse than microdom is.

It'd be way cool to have a Python HTML parser that actually works. Can't let Perl win!

Any victi...volunteers? ;0

James
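
To make the "<tr><td>foo<tr><td>bar" case concrete, here is a minimal sketch of the kind of rule a forgiving HTML parser needs: opening a new row or cell implicitly closes the one still open. It is built on the standard library's html.parser rather than on sux/microdom, and the names IMPLIED_END, SCOPE_BOUNDARY, Node, ForgivingBuilder and normalize are made up for this illustration. The rule table is an assumption covering only table rows and cells; real HTML has many more such cases.

```python
from html.parser import HTMLParser

# Assumed, deliberately tiny rule table: opening the key tag implicitly
# closes any open element whose tag is in the associated set.
IMPLIED_END = {
    "tr": {"tr", "td", "th"},
    "td": {"td", "th"},
    "th": {"td", "th"},
}

# The search for elements to close stops here, so a nested table does not
# close cells belonging to the outer table.
SCOPE_BOUNDARY = {"table"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []  # child Nodes and plain text strings

    def render(self):
        inner = "".join(
            c.render() if isinstance(c, Node) else c for c in self.children
        )
        return "<%s>%s</%s>" % (self.tag, inner, self.tag)


class ForgivingBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]  # currently open elements, root first

    def handle_starttag(self, tag, attrs):
        closes = IMPLIED_END.get(tag, ())
        cut = None
        # Walk from the innermost open element outward (never past root),
        # remembering the outermost element this start tag should close.
        for i in range(len(self.stack) - 1, 0, -1):
            open_tag = self.stack[i].tag
            if open_tag in SCOPE_BOUNDARY:
                break
            if open_tag in closes:
                cut = i
        if cut is not None:
            del self.stack[cut:]
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; a stray end
        # tag with no match is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def normalize(html):
    """Parse sloppy HTML and return it with the implied end tags written out."""
    builder = ForgivingBuilder()
    builder.feed(html)
    builder.close()  # flushes any text still buffered at end of input
    return "".join(
        c.render() if isinstance(c, Node) else c for c in builder.root.children
    )


print(normalize("<tr><td>foo<tr><td>bar"))
# -> <tr><td>foo</td></tr><tr><td>bar</td></tr>
```

This ignores attributes, void elements such as <br>, and the whitespace collapsing rules mentioned above, so treat it as a starting point for discussion and not as a design.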