[Twisted-Python] Contributing?
I was looking over the page on twistedmatrix.com on contributing, and it referred me here. Over at the Mono project, they have a todo-list sort of thing that idle hackers such as myself can work on. I was wondering what the best way to contribute to the Twisted project is (besides monetarily... I am a poor student)? Thanks, Peter Hunt
angryhicKclown@netscape.net wrote:
I was looking over the page on twistedmatrix.com on contributing, and it referred me to here. Over at the mono project, they have a todo-list sort of thing, that idle hackers such as myself can work on. I was wondering what the best way (besides monetary...I am a poor student) to contribute to the Twisted project is?
Thanks,
In April, I gave a simple summary of the state of various protocol-level parts of Twisted: http://twistedmatrix.com/pipermail/twisted-python/2004-April/007641.html One task might be to turn this into a real todo list :) One could also look at the modules which scored particularly low and try to improve them. Jp
On Aug 22, 2004, at 3:28 PM, angryhicKclown@netscape.net wrote:
Welllll, since you ask... :) Here's a relatively self-contained project that could use some work: twisted.web.microdom and twisted.web.sux are supposed to implement an XML/XHTML and HTML parser. It is pretty useless as an XML parser, given its relative slowness and the existence of expat and the Python XML libraries, which already do a very good job of parsing XML. Microdom is *almost* a useful HTML parser, but it's missing support for a lot of HTML peculiarities that really need to be handled (implied end tags, as in "<tr><td>foo<tr><td>bar", for one; strange whitespace-collapsing rules, for another; and I'm sure there are more). Perl has a very good HTML parser in HTML::TreeBuilder, whose algorithms could be duplicated. This project isn't even very Twisted-specific (sux/microdom have only very minor dependencies on the rest of Twisted), so it could conceivably be made into a general-purpose Python module in its own right. There are a variety of other Python HTML parsers, but from what I can tell, they're even worse than microdom. It'd be way cool to have a Python HTML parser that actually works. Can't let Perl win! Any victi...volunteers? ;0 James
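To make the "<tr><td>foo<tr><td>bar" case concrete, here is a minimal sketch of implied-end-tag handling built on the standard library's html.parser. It is not microdom's code and not the HTML::TreeBuilder algorithm; the names `IMPLIED_CLOSE`, `Node`, `LenientTreeBuilder`, and `parse` are hypothetical, the closing table covers only table rows and cells, and whitespace collapsing is not addressed at all.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, deliberately incomplete table: for each start tag, the open
# elements that it implicitly closes. A real parser needs many more entries.
IMPLIED_CLOSE = {
    "tr": {"tr", "td", "th"},
    "td": {"td", "th"},
    "th": {"td", "th"},
}


class Node:
    """A tiny element node: a tag name, a parent, and mixed children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # Node objects and plain text strings

    def to_xml(self):
        # Serialize with every end tag written out explicitly.
        inner = "".join(
            escape(c) if isinstance(c, str) else c.to_xml()
            for c in self.children
        )
        return f"<{self.tag}>{inner}</{self.tag}>"


class LenientTreeBuilder(HTMLParser):
    """Builds a tree, closing rows and cells whose end tags were omitted."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")  # synthetic container, not part of the input
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # Pop open elements that this start tag implicitly closes.
        closes = IMPLIED_CLOSE.get(tag, set())
        while self.current.tag in closes:
            self.current = self.current.parent
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, along with anything
        # still open inside it; ignore end tags that match nothing.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(html):
    builder = LenientTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

Calling `parse("<table><tr><td>foo<tr><td>bar</table>").to_xml()` yields a tree with two sibling rows, each holding one cell, instead of a second row nested inside the first cell. Attributes are dropped here to keep the sketch short.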
People say nice things about Beautiful Soup: http://www.crummy.com/software/BeautifulSoup/
-- Nicola Larosa - nico@tekNico.net "...it's easier to add documentation and support to Linux than security to Windows." -- Dan DeMaggio, CRYPTO-GRAM, June 2003
On Aug 26, 2004, at 2:15 PM, Nicola Larosa wrote:
Unfortunately, it's trying to solve a completely different problem. It is not trying to build a tree of the entire document, but rather to do something like "give me all the hrefs on the page". As such, it doesn't even *try* to parse HTML properly; it just knows enough to be able to ignore the parts of the page you aren't asking for. Its intro says:
However, that is not entirely accurate, unless "well formed" doesn't mean "follows the HTML4 standard". It doesn't parse "<table><tr><td>foo<tr><td>bar</table>" correctly -- a perfectly valid bit of HTML4. Microdom's goal is to yield a well-formed data structure from a well-formed HTML document, and most ill-formed HTML documents too. James
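For contrast, the "give me all the hrefs on the page" kind of task needs no document tree at all. The sketch below uses the standard library's html.parser and only illustrates that style of extraction; it is not Beautiful Soup's implementation, and `HrefCollector` and `collect_hrefs` are hypothetical names.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects href values from <a> tags and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without a value
            if name == "href" and value is not None:
                self.hrefs.append(value)


def collect_hrefs(html):
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs
```

Because nothing here tracks which elements are open, the omitted `</td>` and `</tr>` tags in the table example make no difference to the result. That is exactly why this approach says nothing about whether the document structure was understood.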
participants (4)
- angryhicKclown@netscape.net
- James Y Knight
- Jp Calderone
- Nicola Larosa