
Here is an idea for an html package:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
What do people think?
-Brett

On 2008-02-20 10:49, Brett Cannon wrote:
Here is an idea for an html package:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
What do people think?
+1

Brett Cannon wrote:
Here is an idea for an html package:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
What do people think?
+1
What about markupbase and sgml?
markupbase -> html.markup ?
or:
sgmllib -> sgml.parser
markupbase -> sgml.base
with aliases sgml.SGMLParser and sgml.SGMLParserError
Christian

On Feb 20, 2008 1:59 AM, Christian Heimes christian@cheimes.de wrote:
Brett Cannon wrote:
Here is an idea for an html package:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
What do people think?
+1
What about markupbase and sgml?
markupbase has no public API and has already been renamed _markupbase.
As for sgmllib, since it only exists for htmllib, I say it should be merged into html.tools since there are no naming conflicts.
But I just realized htmllib itself is a parser for html. So naming it html.tools seems wrong.
Perhaps it should be:
htmlentitydefs -> html.entities
htmllib -> html.parser
sgmllib -> html.parser
HTMLParser -> html.xparser
That way the two different parsers are both delineated as parsers, but the fact that HTMLParser handles XHTML is covered.
I will also ask the web-sig if both parsers are really needed.
-Brett

On 2008-02-20 11:15, Brett Cannon wrote:
On Feb 20, 2008 1:59 AM, Christian Heimes christian@cheimes.de wrote:
Brett Cannon wrote:
Here is an idea for an html package:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
What do people think?
+1
What about markupbase and sgml?
markupbase has no public API and has already been renamed _markupbase.
As for sgmllib, since it only exists for htmllib, I say it should be merged into html.tools since there are no naming conflicts.
-1. SGML is a much more general markup language than HTML.
It's true that only htmllib does use sgmllib in the std lib, but there are applications out there that rely on sgmllib for the SGML part, e.g. ones implementing the DocBook tool chain.
But I just realized htmllib itself is a parser for html. So naming it html.tools seems wrong.
Perhaps it should be:
htmlentitydefs -> html.entities
htmllib -> html.parser
sgmllib -> html.parser
HTMLParser -> html.xparser
That way the two different parsers are both delineated as parsers, but the fact that HTMLParser handles XHTML is covered.
I will also ask the web-sig if both parsers are really needed.
-Brett
_______________________________________________
stdlib-sig mailing list
stdlib-sig@python.org
http://mail.python.org/mailman/listinfo/stdlib-sig

On Feb 20, 2008, at 5:59 AM, M.-A. Lemburg wrote:
-1. SGML is a much more general markup language than HTML.
Certainly. sgmllib doesn't handle the general case, and is unlikely to be reasonably extensible to support a substantial subset of SGML.
It's true that only htmllib does use sgmllib in the std lib, but there are applications out there that rely on sgmllib for the SGML part, e.g. ones implementing the DocBook tool chain.
I'd strongly encourage anyone who wants to use Python for general SGML parsing to either fork sgmllib (with a new name) and give it a real go, or integrate an existing SGML parser. Pretending that sgmllib is really a usable SGML parser doesn't seem like a good idea. It's also not widely enough used that it makes sense to have such a beast in the standard library.
-Fred

On Feb 20, 2008 9:40 AM, Fred Drake fdrake@acm.org wrote:
On Feb 20, 2008, at 5:59 AM, M.-A. Lemburg wrote:
-1. SGML is a much more general markup language than HTML.
Certainly. sgmllib doesn't handle the general case, and is unlikely to be reasonably extensible to support a substantial subset of SGML.
This is why I suggested moving it. Since the docs for sgmllib explicitly state it does not provide full support for SGML as-is I figured it should just get shifted. Heck, it might be a prime candidate for API removal since it is not a complete implementation.
It's true that only htmllib does use sgmllib in the std lib, but there are applications out there that rely on sgmllib for the SGML part, e.g. ones implementing the DocBook tool chain.
I'd strongly encourage anyone who wants to use Python for general SGML parsing to either fork sgmllib (with a new name) and give it a real go, or integrate an existing SGML parser. Pretending that sgmllib is really a usable SGML parser doesn't seem like a good idea. It's also not widely enough used that it makes sense to have such a beast in the standard library.
I am definitely adding sgmllib to the list of possible modules to remove, but that is a separate email for another day. =)
-Brett

On 2008-02-20 21:20, Brett Cannon wrote:
On Feb 20, 2008 9:40 AM, Fred Drake fdrake@acm.org wrote:
On Feb 20, 2008, at 5:59 AM, M.-A. Lemburg wrote:
-1. SGML is a much more general markup language than HTML.
Certainly. sgmllib doesn't handle the general case, and is unlikely to be reasonably extensible to support a substantial subset of SGML.
This is why I suggested moving it. Since the docs for sgmllib explicitly state it does not provide full support for SGML as-is I figured it should just get shifted. Heck, it might be a prime candidate for API removal since it is not a complete implementation.
Ok, remove my -1 :-)
It's true that only htmllib does use sgmllib in the std lib, but there are applications out there that rely on sgmllib for the SGML part, e.g. ones implementing the DocBook tool chain.
I'd strongly encourage anyone who wants to use Python for general SGML parsing to either fork sgmllib (with a new name) and give it a real go, or integrate an existing SGML parser. Pretending that sgmllib is really a usable SGML parser doesn't seem like a good idea. It's also not widely enough used that it makes sense to have such a beast in the standard library.
I am definitely adding sgmllib to the list of possible modules to remove, but that is a separate email for another day. =)
Are there any wrappers for OpenJade and OpenSP out there?
http://openjade.sourceforge.net/

On Feb 20, 2008 9:24 AM, Fred Drake fdrake@acm.org wrote:
On Feb 20, 2008, at 4:49 AM, Brett Cannon wrote:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
Drop htmllib (html.tools) completely, and you're good to go.
I have not finished reading my email, but I hope the web-sig likes that idea since that is what I would like to see happen as well.
-Brett

On Feb 20, 2008 12:18 PM, Brett Cannon brett@python.org wrote:
On Feb 20, 2008 9:24 AM, Fred Drake fdrake@acm.org wrote:
On Feb 20, 2008, at 4:49 AM, Brett Cannon wrote:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
Drop htmllib (html.tools) completely, and you're good to go.
I have not finished reading my email, but I hope the web-sig likes that idea since that is what I would like to see happen as well.
So I read my mail and both Fred and Guido supported the idea on the web-sig which works for me. =)
So the currently winning idea is:
HTMLParser -> html.parser
htmlentitydefs -> html.entities
htmllib -> *removed*
sgmllib -> *removed*
-Brett
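
For readers following the thread, here is a minimal sketch of what the winning layout might look like from the caller's side. It assumes the names html.parser and html.entities from the mapping above, with a fallback to the pre-rename module names; TagCollector and collect_tags are hypothetical names used only for illustration and are not part of any proposal in this thread.

```python
# Prefer the package layout from the mapping above; fall back to the
# pre-rename module names on an interpreter that still ships them.
try:
    from html.parser import HTMLParser
    from html.entities import name2codepoint
except ImportError:  # old flat layout discussed in this thread
    from HTMLParser import HTMLParser
    from htmlentitydefs import name2codepoint


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag it sees, in order."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Called by the parser for each opening tag, e.g. <p> or <b>.
        self.tags.append(tag)


def collect_tags(markup):
    """Feed markup to a fresh TagCollector and return the tags found."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags
```

Code written this way only touches the two modules that survive the reorganization, so it would not be affected by dropping htmllib and sgmllib.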

On Feb 20, 2008, at 3:30 PM, Brett Cannon wrote:
So the currently winning idea is:
Another semi-related module is "formatter". While the interfaces are general, I'd be curious to find out if that's being used for anything other than formatting HTML. htmllib and pydoc appear to be the main users of this module in the standard library.
-Fred

On Feb 20, 2008 12:42 PM, Fred Drake fdrake@acm.org wrote:
On Feb 20, 2008, at 3:30 PM, Brett Cannon wrote:
So the currently winning idea is:
Another semi-related module is "formatter". While the interfaces are general, I'd be curious to find out if that's being used for anything other than formatting HTML. htmllib and pydoc appear to be the main users of this module in the standard library.
Google Code Search turned up two places that seem to be using it in a non-HTML way:
http://www.google.com/codesearch?hl=en&q=+lang:python+%22import+formatte...
http://www.google.com/codesearch?hl=en&q=+lang:python+%22from+formatter%...
-Brett

participants (5)
- Brett Cannon
- Christian Heimes
- Fred Drake
- M.-A. Lemburg
- Raymond Hettinger