Here is an idea for an html package:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
What do people think?
-Brett
On 2008-02-20 10:49, Brett Cannon wrote:
Here is an idea for an html package:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
What do people think?
+1
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Feb 20 2008)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611
Brett Cannon wrote:
Here is an idea for an html package:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
What do people think?
+1
What about markupbase and sgml?
markupbase -> html.markup ?
or:
sgmllib -> sgml.parser
markupbase -> sgml.base
with aliases sgml.SGMLParser and sgml.SGMLParserError
Christian
On Feb 20, 2008 1:59 AM, Christian Heimes <christian@cheimes.de> wrote:
Brett Cannon wrote:
Here is an idea for an html package:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
What do people think?
+1
What about markupbase and sgml?
markupbase has no public API and has already been renamed _markupbase.
As for sgmllib, since it only exists for htmllib, I say it should be merged into html.tools since there are no naming conflicts.
But I just realized htmllib itself is a parser for html. So naming it html.tools seems wrong.
Perhaps it should be:
htmlentitydefs -> html.entities
htmllib -> html.parser
sgmllib -> html.parser
HTMLParser -> html.xparser
That way the two different parsers are both delineated as parsers, but the fact that HTMLParser handles XHTML is covered.
I will also ask the web-sig if both parsers are really needed.
-Brett
On 2008-02-20 11:15, Brett Cannon wrote:
On Feb 20, 2008 1:59 AM, Christian Heimes <christian@cheimes.de> wrote:
Brett Cannon wrote:
Here is an idea for an html package:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
What do people think?
+1
What about markupbase and sgml?
markupbase has no public API and has already been renamed _markupbase.
As for sgmllib, since it only exists for htmllib, I say it should be merged into html.tools since there are no naming conflicts.
-1. SGML is a much more general markup language than HTML. It's true that only htmllib does use sgmllib in the std lib, but there are applications out there that rely on sgmllib for the SGML part, e.g. ones implementing the DocBook tool chain.
But I just realized htmllib itself is a parser for html. So naming it html.tools seems wrong.
Perhaps it should be:
htmlentitydefs -> html.entities
htmllib -> html.parser
sgmllib -> html.parser
HTMLParser -> html.xparser
That way the two different parsers are both delineated as parsers, but the fact that HTMLParser handles XHTML is covered.
I will also ask the web-sig if both parsers are really needed.
-Brett
-- Marc-Andre Lemburg
On Feb 20, 2008, at 5:59 AM, M.-A. Lemburg wrote:
-1. SGML is a much more general markup language than HTML.
Certainly. sgmllib doesn't handle the general case, and is unlikely to be reasonably extensible to support a substantial subset of SGML.
It's true that only htmllib does use sgmllib in the std lib, but there are applications out there that rely on sgmllib for the SGML part, e.g. ones implementing the DocBook tool chain.
I'd strongly encourage anyone who wants to use Python for general SGML parsing to either fork sgmllib (with a new name) and give it a real go, or integrate an existing SGML parser. Pretending that sgmllib is really a usable SGML parser doesn't seem like a good idea. It's also not widely enough used that it makes sense to have such a beast in the standard library.
-Fred
-- Fred Drake <fdrake at acm.org>
On Feb 20, 2008 9:40 AM, Fred Drake <fdrake@acm.org> wrote:
On Feb 20, 2008, at 5:59 AM, M.-A. Lemburg wrote:
-1. SGML is a much more general markup language than HTML.
Certainly. sgmllib doesn't handle the general case, and is unlikely to be reasonably extensible to support a substantial subset of SGML.
This is why I suggested moving it. Since the docs for sgmllib explicitly state it does not provide full support for SGML as-is I figured it should just get shifted. Heck, it might be a prime candidate for API removal since it is not a complete implementation.
It's true that only htmllib does use sgmllib in the std lib, but there are applications out there that rely on sgmllib for the SGML part, e.g. ones implementing the DocBook tool chain.
I'd strongly encourage anyone who wants to use Python for general SGML parsing to either fork sgmllib (with a new name) and give it a real go, or integrate an existing SGML parser. Pretending that sgmllib is really a usable SGML parser doesn't seem like a good idea. It's also not widely enough used that it makes sense to have such a beast in the standard library.
I am definitely adding sgmllib to the list of possible modules to remove, but that is a separate email for another day. =)
-Brett
On 2008-02-20 21:20, Brett Cannon wrote:
On Feb 20, 2008 9:40 AM, Fred Drake <fdrake@acm.org> wrote:
On Feb 20, 2008, at 5:59 AM, M.-A. Lemburg wrote:
-1. SGML is a much more general markup language than HTML.
Certainly. sgmllib doesn't handle the general case, and is unlikely to be reasonably extensible to support a substantial subset of SGML.
This is why I suggested moving it. Since the docs for sgmllib explicitly state it does not provide full support for SGML as-is I figured it should just get shifted. Heck, it might be a prime candidate for API removal since it is not a complete implementation.
Ok, remove my -1 :-)
It's true that only htmllib does use sgmllib in the std lib, but there are applications out there that rely on sgmllib for the SGML part, e.g. ones implementing the DocBook tool chain.
I'd strongly encourage anyone who wants to use Python for general SGML parsing to either fork sgmllib (with a new name) and give it a real go, or integrate an existing SGML parser. Pretending that sgmllib is really a usable SGML parser doesn't seem like a good idea. It's also not widely enough used that it makes sense to have such a beast in the standard library.
I am definitely adding sgmllib to the list of possible modules to remove, but that is a separate email for another day. =)
Are there any wrappers for OpenJade and OpenSP out there?
http://openjade.sourceforge.net/
-- Marc-Andre Lemburg
On Feb 20, 2008 9:24 AM, Fred Drake <fdrake@acm.org> wrote:
On Feb 20, 2008, at 4:49 AM, Brett Cannon wrote:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
Drop htmllib (html.tools) completely, and you're good to go.
I have not finished reading my email, but I hope the web-sig likes that idea since that is what I would like to see happen as well.
-Brett
On Feb 20, 2008 12:18 PM, Brett Cannon <brett@python.org> wrote:
On Feb 20, 2008 9:24 AM, Fred Drake <fdrake@acm.org> wrote:
On Feb 20, 2008, at 4:49 AM, Brett Cannon wrote:
htmlentitydefs -> html.entities
htmllib -> html.tools
HTMLParser -> html.parser
Drop htmllib (html.tools) completely, and you're good to go.
I have not finished reading my email, but I hope the web-sig likes that idea since that is what I would like to see happen as well.
So I read my mail and both Fred and Guido supported the idea on the web-sig which works for me. =)
So the currently winning idea is:
HTMLParser -> html.parser
htmlentitydefs -> html.entities
htmllib -> *removed*
sgmllib -> *removed*
-Brett
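For code that has to run both before and after such a rename, the mapping above suggests a guarded import. What follows is only a minimal sketch, assuming the renames land exactly as listed (HTMLParser -> html.parser, htmlentitydefs -> html.entities); the LinkCollector class and the collect_links helper are hypothetical names made up for illustration and are not part of any proposal in this thread.

```python
# Guarded imports: try the proposed package names first, then fall back
# to the old flat module names. Assumes the mapping listed above.
try:
    from html.parser import HTMLParser
    from html.entities import name2codepoint
except ImportError:
    from HTMLParser import HTMLParser
    from htmlentitydefs import name2codepoint


class LinkCollector(HTMLParser):
    """Hypothetical example class: collects href values from <a> tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(markup):
    """Hypothetical helper: return every href found in the markup."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links
```

Nothing here depends on htmllib or sgmllib, so code written this way would be unaffected by their removal.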
On Feb 20, 2008, at 3:30 PM, Brett Cannon wrote:
So the currently winning idea is:
Another semi-related module is "formatter". While the interfaces are general, I'd be curious to find out if that's being used for anything other than formatting HTML. htmllib and pydoc appear to be the main users of this module in the standard library.
-Fred
-- Fred Drake <fdrake at acm.org>
On Feb 20, 2008 12:42 PM, Fred Drake <fdrake@acm.org> wrote:
On Feb 20, 2008, at 3:30 PM, Brett Cannon wrote:
So the currently winning idea is:
Another semi-related module is "formatter". While the interfaces are general, I'd be curious to find out if that's being used for anything other than formatting HTML. htmllib and pydoc appear to be the main users of this module in the standard library.
Google Code Search turned up two places that seem to be using it in a non-HTML way:
http://www.google.com/codesearch?hl=en&q=+lang:python+%22import+formatter%22+show:8GkSOvDsbvA:ZD7wF8p7IcE:W6eAjEYIQ_M&sa=N&cd=18&ct=rc&cs_p=http://osx.freshmeat.net/redir/ftpcube/20073/url_tgz/ftpcube-0.5.1.tar.gz&cs_f=ftpcube-0.5.1/libftpcube/txtwrapper.py#first
http://www.google.com/codesearch?hl=en&q=+lang:python+%22from+formatter%22+show:xMrpNMMr5lM:bdNE19lGChs:6z9ZtePjddc&sa=N&cd=10&ct=rc&cs_p=http://freshmeat.net/redir/issuedealer/38032/url_tgz/IssueDealer-0.9.120.tar.gz&cs_f=IssueDealer/rest.py#first
-Brett
participants (5)
- Brett Cannon
- Christian Heimes
- Fred Drake
- M.-A. Lemburg
- Raymond Hettinger