[Python-checkins] r63481 - in python/trunk: Doc/library/html.entities.rst Doc/library/html.parser.rst Doc/library/htmllib.rst Doc/library/htmlparser.rst Doc/library/markup.rst Lib/HTMLParser.py Lib/html/parser.py Lib/htmlentitydefs.py Lib/htmllib.py Lib/lib-old/HTMLParser.py Lib/lib-old/htmlentitydefs.py Lib/test/test_codeccallbacks.py Lib/test/test_multibytecodec_support.py Lib/test/test_py3kwarn.py Lib/test/test_sundry.py Misc/NEWS

fred.drake python-checkins at python.org
Tue May 20 08:08:39 CEST 2008


Author: fred.drake
Date: Tue May 20 08:08:38 2008
New Revision: 63481

Log:
revert creation of the html.entities and html.parser modules
(http://bugs.python.org/issue2882)


Added:
   python/trunk/Doc/library/htmlparser.rst
      - copied, changed from r63437, /python/trunk/Doc/library/htmlparser.rst
   python/trunk/Lib/HTMLParser.py
      - copied unchanged from r63429, /python/trunk/Lib/HTMLParser.py
   python/trunk/Lib/htmlentitydefs.py
      - copied unchanged from r63429, /python/trunk/Lib/htmlentitydefs.py
Removed:
   python/trunk/Doc/library/html.entities.rst
   python/trunk/Doc/library/html.parser.rst
   python/trunk/Lib/lib-old/HTMLParser.py
   python/trunk/Lib/lib-old/htmlentitydefs.py
Modified:
   python/trunk/Doc/library/htmllib.rst
   python/trunk/Doc/library/markup.rst
   python/trunk/Lib/html/parser.py
   python/trunk/Lib/htmllib.py
   python/trunk/Lib/test/test_codeccallbacks.py
   python/trunk/Lib/test/test_multibytecodec_support.py
   python/trunk/Lib/test/test_py3kwarn.py
   python/trunk/Lib/test/test_sundry.py
   python/trunk/Misc/NEWS

Deleted: python/trunk/Doc/library/html.entities.rst
==============================================================================
--- python/trunk/Doc/library/html.entities.rst	Tue May 20 08:08:38 2008
+++ (empty file)
@@ -1,42 +0,0 @@
-:mod:`html.entities` --- Definitions of HTML general entities
-=============================================================
-
-.. module:: htmlentitydefs
-   :synopsis: Old name for the html.entities module.
-
-.. module:: html.entities
-   :synopsis: Definitions of HTML general entities.
-.. sectionauthor:: Fred L. Drake, Jr. <fdrake at acm.org>
-
-.. note::
-   The :mod:`htmlentitydefs` module has been renamed to :mod:`html.entities` in
-   Python 3.0.  It is importable under both names in Python 2.6 and the rest of
-   the 2.x series.
-
-
-This module defines three dictionaries, ``name2codepoint``, ``codepoint2name``,
-and ``entitydefs``. ``entitydefs`` is used by the :mod:`htmllib` module to
-provide the :attr:`entitydefs` member of the :class:`HTMLParser` class.  The
-definition provided here contains all the entities defined by XHTML 1.0  that
-can be handled using simple textual substitution in the Latin-1 character set
-(ISO-8859-1).
-
-
-.. data:: entitydefs
-
-   A dictionary mapping XHTML 1.0 entity definitions to their replacement text in
-   ISO Latin-1.
-
-
-.. data:: name2codepoint
-
-   A dictionary that maps HTML entity names to the Unicode codepoints.
-
-   .. versionadded:: 2.3
-
-
-.. data:: codepoint2name
-
-   A dictionary that maps Unicode codepoints to HTML entity names.
-
-   .. versionadded:: 2.3

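The following sketch shows the three dictionaries described above by looking one entity up by name and then by code point. It assumes the module can be imported as `htmlentitydefs`, the 2.x name this revision restores, and falls back to `html.entities` on Python 3.0 and later. The sample entities were chosen arbitrarily.

```python
try:
    from htmlentitydefs import name2codepoint, codepoint2name, entitydefs  # 2.x name
except ImportError:
    from html.entities import name2codepoint, codepoint2name, entitydefs  # 3.0+ name

# name2codepoint: entity name -> integer Unicode code point
eacute_cp = name2codepoint['eacute']        # 233 (U+00E9)

# codepoint2name: the inverse mapping for the same entities
round_trip = codepoint2name[eacute_cp]      # 'eacute'

# entitydefs: entity name -> replacement text; for entities in the
# Latin-1 range this is the character itself
amp_text = entitydefs['amp']                # '&'
```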
Deleted: python/trunk/Doc/library/html.parser.rst
==============================================================================
--- python/trunk/Doc/library/html.parser.rst	Tue May 20 08:08:38 2008
+++ (empty file)
@@ -1,190 +0,0 @@
-:mod:`html.parser` --- Simple HTML and XHTML parser
-===================================================
-
-.. module:: HTMLParser
-   :synopsis: Old name for the html.parser module.
-
-.. module:: html.parser
-   :synopsis: A simple parser that can handle HTML and XHTML.
-
-.. note::
-   The :mod:`HTMLParser` module has been renamed to :mod:`html.parser` in Python
-   3.0.  It is importable under both names in Python 2.6 and the rest of the 2.x
-   series.
-
-
-.. versionadded:: 2.2
-
-.. index::
-   single: HTML
-   single: XHTML
-
-This module defines a class :class:`HTMLParser` which serves as the basis for
-parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
-Unlike the parser in :mod:`htmllib`, this parser is not based on the SGML parser
-in :mod:`sgmllib`.
-
-
-.. class:: HTMLParser()
-
-   The :class:`HTMLParser` class is instantiated without arguments.
-
-   An :class:`HTMLParser` instance is fed HTML data and calls handler functions when tags
-   begin and end.  The :class:`HTMLParser` class is meant to be overridden by the
-   user to provide a desired behavior.
-
-   Unlike the parser in :mod:`htmllib`, this parser does not check that end tags
-   match start tags or call the end-tag handler for elements which are closed
-   implicitly by closing an outer element.
-
-An exception is defined as well:
-
-
-.. exception:: HTMLParseError
-
-   Exception raised by the :class:`HTMLParser` class when it encounters an error
-   while parsing.  This exception provides three attributes: :attr:`msg` is a brief
-   message explaining the error, :attr:`lineno` is the number of the line on which
-   the broken construct was detected, and :attr:`offset` is the number of
-   characters into the line at which the construct starts.
-
-:class:`HTMLParser` instances have the following methods:
-
-
-.. method:: HTMLParser.reset()
-
-   Reset the instance.  Loses all unprocessed data.  This is called implicitly at
-   instantiation time.
-
-
-.. method:: HTMLParser.feed(data)
-
-   Feed some text to the parser.  It is processed insofar as it consists of
-   complete elements; incomplete data is buffered until more data is fed or
-   :meth:`close` is called.
-
-
-.. method:: HTMLParser.close()
-
-   Force processing of all buffered data as if it were followed by an end-of-file
-   mark.  This method may be redefined by a derived class to define additional
-   processing at the end of the input, but the redefined version should always call
-   the :class:`HTMLParser` base class method :meth:`close`.
-
-
-.. method:: HTMLParser.getpos()
-
-   Return current line number and offset.
-
-
-.. method:: HTMLParser.get_starttag_text()
-
-   Return the text of the most recently opened start tag.  This should not normally
-   be needed for structured processing, but may be useful in dealing with HTML "as
-   deployed" or for re-generating input with minimal changes (whitespace between
-   attributes can be preserved, etc.).
-
-
-.. method:: HTMLParser.handle_starttag(tag, attrs)
-
-   This method is called to handle the start of a tag.  It is intended to be
-   overridden by a derived class; the base class implementation does nothing.
-
-   The *tag* argument is the name of the tag converted to lower case. The *attrs*
-   argument is a list of ``(name, value)`` pairs containing the attributes found
-   inside the tag's ``<>`` brackets.  The *name* will be translated to lower case,
-   and quotes in the *value* have been removed, and character and entity references
-   have been replaced.  For instance, for the tag ``<A
-   HREF="http://www.cwi.nl/">``, this method would be called as
-   ``handle_starttag('a', [('href', 'http://www.cwi.nl/')])``.
-
-   .. versionchanged:: 2.6
-      All entity references from :mod:`html.entities` are now replaced in the
-      attribute values.
-
-
-.. method:: HTMLParser.handle_startendtag(tag, attrs)
-
-   Similar to :meth:`handle_starttag`, but called when the parser encounters an
-   XHTML-style empty tag (``<a .../>``).  This method may be overridden by
-   subclasses which require this particular lexical information; the default
-   implementation simply calls :meth:`handle_starttag` and :meth:`handle_endtag`.
-
-
-.. method:: HTMLParser.handle_endtag(tag)
-
-   This method is called to handle the end tag of an element.  It is intended to be
-   overridden by a derived class; the base class implementation does nothing.  The
-   *tag* argument is the name of the tag converted to lower case.
-
-
-.. method:: HTMLParser.handle_data(data)
-
-   This method is called to process arbitrary data.  It is intended to be
-   overridden by a derived class; the base class implementation does nothing.
-
-
-.. method:: HTMLParser.handle_charref(name)
-
-   This method is called to process a character reference of the form ``&#ref;``.
-   It is intended to be overridden by a derived class; the base class
-   implementation does nothing.
-
-
-.. method:: HTMLParser.handle_entityref(name)
-
-   This method is called to process a general entity reference of the form
-   ``&name;`` where *name* is a general entity reference.  It is intended to be
-   overridden by a derived class; the base class implementation does nothing.
-
-
-.. method:: HTMLParser.handle_comment(data)
-
-   This method is called when a comment is encountered.  The *data* argument is
-   a string containing the text between the ``--`` and ``--`` delimiters, but not
-   the delimiters themselves.  For example, the comment ``<!--text-->`` will cause
-   this method to be called with the argument ``'text'``.  It is intended to be
-   overridden by a derived class; the base class implementation does nothing.
-
-
-.. method:: HTMLParser.handle_decl(decl)
-
-   Method called when an SGML declaration is read by the parser.  The *decl*
-   parameter will be the entire contents of the declaration inside the ``<!``...\
-   ``>`` markup.  It is intended to be overridden by a derived class; the base
-   class implementation does nothing.
-
-
-.. method:: HTMLParser.handle_pi(data)
-
-   Method called when a processing instruction is encountered.  The *data*
-   parameter will contain the entire processing instruction. For example, for the
-   processing instruction ``<?proc color='red'>``, this method would be called as
-   ``handle_pi("proc color='red'")``.  It is intended to be overridden by a derived
-   class; the base class implementation does nothing.
-
-   .. note::
-
-      The :class:`HTMLParser` class uses the SGML syntactic rules for processing
-      instructions.  An XHTML processing instruction using the trailing ``'?'`` will
-      cause the ``'?'`` to be included in *data*.
-
-
-.. _htmlparser-example:
-
-Example HTML Parser Application
--------------------------------
-
-As a basic example, below is a very basic HTML parser that uses the
-:class:`HTMLParser` class to print out tags as they are encountered::
-
-   from html.parser import HTMLParser
-
-   class MyHTMLParser(HTMLParser):
-
-       def handle_starttag(self, tag, attrs):
-           print "Encountered the beginning of a %s tag" % tag
-
-       def handle_endtag(self, tag):
-           print "Encountered the end of a %s tag" % tag
-

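You can exercise the callbacks documented above with a small subclass that records each call instead of printing it. This is an illustrative sketch and is not part of the module. The `EventRecorder` class and the sample markup are invented, and the import falls back to `html.parser` so the same code also runs on Python 3. The example also shows that `feed()` can split a tag across calls, because incomplete input is buffered until more data arrives. In addition, it shows the default `handle_startendtag()` calling both the start and the end handler.

```python
try:
    from HTMLParser import HTMLParser       # Python 2.x name
except ImportError:
    from html.parser import HTMLParser      # Python 3.0+ name


class EventRecorder(HTMLParser):
    """Hypothetical subclass that records the callbacks it receives."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased; values are unescaped
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_comment(self, data):
        # data is the text between the comment delimiters
        self.events.append(("comment", data))


parser = EventRecorder()
parser.feed('<A HREF="http://www.cwi.nl/?a=1&amp;b=2">')
# <BR/> goes through the default handle_startendtag(); the trailing "</"
# is incomplete, so it stays buffered until the next feed() call.
parser.feed('<!--text--><BR/></')
parser.feed('A>')
parser.close()
```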
Modified: python/trunk/Doc/library/htmllib.rst
==============================================================================
--- python/trunk/Doc/library/htmllib.rst	(original)
+++ python/trunk/Doc/library/htmllib.rst	Tue May 20 08:08:38 2008
@@ -77,12 +77,12 @@
       Interface definition for transforming an abstract flow of formatting events into
       specific output events on writer objects.
 
-   Module :mod:`html.parser`
+   Module :mod:`HTMLParser`
       Alternate HTML parser that offers a slightly lower-level view of the input, but
       is designed to work with XHTML, and does not implement some of the SGML syntax
       not used in "HTML as deployed" and which isn't legal for XHTML.
 
-   Module :mod:`html.entities`
+   Module :mod:`htmlentitydefs`
       Definition of replacement text for XHTML 1.0  entities.
 
    Module :mod:`sgmllib`
@@ -147,3 +147,44 @@
    call to :meth:`save_bgn`.  If the :attr:`nofill` flag is false, whitespace is
    collapsed to single spaces.  A call to this method without a preceding call to
    :meth:`save_bgn` will raise a :exc:`TypeError` exception.
+
+
+:mod:`htmlentitydefs` --- Definitions of HTML general entities
+==============================================================
+
+.. module:: htmlentitydefs
+   :synopsis: Definitions of HTML general entities.
+.. sectionauthor:: Fred L. Drake, Jr. <fdrake at acm.org>
+
+.. note::
+   The :mod:`htmlentitydefs` module has been renamed to :mod:`html.entities` in
+   Python 3.0.
+
+
+This module defines three dictionaries, ``name2codepoint``, ``codepoint2name``,
+and ``entitydefs``. ``entitydefs`` is used by the :mod:`htmllib` module to
+provide the :attr:`entitydefs` member of the :class:`HTMLParser` class.  The
+definition provided here contains all the entities defined by XHTML 1.0  that
+can be handled using simple textual substitution in the Latin-1 character set
+(ISO-8859-1).
+
+
+.. data:: entitydefs
+
+   A dictionary mapping XHTML 1.0 entity definitions to their replacement text in
+   ISO Latin-1.
+
+
+.. data:: name2codepoint
+
+   A dictionary that maps HTML entity names to the Unicode codepoints.
+
+   .. versionadded:: 2.3
+
+
+.. data:: codepoint2name
+
+   A dictionary that maps Unicode codepoints to HTML entity names.
+
+   .. versionadded:: 2.3
+

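Because `entitydefs` is intended for simple textual substitution, a replacement pass needs little more than a regular expression and a dictionary lookup. The sketch below is hypothetical: neither the `latin1_substitute` name nor the pattern comes from `htmllib`. It substitutes only those entities whose replacement is a single Latin-1 character and leaves every other reference untouched. The Latin-1 check is needed because the Python 3 `html.entities.entitydefs` maps entities outside Latin-1 to their characters as well.

```python
import re

try:
    from htmlentitydefs import entitydefs   # Python 2.x
except ImportError:
    from html.entities import entitydefs    # Python 3.0+

# Named references only, e.g. "&eacute;"; numeric references are ignored.
_NAMED_REF = re.compile(r"&(\w+);")


def latin1_substitute(text):
    """Replace named references whose replacement is one Latin-1 character."""
    def replace(match):
        replacement = entitydefs.get(match.group(1))
        if (replacement is not None and len(replacement) == 1
                and ord(replacement) < 256):
            return replacement
        # Unknown name, or not representable in Latin-1: keep the reference.
        return match.group(0)
    return _NAMED_REF.sub(replace, text)
```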
Copied: python/trunk/Doc/library/htmlparser.rst (from r63437, /python/trunk/Doc/library/htmlparser.rst)
==============================================================================
--- /python/trunk/Doc/library/htmlparser.rst	(original)
+++ python/trunk/Doc/library/htmlparser.rst	Tue May 20 08:08:38 2008
@@ -1,17 +1,13 @@
 
-:mod:`html.parser` --- Simple HTML and XHTML parser
-===================================================
+:mod:`HTMLParser` --- Simple HTML and XHTML parser
+==================================================
 
 .. module:: HTMLParser
-   :synopsis: Old name for the :mod:`html.parser` module.
-
-.. module:: html.parser
    :synopsis: A simple parser that can handle HTML and XHTML.
 
 .. note::
    The :mod:`HTMLParser` module has been renamed to
-   :mod:`html.parser` in Python 3.0.  It is importable under both names
-   in Python 2.6 and the rest of the 2.x series.
+   :mod:`html.parser` in Python 3.0.
 
 
 .. versionadded:: 2.2
@@ -100,8 +96,8 @@
    ``handle_starttag('a', [('href', 'http://www.cwi.nl/')])``.
 
    .. versionchanged:: 2.6
-      All entity references from :mod:`html.entities` are now replaced in the
-      attribute values.
+      All entity references from :mod:`htmlentitydefs` are now replaced in the attribute
+      values.
 
 
 .. method:: HTMLParser.handle_startendtag(tag, attrs)
@@ -179,7 +175,7 @@
 As a basic example, below is a very basic HTML parser that uses the
 :class:`HTMLParser` class to print out tags as they are encountered::
 
-   from html.parser import HTMLParser
+   from HTMLParser import HTMLParser
 
    class MyHTMLParser(HTMLParser):
 

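The documentation's example ends after the two handler definitions. Below is one possible runnable completion. It keeps the original messages but collects them in a list in addition to printing them. It also uses the `print()` call form, so the same code runs under 2.6 and 3.x. The fallback import of `html.parser` and the sample document are assumptions added for illustration.

```python
try:
    from HTMLParser import HTMLParser       # Python 2.x
except ImportError:
    from html.parser import HTMLParser      # Python 3.0+


class MyHTMLParser(HTMLParser):

    def __init__(self):
        HTMLParser.__init__(self)
        self.messages = []                  # kept so the output can be inspected

    def handle_starttag(self, tag, attrs):
        self._report("Encountered the beginning of a %s tag" % tag)

    def handle_endtag(self, tag):
        self._report("Encountered the end of a %s tag" % tag)

    def _report(self, message):
        self.messages.append(message)
        print(message)


parser = MyHTMLParser()
parser.feed("<html><head><title>Test</title></head>"
            "<body><h1>Parse me!</h1></body></html>")
parser.close()
```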
Modified: python/trunk/Doc/library/markup.rst
==============================================================================
--- python/trunk/Doc/library/markup.rst	(original)
+++ python/trunk/Doc/library/markup.rst	Tue May 20 08:08:38 2008
@@ -23,8 +23,7 @@
 
 .. toctree::
 
-   html.parser.rst
-   html.entities.rst
+   htmlparser.rst
    sgmllib.rst
    htmllib.rst
    pyexpat.rst

Modified: python/trunk/Lib/html/parser.py
==============================================================================
--- python/trunk/Lib/html/parser.py	(original)
+++ python/trunk/Lib/html/parser.py	Tue May 20 08:08:38 2008
@@ -372,17 +372,16 @@
                     c = int(s)
                 return unichr(c)
             else:
-                # Cannot use name2codepoint directly, because HTMLParser
-                # supports apos, which is not part of HTML 4
-                import html.entities
+                # Cannot use name2codepoint directly, because HTMLParser supports apos,
+                # which is not part of HTML 4
+                import htmlentitydefs
                 if HTMLParser.entitydefs is None:
                     entitydefs = HTMLParser.entitydefs = {'apos':u"'"}
-                    for k, v in html.entities.name2codepoint.iteritems():
+                    for k, v in htmlentitydefs.name2codepoint.iteritems():
                         entitydefs[k] = unichr(v)
                 try:
                     return self.entitydefs[s]
                 except KeyError:
                     return '&'+s+';'
 
-        return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));",
-                      replaceEntities, s)
+        return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)

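The branch changed above belongs to the `unescape()` helper, which `HTMLParser` applies to attribute values. The logic can be approximated as a standalone function as shown below. It uses the same regular expression and adds `apos` by hand, because `apos` is not an HTML 4 entity. This is a sketch, not the module's code. The function name, the module-level table, and the `chr`/`unichr` shim for Python 3 are assumptions. Malformed numeric references are simply returned unchanged.

```python
import re

try:
    from htmlentitydefs import name2codepoint   # Python 2.x
except ImportError:
    from html.entities import name2codepoint    # Python 3.0+

try:
    _chr = unichr        # Python 2: build unicode characters
except NameError:
    _chr = chr           # Python 3: str is already Unicode

# 'apos' is added by hand because it is not part of HTML 4.
_ENTITIES = {'apos': u"'"}
for _name, _cp in name2codepoint.items():
    _ENTITIES[_name] = _chr(_cp)

# Same pattern as the one passed to re.sub() in the hunk above.
_REF = re.compile(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));")


def unescape(s):
    """Replace character and entity references in s (sketch)."""
    if '&' not in s:
        return s

    def replace(match):
        ref = match.group(1)
        if ref.startswith('#'):
            digits = ref[1:]
            try:
                if digits[:1] in ('x', 'X'):
                    return _chr(int(digits[1:], 16))    # &#xHH;
                return _chr(int(digits))                # &#DD;
            except (ValueError, OverflowError):
                return match.group(0)                   # malformed: keep as-is
        # Unknown names are returned unchanged, like the KeyError branch above.
        return _ENTITIES.get(ref, match.group(0))

    return _REF.sub(replace, s)
```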
Modified: python/trunk/Lib/htmllib.py
==============================================================================
--- python/trunk/Lib/htmllib.py	(original)
+++ python/trunk/Lib/htmllib.py	Tue May 20 08:08:38 2008
@@ -24,7 +24,7 @@
 
     """
 
-    from html.entities import entitydefs
+    from htmlentitydefs import entitydefs
 
     def __init__(self, formatter, verbose=0):
         """Creates an instance of the HTMLParser class.

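The `htmllib` hunk relies on a detail of class bodies. An `import` statement executed inside a class body binds the imported name in the class namespace, so `entitydefs` becomes a class attribute that instances can read and subclasses can override. Here is a minimal sketch of the same pattern. The class names are hypothetical, and the fallback import is added so the code runs on Python 3.

```python
class EntityTableHolder(object):
    """Hypothetical class; mirrors how htmllib.HTMLParser gets 'entitydefs'."""

    # Any statement may appear in a class body; an import here binds
    # 'entitydefs' as a class attribute rather than a module global.
    try:
        from htmlentitydefs import entitydefs    # Python 2.x
    except ImportError:
        from html.entities import entitydefs     # Python 3.0+


class CustomEntities(EntityTableHolder):
    # A subclass can supply its own table without touching the base class.
    entitydefs = dict(EntityTableHolder.entitydefs, nbsp=' ')
```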
Deleted: python/trunk/Lib/lib-old/HTMLParser.py
==============================================================================
--- python/trunk/Lib/lib-old/HTMLParser.py	Tue May 20 08:08:38 2008
+++ (empty file)
@@ -1,8 +0,0 @@
-from warnings import warnpy3k
-
-warnpy3k(("The HTMLParser module has been renamed to html.parser"
-          " in Python 3.0"), stacklevel=2)
-
-from sys import modules
-import html.parser
-modules["HTMLParser"] = html.parser

Deleted: python/trunk/Lib/lib-old/htmlentitydefs.py
==============================================================================
--- python/trunk/Lib/lib-old/htmlentitydefs.py	Tue May 20 08:08:38 2008
+++ (empty file)
@@ -1,8 +0,0 @@
-from warnings import warnpy3k
-
-warnpy3k(("The htmlentitydefs module has been renamed to html.entities"
-          " in Python 3.0"), stacklevel=2)
-
-from sys import modules
-import html.entities
-modules["htmlentitydefs"] = html.entities

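The two deleted `lib-old` stubs used a common aliasing technique. When the old module name is imported, the stub issues a warning and then stores the new module in `sys.modules` under the old name, so the importer receives the new module. `warnings.warnpy3k` exists only in the 2.6/2.7 `warnings` module. The sketch below is therefore an illustration rather than the removed code: it uses `warnings.warn` with `DeprecationWarning` instead, plus a stand-in module object with invented names.

```python
import sys
import types
import warnings


def alias_module(old_name, new_module):
    """Hypothetical helper: make 'import old_name' yield new_module."""
    warnings.warn("The %s module has been renamed to %s"
                  % (old_name, new_module.__name__),
                  DeprecationWarning, stacklevel=2)
    sys.modules[old_name] = new_module
    return new_module


# Stand-in for the renamed module (made-up names, not real stdlib modules).
renamed = types.ModuleType("newpkg.renamed")
renamed.answer = 42

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    alias_module("OldRenamed", renamed)

import OldRenamed    # found directly in sys.modules; no file is needed
```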
Modified: python/trunk/Lib/test/test_codeccallbacks.py
==============================================================================
--- python/trunk/Lib/test/test_codeccallbacks.py	(original)
+++ python/trunk/Lib/test/test_codeccallbacks.py	Tue May 20 08:08:38 2008
@@ -1,5 +1,5 @@
 import test.test_support, unittest
-import sys, codecs, html.entities, unicodedata
+import sys, codecs, htmlentitydefs, unicodedata
 
 class PosReturn:
     # this can be used for configurable callbacks
@@ -86,7 +86,7 @@
             l = []
             for c in exc.object[exc.start:exc.end]:
                 try:
-                    l.append(u"&%s;" % html.entities.codepoint2name[ord(c)])
+                    l.append(u"&%s;" % htmlentitydefs.codepoint2name[ord(c)])
                 except KeyError:
                     l.append(u"&#%d;" % ord(c))
             return (u"".join(l), exc.end)

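The handler in this test maps each unencodable character to a named entity reference if `codepoint2name` has an entry for it, and to a decimal character reference otherwise. If you register it through `codecs.register_error`, the same idea can be used outside the test suite. In the sketch below, the handler name `demo.xmlcharnamereplace` is invented, and the import falls back to `html.entities` on Python 3.

```python
import codecs

try:
    from htmlentitydefs import codepoint2name   # Python 2.x
except ImportError:
    from html.entities import codepoint2name    # Python 3.0+


def xmlcharnamereplace(exc):
    """Codec error handler: emit &name; where possible, else &#NNN;."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("don't know how to handle %r" % exc)
    pieces = []
    for c in exc.object[exc.start:exc.end]:
        try:
            pieces.append(u"&%s;" % codepoint2name[ord(c)])
        except KeyError:
            pieces.append(u"&#%d;" % ord(c))
    # Return the replacement text and the position to resume encoding from.
    return (u"".join(pieces), exc.end)


codecs.register_error("demo.xmlcharnamereplace", xmlcharnamereplace)

encoded = u"\xabx\u20ac\u1234\xbb".encode("ascii", "demo.xmlcharnamereplace")
```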
Modified: python/trunk/Lib/test/test_multibytecodec_support.py
==============================================================================
--- python/trunk/Lib/test/test_multibytecodec_support.py	(original)
+++ python/trunk/Lib/test/test_multibytecodec_support.py	Tue May 20 08:08:38 2008
@@ -64,7 +64,7 @@
         if self.has_iso10646:
             return
 
-        from html.entities import codepoint2name
+        from htmlentitydefs import codepoint2name
 
         def xmlcharnamereplace(exc):
             if not isinstance(exc, UnicodeEncodeError):

Modified: python/trunk/Lib/test/test_py3kwarn.py
==============================================================================
--- python/trunk/Lib/test/test_py3kwarn.py	(original)
+++ python/trunk/Lib/test/test_py3kwarn.py	Tue May 20 08:08:38 2008
@@ -216,13 +216,11 @@
 class TestStdlibRenames(unittest.TestCase):
 
     renames = {'copy_reg': 'copyreg', 'Queue': 'queue',
-               'htmlentitydefs': 'html.entities',
                'SocketServer': 'socketserver',
                'ConfigParser': 'configparser',
                'repr': 'reprlib',
                'FileDialog': 'tkinter.filedialog',
                'FixTk': 'tkinter._fix',
-               'HTMLParser': 'html.parser',
                'ScrolledText': 'tkinter.scrolledtext',
                'SimpleDialog': 'tkinter.simpledialog',
                'Tix': 'tkinter.tix',

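The `renames` table pairs each 2.x module name with its 3.0 name. Code that must run on both lines can use a similar table to try the old name first and fall back to the new one. The helper below is a hypothetical sketch built on `importlib.import_module`, which is available in 2.7 and 3.x. It includes only non-Tkinter entries visible in this hunk.

```python
import importlib

# Subset of the 2.x -> 3.0 renames listed in TestStdlibRenames.renames
RENAMES = {
    'copy_reg': 'copyreg',
    'Queue': 'queue',
    'SocketServer': 'socketserver',
    'ConfigParser': 'configparser',
    'repr': 'reprlib',
}


def import_either(old_name):
    """Import a module by its 2.x name, falling back to its 3.0 name."""
    try:
        return importlib.import_module(old_name)
    except ImportError:
        return importlib.import_module(RENAMES[old_name])


queue_module = import_either('Queue')
copyreg_module = import_either('copy_reg')
```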
Modified: python/trunk/Lib/test/test_sundry.py
==============================================================================
--- python/trunk/Lib/test/test_sundry.py	(original)
+++ python/trunk/Lib/test/test_sundry.py	Tue May 20 08:08:38 2008
@@ -50,7 +50,7 @@
             import encodings
             import formatter
             import getpass
-            import html.entities
+            import htmlentitydefs
             import ihooks
             import imghdr
             import imputil

Modified: python/trunk/Misc/NEWS
==============================================================================
--- python/trunk/Misc/NEWS	(original)
+++ python/trunk/Misc/NEWS	Tue May 20 08:08:38 2008
@@ -50,10 +50,6 @@
 Library
 -------
 
-- Issue #2882: The htmlentitydefs module has been renamed to 'html.entities'
-  and HTMLParser has been renamed to 'html.parser'; the old names have been
-  deprecated and will be removed in Python 3.0.
-
 - Issue #961805: Fix Text.edit_modified() in Tkinter.
 
 - Issue #1793: Function ctypes.util.find_msvcrt() added that returns

