[issue25017] htmllib deprecated: Which library to use? Missing sane default in docs
New submission from Thomas Guettler: At the top of the htmllib module:
Deprecated since version 2.6: The htmllib module has been removed in Python 3.
Source: https://docs.python.org/2/library/htmllib.html#module-htmllib

Newcomers need more advice: which library should be used? I know there are many HTML parsing libraries, but there should be a sane default for newcomers. Is there already an agreement on a sane default HTML parsing library?

---------- assignee: docs@python components: Documentation messages: 250088 nosy: docs@python, guettli priority: normal severity: normal status: open title: htmllib deprecated: Which library to use? Missing sane default in docs _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue25017> _______________________________________
Martin Panter added the comment: PEP 3108 says “Superseded by HTMLParser”. I presume this means Python 3’s “html.parser” module (called “HTMLParser” in Python 2). I guess a lot of work would be involved in changing existing code over, but it shouldn’t be much of a problem for someone writing new code. ---------- nosy: +martin.panter versions: +Python 2.7
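As a rough illustration of what new code built on Python 3's `html.parser` can look like (a sketch, not taken from the thread; the `LinkCollector` class and the sample markup are hypothetical), here is a subclass that overrides `handle_starttag` to collect `href` values from `<a>` tags:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical example: collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


parser = LinkCollector()
parser.feed('<p><a href="https://docs.python.org/">docs</a> <a name="x">no href</a></p>')
parser.close()  # flush any buffered input
print(parser.links)  # ['https://docs.python.org/']
```

Event handlers the subclass does not override (here `handle_data` and `handle_endtag`) are no-ops in the base class, so only the hooks you care about need to be written.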
Thomas Guettler added the comment: This issue is just about documentation; no code change is required for it. How should the docs be updated to point to html.parser? ----------
Changes by Berker Peksag <berker.peksag@gmail.com>: ---------- nosy: +ezio.melotti
Ezio Melotti added the comment: If you want to create a patch, you have to edit the file Doc/library/htmllib.rst in the 2.7 branch. You can find information about cloning the CPython repository and switching branch in the devguide. The warning should suggest :mod:`HTMLParser` for Python 2 and the equivalent :mod:`html.parser` for Python 3. ----------
Changes by Berker Peksag <berker.peksag@gmail.com>: ---------- keywords: +easy stage: -> needs patch
Nan Wu added the comment: Added a small patch for this change. ---------- keywords: +patch nosy: +Nan Wu Added file: http://bugs.python.org/file40796/htmllib_deprecation_warning.patch
Berker Peksag added the comment: Thanks for the patch. I think we can move the Python 3 part of the patch to a new note directive (similar to the example in the httplib documentation: https://docs.python.org/2/library/httplib.html). For example:

    .. deprecated:: 2.6
       Use :mode:`HTMLParser` instead.

    .. note::
       The :mod:`htmllib` module has been removed in Python 3. Use
       :mod:`html.parser` (equivalent of :mode:`HTMLParser`) instead.

---------- nosy: +berker.peksag stage: needs patch -> patch review
Martin Panter added the comment: Also beware it should be :mod: not :mode: :) ----------
Nan Wu added the comment: Updated the patch. The typo was fixed too. Thanks for catching it. ---------- Added file: http://bugs.python.org/file40831/htmllib_deprecation_warning_2.patch
Martin Panter added the comment: This looks good enough to me. I would have probably avoided littering the page with too many Deprecated and Note boxes, but I can respect your and Berker’s preference to add the separate box. ----------
R. David Murray added the comment: The note should actually be parallel to the http one (assuming 2to3 does do the translation), rather than say "use instead", which would be incorrect advice for a python2 user :) ---------- nosy: +r.david.murray
Martin Panter added the comment: Not quite. This is a two-step deprecation:

1. “htmllib” is removed in favour of HTMLParser. The API is different, so no automatic 2to3 change would be practical.
2. HTMLParser is renamed to “html.parser”, and 2to3 handles this. This is already documented at <https://docs.python.org/2/library/htmlparser.html>.

----------
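Because step 2 is only a rename, code that must run on both Python 2.7 and Python 3 can hide it behind a conditional import. A minimal sketch under that assumption; the `TitleGrabber` class and sample markup are hypothetical, not from the thread:

```python
try:
    # Python 3: the parser lives in html.parser
    from html.parser import HTMLParser
except ImportError:
    # Python 2: the same class lives in the HTMLParser module
    from HTMLParser import HTMLParser


class TitleGrabber(HTMLParser):
    """Hypothetical example: record the text inside the <title> element."""

    def __init__(self):
        # Call the base initializer explicitly: in Python 2, HTMLParser is an
        # old-style class, so super() cannot be used with it there.
        HTMLParser.__init__(self)
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it
        if self.in_title:
            self.title += data


grabber = TitleGrabber()
grabber.feed("<html><head><title>Example page</title></head></html>")
grabber.close()
print(grabber.title)  # Example page
```

This only covers step 2; as noted above, code written against htmllib still has to be ported to the HTMLParser API by hand.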
R. David Murray added the comment: OK, then the note should be dropped. ----------
Martin Panter added the comment: David: are you saying you like the first patch better (ignoring the markup mistakes)? ----------
R. David Murray added the comment: Yes, though I hadn't looked at it before this :) ----------
Martin Panter added the comment: Here is a cleaned-up version of Nan’s first patch. ---------- Added file: http://bugs.python.org/file41027/htmllib_deprecation_warning_3.patch
Berker Peksag added the comment: htmllib_deprecation_warning_3.patch looks good to me. ---------- stage: patch review -> commit review
Roundup Robot added the comment: New changeset 7bc8f56ef1f3 by Martin Panter in branch '2.7': Issue #25017: Document that htmllib is superseded by module HTMLParser https://hg.python.org/cpython/rev/7bc8f56ef1f3 ---------- nosy: +python-dev
Changes by Martin Panter <vadmium+py@gmail.com>: ---------- resolution: -> fixed stage: commit review -> resolved status: open -> closed
participants (7)
- Berker Peksag
- Ezio Melotti
- Martin Panter
- Nan Wu
- R. David Murray
- Roundup Robot
- Thomas Guettler