[issue10875] Update Regular Expression HOWTO

New submission from Terry J. Reedy tjreedy@udel.edu:
0. Does 'Release 0.05' at the top have any useful current meaning? or could it be deleted?
1. Introduction:
The history paragraph "The re module was added in Python 1.5, and provides Perl-style regular expression patterns. Earlier versions of Python came with the regex module, which provided Emacs-style patterns. The regex module was removed completely in Python 2.5." might be eliminated in 3.x, or at least the irrelevant-for-py3 reference to regex. This is a policy decision.
2. Performing matches:
"If you have Tkinter available, you may also want to look at Tools/scripts/redemo.py,"
Change 'Tkinter' to 'tkinter' and make it a module reference. In link, change 'scripts' to 'demo' as redemo.py got moved.
"Phil Schwartz’s Kodos is also an interactive tool for developing and testing RE patterns."
Add the url '(http://kodos.sourceforge.net/)' to the text so that Windows help users can copy and paste it into a browser. (This should be a general policy.)
"Python 2.2.2 (#1, Feb 10 2003, 12:57:01)" delete
<_sre.SRE_Match object at 80c4f68>
This is correctly updated (for late 2.x and 3.x)
"<re.MatchObject instance at 80c9650>" (7 like this)
Globally replace 're.MatchObject instance' with '_sre.SRE_Match object'
3. Footnote
"[1] Introduced in Python 2.2.2."
remove for 3.x here and wherever footnote reference is in the text.
4. "Not Using re.VERBOSE"
This section is about *using* re.VERBOSE and the benefit thereof, not about not using it. I recommend deleting 'Not' as it gives the impression that the section is a warning about not using, the opposite of the intent.
5. Code example output and doctest:
I ran doctest.testfile("C:/programs/PyDev/py32/Doc/howto/regex.rst", module_relative = False)
After the 're...' to '_sre...' substitution above, all 11 failures would be due to 'at 0x#######' address mismatches. I believe changing all 11 addresses to '0x...' (I took this from the doctest doc) would both fix the failures and remove irrelevant detail for human readers.
The other 87 examples all passed ;-!.
Is there any current doctest-related markup that should be added?
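For readers following along, the two substitutions proposed in points 2 and 5 can be sketched as a small script. This is only an illustration, not the patch itself: the `OLD` excerpt is a made-up fragment in the style of the HOWTO's examples, and `normalize` is a hypothetical helper name.

```python
import re

# Made-up excerpt in the style of the HOWTO examples (not the real file).
OLD = """\
>>> p.match('tempo')
<re.MatchObject instance at 80c9650>
>>> m = p.match('tempo')
>>> m
<_sre.SRE_Match object at 80c4f68>
"""


def normalize(text):
    """Apply the two proposed substitutions to doctest output lines."""
    # Point 2: update the outdated repr of match objects.
    text = text.replace("re.MatchObject instance", "_sre.SRE_Match object")
    # Point 5: replace the concrete address after " at " with '0x...'.
    return re.sub(r"(?<= at )(?:0x)?[0-9A-Fa-f]+(?=>)", "0x...", text)


NEW = normalize(OLD)
print(NEW)
```

Note that the `0x...` form only matches real output when doctest runs with its ELLIPSIS option enabled.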
---------- assignee: docs@python components: Documentation messages: 125855 nosy: akuchling, docs@python, terry.reedy priority: normal severity: normal stage: needs patch status: open title: Update Regular Expression HOWTO versions: Python 2.7, Python 3.1, Python 3.2
_______________________________________ Python tracker report@bugs.python.org http://bugs.python.org/issue10875 _______________________________________

Georg Brandl georg@python.org added the comment:
Your points 1-5 all sound valid to me. Would you like to make a patch? I don't know what to do about the release number. Probably doesn't hurt anyone to keep it.
---------- nosy: +georg.brandl

Éric Araujo merwok@netwok.org added the comment:
Good points overall.
The only subpoint I disagree with is this one: “Add the url '(http://kodos.sourceforge.net/)' to the text so that Windows help users can copy and paste it into a browser. (This should be a general policy.)” IMO, it’s the job of the Sphinx builder to add URIs in plaintext if the format does not have hyperlinks. -1 on cluttering the source and HTML output with duplicated links.
---------- nosy: +eric.araujo

Georg Brandl georg@python.org added the comment:
Oh right, I misread that one. Can't Windows help users right-click and select "Copy URL"?
----------

SilentGhost ghost.adh@gmail.com added the comment:
Here is the patch implementing all but the url suggestion.
Doctest still has 11 failures (changing to '0x...' didn't help).
---------- keywords: +patch nosy: +SilentGhost Added file: http://bugs.python.org/file20329/regex.rst.diff

SilentGhost ghost.adh@gmail.com added the comment:
A few bits and pieces fixed compared to the previous patch.
doctest.testfile("/home/mischa/pydev/Doc/howto/regex.rst", module_relative = False, optionflags=doctest.ELLIPSIS)
TestResults(failed=0, attempted=98)
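The difference from the earlier run is the `optionflags` argument: a literal `0x...` in expected output is compared character by character unless `doctest.ELLIPSIS` is enabled. A minimal sketch of that behaviour, using a made-up one-example snippet rather than the HOWTO itself (`SNIPPET` and `run_snippet` are illustrative names, not part of the patch):

```python
import doctest

# A made-up doctest whose expected output hides the address with '0x...'.
SNIPPET = """
>>> object()
<object object at 0x...>
"""


def run_snippet(optionflags=0):
    """Run SNIPPET once and return doctest's (failed, attempted) result."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(SNIPPET, {}, "snippet", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda text: None)


without_flag = run_snippet()
with_flag = run_snippet(doctest.ELLIPSIS)
print(without_flag.failed, with_flag.failed)
```

Without the flag the single example fails; with `doctest.ELLIPSIS` the `...` matches whatever address the interpreter prints.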
---------- Added file: http://bugs.python.org/file20331/regex.rst.diff

Changes by SilentGhost ghost.adh@gmail.com:
Removed file: http://bugs.python.org/file20329/regex.rst.diff

SilentGhost ghost.adh@gmail.com added the comment:
It seems that the special sequences description in the Matching Characters section needs to be updated to incorporate information on unicode and bytes. I don't think, however, that it's a good idea just to copy that information from Doc/library/re.rst. Maybe the section could be shortened and linked to that RE Syntax section? There aren't any deeper links available, unfortunately.
----------

Terry J. Reedy tjreedy@udel.edu added the comment:
I agree that the .rst should not have two copies and that any windows.chm specific fixup should be in the tool. Right now, right clicking gives a context menu with one item: Properties. Clicking that brings up a dialog box with a url that can be copied. Good enough for me at the moment but not terribly obvious. A possible separate issue.
Unless A Kuchling says different, I would like to remove the version number. It implies to me that this doc is in pre-alpha condition and it is far beyond that. I see that the patch already does so.
-:file:`Tools/scripts/redemo.py`, a demonstration program included with the
+:file:`Tools/scripts/demo.py`, a demonstration program included with the
should (currently) be
+:file:`Tools/demo/redemo.py`, a demonstration program included with the
Other than that, the patch looks good. Thanks. I am still thinking about Matching Characters. Once the patch is fixed, with a possible addition, a 2.7 version can easily be made by dropping the 3.x-specific deletions.
----------

SilentGhost ghost.adh@gmail.com added the comment:
I don't know whether it would be easy to strip down py3k version to 2.7 version.
Seeing how it's just a basic introduction, I would think that a single statement re unicode support might be sufficient. For an exhaustive description of special sequences, refer to the docs and carry on with ASCII strings.
Attached patch fixes path issue.
---------- Added file: http://bugs.python.org/file20332/regex.rst.diff

Changes by SilentGhost ghost.adh@gmail.com:
Removed file: http://bugs.python.org/file20331/regex.rst.diff

Terry J. Reedy tjreedy@udel.edu added the comment:
Since I think I know how to do it, easily, I will try to derive the 2.7 patch.
In Matching Characters, I think "The following predefined special sequences are available:"
should be expanded to
"The following predefined special sequences are a subset of those available. The equivalent classes are for bytes patterns. For a complete list of sequences and expanded class definitions for Unicode string patterns, see the end of Regular Expression Syntax." (with section reference markup).
Note to myself. /bytes/byte string/ for 2.7.
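The distinction the proposed wording draws can be illustrated with `\d`. This is a small sketch for Python 3, where str patterns match Unicode digits by default while bytes patterns (or str patterns compiled with `re.ASCII`) are limited to `[0-9]`. The sample text is made up for the illustration.

```python
import re

# ASCII '4' followed by U+0662 ARABIC-INDIC DIGIT TWO.
text = "room 4\u0662"

# str pattern: \d matches any Unicode decimal digit.
str_digits = re.findall(r"\d", text)

# str pattern with re.ASCII: \d is equivalent to [0-9].
ascii_digits = re.findall(r"\d", text, re.ASCII)

# bytes pattern: \d is equivalent to [0-9] on the encoded data.
bytes_digits = re.findall(rb"\d", text.encode("utf-8"))

print(str_digits, ascii_digits, bytes_digits)
```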
While the changes all look innocuous to me with respect to building the docs, I am curious if you have tried to rebuild the HOWTO (if you have the tool chain, which I do not).
----------

Éric Araujo merwok@netwok.org added the comment:
> I agree that the .rst should not have two copies and that any windows.chm specific fixup should be in the tool. Right now, right clicking gives a context menu with one item: Properties. Clicking that brings up a dialog box with a url that can be copied. Good enough for me at the moment but not terribly obvious. A possible separate issue.
I would argue that this is a bug in the CHM viewers, not Python :)
----------

SilentGhost ghost.adh@gmail.com added the comment:
> While the changes all look innocuous to me with respect to building the docs, I am curious if you have tried to rebuild the HOWTO (if you have the tool chain, which I do not).
I did rebuild the docs with 'make html'. Build was clean every time. If you meant something else please let me know.
----------

Terry J. Reedy tjreedy@udel.edu added the comment:
I applied the patch to 3.2 and 3.1 in r87904 and r87905. Thanks. I had to re-edit for 2.7: r87909.
I made a separate small patch for my suggested addition to Matching Characters. Could someone check that it is correct, given that re.rst contains the target directive (or whatever it is called): .. _re-syntax:
---------- assignee: docs@python -> terry.reedy stage: needs patch -> commit review Added file: http://bugs.python.org/file20340/zregex2.rst.diff

Éric Araujo merwok@netwok.org added the comment:
Looks good, builds without warnings.
Note that you can use :ref:`re-syntax` and Sphinx will substitute the heading for you. The :role:`some special text <real-target>` form is used when you want to control the text of the link.
(That thing is called a hyperlink target: http://docutils.sourceforge.net/docs/user/rst/quickref.html#hyperlink-target...)
----------

Terry J. Reedy tjreedy@udel.edu added the comment:
Thanks. r87911,r87912
---------- resolution: -> fixed status: open -> closed

Terry J. Reedy tjreedy@udel.edu added the comment:
and r87918 for 2.7, with bytes -> byte string
----------

Changes by Terry J. Reedy tjreedy@udel.edu:
---------- Removed message: http://bugs.python.org/msg125954

Terry J. Reedy tjreedy@udel.edu added the comment:
Correction: r87912 and r87913 for 3.x
----------
participants (4)
- Georg Brandl
- SilentGhost
- Terry J. Reedy
- Éric Araujo