[Python-checkins] bpo-32174: Let .chm document display non-ASCII characters properly (GH-9758)

Steve Dower webhook-mailer at python.org
Mon Oct 8 17:21:03 EDT 2018


https://github.com/python/cpython/commit/6261ae9b01fb8429b779169f8de37ff567c144e8
commit: 6261ae9b01fb8429b779169f8de37ff567c144e8
branch: master
author: animalize <animalize at users.noreply.github.com>
committer: Steve Dower <steve.dower at microsoft.com>
date: 2018-10-08T14:20:54-07:00
summary:

bpo-32174: Let .chm document display non-ASCII characters properly (GH-9758)

Let .chm document display non-ASCII characters properly

Escape the `body` part of each .chm source file to 7-bit ASCII so that non-ASCII characters render correctly on some MBCS Windows systems.
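
As a rough illustration of the escaping described above, the following standalone sketch mirrors the approach taken in the diff below: characters above U+007F become a named HTML entity when `html.entities.codepoint2name` knows one, and a decimal character reference otherwise. The function name `escape_non_ascii` and the sample string are assumptions made for this example, not part of the commit.

```python
import re
from html.entities import codepoint2name


def escape_non_ascii(text):
    """Replace every character above U+007F with an HTML entity.

    Hypothetical helper for illustration; it follows the same idea as
    the `_process` function added by this commit.
    """
    def escape(match):
        codepoint = ord(match.group(0))
        name = codepoint2name.get(codepoint)
        # Prefer a named entity; fall back to a decimal character reference.
        if name is None:
            return '&#%d;' % codepoint
        return '&%s;' % name

    return re.sub(r'[^\x00-\x7F]', escape, text)


# Sample input (assumed): one character with a named entity, one without.
sample = '<p>caf\u00e9 \u4e2d</p>'
escaped = escape_non_ascii(sample)
print(escaped)  # <p>caf&eacute; &#20013;</p>
```

Because the result contains only 7-bit ASCII, it no longer depends on the code page that the help viewer assumes.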

files:
A Doc/tools/extensions/escape4chm.py
A Misc/NEWS.d/next/Documentation/2018-10-08-19-15-28.bpo-32174.YO9CYm.rst
M Doc/conf.py

diff --git a/Doc/conf.py b/Doc/conf.py
index d8efce035c9c..7f720ce3832d 100644
--- a/Doc/conf.py
+++ b/Doc/conf.py
@@ -14,7 +14,7 @@
 # ---------------------
 
 extensions = ['sphinx.ext.coverage', 'sphinx.ext.doctest',
-              'pyspecific', 'c_annotations']
+              'pyspecific', 'c_annotations', 'escape4chm']
 
 # General substitutions.
 project = 'Python'
diff --git a/Doc/tools/extensions/escape4chm.py b/Doc/tools/extensions/escape4chm.py
new file mode 100644
index 000000000000..6f2e35725b37
--- /dev/null
+++ b/Doc/tools/extensions/escape4chm.py
@@ -0,0 +1,39 @@
+"""
+Escape the `body` part of .chm source file to 7-bit ASCII, to fix visual
+effect on some MBCS Windows systems.
+
+https://bugs.python.org/issue32174
+"""
+
+import re
+from html.entities import codepoint2name
+
+# Escape every character whose code point is above 0x7F.
+def _process(string):
+    def escape(matchobj):
+        codepoint = ord(matchobj.group(0))
+
+        name = codepoint2name.get(codepoint)
+        if name is None:
+            return '&#%d;' % codepoint
+        else:
+            return '&%s;' % name
+
+    return re.sub(r'[^\x00-\x7F]', escape, string)
+
+def escape_for_chm(app, pagename, templatename, context, doctree):
+    # Only applies to .chm (htmlhelp builder) output.
+    if not hasattr(app.builder, 'name') or app.builder.name != 'htmlhelp':
+        return
+
+    # escape the `body` part to 7-bit ASCII
+    body = context.get('body')
+    if body is not None:
+        context['body'] = _process(body)
+
+def setup(app):
+    # The `html-page-context` event is emitted when the HTML builder
+    # has created a context dictionary to render a template with.
+    app.connect('html-page-context', escape_for_chm)
+
+    return {'version': '1.0', 'parallel_read_safe': True}
diff --git a/Misc/NEWS.d/next/Documentation/2018-10-08-19-15-28.bpo-32174.YO9CYm.rst b/Misc/NEWS.d/next/Documentation/2018-10-08-19-15-28.bpo-32174.YO9CYm.rst
new file mode 100644
index 000000000000..a11a4b3eb087
--- /dev/null
+++ b/Misc/NEWS.d/next/Documentation/2018-10-08-19-15-28.bpo-32174.YO9CYm.rst
@@ -0,0 +1,2 @@
+chm document displays non-ASCII characters properly on some MBCS Windows
+systems.
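
To see how the handler in the diff above restricts itself to the `htmlhelp` builder, here is a minimal sketch that calls an equivalent handler with stand-in objects instead of a real Sphinx application. The `fake_app` helper, the simplified `_escape` function (numeric references only), and the sample page values are assumptions for illustration; the committed code also emits named entities.

```python
import re
from types import SimpleNamespace


def _escape(text):
    # Simplified stand-in: numeric character references only.
    return re.sub(r'[^\x00-\x7F]',
                  lambda m: '&#%d;' % ord(m.group(0)), text)


def escape_for_chm(app, pagename, templatename, context, doctree):
    # Leave every builder other than `htmlhelp` untouched.
    if getattr(app.builder, 'name', None) != 'htmlhelp':
        return
    body = context.get('body')
    if body is not None:
        context['body'] = _escape(body)


def fake_app(builder_name):
    # Hypothetical stand-in exposing only `app.builder.name`.
    return SimpleNamespace(builder=SimpleNamespace(name=builder_name))


html_ctx = {'body': 'na\u00efve'}
chm_ctx = {'body': 'na\u00efve'}
escape_for_chm(fake_app('html'), 'index', 'page.html', html_ctx, None)
escape_for_chm(fake_app('htmlhelp'), 'index', 'page.html', chm_ctx, None)

print(html_ctx['body'])  # unchanged for the regular HTML builder
print(chm_ctx['body'])   # na&#239;ve
```

Keying on the builder name means the regular HTML output keeps its original characters, and only the .chm build pays the cost of the extra pass over each page body.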


