[New-bugs-announce] [issue11113] html.entities mapping dicts need updating?

Brian Jones report at bugs.python.org
Fri Feb 4 04:43:55 CET 2011

New submission from Brian Jones <bkjones at gmail.com>:

In Python 3.2b2, html.entities.codepoint2name and name2codepoint only support the 252 HTML entity names defined in the HTML 4 spec from 1997. I'm wondering if there's a reason not to support the W3C Recommendation 'XML Entity Definitions for Characters' 


This standard contains significantly more characters, and it is noted in that spec that the HTML 5 drafts use that spec's entities. You can see the current HTML 5 'Named character references' here: 


If this is just a matter of somebody going in to do the grunt work, let me know. 
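
For reference, here is a minimal sketch of how the two existing mappings behave, assuming a stock CPython standard library. The helper name `describe` is only illustrative and not part of html.entities.

```python
from html.entities import codepoint2name, name2codepoint


def describe(name):
    """Return (name, code point, character) for a known entity name."""
    codepoint = name2codepoint[name]      # raises KeyError for unknown names
    return name, codepoint, chr(codepoint)


# Number of entity names currently known to the module.
print(len(name2codepoint))

# Forward and reverse lookups for one HTML 4 entity.
print(describe("amp"))
print(codepoint2name[0x26])
```

Any name that is absent from the HTML 4 list raises KeyError in name2codepoint, which is the gap this report is about.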

If startup costs associated with importing a huge dictionary are a concern, perhaps a more efficient type that offers the same lookup interface could be defined. 
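
To make that idea concrete, here is one possible sketch: a read-only mapping that defers building its table until the first lookup. The names `LazyMapping` and `load_entities` are hypothetical, and the three-entry table is only a stand-in for a real entity list. This is not how html.entities is implemented.

```python
from collections.abc import Mapping


class LazyMapping(Mapping):
    """Read-only mapping that builds its underlying dict on first use."""

    def __init__(self, loader):
        self._loader = loader   # zero-argument callable returning a dict
        self._data = None       # populated on first access

    def _load(self):
        if self._data is None:
            self._data = dict(self._loader())
        return self._data

    def __getitem__(self, key):
        return self._load()[key]

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())


load_calls = []


def load_entities():
    # Hypothetical loader; a real one might parse a data file instead.
    load_calls.append(1)
    return {"amp": 0x26, "lt": 0x3C, "gt": 0x3E}


# Creating the mapping does not call the loader yet.
lazy_name2codepoint = LazyMapping(load_entities)
```

Because the class derives from collections.abc.Mapping, callers still get `[]`, `in`, `.get()`, `.keys()` and `.items()`, so code written against a plain dict lookup should keep working, while the cost of building the table is paid only by programs that actually look something up.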

If other reasons exist to not move in this direction, please do let me know!

components: Library (Lib), Unicode, XML
messages: 127865
nosy: Brian.Jones
priority: normal
severity: normal
status: open
title: html.entities mapping dicts need updating?
type: feature request
versions: Python 3.2

Python tracker <report at bugs.python.org>
