[Python-bugs-list] [ python-Bugs-765036 ] Unicode non-characters
SourceForge.net
noreply@sourceforge.net
Wed, 02 Jul 2003 18:52:59 -0700
Bugs item #765036, was opened at 2003-07-03 01:52
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=765036&group_id=5470
Category: Unicode
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gnosis Software (gnosis)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Unicode non-characters
Initial Comment:
The alleged codepoints unichr(0xFFFE) and
unichr(0xFFFF) are not Unicode characters. This document:
http://www.unicode.org/charts/PDF/UFFF0.pdf
Contains:
Noncharacters
These codes are intended for process internal uses, but
are not permitted for interchange.
FFFE  <not a character>
      * the value FFFE is guaranteed not to be
        a Unicode character at all
      * may be used to detect byte order by
        contrast with FEFF which is a character
      FEFF zero width no-break space
FFFF  <not a character>
      * the value FFFF is guaranteed not to be
        a Unicode character at all
In particular, an XML document that contains such an
alleged Unicode entity is not well-formed.
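The well-formedness point follows from the Char production of the XML 1.0
specification, whose BMP range stops at #xFFFD. A minimal sketch of that
check, written for present-day Python 3 rather than the Python 2.3 this
report targets; the names is_xml_char and first_illegal_xml_char are
hypothetical and not part of any library:

```python
# Ranges permitted by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML_CHAR_RANGES = (
    (0x9, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml_char(codepoint):
    """Return True if the code point may appear in an XML 1.0 document."""
    return any(low <= codepoint <= high for low, high in XML_CHAR_RANGES)


def first_illegal_xml_char(text):
    """Return the index of the first character XML 1.0 forbids, or -1."""
    for index, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            return index
    return -1
```

Under this check FEFF is accepted while FFFE and FFFF are rejected, which
matches the distinction drawn in the chart excerpt above.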
All Unicode-aware versions of Python treat these
codepoints in the same manner as other codepoints, e.g.
both unichr(0xFFFE) and u'\uffff' pass without complaint.
I believe the correct behavior would be for Python to
raise an exception, or at least a warning, on access to
these spurious characters.
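
One way the requested behavior might look is sketched below. This is only
an illustration under stated assumptions, not a proposed patch: it is
written for present-day Python 3, where chr() plays the role of unichr(),
and the names strict_chr, NoncharacterError and NONCHARACTERS are
hypothetical. It covers only the two codepoints named in this report.

```python
import warnings

# The two code points named in this report.
NONCHARACTERS = frozenset((0xFFFE, 0xFFFF))


class NoncharacterError(ValueError):
    """Hypothetical exception type for access to a noncharacter."""


def strict_chr(codepoint, strict=True):
    """Like chr(), but complain about U+FFFE and U+FFFF.

    With strict=True a NoncharacterError is raised; otherwise a
    UnicodeWarning is emitted and the character is returned anyway.
    """
    if codepoint in NONCHARACTERS:
        message = "U+%04X is not a Unicode character" % codepoint
        if strict:
            raise NoncharacterError(message)
        warnings.warn(message, UnicodeWarning, stacklevel=2)
    return chr(codepoint)
```

The strict flag mirrors the two options mentioned above: an exception by
default, or a warning for callers that use these values internally.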