Greg Ewing writes:
> Steven D'Aprano wrote:
>> Aside: you keep writing H..HHHHHH for Unicode code points. Unicode code points go up to hex 10FFFF,
> They do *now*, but we can't be sure that they will stay that way in the future.
In Unicode, they will. Blood was shed over the issue in the ISO 10646 committees before the standards could be unified. Huge amounts of software validate UTF-8 and UTF-16, including checking that code points stay within that range, and won't easily be converted to accept extended ranges. So Unicode and ISO 10646 will stay within the current 17 planes. To go beyond that they'll need a new standard. In any case, it seems really unlikely that more than the 1,114,112 code points the current range provides will ever be needed, unless there's a mutation that makes all of *us* obsolete.
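To make the "validation is baked in" point concrete, here is a minimal Python sketch of why UTF-16 itself cannot reach past U+10FFFF: a surrogate pair carries only 20 bits of payload on top of 0x10000. The helper names `utf16_surrogates` and `from_surrogates` are hypothetical, written just for this illustration; the `chr`, `encode` and `decode` calls are standard Python.

```python
MAX_CODE_POINT = 0x10FFFF  # top of the Unicode code space (17 planes)

def utf16_surrogates(cp: int) -> tuple:
    """Hypothetical helper: split U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"U+{cp:X} has no surrogate-pair encoding")
    offset = cp - 0x10000              # at most 20 bits of payload
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low

def from_surrogates(high: int, low: int) -> int:
    """Hypothetical helper: recombine a surrogate pair into a code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# The largest possible pair, DBFF DFFF, decodes to exactly U+10FFFF.
print(hex(from_surrogates(0xDBFF, 0xDFFF)))

# Python enforces the same ceiling: chr() rejects anything larger ...
try:
    chr(MAX_CODE_POINT + 1)
except ValueError as exc:
    print("chr:", exc)

# ... and the UTF-8 decoder rejects the 4-byte sequence that would mean U+110000.
try:
    b"\xf4\x90\x80\x80".decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8:", exc.reason)
```

Seventeen planes of 65,536 code points each give 0x110000 (1,114,112) values, which is exactly the space a surrogate pair plus the BMP can address.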
> The Ruby \U{...} syntax has the following advantages:
So does the \N{U+XXXX} proposal, and it has the further advantage of making the obvious semantics explicit: it reads as a name for this character/code point, which is consistent with how the U+XXXX notation is actually used in the standard.
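As a rough sketch of the proposed semantics (current Python does *not* accept `\N{U+XXXX}` in string literals, so this is purely illustrative), a hypothetical helper `expand_u_plus` can rewrite such escapes in a raw string. The 4 to 6 hex digit limit is an assumption chosen to mirror how U+XXXX is written in the standard; the point shown is that the result agrees with the existing `\N{name}` escape.

```python
import re
import unicodedata

# Hypothetical pattern for the *proposed* \N{U+XXXX} escape (4 to 6 hex digits).
_U_PLUS = re.compile(r"\\N\{U\+([0-9A-Fa-f]{4,6})\}")

def expand_u_plus(text: str) -> str:
    """Hypothetical helper: replace each \\N{U+XXXX} in text with that code point."""
    def repl(match):
        cp = int(match.group(1), 16)
        if cp > 0x10FFFF:  # stay within the Unicode code space
            raise ValueError(f"U+{match.group(1)} is outside the Unicode range")
        return chr(cp)
    return _U_PLUS.sub(repl, text)

# A raw string is needed: in a normal literal, \N{U+00E9} is a SyntaxError today.
s = expand_u_plus(r"caf\N{U+00E9}")
print(s)
print(unicodedata.name(s[-1]))

# The "name" reading: U+00E9 and its character name denote the same code point.
print(s[-1] == "\N{LATIN SMALL LETTER E WITH ACUTE}")
```

Reusing the existing `\N{...}` brackets rather than adding a new `\U{...}` form is what lets the code point read as just another name for the character.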