break unichr instead of fix ord?

Dieter Maurer dieter at handshake.de
Sun Aug 30 06:54:21 CEST 2009


"Martin v. Löwis" <martin at v.loewis.de> writes on Fri, 28 Aug 2009 10:12:34 +0200:
> > The PEP says:
> >      * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a
> >        length-one string.
> > 
> >      * unichr(i) for 2**16 <= i <= TOPCHAR will return a
> >        length-one string on wide Python builds. On narrow
> >        builds it will raise ValueError.
> > and
> >      * ord() is always the inverse of unichr()
> > 
> > which of course we know; that is the current behavior.  But
> > there is no reason given for that behavior.
> 
> Sure there is, right above the list:
> 
> "Most things will behave identically in the wide and narrow worlds."
> 
> That's the reason: scripts should work the same as much as possible
> in wide and narrow builds.
> 
> What you propose would break the property "unichr(i) always returns
> a string of length one, if it returns anything at all".

But getting a "ValueError" in some builds (and not in others)
is rather worse than getting unicode strings of different lengths.
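
To make the alternative concrete, here is a minimal sketch of a
"unichr" that returns a surrogate pair above 0xFFFF instead of raising,
together with an "ord" that stays its inverse. The names "pair_unichr"
and "pair_ord" are hypothetical and not part of Python. The sketch is
written for Python 3, where "chr" accepts the full range, so it only
emulates what a narrow build could do.

```python
# Hypothetical helpers, not part of Python: emulate a narrow-build
# unichr() that returns a UTF-16 surrogate pair instead of raising.

def pair_unichr(i):
    """Return a length-1 string, or a length-2 surrogate pair above 0xFFFF."""
    if not 0 <= i <= 0x10FFFF:
        raise ValueError("code point out of range: %r" % (i,))
    if i < 0x10000:
        return chr(i)
    i -= 0x10000
    high = 0xD800 + (i >> 10)    # top 10 bits
    low = 0xDC00 + (i & 0x3FF)   # bottom 10 bits
    return chr(high) + chr(low)


def pair_ord(s):
    """Inverse of pair_unichr(): accept one character or a surrogate pair."""
    if len(s) == 1:
        return ord(s)
    if len(s) == 2:
        high, low = ord(s[0]), ord(s[1])
        if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    raise TypeError("expected a character or a surrogate pair")
```

With these, "pair_ord(pair_unichr(i)) == i" holds for every code point
up to 0x10FFFF; the price is that the result has length two above 0xFFFF.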

> >    1) Should surrogate pairs be disallowed on narrow builds?
> > That appears to have been answered in the negative and is
> > not relevant to my question.
> 
> It is, as it does lead to inconsistencies between wide and narrow
> builds. OTOH, it also allows the same source code to work on both
> versions, so it also preserves the uniformity in a different way.

Don't you have inconsistencies in any case? As it stands, "unichr"
raises "ValueError" in some builds and not in others.



