[Python-Dev] Fix Unicode-disabled build of Python 2.7
Serhiy Storchaka
storchaka at gmail.com
Wed Jun 25 14:55:35 CEST 2014
25.06.14 00:03, Jim J. Jewett wrote:
> It would be good to fix the tests (and actual library issues).
> Unfortunately, some of the specifically proposed changes (such as
> defining and using _unicode instead of unicode within python code)
> look to me as though they would trigger problems in the normal build
> (where the unicode object *does* exist, but would no longer be used).
This idiom is recommended by MvL [1] and is widely used (19 times in the
source code).
[1] http://bugs.python.org/issue8767#msg159473
> Other changes, such as the use of \x escapes, appear correct, but make
> the tests harder to read -- and might end up removing a test for
> correct unicode functionality across different spellings.
>
> Even if we assume that the tests are fine, and I'm just an idiot who
> misread them, the fact that there is any confusion means that these
> particular changes may be tricky enough to be a bad tradeoff for 2.7.
>
> It *might* work if you could make a more focused change. For example,
> instead of leaving the 'unicode' name unbound, provide an object that
> simply returns false for isinstance and raises a UnicodeError for any
> other method call. Even *this* might be too aggressive to 2.7, but the
> fact that it would only appear in the --disable-unicode builds, and
> would make them more similar to the regular build are points in its
> favor.
No, the existing code uses a different approach: "unicode" doesn't exist,
while the encode/decode methods exist but are useless. If memory serves,
there is even a special explanatory comment about this historical decision
somewhere. The decision was made many years ago.
> Before doing that, though, please document what the --disable-unicode
> mode is actually *supposed* to do when interacting with byte-streams
> that a standard defines as UTF-8. (For example, are the changes to
> _xml_dumps and _xml_loads at
> http://bugs.python.org/file35758/multiprocessing.patch
> correct, or do those functions assume they get bytes as input, or
> should the functions raise an exception any time they are called?)
Looking more carefully, I see that there is a bug in the unicode-enabled
build (an incorrect backport from 3.x). In 2.x xmlrpclib.dumps produces an
already UTF-8-encoded string, while in 3.x xmlrpc.client.dumps produces a
unicode string. multiprocessing should fail with a non-ASCII str or unicode.
A side benefit of my patches is that they expose existing errors in the
unicode-enabled build.
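To make the 2.x/3.x difference concrete, here is a rough sketch, not the
code from the patch. The names to_wire and from_wire are hypothetical. It
only encodes the result of dumps when it is not already a byte string,
which is the distinction described above.

```python
try:
    import xmlrpclib  # Python 2.x
except ImportError:
    import xmlrpc.client as xmlrpclib  # Python 3.x


def to_wire(value):
    """Hypothetical helper: marshal one value to UTF-8 bytes."""
    text = xmlrpclib.dumps((value,), "echo")
    # 2.x: dumps already returned a UTF-8 encoded byte string.
    # 3.x: dumps returned a unicode string that still needs encoding.
    if not isinstance(text, bytes):
        text = text.encode("utf-8")
    return text


def from_wire(data):
    """Hypothetical helper: unmarshal the single value in data."""
    (value,), _method = xmlrpclib.loads(data)
    return value
```

Calling .encode('utf-8') unconditionally, as 3.x code can safely do, is
what goes wrong in 2.x: the byte string is first decoded implicitly as
ASCII, and that fails as soon as the payload contains non-ASCII data.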