Deprecate `from __future__ import unicode_literals`?
I am beginning to think that `from __future__ import unicode_literals` does more harm than good. I don't recall exactly why we introduced it, but with the restoration of u"" literals in Python 3.3 we have a much better story for writing straddling code that is unicode-correct.
The problem is that the future import does both too much and not enough -- it does too much because it changes literals to unicode even in contexts where there is no benefit (e.g. the argument to getattr() -- I still hear of code that breaks due to this occasionally) and at the same time it doesn't do anything for strings that you read from files, receive from the network, or even from other files that don't use the future import.
I wonder if we can add an official note to the 2.7 docs recommending against it? (And maybe even to the 3.x docs if it's mentioned there at all.)
-- --Guido van Rossum (python.org/~guido)
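For illustration, here is a minimal sketch of the explicit-prefix style described above, written so that it should run unchanged on Python 2.7 and on Python 3.3+. The variable names (`text_type`, `binary_type`, `greeting`, `payload`, `attr_name`) are hypothetical and do not come from the thread.

```python
# Sketch only: explicit literal prefixes, with no unicode_literals import.

text_type = type(u"")    # unicode on Python 2, str on Python 3
binary_type = type(b"")  # str on Python 2, bytes on Python 3

greeting = u"caf\u00e9"  # u"" prefix: text on both major versions
payload = b"\x00\x01"    # b"" prefix: binary data on both major versions
attr_name = "upper"      # no prefix: the interpreter's native str type

assert isinstance(greeting, text_type)
assert isinstance(payload, binary_type)
assert isinstance(attr_name, str)

# Attribute names stay native str, so getattr() sees what it expects.
shout = getattr(greeting, attr_name)()
print(shout == u"CAF\u00c9")
```

With `from __future__ import unicode_literals` at the top of the module, `attr_name` would instead be a unicode object on Python 2, which is the "too much" half of the complaint.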
On 17 December 2016 at 08:24, Guido van Rossum wrote:
> I am beginning to think that `from __future__ import unicode_literals` does more harm than good. I don't recall exactly why we introduced it, but with the restoration of u"" literals in Python 3.3 we have a much better story for writing straddling code that is unicode-correct.
> The problem is that the future import does both too much and not enough -- it does too much because it changes literals to unicode even in contexts where there is no benefit (e.g. the argument to getattr() -- I still hear of code that breaks due to this occasionally) and at the same time it doesn't do anything for strings that you read from files, receive from the network, or even from other files that don't use the future import.
> I wonder if we can add an official note to the 2.7 docs recommending against it? (And maybe even to the 3.x docs if it's mentioned there at all.)
I think that's a good idea. I've found u"" to be entirely sufficient and very robust. Perhaps also have python2 -3 report on it? -Rob
I personally used it when I was forced to use Python 2 and was working mainly on Unicode processing (it is particularly handy when working with JSON, for example).
On 16/12/2016 at 20:24, Guido van Rossum wrote:
> I am beginning to think that `from __future__ import unicode_literals` does more harm than good. I don't recall exactly why we introduced it, but with the restoration of u"" literals in Python 3.3 we have a much better story for writing straddling code that is unicode-correct.
> The problem is that the future import does both too much and not enough -- it does too much because it changes literals to unicode even in contexts where there is no benefit (e.g. the argument to getattr() -- I still hear of code that breaks due to this occasionally) and at the same time it doesn't do anything for strings that you read from files, receive from the network, or even from other files that don't use the future import.
> I wonder if we can add an official note to the 2.7 docs recommending against it? (And maybe even to the 3.x docs if it's mentioned there at all.)
> -- --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/xavier.combelle%40gmail.c...
On 12/16/2016 11:24 AM, Guido van Rossum wrote:
> I am beginning to think that `from __future__ import unicode_literals` does more harm than good. I don't recall exactly why we introduced it, but with the restoration of u"" literals in Python 3.3 we have a much better story for writing straddling code that is unicode-correct.
So cross-version code would be primarily 2.7 and 3.3+ ? I can live with that. -- ~Ethan~
On Sat, Dec 17, 2016 at 8:07 AM, Ethan Furman wrote:
> On 12/16/2016 11:24 AM, Guido van Rossum wrote:
>> I am beginning to think that `from __future__ import unicode_literals` does more harm than good. I don't recall exactly why we introduced it, but with the restoration of u"" literals in Python 3.3 we have a much better story for writing straddling code that is unicode-correct.
> So cross-version code would be primarily 2.7 and 3.3+ ? I can live with that.
Or 3.5+ so you get percent formatting for bytes. +1 for deprecating unicode_literals; I don't remember ever using or wanting it. ChrisA
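As a small illustration of the 3.5+ point: percent formatting on bytes objects was added to Python 3 in version 3.5 (PEP 461), so a sketch like the one below should behave the same on 2.7 and 3.5+, while on 3.3 and 3.4 the `%` operation raises TypeError. The function name `build_request_line` is hypothetical, chosen only for this example.

```python
def build_request_line(method, path):
    """Return an HTTP/1.1 request line as bytes.

    Both arguments are assumed to be bytes already; no encoding happens here.
    """
    # %s with bytes operands works on Python 2.7 and on Python 3.5+.
    return b"%s %s HTTP/1.1\r\n" % (method, path)


line = build_request_line(b"GET", b"/index.html")
print(repr(line))
```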
On Dec 16, 2016, at 01:07 PM, Ethan Furman wrote:
> On 12/16/2016 11:24 AM, Guido van Rossum wrote:
>> I am beginning to think that `from __future__ import unicode_literals` does more harm than good. I don't recall exactly why we introduced it, but with the restoration of u"" literals in Python 3.3 we have a much better story for writing straddling code that is unicode-correct.
> So cross-version code would be primarily 2.7 and 3.3+ ? I can live with that.
So can I. I don't mind "silently" deprecating it, such as adding strong admonitions against its use in the docs, but clearly it can't be removed (at least until 3.7) and I worry about breaking existing code, even with a more chatty DeprecationWarning. At least in some circles, the problems of unicode_literals are known, but it's still useful and it's used in lots of places. Getting rid of cruft like this is one of the more satisfying edits when dropping Python 2 support. :) Cheers, -Barry
On 12/16/2016 04:27 PM, Barry Warsaw wrote:
> Getting rid of cruft like this is one of the more satisfying edits when dropping Python 2 support. :)
Ripping it out and replacing with explicit unicode literals is pretty satisfying when straddling, too. ;) Tres. -- Tres Seaver +1 540-429-0999 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com
On 16 Dec 2016, at 16:07, Ethan Furman wrote:
> On 12/16/2016 11:24 AM, Guido van Rossum wrote:
>> I am beginning to think that `from __future__ import unicode_literals` does more harm than good. I don't recall exactly why we introduced it, but with the restoration of u"" literals in Python 3.3 we have a much better story for writing straddling code that is unicode-correct.
> So cross-version code would be primarily 2.7 and 3.3+ ? I can live with that.
Speaking for third-party library authors, almost all cross-version code that does anything remotely close to a network is 2.7 and 3.3+. Requests dropped 3.2 support basically as soon as we could once 3.3’s unicode literals were restored, and since then I haven’t written anything that targets 3.2. It’s just too frustrating. And while I’m shoving my oar in, I’ve never seen anyone be happy with using “from __future__ import unicode_literals”. As others in this thread have said, it just points a loaded gun at your foot and lets you wait for it to go off. Cory
On Dec 16, 2016, at 11:24 AM, Guido van Rossum wrote:
> I am beginning to think that `from __future__ import unicode_literals` does more harm than good. I don't recall exactly why we introduced it, but with the restoration of u"" literals in Python 3.3 we have a much better story for writing straddling code that is unicode-correct.
> The problem is that the future import does both too much and not enough -- it does too much because it changes literals to unicode even in contexts where there is no benefit (e.g. the argument to getattr() -- I still hear of code that breaks due to this occasionally) and at the same time it doesn't do anything for strings that you read from files, receive from the network, or even from other files that don't use the future import.
> I wonder if we can add an official note to the 2.7 docs recommending against it? (And maybe even to the 3.x docs if it's mentioned there at all.)
+1 Leaving it in place will likely cause more problems than it solves, so I think your suggestion is a net win even if there is some bit of disruption. Also, as far as I can tell, the adoption rate of Python 3.2 was very low. Python 3's story didn't become attractive until later. Raymond
I've actually found unicode_literals useful in getting code to be Python
2/3-compatible. I try to use a Python 3-like model of always using unicode
for text data and only using str for binary data, and unicode_literals
helps achieve that, since most string literals are meant to be text, not
binary data. The issue with functions like getattr is annoying, but in my
experience it's not a common problem (I don't often call getattr() with a
string literal as an argument).
2016-12-16 14:56 GMT-08:00 Raymond Hettinger:
> On Dec 16, 2016, at 11:24 AM, Guido van Rossum wrote:
>> I am beginning to think that `from __future__ import unicode_literals` does more harm than good. I don't recall exactly why we introduced it, but with the restoration of u"" literals in Python 3.3 we have a much better story for writing straddling code that is unicode-correct.
>> The problem is that the future import does both too much and not enough -- it does too much because it changes literals to unicode even in contexts where there is no benefit (e.g. the argument to getattr() -- I still hear of code that breaks due to this occasionally) and at the same time it doesn't do anything for strings that you read from files, receive from the network, or even from other files that don't use the future import.
>> I wonder if we can add an official note to the 2.7 docs recommending against it? (And maybe even to the 3.x docs if it's mentioned there at all.)
> +1 Leaving it in place will likely cause more problems than it solves, so I think your suggestion is a net win even if there is some bit of disruption. Also, as far as I can tell, the adoption rate of Python 3.2 was very low. Python 3's story didn't become attractive until later.
> Raymond
On 2016-12-16, 19:24 GMT, Guido van Rossum wrote:
> I am beginning to think that `from __future__ import unicode_literals` does more harm than good. I don't recall exactly why we introduced it, but with the restoration of u"" literals in Python 3.3 we have a much better story for writing straddling code that is unicode-correct.
??? There has been absolute fanaticism about not changing anything in Python 2.* because of supposed stability of API, even in situations when I don’t think API was really in danger (http://bugs.python.org/issue19494). And now you would remove a feature which zillions of lines of code depend on, or at least could depend on? And yes, I do use it in my current porting efforts of M2Crypto to be py2/3k compatible. I don’t understand. Matěj -- https://matej.ceplovi.cz/blog/, Jabber: mcepl@ceplovi.cz GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8 Never, never, never believe any war will be smooth and easy, or that anyone who embarks on the strange voyage can measure the tides and hurricanes he will encounter. The statesman who yields to war fever must realise that once the signal is given, he is no longer the master of policy but the slave of unforeseeable and uncontrollable events. -- Winston Churchill, 1930
On Fri, Dec 16, 2016 at 3:52 PM, Matěj Cepl wrote:
> I don’t understand.
No need to get all bent out of shape. My proposal is to simply add a note to the docs recommending against using this. I wouldn't change any code, not even a silent deprecation warning. (Also, read the rest of the thread to learn why this is not the best practice for writing Python 2/3 straddling code.) -- --Guido van Rossum (python.org/~guido)
On 2016-12-16, 23:59 GMT, Guido van Rossum wrote:
> No need to get all bent out of shape. My proposal is to simply add a note to the docs recommending against using this. I wouldn't change any code, not even a silent deprecation warning. (Also, read the rest of the thread to learn why this is not the best practice for writing Python 2/3 straddling code.)
Oh, that sounds way better. Thank you. Matěj -- https://matej.ceplovi.cz/blog/, Jabber: mcepl@ceplovi.cz GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8 If Patrick Henry thought that taxation without representation was bad, he should see how bad it is with representation.
On 16.12.16 21:24, Guido van Rossum wrote:
> e.g. the argument to getattr() -- I still hear of code that breaks due to this occasionally)
What is the problem with unicode in getattr()? A unicode attribute name is converted to str, and since the result is cached, this doesn't even add much overhead.
On 2016-12-17 10:06, Serhiy Storchaka wrote:
> On 16.12.16 21:24, Guido van Rossum wrote:
>> e.g. the argument to getattr() -- I still hear of code that breaks due to this occasionally)
> What is the problem with unicode in getattr()? A unicode attribute name is converted to str, and since the result is cached, this doesn't even add much overhead.
It breaks the str optimization of dicts. Dicts with str-only keys are special-cased in Python 2. Christian
On 17.12.16 13:44, Christian Heimes wrote:
> On 2016-12-17 10:06, Serhiy Storchaka wrote:
>> On 16.12.16 21:24, Guido van Rossum wrote:
>>> e.g. the argument to getattr() -- I still hear of code that breaks due to this occasionally)
>> What is the problem with unicode in getattr()? A unicode attribute name is converted to str, and since the result is cached, this doesn't even add much overhead.
> It breaks the str optimization of dicts. Dicts with str-only keys are special-cased in Python 2.
getattr() converts a unicode to str and passes a str to PyObject_GetAttr().
On 17 December 2016 at 21:58, Serhiy Storchaka wrote:
> On 17.12.16 13:44, Christian Heimes wrote:
>> On 2016-12-17 10:06, Serhiy Storchaka wrote:
>>> On 16.12.16 21:24, Guido van Rossum wrote:
>>>> e.g. the argument to getattr() -- I still hear of code that breaks due to this occasionally)
>>> What is the problem with unicode in getattr()? A unicode attribute name is converted to str, and since the result is cached, this doesn't even add much overhead.
>> It breaks the str optimization of dicts. Dicts with str-only keys are special-cased in Python 2.
> getattr() converts a unicode to str and passes a str to PyObject_GetAttr().
getattr() may do the right thing, but directly accessing __dict__ doesn't.
python-future has a good write-up of the benefits and drawbacks, as they originally recommended it unconditionally when modernising code, and myself, Armin Ronacher, and a few others convinced them to be a little more judicious in suggesting it to people: http://python-future.org/unicode_literals.html
However, that page also points out that whether or not it's likely to help more than it hurts depends a lot on which version of Python you're starting from:
- if you're making originally Python 3 only code also work on Python 2, and hence defining the first ever version of its Python 2 API, then you probably *do* want to use unicode_literals, and then explicitly mark bytes literals to get Python 2 working
- if you're modernising Python 2 code and have a lot of existing API users on Python 2, then you probably *don't* want to use unicode_literals, and instead explicitly mark your text literals as Unicode to get Python 3 working
In cases like Django where folks successfully adopted the "unicode_literals" import for modernisation purposes, it was a matter of aiming to get to that "Python 3 native, with Python 2 also supported" structure as soon as possible (the fact that Django is structured primarily as an object oriented framework likely helped with that, as it has a lot of control over the data types user applications end up encountering).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I have updated the porting HOWTO to drop recommending unicode_literals and
also to mention running optional type checkers like mypy and pytype twice
(once under Python 2 and again under Python 3).
On Fri, 16 Dec 2016 at 11:25 Guido van Rossum wrote:
> I am beginning to think that `from __future__ import unicode_literals` does more harm than good. I don't recall exactly why we introduced it, but with the restoration of u"" literals in Python 3.3 we have a much better story for writing straddling code that is unicode-correct.
> The problem is that the future import does both too much and not enough -- it does too much because it changes literals to unicode even in contexts where there is no benefit (e.g. the argument to getattr() -- I still hear of code that breaks due to this occasionally) and at the same time it doesn't do anything for strings that you read from files, receive from the network, or even from other files that don't use the future import.
> I wonder if we can add an official note to the 2.7 docs recommending against it? (And maybe even to the 3.x docs if it's mentioned there at all.)
> -- --Guido van Rossum (python.org/~guido)
Please don't get rid of unicode_literals -- I don't even think we should deprecate it as a recommendation or discourage it. Maybe a note or two added as to where issues may arise would be good.
I've found importing unicode_literals to be an excellent way to write py2/3 code. And I have never found a problem.
I'm also hoping that my py2/3 compatible code will someday be py3 only -- and then I'll be really glad that I don't have all those u" all over the place.
Also it does "automagically" do the right thing with, for instance, passing a literal to the file handling functions in the os module -- so that's pretty nice.
The number of times you need to add a b"" is FAR fewer than "text" string literals. Let's keep it.
-CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Mon, Dec 19, 2016 at 11:50 PM, Chris Barker wrote:
> Please don't get rid of unicode_literals -- I don't even think we should deprecate it as a recommendation or discourage it.
> Maybe a note or two added as to where issues may arise would be good.
> I've found importing unicode_literals to be an excellent way to write py2/3 code. And I have never found a problem.
> I'm also hoping that my py2/3 compatible code will someday be py3 only -- and then I'll be really glad that I don't have all those u" all over the place.
> Also it does "automagically" do the right thing with, for instance, passing a literal to the file handling functions in the os module -- so that's pretty nice.
> The number of times you need to add a b"" is FAR fewer than "text" string literals. Let's keep it.
> -CHB
Same thing here... also, it helps coding with the same mindset as Python 3, where everything is unicode by default -- and yes, there are problems if you pass unicode to an API that accepts bytes on Python 2, but then, you can have the same issues on Python 3 -- you need to know and keep track of bytes vs unicode everywhere (although they're syntactically similar to declare, they're not the same thing), and I find that there are fewer places where you need to put b'' than u'' (if you code with unicode in mind in Python 2)... In an ideal world, Python 2 would actually be improved to accept unicode in the places where Python 3 accepts unicode (such as subprocess.Popen, etc.) to make it easier to port applications that actually do the "right" thing on Python 2 to Python 3. Best Regards, Fabio
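One way the bytes-vs-unicode bookkeeping mentioned above is sometimes handled in straddling code is a small conversion helper applied at the boundary of APIs that want the native str type. The sketch below is only an illustration under that assumption; `to_native_str`, `PY2`, and `text_type` are hypothetical names, not something proposed in this thread or provided by the standard library.

```python
import sys

PY2 = sys.version_info[0] == 2
text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_native_str(value, encoding="utf-8"):
    """Return value as the interpreter's native str type.

    Python 2: unicode is encoded to a byte str; str passes through.
    Python 3: bytes is decoded to str; str passes through.
    """
    if isinstance(value, str):
        return value
    if PY2 and isinstance(value, text_type):
        return value.encode(encoding)
    if not PY2 and isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError("expected text or bytes, got %r" % type(value))


print(to_native_str(u"ls") == "ls")
print(to_native_str(b"ls") == "ls")
```

The default encoding of UTF-8 is a choice made for this sketch; real code would pick whatever encoding the receiving API actually expects.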
participants (17)
- Barry Warsaw
- Brett Cannon
- Chris Angelico
- Chris Barker
- Christian Heimes
- Cory Benfield
- Ethan Furman
- Fabio Zadrozny
- Guido van Rossum
- Jelle Zijlstra
- Matěj Cepl
- Nick Coghlan
- Raymond Hettinger
- Robert Collins
- Serhiy Storchaka
- Tres Seaver
- Xavier Combelle