String encoding in Py2.7
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Tue May 29 06:39:54 EDT 2018
On Tue, 29 May 2018 09:19:52 +0000, Fabien LUCE wrote:
> May 29 2018 11:12 AM, "Thomas Jollans" <tjol at tjol.eu> wrote:
>> On 2018-05-29 09:55, ftg at lutix.org wrote:
>>
>>> Hello,
>>> Using Python 2.7 (will switch to Py3 soon but Before I'd like to
>>> understand how string encoding worked)
>>
>> Oh dear. This is probably the exact wrong way to go about it: the
>> interplay between string encoding, unicode and bytes is much less clear
>> and easy to understand in Python 2.
>
> Ok I will quickly jump into py3 then.
While I applaud this decision -- the latest Python 3.x series is much
better than 2.7 -- please don't imagine that moving to Python 3 will
eliminate all encoding issues, especially when dealing with real-world
data that comes to you in a mix of weird and often broken encodings.
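A minimal sketch of what such data can look like in Python 3, assuming a made-up byte string that is really Latin-1 but is being read as UTF-8 (the sample bytes and variable names are only for illustration):

```python
# Hypothetical input: Latin-1 encoded bytes that someone labelled as UTF-8.
raw = b"caf\xe9 au lait"

strict_error = None
try:
    raw.decode("utf-8")            # strict decoding rejects the stray 0xE9 byte
except UnicodeDecodeError as exc:
    strict_error = exc

# Lenient decoding keeps going and marks the bad byte with U+FFFD.
lenient = raw.decode("utf-8", errors="replace")

# If you know (or correctly guess) the real encoding, nothing is lost.
recovered = raw.decode("latin-1")
```

Python 3 does not repair the data for you, but it does force the mismatch to surface at the point where you decode.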
Python 3 eliminates one common source of problems: unlike Python 2, it
won't try to guess what you mean when you combine bytes and Unicode
text. In Python 2, that worked for the simple cases, and was often
convenient, but at the cost of hard-to-diagnose and hard-to-fix
errors in the complex cases. Python 3 no longer guesses, which means
you have to be more diligent in converting bytes to text and vice versa.
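To illustrate, here is a small Python 3 sketch (the sample string is arbitrary). Mixing the two types raises TypeError, and you have to name the encoding yourself when converting:

```python
text = "caf\u00e9"              # str: Unicode text ("café")
data = text.encode("utf-8")     # bytes: b'caf\xc3\xa9'

mix_error = None
try:
    data + text                 # Python 3 refuses to guess an encoding here
except TypeError as exc:
    mix_error = exc

# Explicit conversion works in either direction.
as_text = data.decode("utf-8") + text
as_bytes = data + text.encode("utf-8")
```

In Python 2 the same concatenation would implicitly decode the bytes as ASCII, which succeeds for pure-ASCII data and fails with UnicodeDecodeError as soon as a non-ASCII byte turns up.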
Also, it has to be said that Python 3 makes one use-case harder: mixed
binary bytes plus ASCII text. (Or so I've been told.)
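As a rough sketch of that use-case, assuming an invented wire format with an ASCII header followed by a binary payload (the header name and layout are hypothetical), one way to handle it in Python 3.5 or later is to stay in bytes and decode only the ASCII parts:

```python
# Hypothetical message: ASCII header line, blank line, then raw binary.
payload = b"\x00\xff\x10\x80"
message = b"Content-Length: %d\r\n\r\n" % len(payload) + payload

# Split on bytes separators; nothing is decoded yet.
head, _, body = message.partition(b"\r\n\r\n")
name, _, value = head.partition(b": ")

# Decode only the parts known to be ASCII text.
field = name.decode("ascii")
length = int(value.decode("ascii"))
```

The b"..." literals and explicit decode calls are the extra ceremony compared with Python 2, where the same code could use plain str throughout.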
But for the common case where you have human-readable text as Unicode
strings and machine-readable binary data as bytes, and can keep the two
separate, Python 3 is much better.
I recommend you start with reading these if you haven't already:
https://nedbatchelder.com/text/unipain.html
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
Sorry for the huge URL, try this if your mail client breaks it:
https://tinyurl.com/h8yg9d7
--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson