[Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

Greg Ewing greg.ewing at canterbury.ac.nz
Thu Jan 9 06:22:28 CET 2014


Kristján Valur Jónsson wrote:
> all you want is to open that .txt
> file on the drive and extract some phone numbers and merge in some email
> addresses. What encoding does the file have? Do I care? Must I care?

To some extent, yes. If the encoding happens to be an
ASCII-compatible one, such as latin-1 or utf-8, you can
probably extract the phone numbers without caring what
the rest of the bytes mean. But not if it's utf-16,
for example, where even the ASCII characters take up
two bytes each.
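
As a rough sketch of what I mean (the sample text and the
phone number pattern here are made up for illustration),
a bytes regex finds the number in latin-1 or utf-8 data
but comes up empty on utf-16:

```python
import re

# Hypothetical sample line; the NNN-NNNN phone format is an assumption.
text = "Kristján: 555-1234\n"

# A bytes pattern: \d only matches the ASCII digits 0-9 here.
PHONE = re.compile(rb"\d{3}-\d{4}")

def find_phones(data):
    """Return all phone-number-like byte strings found in raw data."""
    return PHONE.findall(data)

# In ASCII-compatible encodings the digits and '-' are single ASCII bytes,
# so the non-ASCII bytes around them can simply be ignored.
latin1_hits = find_phones(text.encode("latin-1"))
utf8_hits = find_phones(text.encode("utf-8"))

# In utf-16 every character is two bytes, so the digits are separated
# by NUL bytes and the pattern no longer matches.
utf16_hits = find_phones(text.encode("utf-16"))

print(latin1_hits, utf8_hits, utf16_hits)
```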

If you know that all the files on your system have an
ASCII-compatible encoding, you can use the surrogateescape
error handler to avoid having to know about the exact
encoding: the non-ASCII bytes are carried through as lone
surrogates and come back unchanged when you encode again.
Granted, that makes it slightly more complicated than it
was in Python 2, but not much.
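
Something along these lines, as a minimal sketch (the
sample data and the regex are invented for the example,
and I'm decoding as ascii here; any ASCII-compatible
codec would do):

```python
import re

# Hypothetical file contents, latin-1 encoded; byte 0xE1 is not valid ASCII.
raw = "Kristján: 555-1234\n".encode("latin-1")

# Decode without knowing the real encoding. Bytes that are not ASCII
# become lone surrogates (0xE1 -> '\udce1') instead of raising an error.
text = raw.decode("ascii", errors="surrogateescape")

# Ordinary str processing works on the ASCII parts.
phones = re.findall(r"[0-9]{3}-[0-9]{4}", text)

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("ascii", errors="surrogateescape")

print(phones, round_tripped == raw)
```

The same pair of arguments can be passed to open(), e.g.
open(path, encoding="ascii", errors="surrogateescape"),
for both reading and writing.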

-- 
Greg

