[Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

Terry Reedy tjreedy at udel.edu
Thu Jan 9 02:35:48 CET 2014


On 1/8/2014 5:04 PM, Kristján Valur Jónsson wrote:
>
> Believe it or not, sometimes you really don't care about encodings.
> Sometimes you just want to parse text files.  Python 3 forces you to
> think about abstract concepts like encodings when all you want is to
> open that .txt file on the drive and extract some phone numbers and

I suspect that you would do that by looking for bytes that can be
interpreted as ASCII digits. That works fine as long as the .txt
file uses an ASCII-compatible encoding. As soon as it does not, the
little utility fails. It also fails with non-European digits, such as
those used in Arabic and Indic scripts.
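A minimal sketch of the bytes-only approach, to make both failure modes concrete. The helper name `digits_from_bytes` and the sample strings are hypothetical illustrations, not anything from the original discussion. In a bytes regex, `\d` matches only the ASCII bytes 0-9.

```python
import re

# Bytes pattern: in a bytes regex, \d matches only the ASCII digits 0-9.
ASCII_DIGITS = re.compile(rb"\d+")

def digits_from_bytes(raw: bytes) -> list[bytes]:
    """Hypothetical helper: return runs of ASCII digit bytes in raw file contents."""
    return ASCII_DIGITS.findall(raw)

phone = "Call 555 0123"
print(digits_from_bytes(phone.encode("utf-8")))
# [b'555', b'0123']  (ASCII-compatible encoding: works)
print(digits_from_bytes(phone.encode("utf-16-le")))
# [b'5', b'5', b'5', b'0', b'1', b'2', b'3']  (NUL bytes split every number)
print(digits_from_bytes("٥٥٥".encode("utf-8")))
# []  (Arabic-Indic digits are not ASCII bytes at all)
```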

Even if you are in an environment where all .txt files are encoded in
UTF-8, it will be easier to look for non-ASCII digits in decoded Unicode
strings.
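A sketch of the decode-first alternative, assuming the files are UTF-8 as in the scenario above; the helper `numbers_from_text` and its `encoding` parameter are hypothetical. Once the data is a `str`, the regex `\d` matches any Unicode decimal digit, and `int()` accepts those digits directly.

```python
import re

# In a str pattern, \d matches any Unicode decimal digit (category Nd),
# e.g. Arabic-Indic (U+0660..U+0669) and Devanagari (U+0966..U+096F) digits.
DIGITS = re.compile(r"\d+")

def numbers_from_text(raw: bytes, encoding: str = "utf-8") -> list[int]:
    """Hypothetical helper: decode once, then find numbers in any script."""
    text = raw.decode(encoding)
    # int() accepts any Unicode decimal digits, not just ASCII 0-9.
    return [int(run) for run in DIGITS.findall(text)]

sample = "Tel: 555 ٠١٢٣ / ०४५६".encode("utf-8")
print(numbers_from_text(sample))  # [555, 123, 456]
```

Decoding once at the boundary keeps the search logic independent of how the bytes were stored.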

> merge in some email addresses.  What encoding does the file have?  Do
> I care?  Must I care?

If the email addresses have non-ASCII characters, then you must.
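A small illustration of that point, using a deliberately simple address pattern that is an assumption for demonstration only, not an RFC 5322/6531 validator. Against UTF-8 bytes, `\w` is ASCII-only, so an address containing "é" is missed entirely; against the decoded text it is found.

```python
import re

# Simplified address pattern for illustration only (not a real validator).
TEXT_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
BYTES_PATTERN = re.compile(rb"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

line = "Contact: josé@example.com"
print(BYTES_PATTERN.findall(line.encode("utf-8")))  # []
print(TEXT_PATTERN.findall(line))                    # ['josé@example.com']
```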

...
> All this talk is positive, though.  The fact that these topics
> have finally reached the halls of python-dev are indication that
> people out there are _trying_ to move to 3.3 :)

That is an interesting observation, worth keeping in mind amid the turmoil.

-- 
Terry Jan Reedy
