On Mon, Mar 16, 2020, 1:04 AM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

> Also, strs and reprs of arbitrary objects often end up in places
> such as log files which aren't equipped to handle unicode or other
> fancy things. So keeping them as basic as possible is a good idea.

Is this why __unicode__ was removed in favor of just __str__?

It can be argued that logfiles should also have shell control characters escaped at ingestion time.

Exactly. Python 3 uses a Unicode model for strings, which means that anywhere you have strings, you have Unicode, and you need to deal with encoding issues on I/O. I'm not sure how logfiles are any different from any other file I/O.
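To make the I/O point concrete, here is a minimal sketch. The file name and the sample message are made up for illustration; the point is only that the encoding is chosen where the text meets the file, not inside the string itself:

```python
import os
import tempfile

# Hypothetical message containing non-ASCII characters.
message = "caf\u00e9 \u2603"

# Hypothetical log path in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "example.log")

# The str is Unicode; the bytes on disk depend on the encoding
# picked when the file is opened.
with open(path, "w", encoding="utf-8") as f:
    f.write(message)

# Reading the raw bytes shows what was actually written.
with open(path, "rb") as f:
    raw = f.read()

# Decoding with the same encoding round-trips the text.
with open(path, encoding="utf-8") as f:
    round_tripped = f.read()
```

A logfile opened this way is just another text file: if the encoding can represent the characters, nothing special is needed.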

Related note: logfiles are likely to dump arbitrary messages attached to Exceptions as well. So you really need to be able to deal with arbitrary Unicode anyway.

(Note: in Python 2.7, passing arbitrary Unicode through the Exception machinery leads to messy errors. We really don't want that.)
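As a rough sketch of the exception case in Python 3, assuming a log destination that can only hold ASCII (simulated here with an in-memory stream; the logger name and the message are made up), a "backslashreplace" error handler lets the message through without raising:

```python
import io
import logging

# Simulate an ASCII-only log destination with an in-memory byte buffer.
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii", errors="backslashreplace")

# Hypothetical logger name, isolated from the root logger.
logger = logging.getLogger("unicode_demo")
logger.setLevel(logging.ERROR)
logger.propagate = False

handler = logging.StreamHandler(ascii_stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(handler)

try:
    # An exception whose message carries arbitrary Unicode.
    raise ValueError("bad value: \u00e9\u2603")
except ValueError:
    # logger.exception() appends the traceback, including the message.
    logger.exception("operation failed")

handler.flush()
logged = raw.getvalue().decode("ascii")
```

The non-ASCII characters end up in the log as escapes such as \xe9 and \u2603 rather than causing an encoding error.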

That all being said, there is something to be said for keeping __str__ and __repr__ on builtins to a lowest-common-denominator subset (i.e. ASCII). Your logging system should handle any Unicode without raising, but it may do so with an "ignore" or "replace" error handler, and it would be pretty ugly to have parts of the standard representations of builtins stripped out.
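For illustration, this is what the standard error handlers do to a made-up string when it is forced into ASCII. It is only a sketch of the handlers themselves, not of any particular logging setup:

```python
# Hypothetical text with two non-ASCII characters.
text = "snowman \u2603 and caf\u00e9"

# "ignore" silently drops anything that cannot be encoded.
ignored = text.encode("ascii", errors="ignore")

# "replace" substitutes a question mark for each such character.
replaced = text.encode("ascii", errors="replace")

# "backslashreplace" keeps the information as escape sequences.
escaped = text.encode("ascii", errors="backslashreplace")

# The ascii() builtin gives a repr that is already ASCII-only.
ascii_repr = ascii(text)

print(ignored)     # b'snowman  and caf'
print(replaced)    # b'snowman ? and caf?'
print(escaped)     # b'snowman \\u2603 and caf\\xe9'
print(ascii_repr)  # 'snowman \u2603 and caf\xe9'
```

With "ignore" or "replace", a repr that leaned on non-ASCII characters would lose exactly the parts that made it readable, which is the ugliness I mean.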

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython