__str__ vs. __repr__
andy at robanal.demon.co.uk
Sun Nov 7 11:41:18 EST 1999
"Tim Peters" <tim_one at email.msn.com> wrote:
>> Should I really assume stdout is capable of displaying Latin-1?
>No. But I don't grasp why you think you need to know *anything* about this.
>Until Unicode takes over the world, there's nothing you can do other than
>tell users the truth: "most of" printable 7-bit ASCII displays the same way
>across the world, but outside of that *all* bets are off. It will vary by
>all of OS, display device, displaying program, and user configuration
Well said. I'm going to pass this message around to my team at work as
it encapsulates so many of the issues involved in internationalisation
(and if you think inputting François is bad, try figuring out a way to
clean up Japanese names using vi on an English Solaris box).
I have found there are only three really sane points of reference when
dealing with encodings - anything in between just causes
confusion all around:
Level 1: Code that is assumed to work as above (tab, newline and
Level 2: Code that keeps all 256 values intact without understanding
the contents, like Python strings.
Level 3: A fully general multi-byte encoding toolkit, where the
programmer is in explicit control of which encoding a string is in,
and has capable libraries which can convert from one to the other, and
can reason in advance about whether a particular data set can survive
a particular round-trip conversion, and knows the exact capabilities
of the fonts ultimately used for display or printing.
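The "reason in advance about a round-trip" part of Level 3 can be sketched in present-day Python, where str holds Unicode text and bytes holds raw 8-bit data. This is only an illustration under that assumption, not something that existed when this message was written; the helper name survives_round_trip is made up for the example.

```python
def survives_round_trip(text, encoding):
    """Return True if text can be encoded to `encoding` and decoded back unchanged.

    Hypothetical helper for illustration: it tests one string against one
    encoding and says nothing about fonts or display devices.
    """
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # The encoding cannot represent at least one character in the text.
        return False


# Level 2 behaviour: all 256 byte values pass through untouched when the
# bytes are mapped one-to-one onto code points via Latin-1.
raw = bytes(range(256))
latin1_intact = raw.decode("latin-1").encode("latin-1") == raw

# Level 3 behaviour: ask before converting.
french_in_latin1 = survives_round_trip("François", "latin-1")
french_in_ascii = survives_round_trip("François", "ascii")
japanese_in_latin1 = survives_round_trip("山田", "latin-1")
japanese_in_sjis = survives_round_trip("山田", "shift_jis")
```

The check is per string; reasoning about a whole data set would mean running it over every value, or comparing the repertoires of the two encodings directly.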
AFAIK Level 3 does not exist yet - we got most of the way at work with
Unilib and a load of ad hoc Python code, and Java goes most of the
way. I think the key concept is a special kind of string which is
tagged to know which encoding it is in.
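A minimal sketch of such a tagged string, written in present-day Python and assuming the standard codecs are sufficient. The class name TaggedString and its methods are hypothetical; this is not Unilib's API or anything Java provides, just one way the idea could look.

```python
import codecs


class TaggedString:
    """Raw bytes plus an explicit label saying which encoding they are in."""

    def __init__(self, data, encoding):
        codecs.lookup(encoding)  # raises LookupError for an unknown encoding
        self.data = data
        self.encoding = encoding

    def to_text(self):
        """Decode the bytes using the encoding recorded in the tag."""
        return self.data.decode(self.encoding)

    def convert(self, target):
        """Return a new TaggedString in `target`.

        Raises UnicodeEncodeError if `target` cannot represent the text,
        so lossy conversions fail loudly instead of silently.
        """
        return TaggedString(self.to_text().encode(target), target)

    def __repr__(self):
        return "TaggedString(%r, %r)" % (self.data, self.encoding)


name = TaggedString("François".encode("latin-1"), "latin-1")
as_utf8 = name.convert("utf-8")
```

Because the encoding travels with the data, the caller never has to guess it, and a conversion that cannot work (name.convert("ascii") here) raises an error rather than mangling the name.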