[Tutor] %s %r with custom type

Sat Mar 13 14:21:52 CET 2010

On Sat, 13 Mar 2010 08:40:31 pm spir wrote:

> > The print statement understands how to directly print strings
> > (byte-strings and unicode-strings) and doesn't call your __str__
> > method.
> >
> > http://docs.python.org/reference/simple_stmts.html#the-print-statement
>
> Right. But then how to print out customized strings?

print str(obj)
print any_function_you_like(obj)

You shouldn't have objects try to lie about what they are. If a string 
is "hello" (without the quotes), then you shouldn't have it pretend to 
be "`hello`" (with quotes).
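
A minimal sketch of what I mean, assuming a helper called quoted() (a 
made-up name, not part of any library): the value stays honest, and the 
decoration is added by a function you call only when displaying it.

```python
def quoted(obj):
    """Return obj's text wrapped in backticks, for display only."""
    # Hypothetical helper: the name and the backtick style are illustrative.
    return "`%s`" % (obj,)

s = "hello"
print(s)          # the value itself, undecorated
print(quoted(s))  # the decorated form, built only when asked for
```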

> > As for string interpolation, I have reported this as a bug:
> >
> > http://bugs.python.org/issue8128
>
> Yes, at least the actual behaviour should be clear and properly
> documented. And it should be the same for str and unicode type, and
> their subclasses. 

Which is why I have reported it as a bug.

[...]
> Notes for the following comments of yours:
> (1) What I posted is test code written only to show the issue. (eg
> debug prints are not in the original code) (2) This class is intended
> for a kind parsing and string processing library (think at pyparsing,
> but designed very differently). It should work only with unicode
> string, so convert source and every bit of string in pattern defs (eg
> for literal match). __str__ and __repr__ are intended for feedback
> (programmer test and user information, in both cases mainly error
> messages). 

What do you mean "mainly error messages"?

__str__ is intended for converting objects into a string. That's why it 
is called __str__ rather than __give_the_user_feedback__.

> __repr__ should normally not be used, I wrote it rather 
> for completion.

What do you mean, "for completion"? Unicode strings already have a 
__repr__ method. If you don't need to customize it, don't, and your 
class will inherit the existing __repr__ method.
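
To illustrate the inheritance, here is a small sketch. The class name 
Text is made up, and text_type is just a helper alias so the same code 
runs where the unicode type is spelled "unicode" (Python 2) or "str" 
(Python 3).

```python
text_type = type(u"")   # unicode on Python 2, str on Python 3

class Text(text_type):
    """Hypothetical subclass that customises nothing at all."""
    pass

t = Text(u"abc")

# Text defines no __repr__ of its own, so repr() falls back to the
# one inherited from the built-in type.
print("__repr__" in vars(Text))
print(repr(t) == repr(u"abc"))
```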

> > This may be a problem. Why are you making your unicode class
> > pretend to be a byte-string?
>
> (This answer rather for __str__)
> Not to pollute output. Eg parse tree nodes (= match results) show
> like: integer:[sign:- digit:123]

You should keep display presentation and internal value as separate as 
possible. If your parse tree wants to display data in a particular 
format, then it is the responsibility of the parse tree to format the 
data correctly, not of the data. In fact, the parse tree itself should 
never print results. That is up to the caller: perhaps you want to 
write it to a file, print to standard out, or standard error, save it 
in a string, or anything you like.

>>> class ParseTree(object):
...     def match_results(self, arg):
...         return (42, "+", (1, 2), "something")
...     def format_results(self, arg):
...         result = self.match_results(arg)
...         template = "%d: [%c:- digits:%s] `%s`"
...         return template % result
...
>>> x = ParseTree().format_results(None)
>>> print x
42: [+:- digits:(1, 2)] `something`
>>> myfile.write(x + '\n')
>>>

If the Parse Tree does the printing, then the caller can't do anything 
except print.
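
As a rough sketch of the alternative, assuming a stand-alone function 
(format_results here is illustrative, not your library's API): the 
formatter only builds and returns text, and the caller picks where it 
goes.

```python
import io
import sys

def format_results(result):
    """Build the display text and hand it back; no I/O happens here."""
    return u"%d: [%c:- digits:%s] `%s`" % result

line = format_results((42, u"+", (1, 2), u"something"))

# The caller decides the destination. sys.stderr or an open file
# would be written to in exactly the same way.
sys.stdout.write(line + u"\n")
buf = io.StringIO()              # an in-memory text buffer
buf.write(line + u"\n")
```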

[...]
> > > s = "éâÄ"
> >
> > This may be a problem. "éâÄ" is not a valid str, because it
> > contains non-ASCII characters.
>
> It's just a test case (note (1)) for non-ascii input, precisely.

Maybe so, but your test case depends on external factors like the 
terminal encoding. This is a bad test, because somebody else running it 
may get something completely different.
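
One way to take the terminal out of the picture, sketched under the 
assumption that UTF-8 is the encoding you want: spell the characters 
with escapes and encode explicitly, so the test data is the same on 
every machine.

```python
# Escapes name the code points directly, so neither the source file's
# encoding nor the terminal affects what the literal means.
u = u"\xe9\xe2\xc4"        # the characters é, â, Ä
s = u.encode("utf-8")      # explicit encoding: two bytes per character here

# Round-tripping through the same encoding gives back the original text.
assert s.decode("utf-8") == u
```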

> > As far as I know, the behaviour of stuffing unicode characters into
> > byte-strings is not well-defined in Python, and will depend on
> > external factors like the terminal you are running in, if any. It
> > may or may not work as you expect. It is better to do this:
> >
> > u = u"éâÄ"
> > s = u.encode('utf-8')
>
> Yo, but I cannot expect every user to always use only unicode
> everywhere as input to my lib (both in sources to be parsed and in
> pattern defs) like a robot.

Of course you can. What happens if they pass None instead of a string? 
They get an error. What if they pass the integer 45? They get an error. 
What if they pass the list [1.235, 59.02, -267.1]? They get an error.

You are not responsible for the caller passing bad data. 

If your class relies on the user passing unicode strings, then you 
document the fact that it requires unicode strings. Then you have a 
choice:

* you can prohibit byte strings, and raise an error if they pass byte 
strings; or

* you can make a reasonable effort to convert byte strings to unicode, 
by calling decode(), but if the decode() fails, oh well, that's the 
caller's responsibility.

If the user wants a string "cat" and they pass "C aT  \n" instead, 
you're not responsible for fixing their mistake. If they want the 
unicode string u"éâÄ" and they pass the byte-string "\xe9\xe2\xc4" 
instead, that's not your problem either.
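
The two choices could look something like this. It is only a sketch: 
the function names are made up, UTF-8 as the default is an assumption, 
and text_type is a helper alias so the code runs whether the unicode 
type is called "unicode" (Python 2) or "str" (Python 3).

```python
text_type = type(u"")   # unicode on Python 2, str on Python 3

def require_text(value):
    """Choice 1: accept unicode strings only; refuse everything else."""
    if not isinstance(value, text_type):
        raise TypeError("expected a unicode string, got %s"
                        % type(value).__name__)
    return value

def to_text(value, encoding="utf-8"):
    """Choice 2: make a reasonable effort to decode byte strings."""
    if isinstance(value, bytes):
        # A failed decode raises UnicodeDecodeError and is left to
        # propagate: bad input is the caller's responsibility.
        return value.decode(encoding)
    return require_text(value)
```

With these, to_text(b"\xc3\xa9") gives u"\xe9", while the Latin-1 bytes 
b"\xe9\xe2\xc4" are not valid UTF-8 and raise UnicodeDecodeError, which 
the caller gets to deal with.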

> One main reason for my Unicode type (that 
> accepts both str and unicode). 

If all you want is a subclass of unicode which defaults to UTF-8 instead 
of ASCII for encoding, then I will agree with you 100%. That's a nice 
idea. But you seem to be taking a nice, neat, unicode subclass and 
trying to turn it into a Swiss Army knife, containing all sorts of 
extra functionality to do everything for the user. That is a bad idea.
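
The small version I have in mind is roughly this. UTF8Text is a made-up 
name, and the sketch changes nothing except the default argument of 
encode(). (On Python 3 the built-in default is already UTF-8, so the 
override only matters on Python 2.)

```python
text_type = type(u"")   # unicode on Python 2, str on Python 3

class UTF8Text(text_type):
    """Hypothetical unicode subclass whose encode() defaults to UTF-8."""

    def encode(self, encoding="utf-8", errors="strict"):
        # Delegate to the built-in method; only the default differs.
        return text_type.encode(self, encoding, errors)

t = UTF8Text(u"\xe9\xe2\xc4")
data = t.encode()                  # no argument needed: UTF-8 is assumed
```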

> Anyway, all that source of troubles 
> disappears with py3 :-)
> Then, I only need __str__ to produce nice, clear, unpolluted output.
>
> > which will always work consistently so long as you declare a source
> > encoding at the top of your module:
> >
> > # -*- coding: UTF-8 -*-
>
> Yes, this applies to my own code. But what about user code calling my
> lib? (This is the reason for Unicode.ENCODING config param).

That is their responsibility, not yours.

-- 
Steven D'Aprano