Compact repr operator (i.e., __short_repr__)

I was curious what kind of interest there would be in adding an overridable method to types to allow getting back a repr, but a predictably short one. The motivation is that there are many cases where producing the full, round-trippable repr is possible, but would take up significant resources or be too huge to be useful for human consumption. The built-in [reprlib module][2] certainly shows there's an interest for many built-in types, and I think arguably that same functionality is desirable for user types.

As mentioned in the subject, a possible API could be:

    class ValueObject(object):
        def __init__(self, content):
            self.content = content

        def __repr__(self):
            return '%s(%r)' % (self.__class__.__name__, self.content)

        def __short_repr__(self, size=None, depth=None):
            # TODO: should interpretation of size/depth be casual or strict?
            if size and len(self.content) > size:
                short_content = self.content[:size] + '...'
            else:
                # TODO: could just return normal repr possibly
                short_content = self.content
            return '<%s content=%r>' % (self.__class__.__name__, short_content)

Without something like this, there's no way to ask an object if its repr is of manageable length, or how its repr could be meaningfully shortened. Usually I just chop out the middle and add an ellipsis, but as for the time spent generating that middle, I'll never get those cycles back.

Anyways, thanks for your attention, and happy Monday to all!

Mahmoud
https://github.com/mahmoud
https://twitter.com/mhashemi
http://sedimental.org

[1]: https://docs.python.org/2/library/functions.html#func-repr
[2]: https://docs.python.org/3.4/library/reprlib.html

On Mon, Feb 8, 2016 at 3:36 PM, Mahmoud Hashemi <mahmoud@hatnote.com> wrote:
numpy has realized this, and produced a __repr__ (and __str__) that truncates:

    In [26]: len(arr)
    Out[26]: 100000

    In [27]: repr(arr)
    Out[27]: 'array([ 0.00000000e+00, 1.00001000e-02, 2.00002000e-02, ...,\n 9.99980000e+02, 9.99990000e+02, 1.00000000e+03])'

I'm not sure that a full-sized repr is ever useful, so this seems fine to me. I wonder how often anyone actually counts on eval(repr(obj)) == obj?

In short, I don't see that this would be all that useful.

-CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Mon, Feb 8, 2016 at 6:36 PM Mahmoud Hashemi <mahmoud@hatnote.com> wrote:
With the addition of ``__short_repr__`` would there be a change to the default Python REPL? If not, how would you use the new magic method? If so, then why not just change __repr__ for all the built-ins to truncate automatically? As far as I know, reprs are not part of the backwards compatibility guarantee. (Note, I'm not advocating for changing reprs.) If the usage would be to write a function that checks for the existence of __short_repr__, then why not simply move the implementation of ``if len(s) < maxlen`` etc. to that function?
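
The "function that checks for the existence of __short_repr__" could be sketched roughly as follows. The helper name short_repr and its maxlen default are made up for illustration, and __short_repr__ is only the hook proposed in this thread, not an existing protocol. Note that the fallback branch still builds the full repr before chopping it, which is the cost the original post complains about.

```python
def short_repr(obj, maxlen=60):
    """Return a repr of roughly at most ``maxlen`` characters.

    Hypothetical helper: ``__short_repr__`` is the hook proposed in this
    thread, not something Python actually looks for.
    """
    hook = getattr(type(obj), '__short_repr__', None)
    if hook is not None:
        # Let the type decide how to shorten itself.
        return hook(obj, maxlen)

    s = repr(obj)  # the full repr is still generated here
    if len(s) <= maxlen:
        return s

    # Chop out the middle and mark the gap with an ellipsis.
    keep = max(maxlen - 3, 0)
    head = (keep + 1) // 2
    tail = keep // 2
    return s[:head] + '...' + s[len(s) - tail:]
```

A type that defines the hook is asked first; everything else falls back to plain truncation of the ordinary repr.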

Roundtrippable reprs are certainly part of Python canon, whether or not they are universally used (Chris), or guaranteed (Mike). I could see __short_repr__ (and associated reprlib) being the desired behavior in some console environments, but I'm not one to say it should be the default in the main Python REPL. My use cases are for 1) a web based console/REPL and 2) a configuration store that maintains a human readable history of past values (i.e., it does not maintain references to the objects themselves). But mostly I wanted to kick off the discussion of how to update reprlib (and pprint) to be more efficient and applicable.

Mahmoud

On Mon, Feb 8, 2016 at 4:28 PM, Michael Selik <mike@selik.org> wrote:

On 9 Feb 2016, at 02:49, Mahmoud Hashemi <mahmoud@hatnote.com> wrote:
Roundtrippable reprs are certainly part of Python canon, whether or not they are universally used (Chris), or guaranteed (Mike).
They can be part of the canon all they want, but if they’re not universally guaranteed then I don’t know that this is a real problem. It means that the world of Python objects divides into two kinds: those with __repr__ return values that can be round tripped and those with __repr__ return values that cannot be round tripped. Given that objects of the second type already exist (I know this for a fact because I have written some quite recently!), it would be an error to assume that the identity ‘eval(repr(x)) == x’ holds for arbitrary types. In fact, not only does it not hold for all third-party types, it doesn’t even hold for all built-in types:
I think the reality is that there is no constraint on the representation of arbitrary types to be round-trippable in any way. Again, all custom types have non-round-trippable representations by default, and many of the more eclectic built-in types have non-round-trippable representations too (in addition to NaN, the memoryview object leaps to mind). I can also note the Python documentation on repr:
For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object.
If the language doesn’t even try to enforce the idea that representations will be round-trippable, I think there’s just no problem here. Cory
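
The NaN case mentioned above can be checked directly. This is only a small illustration written for this copy of the thread, not the snippet from the original message; the variable names are arbitrary.

```python
x = float('nan')
text = repr(x)  # the bare word 'nan'

try:
    # Evaluate in an empty namespace so only builtins are visible.
    eval(text, {})
    round_trips = True
except NameError:
    # 'nan' is not a builtin name, so the repr cannot be evaluated as-is.
    round_trips = False

# Even a NaN rebuilt explicitly from the repr never compares equal to the
# original, so 'eval(repr(x)) == x' could not hold for it anyway.
y = float(text)
same_value = (y == x)
```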

If you want to make the case that default reprs should be non-roundtrippable in the case that they're too long, that's a fine and separate discussion. Though I have argued in the past that float('nan') would be a less surprising/more useful repr.

And that's what it's about, usefulness. It's useful to be able to copy in and out of the REPL, even for very large reprs. I've done it, my coworkers and students do it, and you've probably done it as well. But there are other times (often outside the REPL), where that is not the case, and being able to address them explicitly, in the vein of reprlib and pprint -- but better -- would be *useful*. Who hasn't wished that the built-in defaultdict and OrderedDict were as pprintable or reprlib.repr-able as dict? There's plenty of room to improve.

On Tue, Feb 9, 2016 at 12:56 AM, Cory Benfield <cory@lukasa.co.uk> wrote:

The appropriate truncation for any repr that is too long would depend on the particulars of both the object and the purpose of the output. If reprlib and pprint were improved for, say, defaultdict and OrderedDict, or more generic things, what would their features be? I'm guessing you wouldn't just duplicate the functionality of ``pprint(dict(x))``.

On Tue, Feb 9, 2016, 4:09 AM Mahmoud Hashemi <mahmoud@hatnote.com> wrote:

For those examples, the pprint/reprlib behavior probably wouldn't be much different than dict/list of tuples. I can put my vote in, but I prefer to defer design specifics to the creators/maintainers of those types. That's my point. Right now I feel like repr() is a powerful, but underleveraged function in the builtin ecosystem.

Instead of pprint.pformat(), why not add a keyword argument, and have repr(obj, pretty=True), and have that call through with arguments to a more full featured __repr__? Same could go for repr(obj, length=150, depth=2). These could have defaults set for REPL sessions.

Everyone is doing hacky, incomplete heuristic introspection, from IPython to the standard library. Instead, let's embrace the complexities of viewing application state, and allow for formalization of community efforts.

Mahmoud

On Wed, Feb 10, 2016 at 11:18 AM, Michael Selik <mike@selik.org> wrote:

If we want to embrace the complexities of viewing application state, perhaps we should divorce this conversation from the repr altogether. There’s plenty of applications where “viewing application state” will require more than the repr generally provides: you’ll want insight into private member variables, into the state of composed objects, and potentially into the state of some of those private member variables (e.g. is this lock held at this moment?). This is outside the generally accepted scope of the repr as I understand it: most people don’t provide the repr in this form. If you want the complexity of application state then you probably want some way to dump the graph of objects starting with a specific one, along with their state. That’s totally beyond the scope of the repr, and is closer to what pickle would do for you (though as Andrew points out, pickle is not the right tool for this job either). What I don’t understand is: if you are in the REPL already, what does any of this buy you? The REPL has plenty of powerful tools for introspecting Python objects: specifically, it has the Python programming language! If you want to introspect application state from the REPL, then just grab the object you want and start interrogating its state. dir() works, printing fields works, calling methods works, and all of this is far more powerful than anything you can get from the repr. Cory

Michael: Yes, I have several projects on PyPI, some of which are related and somewhat popular, but I hardly think that a PyPI package can encompass the API changes I'm describing. We already have several built-in modules (reprlib, pprint) and 3rd party modules (IPython Notebook, Django 500 pages, the Werkzeug debugger) that see a great deal of usage, but only support a few builtin types and have to hack heuristics for the rest.

Cory: Introspecting application state happens at many levels, one step at a time. Not only is perfect the enemy of good here, but I really don't think an omni-solution like the one you're describing is a realistic, Pythonic goal. Besides, graphs are powerful, but don't go underestimating linear text.

An enhanced repr allows developer-users to communicate the context of their usage so that the library authors can programmatically provide and prioritize information in their types. I've seen (and written) plenty of reprs that encode useful information in my types. To your example, I might have: <MyLock 0x123 held=True>. I still think you're selling repr short. It can be much more. One day we might have:
repr(obj, pretty=True, depth=2, width=120)
Though, as I said before, my primary use case isn't the REPL, and those arguments should be part of the REPL config (pprint defaults to width=80 for consoles). The point is that repr has room to evolve.

Mahmoud

On Thu, Feb 11, 2016 at 1:00 AM, Cory Benfield <cory@lukasa.co.uk> wrote:

On 11 February 2016 at 19:24, Mahmoud Hashemi <mahmoud@hatnote.com> wrote:
I generally haven't been following these threads, but this one caught my eye. Making pprint extensible was one of the rationales for adding functools.singledispatch to the standard library. However, as far as I am aware, we haven't had anyone find the time to follow up on that change by actually applying it where it would make sense to do so. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
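
As a rough idea of what applying functools.singledispatch to this problem might look like, here is a small sketch of an extensible compact formatter. The function name compact, the limit parameter and the output formats are all invented for illustration; nothing like this exists in pprint or reprlib.

```python
from collections import OrderedDict, defaultdict
from functools import singledispatch
from itertools import islice


@singledispatch
def compact(obj, limit=3):
    """Fallback for types nobody has registered: the ordinary repr."""
    return repr(obj)


@compact.register(dict)
def _compact_dict(obj, limit=3):
    # Only the first ``limit`` items are ever formatted.
    shown = ['%s: %s' % (compact(k, limit), compact(v, limit))
             for k, v in islice(obj.items(), limit)]
    if len(obj) > limit:
        shown.append('...')
    return '{%s}' % ', '.join(shown)


@compact.register(OrderedDict)
def _compact_ordereddict(obj, limit=3):
    return 'OrderedDict(%s)' % _compact_dict(obj, limit)


@compact.register(defaultdict)
def _compact_defaultdict(obj, limit=3):
    return 'defaultdict(%r, %s)' % (obj.default_factory,
                                    _compact_dict(obj, limit))
```

Third-party types could opt in with their own compact.register(...) call, without the formatter having to know about them in advance.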

On Thu, Feb 11, 2016 at 01:24:42AM -0800, Mahmoud Hashemi wrote:
I'm sympathetic, but I'm not sure that repr() is the right place to be focusing. repr() is too closely tied to the REPL and the existing string formatting codes to start making radical changes. But we do have the pprint library, which (in my opinion) is underpowered and underused.

I know that some people think that Python already has too many string conversion functions ( str()/__str__ and repr()/__repr__ ) but for my own use I've occasionally found that it has too few. Some time ago, I started working on a library for continued fractions. I count at least five useful/standard representations for a continued fraction. All of these would be equivalent:

    ContinuedFraction(1, 2, 3, 4, 5)

    [1; 2, 3, 4, 5]

    1 + 1/(2 + 1/(3 + 1/(4 + 1/5)))

          1     1     1     1
    1 + ----- ----- ----- -----
         2 +   3 +   4 +   5

                  1
    1 + -----------------
                    1
        2 + -------------
                      1
            3 + ---------
                        1
                4 + -----
                      5

(Apart from the first, the remaining four are standard notations used in mathematics.)

Now obviously I can just add my own string conversion methods to the class, but it would be nice to be able to integrate with the pprint (and maybe reprlib) libraries in some way rather than an ad hoc set of arbitrarily named methods.

I must admit that I'm just thinking aloud here, I don't actually have a concrete idea. But maybe my use-case will spark an idea in someone else.

-- Steve
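
For concreteness, the "ad hoc set of named methods" approach might look something like the sketch below. This is not Steven's library: the class body, the method names bracket and inline, and the one-letter format codes are all assumptions made for illustration. Routing the choice through __format__ is just one existing hook that lets callers pick a notation.

```python
class ContinuedFraction:
    """Toy continued fraction holding integer terms a0, a1, a2, ..."""

    def __init__(self, *terms):
        if not terms:
            raise ValueError('at least one term is required')
        self.terms = terms

    def __repr__(self):
        return 'ContinuedFraction(%s)' % ', '.join(map(str, self.terms))

    def bracket(self):
        # Bracket notation, e.g. [1; 2, 3, 4, 5]
        head, rest = self.terms[0], self.terms[1:]
        if not rest:
            return '[%d]' % head
        return '[%d; %s]' % (head, ', '.join(map(str, rest)))

    def inline(self):
        # Nested one-line notation, built from the innermost term outwards.
        text = str(self.terms[-1])
        for term in reversed(self.terms[:-1]):
            inner = '(%s)' % text if ' ' in text else text
            text = '%d + 1/%s' % (term, inner)
        return text

    def __format__(self, spec):
        # Hypothetical codes: '' for the repr, 'b' bracket, 'i' inline.
        styles = {'': self.__repr__, 'b': self.bracket, 'i': self.inline}
        try:
            return styles[spec]()
        except KeyError:
            raise ValueError('unknown format spec %r' % spec) from None
```

With this, format(cf, 'b') and '{:i}'.format(cf) select a notation, but pprint and reprlib remain unaware of any of them, which is the gap being described.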

On Thu, Feb 11, 2016, at 05:59, Steven D'Aprano wrote:
(Apart from the first, the remaining four are standard notations used in mathematics.)
How about 1+\dfrac 1{2+\dfrac 1{3+\dfrac 1{4+\dfrac 1 5}}}? A way to get a "mathematical abstract syntax tree" that can be operated on and transformed into that or any of your notations [1+1/(2+1/(3+1/(4+1/5))) might be the default] for numeric objects (and other things such as matrices and vectors) might be nice.

On 11 Feb 2016, at 09:24, Mahmoud Hashemi <mahmoud@hatnote.com> wrote:
An enhanced repr allows developer-users to communicate the context of their usage so that the library authors can programmatically provide and prioritize information in their types. I've seen (and written) plenty of reprs that encode useful information in my types. To your example, I might have: <MyLock 0x123 held=True>.
To be clear, I absolutely do that as well. In fact, if I’m writing a repr that’s almost always the form I give it. Again, I’m not arguing that the repr isn’t useful. I’m arguing that it *is* useful, and that attempting to extend its utility with flags and complex functionality dilutes its utility. I’m in opposition to trying to shovel more into the repr: to try to extend it with more flexibility and therefore increase the burden on developers to be useful in that scenario. I don’t disagree that the repr *can* be more: I just don’t think it should be.

The repr is, like it or not, implicitly unstructured. Because the return value of __repr__ is a string, we cannot easily extend it for utility without pushing that cost out to developers. That is, if we wanted to support your “pretty=True” argument to repr(), all developers would need to start writing __repr__ functions that respond to that flag. This is, IMO, a bad way to obtain the feature you want, because at this point ‘pretty’ becomes constrained by the original author. It also means that some flags may not work if the author has not supported them: ‘pretty’ may do nothing, ‘pretty’ may work but ‘depth’ may be ignored, or ‘width’.

What I’ve understood from your mails is that what you want is the ability to provide a *structured* debug representation of an object. That is, a function to hang off an object that is formatted according to the preferences of the user, not the developer. I think this is a good thing to have, because structured debug representations allow for more automated processing, as well as additional utility for users: see for example the structlog[0] module, which applies this concept to logging.

This is a good and reasonable thing to want, but it’s a different thing to the repr. The repr is fundamentally an unstructured representation of whatever the author believed was useful to describe about the object.
I don’t see any reason to remove that or to put pressure on that mechanism to be more than it is. Cory [0]: https://structlog.readthedocs.org/en/stable/
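
To make the concern about unsupported flags concrete, here is a sketch of what a caller-side repr(obj, pretty=True, ...) would have to do today. The function flexible_repr and the classes Plain and Fancy are hypothetical; no such keyword arguments exist on the real repr().

```python
import inspect


def flexible_repr(obj, **options):
    """Hypothetical stand-in for repr(obj, pretty=..., depth=...)."""
    method = type(obj).__repr__
    try:
        params = inspect.signature(method).parameters
    except (TypeError, ValueError):
        params = {}
    # Options the author did not anticipate are silently dropped.
    supported = {k: v for k, v in options.items() if k in params}
    return method(obj, **supported)


class Plain:
    def __repr__(self):  # knows nothing about 'pretty'
        return 'Plain()'


class Fancy:
    def __init__(self, items):
        self.items = items

    def __repr__(self, pretty=False):  # supports 'pretty', ignores 'depth'
        if pretty:
            body = ',\n    '.join(repr(i) for i in self.items)
            return 'Fancy([\n    %s,\n])' % body
        return 'Fancy(%r)' % (self.items,)
```

Here flexible_repr(Plain(), pretty=True) comes back unchanged and flexible_repr(Fancy([1, 2]), depth=2) ignores the depth, so the caller cannot tell which requests were honoured.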

On Feb 9, 2016, at 00:56, Cory Benfield <cory@lukasa.co.uk> wrote:
So? One use of round-trippable reprs is to copy them from output and paste them into source code or an interactive session. When doing so, you almost always know that you're dealing with a builtin or third-party type that's round-trippable--and, when you're surprised, it's almost always obvious, because you get something that looks nothing at all like the source equivalent, and that raises a SyntaxError if you try to evaluate it anyway.

And even those "almost"s aren't a problem in practice. Sure, a list that recursively contains itself looks misleadingly round-trippable, and will evaluate successfully into the wrong thing. But this rarely comes up in practice--and, if it does, because there's a human being inspecting, debugging, or playing with things, rather than a program, it's easy to deal with.

This isn't theoretical--I do this all the time when debugging code, I write my own types to make them easier to debug this way, and it saves me time and hassle. In fact, it's one of the benefits of using Python over some of the other languages I use, where repr or toDebugString or whatever is never useful, instead of being usually useful especially with common types.

The only thing the non-universality of round-tripping means is that you can't use repr with eval as a persistence format. Which is a good thing--you *shouldn't* use it as a persistence format, and that would be true even if it did work. But using it as an inspecting/debugging format is not a problem, and breaking that would be a bad idea.

In fact, breaking it would make repr nearly pointless. Except for types that define __repr__ but block __str__ with a TypeError (which is rare, and it's debatable whether those types are even valid), when would you ever use repr otherwise?

I don’t think we’re arguing the same point. I’m not saying that __repr__ shouldn’t be round-trippable: if that makes sense for your type then totally fine, go for it. However, I am saying that I don’t see the advantage in having *both* a round-trippable and non-round-trippable repr for the same type. If you’re copying and pasting then the length of the round-trippable representation is a non-issue, and you actively want it to appear in your debug output rather than a shorter version that elides information required to round-trip the data. So in this case, what purpose does the shorter version serve? Cory

On Tue, Feb 9, 2016, at 03:56, Cory Benfield wrote:
One other example is classes and functions, though in many cases it's not clear why this should be the case. Have the default for top-level functions and classes check whether it's reachable through [module].[name] and if so return that. The default for methods could use [class].[name], or [obj].[name] for bound methods.

Instead of the current representation, you could have the default repr on objects use pickle.

    def __repr__(self):
        return 'pickle.loads(%r)' % pickle.dumps(self)

Or maybe have a helper function, e.g.

    def unrepr(description, pickled):
        return pickle.loads(pickled)  # ignore description

and default to "unrepr('MyClass object at 0x12345678', b'pickledstuff')"

[Speaking of pickle and since you mentioned memoryview, I noticed a bug: pickle apparently successfully generates output for a memoryview, but it fails on attempting to unpickle it - I think that it should throw an exception immediately on attempting to pickle it.]

Certainly there's the odd edge case where it doesn't make sense for it to be round-trippable, and there's a tradeoff between human-readability and round-trippability (see e.g. all the classes that can't be round-tripped unless the class is imported)... but that's part of where a short-repr vs long-repr could make sense - short-repr could be Decimal('123.456') or even <Decimal 123.456> whereas long-repr returns __import__('decimal').Decimal('123.456')

On Wednesday, February 10, 2016 12:29 PM, Random832 <random832@fastmail.com> wrote:
That seems like a bad idea. First, as a human programmer, the repr "[1, 2, (3, 'spam')]" means something to me--and the same thing to me as to Python; the repr "unrepr('silly.C object at 0x106ec2b38', b'\x80\x03csilly\nC\nq\x00)\x81q\x01}q\x02X\x01\x00\x00\x00xq\x03K\x02sb.')" means less to me than the current "<silly.C at 0x106ec2b38>", while taking up a lot more space. Meanwhile, if I've changed the silly module since printing out the repr, that pickle will fail--or, worse, and more likely, succeed but give me the wrong value. And, since eval'ing reprs is generally something you do when experimenting or when debugging code that you're actively working on, this will be very common. Meanwhile, eval("[1, 2, (3, 'spam')]") == [1, 2, (3, 'spam')], and that's true for most containers and "value-type" objects, which tend to be the kinds of things that have round-trippable reprs today. That probably won't be true for instances of arbitrary types where the creator didn't think of a repr. Finally, when I paste a repr into the REPL or into my source and it needs further qualification, as with datetime or Decimal, I can almost always figure this out pretty easily. And I would much rather have to figure it out and then paste "decimal.Decimal('123.456')" into my source than have the computer figure it out and paste "__import__('decimal').Decimal('123.456')" or something with a pickle in the middle of it into my source. It's true that pasting reprs is just something that tends to often work when you'd expect it to today, not something guaranteed--but since repr is used for exploration and debugging by humans, not for automated processing by scripts, "tends to often work" tends to often be very good. So there really isn't a problem here. And your suggestion doesn't really help that use anyway; the only use it helps is using repr as an automated serialization format, which we explicitly _don't_ want to help. If you want automated serialization, just use pickle. 
There's no reason to twist repr and eval to support that use case almost but not quite as well (and even less securely and efficiently). If you want something more readable, use something like YAML with a custom type library, or jsonpickle, etc. Or, if you think we need a more human-readable pickle format, that might be an interesting idea, but there's no reason to twist repr into it, or to force it to be readable by eval instead of loads.

Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
Or, to avoid calling dunder methods directly, use ‘type(x)’ and ‘id(x)’. Whether all these suggestions address the stated requirement will depend on how predictable is meant by “predictably short”.

-- 
Ben Finney
“All persons, living and dead, are purely coincidental.” (_Timequake_, Kurt Vonnegut)
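
A helper built only on type(x) and id(x) might look like the sketch below. The name identity_repr and the exact output format are assumptions, chosen to resemble the familiar angle-bracket default. Its length depends only on the type name and the address, never on the object's contents.

```python
def identity_repr(obj):
    """Short repr from the type name and id only (hypothetical helper)."""
    return '<%s object at %#x>' % (type(obj).__name__, id(obj))
```

Calling identity_repr on a million-element list costs the same as calling it on an empty one, at the price of saying nothing about the value.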

On Mon, Feb 8, 2016 at 3:36 PM, Mahmoud Hashemi <mahmoud@hatnote.com> wrote:
numpy as realized this, and produced a __repr__ (and __str__) that truncates: In [*26*]: len(arr) Out[*26*]: 100000 In [*27*]: repr(arr) Out[*27*]: 'array([ 0.00000000e+00, 1.00001000e-02, 2.00002000e-02, ...,\n 9.99980000e+02, 9.99990000e+02, 1.00000000e+03])' I"m not sure that a full-sized repr is ever useful, so this seems fine to me. I wonder how often anyone actually counts on eval(repr(obj)) == obj ? In short, I don't see that this would be all that useful. -CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Mon, Feb 8, 2016 at 6:36 PM Mahmoud Hashemi <mahmoud@hatnote.com> wrote:
With the addition of ``__short_repr__`` would there be a change to the default Python REPL? If not, how would you use the new magic method? If so, then why not just change __repr__ for all the built-ins to truncate automatically? As far as I know, reprs are not part of the backwards compatibility guarantee. (Note, I'm not advocating for changing reprs.) If the usage would be to write a function that checks for the existence of __short_repr__, then why not simply move the implementation of ``if len(s) < maxlen`` etc. to that function?

Roundtrippable reprs are certainly part of Python canon, whether or not they are universally used (Chris), or guaranteed (Mike). I could see __short_repr__ (and associated reprlib) being the desired behavior in some console environments, but I'm not one to say it should be the default in the main Python REPL. My use cases are for 1) a web based console/REPL and 2) a configuration store that maintains a human readable history of past values (i.e., it does not maintain references to the objects themselves). But mostly I wanted to kick off the discussion of how to update reprlib (and pprint) to be more efficient and applicable. Mahmoud On Mon, Feb 8, 2016 at 4:28 PM, Michael Selik <mike@selik.org> wrote:

On 9 Feb 2016, at 02:49, Mahmoud Hashemi <mahmoud@hatnote.com> wrote:
Roundtrippable reprs are certainly part of Python canon, whether or not they are universally used (Chris), or guaranteed (Mike).
They can be part of the canon all they want, but if they’re not universally guaranteed then I don’t know that this is a real problem. It means that the world of Python objects divides into two kinds: first, those with __repr__ return values that can be round tripped and those with __repr__ return values that cannot be round tripped. Given that objects of the second type already exist (I know this for a fact because I have written some quite recently!), it would be an error to assume that the identity 'eval(repr(x)) == x’ holds for arbitrary types. In fact, not only does it not hold for all third-party types, it doesn’t even hold for all built-in types:
I think the reality is that there is no constraint on the representation of arbitrary types to be round-trippable in any way. Again, all custom types have non-round-trippable representations by default, many more eclectic built-in types have non-round-tripppable representations (in addition to NaN, the memoryview object leaps to mind). I can also note the Python documentation on repr:
For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object.
If the language doesn’t even try to enforce the idea that representations will be round-trippable, I think there’s just no problem here. Cory

If you want to make the case that default reprs should be non-roundtrippable in the case that they're too long, that's a fine and separate discussion. Though I have argued in the past that float('nan') would be a less surprising/more useful repr. And that's what it's about, usefulness. It's useful to be able to copy in and out of the REPL, even for very large reprs. I've done it, my coworkers and students do it, and you've probably done it as well. But there are other times (often outside the REPL), where that is not the case, and being able to address them explicitly, in the vein of reprlib and pprint -- but better -- would be *useful*. Who hasn't wished that the built-in defaultdict and OrderedDict were as pprintable or reprlib.repr-able as dict. There's plenty of room to improve. On Tue, Feb 9, 2016 at 12:56 AM, Cory Benfield <cory@lukasa.co.uk> wrote:

The appropriate truncation for any repr that is too long would depend on the particulars of both the object and the purpose of the output. If reprlib and pprint were improved for say, defaultdict and OrderedDict, or more generic things, what would its features be? I'm guessing you wouldn't just duplicate the functionality of ``pprint(dict(x))``. On Tue, Feb 9, 2016, 4:09 AM Mahmoud Hashemi <mahmoud@hatnote.com> wrote:

For those examples, the pprint/reprlib behavior probably wouldn't be much different than dict/list of tuples. I can put my vote in, but I prefer to defer design specifics to the creators/maintainers of those types. That's my point. Right now I feel like repr() is a powerful, but underleveraged function in the builtin ecosystem. Instead of pprint.pformat(), why not add a keyword argument, and have repr(obj, pretty=True), and have that call through with arguments to a more full featured __repr__? Same could go for repr(obj, length=150, depth=2). These could have defaults set for REPL sessions. Everyone is doing hacky, incomplete heuristic introspection, from IPython to the standard library. Instead, let's embrace the complexities of viewing application state, and allow for formalization of community efforts. Mahmoud On Wed, Feb 10, 2016 at 11:18 AM, Michael Selik <mike@selik.org> wrote:

If we want to embrace the complexities of viewing application state, perhaps we should divorce this conversation from the repr altogether. There’s plenty of applications where “viewing application state” will require more than the repr generally provides: you’ll want insight into private member variables, into the state of composed objects, and potentially into the state of some of those private member variables (e.g. is this lock held at this moment?). This is outside the generally accepted scope of the repr as I understand it: most people don’t provide the repr in this form. If you want the complexity of application state then you probably want some way to dump the graph of objects starting with a specific one, along with their state. That’s totally beyond the scope of the repr, and is closer to what pickle would do for you (though as Andrew points out, pickle is not the right tool for this job either). What I don’t understand is: if you are in the REPL already, what does any of this buy you? The REPL has plenty of powerful tools for introspecting Python objects: specifically, it has the Python programming language! If you want to introspect application state from the REPL, then just grab the object you want and start interrogating its state. dir() works, printing fields works, calling methods works, and all of this is far more powerful than anything you can get from the repr. Cory

Michael: Yes, I have several projects on PyPI, some of which are related and somewhat popular, but I hardly think that a PyPI package can encompass the API changes I'm describing. We already have several built-in modules (reprlib, pprint) and 3rd party modules (IPython Notebook, Django 500 pages, the Werkzeug debugger) that see a great deal of usage, but only support a few builtinn types and have to hack heuristics for the rest. Cory: Introspecting application state happens at many levels, one step at a time. Not only is perfect the enemy of good here, but I really don't think an omni-solution like the one you're describing is a realistic, Pythonic goal. Besides, graphs are powerful, but don't go underestimating linear text. An enhanced repr allows developer-users to communicate the context of their usage so that the library authors can programmatically provide and prioritize information in their types. I've seen (and written) plenty of reprs that encode useful information in my types. To your example, I might have: <MyLock 0x123 held=True>. I still think you're selling repr short. It can be much more. One day we might have:
repr(obj, pretty=True, depth=2, width=120)
Though, as I said before, my primary use case isn't the REPL, and those arguments should be part of the REPL config (pprint defaults to width=80 for consoles). The point is that repr has room to evolve. Mahmoud On Thu, Feb 11, 2016 at 1:00 AM, Cory Benfield <cory@lukasa.co.uk> wrote:

On 11 February 2016 at 19:24, Mahmoud Hashemi <mahmoud@hatnote.com> wrote:
I generally haven't been following these threads, but this one caught my eye. Making pprint extensible was one of the rationales for adding functools.singledispatch to the standard library. However, as far as I am aware, we haven't had anyone find the time to follow up on that change by actually applying it where it would make sense to do so. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Feb 11, 2016 at 01:24:42AM -0800, Mahmoud Hashemi wrote:
I'm sympathetic, but I'm not sure that repr() is the right place to be focusing. repr() is too closely tied to the REPL and the existing string formatting codes to start making radical changes. But we do have the pprint library, which (in my opinion) is underpowered and underused. I know that some people think that Python already has too many string conversion functions ( str()/__str__ and repr()/__repr__ ) but for my own use I've occasionally found that it has too few. Some time ago, I started working on a library for continued fractions. I count at least five useful/standard representations for a continued fraction. All of these would be equivalent: ContinuedFraction(1, 2, 3, 4, 5) [1; 2, 3, 4, 5] 1 + 1/(2 + 1/(3 + 1/(4 + 1/5))) 1 1 1 1 1 + ----- ----- ----- ----- 2 + 3 + 4 + 5 1 1 + ----------------- 1 2 + ------------- 1 3 + --------- 1 4 + ----- 5 (Apart from the first, the remaining four are standard notations used in mathematics.) Now obviously I can just add my own string conversion methods to the class, but it would be nice to be able to integrate with the pprint (and maybe reprlib) libraries in some way rather than an ad hoc set of arbitrarily named methods? I must admit that I'm just thinking aloud here, I don't actually have a concrete idea. But maybe my use-case will spark an idea in someone else. -- Steve

On Thu, Feb 11, 2016, at 05:59, Steven D'Aprano wrote:
(Apart from the first, the remaining four are standard notations used in mathematics.)
How about 1+\dfrac 1{2+\dfrac 1{3+\dfrac 1{4+\dfrac 1 5}}}? A way to get a "mathematical abstract syntax tree" that can be operated on and transformed into that or any of your notations [1+1/(2+1/(3+1/(4+1/5))) might be the default] for numeric objects (and other things such as matrices and vectors) might be nice.
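A rough sketch of what such a tree could look like, using plain nested tuples. The node tags 'add' and 'recip' and the function names are invented for illustration; a real design would presumably cover far more than continued fractions.

```python
def cf_tree(terms):
    """Build nested ('add', a, ('recip', subtree)) tuples from the terms."""
    tree = terms[-1]
    for term in reversed(terms[:-1]):
        tree = ('add', term, ('recip', tree))
    return tree


def to_inline(node):
    # Leaves are plain numbers; inner nodes are 'add' tuples.
    if not isinstance(node, tuple):
        return str(node)
    _, a, (_, sub) = node
    inner = to_inline(sub)
    if isinstance(sub, tuple):
        inner = '(%s)' % inner
    return '%s + 1/%s' % (a, inner)


def to_latex(node):
    if not isinstance(node, tuple):
        return str(node)
    _, a, (_, sub) = node
    return r'%s+\dfrac{1}{%s}' % (a, to_latex(sub))
```

The same tree feeds both renderers, so adding a notation means adding a function rather than another method on the numeric type.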

On 11 Feb 2016, at 09:24, Mahmoud Hashemi <mahmoud@hatnote.com> wrote:
An enhanced repr allows developer-users to communicate the context of their usage so that the library authors can programmatically provide and prioritize information in their types. I've seen (and written) plenty of reprs that encode useful information in my types. To your example, I might have: <MyLock 0x123 held=True>.
To be clear, I absolutely do that as well. In fact, if I’m writing a repr that’s almost always the form I give it. Again, I’m not arguing that the repr isn’t useful. I’m arguing that it *is* useful, and that attempting to extend its utility with flags and complex functionality dilutes its utility. I’m in opposition to trying to shovel more into the repr: to try to extend it with more flexibility and therefore increase the burden on developers to be useful in that scenario. I don’t disagree that the repr *can* be more: I just don’t think it should be.

The repr is, like it or not, implicitly unstructured. Because the return value of __repr__ is a string, we cannot easily extend it for utility without pushing that cost out to developers. That is, if we wanted to support your “pretty=True” argument to repr(), all developers would need to start writing __repr__ functions that respond to that flag. This is, IMO, a bad way to obtain the feature you want, because at this point ‘pretty’ becomes constrained by the original author. It also means that some flags may not work if the author has not supported them: ‘pretty’ may do nothing, or ‘pretty’ may work but ‘depth’ or ‘width’ may be ignored.

What I’ve understood from your mails is that what you want is the ability to provide a *structured* debug representation of an object. That is, a function to hang off an object whose output is formatted according to the preferences of the user, not the developer. I think this is a good thing to have, because structured debug representations allow for more automated processing, as well as additional utility for users: see for example the structlog[0] module, which applies this concept to logging.

This is a good and reasonable thing to want, but it’s a different thing to the repr. The repr is fundamentally an unstructured representation of whatever the author believed was useful to describe about the object. I don’t see any reason to remove that or to put pressure on that mechanism to be more than it is.

Cory

[0]: https://structlog.readthedocs.org/en/stable/
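As a sketch of the distinction being drawn, assume a hypothetical `debug_info()` hook that returns plain data, with all formatting decisions left to the caller. Neither `debug_info` nor `render` exists in Python or in structlog; they are placeholders for the idea.

```python
class MyLock:
    def __init__(self):
        self.held = False

    def __repr__(self):
        # Unstructured: the author decides what appears and how.
        return '<MyLock 0x%x held=%r>' % (id(self), self.held)

    def debug_info(self):
        # Hypothetical structured hook: data only, no formatting.
        return {'type': type(self).__name__, 'id': id(self), 'held': self.held}


def render(obj, keys=None, width=None):
    """Format debug_info() according to the caller's preferences."""
    info = obj.debug_info()
    name = info.pop('type')
    if keys is not None:
        info = {k: info[k] for k in keys if k in info}
    text = '%s(%s)' % (name, ', '.join('%s=%r' % kv for kv in info.items()))
    if width is not None and len(text) > width:
        text = text[:max(width - 3, 0)] + '...'
    return text
```

Here `keys` and `width` belong to the user's call, so a class author never has to handle them, and `__repr__` stays exactly as it was.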

On Feb 9, 2016, at 00:56, Cory Benfield <cory@lukasa.co.uk> wrote:
So? One use of round-trippable reprs is to copy them from output and paste them into source code or an interactive session. When doing so, you almost always know that you're dealing with a builtin or third-party type that's round-trippable--and, when you're surprised, it's almost always obvious, because you get something that looks nothing at all like the source equivalent, and that raises a SyntaxError if you try to evaluate it anyway.

And even those "almost"s aren't a problem in practice. Sure, a list that recursively contains itself looks misleadingly round-trippable, and will evaluate successfully into the wrong thing. But this rarely comes up in practice--and, if it does, because there's a human being inspecting, debugging, or playing with things, rather than a program, it's easy to deal with.

This isn't theoretical--I do this all the time when debugging code, I write my own types to make them easier to debug this way, and it saves me time and hassle. In fact, it's one of the benefits of using Python over some of the other languages I use, where repr or toDebugString or whatever is never useful, instead of being usually useful especially with common types.

The only thing the non-universality of round-tripping means is that you can't use repr with eval as a persistence format. Which is a good thing--you *shouldn't* use it as a persistence format, and that would be true even if it did work. But using it as an inspecting/debugging format is not a problem, and breaking that would be a bad idea.

In fact, breaking it would make repr nearly pointless. Except for types that define __repr__ but block __str__ with a TypeError (which is rare, and it's debatable whether those types are even valid), when would you ever use repr otherwise?
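The three cases described above (round-trips cleanly, fails loudly, evaluates into the wrong thing) can be checked in a few lines. The helper name `round_trips` is made up for this illustration, and eval should only ever be used like this on reprs you produced yourself.

```python
def round_trips(obj):
    """Return True if eval(repr(obj)) gives back an equal object."""
    try:
        return eval(repr(obj)) == obj
    except SyntaxError:
        # Reprs such as '<object object at 0x...>' are not valid source.
        return False


plain = [1, 2, (3, 'spam')]

cyclic = [1, 2]
cyclic.append(cyclic)         # a list that contains itself
cyclic_text = repr(cyclic)    # the inner list is shown as [...]
rebuilt = eval(cyclic_text)   # valid source, but [...] is a list holding Ellipsis
```

`round_trips(plain)` is true, `round_trips(object())` is false because of the SyntaxError, and `rebuilt` is a three-element list whose last item is `[Ellipsis]` rather than a self-reference.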

I don’t think we’re arguing the same point. I’m not saying that __repr__ shouldn’t be round-trippable: if that makes sense for your type then totally fine, go for it. However, I am saying that I don’t see the advantage in having *both* a round-trippable and non-round-trippable repr for the same type. If you’re copying and pasting then the length of the round-trippable representation is a non-issue, and you actively want it to appear in your debug output rather than a shorter version that elides information required to round-trip the data. So in this case, what purpose does the shorter version serve? Cory

On Tue, Feb 9, 2016, at 03:56, Cory Benfield wrote:
One other example is classes and functions, though in many cases it's not clear why this should be the case. Have the default for top-level functions and classes check whether it's reachable through [module].[name] and if so return that. The default for methods could use [class].[name], or [obj].[name] for bound methods.

Instead of the current representation, you could have the default repr on objects use pickle.

    def __repr__(self):
        return 'pickle.loads(%r)' % pickle.dumps(self)

Or maybe have a helper function e.g.

    import pickle

    def unrepr(description, pickled):
        return pickle.loads(pickled)  # ignore description

and default to "unrepr('MyClass object at 0x12345678', b'pickledstuff')"

[Speaking of pickle and since you mentioned memoryview, I noticed a bug: pickle apparently successfully generates output for a memoryview, but it fails on attempting to unpickle it - I think that it should throw an exception immediately on attempting to pickle it.]

Certainly there's the odd edge case where it doesn't make sense for it to be round-trippable, and there's a tradeoff between human-readability and round-trippability (see e.g. all the classes that can't be round-tripped unless the class is imported)... but that's part of where a short-repr vs long-repr could make sense - short-repr could return Decimal('123.456') or even <Decimal 123.456> whereas long-repr returns __import__('decimal').Decimal('123.456')
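For readers who want to try the proposal, here is a runnable sketch written as free functions so it can be applied to existing types. `pickle_repr` and `long_repr` are illustrative names, not anything in the standard library; `long_repr` assumes the type lives in a top-level module and that its ordinary repr is a constructor call, which is true of Decimal but not in general.

```python
import pickle
from decimal import Decimal


def unrepr(description, pickled):
    # The description is only there for human readers.
    return pickle.loads(pickled)


def pickle_repr(obj):
    """Build the "unrepr('... object at 0x...', b'...')" form for obj."""
    description = '%s object at 0x%x' % (type(obj).__name__, id(obj))
    return 'unrepr(%r, %r)' % (description, pickle.dumps(obj))


def long_repr(obj):
    """Prefix the ordinary repr with an import of the defining module."""
    # Assumption: a top-level module, since __import__('a.b') returns 'a'.
    return '__import__(%r).%s' % (type(obj).__module__, repr(obj))


d = Decimal('123.456')
# Never eval or unpickle text from an untrusted source.
restored = eval(pickle_repr(d), {'unrepr': unrepr})
```

`restored` compares equal to `d`, and `long_repr(d)` produces the long form quoted above, which evaluates without any prior import.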

On Wednesday, February 10, 2016 12:29 PM, Random832 <random832@fastmail.com> wrote:
That seems like a bad idea. First, as a human programmer, the repr "[1, 2, (3, 'spam')]" means something to me--and the same thing to me as to Python; the repr "unrepr('silly.C object at 0x106ec2b38', b'\x80\x03csilly\nC\nq\x00)\x81q\x01}q\x02X\x01\x00\x00\x00xq\x03K\x02sb.')" means less to me than the current "<silly.C at 0x106ec2b38>", while taking up a lot more space. Meanwhile, if I've changed the silly module since printing out the repr, that pickle will fail--or, worse, and more likely, succeed but give me the wrong value. And, since eval'ing reprs is generally something you do when experimenting or when debugging code that you're actively working on, this will be very common. Meanwhile, eval("[1, 2, (3, 'spam')]") == [1, 2, (3, 'spam')], and that's true for most containers and "value-type" objects, which tend to be the kinds of things that have round-trippable reprs today. That probably won't be true for instances of arbitrary types where the creator didn't think of a repr. Finally, when I paste a repr into the REPL or into my source and it needs further qualification, as with datetime or Decimal, I can almost always figure this out pretty easily. And I would much rather have to figure it out and then paste "decimal.Decimal('123.456')" into my source than have the computer figure it out and paste "__import__('decimal').Decimal('123.456')" or something with a pickle in the middle of it into my source. It's true that pasting reprs is just something that tends to often work when you'd expect it to today, not something guaranteed--but since repr is used for exploration and debugging by humans, not for automated processing by scripts, "tends to often work" tends to often be very good. So there really isn't a problem here. And your suggestion doesn't really help that use anyway; the only use it helps is using repr as an automated serialization format, which we explicitly _don't_ want to help. If you want automated serialization, just use pickle. 
There's no reason to twist repr and eval to support that use case almost but not quite as well (and even less securely and efficiently). If you want something more readable, use something like YAML with a custom type library, or jsonpickle, etc. Or, if you think we need a more human-readable pickle format, that might be an interesting idea, but there's no reason to twist repr into it, or to force it to be readable by eval instead of loads.

Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
Or, to avoid calling dunder methods directly, use ‘type(x)’ and ‘id(x)’. Whether all these suggestions address the stated requirement will depend on how predictable is meant by “predictably short”.

-- 
 \       “All persons, living and dead, are purely coincidental.” |
  `\                                  —_Timequake_, Kurt Vonnegut |
_o__)                                                             |
Ben Finney
participants (11)
- Andrew Barnert
- Ben Finney
- Chris Barker
- Cory Benfield
- Greg Ewing
- Mahmoud Hashemi
- Michael Selik
- Nick Coghlan
- Random832
- Steven D'Aprano
- Sven R. Kunze