Memory address vs serial number in reprs

I have a problem with the location of the hexadecimal memory address in custom reprs:

<threading.BoundedSemaphore: 2/3 at 0x7ff4c26b3eb0>

vs

<threading.BoundedSemaphore at 0x7ff4c26b3eb0: 2/3>

The long hexadecimal number makes the repr longer and distracts attention from other useful information. We could get rid of it, but it is useful when we want to distinguish objects of the same type, although it is hard to tell apart long hexadecimal numbers that differ only in a few digits in the middle.

What if we use serial numbers to differentiate instances?

<threading.BoundedSemaphore #5: 2/3>

where the serial number starts at 1 and is incremented for every new instance of that type.

The advantages are:

* Shorter repr.
* Easier to distinguish different objects.
* The serial number is unique for the life of the program and cannot be reused (in contrast to the id/memory address).

The disadvantages are:

* Increased object size and creation time.

I do not propose to use serial numbers for all objects, because it would increase the size of objects, and a fixed-size integer could overflow for short-lived objects created en masse (like numbers, strings, tuples). I propose it only for some custom objects implemented in Python, for which size and creation time are not critical. I want to start with the synchronization objects in threading and multiprocessing, which do not have custom reprs yet, then change the reprs of locks and asyncio objects.

Is it worth doing?

On Sun, Jul 19, 2020 at 06:38:30PM +0300, Serhiy Storchaka wrote:
What if we use serial numbers to differentiate instances?
I like this idea. It is similar to how Jython and IronPython object IDs work:

    # Jython
    >>> id(None)
    2
    >>> id(len)
    3
    >>> object()
    <object object at 0x4>
This sounds reasonable to me. +1 -- Steven

On Sat, Jul 25, 2020 at 12:03:55PM +0300, Serhiy Storchaka wrote:
19.07.20 19:33, Steven D'Aprano wrote:
No, I do not propose to change object IDs. I only proposed using serial numbers instead of IDs in the reprs of some classes.
Yes, I understood that you were only talking about reprs, and only for a few classes. I was pointing out a similarity, that was all. I'm sorry if I wasn't clear enough. -- Steven

That looks expensive, esp. for objects implemented in Python — an extra dict entry plus a new unique int object. What is the problem you are trying to solve for these objects specifically? Just that the hex numbers look distracting doesn’t strike me as sufficient motivation. On Sun, Jul 19, 2020 at 08:39 Serhiy Storchaka <storchaka@gmail.com> wrote:
-- --Guido (mobile)

19.07.20 20:02, Guido van Rossum wrote:
That is the main problem I want to solve. " at 0x7ff4c26b3eb0" is 18 characters long, and the first and last digits are usually the same for different objects. Also, since objects can reuse the memory of destroyed objects, a unique identifier can help when analyzing logs. It is not so expensive. A new dict entry costs nothing if the object already has a dict (space for 5 entries is reserved from the start). A small integer up to 2**30 takes 28 bytes, and integers up to 255 cost nothing at all (CPython preallocates them). This is minor compared with the size of a Python object (48 bytes), its dict (104 bytes), and the object's other attributes (locks, counters, etc). It is very unlikely that a program will have millions of semaphores or event objects at one time; most likely it will use tens of them.
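One way to check these figures on a particular build is `sys.getsizeof`. This is a sketch: `Plain` is a hypothetical two-attribute class, and the exact numbers depend on the Python version and on whether the build is 32- or 64-bit, so no particular output is claimed here. Note that `getsizeof` reports an object's size even for small ints that CPython shares rather than allocates.

```python
import sys


class Plain:
    """Hypothetical small Python-level class, used only for size comparison."""

    def __init__(self):
        self.lock = None
        self.value = 0


obj = Plain()
sizes = {
    "instance": sys.getsizeof(obj),
    "instance __dict__": sys.getsizeof(obj.__dict__),
    "int 2**20": sys.getsizeof(2 ** 20),  # a freshly allocated int
    "int 5": sys.getsizeof(5),            # a preallocated, shared small int
}
for name, size in sizes.items():
    print(f"{name:>17}: {size} bytes")
```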

On Sun, Jul 19, 2020, at 13:02, Guido van Rossum wrote:
Could the numbers be kept outside the object, perhaps in a weak* dictionary that's maintained in the __repr__ method, so you don't pay for it if you don't use it?

*If the object's hash/eq use identity, anyway... a "weak identity-keyed dictionary" might be a nice thing to add anyway.

On Sun, 19 Jul 2020 18:38:30 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:
How about putting it in parentheses, to point more clearly that it can most of the time be ignored: <threading.BoundedSemaphore: 2/3 (at 0x7ff4c26b3eb0)>
I would like it if it applied to all objects, but doing it only for certain objects will be distracting and confusing (does the serial number point to a specific feature? It turns out it doesn't; it's just an arbitrary aesthetic choice). Regards, Antoine.

Dear all,

While it would be nice to have simpler identifiers for objects, it would be hard to make this work for multiprocessing, as objects in different interpreters would end up having the same repr. Shared objects (locks) might also have different serial numbers depending on how many objects had been created before they are communicated to the child process.

regards
Thomas

On Sun, 19 Jul 2020 at 21:26, Antoine Pitrou <solipsis@pitrou.net> wrote:

On 7/19/20 4:30 PM, Thomas Moreau wrote:
My guess is that these numbers are the id() of the object, which, as an implementation detail of CPython, is the object's address. If some other method were chosen for generating object ids, then by necessity there would need to be a way to keep the numbers unique across multiple interpreters, perhaps by reserving some bits for an interpreter id and using the rest for a serial number. -- Richard Damon
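A sketch of the "interpreter id in the high bits, serial number in the low bits" layout. The 40-bit split is an arbitrary assumption, and `os.getpid()` stands in for an interpreter id only for illustration: it would separate multiprocessing workers but not subinterpreters sharing one process. `make_uid` and `split_uid` are hypothetical helpers.

```python
import itertools
import os

# Hypothetical split: the low 40 bits hold a per-interpreter serial number,
# the remaining high bits hold an interpreter identifier.
SERIAL_BITS = 40
SERIAL_MASK = (1 << SERIAL_BITS) - 1

_counter = itertools.count(1)


def make_uid(interp_id=None):
    """Return (interp_id << SERIAL_BITS) | serial.

    os.getpid() is used as the default interpreter id; that is an
    assumption for illustration, not how CPython identifies interpreters.
    """
    if interp_id is None:
        interp_id = os.getpid()
    serial = next(_counter)
    if serial > SERIAL_MASK:
        raise OverflowError("serial number space exhausted")
    return (interp_id << SERIAL_BITS) | serial


def split_uid(uid):
    """Recover (interp_id, serial) from a value built by make_uid()."""
    return uid >> SERIAL_BITS, uid & SERIAL_MASK


uid = make_uid()
pid, serial = split_uid(uid)
print(pid == os.getpid(), serial)  # True 1
```

Two processes that both hand out serial number 1 still produce distinct values, because the high bits differ.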

On Sun, Jul 19, 2020 at 1:34 PM Thomas Moreau <thomas.moreau.2010@gmail.com> wrote:
Adding to what was said here, there are serious implications outside of the multiprocessing case, too...

1) In a multi-threaded Python, threads will need to contend over a per-type counter, serializing the allocation of those counted types.
2) In a Python with tagged immediates (like fixnums, etc.), the added space cost would disqualify counted types from being implemented as immediate values. This would force counted types to be heap-allocated and suffer from the aforementioned serialization.
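To make the first point concrete, here is a sketch of a per-type counter guarded by an explicit lock, the kind of synchronization a Python without a global interpreter lock would need (in today's CPython, `next()` on an `itertools.count` is in practice protected by the GIL instead). `LockedCounter` is a hypothetical class; every thread that creates a counted object must pass through its single lock.

```python
import threading


class LockedCounter:
    """Hypothetical per-type serial counter guarded by one lock.

    Every thread that creates a counted object must acquire this lock,
    which is the serialization point described in item 1.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 1

    def take(self):
        with self._lock:
            value = self._next
            self._next += 1
        return value


counter = LockedCounter()
results = []
results_lock = threading.Lock()


def worker(count):
    # Each thread simulates creating `count` counted objects.
    local = [counter.take() for _ in range(count)]
    with results_lock:
        results.extend(local)


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every serial number from 1 to 4000 was handed out exactly once.
print(sorted(results) == list(range(1, 4001)))  # True
```

The result is correct, but all four threads contend for the same lock on every allocation of a counted type.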

19.07.20 22:17, Antoine Pitrou wrote:
It will just make the repr 2 characters longer and will not solve the other problems (that the first and last digits of the identifiers of different objects are usually the same, and that the same identifier can be used for different objects at different times).

On 19/07/2020 16:38, Serhiy Storchaka wrote:
I have problem with the location of hexadecimal memory address in custom reprs.
I agree they are "noise" mostly and difficult to distinguish when you need to.
What would happen on __class__ assignment? IIUC class assignability is an equivalence relation amongst types: serial numbers would have to be unique within the equivalence class, not within the type. Otherwise they would have to change on assignment (unlike id()), and might not round-trip if __class__ were assigned there and back. Jeff Allen
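A sketch of the collision, assuming each instance keeps the serial number it got from its creating class's counter. `Counted`, `A` and `B` are hypothetical classes; `A` and `B` have compatible layouts, so `__class__` assignment between them is allowed. After the assignment, two distinct `B` objects show the same "unique" number.

```python
import itertools


class Counted:
    """Hypothetical base: serial numbers drawn from the creating class's counter."""

    def __init__(self):
        self._serial = next(type(self)._serials)

    def __repr__(self):
        return f"<{type(self).__name__} #{self._serial}>"


class A(Counted):
    _serials = itertools.count(1)


class B(Counted):
    _serials = itertools.count(1)


a = A()          # gets A's serial 1
b = B()          # gets B's serial 1
a.__class__ = B  # allowed: A and B have compatible layouts
print(repr(a), repr(b))  # <B #1> <B #1>
```

Keeping the stored number preserves the round trip back to `A`, but breaks uniqueness within `B`; renumbering on assignment would do the opposite.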

Hi Serhiy,

Can I suggest using a short hash of the id as a prefix to the id?

<object object at 0x7fc16c0a2ed0>

would become something like:

<object object #71 at 0x7fc16c0a2ed0>

This approach uses no extra memory in the object and makes similar objects more visually distinct. It fails to make the repr shorter, and the hashed ids are not globally unique. The hash doesn't need to be secure, just have a good spread.

Cheers,
Mark.

On 19/07/2020 4:38 pm, Serhiy Storchaka wrote:
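A sketch of the short-hash tag, assuming Fibonacci (multiplicative) hashing as the "good spread" function. Mark did not specify a hash, so this choice is an assumption, and `short_tag`/`tagged_repr` are hypothetical helpers. The sketch does not reproduce the "#71" in the example above, which was illustrative.

```python
_MASK64 = (1 << 64) - 1
_GOLDEN = 0x9E3779B97F4A7C15  # 2**64 / golden ratio, the usual Fibonacci-hashing constant


def short_tag(obj, digits=2):
    """Hypothetical short tag derived from id(obj) via Fibonacci hashing.

    Multiplying by an odd constant and taking high bits spreads addresses
    that differ only in a few middle digits. The tag is not unique: with
    two decimal digits, a collision becomes likely after about a dozen objects.
    """
    mixed = (id(obj) * _GOLDEN) & _MASK64
    return (mixed >> 32) % (10 ** digits)


def tagged_repr(obj):
    """Default-style repr with the short tag inserted before the address."""
    return f"<{type(obj).__qualname__} object #{short_tag(obj)} at {id(obj):#x}>"


o = object()
print(tagged_repr(o))
```

The tag costs nothing per object because it is recomputed from the address on every call.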

participants (10)
- Antoine Pitrou
- Carl Shapiro
- Guido van Rossum
- Jeff Allen
- Mark Shannon
- Random832
- Richard Damon
- Serhiy Storchaka
- Steven D'Aprano
- Thomas Moreau