anything like C++ references?

Wed Jul 16 15:41:54 EDT 2003

On 16 Jul 2003 05:18:48 GMT, bokr at oz.net (Bengt Richter) wrote:

>>Copies are only made when they are needed. The lazy copy optimisation,
>>in other words, still exists.
>(BTW, don't get hung up on this part, ok? Skip forward to the ==[ code part ]==
>below if you're going to skip ;-)
>
>Ok, so the copy happens at b[2]=4 right? This is still useless copying
>if the holder of the "a" reference *wants* to have it as a continuing
>reference to the same single shared list. And you can program that in C++
>if you want to, and you presumably would. Except now you have to create a pointer variable.

If you want to share a single object, you need an explicit way to do
it.

Let me use an analogy from a discussion held elsewhere but on a
related subject.

Algol and Fortran were among the earliest high level languages.
They were, I believe, called 'third generation' (assembler, as opposed
to machine code, being the second generation). Hence the name of
'forth' (which was supposed to be even higher level, though IMO it is
a lower level language) and also the term '4GL' that was heavily used
for languages like SQL at one time.

Anyway, Algol and Fortran apparently (I didn't know this until today)
*always* used call-by-reference.

So why is it that all C parameters are call-by-value (though the value
may be a pointer) and Pascal parameters are call-by-value by default?

This is new to me - my first experiences of C and Pascal were at an age
when the whose-language-is-better rows were about Pascal providing
call-by-reference. To me, the distinction is that C is lower level -
closer to what you'd do in assembler or machine code - whereas Pascal
is higher level. You can pass pointers in Pascal to achieve the same
goal as a 'var' parameter, but 'var' better expresses why you are
doing it.

I had assumed, therefore, that there had been a natural progression
from low-level assembler - that earlier high level languages would
either be like C or like Pascal, depending on just how high level they
are. But that's not true. Algol and Fortran both used
call-by-reference, and that somewhat established tradition was
rejected when C and Pascal were being developed.

Why the rejection? On a point of principle. When parameters are
call-by-reference by default, it allows accidental side-effects.
Sometimes, functions change parameters that callers aren't expecting
them to change.

When changing one identifier can affect another by default, errors can
happen. There are tools for sharing explicitly, for the cases where
you actually want it. They are called pointers or references.
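
To make the accidental side-effect concrete in Python terms, here is a
small sketch. The function names and the data are made up purely for
illustration; the point is only that the first function changes the
caller's list without the call site giving any hint of it.

```python
def total_with_bonus(scores):
    # Appends to the very list the caller passed in: a side effect
    # the caller may not be expecting.
    scores.append(10)
    return sum(scores)


def total_with_bonus_safe(scores):
    # Works on a private copy, so the caller's list is left alone.
    local = list(scores)
    local.append(10)
    return sum(local)


mine = [1, 2, 3]
first_total = total_with_bonus(mine)        # mine is now [1, 2, 3, 10]

other = [1, 2, 3]
second_total = total_with_bonus_safe(other)  # other is still [1, 2, 3]
```

Both calls return the same total, but only the second leaves the
caller's data as it was.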

>
>Ok now how would you handle the case of multiple copies of that pointer variable?
>Make it "smart" so that you get copy-on-write of what it points to? That's the discussion
>we're having about the first level of Python references, ISTM.

Easy enough.

Reference counting for the object keeps track of how many variables
are currently using it as their value. Assign to as many variables as
you like - all you get is all those variables referencing one object
with a higher reference count.

Before modifying the object, you check the reference count. If greater
than one, you need to make a copy.
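
As a rough sketch of that scheme (not how Python works today, and the
class names 'Shared' and 'Var' plus the 'assign' method are invented
for the example), the count is kept by hand so that the check before
modification is visible:

```python
class Shared:
    """A list payload plus a count of the variables using it."""

    def __init__(self, items):
        self.items = list(items)
        self.rc = 1


class Var:
    """A hypothetical variable with copy-on-write value semantics."""

    def __init__(self, items=()):
        self.obj = Shared(items)

    def assign(self, other):
        # Assignment copies nothing: both variables end up sharing
        # one object, which now has a higher reference count.
        if other.obj is self.obj:
            return
        self.obj.rc -= 1
        self.obj = other.obj
        self.obj.rc += 1

    def __getitem__(self, index):
        return self.obj.items[index]

    def __setitem__(self, index, value):
        # Check the count before modifying. If another variable is
        # also using this object, take a private copy first.
        if self.obj.rc > 1:
            self.obj.rc -= 1
            self.obj = Shared(self.obj.items)
        self.obj.items[index] = value


a = Var([1, 2, 3])
b = Var()
b.assign(a)                       # no copy yet
shared_before = a.obj is b.obj    # both variables use one object
b[2] = 4                          # the copy happens here, and only here
```

After the last line 'a' still sees its original third item and 'b'
sees the new one, each with an object of its own.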

Pointer/reference objects have to point to the 'variable' rather than
the object. By accessing the object through the variable, they are
guaranteed to reach the object currently associated with that
variable, even after a copy-on-write has replaced it.

Pointers and references don't increment the reference count as they
are not supposed to get copies.

ASCII art warning...

  +---+    +-------+      +---------------+
  | x |--->| x var |----->| object : RC=2 |
  +---+    +-------+      +---------------+
                                ^
          +---+    +-------+    |
          | y |--->| y var |----+
          +---+    +-------+
                        ^
                        |
 +---+   +-------+   +-------------------+
 | z |-->| z var |-->| ptr object : RC=1 |
 +---+   +-------+   +-------------------+

Garbage collection continues to work as it already does, having
nothing to do with the new reference count in the object. The
intermediate 'variable' thing means that even if the original variable
goes out of scope, the pointer can still refer to it - and a new
variable created with the same name doesn't have to re-use the old
variable thingy.

This involves an extra layer of indirection compared with Python at
the moment. It would involve a (small) performance hit as a result.
But there is no copying of the object unless the copy is needed.
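
A minimal sketch of the arrangement in the diagram, under the
assumption that the 'variable thingy' is just a small cell holding the
current object. The names 'Obj', 'Var', 'Ptr' and their methods are
hypothetical, and 'z' is simplified to be the pointer itself rather
than a variable holding a pointer object:

```python
class Obj:
    def __init__(self, items):
        self.items = list(items)
        self.rc = 1   # how many variables use this object as their value


class Var:
    """The intermediate 'variable' cell from the diagram."""

    def __init__(self, items=()):
        self.obj = Obj(items)

    def copy_from(self, other):
        # Lazy copy: share the object and bump its count.
        if other.obj is self.obj:
            return
        self.obj.rc -= 1
        self.obj = other.obj
        self.obj.rc += 1

    def set_item(self, index, value):
        # Copy-on-write check before any modification.
        if self.obj.rc > 1:
            self.obj.rc -= 1
            self.obj = Obj(self.obj.items)
        self.obj.items[index] = value


class Ptr:
    """Points at a Var, not at an Obj, and leaves the count alone."""

    def __init__(self, var):
        self.var = var

    def get_item(self, index):
        return self.var.obj.items[index]

    def set_item(self, index, value):
        # Goes through the variable, so it always reaches whatever
        # object that variable currently holds.
        self.var.set_item(index, value)


x = Var([1, 2, 3])
y = Var()
y.copy_from(x)            # x and y share one object, RC=2
z = Ptr(y)                # a pointer to the y variable
rc_after_ptr = x.obj.rc   # unchanged by creating the pointer
z.set_item(2, 4)          # writes through y; y gets its own copy
```

The write through 'z' triggers the copy for 'y' and leaves 'x' alone,
and 'z' keeps following 'y' to the new object because it points at the
variable rather than at the old object.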

>>C++ uses exactly this kind of approach for the std::string class among
>>others. Copy-on-write is a pretty common transparent implementation
>>detail in 'heavy' classes, including those written by your everyday
>>programmers. Does that mean C++ is slower than Python? Of course not!
>>
>It could be, if the unwanted copying was big enough.

No, it can't. It *never* creates an unnecessary copy. That is the
whole point of lazy copying.

>I'm a bit disappointed in not getting a comment on class NSHorne ;-)

I *know* that you can use a class to get pointer behaviour. The issue
of pointers becomes important if you don't get that behaviour because
of copy-on-write.

>Does this give names in ns the "variable" semantics you wanted? Is there any difference
>other than "semantic sugar?" Explaining the difference, if any, will also clarify what
>you are after, to me, and maybe others ;-)

The issue is what happens by default.