What is a type error?

Joachim Durchholz jo at durchholz.org
Sat Jul 15 18:37:14 EDT 2006


Marshall wrote:
> Joachim Durchholz wrote:
>> As I said elsewhere, the record has an identity even though it isn't
>> explicit in SQL.
> 
> Hmmmm. What can this mean?
> 
> In general, I feel that "records" are not the right conceptual
> level to think about.

They are, when it comes to aliasing of mutable data. I think it's 
justified by the fact that aliased mutable data has a galling tendency 
to break abstraction barriers. (More on this on request.)

> In any event, I am not sure what you mean by non-explicit
> identity.

The identity isn't visible from inside SQL. (Unless there's an OID 
facility available, which *is* an explicit identity.)

> I would say, records in SQL have value, and their
> identity is exactly their value.

Definitely not. You can have two equal records and update just one of 
them, yielding non-equal records; by my definition (and by intuition), 
this means that the records were equal but not identical.
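To illustrate "equal but not identical" outside SQL, here is a minimal Python sketch. The two dicts and their field names are made up for illustration; they merely stand in for two rows that happen to hold the same values.

```python
# Two hypothetical "records" holding the same values.
rec_a = {"name": "joe", "age": 30}
rec_b = {"name": "joe", "age": 30}

equal_before = rec_a == rec_b   # compares values
identical = rec_a is rec_b      # compares identity (same object?)

rec_a["age"] = 31               # in-place update of just one record

equal_after = rec_a == rec_b    # the values have now diverged
```

The records start out equal, the update touches only one of them, and afterwards they differ. That is only possible if they were two distinct things all along.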

> I do not see that they have
> any identity outside of their value. We can uniquely identify
> any particular record via a key, but a table may have more
> than one key, and an update may change the values of one
> key but not another. So it is not possible in general to
> definitely and uniquely assign a mapping from each record
> of a table after an update to each record of the table before
> the update, and if you can't do that, then where
> is the record identity?

Such a mapping is indeed possible. Simply extend the table with a new 
column, number the rows consecutively in that column, and identify the 
records via that column.

But even if you don't do that, there's still identity. It is irrelevant 
whether the programs can directly read the value of the identity field; 
the adverse effects happen because updates are in-place. (If every 
update generated a new record, then we'd indeed have no identity.)

> Okay. At this point, though, the term aliasing has become extremely
> general. I believe "i+1+1" is an alias for "i+2" under this definition.

No, "i+1+1" isn't an alias in itself. It's an expression - to be an 
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly 
important - 1+1 is replaceable by 2 in every context, so this is 
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is 
vacuously true.

There's another aspect here. If two expressions are always aliases to 
the same mutable, that's usually easy to determine; this kind of 
aliasing is usually not much of a problem.
What's more of a problem are those cases where there's occasional 
aliasing. I.e. a[i] and a[j] may or may not be aliases of each other, 
depending on the current value of i and j, and *that* is a problem 
because the number of code paths to be tested doubles. It's even more of 
a problem because testing with random data will usually not uncover the 
case where the aliasing actually happens; you have to go around and 
construct test cases specifically for the code paths that have aliasing. 
Given that references may cross abstraction barriers (actually that's 
often the purpose of constructing a reference in the first place), this 
means you have to look for your test cases at multiple levels of 
software abstraction, and *that* is really, really bad.
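A small Python sketch of such occasional aliasing. The function name add_twice and the sample data are invented for this example; the point is only that the code is correct whenever i != j and silently wrong when i == j.

```python
def add_twice(a, i, j):
    """Intended effect: a[i] becomes a[i] + 2 * a[j]."""
    a[i] += a[j]
    a[i] += a[j]  # if i == j, a[j] was already changed by the line above


xs = [1, 10, 100]
add_twice(xs, 0, 1)  # distinct slots: 1 + 10 + 10

ys = [1, 10, 100]
add_twice(ys, 1, 1)  # aliased slots: 10 -> 20 -> 40, not the intended 30
```

Random test data will rarely pick i == j, so the aliased path needs a test case constructed on purpose.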

> That is so general that I am concerned it has lost its ability to
> identify problems specific to pointers.

If the reference to "pointers" above means "references", then I don't 
know about any pointer problems that cannot be reproduced, in one form 
or the other, in any of the other aliasing mechanisms.

> Again, by generalizing the term this far, I am concerned with a
> loss of precision. If "joe" in the Prolog is a reference, then
> "reference" is just a term for "data" that is being used in a
> certain way. The connection with a specific address space
> has been lost in favor of the full domain of the datatype.

Aliasing is indeed a more general idea that goes beyond address spaces.

However, identity and aliasing can be defined in fully abstract terms, 
so I welcome this opportunity to get rid of a too-concrete model.

>> The records still have identities. It's possible to have two WHERE
>> clauses that refer to the same record, and if you update the record
>> using one WHERE clause, the record returned by the other WHERE clause
>> will have changed, too.
> 
> Is this any different from saying that an expression that includes
> a variable will produce a different value if the variable changes?

Yes.
Note that the WHERE clause properly includes array indexing (just set up 
a table that has a contiguous numeric primary key and a single other column).

I.e. I'm not talking about how a[i] is an alias of a[i+1] after updating 
i, I'm talking about how a[i] may be an alias of a[j].
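The table-as-array setup can be sketched with Python's sqlite3 module. The table name a, the column names idx and val, and the helper read_val are assumptions made for this example, not anything from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A table that behaves like an array: numeric key plus one value column.
conn.execute("CREATE TABLE a (idx INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(0, 10), (1, 20), (2, 30)])


def read_val(index):
    """Read a[index] through a WHERE clause."""
    row = conn.execute("SELECT val FROM a WHERE idx = ?", (index,)).fetchone()
    return row[0]


i, j = 1, 1  # whether a[i] and a[j] alias depends on run-time values

before = read_val(j)
conn.execute("UPDATE a SET val = val + 1 WHERE idx = ?", (i,))  # update a[i]
after = read_val(j)  # a[j] changed because i == j
```

With i = 1 and j = 2 the second read would be unaffected. Nothing in the SQL text itself tells you which of the two cases you are in.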

> It seems odd to me to suggest that "i+1" has identity.

It doesn't (unless it's passed around as a closure, but that's 
irrelevant to this discussion).
"i" does have identity. "a[i]" does have identity. "a[i+1]" does have 
identity.
Let me say that for purposes of this discussion, if it can be assigned 
to (or otherwise mutated), it has identity. (We *can* assign identity to 
immutable things, but it's equivalent to equality and not interesting 
for this discussion.)

> I can see
> that i has identity, but I would say that "i+1" has only value.

Agreed.

> But perhaps the ultimate upshot of this thread is that my use
> of terminology is nonstandard.

It's somewhat restricted, but not really nonstandard.

>> Possibly. There are so many isolation levels that I have to look them up
>> whenever I want to get the terminology 100% correct.
> 
> Hmmm. Is it that there are so many, or that they are simply not
> part of our daily concern?

I guess it's the latter. IIRC there are four or five isolation levels.

> It seems to me there are more different
> styles of parameter passing than there are isolation levels, but
> I don't usually see (competent) people (such as yourself) getting
> call-by-value confused with call-by-reference.

Indeed.
Though the number of parameter passing mechanisms isn't that large 
anyway. Off the top of my head, I could recite just three (by value, by 
reference, by name aka lazy), the rest are just combinations with other 
concepts (in/out/through, most notably) or a mapping to implementation 
details (by reference vs. "pointer by value" in C++, for example).
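For concreteness, the three mechanisms can be emulated roughly in Python. This is only a sketch: Python itself passes object references by value, so the one-element list standing in for a reference and the zero-argument function standing in for a by-name argument are emulations, and all names here are made up.

```python
def by_value(x):
    x = x + 1  # rebinds the local name only; the caller's variable is untouched
    return x


def by_reference(cell):
    cell[0] += 1  # one-element list stands in for a reference to a variable


def by_name(thunk):
    return thunk() + thunk()  # argument expression is evaluated at each use


n = 1
value_result = by_value(n)  # n itself stays unchanged

cell = [1]
by_reference(cell)  # the caller sees the mutation

evaluations = []


def argument():
    evaluations.append("evaluated")  # record each evaluation
    return 5


name_result = by_name(argument)  # argument() runs twice
```

A lazy (by-need) variant would cache the result of the first evaluation instead of re-running the argument expression.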

Regards,
Jo


