[Types-sig] Interface PEP

Marcin 'Qrczak' Kowalczyk qrczak@knm.org.pl
17 Mar 2001 13:04:09 GMT


Thu, 15 Mar 2001 01:31:22 +0100, Sverker Nilsson <sverker.is@home.se> wrote:

> What not only many but perhaps all people would agree on, including
> me, is that the builtin types or classes should be unified with
> user-defined classes in this way: So that user-defined classes can
> inherit from the classes that the objects with builtin types have.

I agree.

> That doesn't necessarily mean unifying types with classes. It could just
> as well mean that we defined the classes of the builtin objects.

Why not unify, then?

I understand that there are technical reasons, like the speed
resulting from avoiding looking up operations by name, or the fact
that if calling an object is equivalent to calling its __call__
method, then x(4) is x.__call__(4) which is x.__call__.__call__(4)
which is x.__call__.__call__.__call__(4) etc., so we must break the
conceptual madness and say that calling a builtin function object
is not further reduced to calling its method but it has the callable
capability built in.
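
A small sketch of this regress, assuming a present-day CPython (the
function name double is made up for the illustration):

```python
def double(n):
    """Hypothetical example function."""
    return 2 * n

# Each __call__ attribute is itself a callable object with its own
# __call__, so the chain can be extended as far as you like.
direct = double(4)
one_level = double.__call__(4)
three_levels = double.__call__.__call__.__call__(4)

# All three spellings end up running the same function body.
results = (direct, one_level, three_levels)
```

The chain never has to bottom out in Python-level lookups because the
interpreter treats calling a builtin callable as a primitive operation.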

But from the point of view of users of the language there is no reason
why files should be objects of FileType and not objects of InstanceType
with class File. It's an artifact of the implementation.

> Why would it be cleaner to unify type and class, given the many
> variants that exist, that you describe (some of) yourself below?

Because the concepts that many other languages (C++ excepted, but
let's not follow this crazy language) call "type" and "class" differ
from each other more than the two concepts Python uses these terms for.

These words have different meanings in those languages and whether
types and classes can be unified in Python is independent from whether
they can be unified in Haskell. It's just a terminology clash.

> below you say that Eiffel had problems with having to analyze too
> much to see if a class would yield a subtype.
>
> But here you seem to say that type is separate from class, and it
> describes an interface.

Well, the term "type" is used when talking about Eiffel but the
language syntax has no "type declarations" which bind types to names
or the like.

ARRAY[INTEGER] is a type, because ARRAY is a generic class and INTEGER
is a type. INTEGER is a type because INTEGER is a class.

That is, for any non-generic class (a module which defines a set of
features, inheritance relations etc.) there exists a type based on it
(a sequence of characters that you can write e.g. when declaring an
argument of a method, which denotes that this method accepts particular
objects here, and can use features defined by the class on which the
type is based).

In a given context you always talk about a type or about a class.
It doesn't make sense to say "this may be either a type or a
class". Types are based on classes and classes yield types, but the
intersection of the set of types and the set of classes is empty -
these are separate concepts.

> So the problems you say they had, seem to be the same whether they
> called the type a type or interface.

It's because they try to unify subtyping interfaces with inheriting
implementation (like C++ and Java do).

Eiffel allows viewing an object of a subclass as an object of a
superclass, including assignment to variables declared as holding
objects of the superclass. This is subtyping.

Eiffel also allows covariant changes of method types (replacing
declared types of method arguments with subtypes) and hiding features.
This is inheritance and there is nothing wrong with it.

What is wrong is when these two cases are combined, and a type is
used both to describe an interface (to pass objects of various types
under a parameter or attribute declared with that type) and to inherit
implementation (to let some classes include its features by default
and state only the differences) when the latter uses covariant changes
or feature hiding. This is unsound: the type system would not catch
some incompatible feature calls.
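
A minimal sketch of that hole, written in Python only because this is
a Python list. All class and method names are hypothetical, and the
annotations stand in for Eiffel's declared types; Python itself does
not check them.

```python
class Food:
    pass

class Grass(Food):
    def chew_cud(self):
        return "chewing"

class Meat(Food):
    pass

class Animal:
    def eat(self, food: Food) -> str:
        return "eaten"

class Cow(Animal):
    # Covariant change: the argument type is narrowed from Food to Grass.
    def eat(self, food: Grass) -> str:
        return food.chew_cud()

def feed(animal: Animal, food: Food) -> str:
    # Looks fine as long as Cow is accepted wherever Animal is declared.
    return animal.eat(food)

try:
    outcome = feed(Cow(), Meat())
except AttributeError:
    # Meat has no chew_cud, so the narrowed method fails at run time.
    outcome = "incompatible feature call"
```

Each piece is reasonable alone. The failure appears only when Cow is
used both as a subtype of Animal and as a class that narrows eat.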

There are additional factors which make it worse (maybe both are
aspects of the same problem):

1. Eiffel doesn't allow functions parametrized by types, only classes
parametrized by types, so it uses subtyping (a method accepting
two objects of type COMPARABLE) to express genericity (a method should
accept two objects of any type which is COMPARABLE). Haskell solves
this case correctly by making interfaces (like COMPARABLE, called
Ord there) a different concept than types, and applying interface
constraints to types of parameters instead of parameters themselves.

2. Eiffel assumes that all arguments of generic types are covariant,
i.e. that ARRAY[INTEGER] is a subtype of ARRAY[COMPARABLE] only
because INTEGER is a subtype of COMPARABLE. But this is not true in
general, in particular for mutable arrays.

Array item assignment is contravariant by nature (if you want to put an
INTEGER in an array, you would accept a command which promises to take
a COMPARABLE - not vice versa). Other array operations are covariant
(if you want to extract a COMPARABLE, you will accept a function which
promises to extract an INTEGER). This makes the array type invariant.
I don't remember how Eiffel handles the problem.
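
The same argument as a sketch in Python, assuming the annotations are
read by some external static checker (Python ignores them at run
time). The function names are made up for the illustration.

```python
from typing import List

def store_text(items: List[object]) -> None:
    # Legal for a list of arbitrary objects.
    items[0] = "not a number"

def total(numbers: List[int]) -> int:
    # Relies on every element really being an int.
    return sum(numbers)

ints: List[int] = [1, 2, 3]

# Treating List[int] as a subtype of List[object] lets this call through.
store_text(ints)

try:
    total(ints)
    outcome = "ok"
except TypeError:
    outcome = "TypeError"
```

Reading through the more general view would have been harmless. It is
the write that breaks the promise made to total.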

Java has the same problem and throws a runtime exception if an array
of B is coerced to an array of B's superclass A and a wrong object is
assigned to it. Haskell doesn't have the problem because it doesn't
use subtyping to express genericity.
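
A rough analogue of the Haskell approach from point 1, sketched with
Python's typing module (Python 3.8 or later assumed). The names
Comparable and largest are hypothetical, and this only approximates
what an Ord constraint does.

```python
from typing import Any, Protocol, TypeVar

class Comparable(Protocol):
    """Interface: anything that supports the < operator."""
    def __lt__(self, other: Any) -> bool: ...

# The constraint is placed on the type variable, not on each parameter,
# so both arguments and the result share one concrete type T.
T = TypeVar("T", bound=Comparable)

def largest(a: T, b: T) -> T:
    return b if a < b else a

biggest_int = largest(3, 7)
biggest_str = largest("pear", "apple")
```

With a signature like largest(a: Comparable, b: Comparable) a checker
would accept an int paired with a string; with the type variable both
arguments are meant to be of the same type.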

OCaml sometimes uses subtyping. Until recently it treated all generic
types as invariant (a list of As is not a subtype of a list of Bs
no matter how A and B are related, except when A = B). It doesn't
bite much because subtyping is not used much. Recently (version 3.01)
it added variance inference and explicit variance annotations, which
allow coercing a list of As to a list of Bs if A is a subtype of
B, because the list type is covariant in its parameter. Arrays are
invariant because they are mutable. The function type is contravariant
in the argument type and covariant in the result type.

> Haskell is arguably one of the more sophisticated. Still they had
> problems defining general container classes, at least before they
> introduced dual inheritance.

Indeed, container classes pose some problems for Haskell's type
system if they are to be generic enough (i.e. when some containers
are fully generic and others work only on element types conforming
to some interfaces or being a concrete type).

It works quite well if sequences, sets/bags, dictionaries and arrays
are treated separately. Chris Okasaki has done this (Edison library).
It doesn't work well if one wants to use the same method for operations
on different kinds of collections with different enough signatures
(e.g. if the same method is used for adding an element to a set and
a <key,value> pair to a dictionary) - it gets messy and requires
further language extensions.

> (I don't know if that's there now officially, I remember it was
> controversial.)

I assume that you refer to multiparametric classes. They are indeed
not in Haskell 98, and there is no other official standard. They are
supported by 2 out of 4 implementations (one of the implementations
where they are not supported is dead, the other is alive).

> And I read some paper that showed how impossible it was to define
> a natural join... in some context.

I don't know what a natural join is.

> I'd be happy to use OCaml for its superb speed, have looked just a
> little in it, but may be held back afraid of future refactoring
> problems due to the static type system.
>
> BTW, does it have any dynamic types at all?

It has polymorphic variants (like algebraic types, but with variants
not belonging to particular types), which provide more flexibility
than traditional algebraic types while still statically ensuring type
correctness.

It has Obj.magic which coerces an object of any type to any other type,
if you want to skip the type system.

[...]
> It's just a special case. Extending our knowledge could be made by
> having the class type be something more specific than InstanceType.
>
> Possibly the type could be dynamically generated and contain a pointer
> to the object, to allow for generality in checking that the methods
> are compatible with some other type (interface).

Looks very weird to me. Python's types in fact reflect the internal
representation, with some builtin cases and one generic case. I don't
see how this mechanism could help with interfaces.

Even if it were technically doable, it's wrong. An object doesn't have
a single interface. For two independent interfaces, whether an object
implements one of them is unrelated to whether it implements the
other. There is no function object->interface, but at most a relation
over <object,interface> pairs.
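
To make the "relation, not function" point concrete, here is a sketch
that uses the abstract base classes in a present-day Python's
collections.abc as stand-ins for interfaces. The helper name
implemented is made up.

```python
from collections.abc import Callable, Iterable, Sized

INTERFACES = (Callable, Iterable, Sized)

def implemented(obj):
    """Return the names of all listed interfaces that obj satisfies."""
    return {iface.__name__ for iface in INTERFACES if isinstance(obj, iface)}

of_list = implemented([1, 2])        # iterable and sized
of_len = implemented(len)            # only callable
of_iterator = implemented(iter([]))  # iterable, but has no len()
```

The answer for an object is a set of interfaces, possibly empty, and
membership in one says nothing about membership in another.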

> > I assume that we use the term "type" as currently defined in Python,
> > and don't change that.
>
> Now it just means the types defined in types.py. Right?
> Are you saying that no more types should be added, except by
> new builtin objects?

No, I'm saying that we shouldn't change which concept is called
"type" (and "class" too) in Python, at least until types and classes
are unified. We may introduce new concepts, "interface", "protocol",
whatever, or change properties of types and classes a bit, but radical
terminology changes would be confusing.

> > ListType describes an implementation, i.e. how an object was
> > constructed. There is a particular object layout and particular
> > methods.
>
> When something has type ListType you know its particular methods.
> So why doesn't ListType define an interface too?

Because an interface rarely says anything about the representation.
Other types supporting the same interface should be generally accepted
where lists are accepted. ListType says too much about an object.
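
For instance, in this sketch (the class Countdown and the function
first_and_last are hypothetical) the function needs only len() and
indexing, so requiring ListType would reject perfectly usable
arguments:

```python
class Countdown:
    """Hypothetical user-defined sequence; not a list."""
    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __getitem__(self, index):
        if not 0 <= index < self.start:
            raise IndexError(index)
        return self.start - index

def first_and_last(seq):
    # Relies only on len() and indexing, not on the concrete type.
    return seq[0], seq[len(seq) - 1]

results = [first_and_last(s) for s in ([1, 2, 3], "abc", Countdown(3))]
```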

Python has exceptions to this. Many builtin types require other builtin
types, e.g. __dict__ of a builtin class object must have the builtin
dictionary type. It's slowly shifting to allow more generality, but...

...concrete types must eventually be used somewhere. You can express
an abstract interface of an integer, but you can't express every
abstract interface in terms of other abstract interfaces. Eventually
some real work must be done besides message passing.

If you try to express the pure Smalltalk model in Haskell:
    newtype Object = O (String -> [Object] -> IO Object)
then you can't do anything useful unless you encode data in method
names, because the only thing you can do with an object is to send
it a message, so you can't know which damn integer number an object
represents to implement arithmetic or to index a sequence. Some
objects or some messages must be special.

There is a balance between concreteness and abstractness. It's
impossible to be completely abstract, and one should not be completely
concrete because it's very painful to use (as in C and Pascal).

> It also defines a particular object layout as you say. I'd say that
> should be considered a part of the interface, too.

Most of the time, no.

> > SequenceType describes an interface, i.e. how an object can be used.
>
> How an object can be used, isn't that exactly also what ListType
> describes?

ListType describes too much.

BTW, functional languages, especially lazy ones, express abstract
interfaces in terms of concrete types better than traditional
statically typed OO languages do. What matters about "functional"
here is function closures, not the immutability of data.

A function closure can have many implementations. As long as they
use the same argument and result types, they are expressed in the
same function type. Contrast it with virtual methods in C++ or Java,
where different implementations of an abstract class are considered
separate types which can only be coerced to the common supertype.
Haskell doesn't have subtyping, but many OO style examples can be
straightforwardly converted to closures.
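
A sketch of the idea in Python, with made-up names: two unrelated
implementations that share the single function type "no arguments,
returns an int", with no class hierarchy involved.

```python
from typing import Callable

IntSource = Callable[[], int]  # hypothetical alias for the shared type

def counting_from(start: int) -> IntSource:
    """Closure over mutable state: returns start, start + 1, ..."""
    current = start - 1

    def step() -> int:
        nonlocal current
        current += 1
        return current

    return step

def constant(value: int) -> IntSource:
    """A different implementation with the very same type."""
    return lambda: value

counter = counting_from(5)
fixed = constant(9)
samples = [counter(), counter(), fixed(), fixed()]
```

In C++ or Java these would typically be two classes derived from a
common abstract class; here the caller sees just one type.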

Laziness implies that a String object can cause arbitrary computation
to be performed while it is examined. Haskell doesn't use iterators
because lazy lists are enough. A list of ints has the same type as
all lists of ints even if evaluation of this particular list traverses
a tree in preorder.

These two aspects are related. Laziness can be expressed in a language
without builtin laziness using function closures. Lazy values are
similar to nullary functions.
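
A minimal sketch of a lazy value built from a closure, in Python. The
names delay and log are hypothetical, and the caching is added so the
computation runs at most once, as it would for a lazy value.

```python
def delay(compute):
    """Wrap a nullary function so that it runs on first use only."""
    cache = []

    def force():
        if not cache:
            cache.append(compute())
        return cache[0]

    return force

log = []

def expensive():
    log.append("computed")
    return 42

lazy_answer = delay(expensive)
ran_before_use = list(log)   # still empty: nothing has been computed yet
first = lazy_answer()
second = lazy_answer()       # served from the cache, no recomputation
```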

> > A basic misunderstanding of some languages is confusing subtyping
> > (the ability to use one type if another is required, i.e. a relation
> > on interfaces) and inheritance (basing a class definition on another
> > class, i.e. reusing the implementation).
>
> So if I am understanding you correctly:
>
> The problem is you get a type (= class) that is based on a particular
> implementation? And not being able to control the interface separately
> from the implementation?

Here I had in mind Eiffel's problems.

> Yes in C++ there is no interface concept separate from classes, right?

Right. And thus C++ doesn't allow changing method signatures in
subclasses (except a covariant change of the return type if it's
a pointer or reference; this feature is not implemented by all
compilers).

Fortunately in C++ it doesn't bite much because genericity need not
be expressed as subclassing; it can be expressed by templates.
Parametrization by types is powerful and works well with static typing.

> > But you try to unify types / classes with interfaces, which
> > doesn't work well.
>
> No I wasn't trying to unify to 1 concept. Just to 2 concepts: types
> and classes.

This is very strange. Python's types and classes are used for the
same purpose: to implement objects, define how method calls are
dispatched, and finally get to the object's data assuming it has the
right type or class (no matter if it's PyInt_AsLong in C code
or a private attribute reference in pure Python code).

Interfaces are used for another purpose: to define what kinds of
objects are expected in which places, and to formalize what is needed
to use the same general functionality on different kinds of objects.

An object has one type and one class (even if it can be considered to
implicitly have its superclasses too), but implements many interfaces
independently of each other.

Types define the representation. Interfaces define the usage. Classes
are somewhat between. In dynamically typed languages they are much
closer to types, because usage is not checked wrt. declared classes -
in Python subclassing is only used to inherit behavior, not to create
subtypes.

Sorry, unifying types and interfaces while leaving classes alone is
complete nonsense.

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK