Towards type inference

A powerful feature of Haskell is the automatic type inference of expressions. In the next few lectures, we will attempt to give an idea of how the type inference algorithm works. Ofcourse giving the type inference algorithm for the entire Haskell language is beyond the scope of this lecture so we take a toy example. Our aim is to give a complete type inference algorithm for an enriched version of lambda calculus, that has facilities to do integer arithmetic. Therefore, our lambda calulus expressions have, besides the other stuff we have seen, integer constants and the built in function '+'. We limit ourselves to only one operator because it is straight forward to extend our algorithm to work with other operation

Syntax of our Enriched Lambda calculus.

The syntax is given below. Here v and x stands for arbitrary variable and e1,e2 stands for arbitrary expressions.

e=...-101...+ve1e2λx.e

We will write the lambda calculus expression +e1e2 in its infix form e1+e2 for ease of readability.

The haskell datatype that captures our enriched lambda calculus expression is the following

1
2
3
4
5
6
7
8
9
10

> module Lambda where
>
> -- | The enriched lambda calculus expression.
> data Expr = C Integer -- ^ A constant
> | P -- ^ The plus operator
> | V String -- ^ The variable
> | A Expr Expr -- ^ function application
> | L String Expr -- ^ lambda abstraction
> deriving (Show, Eq, Ord)

Clearly stuff like A (C 2) (C 3) are invalid expressions but we ignore this for time being. One thing that the type checker can do for us is to catch such stuff.

Types

We now want to assign types to the enriched lambda calculus that we have. As far as we are concerned the types for us are

t=Zαt1t2

Here α is an arbitrary type variable. Again, we capture it in a Haskell datatype.

1
2
3
4

> data Type = INTEGER
> | TV String
> | TA Type Type deriving (Show, Eq, Ord)

Conventions

We will follow the following convention when dealing with type inference. The lambda calculus expressions will be denoted by Latin letters e, f, g etc with appropriate subscripts. We will reserve the Latin letters x, y, z and t for lambda calculus variables. Types will be represented by the Greek letter τ and σ with the letters α and β reserved for type variables.

Type specialisation

The notion of type specialisation is intuitivly clear. The type αβ is a more general type than αα. We use στ to denote the fact that σ is specialisation of τ. How do we formalise this notion of specialisation ? Firstly note that any constant type like for example integer cannot be specialised further. Secondly notice that a variable α can be specialised to a type τ as long as τ does not have an occurance of α in it. We will denote a variable specialisation by ατ. When we have a set of variable specialisation we have to ensure that there is no cyclicity indirectly. We doe this as follows. We say a sequence Σ={α1τ1,,αnτn} is a consistent set of specialisation if for each i, τi does not contain any of the variables αj, 1ji. Now we can define what a specialisation is. Given a consistent sequence of specialisation Σ let τ[Σ] denote the type obtained by substituting for variables in τ with their specialisations in Σ. Then we say that στ if there is a specialisation Σ such that τ[Σ]=σ. The specialisation order gives a way to compare two types. It is not a partial order but can be converted to one by appropriate quotienting. We say two types τ σ are isomorphic, denoted by στ if στ and τσ. It can be shown that forms an equivalence relation on types. Let τ denote the equivalence class associated with τ then, it can be show that is a partial order on τ.

Type environment

Recall that the value of a closed lambda calculus expression, i.e. a lambda calculus expression with no free variables, is completely determined. More generally, given an expression M, its value depends only on the free variables in it. Similary the type of an expression M is completely specified once all its free variables are assigned types. A type environment is an assignment of types to variables. So the general task is to infer the type of a lambda calculus expression M in a given type environment Γ where all the free varaibles of M have been assigned types. We will denote the type environments with with capital Greek letter Γ with appropriate subscripts if required. Some notations that we use is the following.

  1. We write x::τ to denote that the variable x has been assigned the type τ.

  2. For a variable x, we use Γ(x) to denote the type that the type environment Γ assigns to x.

  3. We write xΓ if Γ assigned a type for the variable x.

  4. The type environment Γ1Γ2 denotes the the type environment Γ such that Γ(x)=Γ2(x) if xΓ2 and Γ1(x) otherwise, i.e. the second type environment has a precedence.

As we described before, given a type environment Γ, the types of any lambda calculus expression whose free variables are assigned types in Γ can be infered. We use the notation Γe::τ to say that under the type environment Γ one can infer the type τ for e.

The type inference is like theorem proving: Think of infering e::τ as proving that the expression e has type τ. Such an inference requires a set of rules which for us will be the type inference rules. We express this inference rules in the following notation

Premise 1,,Premise nconclusion

The type inference rules that we have are the following

Rule Const : Γn::Z where n is an arbitrary integer.

Rule Plus : Γ+::ZZZ

Rule Var : Γ{x::τ}x::τ

Rule Apply : Γf::στ, Γe::σΓfe::τ

Rule Lambda : Γ{x::σ}e::τΓλx.e::στ

Rule Specialise : Γe::τ,στΓe::σ

The goal of the type inference algorithm is to infer the most general type, i.e. Given an type environment Γ and an expression e find the type τ that satisfies the following two conditions

  1. Γe::τ and,

  2. If Γe::σ then στ.

Exercises

  1. A pre-order is a relation that is both reflexive and transitive.
  2. Prove that if σ and τ are two types such that στ then prove that there is a bijection between the set Var(σ) and Var(τ) given by αiβi such that σ[Σ]=τ where Σ is a specialisation {αiβi1in}. In particular isomorphic types have same number of variables. (Hint: use induction on the number of variables that occur in σ and τ).