Unification algorithm

On of the key steps involved in type inference is the unification algorithm. Given two type τ1 and τ2 a unifier if it exists is a type τ such that τ is a specialisation of both τ1 and τ2. The most general unifier of τ1 and τ2, if it exists, is the unifier which is most general, i.e. it is the unifier τ* of τ1 and τ2 such that all other unifiers τ is a specialisation of τ*. In this lecture we develop an algorithm to compute the most general unifier of two types τ1 and τ2.

1
2
3
4
5
6
7
8
9
10

> {- | Module defining the unification algorithm -}
>
> module Unification where
> import Lambda -- See the last lecture
> import qualified Data.Map as M -- To define specialisation.
> import Data.Either
> import Data.Maybe
>
> type Map = M.Map

Although we defined the unifier as a type, it is convenient when computing the unifier of τ1 and τ2 to compute the type specialisation Σ that unifies them to their most general unifier. We will therefore abuse the term most general unifier to also mean this specialisation. Also for simplicity we drop the adjective ``most general'' and just use unifier to mean the most general unifier.

1
2
3
4
5
6

> type Specialisation = Map String Type -- ^ A type specialisation
> type Result = Either String -- ^ The error message if it failed
> Specialisation -- ^ The unifier
>
> unify :: Type -> Type -> Result

A type specialisation is captured by a Map from type variable names to the corresponding specialisation. We given an function to compute τ[Σ] given a specialisation Σ.

1
2
3
4
5

> specialise :: Specialisation -> Type -> Type
> specialise sp t@(TV x) = maybe t (specialise sp) $ M.lookup x sp
> specialise sp (TA s t) = TA (specialise sp s) (specialise sp t)
> specialise sp i = i

We generalise the unification in two ways:

  1. We consider unification of types under a particular specialisation Σ. The unifier of τ1 and τ2 under the specialisation Σ is nothing but the unifier of τ1[Σ] and τ2[Σ]. The unification of two types can then be seen as unification under empty specialisation.
1
2
3

> genUnify :: Specialisation -> Type -> Type -> Result
> unify = genUnify M.empty
  1. Instead of considering unifiers of pairs of types τ1 and τ2, we consider the simultaneous unifier of a sequence of pairs {(τ1,σ1),,(τn,σn)}. A unifier of such a sequence is a specialisation Σ such that τi[Σ]=σi[Σ] for all 1in. Of course we want to find the most general of such unifier. We call this function genUnify'. Given, genUnify it is easy to define genUnify'.
1
2
3
4
5
6
7

> genUnify' :: Specialisation -- ^ pair of types to unify
> -> [(Type,Type)] -- ^ the specialisation to unify under
> -> Result -- ^ the resulting unifier
> genUnify' = foldl fld . Right
> where fld (Right sp) (tau,sigma) = genUnify sp tau sigma
> fld err _ = err

What is left is the definition of genUnify.

1
2
3
4
5
6
7

> genUnify sp (TV x) t = unifyV sp x t
> genUnify sp t (TV x) = unifyV sp x t
> genUnify sp INTEGER t = unifyI sp t
> genUnify sp t INTEGER = unifyI sp t
> genUnify sp ap1 ap2 = unifyA sp ap1 ap2
>

The order of the pattern matches are important. For example notice that in line 6, both ap1 and ap2 have to be an arrow type (why?) Also in lines 3 and 4, t can either be INTEGER or an arrow type.

So our rules of unification can are split into unifying a variable α with a type τ, a constant types (here Integer) with a type τ and unifying two arrow types. We capture these rules with functions unifyV, unifyI and unifyA respectively.

1
2
3
4

> unifyV :: Specialisation -> String -> Type -> Result
> unifyI :: Specialisation -> Type -> Result
> unifyA :: Specialisation -> Type -> Type -> Result

Unifying a variable and a type

Unification of variables can be tricky since specialisations are involved. Firstly, notice that if α occurs in a type τ, we cannot unify α with τ unless τ is α itself. We first give a function to check this.

1
2
3
4
5
6

> occurs :: Specialisation -> String -> Type -> Bool
> occurs sp x (TV y) | x == y = True
> | otherwise = maybe False (occurs sp x) $ M.lookup y sp
> occurs sp x (TA s t) = occurs sp x s || occurs sp x t
> occurs sp x _ = False

The rules of unification of a variable α and a type τ are thus

  1. If α has a specialisation σ, unify σ and tau,

  2. If τ is α or can be specialised to α then we already have the specialisation.

  3. If α is unspecialised then specialise it with τ provided there occurs no recursion either directly or indirectly.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16

> unifyV sp x t = maybe (unifyV' sp x t) (genUnify sp t) $ M.lookup x sp
>
> unifyV' sp x INTEGER = Right $ M.insert x INTEGER sp
> unifyV' sp x var@(TV y) | x == y = Right $ sp
> | otherwise = maybe (Right $ M.insert x var sp)
> (unifyV' sp x)
> $ M.lookup y sp
> unifyV' sp x ap@(TA s t) | occurs sp x s = failV sp x ap
> | occurs sp x t = failV sp x ap
> | otherwise = Right $ M.insert x ap sp
>
> failV sp x ap = Left $ unwords [ "Fail to unify", x, "with" , ppSp sp ap
> , ": recursion detected."
> ]
>

Notice here that M.lookup x sp returns Nothing if x is not specialised under the specialisation sp. The function unifyVʹ unifies only an unspecialised variable with τ.

Unifying an integer and type

Notice here that the type τ is not a variable. Then it can only be Integer or an arrow type.

1
2
3
4
5

> unifyI sp INTEGER = Right sp
> unifyI sp t = Left $ unwords [ "Failed to unify Integer with type"
> , ppSp sp t
> ]

Unifying two arrow types

1
2
3
4
5
6
7

> unifyA sp ap1@(TA a b) ap2@(TA c d) = either errmsg Right
> $ genUnify' sp [(a,c),(b,d)]
> where errmsg str = Left $ unwords [ "while unifying"
> , ppSp sp ap1, "and"
> , ppSp sp ap2
> ] ++ "\n" ++ str

Helper functions.

We now document the helper functions used in the unification algorithm. First we give an algorithm to pretty print types. This makes our error messages more readable. Notice that this version does not put unnecessary brackets.

1
2
3
4
5
6
7
8
9
10
11
12
13
14

> -- | Pretty print a type
>
> pp :: Type -> String
> pp INTEGER = "Integer"
> pp (TV x) = x
> pp (TA s t) = bracket s ++ " -> " ++ pp t
> where bracket t@(TA r s) = "(" ++ pp t ++ ")"
> bracket s = pp s
>
> -- | Pretty print a specialised type
> ppSp :: Specialisation -> Type -> String
> ppSp sp = pp . specialise sp

Testing this code using ghci

Since the lecture used the module of the previous lecture you need to give the following commandline arguments.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

$ ghci src/lectures/Unification-algorithm.lhs src/lectures/Towards-type-inference.lhs
GHCi, version 7.0.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading package ffi-1.0 ... linking ... done.
[1 of 2] Compiling Lambda ( src/lectures/Towards-type-inference.lhs, interpreted )
[2 of 2] Compiling Unification ( src/lectures/Unification-algorithm.lhs, interpreted )
Ok, modules loaded: Unification, Lambda.
*Unification> let [a,b,c,d,e] = map TV $ words "a b c d e"
*Unification> unify (TA INTEGER a) b
Loading package array-0.3.0.2 ... linking ... done.
Loading package containers-0.4.0.0 ... linking ... done.
Right (fromList [("b",TA INTEGER (TV "a"))])
*Unification> unify (TA INTEGER a) (TA b (TA b c))
Right (fromList [("a",TA (TV "b") (TV "c")),("b",INTEGER)])
*Unification>
*Unification> let t = (TA a b)
*Unification> unify (TA INTEGER a) (TA c t)
Left "while unifying Integer -> a and c -> a -> b\nFail to unify a with a -> b : recursion detected."
*Unification> let Left l = unify (TA INTEGER a) (TA c t)
*Unification> putStrLn l
while unifying Integer -> a and c -> a -> b
Fail to unify a with a -> b : recursion detected.
*Unification>