SUPERVISED LEARNING OF OPTIMAL GOAL SCORING BEHAVIOURSFOR A SIMULATED SOCCER ENVIRONMENT |
Sreangsu Acharyya 9810540
Indian Institute of Technology - Kanpur : August 1998
Consider the utopian case of a static ball before an untended goal :
If the environment is strictly deterministic you can aim anywhere between the goal posts . But real worlds are noisy and hostile so you have to select an aiming point such that the probability of scoring the goal is maximum considering the uncertainity in your ability of directing the ball exactly and those defenders out their to stop you etc.
The problem is not only to get the best shooting solution because their are other action options.So one needs to consider the best of all these actions and select the one which is the most relevant. So the problem has an outer and an inner loop
The
problem can be approached in several ways.
They have reported using simple feed -forward neural networks for learning shooting to an un-tended goal . Where the aiming point within the goal was fixed apriori. [1].
They have also used memory based learning techniques to learn shooting behaviour in presence of a goalie that revolves on a circle of a preassigned radius with a fixed velocity in front of the goal , goal scoring accuracy obtained was of the order of 60%.[2]
Decision tree learning algorithms have also been used to estimate the probability of succes of a pass to a tem member. This information then can be fed forward for strategiic planning. [3]
The CMUnited team is currently
working on a re-inforcement learning model that can be applied to hidden
state or opaque markov chain models.[4]
The approach will be following Stone & Veloso's paradigm of layered learning , in which strategies are developed in a bottom up hierarchy of increasingly complex behaviours.
At the ground level will be the most basic: like the locomotive behaviours.
The still higher level will include
The solution
will be to incorporate both depending upon heir applicability
To impart a substantial amount of domain knowledge that one
is confident about and is not expected to change and leave the rest to
learning. So as to minimise re-inventing the wheel and still get an effective
system.
X, Y ,\THETA are the externally supplied variables internally we may need to keep internal representation asX , Y, \THETA ,Vx ,VY for each player and ball in addition we may have a boolean variable signifying which team has posession.
To facillitate analysis and reduce the search space we need to quantise the search space.
If Number of position states( x, y , \theta) = P
Number of velocity states = V
Asuming one position state can be occupied by a single player the state space size will be the order of
p(p-1)(p-2)...(p-2n) v2n *2 [where n is the size of each team ]
Which no doubt is a huge workspace and the curse of dimensionality is very real.REDUCTION:
i ) Through fuzzification of the variables like
penalty , forward , midfield
left flank right flank , centre.
ii) Consider symmetry i.e tranlation and rotation invariant variables and rules. like representing position as a relative to position of other players and goal,using abstracted informaton like direction{North, East,West , South} & accesibility window{broad,Mid , Narrow} .etc.
This requires 2 tiered optimisation.i.e.
An inner level optimisation that generates the best parameters for a fixed behaviour. Like selecting the best shooting angle .This inner loop has to be run for all the available behaviours to generate their individual best parameters . Make the most intelligent chice from among those optimised strategies,from the outer level.
PROBABILISTIC BASIS OF DECISION
Let C = Current state or configuration.
Action i = Ai
i) COARSE MODEL
P(success/C)=\sum P(sucess/Ai ).P(Ai /C)
Where P(sucess/Ai) approximately represents the skill of an agent as it is the probability that the action i will be executed succesfully.
On the other hand P(Ai /C) approximately represents the intelligence of the agent as it is the probability that the action Ai will be selected in the state c. The aim of the project would be to learn an apprppriate P(Ai /C)
Before presenting the objective function 2 types of failures on the soccer field needs to be distinguished.
i) Losing control to the opponent team F1
ii) Losing control but not to the opponent team i.e no man's land.F2Obviously F1 is more damaging than F2
Thus we can define
P(win/C) = [ \alpha. P(success/C) + \beta . P(F2/C) ] -\alphaoppnent [P(successopponent/Ai )
where
\alpha is the gross probability of winning when in control.\beta is the probability of winning when ball goes into no mans land(this will be very small)
The above will form the objective function to be maximised with P(Ai /C) as the decision variable.
ii) More Detailed Model
Here we introduce another term
P(w/Ai ,C ) = Probability of win if action Ai is executed succesfully in the state C.This will bring bout relative importance of different actions at a given configuration.For instance P(w/shoot to goal,C) is expected to be very near 1
Now
P(win/C) =\sum P(sucess/Ai ) .P(w/Ai ,C ) .P(Ai /C)
The above expression will serve as our new objective function to be optimised with respect to P(Ai /C)
RE-DEFINITION OF THE PROJECT
IN A MORE TANGIBLE FORM
In this project it is intended to train a neural network so that it returns the optimum parameters for a particular behaviour. In effect it is the inner layer optimisation. Online optimisation is ruled out with the present resources as the values have to be generated in real time.Thus an algorithm for numerical optimisation will train the network offline.It is hoped that the network will generalise sufficiently well to provide the optimal values for given configuration.
It is also intended that the network returns the value of the optimised objective function.
Thus the network will be learning P(sucess/Ai ) function as a supervised learning the numerival optimisation module acting as the coach or the teacher.
The system model that is being considered adds uniformly distributed noise with exponentially decreasing magnitude to the ball and player motions. These are guided by the following equations.
The amount of acceleration a player can impart to
the ball is limited by his stamina. The velocity of the ball is the
vector summation of a decayed fraction of the past corrupted
velocity with an additive noise whose range is proportional to the magnitude
of the velocity. The new positiion after unit time step is determined by
incrementing the past position with the current velocity.
Since the nature of the noise that is being added to the horizontal and vertical components of the velocities are known it is possible to deduce analytically the probability distribution of the angle \theta . Knowing this distribution it is possible to estimate the probability that the shot remains within the goal boundary by integrating it within appropriate limits. This can be understood from the foolowing diagram.
Real world has another complication ;that of defenders trying to stop the ball from entering the goal. This the probability of goal can be defined as
P(Goal) = P(shot remains within goal) && P(the ball un-intercepted).
From the formulation of the angle made with the x axis we get
Now one has to model the P(ball un-intercepted)
term.
Because of the complicated model it is not faesible to model the velocity
distribution of the players because of there highly nonlinear nature and
the abscence of an applicable inverse function. So for this project
simulation techniques were used to model the second term.
The governing equations are : for the path along the shot
has to be greater than the time taken by the ball.
One can combine these 2 probabilities to arrive at the probability of goal given the configuration of the field and the shooting angle.
/TO VIEW SOME OF
THE PLOTS GENERATED.
RADIAL BASIS FUNCTION NEURAL NETWORKS AND FUZZY LOGIC
The topology of the network is as shown.
Radial basis network formulation forms an alternative to modelling with feed-forward sigmoid neural networks. The network consits of a single hidden layer of neurons having radial activation functions . Here Gaussian function is used. The output of the units are given as
Here the outputs are calculated as
The final out put is obtained as
DETERMINATION OF FREE PARAMETERS
The frre parameters that are to be determined are the
Detrmination Of centres:
The centres of the radial basis
units can be obtained either by supervised
techniques ( gradient decent) or by unsupervised
techniques (clustering). Though supervised
learning is expected to result in further minimisation
of error here only unsupervised techniques are being
used as they are much faster .
Unsupervised learning can also be divided into 2 categories
HARD COMPETITIVE LEARNING:
Hard competitive learning essentially involves clustering of data vectors into optimum number of classes and finding a prototype to represent those clusters.
The algorithm is same as that of K-means
Clustering , usually only the input vectors are clustered ;however
in this implementation an extended vector containing all the input
components and some viatal output components have also been included to
avoid the possibility of the algorithm to return sets well clustered in
the input space but poorly clustered in the output space. Moreover to account
for the fact that the means of the individual components are differrnt
by orders of magnitude they have been apprpriately scaled.
The algorithm can be described as.
SOFT COMPETITIVE LEARNING:
Unlike hard competitive learning here a training instance may be assigned to different centres with varying degree of membership depending on the distance from th proposed centers. The membership values can range from [0 1]
The algorithm can be summarised as
It has been found that hard competitive learning often results in empty clusters, soft clustering is largely immune to this problem but suffers from the problem of repeated or very close centers which poses problems for determining the weight values of the output nodes.
DETERMINATION OF WIDTH SIGMA:
The sigma parameter or the width
of the radial basis units may also be determined by supervised learning.
But here they have been determined heuristically using two rules. Later
those estimates wee improved by supervised learning or gradient decent.
The rules followed for determining
the sigma values are
DETERMINATION OF WEIGHTS
The weights are obtained by the minimisationof the squred error with respect to the training set. The minimisation is done by equating the derivative of the error to 0. Though gradient decent may be used for the minimisation this can be solved analytically as it results in a system of linear equations.
GENERALISATION ERROR:
For good results it is not enough to minimise the error for the training set only. The error should be llow even for cases not contained by the training set. This is the ability of the network to generalise.
This can be implemented by splitting up the data set into training and validation sets. The obvious drawback being that the entire data set cannot be utilised as some of the data has to be sacrificed for validation.
There are other methods by which one can utilise the entire training set and still ensure a good degree of generalisation. eg
The project consists of two parts
Using these inputs the code trainx2.c can generate the probability of scoring a goal for different shooting angles; and thus identify the optimal shooting angle.
Some of the outputs generated can be seen HERE
The second part of the project is devoted to the finding
of the free parameters of the radial basis network . It includes
Ds,Dg,Dd1,Dd2,Theta_s,Theta_g,Theta_d1,Theta_d2.
RESULTS OBTAINED ,CONCLUSIONS & FUTURE EXTENSION
It can be seen that the dimension of the input vector is eight. Thus even with two fuzzy linguistic variable attached to these variables number of rules necessary is quite high i.e 2^8 .
Till now networks with 10 ~ 25 radial basis units have been tried and the error has been reduced to the order of 0.3. Thus it can be seen that with considerably less number of rules the error over the entire training set has been reduced.
However 0.3 is still not adequate for precise control of the shooting trajectory as envisioned during prposition of the project. Thus the current trained network can be used to identify 3 broad zones within the goal post where to shoot and not the exact optimal angle.
Since only heuristic approaches where taken to locate the centres it is expected that the performance will be better if those were selected through optimisation procedures. Gentic Algorithms being an obvious choice which will allow similar centres to compete with each other and the dissimilar to co-operate thereby effectively mapping the input and the output space.
Once an aceptable error margin has been crossed this network itself
can then be used to generate the optimal defensive strategies.
That will be possible because the
nework model being continuous and differentiable one can consider
the free parameters to be fixed and differentiate the output with respect
to the goalie and defender location and equate the deriavative to zero
to obtain the locations of the goalie and the defenders. Thus the network
will be used as a domain knowledge. One advantage of this will be the resolution
of conflict between the defenders so that each of them do not
run to the same spot.
1. Peter Stone, Manuela Veloso, Patrick Riley "Learning low-level Multi agent behaviours" available at http://www.cs.cmu.edu/afs/cs/usr/pstone/public/papers/
2. Peter Stone & Manuela Veloso "A layered approach to Learning
Client Behaviours in Robocup soccer" AAI vol 12 -1998
available at http://www.cs.cmu.edu/afs/cs/usr/pstone/public/papers/
3. Peter Stone & Manuela Veloso " Beating a Defender in a Robotic soccer: Memory based learning of continuous functions" Neural Information Processing Systems -1995, avilable at http://www.cs.cmu.edu/afs/cs/usr/pstone/public/papers/
4. Peter Stone & Manuela Veloo " Using Decision Tree confidence Factors for Multi-agent control" In "RoboCup-97: The First Robot World Cup Soccer Games and Conferences", H. Kitano (ed.), 1998. Springer Verlag, Berlin. available at http://www.cs.cmu.edu/afs/cs/usr/pstone/public/papers/
5. Peter Stone and Manuela Veloso " Task Decomposition and Dynamic Role
Assignment for Real-Time Strategic Teamwork." Fifth International
Workshop on Agent Theories, Architectures, and Languages (ATAL'98 ) avialble
at http://www.cs.cmu.edu/afs/cs/usr/pstone/public/papers/
Annotated Bibliography