Aim of the project -
In this course project we propose to build a simulator for the Table-tennis game, with the main focus on designing various approaches to make the players learn shots that return the ball to the other side of the table in the virtual table-tennis environment that we will simulate. The simulator will provide an alternative to experimenting with real robots by modelling virtual robot players controlled by neural networks.
Why Reinforcement Learning ?
Reinforcement Learning has been used for control problems like elevator dispatching and dynamic channel allocation, and for strategy games like Backgammon and Checkers with very large state spaces, of the order of 10^20. An alternative form of learning, supervised learning, is learning from examples provided by an external agent, but alone it is not enough for learning from interaction. In interactive problems such as ours it is impractical to obtain examples of desired behaviour that are both accurate and representative of all the situations in which the agent has to act. In uncharted territory, where one would expect learning to be most useful, an agent must be able to learn from its own experience. A game like Table-tennis (or a similar racquet sport) provides an interesting mix of control problems and strategy problems, making the task of developing good players quite challenging.
Past Work -
Table-Tennis Simulator used for Neural Networks -
Some illustrative work has been done by D. Aulignac, A. Moschovinos and S. Lucas in building a 2D virtual Table-tennis simulator at Vase Labs, Univ. of Essex, with the aim of holding a table tennis tournament where different players (robot controllers) could play against each other. They chose a 2D simulation and designed neural networks trained on training sets generated by another program (the algorithmic controller). The simulator they initially built used a multilayer perceptron (MLP) architecture, and later comparisons were made with a radial-basis function (RBF) architecture. A second approach, which they suggested but did not implement, is the use of modular neural networks: the task is decomposed into smaller sub-tasks, each handled by a specialist network.
The Acrobot
Reinforcement Learning has been applied to the task of simulating a gymnast as a two-link robot arm that learns to swing on a high bar. The system has been widely studied by control engineers (e.g. Spong, 1994) and machine learning researchers (e.g. DeJong and Spong, 1994; Boone, 1997). The learning algorithm used was Sarsa(lambda) with linear function approximation, tile coding, and replacing traces, with a separate set of tilings for each action. Although this task has nothing to do with Table-tennis, it is modelled as a Markov Decision Problem (MDP), and the reinforcement learning method used, linear function approximation with tile coding, introduces the key issue of generalisation: how experience with a limited subset of the state space is generalised to produce a good approximation over a much larger subset. This issue is also important in our Table-tennis agent.
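As an aside on that generalisation mechanism, the following is a minimal sketch of tile coding with linear function approximation in the spirit of the Acrobot work; the two-dimensional state, the tiling sizes and the learning rate are illustrative assumptions for this sketch, not values from the cited papers.

    import numpy as np

    # Minimal tile-coding sketch: several overlapping grid tilings, each shifted
    # slightly, map a continuous 2-D state to a few active binary features.  The
    # state bounds, number of tilings and tiles per dimension are illustrative.
    NUM_TILINGS = 4
    TILES_PER_DIM = 8
    STATE_LOW = np.array([0.0, 0.0])
    STATE_HIGH = np.array([1.0, 1.0])

    def active_tiles(state):
        """Return one active feature index per tiling for a 2-D state."""
        scaled = (np.asarray(state) - STATE_LOW) / (STATE_HIGH - STATE_LOW)
        indices = []
        for t in range(NUM_TILINGS):
            offset = t / (NUM_TILINGS * TILES_PER_DIM)          # shift each tiling
            cells = np.floor((scaled + offset) * TILES_PER_DIM).astype(int)
            cells = np.clip(cells, 0, TILES_PER_DIM - 1)
            flat = cells[0] * TILES_PER_DIM + cells[1]          # cell within this tiling
            indices.append(t * TILES_PER_DIM ** 2 + int(flat))  # unique across tilings
        return indices

    # Linear function approximation: the value estimate is the sum of the weights
    # of the active tiles, so nearby states share features and share learning.
    weights = np.zeros(NUM_TILINGS * TILES_PER_DIM ** 2)

    def value(state):
        return float(sum(weights[i] for i in active_tiles(state)))

    # One TD-style update at a single state also moves the estimate at neighbouring
    # states that fall into the same tiles -- this is the generalisation at work.
    state, target, alpha = (0.31, 0.62), 1.0, 0.1 / NUM_TILINGS
    delta = target - value(state)
    for i in active_tiles(state):
        weights[i] += alpha * delta
    print(value(state), value((0.33, 0.60)))   # both estimates move toward the target

Sarsa(lambda) would add replacing eligibility traces over these same features, and using a separate set of tilings for each action amounts to offsetting the feature indices by an action-specific block.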
Motivation
Perhaps the greatest motivation for this project is that little work has been done in this area, although similar problems have been tackled. The simulation can also be a precursor to solutions for other aspects of this problem, namely vision and robotics; by integrating these solutions, we can perhaps get to see an actual robot playing table-tennis! By building the simulator we create a framework where we can test appropriate algorithms for controlling a robot without going into the physical aspects. Since racket sports like tennis, badminton and squash, and even games like baseball, are essentially the same in principle as far as reaching the ball is concerned, the only differences being the performance measure of a shot and other physical aspects, the simulator can model other sports with some changes in parameters and without modifying the basic framework. Finally, by incorporating human-like physical constraints on racket motion, the simulation can provide insights into the game of table-tennis.
Methodology
The physics of the real game are simulated, adapted to the requirements of a real-time virtual environment. These include:
Performance measure for a shot: To train the algorithm for specific shots, for instance hitting the ball deep or along the lines, the reward function for the Q-learning algorithm can be defined over the 2-D space of the table.
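As a concrete illustration of such a reward over the 2-D space of the table, the following sketch rewards deep returns and returns placed along the lines; the coordinate convention and the exact shaping are assumptions made for this example, not values fixed by the project (the table dimensions are the standard 2.74 m by 1.525 m).

    # Sketch of a shot-specific reward r(x, z) over the opponent's half of the
    # table.  The coordinate convention and the shaping are illustrative: "deep"
    # shots (near the end line) and shots near the side lines earn more.
    TABLE_LENGTH = 2.74    # metres, standard table
    TABLE_WIDTH = 1.525    # metres, standard table

    def landing_reward(x, z):
        """Reward for a returned ball landing at (x, z) on the opponent's half.

        x -- distance beyond the net along the table, 0 .. TABLE_LENGTH / 2
        z -- lateral position across the table, 0 .. TABLE_WIDTH
        """
        half_length = TABLE_LENGTH / 2
        if not (0.0 <= x <= half_length and 0.0 <= z <= TABLE_WIDTH):
            return -1.0                                   # did not land on the table
        depth_bonus = x / half_length                     # close to 1 for deep shots
        line_bonus = abs(z - TABLE_WIDTH / 2) / (TABLE_WIDTH / 2)   # near a side line
        return 0.5 * depth_bonus + 0.5 * line_bonus

    # A deep shot close to a side line scores near the maximum of 1.
    print(landing_reward(1.30, 0.10))

A different shot (for example, a short drop over the net) would simply swap in a different shaping over the same (x, z) space, leaving the learning algorithm unchanged.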
Approach I
Underlying Assumptions: We assume that the velocity vector and the coordinates of the bat are independent of each other; under this assumption we can decompose the problem of determining the coordinates and the velocity of the bat into two individual and unrelated parts.
The two modules are as follows:
- Intercepting the ball - A multi-layered perceptron network gives a sequence of actions for the bat to intercept the ball. The time steps in this case can be broken down naturally into a sequence of states, with each state characterised by ball position, ball velocity vector and bat position. The action for the RL algorithm is increments in the bat coordinates x, y, z, and the reward function is -1 for missing the ball, +1 for an intercept and 0 otherwise (see the sketch after this list).
where,
S is the problem state,
A is the action set for the state S,
X, Y, Z are coordinates,
V is velocity,
the superscripts b and r refer to the ball and racket respectively,
the subscripts denote the components,
terms superscripted with ' are determined by the simulator, and
terms superscripted with * are determined by the agent's action.
- Determining the velocity/force of the bat at the point of interception - This phase again uses a neural network and a training set (described below). The input to the neural network is the interception point, and the output produced is the velocity of the bat that will return the ball to the other side of the table. The generation of input/output pairs (the training set) is done by a separate program that considers a particular performance measure.
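A minimal sketch of the first (interception) module follows, assuming a small discretised set of coordinate increments, a crude grid over the state and a simple distance test for an intercept; all three are illustrative choices, and in the project the Q-function would live in the multi-layered perceptron rather than in a table.

    import itertools
    from collections import defaultdict

    # Sketch of Approach I, first module (intercepting the ball).
    # State  : ball position, ball velocity vector, bat position.
    # Action : small increments in the bat coordinates x, y, z.
    # Reward : -1 for missing the ball, +1 for an intercept, 0 otherwise.
    STEP = 0.05                                          # illustrative increment size
    ACTIONS = list(itertools.product((-STEP, 0.0, STEP), repeat=3))   # 27 increment triples

    def interception_reward(ball_pos, bat_pos, ball_missed):
        if ball_missed:
            return -1.0
        dist = sum((b - r) ** 2 for b, r in zip(ball_pos, bat_pos)) ** 0.5
        return 1.0 if dist < 0.1 else 0.0               # within 10 cm counts as intercept

    # Tabular Q-learning over a coarse discretisation of the continuous state.
    Q = defaultdict(float)
    ALPHA, GAMMA = 0.1, 0.95

    def discretise(state):
        return tuple(round(v, 1) for v in state)        # crude grid for the sketch

    def q_update(state, action_idx, reward, next_state):
        s, s_next = discretise(state), discretise(next_state)
        best_next = max(Q[(s_next, a)] for a in range(len(ACTIONS)))
        Q[(s, action_idx)] += ALPHA * (reward + GAMMA * best_next - Q[(s, action_idx)])

The second module, by contrast, is trained directly on (interception point, bat velocity) pairs produced by the separate program, so it is a supervised fit rather than a Q-learning problem.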
Approach II
In this approach, at each timestep the action set comprises increments in the velocity components, which, together with the coordinates of the bat in the previous state, determine the next state. Each state is characterised by ball position, ball velocity, bat position and bat velocity. Here also we use a Q-learning network, with a different reward function R:
R(s) = a high negative value, if the ball is missed altogether
     = r(x, z), if the ball is returned, where (x, z) is the point at which the ball lands on the table (see fig.)
Here the trajectory of the bat will be continuous, because the increments in the position coordinates are dependent on the increments in the velocity components.
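To make this concrete, here is a small sketch of the Approach II dynamics and reward, assuming a fixed timestep, a particular miss penalty and a zero reward on intermediate steps; all three are illustrative assumptions, and r(x, z) is the shot performance measure over the table described in the Methodology.

    # Sketch of Approach II: each action is an increment in the bat's velocity
    # components, and the bat position follows by integrating that velocity, so
    # the bat trajectory is continuous.  DT and MISS_PENALTY are illustrative.
    DT = 0.02                 # assumed simulation timestep, in seconds
    MISS_PENALTY = -10.0      # the "high negative value" for missing the ball

    def step_bat(bat_pos, bat_vel, vel_increment):
        """Apply a velocity-increment action, then integrate the bat position."""
        new_vel = tuple(v + dv for v, dv in zip(bat_vel, vel_increment))
        new_pos = tuple(p + v * DT for p, v in zip(bat_pos, new_vel))
        return new_pos, new_vel

    def reward_R(ball_missed, landing_point, r):
        """Reward R(s) for Approach II.

        ball_missed   -- True if the ball was missed altogether
        landing_point -- (x, z) where the returned ball lands, or None before it lands
        r             -- the performance measure r(x, z) over the table
        """
        if ball_missed:
            return MISS_PENALTY
        if landing_point is None:
            return 0.0                    # intermediate step, assumed neutral
        return r(*landing_point)

    # A small velocity increment nudges the bat smoothly: the position changes by
    # velocity * DT each step, so the path cannot jump.
    pos, vel = (0.0, 0.9, 0.2), (0.0, 0.0, 0.0)
    pos, vel = step_bat(pos, vel, (0.5, 0.0, 0.1))
    print(pos, vel)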
The results in Approach I will be more predictable, especially from the second module, because it does not involve Q-learning.
Approach I
The following images show the frames captured, with the corresponding state representation, during the phase of finding the intercept point between ball and bat. The motion of the bat is shown in dotted lines (our guesses).
Frame No - 1: ball position coordinates = 6.50 8.25 1.90, ball velocity components = -50.00 -2.98 -1.00
Frame No - 2: ball position coordinates = 2.86 7.98 1.83, ball velocity components = -22.69 -2.33 -0.45
Frame No - 3: ball position coordinates = -0.88 6.46 1.96, ball velocity components = -13.53 -5.92 -0.71
Frame No - 4: ball position coordinates = -6.26 5.85 1.59, ball velocity components = -5.75 1.93 -0.14
Approach II
In this case the motion of the bat will be continuous. We also expect some creative solutions in this case, for instance the one illustrated below: the approach will favour human-like solutions (shots) with a back-swing, i.e., a better prepared player will play a better shot.
References
Links to Resources: