PROJECT PROPOSAL

COURSE-ME768
ARTIFICIAL INTELLIGENCE IN ENGINEERING
INSTRUCTOR: Dr. AMITABHA MUKHERJEE

ARTICULATED AGENT MOTIONS BASED ON NL INPUT

Kesari Anandsudhakar
Rajesh Rajasekar
Vikrant Kumar
email: { ands, rajraj, vikrantk }@iitk.ac.in

Motivation
Examples
Past Work
Techniques

Scene Creation
Path Planning
Camera Control
Motion Energy

Sample Input-Output
Bibliography

Motivation

The Virtual Director project at IIT Kanpur attempts to build a virtual environment and produce actions in it according to a natural language story. This project fits into the big picture beyond the stage where natural language has been parsed for grammar and semantic information, and the spatial data defuzzified. At this point, the input would be in the form of spatial descriptors and action directives which are at a level of complexity below natural language but higher than the most primitive of directives.

Objectives:

Perform path planning in the limited spatial domain of the story.
In order to amplify the impact of the visual presentation of the story, a judicious use of camera effects such as angle, position, and view angle will be made based on the storyline.
Each character, depending on his mental state, physique, age, and other factors has an "energy level" associated with his movements. This will be modelled.

Examples

The woman on the left narrated a joke animatedly. The fat man and the other woman listened. The other woman was amused by the joke, but the fat man wasn't.
For this short segment from a story, a sequence of images like the following may be produced.

Past Work

Work has been done in this field at the Indian Institute of Technology, Kanpur by Dr. Amitabha Mukherjee as a part of the Virtual Director project.

The aspects of anthropometrics and modelling are discussed by Norman Badler and Stephen Smoliar [Badler/Smoliar:1979]. Physical aspects such as the dynamics of complex objects are studied by Hoffman and Hopcroft [Hoffman/Hopcroft:1987]. This will be used to perform character animation.

Norman Badler has developed Jack, a basic human body animation system [Badler]. SodaJack, an extension of the basic Jack system by Levison et al. [Geib/Levison/Moore:1994], simulates a soda bar operator. It discusses action planning in the context of searching for and manipulating objects.

Salesin et al. have created a precise language for the otherwise fuzzy process of defining shots and other basic elements of cinematography [Salesin/Cohen/Christianson:96]. Camera control to amplify the impact of the story can be based on this convention.

Maron and Lozano-Perez explain the method of using visibility graphs to perform path planning in a domain with polygonal obstacles [Lozano/Lozano-Perez:1996].

Scene Creation: Potential Fields

We shall make use of continuum functions (potential fields) for placing animated agents in our domain. These continuum fields were derived from psycholinguistic experiments conducted on a small number of people as part of the Virtual Director project. To place an object with respect to other objects, the potential fields of those objects will be superimposed. The new object will be placed at the minimum of the resultant potential field.
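
The superposition step can be sketched as follows. This is only an illustration: the experimentally derived fields are not reproduced here, so a hypothetical `near_field` (zero on a ring at a preferred distance from the reference object) stands in for them, and the grid spacing, object positions, and function names are all assumptions.

```python
import numpy as np

def near_field(xs, ys, anchor, preferred_distance):
    """Assumed stand-in field: zero on a ring around `anchor`, rising quadratically elsewhere."""
    d = np.hypot(xs - anchor[0], ys - anchor[1])
    return (d - preferred_distance) ** 2

def place_object(fields, xs, ys):
    """Superimpose the fields and return the grid point where the sum is smallest."""
    total = np.sum(fields, axis=0)
    row, col = np.unravel_index(np.argmin(total), total.shape)
    return float(xs[row, col]), float(ys[row, col])

# Sample the domain on a regular grid (0.1 unit spacing, an arbitrary choice).
axis = np.linspace(-10.0, 10.0, 201)
xs, ys = np.meshgrid(axis, axis)

# "Place the ball near the bench and near the tree" (positions are made up).
bench, tree = (0.0, 0.0), (4.0, 0.0)
fields = [near_field(xs, ys, bench, 2.0), near_field(xs, ys, tree, 2.0)]
ball_position = place_object(fields, xs, ys)
print(ball_position)
```

A grid search is the simplest way to find the minimum; a finer grid or a local optimiser started from the best grid point would give a more precise placement.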

Path Planning: Visibility Graphs and Snakes

One of the tasks is to translate a directive like
A : move to t1
into an actual path from the present location of A to the specified location t1. This involves avoiding collisions with obstacles on the path and allowing some degree of error forgiveness, which lends a higher level of realism.

The standard method of visibility graphs shall be used, and information about the convexity of each entity shall be encapsulated in the object itself rather than determined at run time. The following illustration elucidates the process.

A visibility graph is a graph where the nodes are the vertices of the polygonal obstacles and an edge is drawn from one vertex to another if the edge is "visible". An edge is visible if it does not intersect any obstacles. Once the static graph is constructed, goal and origin vertices are added and their visible edges are computed. This graph can then be searched for a shortest path between the origin and goal.
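
A minimal sketch of this construction, assuming convex obstacles given as counter-clockwise vertex lists and ignoring degenerate cases such as a sight line passing exactly through a vertex. The function names and the use of Dijkstra's algorithm for the search are choices made for this illustration, not part of the cited method.

```python
import heapq
from itertools import combinations
from math import dist, inf

def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def properly_cross(p, q, a, b):
    """True if segments pq and ab cross at a point interior to both."""
    return (orient(p, q, a) * orient(p, q, b) < 0
            and orient(a, b, p) * orient(a, b, q) < 0)

def strictly_inside(pt, poly):
    """Point-in-polygon test for a convex, counter-clockwise polygon."""
    n = len(poly)
    return all(orient(poly[i], poly[(i + 1) % n], pt) > 0 for i in range(n))

def visible(p, q, obstacles):
    """An edge is visible if it crosses no obstacle edge and does not cut through an interior."""
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    for poly in obstacles:
        n = len(poly)
        if any(properly_cross(p, q, poly[i], poly[(i + 1) % n]) for i in range(n)):
            return False
        if strictly_inside(mid, poly):  # rejects diagonals of a convex obstacle
            return False
    return True

def visibility_graph(points, obstacles):
    graph = {p: [] for p in points}
    for p, q in combinations(points, 2):
        if visible(p, q, obstacles):
            graph[p].append((q, dist(p, q)))
            graph[q].append((p, dist(p, q)))
    return graph

def shortest_path(graph, start, goal):
    """Dijkstra search; returns the list of vertices from start to goal, or None."""
    best, previous, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > best[u]:
            continue
        for v, w in graph[u]:
            if d + w < best.get(v, inf):
                best[v], previous[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if goal not in best:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]

# One rectangular obstacle between the origin and the goal (made-up coordinates).
obstacles = [[(1, -1), (3, -1), (3, 2), (1, 2)]]
start, goal = (0, 0), (4, 0)
points = [start, goal] + [v for poly in obstacles for v in poly]
route = shortest_path(visibility_graph(points, obstacles), start, goal)
print(route)
```

Testing every pair of points against every obstacle edge is the brute-force construction; the grid decomposition described next exists precisely to avoid doing this globally.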

Create a grid and superimpose it on the configuration space. For every edge in every obstacle, check for intersection against grid lines. For each of the two endpoints and each intersection point, add it to the region that it belongs to. The intersection points are shared by more than one region.
For every region, construct the visibility graph for the points in that region, including shared points. In other words, for every pair of points in the region that are "visible" to each other, add an edge between them in the global visibility graph.
In addition to the points shared by two regions when an obstacle overlaps those regions, we need "glue points" in order to make sure that the set of local visibility graphs is globally connected. Otherwise, empty regions can disconnect sections of the global visibility graph. At a minimum, we need to add a point to the corners of every region.
Once the static visibility graph is constructed, operations such as adding a vertex to the graph, adding an obstacle, or removing an obstacle can be performed locally, touching only the regions involved. For example, to add a start or goal vertex, find the region that contains the vertex, and find the visible edges from the vertex to the local points maintained by that region. Add the vertex and those edges to the visibility graph.
In order to plan a path, we insert the start vertex and the goal vertex, and then search the global graph for a path between the two.
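
The first and third steps above (assigning obstacle points to grid regions and adding glue points) can be sketched as follows. The square cells, the tolerance constants, and the names `grid_crossings`, `regions_of`, `decompose`, and `add_glue_points` are assumptions made for this illustration; building the per-region visibility graphs is left out.

```python
from collections import defaultdict
from math import ceil, floor

EPS = 1e-9

def grid_crossings(a, b, cell):
    """Points where segment ab crosses grid lines, excluding its own endpoints."""
    points = []
    for axis in (0, 1):
        lo, hi = sorted((a[axis], b[axis]))
        if hi - lo < EPS:
            continue  # segment runs parallel to the grid lines of this axis
        for k in range(ceil(lo / cell - EPS), floor(hi / cell + EPS) + 1):
            t = (k * cell - a[axis]) / (b[axis] - a[axis])
            if EPS < t < 1 - EPS:
                points.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return points

def regions_of(p, cell):
    """Indices of all regions containing p; a point on a grid line belongs to both neighbours."""
    def cells(v):
        q = v / cell
        k = round(q)
        return (k - 1, k) if abs(q - k) < 1e-7 else (floor(q),)
    return [(i, j) for i in cells(p[0]) for j in cells(p[1])]

def decompose(obstacles, cell):
    """Map each region index to the obstacle endpoints and grid crossings it contains."""
    regions = defaultdict(set)
    for poly in obstacles:
        for idx, a in enumerate(poly):
            b = poly[(idx + 1) % len(poly)]
            for p in [a, b, *grid_crossings(a, b, cell)]:
                for region in regions_of(p, cell):
                    regions[region].add(p)
    return regions

def add_glue_points(regions, nx, ny, cell):
    """Add the four corners of every region in an nx-by-ny grid, including empty regions."""
    for i in range(nx):
        for j in range(ny):
            for di in (0, 1):
                for dj in (0, 1):
                    regions[(i, j)].add(((i + di) * cell, (j + dj) * cell))

# A triangle straddling four unit cells (made-up coordinates).
triangle = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5)]
regions = decompose([triangle], cell=1.0)
print(sorted(regions[(0, 0)]))
```

Glue points that happen to fall inside an obstacle would have to be filtered out in a fuller implementation; this sketch does not do so.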

Images from [Lozano/Lozano-Perez:1996]

The highlighted regions show where the origin and goal points are located. Only visible edges are added to points within these regions.
This shows the shortest path through the graph.

The resulting path contains a number of sharp corners. These will be removed by using a snake which has the path produced by the visibility-graph process as its initial configuration. Internal energy is a function of the length and sharpness of the curve. External energy is due to the potential fields of the obstacles.
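
One way to realise such a snake is plain gradient descent on a discretised energy. In the sketch below the internal energy is the sum of squared segment lengths (length) plus squared second differences (sharpness), and the external energy is a Gaussian bump around each obstacle centre. The weights, the shape of the obstacle field, the step size, and all names are assumptions for illustration only.

```python
import numpy as np

ALPHA, BETA = 1.0, 1.0        # assumed weights of the length and sharpness terms
STRENGTH, RADIUS = 1.0, 0.5   # assumed obstacle field: one Gaussian bump per centre

def resample(path, n):
    """Resample a polyline to n points evenly spaced along its length."""
    path = np.asarray(path, dtype=float)
    s = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(path, axis=0), axis=1))])
    t = np.linspace(0.0, s[-1], n)
    return np.column_stack([np.interp(t, s, path[:, k]) for k in range(2)])

def bumps_and_offsets(pts, centres):
    offsets = pts[:, None, :] - centres[None, :, :]
    return np.exp(-(offsets ** 2).sum(axis=2) / (2 * RADIUS ** 2)), offsets

def snake_energy(pts, centres):
    stretch = np.diff(pts, axis=0)
    bend = pts[:-2] - 2 * pts[1:-1] + pts[2:]
    bumps, _ = bumps_and_offsets(pts, centres)
    return ALPHA * (stretch ** 2).sum() + BETA * (bend ** 2).sum() + STRENGTH * bumps.sum()

def snake_gradient(pts, centres):
    grad = np.zeros_like(pts)
    grad[1:-1] += 2 * ALPHA * (2 * pts[1:-1] - pts[:-2] - pts[2:])   # length term
    bend = pts[:-2] - 2 * pts[1:-1] + pts[2:]                         # sharpness term
    grad[:-2] += 2 * BETA * bend
    grad[1:-1] -= 4 * BETA * bend
    grad[2:] += 2 * BETA * bend
    bumps, offsets = bumps_and_offsets(pts, centres)                  # obstacle term
    grad -= STRENGTH * (offsets * bumps[:, :, None]).sum(axis=1) / RADIUS ** 2
    grad[0] = grad[-1] = 0.0                                          # endpoints stay fixed
    return grad

def relax_snake(path, centres, n=40, step=0.01, iterations=300):
    pts = resample(path, n)
    for _ in range(iterations):
        pts = pts - step * snake_gradient(pts, centres)
    return pts

# Initial configuration: a path with sharp corners, as a visibility graph would produce.
corner_path = [(0, 0), (1, -1), (3, -1), (4, 0)]
centres = np.array([[2.0, 0.5]])
initial = resample(corner_path, 40)
smooth = relax_snake(corner_path, centres)
print(snake_energy(initial, centres), snake_energy(smooth, centres))
```

Because the length term keeps shrinking the curve, the balance between the weights and the obstacle field decides how closely the relaxed path hugs the obstacles; the values here are untuned.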

Declarative Camera Control Language

Due to the work of Salesin et al., there exists a precise language in which the configuration of cameras and lights can be specified. At every stage, the story has a focus, which is the locale where the story is unfolding. Assuming this to be a single region, the camera continuously configures itself based on some action-specific idioms which are produced automatically from the storyline.

Camera Control

Here is an example of an idiom used to depict an interaction between two persons.
Images from

Salesin et.al:1996

Fragments are the atomic elements of a movie. Some of the fragments defined by DCCL are:

Motion Energy

Motion energy provides visual cues about the character's personality and emotions. Qualified verb phrases like "walked hurriedly", "looked cautiously", and "rose lazily"; and physical descriptors such as "old man", "coy maiden", and "ferocious dog" evoke distinct mental images.

These characteristics are imparted to a character by adjusting parameters of its gait and posture like speed of movements, length of strides, stoop of shoulder, swing of hands, and direction of gaze.
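
As a rough sketch of how an energy level could drive these parameters, the code below interpolates linearly between a low-energy and a high-energy gait. Every number, the descriptor and adverb tables, and the names `Gait`, `energy_level`, and `gait_for` are placeholders invented for illustration; the proposal does not fix any of them.

```python
from dataclasses import astuple, dataclass

@dataclass
class Gait:
    speed: float           # movement speed, metres per second
    stride_length: float   # metres
    shoulder_stoop: float  # forward stoop, degrees
    arm_swing: float       # swing amplitude, degrees

# Placeholder extremes; real values would have to be tuned or measured.
LOW_ENERGY = Gait(speed=0.5, stride_length=0.4, shoulder_stoop=25.0, arm_swing=5.0)
HIGH_ENERGY = Gait(speed=2.0, stride_length=0.9, shoulder_stoop=0.0, arm_swing=40.0)

# Hypothetical lookup tables: a base energy per descriptor, a shift per adverb.
BASE_ENERGY = {"old man": 0.3, "man": 0.6}
ADVERB_SHIFT = {"hurriedly": 0.3, "lazily": -0.3}

def energy_level(descriptor, adverb=None):
    """Combine descriptor and adverb into an energy level clamped to [0, 1]."""
    energy = BASE_ENERGY.get(descriptor, 0.5) + ADVERB_SHIFT.get(adverb, 0.0)
    return min(max(energy, 0.0), 1.0)

def gait_for(energy):
    """Blend the two extreme gaits according to the energy level."""
    return Gait(*(lo + energy * (hi - lo)
                  for lo, hi in zip(astuple(LOW_ENERGY), astuple(HIGH_ENERGY))))

print(gait_for(energy_level("old man", "lazily")))
print(gait_for(energy_level("man", "hurriedly")))
```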

Sample Input-Output

The input to the program will be a description of the scene followed by a transcript of the interactions between the various components of the scene.
A man is trying to play ball with his girlfriend. But the robot seems to charm her more. Here is a sample of what the input may be like:


Sky.color=blue
Grass.color=green
//1
tree=new Tree
bench=new Bench
woman=new Woman
wall=new Wall
robot=new Robot
ball=new Ball
man=new Man
//2
tree.location=(-10,50)
bench.location=(0,20)
bench.color=red
woman.posture=sit
woman.location=bench
woman.wear=new Shirt(red)
woman.wear=new Skirt(blue)
wall.location=(-8,40)
wall.length=5
wall.orientation=45
robot.location=(3,18)
ball.color=red
ball.container=wall.top
ball.location=(-7,41)
man.location=(-9,50)
man.wear=new Shirt(blue)
man.wear=new Trouser(black)
//3
man.goto=ball
man.pickup=ball
man.goto=woman
man.give.object=ball
man.give.dest=woman
woman.accept.src=man.give
woman.wait=man.giving
man.give.go
woman.accept.go
woman.throw.object=ball
woman.throw.direction=robot
woman.throw.go
robot.wait=ball.land
robot.goto=ball
robot.pickup=ball
robot.throw.object=ball
robot.throw.direction=woman
man.goto=ball
man.pickup=ball
man.goto=woman
man.give.object=ball
man.give.dest=woman
woman.accept.src=man.give
woman.wait=man.giving
man.give.go
woman.accept.go
end

In this domain, Sky and Grass are global objects whose characteristics are defined in the input. Between points 1 and 2, instances of other classes such as Man, Tree, and Ball are initialized. Between points 2 and 3, each object's characteristics are set. Uninitialized properties of objects will default to preset values. This presetting may be absolute or in terms of other properties. At this stage, the output will be a rendering of a scene with all the objects in their initial configurations.
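
A minimal sketch of how such an input could be split at the numbered markers and turned into objects with defaults. The section names, the `DEFAULTS` table, and the decision to treat repeated 'wear' assignments as additive are assumptions; the proposal does not specify them.

```python
import re

SECTION_NAMES = ["globals", "declarations", "properties", "actions"]
ADDITIVE = {"wear"}  # assumption: repeated 'wear' assignments add garments

def parse_script(text):
    """Split the input into sections at the //1, //2, //3 markers; stop at 'end'."""
    sections = {name: [] for name in SECTION_NAMES}
    current = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "end":
            break
        marker = re.fullmatch(r"//(\d+)", line)
        if marker:
            current = int(marker.group(1))
            continue
        target, _, value = line.partition("=")  # value is '' for lines like 'man.give.go'
        sections[SECTION_NAMES[current]].append((target, value))
    return sections

def build_scene(sections, defaults):
    """Create each declared object with its class defaults, then apply explicit properties."""
    scene = {}
    for name, value in sections["declarations"]:
        class_name = value.partition(" ")[2]  # "new Tree" -> "Tree"
        scene[name] = {"class": class_name, **defaults.get(class_name, {})}
    for target, value in sections["properties"]:
        name, _, prop = target.partition(".")
        if prop in ADDITIVE:
            scene[name].setdefault(prop, []).append(value)
        else:
            scene[name][prop] = value
    return scene

# Hypothetical preset values for uninitialized properties.
DEFAULTS = {"Woman": {"posture": "stand"}, "Bench": {"color": "brown"}}

SAMPLE = """\
Sky.color=blue
//1
bench=new Bench
woman=new Woman
//2
bench.location=(0,20)
woman.wear=new Shirt(red)
woman.wear=new Skirt(blue)
//3
woman.throw.go
end
"""

sections = parse_script(SAMPLE)
scene = build_scene(sections, DEFAULTS)
print(scene)
```

Values are kept as raw strings here; a fuller reader would also resolve references such as 'woman.location=bench' and defaults expressed in terms of other properties.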

Beyond point 3, the actions of the various objects are specified. 'goto', 'pickup', and so on are simple actions where only one effector and one affected object play a part. In an action like 'give', there may be a corresponding 'accept'. These two constitute a composite process which needs to be coordinated temporally. This is achieved through the use of the 'wait' construct. For example, when a Man 'gives' a Ball to a Woman, he generates a 'giving' message. So issuing a 'woman.wait=man.giving' instruction makes woman, the instance of Woman, wait until she receives a 'giving' message from man. This appears to be quite analogous to the role played by body language in the interaction between human beings.
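
The 'wait' construct behaves much like waiting on a named event. The sketch below models it with Python's asyncio, where each actor is a coroutine and a hypothetical `MessageBoard` holds one event per message name. The class, the coroutine bodies, and the log strings are illustrative assumptions, not the project's design.

```python
import asyncio
from collections import defaultdict

class MessageBoard:
    """Holds one asyncio.Event per message name, created on first use."""
    def __init__(self):
        self._events = defaultdict(asyncio.Event)

    def post(self, message):
        self._events[message].set()

    async def wait_for(self, message):
        await self._events[message].wait()

async def man(board, log):
    log.append("man: goto woman")
    await asyncio.sleep(0.01)       # stands in for the time taken to walk over
    log.append("man: give ball")
    board.post("man.giving")        # the 'giving' message generated by 'give'

async def woman(board, log):
    log.append("woman: waiting")
    await board.wait_for("man.giving")   # woman.wait=man.giving
    log.append("woman: accept ball")

async def run_scene():
    board, log = MessageBoard(), []
    await asyncio.gather(woman(board, log), man(board, log))
    return log

log = asyncio.run(run_scene())
print(log)
```

However the two actors are scheduled, the woman's 'accept' cannot be logged before the man's 'give', which is the temporal coordination the 'wait' construct is meant to provide.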

Whenever an action is executed, it is known what the involved objects are. This information is processed by the Camera Control module. The objects give the position of and area covered by the event. The type of event ('goto', 'give', etc) is used to decide the appropriate camera idiom.
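
A rough sketch of that hand-off: the positions of the involved objects give a focus point and a covered radius, and the event type selects an idiom from a lookup table. The idiom names below are placeholders rather than DCCL vocabulary, and the field of view, margin, and function name are assumptions.

```python
from math import hypot, radians, tan

# Hypothetical mapping from event type to a camera idiom name.
IDIOMS = {"goto": "tracking shot", "give": "two-person shot", "throw": "wide shot"}

def frame_event(event_type, positions, fov_degrees=60.0, margin=1.5):
    """Return the idiom, the focus point, and a camera distance that fits the event in view."""
    cx = sum(x for x, _ in positions) / len(positions)
    cy = sum(y for _, y in positions) / len(positions)
    # Radius of the area covered by the involved objects, padded by a margin.
    radius = margin * max(hypot(x - cx, y - cy) for x, y in positions)
    # Distance at which a circle of that radius just fills the field of view.
    distance = max(radius, 1.0) / tan(radians(fov_degrees) / 2)
    return {"idiom": IDIOMS.get(event_type, "establishing shot"),
            "focus": (cx, cy),
            "distance": distance}

# man.give with the man at (0, 18) and the woman at (0, 20) (made-up positions).
shot = frame_event("give", [(0.0, 18.0), (0.0, 20.0)])
print(shot)
```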

The output will be a graphical rendering of the scene and the events therein, similar to the previously illustrated sequence of images. Presently, the choice for the format of output is VRML. This provides a platform-independent and compact representation for the scene and enables easy transmission over networks.

Online Links

Human modelling at the University of Penn.: http://www.cis.upenn.edu/~hms/home.html
The JACK project: http://www.cis.upenn.edu/~hms/jack.html
References to Virtual Human works: http://www.pasociety.org/perfanim
Ken Perlin's IMPROV: http://www.mrl.nyu.edu/perlin

Bibliography

@Article{Badler/Smoliar:1979,
  author=       { Badler, Norman I. and Smoliar, Stephen W.},
  year=         { 1979},
  institution=  { U. of Pennsylvania 2-->National U. of Singapore},
  title=        { Digital Representations of Human Movement},
  journal=      { ACM Computing Surveys},
  month=        { march},
  volume=       { 11},
  email=        { badler@central.cis.upenn.edu; Smoliar@ISS.nus.sg},
  annote=       {
The techniques of representing a human being as a computer generated 
graphic entity are discussed. A brief description of the Labanotation 
used to precisely quantify body postures is given. It is emphasised 
that although direct representation in a 2D space is possible, in the 
general case, the better approach is to construct a 3D model and then 
project it to a plane. Three aspects are discussed:
1 Representations of the human body
   this is done by the following methods:
    stick figures : the simplest and most unimpressive
    surface models: excellent results, but slightly unrealistic and 
                    a fair share of blemishes.
    volume models : the human body is decomposed into primitive solids
                    such as cylinders, spheres, and ellipsoids.
2 Representation of movement
  Given a model for a human body, the process of animating it involves 
  producing a succession of frames, each slightly different from the 
  previous. This is attained by key frames, where a set of important 
  frames is provided and the intermediate parts are interpolated; by 
  movement functions where labanotation is used to choreograph the 
  motion; and simulation where the mechanics of the human body are also
  encapsulated in the model.

3 Finally, an architecture for such a system is discussed. 
-k.anandsudhakar feb/2k }

}

@Misc{Geib/Levison/Moore:1994,
  author=       { Geib, Christopher and Levison, Libby and
                  Moore, Michael B.},
  year=         { 1994},
  institution=  { upenn},
  title=        { SodaJack: An Architecture for Agents that Search for 
                  and Manipulate Objects},
  month=        { january},
  email=        { (geib,libby,mmoore)@linc.cis.upenn.edu},
  annote=       {
This paper deals with the problem of an agent whose aim is to search 
and undertake manipulation tasks in an environment. They have 
implemented the approach in a system called SODAJACK, which does the 
animation. The agent receives as input high level commands like 
"fetch the scoop" and the system has to figure out the exact low-level 
action to do the job. This involves knowing the possible locations of 
the scoop, planning a route to them, exploring them, and finally 
lifting the scoop. 
The system is divided into a hierarchy of three planners that respond 
to the input goal and output an action outline to achieve it. The 
tasks are divided as follows:
1. Search planner: converts the goals into a plan to search.
2. Object-specific planner: relates each search plan produced by the 
search planner to a particular object and undertakes feasibility tests 
for the action plans generated.
3. Hierarchical planner (ItPlans): supervises the other two, delegating 
control first to the search planner to get a plan and then to the 
object-specific planner to make it specific.  -Vikrant Kumar 11/02/2000 }

}

@Misc{Salesin/et.al:1994,
  author=       { Christianson, David B. and Anderson, Sean E. and
                  He, Li-wei and Salesin, David H. and 
                  Weld, Daniel S. and Cohen, Michael F.},
  year=         { 1994},
  institution=  { 4U. of Washington; Microsoft Research, Redmond
                  2-->Stanford},
  title=        { Declarative Camera Control for Automatic
                  Cinematography},
  email=        { (dbc1,lhe,salesin,weld)@cs.washington.edu;
                  seander@stanford.edu; mcohen@microsoft.com},
  annote=       {
For a long time, programmers have not made use of cinematographic principles in 
computer animations. The authors try to fill this gap by making the 
rules of cinematic storytelling lend themselves easily to programming. 
For this purpose they have formalized the rules into a  Declarative 
Camera Control Language(DCCL). Such a thing will be very useful as it 
will allow programs to present a dramatic point of view aesthetically.
The authors first introduce the language of the cinema like the breakup 
of a film into scenes and shots, shots being the smallest unit. Another 
thing is the placement of the camera, which depending on the scene can 
be apex, internal, external or parallel. Cinematographers have 
identified certain fields of view for shots which give pleasing 
results. And then there are certain constraints on a shot which should 
be satisfied like parallel editing and break movement. Next comes the 
concept of idioms, which is the way cinematographers describe 
situations in a film. DCCL is an attempt to formalize this idiom. 
The DCCL is composed of four basic components: fragments, views, 
placements, and movement endpoints. A fragment is the time interval during 
which the camera performs a simple motion. A simple shot may comprise 
one or more fragments.

Next they define the Camera Placement System(CPS). The CPS is a three 
stage pipeline consisting of 
1.the sequence planner
2.the compiler
3.the heuristic evaluator.
The basic aim of the CPS is to give the camera positions depending on 
the input of the positions of the various interacting entities. And 
the authors have implemented this approach in a video game.
The authors succeed in highlighting the importance of using 
cinematographic techniques in computer animations so as to make the 
experience more enriching.             -Vikrant Kumar  11/02/2000 }

}

@Article{Lozano/Lozano-Perez:1996,
  author=       { Maron, Oded and Lozano-Perez, Tomas},
  year=         { 1996},
  institution=  { 2mit},
  title=        { Visible Decomposition: Real Time Path Planning in 
                  Large Planar Environments},
  journal=      { AI Memo},
  month=        { January},
  www=          { ftp://ftp.ai.mit.edu/pub/users/oded\
                  /papers/planning.ps.Z},
  email=        {oded@ai.mit.edu,tlp@ai.mit.edu},

  annote=       {
This paper deals with the use of visibility graphs to do motion 
planning.
	  -Rajesh Rajasekar 2/2000}
}

@Article{Hoffman/Hopcroft:1987,
  author=       { Hoffman, Christoph M. and Hopcroft, John E.},
  year=         { 1987},
  institution=  { Purdue-cs; Cornell-cs},
  title=        { Simulation of Physical Systems from Geometric Models},
  journal=      { IEEE J. of Robotics and Automation},
  month=        { june},
  volume=       { RA-3},
  annote=       {
The mechanics of simulation are discussed. -k. anandsudhakar feb/2k}

}

Kesari Anandsudhakar, Vikrant Kumar, Rajesh Rajasekar at IITK