Rajesh Rajasekar
Vikrant Kumar
The Virtual Director project at IIT Kanpur attempts to build a virtual environment and produce actions in it according to a natural language story. This project fits into the big picture beyond the stage where natural language has been parsed for grammar and semantic information, and the spatial data defuzzified. At this point, the input would be in the form of spatial descriptors and action directives which are at a level of complexity below natural language but higher than the most primitive of directives.
The woman on the left narrated a joke animatedly.
The fat man and the other woman listened.
The other woman was amused by the joke, but the fat man wasn't.
For this short segment from a story, a sequence of images like the following may be produced.
[Images: a sequence of four frames depicting the story segment above]
Work has been done in this field at the Indian Institute of Technology, Kanpur by Dr. Amitabha Mukherjee as a part of the Virtual Director project.
The aspects of anthropometrics and modelling are discussed by Norman Badler and Stephen Smoliar [Badler/Smoliar:1979]. Physical aspects such as the dynamics of complex objects are studied by Hoffman and Hopcroft [Hoffman/Hopcroft:1987]. These results will be used to perform character animation.
Norman Badler has developed Jack, a basic human body animation system [Badler]. An extension of the basic Jack system by Geib, Levison, and Moore [Geib/Levison/Moore:1994] is SodaJack, which simulates a soda bar operator. It discusses action planning in the context of searching for and manipulating objects.
Salesin et al. have created a precise language for the otherwise fuzzy process of defining shots and other basic elements of cinematography [Salesin/Cohen/Christianson:96]. Camera control to amplify the impact of the story can be based on this convention.
Lozano and Lozano-Perez explain the method of using visibility graphs to perform path planning in a domain with polygonal obstacles [Lozano/Lozano-Perez:1996].
One of the tasks is to translate a directive like
A : move to t1
into an actual path from the present location of A
to the specified location t1.
This involves avoiding collisions with obstacles on the path and having some degree of error forgiveness,
which lends a higher level of realism.
The standard method of visibility graphs shall be used, and information about the convexity of each entity shall be encapsulated in that object itself rather than determined at run-time. The following is an illustration to elucidate the process.
A visibility graph is a graph where the nodes are the vertices of the polygonal obstacles and an edge is drawn from one vertex to another if the edge is "visible". An edge is visible if it does not intersect any obstacles. Once the static graph is constructed, goal and origin vertices are added and their visible edges are computed. This graph can then be searched for a shortest path between the origin and goal.
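The procedure described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's implementation: obstacles are assumed to be convex polygons given as lists of `(x, y)` tuples in counter-clockwise order, the function names (`visible`, `build_static_graph`, `add_point`, `shortest_path`) are hypothetical, Dijkstra's algorithm is one possible choice for the search, and degenerate cases such as a path grazing a vertex exactly are ignored.

```python
import heapq
import math
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 when counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def visible(p, q, obstacles):
    """True if segment pq crosses no obstacle edge and no obstacle interior."""
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    for poly in obstacles:
        edges = list(zip(poly, poly[1:] + poly[:1]))
        for a, b in edges:
            if (orient(p, q, a) * orient(p, q, b) < 0
                    and orient(a, b, p) * orient(a, b, q) < 0):
                return False  # pq properly crosses edge ab
        if all(orient(a, b, mid) > 0 for a, b in edges):
            return False  # midpoint inside: e.g. a diagonal of the obstacle
    return True


def build_static_graph(obstacles):
    """Visibility graph over obstacle vertices only (computed once)."""
    nodes = [v for poly in obstacles for v in poly]
    graph = {v: {} for v in nodes}
    for p, q in combinations(nodes, 2):
        if visible(p, q, obstacles):
            graph[p][q] = graph[q][p] = math.dist(p, q)
    return graph


def add_point(graph, point, obstacles):
    """Connect an origin or goal vertex to everything it can see."""
    links = {v: math.dist(point, v) for v in graph
             if v != point and visible(point, v, obstacles)}
    graph.setdefault(point, {}).update(links)
    for v, d in links.items():
        graph[v][point] = d


def shortest_path(static_graph, origin, goal, obstacles):
    """Dijkstra search on a copy of the static graph plus origin and goal."""
    graph = {v: dict(nbrs) for v, nbrs in static_graph.items()}
    add_point(graph, origin, obstacles)
    add_point(graph, goal, obstacles)
    best, prev, heap = {origin: 0.0}, {}, [(0.0, origin)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > best[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < best.get(v, math.inf):
                best[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if goal not in best:
        return None
    path = [goal]
    while path[-1] != origin:
        path.append(prev[path[-1]])
    return path[::-1]


# Usage: one square obstacle between the origin and the goal.
square = [(1, -1), (3, -1), (3, 1), (1, 1)]
static = build_static_graph([square])
route = shortest_path(static, (0, 0), (4, 0), [square])
```

Keeping the static graph separate from the origin and goal means it can be reused for every 'move to' directive as long as the obstacles do not move.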
[Images: illustration of the visibility-graph path-planning process]
The resulting path contains a number of sharp corners. These will be removed by using a snake which has the path produced by the visibility-graph process as its initial configuration. Internal energy is a function of the length and sharpness of the curve. External energy is due to the potential fields of the obstacles.
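The smoothing step can be illustrated with a small discrete snake. This is a rough sketch under assumptions that are not in the original: obstacles are approximated by discs `((cx, cy), radius)`, the external energy is a simple linear penalty inside a hypothetical `margin` around each disc, the internal energy combines an elastic (length) term weighted by `alpha` and a bending (sharpness) term weighted by `beta`, and the energy is reduced by plain gradient descent. All parameter values are illustrative.

```python
import math


def resample(path, pieces=10):
    """Split every segment of a polyline into `pieces` equal parts."""
    pts = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        for j in range(pieces):
            t = j / pieces
            pts.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    pts.append(path[-1])
    return pts


def repulsion(p, discs, margin, k_rep):
    """External force: push p outward while it is within `margin` of a disc."""
    fx = fy = 0.0
    for (cx, cy), radius in discs:
        dx, dy = p[0] - cx, p[1] - cy
        d = math.hypot(dx, dy)
        reach = radius + margin
        if 0.0 < d < reach:
            push = k_rep * (reach - d) / d
            fx += push * dx
            fy += push * dy
    return fx, fy


def relax_snake(path, discs, alpha=1.0, beta=0.25, k_rep=4.0,
                margin=0.5, step=0.1, iterations=500):
    """Gradient descent on elastic + bending + obstacle energy; ends stay fixed."""
    pts = resample(path)
    n = len(pts)
    for _ in range(iterations):
        new = list(pts)
        for i in range(1, n - 1):
            force = list(repulsion(pts[i], discs, margin, k_rep))
            for c in (0, 1):
                # Elastic term: pulls each point toward the midpoint of its neighbours.
                force[c] += alpha * (pts[i - 1][c] - 2 * pts[i][c] + pts[i + 1][c])
                # Bending term: penalises sharp corners (skipped next to the ends).
                if 2 <= i <= n - 3:
                    force[c] -= beta * (pts[i - 2][c] - 4 * pts[i - 1][c]
                                        + 6 * pts[i][c]
                                        - 4 * pts[i + 1][c] + pts[i + 2][c])
            new[i] = (pts[i][0] + step * force[0], pts[i][1] + step * force[1])
        pts = new
    return pts


def max_turn(pts):
    """Largest turning angle (radians) between consecutive segments."""
    worst = 0.0
    for a, b, c in zip(pts, pts[1:], pts[2:]):
        ux, uy = b[0] - a[0], b[1] - a[1]
        vx, vy = c[0] - b[0], c[1] - b[1]
        worst = max(worst, abs(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy)))
    return worst


# Usage: a path with one sharp corner passing over a disc-shaped obstacle.
corner_path = [(0.0, 0.0), (2.0, 1.5), (4.0, 0.0)]
discs = [((2.0, 0.0), 1.0)]
smooth = relax_snake(corner_path, discs)
```

The elastic term alone would pull the path into a straight line through the obstacle; the external term is what keeps the relaxed curve wrapped around it.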
Thanks to the work of Salesin et al., there exists a precise language in which the configuration of cameras and lights
can be specified. At every stage, the story has a focus, which is the locale where the story is unfolding. Assuming this
to be a single region, the camera continuously configures itself based on action-specific idioms that are produced
automatically from the storyline.
Fragments are the atomic elements of a movie. Some of the fragments defined
by DCCL are:
Motion energy provides visual cues about the character's personality and emotions. Qualified verb phrases like
"walked hurriedly", "looked cautiously", and "rose lazily"; and physical descriptors such as "old man", "coy maiden",
and "ferocious dog" evoke distinct mental images.
These characteristics are imparted to a character by adjusting
parameters of its gait and posture like speed of movements, length of strides, stoop of shoulder, swing of hands, and
direction of gaze.
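One way to picture this is as a small table that maps descriptive words to gait and posture parameters. The sketch below is purely illustrative: the `Gait` fields mirror the parameters listed above, but the `MODIFIERS` table, the numeric values, and the `describe` helper are all hypothetical and not taken from the project.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gait:
    speed: float = 1.0      # relative speed of movements
    stride: float = 1.0     # relative length of strides
    stoop: float = 0.0      # stoop of shoulders, in degrees
    arm_swing: float = 1.0  # relative swing of hands
    gaze: str = "ahead"     # direction of gaze


# Hypothetical values chosen only to show the idea.
MODIFIERS = {
    "hurriedly": {"speed": 1.6, "stride": 1.3, "arm_swing": 1.4},
    "lazily": {"speed": 0.6, "stride": 0.8, "arm_swing": 0.5},
    "cautiously": {"speed": 0.7, "gaze": "scanning"},
    "old": {"speed": 0.7, "stride": 0.7, "stoop": 20.0},
}


def describe(*words, base=Gait()):
    """Apply descriptors in order; later words override earlier ones."""
    gait = base
    for word in words:
        gait = replace(gait, **MODIFIERS.get(word, {}))
    return gait


# Usage: an old man who walked hurriedly keeps his stoop but speeds up.
hurried_old_man = describe("old", "hurriedly")
```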
Camera Control
Here is an example of an idiom used to depict an interaction between two
Images from Salesin
tree=new Tree
bench=new Bench
woman=new Woman
wall=new Wall
robot=new Robot
ball=new Ball
man=new Man
woman.wear=new Shirt(red)
woman.wear=new Skirt(blue)
man.wear=new Shirt(blue)
man.wear=new Trouser(black)
In this domain, Sky and Grass are global objects whose characteristics are defined in the input. Between points 1 and 2, instances of the other classes, such as Man, Tree, and Ball, are initialized. Between points 2 and 3, each object's characteristics are set. Uninitialized properties of objects default to preset values. This presetting may be absolute or in terms of other properties. At this stage, the output will be a rendering of a scene with all the objects in their initial configurations.
Beyond point 3, the actions of the various objects are specified. 'goto', 'pickup', and so on are simple actions in which only one effector and one affected object play a part. In an action like 'give', there may be a corresponding 'accept'. These two constitute a composite process which needs to be coordinated temporally. This is achieved through the use of the 'wait' construct. For example, when a Man 'gives' a Ball to a Woman, he generates a 'giving' message. So issuing a 'wait' instruction makes woman, the instance of Woman, wait until she receives a 'giving' message from man. To me, this appears to be quite analogous to the role played by body language in the interaction between human beings.
Whenever an action is executed, it is known what the involved objects are. This information is processed by the Camera Control module. The objects determine the position of, and the area covered by, the event. The type of event ('goto', 'give', etc.) is used to decide the appropriate camera idiom.
The output will be a graphical rendering of the scene and the events therein, similar to the previously illustrated sequence of images. Presently, the choice for the format of output is VRML. This provides a platform-independent and compact representation for the scene and enables easy transmission over networks.
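The 'give'/'accept' coordination can be mimicked with message passing between two concurrently running actors. The following is only a sketch of the idea: the `Actor` class, its `send` and `wait` methods, and the use of Python threads and queues are assumptions made for illustration and say nothing about how the project's interpreter is built.

```python
import queue
import threading


class Actor:
    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.holding = None

    def send(self, other, message, payload=None):
        """Post a message such as 'giving' to another actor."""
        other.inbox.put((message, self, payload))

    def wait(self, message, timeout=5.0):
        """Block until a message of the given kind arrives.

        Messages of other kinds are discarded in this sketch.
        Raises queue.Empty if nothing arrives within `timeout` seconds.
        """
        while True:
            kind, sender, payload = self.inbox.get(timeout=timeout)
            if kind == message:
                return sender, payload


def run_give_accept():
    log = []
    man, woman = Actor("man"), Actor("woman")
    man.holding = "ball"

    def give():
        log.append("man gives ball")
        item, man.holding = man.holding, None
        man.send(woman, "giving", item)

    def accept():
        sender, item = woman.wait("giving")  # the 'wait' construct
        woman.holding = item
        log.append("woman accepts ball from " + sender.name)

    # Start the receiver first to show that it really waits for the giver.
    threads = [threading.Thread(target=accept), threading.Thread(target=give)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, man, woman


log, man, woman = run_give_accept()
```

Because the woman's half of the action cannot proceed until the 'giving' message arrives, the two halves are ordered in time regardless of which actor is scheduled first.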
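A minimal sketch of this hand-off to the Camera Control module is given below. The idiom names in `IDIOMS`, the `SceneObject` fields, and the choice of an axis-aligned bounding box for the covered area are all hypothetical; they only illustrate how the involved objects and the event type could be turned into a framing request.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    x: float
    y: float
    radius: float  # rough footprint of the object on the ground plane


# Hypothetical mapping from event type to camera idiom.
IDIOMS = {
    "goto": "track_actor",
    "pickup": "close_up",
    "give": "two_shot",
}


def frame_event(event_type, objects):
    """Return the idiom plus the centre and extent of the area to be framed."""
    min_x = min(o.x - o.radius for o in objects)
    max_x = max(o.x + o.radius for o in objects)
    min_y = min(o.y - o.radius for o in objects)
    max_y = max(o.y + o.radius for o in objects)
    return {
        "idiom": IDIOMS.get(event_type, "establishing_shot"),
        "center": ((min_x + max_x) / 2, (min_y + max_y) / 2),
        "extent": (max_x - min_x, max_y - min_y),
    }


# Usage: the man gives something to the woman standing four units away.
shot = frame_event("give", [SceneObject("man", 0.0, 0.0, 0.5),
                            SceneObject("woman", 4.0, 0.0, 0.5)])
```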
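As a rough illustration of what such output could look like, the sketch below writes a tiny scene as VRML97 (VRML 2.0) text. The helper names `vrml_shape` and `vrml_scene` are hypothetical, and using primitive geometry nodes as stand-ins for scene objects is an assumption made to keep the example short.

```python
def vrml_shape(name, position, color, geometry):
    """One named, positioned, coloured object as a VRML97 Transform node."""
    x, y, z = position
    r, g, b = color
    return (
        f"DEF {name} Transform {{\n"
        f"  translation {x} {y} {z}\n"
        f"  children [\n"
        f"    Shape {{\n"
        f"      appearance Appearance {{ material Material {{ diffuseColor {r} {g} {b} }} }}\n"
        f"      geometry {geometry}\n"
        f"    }}\n"
        f"  ]\n"
        f"}}\n"
    )


def vrml_scene(shapes):
    """Prefix the mandatory VRML97 header line to the shape nodes."""
    return "#VRML V2.0 utf8\n" + "".join(shapes)


# Usage: a red ball in front of a grey wall.
scene = vrml_scene([
    vrml_shape("ball", (1, 0, 2), (1, 0, 0), "Sphere { radius 0.3 }"),
    vrml_shape("wall", (0, 1, -3), (0.6, 0.6, 0.6), "Box { size 6 2 0.2 }"),
])
```

The resulting string can be saved with a `.wrl` extension and opened in any VRML97 viewer.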
@Article{Badler/Smoliar:1979, author= { Badler, Norman I. and Smoliar, Stephen W.}, year= { 1979}, institution= { U. of Pennsylvania 2-->National U. of Singapore}, title= { Digital Representations of Human Movement}, journal= { ACM Computing Surveys}, month= { march}, volume= { 11}, email= {;}, annote= { The techniques of representing a human being as a computer-generated graphic entity are discussed. A brief description of the Labanotation used to precisely quantify body postures is given. It is emphasised that although direct representation in a 2D space is possible, in the general case the better approach is to construct a 3D model and then project it onto a plane. Three aspects are discussed: 1. Representations of the human body. This is done by the following methods: stick figures, the simplest and most unimpressive; surface models, which give excellent results but are slightly unrealistic and have a fair share of blemishes; volume models, in which the human body is decomposed into primitive solids such as cylinders, spheres, and ellipsoids. 2. Representation of movement. Given a model for a human body, the process of animating it involves producing a succession of frames, each slightly different from the previous. This is attained by key frames, where a set of important frames is provided and the intermediate parts are interpolated; by movement functions, where Labanotation is used to choreograph the motion; and by simulation, where the mechanics of the human body are also encapsulated in the model. 3. Finally, an architecture for such a system is discussed. -k.anandsudhakar feb/2k } }
@Misc{Geib/Levison/Moore:1994, author= { Geib, Christopher and Levison, Libby and Moore, Michael B.}, year= { 1994}, institution= { upenn}, title= { SodaJack: An Architecture for Agents that Search for and Manipulate Objects}, month= { january}, email= { (geib,libby,mmoore)}, annote= { This paper deals with the problem of an agent whose aim is to search and undertake manipulation tasks in an environment. The authors have implemented the approach in a system called SODAJACK, which does the animation. The agent receives as input high-level commands like "fetch the scoop" and the system has to figure out the exact low-level actions to do the job. This involves knowing the possible locations of the scoop, planning a route to them, exploring them, and finally lifting the scoop up. The system has been divided into a hierarchy of three planners that respond to the input goal and give as output the action outline to achieve the goal. The task division is like this: 1. search planner: it converts the goals into a plan to search. 2. object-specific planner: this relates each search plan produced by the search planner to a particular object and undertakes feasibility tests for the action plans generated. 3. hierarchical planner (ItPlans): this supervises the other two and delegates control first to the search planner to get a plan and then to the object-specific planner to make it specific. -Vikrant Kumar 11/02/2000 } }
@Misc{Salesin/, author= { Christianson, David B. and Anderson, Sean E. and He, Li-wei and Salesin, David H. and Weld, Daniel S. and Cohen, Michael F.}, year= { 1994}, institution= { 4U. of Washington; Microsoft Research, Redmond 2-->Stanford}, title= { Declarative Camera Control for Automatic Cinematography}, email= { (dbc1,lhe,salesin,weld);;}, annote= { For a long time programmers have not made use of cinematographic principles in computer animations. The authors try to fill this gap by making the rules of cinematic storytelling lend themselves easily to programming. For this purpose they have formalized the rules into a Declarative Camera Control Language (DCCL). This is useful because it allows programs to present a dramatic point of view aesthetically. The authors first introduce the language of the cinema, such as the breakup of a film into scenes and shots, shots being the smallest unit. Another aspect is the placement of the camera, which depending on the scene can be apex, internal, external, or parallel. Cinematographers have identified certain fields of view for shots which give pleasing results. There are also certain constraints on a shot which should be satisfied, like parallel editing and break movement. Next comes the concept of idioms, which is the way cinematographers describe situations in a film. DCCL is an attempt to formalize these idioms. DCCL is composed of four basic components: fragments, views, placements, and movement endpoints. A fragment is the time interval during which the camera performs a simple motion. A simple shot may comprise one or more fragments. Next they define the Camera Placement System (CPS). The CPS is a three-stage pipeline consisting of 1. the sequence planner, 2. the compiler, and 3. the heuristic evaluator. The basic aim of the CPS is to give the camera positions depending on the input positions of the various interacting entities. The authors have implemented this approach in a video game.
The authors succeed in highlighting the importance of using cinematographic techniques in computer animations so as to make the experience more enriching. -Vikrant Kumar 11/02/2000 } }
@Article{Lozano/Lozano-Perez:1996, author= { Lozano, Oded Maron Tomas and Lozano-Perez, Tomas}, year= { 1996}, institution= { 2mit}, title= { Visible Decomposition: Real Time Path Planning in Large Planar Environments}, journal= { AI Memo}, month= { january}, www= {\ /papers/}, email= {,},
annote= { This paper deals with the use of visibility graphs to do motion planning. -Rajesh Rajasekar 2/2000} }
@Article{Hoffman/Hopcroft:1987, author= { Hoffman, Christoph M. and Hopcroft, John E.}, year= { 1987}, institution= { Purdue-cs; Cornell-cs}, title= { Simulation of Physical Systems from Geometric Models}, journal= { IEEE J. of Robotics and Automation}, month= { june}, vol= { RA-3}, annote= { The mechanics of simulation are discussed. -k. anandsudhakar feb/2k} }