Q1. Which two instructions in the "programming language" of the 2011 HW would be the most difficult for robots to follow?
Picking up a pen and writing on paper is a task which humans do without any effort, but which even state-of-the-art robots find difficult. The reason is that the human hand has millions of sensors, and humans have been trained over generations, and over each individual's lifetime, to use these sensors to perform tasks with precision. Robots have neither the luxury of so many sensors nor the benefit of such elaborate training. Fine motions, particularly relative motions between various parts of the hand and the object, are therefore hard for robots to execute. In our view the two most difficult instructions are the following (a sketch of the geometry behind the second appears after the list):
- Move pencil (keeping orientation of P-axis constant), 1cm closer to 'source point'.
- Rotate point of contact between I-affector and object, about axis defined by line passing through points of contact of M and T affectors (with object), by an angle such that P-axis lies at perpendicular distance of 'r' from 'source point'.
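To see why the second instruction is hard, consider what it demands computationally: rotating one contact point about an axis defined by two other contact points, while simultaneously solving for the angle that satisfies a distance constraint. Below is a minimal sketch of just the rotation step, using Rodrigues' rotation formula; the contact-point coordinates and the angle are made-up assumptions, and solving for the constraint-satisfying angle is left out entirely.

```python
import numpy as np

def rotate_about_axis(p, a, b, theta):
    """Rotate point p about the axis through points a and b by angle theta,
    using Rodrigues' rotation formula."""
    k = (b - a) / np.linalg.norm(b - a)   # unit vector along the M-T axis
    v = p - a                             # position of p relative to the axis
    # v_rot = v cos(theta) + (k x v) sin(theta) + k (k . v)(1 - cos(theta))
    v_rot = (v * np.cos(theta)
             + np.cross(k, v) * np.sin(theta)
             + k * np.dot(k, v) * (1 - np.cos(theta)))
    return a + v_rot

# Hypothetical contact points of the T, M and I affectors (in metres)
t_contact = np.array([0.00, 0.00, 0.00])
m_contact = np.array([0.00, 0.02, 0.01])
i_contact = np.array([0.01, 0.01, 0.02])

# The robot must still search for the angle that places the P-axis at
# perpendicular distance r from the 'source point'; this sketch only
# executes one given angle.
new_i_contact = rotate_about_axis(i_contact, t_contact, m_contact,
                                  np.deg2rad(5.0))
print(new_i_contact)
```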
Q2. The robot following the learning paradigm as in Kalakrishnan is clearly gaining some expertise. Which aspects of the execution may be called implicit or automatic, and which aspects may be more explicit? What could be the "chunks" in this structure?
For a robot learning to open doors or to pick up a pen, there are several free variables, and hence many possible paths through the corresponding search space. In Kalakrishnan's case the variables are the applied forces, the orientations, and the end-effector positions. The algorithm uses cost functions and a reinforcement strategy: the robot is rewarded when it holds the pen for a longer time, or when it turns the handle by a larger angle. After enough trials it learns a successful path through the search space. In this learning, the values of the variables, the favourable paths found, and the constraints form the implicit part: they are not known to the programmer at the beginning of learning, and they are 'implicit' to the robot in the sense that they are not apparent to an observer and cannot be figured out except by scanning the robot's memory. Other features are explicit: for instance, the initial position of the robot's hand can be directly observed. A toy sketch of this kind of reward-driven search appears below.
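The following is a toy sketch of reward-driven policy search in the spirit of the scheme described above, not Kalakrishnan's actual implementation. The reward function, the three parameters (grip force, wrist angle, end-effector offset), and the update rule (simple reward-weighted averaging) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(params):
    """Stand-in for a real trial: higher reward when the (hidden) handle-turn
    angle is larger. The optimum here is arbitrary."""
    target = np.array([5.0, 0.3, -0.1])  # hypothetical best parameter values
    return -np.sum((params - target) ** 2)

mean = np.zeros(3)   # policy over [grip force, wrist angle, end-effector offset]
sigma = 1.0          # exploration magnitude
for iteration in range(200):
    # Sample exploratory trials around the current policy
    samples = mean + sigma * rng.standard_normal((20, 3))
    rewards = np.array([reward(s) for s in samples])
    # Reward-weighted averaging: softmax weights favour high-reward trials
    w = np.exp(rewards - rewards.max())
    w /= w.sum()
    mean = w @ samples   # the 'implicit' knowledge: learned parameter values
    sigma *= 0.99        # anneal exploration over trials

print(mean)  # converges toward the high-reward region of the search space
```

Note that the learned vector `mean` is exactly the kind of knowledge called implicit above: it lives only in the robot's memory, and an observer cannot recover it without inspecting that memory.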
The low-dimensional embeddings are the chunks in this case. Informally, such an embedding consists of the constraint relations between the variables that hold for every successful execution.
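One way to make "low-dimensional embedding" concrete is a sketch like the following, which applies PCA to a log of successful trials; the data, the number of variables, and the two underlying degrees of freedom are all fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical log: each row is one successful trial, columns are the
# recorded variables (forces, orientations, positions).
n_trials, n_vars = 100, 8
latent = rng.standard_normal((n_trials, 2))     # 2 true degrees of freedom
mixing = rng.standard_normal((2, n_vars))
trials = latent @ mixing + 0.01 * rng.standard_normal((n_trials, n_vars))

# PCA via SVD: directions with near-zero variance are the constraint
# relations that every successful execution obeys -- the "chunk".
centered = trials - trials.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
variance = s ** 2 / n_trials
print(variance.round(3))   # only ~2 components carry real variance
constraints = vt[2:]       # rows spanning the (near-)null space
```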
Q3. Comment on whether human learning may also be following similar "reward" based processes? Consider the learning process for the fire-fighting expert who knows how to fight complex fires.
According to the discussion we had, almost every task in human life has some kind of reward associated with it. The reward can be a tangible advantage, such as money or a promotion, or the avoidance of a disadvantage, such as surviving a fight because of one's training. Fire fighters too have rewards for fighting fires well. As the videos [3] show, extinguishing a fire is an extremely meticulous task: novices have great difficulty doing it and would mostly get burned if they tried. Trained fire fighters who succeed receive the equivalent reward of not getting burned. Other kinds of reward include applause from the crowd or from the people they save, inner satisfaction, and so on.