Q1. Which two instructions in the "programming language" of the 2011 HW would be the most difficult for robots to follow?
In the video of Kalakrishnan's robot, we saw that the seemingly simple task of picking up a pen is not easy for a robot to perform. The variables of the problem space, i.e., the positions of the end-effectors, their orientation, and the applied force and torque, have to be correct throughout the execution for the robot to hold the pen successfully.
The task of holding a pen in the writing position, as humans do, is even more complex. The video by Kalakrishnan shows the robot performing Step 2 of the algorithm, and even this single step is extremely difficult for the robot to execute successfully. However, we think the steps that involve relative motion between the robot and the pen are likely to be harder still. During such motion, the variables (end-effector positions, orientation, force and torque) must be controlled very precisely, because the slightest mistake would let the pen slip out of the robot's grip. Two steps in the algorithm involve relative motion between the robot and the pen. They are:
Q2. The robot following the learning paradigm as in Kalakrishnan is clearly gaining some expertise. Which aspects of the execution may be called implicit or automatic, and which aspects may be more explicit? What could be the "chunks" in this structure?
For Kalakrishnan's robot, the variables in the problem space are end-effector positions, orientation, force and torque. The robot explores this space and learns favorable patterns using feedback from a cost function based on how long the pen stays in the robot's hand before slipping out. Because implicit learning proceeds through interaction with the environment, we think the learning process here is an implicit aspect of the execution.
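The exploration-with-feedback loop can be made concrete with a minimal sketch. It rests on several assumptions. First, the four problem-space variables are collapsed into one parameter vector. Second, the real robot is replaced by a hypothetical `hold_time` function whose "ideal grasp" is invented purely for illustration, and the feedback is treated as a reward (a longer hold is better). Third, the update is simple reward-weighted averaging of random perturbations. That update rule was chosen here for brevity and is not claimed to be the method used in Kalakrishnan's work.

```python
import numpy as np

# Hypothetical "ideal grasp" (position, orientation, force, torque).
# Invented for illustration only; a real robot has no known target like this.
IDEAL = np.array([0.2, -0.1, 1.5, 0.05])


def hold_time(theta: np.ndarray) -> float:
    """Stand-in for a physical trial: seconds the pen stays in the hand (max 10)."""
    return 10.0 * float(np.exp(-np.sum((theta - IDEAL) ** 2)))


def learn(n_iters: int = 50, n_rollouts: int = 20, sigma: float = 0.3,
          temperature: float = 10.0, seed: int = 0) -> np.ndarray:
    """Reward-weighted averaging of random perturbations (illustrative update rule)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(4)  # explicitly specified starting grasp for the first trial
    for _ in range(n_iters):
        # Explore: try noisy variations of the current grasp.
        eps = rng.normal(0.0, sigma, size=(n_rollouts, theta.size))
        rewards = np.array([hold_time(theta + e) for e in eps])
        # Normalize rewards to [0, 1] and turn them into softmax-style weights,
        # so rollouts that held the pen longer count more.
        span = rewards.max() - rewards.min()
        norm = (rewards - rewards.min()) / (span + 1e-12)
        weights = np.exp(temperature * (norm - 1.0))
        weights /= weights.sum()
        # Move toward the weighted average of the perturbations.
        theta = theta + weights @ eps
        sigma *= 0.95  # gradually narrow the exploration
    return theta


if __name__ == "__main__":
    learned = learn()
    print("hold time before:", hold_time(np.zeros(4)))
    print("hold time after: ", hold_time(learned))
```

Here the learner never sees `IDEAL` directly. It only observes how long each perturbed grasp held, which matches the implicit, interaction-driven character described above. The noise level, decay rate and temperature are arbitrary choices. On a physical robot every rollout is a real trial, so exploration would have to be far more conservative.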
The initial positioning of the hand could be one explicit aspect, since it has to be specified at the beginning of each trial.
As discussed earlier, the learning here involves the robot exploring various points in the problem space and identifying patterns that score well on the performance metric.
The favorable regions can be seen as low-dimensional embeddings in the problem space; they represent implicit constraints among the variables. Chunks, in this case, could be the dimensions of the low-dimensional embedding obtained. Each such dimension represents a relationship among the variables (end-effector positions, orientation, force and torque) that must hold for a favorable execution.
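The "chunks as embedding dimensions" idea can be illustrated with a small sketch. It applies principal component analysis (via NumPy's SVD) to a set of favorable grasps and keeps the fewest directions that explain most of the variance. Each retained direction is then read as one chunk, meaning a combination of variables that change together. The data are hypothetical: the generator `favorable_samples` places favorable grasps near a 2-D plane in the 4-D space, with couplings invented for illustration. Linear PCA is also an assumption. Real favorable regions may be curved and would then need a nonlinear embedding method.

```python
import numpy as np

VARS = ["position", "orientation", "force", "torque"]


def favorable_samples(n: int = 500, seed: int = 0) -> np.ndarray:
    """Hypothetical favorable grasps lying near a 2-D plane in the 4-D space."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, 2))  # two hidden "chunk" coordinates
    basis = np.array([
        [1.0, 0.5, 0.0, 0.0],  # assumed coupling: position with orientation
        [0.0, 0.0, 2.0, 1.0],  # assumed coupling: force with torque
    ])
    # Small noise so the samples are near, not exactly on, the plane.
    return latent @ basis + 0.05 * rng.normal(size=(n, 4))


def find_chunks(X: np.ndarray, var_threshold: float = 0.95):
    """PCA via SVD: keep the fewest directions explaining var_threshold of the variance."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    return vt[:k], explained[:k]


if __name__ == "__main__":
    chunks, explained = find_chunks(favorable_samples())
    for i, (c, e) in enumerate(zip(chunks, explained), start=1):
        loadings = dict(zip(VARS, np.round(c, 2)))
        print(f"chunk {i} ({e:.0%} of variance): {loadings}")
```

On this synthetic data, two directions should capture nearly all the variance. One loads mainly on position and orientation, and the other mainly on force and torque. These play the role of the implicit inter-variable constraints described above.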
Q3. Comment on whether human learning may also be following similar "reward" based processes? Consider the learning process for the fire-fighting expert who knows how to fight complex fires.
We could not find any counterexample to the idea that human learning is a reward-based process. In our discussion we concluded that every task we do has some reward associated with it, which can be positive or negative, intrinsic or extrinsic. Satisfaction and discontent, for instance, can be seen as rewards for many day-to-day tasks.
In the firefighting example, we saw that the process cannot be carried out arbitrarily; through practice, experts have learned the best possible ways of carrying it out. The reward function in this case could be minimizing the damage and extinguishing the fire as quickly and effectively as possible.