Q1. Which two instructions in the "programming language" of the 2011 HW would be the most difficult for robots to follow?
We found the following two steps to be the most difficult:
Holding the pen is difficult because there is no sensory feedback, so the robot needs a lot of training to ‘learn’ the optimum amount of pressure to apply to keep the pen from slipping.
This step involves loosening the grip just enough to allow the pencil to rotate without letting it slip. This delicate pressure condition is very difficult to achieve without any sensory feedback.
Q2. The robot following the learning paradigm as in Kalakrishnan is clearly gaining some expertise. Which aspects of the execution may be called implicit or automatic, and which aspects may be more explicit? What could be the "chunks" in this structure?
Implicit learning happens through interaction with the environment, so we think that the learning process itself is an implicit aspect of the execution.
The initial positioning of the hand could be one explicit aspect, since it has to be specified at the beginning of each trial.
As discussed earlier, the learning here involves the robot exploring various instances in the problem space and identifying patterns that are favorable in terms of the performance metrics. The favorable regions can be seen as low-dimensional embeddings in the problem space, and they represent implicit constraints among the variables. The chunks, in this case, can be the dimensions of the low-dimensional embedding obtained. Each dimension represents an inter-relation between the variables (end-effector position, orientation, force and torque) which must hold for favorable execution.
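To make the idea of a low-dimensional embedding concrete, here is a minimal sketch using principal component analysis on synthetic trial data. Everything in it is an assumption made for illustration: the function name `embedding_dimension`, the four recorded variables, the mixing matrix that ties them together, and the 95% variance threshold. It is not the method used in Kalakrishnan's work; it only shows how a few shared dimensions ("chunks") can explain the variation across successful trials.

```python
import numpy as np


def embedding_dimension(trials, variance_kept=0.95):
    """Count the principal components needed to keep the given share of variance."""
    centered = trials - trials.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance along each principal axis.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    return int(np.searchsorted(cumulative, variance_kept) + 1)


rng = np.random.default_rng(0)

# Hypothetical data: 200 successful trials, each recording four variables
# (position, orientation, force, torque). Two hidden factors drive all four.
latent = rng.normal(size=(200, 2))
mixing = np.array([
    [1.0, 0.5, 0.0, 0.0],   # factor 1 couples position and orientation
    [0.0, 0.0, 1.0, -1.0],  # factor 2 couples force and torque
])
successful_trials = latent @ mixing + 0.01 * rng.normal(size=(200, 4))

n_chunks = embedding_dimension(successful_trials)
print(n_chunks)
```

In this toy setup the four variables collapse onto two dimensions, one coupling position with orientation and one coupling force with torque, which is the kind of inter-relation we have in mind when we call the embedding dimensions chunks.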
Q3. Comment on whether human learning may also be following similar "reward" based processes? Consider the learning process for the fire-fighting expert who knows how to fight complex fires.
We could not find any counterexample to the claim that human learning is a reward-based process. Every task we do has some reward associated with it, which can be positive or negative, intrinsic or external. Satisfaction and discontent can be seen as rewards for many day-to-day tasks.
In the firefighter example, the process cannot be carried out in an arbitrary manner; through practice, experts have learned the best ways of carrying it out. The reward function in this case could be minimizing the damage while extinguishing the fire as quickly and effectively as possible.
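A reward of this kind could be sketched as a weighted penalty on damage and time. The function name `firefighting_reward`, the weights, and the example numbers below are all hypothetical; a real firefighter's reward would surely include many more factors, such as safety of the crew and of bystanders.

```python
def firefighting_reward(damage, minutes_to_extinguish,
                        damage_weight=1.0, time_weight=0.1):
    """Higher is better: penalize both the damage done and the time taken."""
    return -(damage_weight * damage + time_weight * minutes_to_extinguish)


# Hypothetical outcomes for the same fire handled by a novice and by an expert.
novice = firefighting_reward(damage=80.0, minutes_to_extinguish=60.0)
expert = firefighting_reward(damage=30.0, minutes_to_extinguish=25.0)
print(expert > novice)
```

Under this sketch, practice amounts to gradually finding strategies that score higher on the same reward, which parallels the reward-driven learning of the robot.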
References