Efficient In-Context Imitation Learning for Robotics using Keypoint Action Tokens and Large Language Models
Large text-pretrained Transformers can act as efficient in-context imitation learners for robotics, without any additional training on robotics data.
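A minimal sketch of the in-context imitation learning idea, under the assumption that observations and actions are both represented as 2D keypoints serialized into text. The serialization format and function names here are hypothetical, for illustration only; they are not the paper's exact implementation, and the resulting prompt would be sent to an off-the-shelf LLM for completion.

```python
# Hypothetical sketch: demonstrations are serialized as keypoint token
# strings and concatenated into a prompt, so an off-the-shelf LLM can
# predict action tokens for a new observation purely in context,
# without any fine-tuning on robotics data.

def serialize(points):
    """Turn a list of (x, y) keypoints into a compact token string."""
    return " ".join(f"{x},{y}" for x, y in points)

def build_prompt(demos, query_keypoints):
    """demos: list of (observation_keypoints, action_keypoints) pairs.

    Returns a text prompt ending mid-pattern, so the LLM's completion
    is interpreted as the predicted action tokens for the query.
    """
    lines = []
    for obs, act in demos:
        lines.append(f"observation: {serialize(obs)}")
        lines.append(f"action: {serialize(act)}")
    lines.append(f"observation: {serialize(query_keypoints)}")
    lines.append("action:")  # the LLM completes this line
    return "\n".join(lines)

# One demonstration, then a query observation to be completed.
demos = [([(10, 20), (30, 40)], [(12, 22), (32, 42)])]
prompt = build_prompt(demos, [(11, 21), (31, 41)])
```

The key design choice this illustrates is that no robot-specific model is trained: the pattern-completion ability of a text-pretrained Transformer does the imitation, with demonstrations supplied entirely through the prompt.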