The authors introduce Relational Keypoint Constraints (ReKep), a visually grounded representation for robotic manipulation tasks: a task is expressed as a sequence of constraints, each a function that maps a set of 3D keypoints in the scene to a numerical cost. Using large vision models to propose keypoints and vision-language models to write the constraint functions, ReKep can be generated automatically from free-form language instructions and RGB-D observations, with no task-specific training or environment models. Robot actions are then obtained by solving a constrained optimization in real time, which enables multi-stage, bimanual, and reactive manipulation behaviors. Notably, the system produces human-like strategies for folding clothes, adapting its folding sequence to different garments.
https://rekep-robot.github.io/
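
To make the representation concrete: the paper specifies each constraint as a Python function that maps an array of keypoint positions to a numerical cost, where a non-positive cost means the constraint is satisfied. The sketch below is a hypothetical sub-goal constraint for a pouring-style task; the keypoint indices, the 10 cm offset, and the tolerance are assumptions for illustration, not taken from the project.

```python
import numpy as np

# Hypothetical constraint in the spirit of ReKep: a function from an array of
# 3D keypoints to a scalar cost, with cost <= 0 meaning "satisfied". The
# keypoint indices, task, and numbers below are invented for this sketch.
def subgoal_constraint(keypoints: np.ndarray) -> float:
    """Keypoint 0 (e.g., a spout) should be ~10 cm directly above
    keypoint 1 (e.g., a cup opening) before pouring begins."""
    target = keypoints[1] + np.array([0.0, 0.0, 0.10])  # 10 cm above the cup
    return float(np.linalg.norm(keypoints[0] - target)) - 0.02  # 2 cm slack

# Example evaluation on dummy keypoints (rows are xyz positions in meters).
keypoints = np.array([
    [0.50, 0.00, 0.40],   # spout
    [0.50, 0.00, 0.30],   # cup opening
])
print(subgoal_constraint(keypoints))  # <= 0 here, so the constraint holds
```

A solver can then search for robot actions that drive costs like this one to zero or below at each stage of the task, which is what allows new behaviors to be specified in code rather than learned per task.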