R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner

Institute of Computing Technology, Chinese Academy of Sciences

*Indicates Equal Contribution

Abstract

LLMs have proven to be general high-level task planners for diverse embodied tasks. However, a natural gap exists between the LLM and the low-level executor. Because the LLM cannot sufficiently perceive the full state of a complex environment, it often produces unavailable or unattainable plans that the robot cannot follow. In this work, we achieve high-level and low-level planning in an integrated manner by providing a communication platform between the two. Our insight is to formulate the task as playing chess with LLMs. We map the room into a semantic chessboard, an approach we call Room to Chessboard (R2C). The grid cells represent the position and size of the objects inside the room. We find that the chessboard is concise enough for LLMs to conduct semantic searches with a global view of the room, yet informative enough to convey the detailed environment state that LLMs need to predict executable low-level actions. To enhance the LLM's low-level decision-making capacity, we also design a Chain-of-Thought fine-tuning paradigm that guides LLMs toward more interpretable planning. We implement the R2C framework with both fine-tuned open-source LLMs and closed-source LLMs such as GPT-4. Evaluations on the challenging ALFRED benchmark indicate that using LLMs as low-level action planners is feasible given efficient communication between the LLM and the robot. We also demonstrate that R2C is a versatile framework for diverse free-form robotic planning tasks.
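The room-to-chessboard mapping described above can be illustrated with a minimal sketch. This is not the R2C implementation. The names `RoomObject`, `room_to_chessboard`, and `board_to_text`, the 0.5 m cell size, the axis-aligned rectangular footprints, and the text serialization are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from math import ceil, floor


@dataclass
class RoomObject:
    """Hypothetical object record: a label plus an axis-aligned floor footprint."""
    name: str      # semantic label, e.g. "table"
    x: float       # minimum x of the footprint, in metres
    y: float       # minimum y of the footprint, in metres
    width: float   # extent along x, in metres
    depth: float   # extent along y, in metres


def room_to_chessboard(objects, room_w, room_d, cell=0.5):
    """Rasterize object footprints into a 2D grid of semantic labels.

    Empty cells hold "."; a cell touched by an object holds that object's name.
    Later objects overwrite earlier ones where footprints overlap (an assumption).
    """
    cols = ceil(room_w / cell)
    rows = ceil(room_d / cell)
    board = [["." for _ in range(cols)] for _ in range(rows)]
    for obj in objects:
        # Convert metric bounds to cell index ranges, clamped to the board.
        c0 = max(0, floor(obj.x / cell))
        r0 = max(0, floor(obj.y / cell))
        c1 = min(cols, ceil((obj.x + obj.width) / cell))
        r1 = min(rows, ceil((obj.y + obj.depth) / cell))
        for r in range(r0, r1):
            for c in range(c0, c1):
                board[r][c] = obj.name
    return board


def board_to_text(board):
    """Serialize the grid row by row so it can be placed in an LLM prompt."""
    return "\n".join(" ".join(row) for row in board)


if __name__ == "__main__":
    scene = [
        RoomObject("table", x=0.5, y=0.0, width=1.0, depth=0.5),
        RoomObject("agent", x=1.5, y=1.0, width=0.5, depth=0.5),
    ]
    print(board_to_text(room_to_chessboard(scene, room_w=2.0, room_d=1.5)))
```

In this sketch, an object's size shows up as the number of cells it occupies and its position as where those cells sit on the board, which is one plausible reading of how a grid can give a language model both a global view and local detail.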