DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics

Abstract

We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that goal image. We show that this is possible zero-shot using DALL-E, without needing any further example arrangements, data collection, or training. DALL-E-Bot is fully autonomous and is not restricted to a pre-defined set of objects or scenes, thanks to DALL-E's web-scale pre-training. Encouraging real-world results, with both human studies and objective metrics, show that integrating web-scale diffusion models into robotics pipelines is a promising direction for scalable, unsupervised robot learning.
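The three-stage pipeline described in the abstract (describe the scene, generate a goal image, rearrange to match) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all helper functions and the returned pose format are hypothetical placeholders standing in for the paper's perception, diffusion-model, and manipulation components.

```python
# Hedged sketch of the DALL-E-Bot pipeline from the abstract.
# Every helper below is a hypothetical stub, not the paper's actual code.

def describe_objects(scene):
    """Step 1 (hypothetical): infer a text description of the objects
    in the observed scene, e.g. via detection plus captioning."""
    return "a fork, a knife, and a plate on a table"  # placeholder caption


def generate_goal_image(prompt):
    """Step 2 (hypothetical): query a web-scale diffusion model
    (DALL-E in the paper) with the prompt, then extract per-object
    goal poses from the generated image. Here we return dummy
    normalized (x, y) positions instead of a real image."""
    return {"plate": (0.5, 0.5), "fork": (0.3, 0.5), "knife": (0.7, 0.5)}


def rearrange(scene, goal_poses):
    """Step 3 (hypothetical): command the robot to move each real
    object to the pose inferred from the goal image. Here we simply
    report the target poses."""
    return dict(goal_poses)


def dall_e_bot(scene):
    """End-to-end pipeline: caption -> generate goal -> rearrange."""
    prompt = describe_objects(scene)
    goal_poses = generate_goal_image(prompt)
    return rearrange(scene, goal_poses)
```

Note that in this sketch the "goal image" is abstracted directly into object poses; in the paper, poses are recovered by matching the real objects against the generated image.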

Citation (APA)
Kapelyukh, I., Vosylius, V., & Johns, E. (2023). DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics. IEEE Robotics and Automation Letters, 8(7), 3956–3963. https://doi.org/10.1109/LRA.2023.3272516
