Transporter networks: rearranging the visual world for robot manipulation

Channel: Andy Zeng
More information: https://transporternets.github.io/

Abstract: Robotic manipulation can be formulated as inducing a series of spatial displacements, where the displaced space can include an object, part of an object, or an end effector. In this work, we propose the Transporter Network, a simple model architecture that rearranges deep features to infer spatial displacements from visual input – allowing robot actions to be parameterized. It does not assume objectness (e.g. canonical poses, models, or keypoints), it exploits spatial symmetries, and it is orders of magnitude more sample efficient than our benchmarked alternatives in learning vision-based manipulation tasks: from stacking a pyramid of blocks to assembling kits with unseen objects; from manipulating deformable ropes to pushing piles of small objects with closed-loop feedback. Our method can represent complex multimodal policy distributions and generalizes to multi-step sequential tasks, as well as 6DoF pick-and-place. Experiments on ten simulated tasks show that it learns faster and generalizes better than a variety of end-to-end baselines, including policies that use ground-truth object poses. We validate our methods with real-world hardware.
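The "rearranging deep features" idea in the abstract can be pictured roughly as follows: crop the scene's deep features around the chosen pick location, then cross-correlate that crop (over a set of discrete rotations) against the full scene feature map to score every candidate place pose. The sketch below is a simplified numpy/scipy illustration under that reading, not the paper's implementation; the actual model learns separate query and key feature encoders and runs the correlation as a convolution inside the network. The function name transport_scores, the crop size, and the rotation count here are illustrative choices.

import numpy as np
from scipy.ndimage import rotate
from scipy.signal import correlate2d

def transport_scores(scene_feats, pick_yx, crop_size=16, num_rotations=8):
    """Score candidate place locations by cross-correlating a feature crop
    taken around the pick point against the full scene feature map.

    scene_feats : (H, W, C) deep features of a top-down observation.
    pick_yx     : (row, col) pixel of the chosen pick location.
    Returns     : (num_rotations, H, W) placement scores (higher = better).
    """
    H, W, C = scene_feats.shape
    half = crop_size // 2
    padded = np.pad(scene_feats, ((half, half), (half, half), (0, 0)))
    r, c = pick_yx
    crop = padded[r:r + crop_size, c:c + crop_size]  # features around the pick

    scores = np.zeros((num_rotations, H, W))
    for k in range(num_rotations):
        angle = 360.0 * k / num_rotations
        # Rotate the crop so each pass probes one discretized place orientation.
        kernel = rotate(crop, angle, axes=(0, 1), reshape=False, order=1)
        # Dense spatial cross-correlation, summed over the channel axis.
        scores[k] = sum(
            correlate2d(scene_feats[:, :, ch], kernel[:, :, ch], mode='same')
            for ch in range(C)
        )
    return scores

# Example: random features standing in for the output of a learned encoder.
feats = np.random.rand(64, 64, 8).astype(np.float32)
scores = transport_scores(feats, pick_yx=(20, 30))
k, i, j = np.unravel_index(np.argmax(scores), scores.shape)
print(f"best place: pixel ({i}, {j}) at rotation bin {k}")

Taking the argmax over (rotation, row, column) gives a place pose in image space; with a calibrated top-down camera, such pixel coordinates can be mapped back to a robot pose.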

Speaker: Laura Graesser
