
Control Robots with Gemini AI

Multimodal command translation from language to robot actions

This project explores using Gemini to interpret natural-language instructions, scene context, and safety constraints, then translate them into executable robot control sequences.

Gemini API · ROS/ROS2 · Robot Simulator · Vision Pipeline · Control API

Project Goals

  • Convert natural-language objectives into safe motion/action plans.
  • Use multimodal inputs (camera + text) for context-aware decisions.
  • Support simulator-first validation before physical execution.
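The first goal can be sketched as a prompt-and-parse loop, assuming the model is asked to emit a JSON list of steps drawn from a fixed primitive whitelist. `ALLOWED_PRIMITIVES`, `build_prompt`, `parse_plan`, and the canned reply are all illustrative names, not the project's actual API, and the reply stands in for a real Gemini API response:

```python
import json

# Hypothetical whitelist of robot primitives the planner may emit.
ALLOWED_PRIMITIVES = {"move_to", "grasp", "release", "rotate", "stop"}

def build_prompt(objective: str) -> str:
    """Ask the model for a JSON plan restricted to known primitives."""
    return (
        "Decompose the objective into a JSON list of steps, each with an "
        "'action' (one of: " + ", ".join(sorted(ALLOWED_PRIMITIVES)) + ") "
        "and an 'args' object. Objective: " + objective
    )

def parse_plan(model_output: str) -> list:
    """Parse the model's JSON reply and reject unknown primitives."""
    steps = json.loads(model_output)
    for step in steps:
        if step["action"] not in ALLOWED_PRIMITIVES:
            raise ValueError("unknown primitive: " + step["action"])
    return steps

# Canned reply in place of a live Gemini call (no network access here).
reply = '[{"action": "move_to", "args": {"x": 0.3, "y": 0.1}}, {"action": "grasp", "args": {}}]'
plan = parse_plan(reply)
```

Validating the plan against a whitelist before execution is what keeps a free-form language model from emitting actions the controller does not support.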

Core Capabilities

  • Task decomposition into robot primitives
  • Perception-aware planning with camera feedback
  • Safety envelope checks and emergency stop hooks
  • Simulation playback before live run
  • Post-run summaries with failure diagnostics
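The safety-envelope check above can be sketched as a simple bounding-box filter over planned waypoints. The workspace limits and the `Waypoint` type are assumptions for illustration; real limits would come from the robot's specification:

```python
from dataclasses import dataclass

# Assumed workspace limits in metres (illustrative, not a real robot's spec).
WORKSPACE = {"x": (-0.5, 0.5), "y": (-0.5, 0.5), "z": (0.0, 0.8)}

@dataclass
class Waypoint:
    x: float
    y: float
    z: float

def within_envelope(wp: Waypoint) -> bool:
    """True if the waypoint lies inside the allowed workspace box."""
    return all(lo <= getattr(wp, axis) <= hi
               for axis, (lo, hi) in WORKSPACE.items())

def filter_plan(waypoints: list) -> list:
    """Keep only in-envelope waypoints; a live system would abort instead."""
    return [wp for wp in waypoints if within_envelope(wp)]

safe = filter_plan([Waypoint(0.1, 0.0, 0.2), Waypoint(2.0, 0.0, 0.2)])
```

In practice the emergency-stop hook would trigger on any out-of-envelope waypoint rather than silently dropping it; the filter form is shown only to keep the sketch self-contained.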

Architecture Outline

  1. Operator issues a prompt with the target objective.
  2. Planner agent generates an action graph from robot primitives.
  3. Safety guard filters out unsafe or unreachable actions.
  4. Simulator validates the path and expected outcomes.
  5. Controller executes approved actions on hardware and reports telemetry.
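The five stages can be sketched as a gated pipeline in which each stage is an injected callable. The stage names and stub implementations below are hypothetical placeholders, not the project's actual interfaces:

```python
def run_pipeline(objective, planner, safety_check, simulator, controller):
    """Run plan -> safety gate -> simulation gate -> hardware execution."""
    plan = planner(objective)                          # stage 2: action graph
    if not all(safety_check(step) for step in plan):   # stage 3: safety guard
        return {"status": "rejected", "reason": "unsafe step in plan"}
    if not simulator(plan):                            # stage 4: sim validation
        return {"status": "sim_failed"}
    telemetry = controller(plan)                       # stage 5: live execution
    return {"status": "ok", "telemetry": telemetry}

# Stub stages, standing in for the Gemini planner, guard, simulator, and driver.
result = run_pipeline(
    "pick up the red block",
    planner=lambda obj: [{"action": "move_to"}, {"action": "grasp"}],
    safety_check=lambda step: step["action"] in {"move_to", "grasp"},
    simulator=lambda plan: True,
    controller=lambda plan: {"steps_executed": len(plan)},
)
```

The key design choice is that hardware execution is the last gate: a failure in the safety guard or the simulator returns early, so nothing unvalidated ever reaches the controller.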

Image Placeholders

  • Robot control cockpit UI
  • Action graph visualization for generated plan
  • Camera feed with detected objects and overlays

Video Placeholders

  • Demo: prompt to pick-and-place sequence
  • Demo: simulation pass, then real-robot execution

Links

  • Project Write-up: add long-form article URL
  • Demo Video: add YouTube/Vimeo URL
  • Source Code: add repository URL