Control Robots with Gemini AI
Multimodal command translation from language to robot actions
This project experiments with using Gemini to interpret instructions, scene context, and constraints, then translate them into executable robotic control sequences.
Gemini API · ROS/ROS2 · Robot Simulator · Vision Pipeline · Control API
Project Goals
- Convert natural-language objectives into safe motion/action plans.
- Use multimodal inputs (camera + text) for context-aware decisions.
- Support simulator-first validation before physical execution.
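To make the first goal concrete, a minimal sketch of turning a model reply into a validated action plan. The JSON reply, the primitive names, and the `parse_plan` helper are all hypothetical illustrations, not the project's actual API; the assumption is that the model is prompted to answer with a JSON plan of known primitives.

```python
import json

# Hypothetical model reply: the planner prompt asks Gemini to
# answer with a JSON object containing an ordered list of steps.
RESPONSE = """
{
  "objective": "place the red cube in the bin",
  "steps": [
    {"primitive": "move_to", "target": [0.4, 0.1, 0.2]},
    {"primitive": "grasp",   "target": "red_cube"},
    {"primitive": "move_to", "target": [0.6, -0.2, 0.3]},
    {"primitive": "release", "target": "bin"}
  ]
}
"""

# Closed vocabulary of primitives the controller understands (illustrative).
ALLOWED_PRIMITIVES = {"move_to", "grasp", "release"}

def parse_plan(raw: str) -> list[dict]:
    """Parse the model's JSON reply and reject unknown primitives."""
    plan = json.loads(raw)
    steps = plan["steps"]
    for step in steps:
        if step["primitive"] not in ALLOWED_PRIMITIVES:
            raise ValueError(f"unknown primitive: {step['primitive']}")
    return steps

steps = parse_plan(RESPONSE)
```

Validating against a closed primitive vocabulary is one simple way to keep free-form model output inside what the robot can actually execute.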
Core Capabilities
- Task decomposition into robot primitives
- Perception-aware planning with camera feedback
- Safety envelope checks and emergency stop hooks
- Simulation playback before live run
- Post-run summaries with failure diagnostics
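The safety-envelope check and emergency-stop hook listed above can be sketched as follows. The workspace bounds, `MoveTo` type, and `EStop` class are hypothetical placeholders; real limits would come from the robot's kinematic model and the real e-stop would interface with hardware.

```python
from dataclasses import dataclass

# Hypothetical workspace limits in metres (assumed values for illustration).
WORKSPACE = {"x": (0.0, 0.8), "y": (-0.5, 0.5), "z": (0.02, 0.6)}

@dataclass
class MoveTo:
    x: float
    y: float
    z: float

def in_envelope(wp: MoveTo) -> bool:
    """Check a waypoint against the configured safety envelope."""
    return all(lo <= getattr(wp, axis) <= hi
               for axis, (lo, hi) in WORKSPACE.items())

class EStop:
    """Emergency-stop hook: any subsystem can trip it; execution polls it."""
    def __init__(self):
        self.tripped = False
        self.reason = ""

    def trip(self, reason: str) -> None:
        self.tripped = True
        self.reason = reason

def validate_path(waypoints: list[MoveTo], estop: EStop) -> bool:
    """Reject the path if any waypoint leaves the envelope or e-stop fires."""
    for wp in waypoints:
        if estop.tripped:
            return False
        if not in_envelope(wp):
            estop.trip(f"waypoint outside envelope: {wp}")
            return False
    return True
```

Tripping the e-stop on the first out-of-envelope waypoint (rather than just skipping it) keeps the failure loud and forces an operator review before retry.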
Architecture Outline
1. Operator issues a prompt with the target objective.
2. Planner agent generates an action graph from robot primitives.
3. Safety guard filters out unsafe or unreachable actions.
4. Simulator validates the path and expected outcomes.
5. Controller executes approved actions on hardware and reports telemetry.
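The five stages above can be wired together as a simple pipeline. Every function here is a hypothetical stub standing in for the real planner, guard, simulator, and controller, shown only to make the data flow explicit.

```python
# Stubbed end-to-end pipeline mirroring the architecture outline.
# All function bodies are placeholder assumptions, not a real API.

def plan(prompt: str) -> list[str]:
    """Planner agent: prompt -> ordered robot primitives (stubbed)."""
    return ["move_to(cube)", "grasp(cube)", "move_to(bin)", "release(cube)"]

def safety_filter(actions: list[str]) -> list[str]:
    """Safety guard: drop actions flagged unsafe (stub passes everything)."""
    return [a for a in actions if not a.startswith("unsafe")]

def simulate(actions: list[str]) -> bool:
    """Simulator: dry-run the plan; True means expected outcome reached."""
    return len(actions) > 0

def execute(actions: list[str]) -> dict:
    """Controller: run on hardware and report telemetry (stubbed)."""
    return {"executed": len(actions), "status": "ok"}

def run(prompt: str) -> dict:
    """Full pipeline: plan -> filter -> simulate -> execute."""
    actions = safety_filter(plan(prompt))
    if not simulate(actions):
        return {"executed": 0, "status": "sim_failed"}
    return execute(actions)

telemetry = run("put the cube in the bin")
```

The key design point is that execution only happens after the simulator approves the filtered plan, matching the simulator-first goal stated earlier.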
Image Placeholders
- Robot control cockpit UI
- Action graph visualization for generated plan
- Camera feed with detected objects and overlays
Video Placeholders
- Demo: prompt to pick-and-place sequence
- Demo: simulation pass then real robot execution
Links
- Project Write-up: Add long-form article URL
- Demo Video: Add YouTube/Vimeo URL
- Source Code: Add repository URL