Agentic AI · 2026

Onagi

A multimodal autonomous reasoning agent

Role: AI engineering + product
Timeline: Gemini 3 Hackathon build

Real-time multimodal RAG over live screen content

Native function-calling that triggers DB writes and API calls

Sub-millisecond reasoning latency via context caching

The problem

Most 'AI assistants' can describe what they'd do but can't actually do it, and the ones that act tend to act blindly. The brief for Onagi was an agent that genuinely understands what's on screen, decides what the user wants, and takes the right action, without losing the plot on large, fast-moving context.

Constraints

  • Hackathon timeline: a working multimodal pipeline, not a slide deck
  • Vision + retrieval + action had to run fast enough to feel live
  • Actions touch real data, so intent detection had to be reliable

The approach

01

See, then reason

A Playwright + Gemini 3 Vision pipeline captures live screen content and feeds it into a retrieval-augmented reasoning loop, so the agent reasons over what is actually there rather than a stale snapshot.
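
The loop above can be sketched roughly as follows. This is a minimal illustration, not Onagi's actual code: `capture` stands in for the Playwright screenshot step, `describe` for the Gemini 3 Vision call, and the word-overlap `retrieve` is a placeholder for whatever retrieval backend the project really uses. All function and class names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Observation:
    """Text extracted from one freshly captured screenshot."""
    screen_text: str


def tokenize(text: str) -> Set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,:;!?").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Rank documents by word overlap with the query.

    Assumption: a toy stand-in for embedding-based retrieval.
    """
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(
    request: str,
    capture: Callable[[], bytes],
    describe: Callable[[bytes], str],
    documents: List[str],
) -> str:
    """Capture a fresh frame, describe it, retrieve context, assemble a prompt."""
    screenshot = capture()  # taken on every turn, so the view is never stale
    observation = Observation(describe(screenshot))
    context = retrieve(observation.screen_text + " " + request, documents)
    lines = ["SCREEN:", observation.screen_text, "CONTEXT:"]
    lines += context
    lines += ["REQUEST:", request]
    return "\n".join(lines)
```

Passing the capture and vision steps in as callables keeps the reasoning loop testable without a browser or an API key; in a real build they would wrap the Playwright page and the vision model.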

02

Intent to action via function calling

Native function calling maps the user's intent to concrete tools, namely database writes and API integrations, so the agent doesn't just answer: it executes the right operation.
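
One way to picture the intent-to-action step is a small tool registry plus a dispatcher. The sketch below assumes the model returns a function call as a plain dict with `name` and `args` keys; that shape, the `mark_invoice_paid` tool, and the in-memory SQLite table are all illustrative assumptions rather than Onagi's real schema or tools.

```python
import sqlite3
from typing import Any, Callable, Dict

# Registry of callable tools, keyed by the name the model is allowed to emit.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the dispatcher can find it by name."""
    TOOLS[fn.__name__] = fn
    return fn


# Hypothetical demo database; a real deployment would use its own store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("INSERT INTO invoices VALUES (1042, 'open')")


@tool
def mark_invoice_paid(invoice_id: int) -> str:
    """Example database write triggered by a function call."""
    cursor = db.execute(
        "UPDATE invoices SET status = 'paid' WHERE id = ?", (invoice_id,)
    )
    db.commit()
    return "updated" if cursor.rowcount == 1 else "not_found"


def dispatch(call: Dict[str, Any]) -> Any:
    """Execute a model-issued call, rejecting tools that are not registered."""
    name = call.get("name")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return TOOLS[name](**call.get("args", {}))
```

For example, `dispatch({"name": "mark_invoice_paid", "args": {"invoice_id": 1042}})` runs the update, while a call naming an unregistered tool raises instead of touching data. Restricting execution to an explicit registry is one simple way to keep actions on real data predictable.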

03

Keep huge context cheap

Context caching holds massive datasets in working memory, so repeat queries reuse the cached context instead of re-reading everything each turn, bringing reasoning latency on those queries down to sub-millisecond.
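
The idea of paying for a large context once and reusing it can be illustrated with a small in-process cache. This is only an analogy under stated assumptions: the caching Onagi relies on presumably happens on the model provider's side, and the `ContextCache` class and its `prepare` callback below are hypothetical names used to show the reuse pattern, not a real API.

```python
import hashlib
from typing import Any, Callable, Dict


class ContextCache:
    """Prepare a large context once and reuse the result on later turns."""

    def __init__(self, prepare: Callable[[str], Any]) -> None:
        self._prepare = prepare            # the expensive step, e.g. parsing or indexing
        self._store: Dict[str, Any] = {}
        self.misses = 0                    # how often the expensive step actually ran

    def get(self, context: str) -> Any:
        # Key on a content hash so identical context always maps to one entry.
        key = hashlib.sha256(context.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._prepare(context)
        return self._store[key]
```

With this shape, the first query over a dataset triggers `prepare`, and every later query over the same dataset returns the stored result without re-reading it; changing the dataset changes the hash and forces a fresh preparation.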

Results

  • A demoable agent that turns a screen plus a request into a grounded action.
  • Showed that multimodal RAG + function calling can run at interactive speed.
  • A reusable pattern for vision-grounded agents, not a one-off script.

Tradeoffs

Built under hackathon pressure, Onagi prioritized a convincing end-to-end loop over breadth of tools and hardened guardrails. The architecture leaves clear seams to add permissioning and a wider tool registry before anything like production use.

Built with

Gemini 3 Vision · Playwright · RAG · Function Calling · Python