
Google AI Whitepaper: Agents
A detailed executive summary of Google's whitepaper on agents, explaining the key concepts and building blocks of agent technology.
(Julia Wiesinger, Patrick Marlow & Vladimir Vuskovic, September 2024)
1. What an “Agent” Is—and Is Not
Capability | Large Language Model | Agent (LLM + cognitive wrapper) |
---|---|---|
Knowledge base | Frozen at training time | Extended in real time via external tools/APIs |
Interaction | One-shot inference | Multi-turn loop with memory & planning |
Tool use | None (or hard-coded) | Native; the model chooses and invokes tools |
Reasoning | Prompt-level only | Explicit orchestration layer using ReAct / CoT / ToT |
Key idea: An agent observes → reasons → acts autonomously toward a goal by chaining an LLM with tools and an orchestration loop.
2. The Three Building Blocks
- Model – the decision‑maker (Gemini, PaLM, etc.).
- Tools – bridges to the outside world (APIs, code interpreters, vector DBs, …).
- Orchestration layer – cyclical control flow that feeds context → thinks → chooses action → executes → observes → repeats.
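The cyclical control flow in the third bullet can be sketched in plain Python. The whitepaper describes the loop conceptually and does not prescribe an implementation, so the `model` and `tools` interfaces below are illustrative assumptions:

```python
from typing import Callable, Dict

def run_agent(model: Callable[[str], dict],
              tools: Dict[str, Callable[[str], str]],
              goal: str, max_steps: int = 5) -> str:
    """Minimal observe -> reason -> act loop (illustrative sketch only).

    `model` is assumed to return either a tool request
    ({"action": ..., "input": ...}) or a final {"answer": ...}.
    """
    context = f"Goal: {goal}"
    for _ in range(max_steps):
        decision = model(context)                                    # reason
        if "answer" in decision:
            return decision["answer"]
        observation = tools[decision["action"]](decision["input"])   # act
        context += f"\nAction: {decision['action']}\nObservation: {observation}"  # observe
    return "Stopped: exceeded max_steps without an answer."
```

A real agent would replace the stub `model` with an LLM call whose prompt embeds `context`, but the loop shape stays the same.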
3. Reasoning Frameworks inside the Orchestration Layer
Framework | Best for |
---|---|
ReAct (Reason + Act) | Step-wise tool selection / data collection |
Chain-of-Thought (CoT) | Linear logical decomposition |
Tree-of-Thoughts (ToT) | Strategic look-ahead / search problems |
These prompt styles supply the “thinking traces” the orchestration loop relies on.
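As an illustration, a ReAct trace interleaves Thought / Action / Observation lines, and the orchestration loop parses them to decide the next step. The trace text and parser below are a simplified sketch following the common ReAct convention, not code from the whitepaper:

```python
import re

# Hand-written example trace in the usual ReAct format (illustrative).
REACT_TRACE = """Thought: I need the game result first.
Action: search
Action Input: Texas Longhorns last game
Observation: The Longhorns played Georgia.
Thought: I can answer now.
Final Answer: Georgia"""

def parse_react_step(trace: str) -> dict:
    """Return the final answer if present, else the latest tool request."""
    final = re.search(r"Final Answer:\s*(.+)", trace)
    if final:
        return {"final": final.group(1).strip()}
    actions = re.findall(r"Action:\s*(\S+)\s*\nAction Input:\s*(.+)", trace)
    if actions:
        name, arg = actions[-1]
        return {"action": name, "input": arg.strip()}
    return {}
```

The orchestration layer alternates between generating such traces and executing the parsed actions until a final answer appears.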
4. Tooling Taxonomy
Tool type | Where it runs | When to pick it | Illustrative example |
---|---|---|---|
Extensions | Agent side | Let the agent hit an external API directly | Google Flights; Code Interpreter |
Function Calling | Client side | Need tighter control, auth, batching, or human-in-the-loop | LLM emits `display_cities()` JSON; your app calls Google Places |
Data Stores | Agent side | Retrieval-Augmented Generation (RAG) over docs/DBs | Vector-DB lookup fed back into the context |
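The Function Calling row is the easiest to demonstrate: the model only *emits* a structured call, and the client app owns execution, auth, and review. `display_cities` echoes the whitepaper's ski-trip example, but the dispatch helper and function body are our own hypothetical sketch:

```python
import json

def handle_function_call(llm_output: str, registry: dict):
    """Client-side dispatch: parse the model's JSON call and execute it locally."""
    call = json.loads(llm_output)
    return registry[call["name"]](**call["args"])

def display_cities(cities, preferences=None):
    """Client-side function named in the whitepaper's example (body is illustrative)."""
    return {"cities": cities, "preferences": preferences}

# What the model might emit as its function-call payload:
llm_output = json.dumps({
    "name": "display_cities",
    "args": {"cities": ["Crested Butte", "Whistler", "Zermatt"],
             "preferences": "skiing"},
})
result = handle_function_call(llm_output, {"display_cities": display_cities})
```

Because execution happens client-side, this is the natural place to insert authentication, rate limiting, or a human-approval step before the call runs.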
5. Targeted Learning to Improve Tool Selection
- In‑context learning – few‑shot examples in the prompt.
- Retrieval‑based in‑context – fetch relevant examples at runtime.
- Fine‑tuning – train the model on a larger task‑specific dataset before deployment so it "knows" the tools in advance.
Combining the three yields the best latency vs. reliability trade‑off.
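Retrieval-based in-context learning amounts to ranking a stored example bank by similarity to the incoming query and injecting the top matches into the prompt. The token-overlap scorer below is a deliberately naive stand-in for the embedding search a production system would use:

```python
def select_examples(query: str, example_bank: list, k: int = 2) -> list:
    """Rank stored few-shot examples by crude token overlap with the query."""
    q_tokens = set(query.lower().split())

    def overlap(example):
        return len(q_tokens & set(example["task"].lower().split()))

    return sorted(example_bank, key=overlap, reverse=True)[:k]

# Hypothetical example bank pairing past tasks with the tool that solved them.
bank = [
    {"task": "book a flight to Paris", "tool": "flights_api"},
    {"task": "find hotels in Tokyo", "tool": "places_api"},
    {"task": "check flight status for a Paris trip", "tool": "flights_api"},
]
chosen = select_examples("flight to Paris", bank)
```

The selected examples would then be formatted into the prompt as few-shot demonstrations of correct tool choice.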
6. Hands‑On Quick‑Start (LangChain/LangGraph)
A 30‑line Python snippet shows Gemini‑1.5‑Flash + SerpAPI + Google Places answering:
“Who did the Texas Longhorns play last week? What’s the other team’s stadium address?”
The agent reasons, searches, fetches places data, then returns the answer—all inside one ReAct loop.
7. From Prototype to Production: Vertex AI Agents
Google’s managed stack (Agent Builder, Extensions hub, Function Calling, Example Store, evaluation tooling) supplies:
- drag‑and‑drop definition of goals, tools, sub‑agents
- built‑in monitoring & debugging
- fully managed infra / scaling
A reference architecture diagram shows chat UI → Vertex Agent → Tools / Data Stores → observer loop.
8. Big Takeaways
- Agents unlock autonomy by planning and acting with real‑time data.
- The orchestration layer—think “executive brain”—is the heart of an agent.
- Extensions, Function Calling and Data Stores are the essential “senses & hands.”
- Future direction: agent chaining, richer tool ecosystems, tighter evaluation loops.
Why It Matters to You
- Enterprise fit: Tooling taxonomy maps cleanly onto an AI‑Ops capability model—extensions for ops automations, functions for regulated data flows, data stores for RAG knowledge bases.
- Developer velocity: LangChain/LangGraph example is boilerplate you can drop into a Next.js service.
- Governance: Fine‑tuning vs. retrieval trade‑offs inform segmentation of sensitive vs. public tasks.
- Scalability: Vertex AI’s managed agent layer parallels the orchestration module you’re building—worth benchmarking.
Recommended Next Steps
- Prototype a minimal “agent crew” for one high‑value workflow (e.g., automated data‑catalog enrichment).
- Instrument the orchestration loop with evaluation hooks (success/failure, tool‑usage stats).
- Incrementally add Data‑Store retrieval to ground outputs on proprietary documents.
- Explore Vertex Agent Builder vs. self‑hosted LangGraph to decide build‑vs‑buy for production.
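The instrumentation step above (success/failure and tool-usage stats) could start as simple counters wrapped around the orchestration loop. The `AgentMetrics` class is a hypothetical sketch, not part of any Google or LangChain API:

```python
from collections import Counter

class AgentMetrics:
    """Minimal evaluation hook: wire record_tool() into the tool-execution
    step and record_outcome() at the end of each agent run."""

    def __init__(self):
        self.tool_calls = Counter()
        self.outcomes = Counter()

    def record_tool(self, name: str) -> None:
        self.tool_calls[name] += 1

    def record_outcome(self, success: bool) -> None:
        self.outcomes["success" if success else "failure"] += 1

    def summary(self) -> dict:
        total = sum(self.outcomes.values()) or 1
        return {"success_rate": self.outcomes["success"] / total,
                "tool_calls": dict(self.tool_calls)}
```

Even this coarse signal (which tools dominate, how often runs fail) is enough to compare prompt variants or tool configurations before investing in full evaluation tooling.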