Google AI Whitepaper: Agents

A detailed executive summary of Google's whitepaper on agents, explaining the key concepts and building blocks of agent technology.

Google AI Whitepaper “Agents” — Detailed Executive Summary

(Julia Wiesinger, Patrick Marlow & Vladimir Vuskovic, Sept 2024)


1. What an “Agent” Is—and Is Not

|  | Large Language Model | Agent (LLM + cognitive wrapper) |
| --- | --- | --- |
| Knowledge base | Frozen at training time | Extended in real time via external tools/APIs |
| Interaction | One-shot inference | Multi-turn loop with memory & planning |
| Tool use | None (or hard-coded) | Native; the model chooses & invokes tools |
| Reasoning | Prompt-level only | Explicit orchestration layer using ReAct / CoT / ToT |

Key idea: An agent observes → reasons → acts autonomously toward a goal by chaining an LLM with tools and an orchestration loop.


2. The Three Building Blocks

  1. Model – the decision‑maker (Gemini, PaLM, etc.).
  2. Tools – bridges to the outside world (APIs, code interpreters, vector DBs, …).
  3. Orchestration layer – cyclical control flow that feeds context → thinks → chooses action → executes → observes → repeats.
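The cyclical control flow in step 3 can be sketched in plain Python. This is a minimal illustration of the feed-context, think, act, observe cycle; the model, tools, and stopping rule below are stand-ins invented for the sketch, not a real LLM or API.

```python
# Minimal sketch of the agent loop: feed context -> think -> act -> observe.
# `model` returns a decision dict; `tools` maps action names to callables.

def run_agent(model, tools, goal, max_steps=5):
    """Cyclical control flow: think, choose an action, execute, observe, repeat."""
    context = [f"Goal: {goal}"]
    for _ in range(max_steps):
        decision = model(context)              # "think": pick an action or finish
        if decision["action"] == "finish":
            return decision["answer"]
        tool = tools[decision["action"]]       # "act": invoke the chosen tool
        observation = tool(decision["input"])
        context.append(f"Observed: {observation}")  # "observe": grow the context
    return None  # gave up after max_steps

# Stub model: call the search tool once, then answer with what it observed.
def stub_model(context):
    if any(line.startswith("Observed:") for line in context):
        return {"action": "finish", "answer": context[-1]}
    return {"action": "search", "input": "stadium address"}

result = run_agent(stub_model, {"search": lambda q: f"results for '{q}'"}, "find address")
print(result)  # -> Observed: results for 'stadium address'
```

A real orchestration layer replaces `stub_model` with an LLM call and `tools` with Extensions, functions, or data-store lookups, but the loop shape stays the same.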

3. Reasoning Frameworks inside the Orchestration Layer

| Framework | Best for |
| --- | --- |
| ReAct (Reason + Act) | Step-wise tool selection / data collection |
| Chain-of-Thought (CoT) | Linear logical decomposition |
| Tree-of-Thoughts (ToT) | Strategic look-ahead / search problems |

These prompt styles supply the “thinking traces” the orchestration loop relies on.
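To make "thinking traces" concrete, here is a hypothetical ReAct-style transcript and a minimal parser of the kind an orchestration loop might use to extract the next tool call. The trace text and field labels are assumptions for illustration, not the whitepaper's exact format.

```python
# Hypothetical ReAct trace: interleaved Thought / Action / Observation lines.
trace = """Thought: I need last week's opponent first.
Action: search
Action Input: Texas Longhorns last game opponent
Observation: Georgia Bulldogs
Thought: Now I need their stadium address.
Action: places
Action Input: Georgia Bulldogs stadium address"""

def parse_last_action(text):
    """Pull the most recent Action / Action Input pair out of a ReAct trace."""
    action = tool_input = None
    for line in text.splitlines():
        if line.startswith("Action:"):
            action = line.split(":", 1)[1].strip()
        elif line.startswith("Action Input:"):
            tool_input = line.split(":", 1)[1].strip()
    return action, tool_input

print(parse_last_action(trace))  # -> ('places', 'Georgia Bulldogs stadium address')
```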


4. Tooling Taxonomy

| Tool type | Where it runs | When to pick it | Illustrative example |
| --- | --- | --- | --- |
| Extensions | Agent side | Let the agent hit an external API directly | Google Flights; Code Interpreter |
| Function Calling | Client side | Need tighter control, auth, batching, or a human in the loop | LLM emits `display_cities()` JSON; your app calls Google Places |
| Data Stores | Agent side | Retrieval-Augmented Generation (RAG) over docs / DBs | Vector-DB lookup fed back into the context |
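The Function Calling row deserves a concrete sketch: the model emits structured JSON naming a function, and your client-side code decides whether and how to execute it. `display_cities` echoes the whitepaper's example; the function registry and the faked Places lookup below are illustrative stand-ins, not a real API integration.

```python
import json

# The model's output is just data: a function name plus arguments.
# Nothing runs until client code chooses to dispatch it.
model_output = json.dumps({
    "name": "display_cities",
    "args": {"cities": ["Zermatt", "Whistler"], "preferences": "skiing"},
})

def display_cities(cities, preferences):
    # A real app would call the Google Places API here; we fake the lookup.
    return [f"{city} ({preferences})" for city in cities]

FUNCTIONS = {"display_cities": display_cities}  # explicit allow-list

call = json.loads(model_output)
result = FUNCTIONS[call["name"]](**call["args"])  # client-side dispatch
print(result)  # -> ['Zermatt (skiing)', 'Whistler (skiing)']
```

Because the app owns the dispatch step, it can insert auth checks, batching, or a human approval gate before anything executes, which is exactly why you pick Function Calling over an agent-side Extension.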

5. Targeted Learning to Improve Tool Selection

  • In‑context learning – few‑shot examples in the prompt.
  • Retrieval‑based in‑context – fetch relevant examples at runtime.
  • Fine‑tuning – train on a large task‑specific dataset before inference so the model “knows” the tools beforehand.
    Combining the three yields the best latency vs. reliability trade‑off.
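The retrieval-based variant can be sketched in a few lines: at runtime, pick the stored examples most similar to the user query and prepend them to the prompt. Word-overlap scoring stands in for a real embedding search, and the example store below is invented for illustration.

```python
# Toy example store mapping past queries to the tool that solved them.
EXAMPLES = [
    {"query": "book a flight to Zurich", "tool": "flights"},
    {"query": "find restaurants near the stadium", "tool": "places"},
    {"query": "summarize this quarterly report", "tool": "data_store"},
]

def retrieve_examples(user_query, k=1):
    """Return the k stored examples with the most words in common with the query."""
    words = set(user_query.lower().split())
    scored = sorted(
        EXAMPLES,
        key=lambda ex: len(words & set(ex["query"].lower().split())),
        reverse=True,
    )
    return scored[:k]

best = retrieve_examples("what's the stadium address", k=1)
print(best[0]["tool"])  # -> places
```

A production system would swap the overlap score for vector similarity against an embedding index, but the shape is the same: retrieve, then splice into the prompt as few-shot examples.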

6. Hands‑On Quick‑Start (LangChain/LangGraph)

A 30‑line Python snippet shows Gemini‑1.5‑Flash + SerpAPI + Google Places answering:

“Who did the Texas Longhorns play last week? What’s the other team’s stadium address?”

The agent reasons, searches, fetches places data, then returns the answer—all inside one ReAct loop.
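A hedged reconstruction of that snippet (not the verbatim code from the whitepaper) might look like the following. It assumes the `langchain-google-genai`, `langchain-community`, and `langgraph` packages, plus `GOOGLE_API_KEY`, `SERPAPI_API_KEY`, and `GPLACES_API_KEY` environment variables, so it only runs with those credentials in place.

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.tools import GooglePlacesTool
from langchain_community.utilities import SerpAPIWrapper
from langchain_core.tools import Tool
from langgraph.prebuilt import create_react_agent

# Gemini 1.5 Flash as the decision-making model.
model = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# SerpAPI for web search, Google Places for address lookups.
search = Tool(
    name="search",
    func=SerpAPIWrapper().run,
    description="Web search for current events, e.g. recent game results.",
)
places = GooglePlacesTool()

# Prebuilt ReAct loop: reason, pick a tool, observe, repeat until done.
agent = create_react_agent(model, [search, places])
state = agent.invoke({"messages": [("user",
    "Who did the Texas Longhorns play last week? "
    "What's the other team's stadium address?")]})
print(state["messages"][-1].content)
```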


7. From Prototype to Production: Vertex AI Agents

Google’s managed stack (Agent Builder, Extensions hub, Function Calling, Example Store, evaluation tooling) supplies:

  • drag‑and‑drop definition of goals, tools, sub‑agents
  • built‑in monitoring & debugging
  • fully managed infra / scaling

A reference architecture diagram shows chat UI → Vertex Agent → Tools / Data Stores → observer loop.


8. Big Takeaways

  1. Agents unlock autonomy by planning and acting with real‑time data.
  2. The orchestration layer—think “executive brain”—is the heart of an agent.
  3. Extensions, Function Calling and Data Stores are the essential “senses & hands.”
  4. Future direction: agent chaining, richer tool ecosystems, tighter evaluation loops.

Why It Matters to You

  • Enterprise fit: Tooling taxonomy maps cleanly onto an AI‑Ops capability model—extensions for ops automations, functions for regulated data flows, data stores for RAG knowledge bases.
  • Developer velocity: LangChain/LangGraph example is boilerplate you can drop into a Next.js service.
  • Governance: Fine‑tuning vs. retrieval trade‑offs inform segmentation of sensitive vs. public tasks.
  • Scalability: Vertex AI’s managed agent layer parallels the orchestration module you’re building—worth benchmarking.

Recommended Next Steps
  1. Prototype a minimal “agent crew” for one high‑value workflow (e.g., automated data‑catalog enrichment).
  2. Instrument the orchestration loop with evaluation hooks (success/failure, tool‑usage stats).
  3. Incrementally add Data‑Store retrieval to ground outputs on proprietary documents.
  4. Explore Vertex Agent Builder vs. self‑hosted LangGraph to decide build‑vs‑buy for production.